Binary Embedding: Fundamental Limits and Fast Algorithm
published: Dec. 5, 2015, recorded: October 2015, views: 1717
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Binary embedding is a nonlinear dimension reduction methodology where high dimensional data are embedded into the Hamming cube while preserving the structure of the original space. Specifically, for an arbitrary N distinct points in Sp−1, our goal is to encode each point using m-dimensional binary strings such that we can reconstruct their geodesic distance up to δ uniform distortion. Existing binary embedding algorithms either lack theoretical guarantees or suffer from running time O(mp). We make three contributions: (1) we establish a lower bound that shows any binary embedding oblivious to the set of points requires m=Ω(1δ2logN) bits and a similar lower bound for non-oblivious embeddings into Hamming distance; (2) we propose a novel fast binary embedding algorithm with provably optimal bit complexity m=O(1δ2logN) and near linear running time O(plogp) whenever logN≪δp√, with a slightly worse running time for larger logN; (3) we also provide an analytic result about embedding a general set of points K⊆Sp−1 with even infinite size. Our theoretical findings are supported through experiments on both synthetic and real data sets.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !