The complexity of learning halfspaces using generalized linear methods

author: Amit Daniely, Einstein Institute of Mathematics, The Hebrew University of Jerusalem
published: July 15, 2014,   recorded: June 2014,   views: 2421

Description

Many popular learning algorithms (e.g., regression, Fourier-transform-based algorithms, kernel SVM, and kernel ridge regression) operate by reducing the problem to a convex optimization problem over a set of functions. These methods currently offer the best known approach to several central problems, such as learning halfspaces and learning DNFs. In addition, they are widely used in numerous application domains. Despite their importance, there are still very few proof techniques for showing limits on the power of these algorithms.
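As a rough illustration (not part of the talk), here is a minimal sketch of one member of this algorithm family: learning a halfspace by minimizing a convex surrogate of the 0-1 loss (here, the hinge loss) over linear functions by subgradient descent. All function names and parameters below are hypothetical choices made for the example.

```python
# Illustrative sketch only: a "generalized linear method" in the sense above --
# replace the nonconvex 0-1 loss by a convex surrogate (hinge loss) and
# optimize over linear functions.
import numpy as np

def hinge_loss_halfspace(X, y, epochs=100, lr=0.1):
    """X: (n, d) array of examples; y: (n,) array of labels in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # max(0, 1 - y_i <w, x_i>) is convex in w, so the averaged objective
        # is a convex relaxation of the margin error.
        margins = y * (X @ w)
        active = margins < 1                      # examples with a nonzero subgradient
        grad = -(y[active, None] * X[active]).sum(axis=0) / n
        w -= lr * grad
    return w

# Usage: classify with the sign of the learned linear function.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = np.sign(X @ w_true)
w = hinge_loss_halfspace(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```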

We study the performance of this approach on the problem of (agnostically and improperly) learning halfspaces with margin γ. Let D be a distribution over labeled examples. The γ-margin error of a hyperplane h is the probability that an example falls on the wrong side of h or within distance γ of it. The γ-margin error of the best h is denoted Err_γ(D). An α(γ)-approximation algorithm receives γ and ϵ as input and, using i.i.d. samples from D, outputs a classifier with error rate ≤ α(γ)·Err_γ(D) + ϵ. Such an algorithm is efficient if it uses poly(1/γ, 1/ϵ) samples and runs in time polynomial in the sample size.
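To make the definition concrete, the following small sketch (my own illustration, not from the abstract) estimates the γ-margin error of a hyperplane on a finite sample, which plays the role of D here:

```python
import numpy as np

def margin_error(w, X, y, gamma):
    """Empirical gamma-margin error of the hyperplane {x : <w, x> = 0}.

    An example (x, y) counts as an error if it is misclassified or lies
    within distance gamma of the hyperplane, i.e. y * <w, x> / ||w|| <= gamma.
    """
    w = w / np.linalg.norm(w)          # normalize so <w, x> is the signed distance
    signed_dist = y * (X @ w)
    return np.mean(signed_dist <= gamma)
```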

The best approximation ratio achievable by an efficient algorithm is O((1/γ)/√(log(1/γ))) and is achieved by an algorithm from the above class. Our main result shows that the approximation ratio of every efficient algorithm from this family must be at least Ω((1/γ)/poly(log(1/γ))), essentially matching the best known upper bound.
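Written out, the best approximation ratio α(γ) achievable within this family is squeezed between the two bounds, which differ only by a polylogarithmic factor in 1/γ; this is the sense in which they essentially match:

```latex
% Lower bound (this work): every efficient algorithm in the family has ratio at least the left-hand side.
% Upper bound: achieved by an algorithm from the same family.
\[
\Omega\!\left(\frac{1/\gamma}{\mathrm{poly}(\log(1/\gamma))}\right)
\;\le\; \alpha(\gamma) \;\le\;
O\!\left(\frac{1/\gamma}{\sqrt{\log(1/\gamma)}}\right),
\]
% so the remaining gap is only a poly(log(1/gamma)) factor.
```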

Download slides: colt2014_daniely_methods.pdf (1.3 MB)
