Semi-Supervised Learning in Gigantic Image Collections

author: Rob Fergus, New York University (NYU)
published: Jan. 19, 2010, recorded: December 2009, views: 4044

Description

With the advent of the Internet it is now possible to collect hundreds of millions of images. These images come with varying degrees of label information. "Clean labels" can be obtained manually for a small fraction of them, "noisy labels" may be extracted automatically from surrounding text, and most images have no labels at all. Semi-supervised learning is a principled framework for combining these different label sources. However, standard semi-supervised methods scale polynomially with the number of images, which makes them impractical for gigantic collections with hundreds of millions of images and thousands of classes. In this paper we show how recent results in machine learning can be used to obtain highly efficient approximations to semi-supervised learning that are linear in the number of images. Specifically, we use the convergence of the eigenvectors of the normalized graph Laplacian to eigenfunctions of weighted Laplace-Beltrami operators. We combine this with a label-sharing framework obtained from WordNet to propagate label information to classes lacking manual annotations. Our algorithm enables us to apply semi-supervised learning to a database of 80 million images with 74 thousand classes.
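
The description compresses the method into a few sentences. The NumPy sketch below contrasts the two routes it mentions, under stated assumptions rather than as the authors' implementation: (1) an exact baseline that takes the smallest eigenvectors of the normalized graph Laplacian (an n x n matrix, so at least quadratic in n), and (2) an eigenfunction approximation that works per input dimension on a histogram of the data, so the n points are only touched by a histogram and an interpolation. Both feed a common reduced-basis objective, minimizing `a^T diag(sigma) a + lam * sum over labeled l of (U a - y)_l^2`, which is a small k x k linear solve. Assumptions, all hypothetical and not taken from the lecture: a Gaussian affinity with bandwidth `eps`; a PCA rotation so dimensions can be treated as roughly independent; a density-weighted graph over bin centres with edge weights `p_i W_ij p_j`; add-one histogram smoothing; and unit-norm scaling of the interpolated eigenfunctions. The function names `laplacian_eigenvectors`, `eigenfunctions_1d`, `eigenfunction_basis` and `ssl_solve` are likewise illustrative.

```python
import numpy as np


def gaussian_affinity(X, eps):
    """Dense affinity W_ij = exp(-||x_i - x_j||^2 / (2 eps^2)); X has shape (n, d)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * eps ** 2))


def laplacian_eigenvectors(X, eps, k):
    """Exact baseline: k smallest eigenpairs of L = I - D^-1/2 W D^-1/2.

    Builds an n x n matrix, so memory and time grow at least quadratically in n.
    """
    W = gaussian_affinity(X, eps)
    s = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - s[:, None] * W * s[None, :]
    sigma, U = np.linalg.eigh(L)  # ascending eigenvalues, orthonormal columns
    return sigma[:k], U[:, :k]


def eigenfunctions_1d(x, eps, bins=50):
    """Approximate eigenfunctions of a 1-D density from its histogram (assumed scheme).

    Graph over bin centres with weights p_i W_ij p_j; solves (D - PWP) g = sigma D g.
    Returns ascending sigma, the bin centres, and g at the centres (one column each).
    """
    counts, edges = np.histogram(x, bins=bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    p = (counts + 1.0) / (counts.sum() + bins)  # add-one smoothing keeps p > 0
    PWP = p[:, None] * gaussian_affinity(centres[:, None], eps) * p[None, :]
    s = 1.0 / np.sqrt(PWP.sum(axis=1))
    # Symmetric form: D^-1/2 (D - PWP) D^-1/2 v = sigma v, then g = D^-1/2 v.
    sigma, V = np.linalg.eigh(np.eye(bins) - s[:, None] * PWP * s[None, :])
    return sigma, centres, s[:, None] * V


def eigenfunction_basis(X, eps, k, bins=50):
    """Constant function plus the k-1 smoothest 1-D eigenfunctions over all dimensions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt.T  # PCA-rotated coordinates (assumed roughly independent)
    cands = []  # (eigenvalue, eigenfunction evaluated at the n points)
    for j in range(Z.shape[1]):
        sigma, centres, G = eigenfunctions_1d(Z[:, j], eps, bins)
        for m in range(1, min(k, bins)):  # m = 0 is the constant eigenfunction
            g = np.interp(Z[:, j], centres, G[:, m])  # linear in n
            cands.append((sigma[m], g / np.linalg.norm(g)))
    cands.sort(key=lambda c: c[0])
    const = np.full(len(X), 1.0 / np.sqrt(len(X)))
    sigma = np.array([0.0] + [c[0] for c in cands[: k - 1]])
    U = np.column_stack([const] + [c[1] for c in cands[: k - 1]])
    return sigma, U


def ssl_solve(U, sigma, labeled, y, lam=100.0):
    """Solve (diag(sigma) + lam U_l^T U_l) a = lam U_l^T y (k x k) and return f = U a."""
    Ul = U[labeled]
    M = np.diag(sigma) + lam * Ul.T @ Ul
    alpha = np.linalg.solve(M, lam * Ul.T @ y)
    return U @ alpha


# Toy usage: two well-separated clusters, one labeled point in each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)) + [-5.0, 0.0],
               rng.normal(0.0, 0.3, (20, 2)) + [5.0, 0.0]])
labeled, y = np.array([0, 20]), np.array([1.0, -1.0])

sig_exact, U_exact = laplacian_eigenvectors(X, eps=1.0, k=2)
f_exact = ssl_solve(U_exact, sig_exact, labeled, y)

sig_ef, U_ef = eigenfunction_basis(X, eps=1.0, k=3)
f_ef = ssl_solve(U_ef, sig_ef, labeled, y)
```

In both routes the labels enter only through the final k x k solve. What changes is how the basis `U` is obtained: the exact route diagonalizes an n x n matrix, while the eigenfunction route solves `bins` x `bins` problems per dimension and then evaluates the results at every point with `np.interp`, which is where the linear cost in n comes from.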

Reviews and comments:

Vlad, October 6, 2010, 10:24 a.m.:

A very interesting approach indeed! It is good to see that graph Laplacian methods can be used on very large datasets. We applied Laplacian Eigenmaps to our videos and ran into exactly the problem addressed here: scalability.
My only concern is how this class separability will affect performance on real datasets. We certainly have class overlap, at least with the descriptors we use for the images being classified.
Anyway, good work!
