Big Data Clustering

author: Anil K. Jain, Department of Computer Science and Engineering, Michigan State University
published: Jan. 28, 2013, recorded: November 2012

Slides: clusteringtalk_notredame.pptx (14.4 MB)



Description

The goal of data clustering is to organize a set of n objects into k clusters such that objects in the same cluster are more similar to each other than to objects in different clusters. Clustering is one of the most popular tools for data exploration and data organization, and it is used in almost every scientific discipline that collects data. Given the exponential growth in data generation (estimated to be over 35 trillion gigabytes by the year 2020), clustering is receiving renewed interest and use in applications such as social networks, image retrieval, web search and gene expression analysis. In this talk I will introduce the data clustering problem and discuss the challenges and opportunities in research on large-scale clustering, with a focus on two main issues: (i) how to define pairwise similarity between objects, and (ii) how to efficiently cluster hundreds of millions of objects. I will present our recent work on approximating the well-known kernel k-means clustering algorithm. I will show, both analytically and empirically, that the performance of approximate kernel k-means is similar to that of the kernel k-means algorithm, but with significantly lower run-time complexity and memory requirements.
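
As a rough illustration of how kernel k-means can be approximated, the following Python sketch restricts each cluster center to the span of m randomly sampled points, so only an n x m slice of the kernel matrix is computed instead of the full n x n matrix. This is a minimal sketch under stated assumptions and not necessarily the method presented in the talk: the RBF kernel, the small ridge term `reg`, the initialization from sampled points, and all function and parameter names are choices made here for illustration.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clamp tiny negative round-off


def approx_kernel_kmeans(X, k, m, gamma=1.0, n_iter=50, reg=1e-6, seed=0):
    """Sampling-based sketch: centers are c_j = sum_s alpha[s, j] * phi(x_sample[s])."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sample = rng.choice(n, size=m, replace=False)

    K_B = rbf_kernel(X, X[sample], gamma)   # n x m: all points vs. sampled points
    K_hat = K_B[sample]                     # m x m: sampled points vs. themselves
    K_reg = K_hat + reg * np.eye(m)         # ridge term (assumption) for stable solves

    # Initialize each center at one distinct sampled point (one-hot coefficients).
    alpha = np.zeros((m, k))
    alpha[rng.choice(m, size=k, replace=False), np.arange(k)] = 1.0

    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Squared feature-space distance, using K(x, x) = 1 for the RBF kernel:
        # ||phi(x_i) - c_j||^2 = 1 - 2 (K_B alpha)_ij + (alpha^T K_hat alpha)_jj
        center_norms = np.einsum("sj,sj->j", alpha, K_hat @ alpha)
        dist = 1.0 - 2.0 * (K_B @ alpha) + center_norms
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels

        # Least-squares update: the best center within the sampled span solves
        # K_hat alpha_j = mean of the K_B rows belonging to cluster j.
        for j in range(k):
            members = labels == j
            if members.any():  # an empty cluster keeps its previous center
                alpha[:, j] = np.linalg.solve(K_reg, K_B[members].mean(axis=0))
    return labels
```

With m much smaller than n, the number of stored kernel entries drops from n^2 to n*m, which is the kind of saving the talk refers to. For an array `X` of shape (n, d), a call might look like `labels = approx_kernel_kmeans(X, k=2, m=20, gamma=0.5)`.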


