Comparison of information retrieval techniques: Latent semantic indexing (LSI) and Concept indexing (CI)

author: Jasminka Dobša, Faculty of Organization and Informatics, Varazdin, University of Zagreb
published: Feb. 25, 2007,   recorded: November 2003,   views: 10063

Description

Information retrieval in the vector space model is based on literal matching of terms in the documents and the queries. The model is implemented by creating the term-document matrix, which is built from the frequencies of terms in documents. Literal matching of terms does not necessarily retrieve all relevant documents. Synonymy (multiple words having the same meaning) and polysemy (words having multiple meanings) are two major obstacles to efficient information retrieval. Latent semantic indexing (LSI) and concept indexing (CI) are information retrieval techniques embedded in the vector space model that address the problems of synonymy and polysemy. LSI uses a low-rank singular value decomposition (SVD) of the term-document matrix. Although LSI has been empirically successful, its low-rank approximation lacks an interpretation and, consequently, offers few controls for accomplishing specific tasks in information retrieval. CI lowers the rank of the term-document matrix by using centroids of clusters, the so-called concept decomposition (CD). Here we compare SVD/LSI and CD/CI in terms of matrix approximations and precision of information retrieval.
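
The two rank-reduction approaches described above can be sketched on a toy example. This is an illustrative sketch only, not the lecture's own code: the 6x6 term-document matrix, the term labels, the choice of k = 2, and the use of spherical k-means to obtain the cluster centroids are all assumptions made for the example. The concept decomposition here is taken to be the least-squares projection of the term-document matrix onto the span of the centroids.

```python
import numpy as np

def svd_approx(A, k):
    """Rank-k approximation of A via truncated SVD (the basis of LSI)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def concept_decomposition(A, k, n_iter=20, seed=0):
    """Rank-(at most k) approximation of A from k cluster centroids.

    Assumption: clusters come from spherical k-means on unit-length
    document vectors; the abstract only says "centroids of clusters".
    """
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(A, axis=0)
    X = A / np.where(norms == 0, 1.0, norms)      # unit-length documents
    n_docs = X.shape[1]
    # Initialise the concept vectors with k distinct random documents.
    C = X[:, rng.choice(n_docs, size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each document to the centroid with the highest cosine.
        labels = np.argmax(C.T @ X, axis=0)
        for j in range(k):
            c = X[:, labels == j].sum(axis=1)
            n = np.linalg.norm(c)
            if n > 0:                             # keep old centroid if cluster is empty
                C[:, j] = c / n
    # Least-squares coefficients Z minimising ||A - C Z||_F.
    Z, *_ = np.linalg.lstsq(C, A, rcond=None)
    return C, C @ Z

# Hypothetical toy term-document matrix (rows: terms, columns: documents).
# Rows: car, automobile, engine, bank, money, loan.
A = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 0, 0, 0, 1],
    [1, 0, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 0],
    [0, 1, 0, 1, 0, 2],
], dtype=float)

k = 2
A_svd = svd_approx(A, k)
C, A_cd = concept_decomposition(A, k)

err_svd = np.linalg.norm(A - A_svd)   # Frobenius norm of the residual
err_cd = np.linalg.norm(A - A_cd)
print(f"SVD error: {err_svd:.4f}   CD error: {err_cd:.4f}")

# Score documents against the one-term query "car" in each reduced model.
q = np.zeros(A.shape[0])
q[0] = 1.0

def cosine_scores(q, M):
    """Cosine similarity between query q and every column of M."""
    denom = np.linalg.norm(q) * np.linalg.norm(M, axis=0)
    return (q @ M) / np.where(denom == 0, 1.0, denom)

print("LSI scores:", np.round(cosine_scores(q, A_svd), 3))
print("CI scores: ", np.round(cosine_scores(q, A_cd), 3))
```

For the same k, the truncated SVD gives the smallest possible Frobenius-norm error of any rank-k approximation, so the CD error can match it at best. The potential appeal of CD is that each centroid is a weighted bag of terms that can be read directly, which is not generally true of singular vectors.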

Reviews and comments:

Comment1 dpn, May 7, 2007 at 9:36 a.m.:

Hi, is it possible to get a direct link to the stream? so that those of us not crippled with windows can attempt to get them working in something else? (like vlc)
cheers!

dpn


Comment2 Ned, January 28, 2009 at 4:39 p.m.:

It is impossible to understand what is going on in this lecture. There is a baby making noises and the speaker is drowned out by a buzzing noise of some recording equipment. Any value it has is lost because it is so badly recorded.


Comment3 andraz, May 4, 2009 at 11:59 p.m.:

Indeed, sound is so badly recorded it should probably be removed. Or maybe adding original slides would help


Comment4 Andrew Polar, November 2, 2011 at 3:56 p.m.:

You are wrong. Synonymy and polysemy are not the major problem. People run multiple queries with little variation of terms and get what they want. The major problem is common words in documents, which cause the specific terms to get lost. A patent on a database and a patent on windshield wipers contain more common words than different ones. The other major problem is the inability to distinguish documents by the meaning of what they say. An inventor needs to find prior art, and even a perfect search engine returns 2000 documents that are all about the same inventions and all have the same correct terms.
In my experiments LSA and PSLA show no advantage over Hierarchical Clustering or Naive Bayes. Details on semanticsearchart.com
