Using linguistic information as features for text categorization

author: Arturo Montejo Ráez, University of Jaén
published: Nov. 26, 2007,   recorded: September 2007,   views: 4823

Description

We report on experiments using linguistic information as additional features in a classical Vector Space Model [10]. Information extracted for every word, such as its part of speech, stem and lexical root, has been combined in different ways to test whether it improves classification performance across several algorithms, such as SVM [3], BBR [] and PLAUM [6]. Automatic Text Classification, also known as Automatic Text Categorization, tries to relate documents to a predefined set of classes. Extensive research has been carried out on this subject [11], and a wide range of techniques are applicable to this task: feature extraction [5], feature weighting, dimensionality reduction [4], machine learning algorithms and more. The classification task can be binary (one of two possible classes is selected), multi-class (one class out of a set of possible classes) or multi-label (a set of classes from a larger set of potential candidates). In most cases, the latter two can be reduced to binary decisions [1], as the algorithm used in our experiments does [8]. To verify the contribution of the new features, we have incorporated them into the vector space model by preprocessing the Reuters-21578 collection, a data set well known to the research community working on text categorization problems [2].
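As a rough illustration of what combining word-level linguistic information into vector space features could look like, the sketch below joins the word, stem and part-of-speech tag of each token into a single feature string and counts occurrences. The abstract does not say how the features were actually combined, so the `word|stem|POS` encoding, the function names and the hand-written token list are all assumptions made for this example; no real tagger or stemmer is called.

```python
from collections import Counter

# Hypothetical pre-analysed tokens: (word, stem, part-of-speech tag).
# A real pipeline would obtain these from a tagger and a stemmer;
# the values here are hand-written for illustration only.
TOKENS = [
    ("prices", "price", "NNS"),
    ("rose", "rise", "VBD"),
    ("sharply", "sharp", "RB"),
    ("as", "as", "IN"),
    ("oil", "oil", "NN"),
    ("prices", "price", "NNS"),
    ("climbed", "climb", "VBD"),
]


def make_features(tokens, use_word=True, use_stem=False, use_pos=False):
    """Build one feature string per token from the selected parts."""
    features = []
    for word, stem, pos in tokens:
        parts = []
        if use_word:
            parts.append(word.lower())
        if use_stem:
            parts.append(stem.lower())
        if use_pos:
            parts.append(pos)
        if parts:  # skip tokens when no part is selected
            features.append("|".join(parts))
    return features


def term_frequencies(features):
    """Sparse term-frequency vector: feature string -> raw count."""
    return Counter(features)


# Baseline vector space: plain lowercased words.
words_only = term_frequencies(make_features(TOKENS))

# Alternative vector space: stem combined with part-of-speech tag.
stem_pos = term_frequencies(
    make_features(TOKENS, use_word=False, use_stem=True, use_pos=True)
)
```

Each choice of flags yields a different feature space over the same documents, so the same classifier can be trained on each variant and the results compared. Raw counts are used here for simplicity; a weighting scheme could be applied afterwards.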

See Also:

Download slides: mmdss07_raez_uli_01.pdf (3.2 MB)

