17th International Semantic Web Conference (ISWC), Monterey 2018

TSE-NER: An Iterative Approach for Long-Tail Entity Extraction in Scientific Publications

author: Christoph Lofi, Delft University of Technology (TU Delft)
published: Nov. 22, 2018, recorded: October 2018, views: 2287



Description

Named Entity Recognition and Typing (NER/NET) is a challenging task, especially for long-tail entities such as those found in scientific publications. These entities (e.g. "WebKB", "StatSnowball") are rare and often relevant only in specific knowledge domains, yet they are important for retrieval and exploration purposes. State-of-the-art NER approaches employ supervised machine learning models trained on expensive type-labeled data laboriously produced by human annotators. A common workaround is to generate labeled training data from knowledge bases (KBs). This approach is not suitable for long-tail entity types, which are by definition scarcely represented in KBs. This paper presents an iterative approach for training NER and NET classifiers for long-tail entity types in scientific publications that relies on minimal human input, namely a small seed set of instances for the targeted entity type. We introduce different strategies for training data extraction, semantic expansion, and result entity filtering. We evaluate our approach on scientific publications, focusing on the long-tail entity types Datasets and Methods in computer science publications, and Proteins in biomedical publications.
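The iterative loop outlined in the abstract (start from a seed set, extract training data, expand, filter, repeat) can be illustrated with a toy bootstrapping sketch. This is a hypothetical illustration, not the authors' implementation. The function names (`bootstrap`, `learn_contexts`, `expand`, `keep`), the context-pattern "classifier", the capitalization-plus-stopword filter, and the example corpus are all assumptions. They stand in for the trained NER/NET classifiers and the semantic expansion strategies the abstract refers to.

```python
# Hypothetical sketch of a seed-driven iterative loop. Simple (previous, next)
# context patterns stand in for a trained NER/NET classifier; this is NOT the
# TSE-NER implementation.

def tokenize(sentence):
    # Naive whitespace tokenization; strips surrounding punctuation.
    return [t.strip(".,;:()") for t in sentence.split()]


def extract_training_sentences(corpus, entities):
    # Training data extraction: keep sentences that mention a known entity.
    return [s for s in corpus if any(t in entities for t in tokenize(s))]


def learn_contexts(sentences, entities):
    # Stand-in for classifier training: record lowercased (previous, next)
    # token pairs around each entity mention.
    contexts = set()
    for s in sentences:
        toks = tokenize(s)
        for i in range(1, len(toks) - 1):
            if toks[i] in entities:
                contexts.add((toks[i - 1].lower(), toks[i + 1].lower()))
    return contexts


def expand(corpus, contexts):
    # Expansion stand-in: propose every token that appears inside a learned context.
    candidates = set()
    for s in corpus:
        toks = tokenize(s)
        for prev, tok, nxt in zip(toks, toks[1:], toks[2:]):
            if (prev.lower(), nxt.lower()) in contexts:
                candidates.add(tok)
    return candidates


def keep(candidate, stopwords):
    # Result filtering (assumed heuristic): capitalized token, not a stopword.
    return candidate[:1].isupper() and candidate.lower() not in stopwords


def bootstrap(corpus, seeds, stopwords, max_rounds=5):
    entities = set(seeds)
    for _ in range(max_rounds):
        training = extract_training_sentences(corpus, entities)
        contexts = learn_contexts(training, entities)
        found = {c for c in expand(corpus, contexts) if keep(c, stopwords)}
        if found <= entities:  # nothing new was found, so the loop has converged
            break
        entities |= found
    return entities


# Invented toy corpus for illustration only.
corpus = [
    "We evaluate on the WebKB dataset.",
    "We also use the Cora dataset for comparison.",
    "The Cora dataset is used by the Citeseer benchmark.",
    "Results on the Reuters dataset are mixed.",
    "We report accuracy on the test set.",
]

print(sorted(bootstrap(corpus, {"WebKB"}, {"the"})))  # ['Cora', 'Reuters', 'WebKB']
```

In this toy run, the single seed "WebKB" yields the pattern "the … dataset", which recovers "Cora" and "Reuters". "Citeseer" is missed because it only appears in a different context. This shows why the choice of expansion and filtering strategy matters when starting from a small seed set.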
