Open-domain Quantity Queries on Web Tables: Annotation, Response, and Consensus Models
published: Oct. 7, 2014, recorded: August 2014, views: 1765
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Over 40% of columns in hundreds of millions of Web tables contain numeric quantities. Tables are a richer source of structured knowledge than free text. We harness Web tables to answer queries whose target is a quantity with natural variation, such as net worth of zuckerburg, battery life of ipad, half life of plutonium, and calories in pizza. Our goal is to respond to such queries with a ranked list of quantity distributions, suitably represented. Apart from the challenges of informal schema and noisy extractions, which have been known since tables were used for non-quantity information extraction, we face additional problems of noisy number formats, as well as unit specifications that are often contextual and ambiguous.
Early "hardening" of extraction decisions at a table level leads to poor accuracy. Instead, we use a probabilistic context free grammar (PCFG) based unit extractor on the tables, and retain several top-scoring extractions of quantity and numerals. Then we inject these into a new collective inference framework that makes global decisions about the relevance of candidate table snippets, the interpretation of the query's target quantity type, the value distributions to be ranked and presented, and the degree of consensus that can be built to support the proposed quantity distributions. Experiments with over 25 million Web tables and 350 diverse queries show robust, large benefits from our quantity catalog, unit extractor, and collective inference.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !