Language data for digital natives: old wine in a new bottle or...?
published: Dec. 2, 2011, recorded: November 2011, views: 3833
If the eighties brought the first extensive use of digitized dictionaries for linguistic querying, and the nineties were dedicated to collecting and exploring huge amounts of language data in digital format, such as corpora, lexicons, ontologies and lexical databases, the first decade of the new century saw the explosion of freely available (crowd-sourced) web content such as Wikipedia and online dictionaries, the everyday use of NLP technologies, and the first move towards abandoning paper as the primary medium of written language transmission. At the beginning of the present decade it is therefore reasonable to ask what will fulfil the persisting human need to understand difficult parts of one's native language, or contribute to maintaining a common language standard. On the multilingual side, there is an equally important need to communicate with people or understand texts in languages other than one's own, an area where free web content and freely available statistical machine translation tools have made a considerable step forward in recent years. While paper dictionaries will continue to play an important role in digitally underdeveloped environments, it is clear that in those parts of the world where access to the internet and mobile telephony is beginning to be understood as one of the basic human rights, the paper format may be abandoned. Consequently, it is necessary to conceptualize a new format which will satisfy the same needs, but will deliberately break away from the 18th- and 19th-century dictionary concept and the codex format. We will try to guesstimate what the new format could be, taking into account the language data and NLP technologies already available, as well as technologies that are still maturing. The format will be conceptualized as an interactive web portal where reliable information on all aspects of a particular language is available: an "all-about".