Preserving Metadata from Parliamentary Debates
author: Alina Karakanta
Department of Language Science and Technology, Saarland University
published: May 30, 2018, recorded: May 2018, views: 585
released under terms of: Creative Commons Attribution (CC-BY)
Description
Multilingual parliaments have been a useful source for monolingual and multilingual corpus collection. However, extra-textual information, such as who the speaker is or which language a sentence was originally delivered in, is often missing, so these resources cannot be fully exploited in translation studies. In this paper we present a method for processing and building a parallel corpus of European Parliament debates for two language pairs: English into German and English into Spanish. The paper documents all the necessary pre- and post-processing steps for creating such a resource. In addition to the parallel corpora, we use the same method to collect monolingual comparable corpora for English, German and Spanish.
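To illustrate why the original-language metadata matters, here is a minimal sketch, not the paper's actual pipeline, of how such metadata could be used to turn aligned sentence pairs into a directional English-into-German corpus. The `SentencePair` record and its `speaker` and `original_lang` fields are hypothetical; the real corpus format may differ. Pairs whose original language is unknown are discarded, because their translation direction cannot be established.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class SentencePair:
    """One aligned sentence pair with hypothetical speaker metadata."""
    source: str                   # English side of the pair
    target: str                   # German (or Spanish) side of the pair
    speaker: str                  # hypothetical field: who delivered the speech
    original_lang: Optional[str]  # hypothetical field: language the speech was given in, None if unknown


def directional_pairs(pairs: Iterable[SentencePair], original: str = "en") -> List[SentencePair]:
    """Keep only pairs whose source side was originally spoken in `original`.

    Pairs without original-language metadata are dropped, since we cannot
    tell whether the English side is an original or a translation.
    """
    return [p for p in pairs if p.original_lang == original]


# Usage with made-up example records (illustrative only, not corpus data).
pairs = [
    SentencePair("Thank you, Mr President.", "Danke, Herr Präsident.", "Speaker A", "en"),
    SentencePair("I agree with the rapporteur.", "Ich stimme dem Berichterstatter zu.", "Speaker B", "de"),
    SentencePair("The vote will take place tomorrow.", "Die Abstimmung findet morgen statt.", "Speaker C", None),
]
en_de = directional_pairs(pairs)
print(len(en_de))  # → 1
```

The same filter, applied per language to the original-language side only, would be one way to assemble monolingual comparable corpora of non-translated English, German and Spanish text.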