Slovene and the South Slavic Language Family - the Obligation of Expressing Duality and Other Stories from the South

author: Simon Krek, Artificial Intelligence Laboratory, Jožef Stefan Institute
published: July 28, 2016, recorded: May 2016

Description

Slovene is one of the rare languages with a dual grammatical number, and the only Indo-European one among the official languages of the European Union (the other being Maltese, from the Semitic family). Like other Slavic languages, Slovene is morphologically rich, and the dual alone contributes no fewer than 613 tags to the tagset of 1,902 possible combinations commonly used for PoS-tagging of Slovene (Erjavec and Krek 2008). In the tagset, the dual can be attributed to verbs, nouns, pronouns, adjectives and numerals. In the standard variant of the language, Slovene speakers are obliged to use the dual whenever two objects, people or other entities are referred to. They are therefore confronted with both possibilities and obligations that speakers of other languages do not have. We will explore some of the more interesting ones.

We will also place Slovene in the wider context of South Slavic languages and address the issue of natural language processing of very similar languages (e.g. Ljubešić and Kranjčić). Since the breakup of the former Socialist Federal Republic of Yugoslavia, whose constitution defined Serbo-Croatian, Slovene and Macedonian as official languages, four official standards originating from the former Serbo-Croatian (Bosnian, Croatian, Montenegrin and Serbian) have come into use in the newly formed states. Discriminating between these standards is not an easy task, and it poses interesting challenges to the natural language processing community, not least because of the rather sensitive socio-linguistic aspects of the situation.
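To give a sense of scale for the tagset figures quoted above, the share of tags that involve the dual can be worked out directly. The following is a minimal Python sketch; the names `DUAL_TAGS`, `TOTAL_TAGS` and `share` are illustrative only and do not come from the tagset specification.

```python
# Figures as quoted in the description (Erjavec and Krek 2008).
DUAL_TAGS = 613     # tags that carry the dual number
TOTAL_TAGS = 1902   # all possible tag combinations in the tagset


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


dual_share = share(DUAL_TAGS, TOTAL_TAGS)
print(f"Dual tags: {DUAL_TAGS} of {TOTAL_TAGS} ({dual_share:.1f}%)")
```

On these figures, roughly a third of the tagset exists only because of the dual.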
