Prague hosts machine translation marathon

09-03-2010

Prague’s Charles University recently hosted an unusual marathon that tested the capabilities of various machine translation systems. The annual event is part of the Euromatrix project, which aims to establish machine translation systems for all European languages. The participants had a week to translate some 12,000 sentences from various newspapers and news sites. In the coming weeks their output will be compared with translations done by professional “human” translators. Ruth Fraňková spoke to Ondřej Bojar from the Institute of Formal and Applied Linguistics, which is taking part in the Euromatrix project:

“The project involves, if I am correct, seven universities and two companies, and we are one of those universities. Our main focus is translation from English to Czech and also from Czech to English, but we prefer Czech as the target language. We are working with deep syntactic representations of the sentence, which means that we aim at a translation where linguistics is applied. We have built collections of hundreds of thousands of sentences that are manually annotated with the syntactic representation of the sentence, and we are now transferring the knowledge we have about Czech syntax into English.”

Is Czech a difficult language compared to other European languages?

“Czech has some specific properties that make it particularly difficult for translation, for example from English, and the main difference between Czech and English is the rich morphology of Czech. While in English you have just a single form of a word, say green for the colour, in Czech you have seven cases, four genders and two numbers. Not all these combinations are different on the surface level, but the number of possible Czech word forms is much higher and the system has to choose the correct one, so this is a challenge.”
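
As a rough illustration of the arithmetic in this answer, the following Python sketch enumerates the grammatical slots an adjective such as "green" could fill in Czech. The category labels and the function name are assumptions added for illustration and do not come from the interview. Since many slots share the same surface form, the count is an upper bound on distinct word forms, not the number of forms itself.

```python
from itertools import product

# Illustrative labels (assumed, not taken from the interview):
# seven cases, four genders, two numbers.
CASES = [
    "nominative", "genitive", "dative", "accusative",
    "vocative", "locative", "instrumental",
]
GENDERS = ["masculine animate", "masculine inanimate", "feminine", "neuter"]
NUMBERS = ["singular", "plural"]


def adjective_slots():
    """Return every (case, gender, number) combination an adjective can occupy."""
    return list(product(CASES, GENDERS, NUMBERS))


slots = adjective_slots()
# 7 cases x 4 genders x 2 numbers
print(len(slots))
```

An English adjective occupies all of these slots with one form, whereas a system translating into Czech has to pick the form that matches the slot required by the sentence.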

What about word order?

“The word order actually helps us when we are translating from English to Czech, because Czech allows nearly any permutation of words in the sentence as a correct word order, provided that the case markings and things like that are correct. When we are translating back from Czech into English and the Czech is produced by native speakers, the situation is much more difficult. You have to identify where the subject is, where the verb is, where the object is, and these have to be in the canonical English order, otherwise the sentence wouldn’t be comprehensible to a speaker of English.”

What is the future of machine translation systems? Do you think they can replace humans?

“I do believe that machine translation systems can replace humans in the case of repetitive texts. For example, weather reports were already being translated from English to French in the 1970s. Now I think we are moving towards European legislation, and I estimate that 60 percent of the texts or even more can be automatically translated with no human intervention.”
