Researchers at the University of Tartu Institute of Computer Science have added Livonian, Komi, Veps and 14 other low-resource Finno-Ugric languages to Neurotõlge, the university’s machine translation engine. Most of these languages are available in a public translation engine for the first time, as they are not supported by Google Translate or similar services.
In total, the translation engine supports 23 Finno-Ugric languages: in addition to the more commonly supported Estonian, Finnish and Hungarian, it now includes Livonian, Võro, Proper Karelian, Livvi Karelian, Ludian, Veps, Northern Sami, Southern Sami, Inari Sami, Skolt Sami, Lule Sami, Komi, Komi-Permyak, Udmurt, Hill Mari, Meadow Mari, Erzya, Moksha, Mansi and Khanty.
The research group now invites speakers and researchers of these languages to contribute corrected translations to improve translation quality. This can be done by editing translations at translate.ut.ee. Texts in these languages, such as poems, articles and books, are also of great help and can be sent to ping@tartunlp.ai.
Lisa Yankovskaya, a Research Fellow in Natural Language Processing at the University of Tartu Institute of Computer Science, explained that feedback is needed to improve translation quality because many of these languages have extremely scarce resources for building such translation systems. “This means two things: first, the translation quality can vary a lot, and it can be especially low when translating into low-resource languages. Secondly, we need the help of speakers of those languages, who can contribute correct translations on our platform,” noted Yankovskaya.
“There are several reasons for developing machine translation for low-resource languages. For example, philologists and other interested parties need translation from these languages to understand texts, folklore, etc., without learning the language. Translating into these languages is a way of preserving endangered languages and supporting the speakers,” said Yankovskaya. She added that this is why the translation system is unrestricted and open to all users, and why the software and the models created are open-source.
“We started working with Finno-Ugric languages in 2021, with the first system supporting Võro, Northern Sami and Southern Sami,” said Maali Tars, Scientific Programmer at the Institute of Computer Science. According to her, the group added Livonian, an extremely endangered language with only about 20 near-native speakers, in the same year. In the future, the group will continue to improve the quality of the current machine translation system and intends to include more Finno-Ugric languages and dialects.
The work was carried out in collaboration with the Livonian Institute at the University of Latvia, the Võro Institute, the University of Eastern Finland, the Karelian language revitalization programme of the University of Eastern Finland and the Arctic University of Norway.
The work is funded by the National Programme of Estonian Language Technology.