Arvi Johannes Hurskainen (born January 25, 1941, in Kitee) is a Finnish scholar of language technology and linguistics. Since 1985, he has developed rule-based language technology, mainly for Swahili but also for other languages, including machine translation from English to Finnish. He has created a development environment called SALAMA (an acronym for Swahili Language Manager), which is suitable for any language. The major applications developed so far include a spell checker for Swahili,[1] an annotator of corpus texts,[2] an advanced dictionary between Swahili and English,[3] and translators[4] from Swahili to English, from English to Swahili, and from English to Finnish. He has also developed an advanced learning system for Swahili[5] and a system for producing targeted vocabularies for language learners.[2] Hurskainen has compiled two annotated corpora, Helsinki Corpus of Swahili 1.0 and Helsinki Corpus of Swahili 2.0.[6]
Arvi Hurskainen | |
---|---|
Born | January 25, 1941, Kitee, Finland |
Nationality | Finnish |
Known for | developing SALAMA, a computational environment for language technology |
Scientific career | |
Fields | Linguistics, language technology, machine translation |
Institutions | University of Helsinki |
Doctoral advisors | Juha Pentikäinen and Marja-Liisa Swantz |
Study and work history
He first studied theology at the University of Helsinki. Later, after having worked in Tanzania, he studied anthropology and published his PhD dissertation Cattle and Culture. The Structure of a Pastoral Parakujo Society.[7] In 1976, he worked as a researcher in the Jipemoyo Project, sponsored by the Academy of Finland, in Tanzania, and from 1977 to 1980 he was in the service of the Finnish Lutheran Mission in Helsinki.
Hurskainen worked at the University of Helsinki, first as a lecturer in 1981–1989 and then as a professor in 1989–2006. In between, in 1984–1985, he worked at Tumaini University in Tanzania. Before his university career, he had worked in Tanzania for eight years in various teaching positions. He was the director of the Department of Asian and African Studies in 1999–2001. He retired in 2006.
In 1988–1992, he directed the fieldwork project Swahili Language and Folklore, sponsored by the Ministry of Foreign Affairs, Finland and the University of Dar-es-Salaam. The project produced the speech corpus DAHE (Dar-es-Salaam - Helsinki), which was later digitized.
Language technology
Hurskainen has developed language technology based on detailed language analysis. The basic description of a language is made using finite-state transducers, first developed by Kimmo Koskenniemi. Individual words are then disambiguated using constraint grammar technology, and syntactic mapping is performed in the same phase. Both steps are carried out with Constraint Grammar 3.0, originally developed by Fred Karlsson and implemented by Pasi Tapanainen from Connexor.[8]
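The two-stage idea of analysing every word into all of its possible readings and then removing readings that the context rules out can be illustrated with a minimal Python sketch. It is a toy illustration only and is not code from SALAMA, a finite-state transducer toolkit, or Constraint Grammar 3.0: the lexicon (English words chosen for readability), the tag names, and the functions `analyze`, `remove_if_prev`, and `disambiguate` are all hypothetical, a plain dictionary stands in for the finite-state analyser, and syntactic mapping is not shown.

```python
# Toy stand-in for a finite-state morphological analyser: each surface form
# maps to every reading (lemma, tag set) it could have. Hypothetical data.
LEXICON = {
    "the": [("the", {"DET"})],
    "can": [("can", {"N", "SG"}), ("can", {"V", "AUX"})],
    "rusts": [("rust", {"V", "PRES", "SG3"}), ("rust", {"N", "PL"})],
}


def analyze(tokens):
    """Return one cohort (list of candidate readings) per token."""
    return [list(LEXICON.get(t.lower(), [(t, {"UNKNOWN"})])) for t in tokens]


def remove_if_prev(cohorts, target_tag, prev_tag):
    """Constraint-style rule: remove readings carrying target_tag when the
    previous token is unambiguous and carries prev_tag.
    The last remaining reading of a token is never removed."""
    for i in range(1, len(cohorts)):
        prev = cohorts[i - 1]
        if len(prev) == 1 and prev_tag in prev[0][1]:
            kept = [r for r in cohorts[i] if target_tag not in r[1]]
            if kept:
                cohorts[i] = kept
    return cohorts


def disambiguate(tokens):
    """Analyse the tokens, then apply the toy rules in a fixed order."""
    cohorts = analyze(tokens)
    # Toy rule 1: no verb reading directly after a determiner.
    cohorts = remove_if_prev(cohorts, "V", "DET")
    # Toy rule 2: no noun reading directly after an unambiguous noun.
    cohorts = remove_if_prev(cohorts, "N", "N")
    return cohorts


if __name__ == "__main__":
    for token, cohort in zip(["the", "can", "rusts"],
                             disambiguate(["the", "can", "rusts"])):
        print(token, cohort)
```

In this sketch the second rule can only fire because the first has already made "can" unambiguous, which shows why rule order and the guard against deleting the last reading matter in a system of this kind.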
The rule-based approach developed by Hurskainen has similarities with other rule-based systems, such as Grammatical Framework[9] and NooJ.[10] Rule-based approaches to language technology, especially as applied to machine translation, are considered suitable for low-resource languages with rich morphology, such as the Bantu languages.[11]
Production
Web material
References
- ^ "Zana za Uhakiki za Microsoft Office 2013 – Swahili". Microsoft. Retrieved 16 April 2018.
- ^ a b Hurskainen, Arvi. "Tagger". Retrieved 16 April 2018.
- ^ Hurskainen, Arvi. "Dictionary". Retrieved 16 April 2018.
- ^ Hurskainen, Arvi. "Translator". Retrieved 16 April 2018.
- ^ Hurskainen, Arvi. "Learn Swahili". Retrieved 16 April 2018.
- ^ "Hcs2-group | Kielipankki".
- ^ Suomen professorit 1640–2007. Jyväskylä: Professoriliitto. 2008.
- ^ "Natural Knowledge". Connexor. 2011–2016. Retrieved 16 April 2018.
- ^ "GF – Grammatical Framework - A programming language for multilingual grammar applications". GF – Grammatical Framework. Retrieved 16 April 2018.
- ^ "A Linguistic Development Environment". NooJ. Retrieved 16 April 2018.
- ^ Hurskainen, Arvi. 2018. Sustainable language technology for African languages. In Agwuele, Augustine and Bodomo, Adams (eds), The Routledge Handbook of African Linguistics, 359–375. London: Routledge Publishers. ISBN 978-1-138-22829-0
- ^ Hurskainen, Arvi. "Welcome to Salama". Retrieved 25 June 2018.
Salama (Swahili Language Manager) is an environment for language technology applications. All applications in Salama make use of rule-based language technology, started in 1985.
- ^ Hurskainen, Arvi. "Technical reports on LT". Salama - Swahili Language Manager. Retrieved 25 June 2018.