Machine Translation of Syllabi and Study Programmes
Project overview
The project is part of a major initiative by Uppsala University to store all syllabi and course programmes in a common database, the SELMA course database, and to make them available in both Swedish and English. The machine translation project aims to provide translation support for SELMA administrators. The support comprises a spell checking function and a translation function. Having corrected the writing and spelling errors marked by the spell checking function, the administrator invokes the translation function. It returns an English translation that has to be checked and revised where necessary. After revision, the edited translation is stored in SELMA and in the translation memory of the system. Translation proceeds segment by segment, where a segment is a sentence or a headline. The first time a segment appears, it is translated by the machine translation function; the next time it appears, its translation is retrieved from the memory. Revisions made by the users and stored in the memory will thus gradually improve the translation quality of the system. Initially, the memory will be loaded with the manual translations that were available at the start of the project.
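The segment-by-segment workflow described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the class `TranslationMemory`, the function `translate_syllabus`, the stand-in `placeholder_mt` and the exact-match lookup are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional, Tuple


def placeholder_mt(segment: str) -> str:
    """Hypothetical stand-in for the machine translation function."""
    return f"[MT] {segment}"


class TranslationMemory:
    """Maps a source segment to its revised translation (exact match, by assumption)."""

    def __init__(self, seed: Optional[Dict[str, str]] = None) -> None:
        # The memory may be pre-loaded with existing manual translations.
        self._store: Dict[str, str] = dict(seed or {})

    def lookup(self, segment: str) -> Optional[str]:
        return self._store.get(segment)

    def store(self, segment: str, translation: str) -> None:
        # Called after the administrator has revised and approved a translation.
        self._store[segment] = translation


def translate_syllabus(
    segments: List[str],
    memory: TranslationMemory,
    mt: Callable[[str], str] = placeholder_mt,
) -> List[Tuple[str, str, str]]:
    """Return (segment, draft translation, origin) triples.

    origin is "memory" if the segment was seen before, otherwise "mt".
    """
    drafts = []
    for segment in segments:
        hit = memory.lookup(segment)
        if hit is not None:
            drafts.append((segment, hit, "memory"))
        else:
            drafts.append((segment, mt(segment), "mt"))
    return drafts


if __name__ == "__main__":
    memory = TranslationMemory({"Kursplan": "Syllabus"})
    for row in translate_syllabus(["Kursplan", "Mål"], memory):
        print(row)
    # After revision, the approved translation is stored for future reuse.
    memory.store("Mål", "Objectives")
    print(translate_syllabus(["Mål"], memory))
```

In this sketch a revised translation only benefits later segments that match exactly; whether the actual system also uses approximate matching is not stated here.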
The translation function is based on research on machine translation that has been carried out at the department for more than ten years. The translation system has a linguistically inspired, transfer-based backbone consisting of three main modules: analysis, transfer and generation. Each module has its own language resources in the form of dictionaries and grammars. The main part of the project has been devoted to tuning the system for the translation of syllabi by building dictionaries and grammars for this domain. The dictionaries are based on available syllabi and should cover the full set of syllabi delivered to the project by October 2006. Translations of words and phrases for the dictionary are retrieved from human translations, where available, and from various electronic resources. All in all, the project corpus consists of 3 950 syllabi in Swedish. Human translations were provided for quite a few of them, but in most cases the translations are incomplete. As of January 2007, the domain dictionary comprises 27 579 lexical units with translations, distributed over sub-domains of the three disciplinary domains of the university (Humanities and Social Sciences, Medicine and Pharmacy, and Science and Technology) and the Faculty of Educational Science. A thousand words are still missing from the translation dictionary. The grammars, too, should cover the syntactic structures present in the project corpus. This is a very demanding requirement in view of the structural variation. To compensate for shortcomings in grammatical coverage, the system uses various strategies, among them ways of using partial parses (analyses). Once the dictionary has been completed, by March 2007, user training and evaluation will follow before the system is launched for daily use.
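The three-module organisation and the fallback to partial analyses can be sketched as a toy Python pipeline. This is a deliberately simplified, word-level illustration and not the department's system, which relies on dictionaries and grammars: the function names, the tiny lexicon and the rule "pass unknown words through unchanged" are all assumptions.

```python
from typing import Dict, List, Optional, Tuple


def analyse(tokens: List[str], lexicon: Dict[str, str]) -> Optional[List[str]]:
    """Toy full analysis: succeeds only if every token is covered."""
    return tokens if all(t in lexicon for t in tokens) else None


def analyse_partial(tokens: List[str], lexicon: Dict[str, str]) -> List[Tuple[str, bool]]:
    """Toy partial analysis: mark each token as covered or not."""
    return [(t, t in lexicon) for t in tokens]


def transfer(units: List[str], lexicon: Dict[str, str]) -> List[str]:
    """Map analysed source units to target-language units."""
    return [lexicon[u] for u in units]


def generate(units: List[str]) -> str:
    """Toy generation: linearise the target units."""
    return " ".join(units)


def translate(segment: str, lexicon: Dict[str, str]) -> str:
    tokens = segment.lower().split()
    full = analyse(tokens, lexicon)
    if full is not None:
        return generate(transfer(full, lexicon))
    # Fallback (assumed strategy): translate the covered parts and
    # leave the uncovered parts untranslated for the reviser to fix.
    partial = analyse_partial(tokens, lexicon)
    return generate([lexicon[t] if known else t for t, known in partial])


if __name__ == "__main__":
    # Hypothetical mini-lexicon, for illustration only.
    lexicon = {"kursen": "the course", "ges": "is given", "årligen": "annually"}
    print(translate("Kursen ges årligen", lexicon))
    print(translate("Kursen ges sällan", lexicon))
```

The point of the sketch is only the control flow: a segment that cannot be fully analysed still yields a draft, which the administrator then revises.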
Participants
Prof. Anna Sågvall Hein
Per Weijnitz
Eva Pettersson
Ebba Gustavii
Data
Reports
Pettersson, Eva. 2005. Pilotstudie om maskinöversättning inom ramen för Projekt Kursdatabas – Utveckling av språkliga resurser för ett vetenskapsområde [Pilot Study on Machine Translation in the Project Course Database – The Development of Language Resources for One Disciplinary Domain]. Uppsala University. Department of Linguistics and Philology.
Pettersson, Eva. 2007. Kursplanestatistik [Quantitative data on syllabi]. Uppsala University. Department of Linguistics and Philology.
Pettersson, Eva & Gustavii, Ebba. 2007. Specifikation för utprovning av Kursplaneöversättaren [Specification of the Evaluation of the Syllabus Translator]. Uppsala University. Department of Linguistics and Philology.
Sågvall Hein, Anna. 2005. Kursdatabas – Projekt maskinöversättning [Course Database – Project Machine Translation]. Uppsala University. Department of Linguistics and Philology.
Sågvall Hein, Anna, Weijnitz, Per, Pettersson, Eva & Gustavii, Ebba. 2007. Inför leverans av maskinöversättningstjänst – Kursplaneöversättaren [In View of the Delivery of a Machine Translation Service – The Syllabus Translator]. Uppsala University. Department of Linguistics and Philology.
Weijnitz, Per. 2007. Användning av kursplaneöversättaren [Using the Syllabus Translator]. Uppsala University. Department of Linguistics and Philology.
Events
Department of Linguistics and Philology, Uppsala University, Sweden
Visiting address: Engelska parken, Humanistiskt centrum, Thunbergsvägen 3
Postal address: Department of Linguistics and Philology, Box 635, SE-751 26 Uppsala, Sweden.
E-Mail: info@lingfil.uu.se
Telephone: +46 (0)18 471 22 52
Fax: +46 (0)18 471 10 94
