
UPPSALA UNIVERSITY : Dept of Linguistics and Philology : Computational Linguistics



Machine Translation of Syllabi and Study Programmes



Project overview

The project is part of a major Uppsala University initiative to store all syllabi and course programmes in a common database, the SELMA course database, and to make them available in both Swedish and English. The machine translation project aims to provide translation support for SELMA administrators. The support comprises a spell checking function and a translation function. Once the administrator has corrected the writing and spelling errors marked by the spell checking function, he or she invokes the translation function, which returns an English translation that must be checked and revised as needed. The revised translation is then stored both in SELMA and in the translation memory of the system.

Translation proceeds segment by segment, where a segment is a sentence or a headline. The first time a segment appears in a syllabus, it is translated by the machine translation function; the next time it appears, its translation is retrieved from the memory. Revisions made by users and stored in the memory will thus gradually improve the translation quality of the system. Initially, the memory will be loaded with the manual translations that were available at the start of the project.
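The workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the names `TranslationMemory`, `translate_syllabus`, `machine_translate` and `revise` are hypothetical, segments are matched by exact string equality, and every draft (including a memory hit) is passed to the reviser, which the description above does not specify.

```python
from typing import Callable, Dict, List, Optional


class TranslationMemory:
    """Hypothetical store mapping a source segment to its revised translation."""

    def __init__(self, seed: Optional[Dict[str, str]] = None) -> None:
        # The memory can be pre-loaded, e.g. with existing manual translations.
        self._entries: Dict[str, str] = dict(seed or {})

    def lookup(self, segment: str) -> Optional[str]:
        return self._entries.get(segment)

    def store(self, segment: str, translation: str) -> None:
        self._entries[segment] = translation


def translate_syllabus(
    segments: List[str],
    memory: TranslationMemory,
    machine_translate: Callable[[str], str],
    revise: Callable[[str, str], str],
) -> List[str]:
    """Translate segment by segment, preferring the memory over machine translation."""
    translations = []
    for segment in segments:
        draft = memory.lookup(segment)
        if draft is None:
            # First occurrence of this segment: fall back to machine translation.
            draft = machine_translate(segment)
        # Assumption: a human checks every draft and may edit it.
        final = revise(segment, draft)
        # The revised version is what later occurrences will retrieve.
        memory.store(segment, final)
        translations.append(final)
    return translations
```

With this design, a segment that occurs in many syllabi is machine translated and revised once, and every later occurrence reuses the revised version.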

The translation function is based on machine translation research that has been carried out at the department for more than ten years. The translation system has a linguistically inspired, transfer-based backbone consisting of three main modules: analysis, transfer and generation. Each module has its own language resources in the form of dictionaries and grammars. The main part of the project has been devoted to tuning the system for the translation of syllabi by building dictionaries and grammars for this domain.

The dictionaries are based on available syllabi and should cover the full set of syllabi delivered to the project by October 2006. Translations of words and phrases for the dictionary are retrieved from human translations, where available, and from various electronic resources. All in all, the project corpus consists of 3 950 syllabi in Swedish. Human translations were provided for quite a few of them, but in most cases the translations are incomplete. As of January 2007, the domain dictionary comprises 27 579 lexical units with translations, distributed over sub-domains of the three disciplinary domains of the university (Humanities and Social Sciences, Medicine and Pharmacy, and Science and Technology) and the Faculty of Educational Science. A thousand words are still missing from the translation dictionary.

The grammars, too, should cover the syntactic structures present in the project corpus. This is a very demanding requirement in view of the structural variation. To compensate for shortcomings in grammatical coverage, the system makes use of various strategies, among them ways of using partial parses (analyses). After the completion of the dictionary by March 2007, user training and evaluation will follow before the system is launched for daily use.
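The control flow of a transfer-based pipeline with a fallback to partial analyses can be illustrated roughly as follows. This is a toy sketch, not the department's system: the function `translate_segment`, its parameters, and the representation of an analysis as a plain list of tokens are all assumptions made for the example, and joining the translated pieces in source order is only one of many conceivable fallback strategies.

```python
from typing import Callable, List, Optional

# Assumption: a toy "analysis" is just a list of source-language tokens;
# a real system would use a much richer structure.
Analysis = List[str]


def translate_segment(
    segment: str,
    analyse: Callable[[str], Optional[Analysis]],
    analyse_partial: Callable[[str], List[Analysis]],
    transfer: Callable[[Analysis], Analysis],
    generate: Callable[[Analysis], str],
) -> str:
    """Run analysis, transfer and generation, falling back to partial analyses."""
    full = analyse(segment)
    if full is not None:
        # The grammar covered the whole segment.
        return generate(transfer(full))
    # The grammar did not cover the whole segment: translate the pieces
    # that could be analysed and join the results in source order.
    parts = analyse_partial(segment)
    return " ".join(generate(transfer(part)) for part in parts)
```

Passing the three modules in as functions mirrors the description above, in which each module brings its own dictionaries and grammars; the fallback only changes how the analysis step feeds the other two.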

Participants

Prof. Anna Sågvall Hein
Per Weijnitz
Eva Pettersson
Ebba Gustavii

Data

Reports

Pettersson, Eva. 2005. Pilotstudie om maskinöversättning inom ramen för Projekt Kursdatabas – Utveckling av språkliga resurser för ett vetenskapsområde [Pilot Study on Machine Translation in the Project Course Database – The Development of Language Resources for One Disciplinary Domain]. Uppsala University. Department of Linguistics and Philology.

Pettersson, Eva. 2007. Kursplanestatistik [Quantitative data on syllabi]. Uppsala University. Department of Linguistics and Philology.

Pettersson, Eva & Gustavii, Ebba. 2007. Specifikation för utprovning av Kursplaneöversättaren [Specification of the Evaluation of the Syllabus Translator]. Uppsala University. Department of Linguistics and Philology.

Sågvall Hein, Anna. 2005. Kursdatabas – Projekt maskinöversättning [Course Database – Project Machine Translation]. Uppsala University. Department of Linguistics and Philology.

Sågvall Hein, Anna, Weijnitz, Per, Pettersson, Eva & Gustavii, Ebba. 2007. Inför leverans av maskinöversättningstjänst – Kursplaneöversättaren [In View of the Delivery of a Machine Translation Service – The Syllabus Translator]. Uppsala University. Department of Linguistics and Philology.

Weijnitz, Per. 2007. Användning av kursplaneöversättaren [Using the Syllabus Translator]. Uppsala University. Department of Linguistics and Philology.

Events


Department of Linguistics and Philology, Uppsala University, Sweden
Visiting address: Engelska parken, Humanistiskt centrum, Thunbergsvägen 3
Postal address: Department of Linguistics and Philology, Box 635, SE-751 26 Uppsala, Sweden.
E-Mail: info@lingfil.uu.se
Telephone: +46 (0)18 471 22 52
Fax: +46 (0)18 471 10 94