Language Technology Project (5LN706), 7.5hp
Course Syllabus: 5LN706
News
- 2012-03-07: Course homepage is on-line
(Preliminary) Schedule
The overall goal of this course is to independently carry out work related to a scientific research project. Students are advised to participate actively in the project and to interact with its members. There are no lectures scheduled within this course, but the following meetings with the responsible teachers and relevant project supervisors are planned:
| Date | Time | Place | Description |
|---|---|---|---|
| 2012-03-13 | 10-12 | 9-2029 | Introduction and Motivation |
| 2012-03-29 | 15-16 | 9-2029 | Progress meeting |
| 2012-04-17 | 10-12 | 9-2029 | Progress meeting |
| 2012-05-08 | 10-12 | 9-2029 | Progress meeting |
| 2012-05-31 | 10-12 | 9-2029 | Seminar with project presentations |
| 2012-06-17 | | | Deadline for project reports |
Furthermore, there will be meetings with supervisors on a regular basis.
Intended Learning Outcomes
In order to pass the course, a student must be able to
- independently carry out work related to the goals of the overall project
- independently and creatively identify and formulate research questions and issues related to the project
- plan, carry out and evaluate a chosen sub-project in a timely manner using adequate and sound methods, thus contributing to the scientific development of the project goals
- give an overview of the research touched on by the project, describe the current state of the art in this subject and identify the issues that are most relevant for future developments (according to the research community)
- present and discuss the goals, contributions and motivations of the project
Examination and Grading Criteria
The course is examined by means of three assignments:
- Project report: A detailed scientific report describing the contributions to the project
- Popular science report: A report describing the outcome in a way that is understandable to a wider audience
- Presentation: A presentation describing the project, including a popular science introduction/overview
Project Proposals
Course projects this year will be related to the OPUS project. Individual projects should be related to one of the following three tasks.
- Identification and correction of OCR-related errors in OpenSubtitles
  - Tasks:
    - develop methods for identifying possible OCR errors
    - develop methods for correcting errors
    - support various languages (completely language-independent)
  - Examples:
    - Alright, I'il count to three.
    - I'il get you a new set.
    - THE BALTlC STATES 1919/20
    - missing token boundaries:
      - Ijust call it believin' in myself
      - I squeezedyou, and I heldyou
      - Tincque qualificar aquests exàmens.
  - Challenges: Some misspellings are intentional:
    - Jåg vill hå dig. Ni hår bådå boxåts i Philådelphiå. Ni kån reglernå. lngå lågå slåg.
    - ... but not "lngå" in: Se upp med huvudenå. lngå stångningår.
- Visualization and annotation of parallel treebanks
  - Task: develop a graphical tool for visualization and correction of aligned and syntactically annotated parallel corpora (sentence/word-aligned and dependency trees)
  - Challenges:
    - various formats (parse information, sentence alignment, word alignment)
    - graphical representation (a prototype exists)
    - user interface/management etc.
- Mining parallel data from WikiSource
  - Task: develop tools to mine parallel data from open content (WikiSource)
  - Challenges:
    - identify parallel documents
    - remove extra content (non-parallel parts)
    - convert and align (using existing tools)
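
As a possible starting point for the first task (OCR errors in OpenSubtitles), the examples listed there suggest a few surface patterns that could be checked automatically. The Python sketch below is only an illustration and not part of the course material: the function names, the rule set and the small vocabulary used in the usage note are assumptions, and the rules are tuned to the examples on this page rather than being language-independent.

```python
import re

# Hypothetical heuristics for illustration; none of them comes from the OPUS tools.
APOSTROPHE_IL = re.compile(r"^[A-Za-z]+'il$")                 # "I'il" for "I'll"
INITIAL_L_CONSONANT = re.compile(r"^l[bcdfghjkmnpqrstvwxz]")  # "lngå" for "Ingå"


def l_inside_caps(token):
    """True for all-caps words that contain a lowercase 'l', e.g. 'BALTlC'."""
    letters = [c for c in token if c.isalpha()]
    return (
        "l" in letters
        and sum(c.isupper() for c in letters) >= 2
        and all(c.isupper() or c == "l" for c in letters)
    )


def fused_split(token, vocabulary):
    """Return (left, right) if an unknown token splits into two known words."""
    lower = token.lower()
    if lower in vocabulary:
        return None
    for i in range(1, len(lower)):
        if lower[:i] in vocabulary and lower[i:] in vocabulary:
            return lower[:i], lower[i:]
    return None


def suspicious_tokens(line, vocabulary=frozenset()):
    """Yield (token, reason) pairs for tokens that look like OCR errors."""
    for raw in line.split():
        token = raw.strip(".,!?;:\"()")
        if not token:
            continue
        if APOSTROPHE_IL.match(token):
            yield token, "apostrophe-il"
        elif l_inside_caps(token):
            yield token, "l-inside-caps"
        elif INITIAL_L_CONSONANT.match(token):
            yield token, "initial-l-consonant"
        elif fused_split(token, vocabulary):
            yield token, "fused-words"
```

For instance, `list(suspicious_tokens("THE BALTlC STATES 1919/20"))` flags only `BALTlC`, and with an assumed vocabulary such as `{"i", "squeezed", "you", "and", "held"}` the line `I squeezedyou, and I heldyou` yields `squeezedyou` and `heldyou`. In the Swedish example, `lngå` is flagged while the intentionally misspelled `lågå` is not, because the rule only fires when the initial `l` is followed by a consonant. Such rules only identify candidates; correcting them, and doing so across languages, is left to the project.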
Links
- Writing popular science reports:
  - http://henrikbranden.se/vetenskapsskribent/att-skriva-popularvetenskap/
  - http://www.lth.se/forskning/popularvet
- The OPUS project: a collection of parallel corpora and tools
- Open texts at the Internet Archive
- The EU Bookshop
- Projekt Runeberg
- Projekt Gutenberg
- Another Movie Subtitle Collection
