Annotated sets of compounds

Annotated compounds in German and Swedish. The zipped file contains four compound data sets, two in German and two in Swedish. In all cases the sets consist of running text from Europarl, with compounds annotated in two ways:

  • 1-to-1: compounds only annotated if the parts are in 1-to-1 correspondence with the English Europarl translation (see Koehn and Knight, EACL 2003).
  • No suffix: compounds annotated based on linguistic intuition.
Nouns, adjectives, verbs and adverbs have been annotated as compounds when relevant.

For references see:

