Principles for the Manual Revision of Syntactic Annotation in SUC

The syntactic annotation was manually corrected in the gold standard section of the SUC part of the Swedish Treebank according to the trees in the Talbanken part. This means that the functional categories were checked and corrected according to the MAMBA annotation scheme (Teleman, 1983), and the structural categories were corrected according to the derived structural categories in the Talbanken part.

Two approaches were used in the revision work: i) sentences containing frequent error types that could be found automatically were identified and corrected (a transversal revision over error types), and ii) the sentences were checked and corrected one by one (a longitudinal approach). We started with the first approach. However, when a sentence was revised for an identified error type, the rest of the sentence was checked and corrected as well. Once the most frequent error types had been checked, the revision work continued sentence by sentence.
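The ordering of the two passes can be illustrated with a small sketch. It is not the tooling used in the revision work. The function name, the detector predicates and the sentence representation are hypothetical and were chosen only for illustration. Sentences flagged by an automatic detector are queued first, with the most frequent error type leading, and all remaining sentences follow in corpus order. Each sentence appears once, because the whole sentence is checked when it is first visited.

```python
def revision_order(sentences, detectors):
    """Return sentence ids in the order they would be revised.

    sentences: dict mapping sentence id -> sentence object, in corpus order
    detectors: dict mapping error type name -> predicate(sentence) -> bool
               (hypothetical automatic checks, one per error type)
    """
    # Transversal pass: collect the sentences flagged for each error type.
    flagged = {
        name: [sid for sid, sent in sentences.items() if detect(sent)]
        for name, detect in detectors.items()
    }
    # Handle the most frequent error types first.
    by_frequency = sorted(flagged, key=lambda name: len(flagged[name]), reverse=True)

    order, seen = [], set()
    for name in by_frequency:
        for sid in flagged[name]:
            if sid not in seen:  # a sentence is revised in full only once
                seen.add(sid)
                order.append(sid)

    # Longitudinal pass: everything not yet visited, in corpus order.
    order.extend(sid for sid in sentences if sid not in seen)
    return order
```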

Frequent error types were:

Error types like these often indicated more errors higher up in the trees, i.e., closer to the root node. In addition to the error types concerning grammatical functions above, we have also checked the expansions of the syntactic categories, e.g., that an NP or a PP has plausible daughters.
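A check of category expansions of this kind could be sketched as follows. The tree representation, the category labels and the table of plausible daughters are assumptions made for illustration and are not taken from the treebank. The sketch reports every parent/daughter pair that is not listed as plausible, so that the sentence can be inspected manually.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    category: str                      # structural category, e.g. "NP"
    children: List["Node"] = field(default_factory=list)


# Hypothetical whitelist: which daughter categories count as plausible.
# Categories without an entry are not checked.
PLAUSIBLE_DAUGHTERS = {
    "NP": {"DET", "N", "PN", "PRON", "AP", "PP", "NP", "S"},
    "PP": {"P", "NP", "PP", "AP"},
}


def implausible_expansions(root):
    """Return (parent category, daughter category) pairs not in the whitelist."""
    hits = []
    stack = [root]
    while stack:
        node = stack.pop()
        allowed = PLAUSIBLE_DAUGHTERS.get(node.category)
        for child in node.children:
            if allowed is not None and child.category not in allowed:
                hits.append((node.category, child.category))
            stack.append(child)
    return hits
```

A flagged pair only marks a candidate for manual revision. Whether the expansion is actually wrong still has to be decided by the annotator.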

Principles for Structural Categories

In the principles below we generally describe which functional categories a phrase must minimally contain in order to be of a given phrase type. Thus, in the case of S, the minimal requirement is a finite verb (with one exception). A phrase of type S can of course contain other categories as well, such as a subject or a direct object, but no such extended list is given here.
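The idea of a minimal requirement can be expressed as a simple lookup, sketched here under stated assumptions. The label "FV" for the finite verb and the table itself are illustrative assumptions, and the exception mentioned for S is not modelled. Extra daughters such as a subject or an object do not affect the result, since only the minimally required functions are tested.

```python
# Hypothetical table: functional categories a phrase type must minimally contain.
# "FV" is assumed here to be the functional label of the finite verb.
MINIMAL_FUNCTIONS = {
    "S": {"FV"},
}


def missing_functions(category, daughter_functions):
    """Return the required functional labels that the phrase lacks."""
    required = MINIMAL_FUNCTIONS.get(category, set())
    return required - set(daughter_functions)
```

An empty result means that the phrase meets the minimal requirement for its type. A non-empty result names the functions that an annotator would need to look for.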