Software by Eva Forsbom
Tagger models
Tagger models: Various part-of-speech tagger models for Swedish for the TnT and HunPos taggers.
BaseVocabulary
BaseVocabulary: A package with scripts for creating a base vocabulary pool from a lemmatised and categorised corpus, including a Swedish and an English pool.
BaseModel
BaseModel: A package with a wordform-baseform mapper, sample server and client scripts, and a set of Swedish models. (Try a demo.)
MT Quality Evaluation Toolbox
MT Quality Evaluation Toolbox: A Java program for evaluation of translation quality, and meta-evaluation of evaluation measures.
parole2xml.pl
parole2xml.pl: Perl script for converting SUC2.0-PAROLE file(s) into XML-compliant file(s). Short usage information is included in the head of the file, or printed to standard output with parole2xml.pl -h. (New version 2008-03-12.)
parole2xml.pl.v0.011: Previous version, depending on dbcentx.mod from DocBook.
xces2r.xsl
xces2r.xsl: XSLT template for extracting token info from an XCES file (the output of parole2xml.pl, but it should work for similarly formatted XCES files as well). Short usage information is included in the head of the file. (Originally for input into R, hence the name.) New (refactored) version 2009-05-10 (added switches).
partition_suc.pl
partition_suc.pl: Perl script for making 10 different selections of 10 divisions from the output of xces2r.xsl. The selections are randomly chosen, given the number of tokens in each text, so that the divisions contain approximately the same number of tokens. Short usage information is included in the head of the file, or printed via standard output with partition_suc.pl -h.
An extract of the output (the first set) is given in cross_validation_sets.txt. This is the set I have used for some of my experiments.
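The idea of randomly chosen divisions with approximately equal token counts can be sketched as follows. This is only an illustration in Python, not the method used in partition_suc.pl: the function names, the greedy "add to the currently smallest division" strategy, and the use of a seed per selection are all assumptions made for the example.

```python
import random


def partition_texts(token_counts, n_divisions=10, seed=None):
    """Randomly assign texts to divisions with roughly equal token totals.

    token_counts maps a text id to its number of tokens (hypothetical input
    format). Greedy balancing is an assumption, not the script's algorithm.
    """
    rng = random.Random(seed)
    texts = list(token_counts.items())
    rng.shuffle(texts)  # a different seed gives a different random selection

    divisions = [[] for _ in range(n_divisions)]
    totals = [0] * n_divisions
    for text_id, n_tokens in texts:
        # Place each text in the division that currently has the fewest tokens.
        smallest = min(range(n_divisions), key=totals.__getitem__)
        divisions[smallest].append(text_id)
        totals[smallest] += n_tokens
    return divisions, totals


def make_selections(token_counts, n_selections=10, n_divisions=10, seed=0):
    """Build several independent selections, each with n_divisions divisions."""
    return [
        partition_texts(token_counts, n_divisions, seed + i)[0]
        for i in range(n_selections)
    ]
```

With this strategy the token totals of any two divisions differ by at most the size of the largest single text, which is one simple way to read "approximately the same number of tokens".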
regex.pl
regex.pl: Perl script for checking if a string (and any information given in extra fields) is covered by any of the Perl regular expressions submitted (with any information given in extra fields). Short usage information is included in the head of the file, or printed to standard output with regex.pl -h.
Examples
- Case 1: Only strings and regular expressions (no other fields):
  - Regular expressions: http://stp.lingfil.uu.se/~evafo/software/regex.dat
  - Strings to check: http://stp.lingfil.uu.se/~evafo/software/strings.dat
  - Check if the strings are covered by any regular expression (print the ones that are):
    regex.pl -r regex.dat strings.dat
  - Check if the strings are covered by any regular expression (print the ones that are not):
    regex.pl -n -r regex.dat strings.dat
- Case 2: Strings and regular expressions with other field(s):
  - Regular expressions: http://stp.lingfil.uu.se/~evafo/software/regexF.dat
  - Strings to check: http://stp.lingfil.uu.se/~evafo/software/stringsF.dat
  - Check if the strings (with information) are covered by any regular expression (with information) (print the ones that are):
    regex.pl -r regexF.dat stringsF.dat
  - Check if the strings (with information) are covered by any regular expression (with information) (print the ones that are not):
    regex.pl -n -r regexF.dat stringsF.dat
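For readers who want to see the logic of Case 1 spelled out, here is a rough Python sketch of the coverage check. It is not a port of regex.pl: the function name and the invert parameter (standing in for the -n switch) are made up for the example, and it assumes that "covered" means the expression matches the whole string. Python and Perl regular expression syntax also differ in places, and the extra-field handling of Case 2 is not modelled.

```python
import re


def covered(strings, patterns, invert=False):
    """Return the strings matched in full by at least one pattern.

    With invert=True, return the strings that no pattern covers instead
    (comparable in spirit to the -n switch in the examples above).
    Whole-string matching is an assumption about what "covered" means.
    """
    compiled = [re.compile(p) for p in patterns]
    result = []
    for s in strings:
        is_covered = any(rx.fullmatch(s) for rx in compiled)
        if is_covered != invert:
            result.append(s)
    return result
```

The patterns and strings would be read from files such as regex.dat and strings.dat, one entry per line; the file layout is likewise an assumption here.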
