MaltParser 0.23

This is the home page for MaltParser, Version 0.23, a system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data and to parse new data using an induced model. Besides instructions for downloading the system for various platforms, this page also contains a user guide.

Relation to older versions:

Download MaltParser 0.23

MaltParser 0.23 can be downloaded as binaries for three platforms. The software can be used freely for non-commercial research and educational purposes. It comes with no warranty, but we welcome all comments, bug reports, and suggestions for improvements.

MaltParser 0.23 uses libTimbl, part of TiMBL (Tilburg Memory-Based Learner), Version 5.1, and LIBSVM, Version 2.8, in order to learn parsing models from treebanks, and we gratefully acknowledge the use of these software packages. However, MaltParser 0.23 is a standalone application, so there is no need to install either TiMBL or LIBSVM separately.

Shortcut

For users who only want a decent, robust dependency parser (and who are not interested in experimenting with different parsing algorithms, learning algorithms and feature models), we provide pretrained parsing models for a selection of languages. These models can be used together with MaltParser to get a running parser without having access to training data from a treebank. Note, however, that the parser presupposes that the input has been segmented and part-of-speech tagged in accordance with the format of the original treebank (see the details for each language).

User Guide for MaltParser 0.23

This is a short user guide for MaltParser 0.23, a data-driven parser that uses dependency-based syntactic representations and treebank-induced classifiers to guide the parser at nondeterministic choice points. The guide gives basic instructions for running the system, including a specification of the option file used to control its behavior, and is followed by a short presentation of the parsing methodology implemented in the system. More information about parsing algorithms, learning algorithms and feature models can be found in the following publications:

Running MaltParser

The parser can be run in two basic modes, learning (inducing a parsing model from a treebank) and parsing (using the parsing model to parse new data). In the current version of the parser, new data must be tokenized and part-of-speech tagged in the Malt-TAB format. Regardless of mode, MaltParser is normally run by executing the following command at the command line prompt:

> ./maltparser -f file

where file is the name of an option file, specifying all the parameters needed. From version 0.22 it is also possible to specify options using command line flags, which will override any settings included in the option file. (Although it is possible to run the program using only flags and no option file, it is usually more convenient to combine the two methods.) The option file and the flags are described in detail below. A list of all flags and options can be obtained by running MaltParser with the -h flag:

> ./maltparser -h
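
For example, the following command (with a hypothetical option file myoptions.opt) would use all the settings in that file but override the output file by means of the -o flag described below:

> ./maltparser -f myoptions.opt -o parsed.tab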

Option File

The option file contains a sequence of parameter specifications with the following simple syntax:

$PARAMETER$
VALUE

In addition, the option file may contain comment lines starting with "--".
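For illustration, a minimal option file for learning might look as follows. This is a sketch only: we read $PARAMETER$ and VALUE in the syntax above as placeholders for a parameter name and its value, and the file names below are hypothetical.

-- Hypothetical option file for learning a model
MODE
LEARN
INFILE
training.tab
POSSET
posset.txt
DEPSET
depset.txt
ALGORITHM
NIVRE
LEARNER
MBL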

The following table describes all the parameters that can be specified in the option file, together with their permissible values. Default values are marked with "*"; parameters that lack a default value must be specified explicitly (if they are required by the particular configuration of modules invoked). For each parameter we also give the flag that can be used to specify it directly on the command line, overriding any setting in the option file.

Global Parameters

INFILE (flag -i): Input file
    Values: Filename
    The input (for both learning and parsing) must be in the Malt-TAB format. During learning, the four columns form, postag, head and deprel are required; during parsing, only the first two (form, postag) are required. An example input file is sketched after this table.

OUTFILE (flag -o): Output file
    Values: Filename

OUTFORMAT (flag -O): Output data format
    Values: TAB (Malt-TAB), MALTXML* (Malt-XML), TIGERXML (TIGER-XML)

CHARSET (flag -C): Character set
    Values: ISO-8859-1*, UTF-8, ...
    NB: The user must verify that the character encoding of the input file conforms to the specification of this option. The parser does not check that the encoding is correct and only escapes the characters ", ', &, <, > if the output format is MALTXML or TIGERXML.

VERBOSE (flag -v): Output to terminal
    Values: YES*, NO

MAXSENTENCELENGTH (flag -z): Maximum number of tokens per sentence
    Values: Integer (default = 512)

MAXTOKENLENGTH (flag -y): Maximum number of characters per token
    Values: Integer (default = 256)

Tagset Parameters

MAXTAGLENGTH (flag -w): Maximum number of characters per tag name
    Values: Integer (default = 128)

POSSET (flag -P): Part-of-speech tagset
    Values: Filename
    The part-of-speech tagset must be specified in a text file with one tag per line (and no blank lines). An example file can be found here.

DEPSET (flag -D): Dependency type tagset
    Values: Filename
    The dependency type tagset must be specified in a text file with one tag per line (and no blank lines). An example file can be found here.

ROOTLABEL (flag -R): Dependency type label used for roots
    Values: String (default = ROOT)

Parser Parameters

MODE (flag -m): Mode (learning or parsing)
    Values: PARSE* (parse new data using an induced model), LEARN (induce a model from treebank data)

ALGORITHM (flag -a): Parsing algorithm (see description below)
    Values: NIVRE* (Nivre 2003, 2004), COVINGTON (Covington 2001, incremental)

PARSEROPTIONS (flag -p): Parser options (algorithm specific)
    Values:
        -a [ES]  Arc order (NIVRE): E(ager) or S(tandard)
        -g [NP]  Graph condition (COVINGTON): N(on-Projective) or P(rojective)
    NB: In the flag, whitespace is replaced by underscore (e.g. "-a_E" instead of "-a E").

Guide Parameters

MAXFEATURES (flag -c): Maximum number of features of each type
    Values: Integer (default = 30)

FEATURES (flag -F): Feature model specification (see description below)
    Values: Filename (model specified in Filename.par)
    NB: If no feature model specification can be loaded, a default specification equivalent to m3.par is used for parsing.

Learner Parameters

LEARNER (flag -l): Learner type (see description below)
    Values: MBL* (memory-based learning, TiMBL), SVM (support vector machine, LIBSVM)

LEARNEROPTIONS (flag -L): Parameter settings (learner specific)
    Values: String
        TiMBL example: "-m M -k 5 -w 0 -d ID -L 3" (see the TiMBL documentation)
        LIBSVM example: "-t 0" (see the LIBSVM documentation)
    NB: In the flag, whitespace is replaced by underscore (e.g. "-m_M_-k_5" instead of "-m M -k 5").
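
As a concrete illustration of the Malt-TAB input format referred to above, a learning-mode input file might look as follows. This is a sketch only: the words, tags and dependency types are invented placeholders, and we assume that the head column gives the position of the head token within the sentence, with 0 used for the token attached to the root.

John    NN      2       SUB
saw     VB      0       ROOT
Mary    NN      2       OBJ
.       IP      2       IP

For parsing, only the first two columns (form and postag) would be present.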

Inductive Dependency Parsing

MaltParser can be characterized as a data-driven parser-generator. While a traditional parser-generator constructs a parser given a grammar, a data-driven parser-generator constructs a parser given a treebank. MaltParser is an implementation of inductive dependency parsing, where the syntactic analysis of a sentence amounts to the derivation of a dependency structure, and where inductive machine learning is used to guide the parser at nondeterministic choice points. This parsing methodology is based on three essential components:
  1. Deterministic parsing algorithms for building dependency graphs.
  2. History-based feature models for predicting the next parser action.
  3. Discriminative machine learning to map histories to parser actions.
Given the restrictions imposed by these components, MaltParser has been designed to give maximum flexibility in the way components can be varied independently of each other. We now describe the functionality for each of the components in turn.

Parsing Algorithms

Any deterministic parsing algorithm compatible with the MaltParser architecture has to operate with the following set of data structures, which also provide the interface to the feature model:
  1. STACK: a stack of partially processed tokens.
  2. INPUT: a list of remaining input tokens.
  3. CONTEXT: a stack of unattached tokens (see below).
  4. HEAD: a table assigning each token its head (if any) in the partially built dependency graph.
  5. DEP: a table assigning each token its dependency type (if any).
An algorithm builds dependency structures incrementally by updating HEAD and DEP, but it can only add a dependency arc between the top of the stack (STACK[0]) and the next input token (INPUT[0]) in the current configuration. (The context stack CONTEXT is therefore only used by algorithms that allow non-projective dependency structures, since unattached tokens under a dependency arc are ruled out in projective dependency structures.) MaltParser provides two basic parsing algorithms, each with two options:
  1. NIVRE (Nivre 2003, 2004), with a choice between eager and standard arc order.
  2. COVINGTON (Covington 2001, incremental), with a choice between allowing and disallowing non-projective dependency graphs.

Feature Models

MaltParser uses history-based feature models for predicting the next action in the deterministic derivation of a dependency structure, which means that it uses features of the partially built dependency structure together with features of the (tagged) input string. More precisely, features are defined in terms of the word form (LEX), part-of-speech (POS) or dependency type (DEP) of a token defined relative to one of the data structures STACK, INPUT and CONTEXT, using the auxiliary functions HEAD, LC, RC, LS and RS. A feature model is defined in an external feature specification with the following syntax:

<fspec>  ::= <feat>+ 

<feat>   ::= <lfeat> | <nlfeat>

<lfeat>  ::= LEX \t <dstruc> \t <off> \t <suff> \n

<nlfeat> ::= (POS|DEP) \t <dstruc> \t <off> \n

<dstruc> ::= (STACK|INPUT|CONTEXT)

<off>    ::= <nnint> \t <int> \t <nnint> \t <int> \t <int>

<suff>   ::= <nnint> 

<int>    ::= (...|-2|-1|0|1|2|...)

<nnint>  ::= (0|1|2|...)

As syntactic sugar, any <lfeat> or <nlfeat> can be truncated if all remaining integer values are zero. Example feature specifications are given below. Each feature is specified on a single line, consisting of at least two tab-separated columns. The first column defines the feature type to be lexical (LEX), part-of-speech (POS), or dependency (DEP). The second column identifies one of the main data structures in the parser configuration, usually the stack (STACK) or the list of remaining input tokens (INPUT), as the "base address" of the feature. (The third alternative, CONTEXT, is relevant only together with Covington's algorithm in non-projective mode.) The actual address is then specified by a series of "offsets" with respect to the base address, given in the following order (cf. the examples below):
  1. Position in the base data structure (a non-negative integer; 0 denotes the top of STACK or the next token in INPUT).
  2. Relative position in the original input string (a negative integer for tokens to the left, a positive integer for tokens to the right).
  3. Number of steps up through the HEAD relation (a non-negative integer).
  4. Steps down to a dependent: a negative integer for the leftmost dependent (LC), a positive integer for the rightmost dependent (RC).
  5. Steps sideways to a sibling: a negative integer for the next left sibling (LS), a positive integer for the next right sibling (RS).

Let us consider a few examples:
POS     STACK   0       0       0       0       0
POS     INPUT   1       0       0       0       0
POS     INPUT   0       -1      0       0       0
DEP     STACK   0       0       1       0       0
DEP     STACK   0       0       0       -1      0
The feature defined on the first line is simply the part-of-speech of the token on top of the stack (TOP). The second feature is the part-of-speech of the token immediately after the next input token in the input list (NEXT), while the third feature is the part-of-speech of the token immediately before NEXT in the original input string (which may no longer be present in either the INPUT list or the STACK). The fourth feature is the dependency type of the head of TOP (zero steps down the stack, zero steps forward/backward in the input string, one step up to the head). The fifth and final feature is the dependency type of the leftmost dependent of TOP (zero steps down the stack, zero steps forward/backward in the input string, zero steps up through heads, one step down to the leftmost dependent). Using the syntactic sugar of truncating all remaining zeros, these five features can also be specified more succinctly:
POS     STACK
POS     INPUT   1
POS     INPUT   0       -1
DEP     STACK   0       0        1
DEP     STACK   0       0        0       -1
The only difference between lexical and non-lexical features is that the specification of lexical features may contain an eighth column specifying a suffix length n. By convention, if n = 0, the entire word form is included; otherwise only the n last characters are included in the feature value. Thus, the following specification defines a feature the value of which is the four-character suffix of the word form of the next left sibling of the rightmost dependent of the head of the token immediately below TOP.
LEX     STACK   1       0        1       1       -1      4
Finally, it is worth noting that if any of the offsets is undefined in a given configuration, the feature is automatically assigned a null value.
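
Putting the pieces together, a complete feature specification file (say, a hypothetical model.par; this is purely an illustrative sketch, not one of the models distributed with the system) simply lists one feature per line, using the truncated form where convenient:

POS     STACK
POS     INPUT
POS     INPUT   1
DEP     STACK   0       0       1
DEP     STACK   0       0       0       -1
LEX     STACK
LEX     INPUT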

Feature Model Examples

Some of the features regularly used in feature models are depicted below. Red features are lexical features (LEX); blue features are part-of-speech features (POS); and green features are dependency features (DEP).

The following table shows three of the models provided with Version 0.1 of MaltParser (there called MBL2, MBL3 and MBL4 because MBL was the only learner type supported in that version). For each model we also give a link to the feature specification for that model.

Model (number of features)      Feature specification
M2 (7)                          m2.par
M3 (9)                          m3.par
M4 (11)                         m4.par

Learning Algorithms

Inductive dependency parsing requires a learning algorithm to induce a mapping from parser histories, relative to a given feature model, to parser actions, relative to a given parsing algorithm. MaltParser comes with two different learning algorithms, each with a wide variety of parameters: