MaltParser 1.0.0

MaltParser 1.0.0 is now available. This is a complete reimplementation of MaltParser in Java, released under an open source license, which replaces the older versions 0.1-0.4 found below.

MaltParser 0.4

This is the home page for MaltParser, Version 0.4, a system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data and to parse new data using an induced model. Besides instructions for downloading the system for various platforms, this page also contains a user guide.

MaltParser 0.4 in the CoNLL 2007 Shared Task

MaltParser 0.4 was used in the multilingual track of the CoNLL 2007 Shared Task in the systems that obtained the first and fifth best overall scores. The fifth best system was a single-parser system, called Single Malt, while the top scoring system was an ensemble system, called Blended, incorporating six incarnations of MaltParser.

The systems are described in Hall et al. (2007).

MaltParser 0.4 in the CoNLL-X Shared Task

MaltParser 0.4 was used in the CoNLL-X Shared Task on multilingual dependency parsing in the system that obtained the second best overall score (not significantly worse than the best score) and that achieved top results for nine out of thirteen languages (with results significantly better than any other system for Japanese, Swedish and Turkish). In this system, MaltParser was combined with pseudo-projective parsing, which requires preprocessing of training data and post-processing of parser output (Nivre and Nilsson 2005). The complete system is described in Nivre et al. (2006).

Download MaltParser 0.4

MaltParser 0.4 can be downloaded as binaries for three platforms. The software can be used freely for non-commercial research and educational purposes. It comes with no warranty, but we welcome all comments, bug reports, and suggestions for improvements.

MaltParser 0.4 uses libTimbl, part of TiMBL (Tilburg Memory-Based Learner), Version 5.1, and LIBSVM, Version 2.8, in order to learn parsing models from treebanks, and we gratefully acknowledge the use of these software packages. However, MaltParser 0.4 is a standalone application, so there is no need to install either TiMBL or LIBSVM separately.

Pretrained Memory-Based Parsers for Swedish and English (Malt-TAB format)

For users who only want a decent, robust dependency parser (and who are not interested in experimenting with different parsing algorithms, learning algorithms and feature models), we provide pretrained parsing models for a selection of languages, based on the memory-based version of MaltParser. These models, originally developed for MaltParser 0.2, can be used together with MaltParser 0.4 to get a running parser without having access to training data from a treebank. Note, however, that the parser presupposes that the input has been segmented and part-of-speech tagged in accordance with the format of the original treebank (see the details for each language). Note also that these parsers can only handle the Malt-TAB format (not the CoNLL-X shared task format).

Pretrained SVM Parsers for Swedish, English and Chinese (Malt-TAB format)

We now also provide pretrained parsers using support vector machines for three languages. Note, again, that the parser presupposes that the input has been segmented and part-of-speech tagged in accordance with the format of the original treebank (see the details for each language) and that these parsers can only handle the Malt-TAB format (not the CoNLL-X shared task format).
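
Since these pretrained parsers only accept Malt-TAB input, the following sketch illustrates roughly what a tagged input file looks like: one token per line, with the word form and its part-of-speech tag separated by a tab, and a blank line between sentences (training data additionally encodes the head and dependency type of each token). The words and tags below are invented purely for illustration; the actual tagset and tokenization conventions are those of the original treebank for each language.

The      DT
parser   NN
runs     VBZ
.        .

This     DT
is       VBZ
a        DT
test     NN
.        .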

User Guide for MaltParser 0.4

This is a short user guide for MaltParser 0.4, a data-driven parser that uses dependency-based syntactic representations and treebank-induced classifiers to guide parsing at nondeterministic choice points. The guide gives basic instructions for running the system, including a specification of the option file used to control the system's behavior, and is followed by a short presentation of the parsing methodology implemented in the system. More information about parsing algorithms, learning algorithms and feature models can be found in the publications cited on this page.

Running MaltParser

The parser can be run in two basic modes, learning (inducing a parsing model from a treebank) and parsing (using the parsing model to parse new data). In the current version of the parser, new data must be tokenized and part-of-speech tagged in the Malt-TAB format. Regardless of mode, MaltParser is normally run by executing the following command at the command line prompt:

> ./maltparser -f file

where file is the name of an option file, specifying all the parameters needed. From version 0.22 it is also possible to specify options using command line flags, which will override any settings included in the option file. (Although it is possible to run the program using only flags and no option file, it is usually more convenient to combine the two methods.) The option file and the flags are described in detail below. A list of all flags and options can be obtained by running MaltParser with the -h flag:

> ./maltparser -h
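
For example, assuming an option file named options.opt and an input file named new-data.tab (both names are placeholders), a parsing run could override individual settings in the option file directly from the command line:

> ./maltparser -f options.opt -m PARSE -i new-data.tab -o parsed-output.xml

Here the -m, -i and -o flags override the MODE, INFILE and OUTFILE parameters, respectively (see the parameter listing below).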

Option File

The option file contains a sequence of parameter specifications with the following simple syntax:

$PARAMETER$
VALUE

In addition, the option file may contain comment lines starting with "--". The listing below describes all the available parameters with their permissible values. Default values are marked with "*". Parameters that lack a default value must be specified in the option file (if they are required by the particular configuration of modules invoked). An example option file can be found here.
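
For concreteness, here is a minimal sketch of what such an option file might look like for a learning run (the parameter names are those described below; the file names are placeholders):

-- example option file for learning (illustrative values only)
$MODE$
LEARN
$INFORMAT$
MALTTAB
$INFILE$
train.tab
$POSSET$
pos.set
$DEPSET$
dep.set
$ALGORITHM$
NIVRE
$LEARNER$
MBL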

Below, we describe the different options that can be specified in the option file. Options that have no default value must be specified. For each option, we also give the flag that can be used to specify the option directly on the command line (overriding any specification in the option file).

Global Parameters

INFORMAT (flag -I): Input data format.
  Values: MALTTAB* = Malt-TAB; CONLLTAB = CoNLL-X shared task format (tab separated).

OUTFORMAT (flag -O): Output data format.
  Values: MALTTAB = Malt-TAB; CONLLTAB = CoNLL-X shared task format (tab separated); MALTXML* = Malt-XML; CONLLXML = CoNLL-X shared task format (XML version); TIGERXML = TIGER-XML.

INFILE (flag -i): Input file.
  Value: Filename. The input (for both learning and parsing) must be in the Malt-TAB or CoNLL-X shared task format, as specified in the INFORMAT option (see above). An example input file in Malt-TAB format can be found here.

OUTFILE (flag -o): Output file.
  Value: Filename.

CHARSET (flag -C): Character set.
  Values: ISO-8859-1*, UTF-8, ...
  NB: The user must verify that the character encoding of the input file conforms to the specification of this option. The parser does not check that the encoding is correct and only escapes the characters ", ', &, <, > if the output format is MALTXML or TIGERXML.

VERBOSE (flag -v): Output to terminal.
  Values: YES*, NO.

MAXSENTENCELENGTH (flag -z): Maximum number of tokens per sentence.
  Value: Integer. Default = 512.

MAXTOKENLENGTH (flag -y): Maximum number of characters per token.
  Value: Integer. Default = 256.

Tagset Parameters

MAXTAGLENGTH (flag -w): Maximum number of characters per tag name.
  Value: Integer. Default = 128.

POSSET (flag -P): Part-of-speech tagset.
  Value: Filename. The part-of-speech tagset must be specified in a text file with one tag per line (and no blank lines). An example file can be found here.

CPOSSET (flag -Q): Coarse-grained part-of-speech tagset (CoNLL format only).
  Value: Filename. The coarse-grained part-of-speech tagset must be specified in a text file with one tag per line (and no blank lines). An example file can be found here.

DEPSET (flag -D): Dependency type tagset.
  Value: Filename. The dependency type tagset must be specified in a text file with one tag per line (and no blank lines). An example file can be found here.

ROOTLABEL (flag -R): Dependency type label used for unattached tokens.
  Value: String. Default = ROOT.

INTRALABEL (flag -G): Dependency type label used for non-head intraword tokens (FORM = "_").
  Value: String. Default = DERIV (cf. Turkish in CoNLL-X).

Parser Parameters

MODE (flag -m): Mode (learning or parsing).
  Values: PARSE* = parsing (using an induced model to parse new data); LEARN = learning (inducing a model from treebank data).

ALGORITHM (flag -a): Parsing algorithm (see description below).
  Values: NIVRE* = Nivre (2003, 2004); COVINGTON = Covington (2001) (incremental).

PARSEROPTIONS (flag -p): Parser options (algorithm specific).
  -a [ES]: Arc order (NIVRE): E(ager), S(tandard).
  -o [0123]: Oracle (NIVRE):
    0 = default (MaltParser 0.2)
    1 = always shift before reduce
    2 = 1 + allow reduction of unattached tokens (HEAD = 0)
    3 = 2 + allow roots to be labeled with DEPREL ≠ ROOTLABEL
  -g [NP]: Graph condition (COVINGTON): N(on-Projective), P(rojective).
  NB: In the flag, whitespace is replaced by underscore (e.g. "-a_E" instead of "-a E").

INTRAHEAD (flag -g): Position of intraword head (if allowed); non-head intraword tokens always get DEPREL = INTRALABEL and HEAD = ID-1 (LEFT) or HEAD = ID+1 (RIGHT).
  Values: NONE* = not allowed; LEFT = head left (initial); RIGHT = head right (final) (cf. Turkish in CoNLL-X).

Guide Parameters

MAXFEATURES (flag -c): Maximum number of features of each type.
  Value: Integer. Default = 30.

FEATURES (flag -F): Feature model specification (see description below).
  Value: Filename (model specified in Filename.par).
  NB: If no feature model specification can be loaded, a default specification equivalent to m3.par is used for parsing.

Learner Parameters

LEARNER (flag -l): Learner type (see description below).
  Values: MBL* = memory-based learning (TiMBL); SVM = support vector machine (LIBSVM).

LEARNEROPTIONS (flag -L): Parameter settings (learner specific).
  Value: String.
  TiMBL example: "-m M -k 5 -w 0 -d ID -L 3" (see TiMBL Documentation).
  LIBSVM example: "-t 0" (see LIBSVM Documentation).
  NB: In the flag, whitespace is replaced by underscore (e.g. "-m_M_-k_5" instead of "-m M -k 5").
  Extra options for SVM:
    -S [0123]: Strategy for splitting training data to train separate SVM classifiers:
      0 = no split (default)
      1 = binary split according to whether the token on top of the stack has HEAD = 0
      2 = split according to the value of a feature in the feature model (specified by the -F flag)
      3 = combination of 1 and 2
    -F [PCD][n]: Model feature for splitting data:
      P = POS, C = CPOS, D = DEP;
      n = position in the feature specification file, with the first feature of the specified type indexed 0.
      Default = P0.
    -T [n]: Frequency threshold for training a separate classifier if -S > 0 (default = 1).
    -M [n]: Maximum number of classifiers if -S > 1 (default = 400).
    -N [n]: Maximum number of threads for parallel processing during training if -S > 1 (default = 20).
    -A [n]: Split FEATS values into atomic components? 0 = no, 1 = yes (default).
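
To make these parameters concrete, the following sketch shows how a learning run and a subsequent parsing run might be specified entirely with command-line flags. All file names are placeholders, and how the induced model is stored and located between the two runs is not covered by the listing above, so treat this only as an illustration of the flag syntax.

> ./maltparser -m LEARN -I MALTTAB -i train.tab -P pos.set -D dep.set -a NIVRE -l MBL
> ./maltparser -m PARSE -I MALTTAB -O MALTTAB -i input.tab -o output.tab -P pos.set -D dep.set -a NIVRE -l MBL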

Inductive Dependency Parsing

MaltParser can be characterized as a data-driven parser-generator. While a traditional parser-generator constructs a parser given a grammar, a data-driven parser-generator constructs a parser given a treebank. MaltParser is an implementation of inductive dependency parsing, where the syntactic analysis of a sentence amounts to the derivation of a dependency structure, and where inductive machine learning is used to guide the parser at nondeterministic choice points. This parsing methodology is based on three essential components:
  1. Deterministic parsing algorithms for building dependency graphs.
  2. History-based feature models for predicting the next parser action.
  3. Discriminative machine learning to map histories to parser actions.
Given the restrictions imposed by these components, MaltParser has been designed to give maximum flexibility in the way components can be varied independently of each other. We now describe the functionality for each of the components in turn.

Parsing Algorithms

Any deterministic parsing algorithm compatible with the MaltParser architecture has to operate with a common set of data structures, which also provide the interface to the feature model: a stack of partially processed tokens (STACK), a list of remaining input tokens (INPUT), an optional context stack (CONTEXT), and the functions HEAD and DEP, which record the head and dependency type assigned to each token so far. An algorithm builds dependency structures incrementally by updating HEAD and DEP, but it can only add a dependency arc between the token on top of the stack (STACK[0]) and the next input token (INPUT[0]) in the current configuration. (The context stack CONTEXT is therefore only used by algorithms that allow non-projective dependency structures, since unattached tokens under a dependency arc are ruled out in projective dependency structures.)

MaltParser provides two basic parsing algorithms, each with two options: Nivre's algorithm (Nivre 2003, 2004), with a choice between arc-eager and arc-standard order, and an incremental version of Covington's algorithm (Covington 2001), with a choice between a projective and a non-projective graph condition (see the PARSEROPTIONS parameter above).

NB: The strictly projective algorithms (Nivre's algorithm with -o 0 or -o 1 and Covington's algorithm with the -g P option) will only perform well if the training data does not contain unattached nodes (HEAD = 0) with arcs covering them, since the projectivity constraint will prohibit the reproduction of such covering arcs during parsing. For example, if internal punctuation is left unattached in the treebank data, performance can usually be improved considerably by attaching these punctuation tokens to the nearest words (either left or right).
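
As a concrete illustration of these choices (values picked purely for the example), the following option-file fragment selects the non-projective version of Covington's algorithm; Nivre's algorithm with the arc-eager order would instead use NIVRE as the value of $ALGORITHM$ and "-a E" as the value of $PARSEROPTIONS$.

-- parsing algorithm settings (illustrative)
$ALGORITHM$
COVINGTON
$PARSEROPTIONS$
-g N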

Feature Models

MaltParser uses history-based feature models for predicting the next action in the deterministic derivation of a dependency structure, which means that it uses features of the partially built dependency structure together with features of the (tagged) input string. More precisely, features are defined in terms of the word form (LEX), part-of-speech (POS) or dependency type (DEP) of a token defined relative to one of the data structures STACK, INPUT and CONTEXT, using the auxiliary functions HEAD, LC, RC, LS and RS. A feature model is defined in an external feature specification with the following syntax:

<fspec>  ::= <feat>+ 

<feat>   ::= <lfeat> | <nlfeat>

<lfeat>  ::= (LEX|LEMMA|FEATS) \t <dstruc> \t <off> \t <suff> \n

<nlfeat> ::= (POS|DEP|CPOS) \t <dstruc> \t <off> \n

<dstruc> ::= (STACK|INPUT|CONTEXT)

<off>    ::= <nnint> \t <int> \t <nnint> \t <int> \t <int>

<suff>   ::= <nnint> 

<int>    ::= (...|-2|-1|0|1|2|...)

<nnint>  ::= (0|1|2|...)

As syntactic sugar, any <lfeat> or <nlfeat> can be truncated if all remaining integer values are zero. An example feature specification can be found here. Each feature is specified on a single line, consisting of at least two tab-separated columns. The first column defines the feature type to be lexical (LEX), part-of-speech (POS), dependency (DEP), lemma (LEMMA), coarse part-of-speech (CPOS), or morphosyntactic features (FEATS). (Note that the latter three types are only available in the CONLLTAB format.) The second column identifies one of the main data structures in the parser configuration, usually the stack (STACK) or the list of remaining input tokens (INPUT), as the "base address" of the feature. (The third alternative, CONTEXT, is relevant only together with Covington's algorithm in its non-projective mode.) The actual address is then specified by a series of "offsets" with respect to the base address, given in columns three to seven:
  1. Offset within the chosen data structure (non-negative; e.g. STACK 0 is the token on top of the stack, INPUT 1 is the token after the next input token).
  2. Offset forward (positive) or backward (negative) in the original input string.
  3. Number of steps upward through the HEAD relation (non-negative).
  4. Step downward to a dependent: a negative value addresses the leftmost dependent (LC), a positive value the rightmost dependent (RC).
  5. Step sideways to a sibling: a negative value addresses the left sibling (LS), a positive value the right sibling (RS).

Let us consider a few examples:
POS     STACK   0       0       0       0       0
POS     INPUT   1       0       0       0       0
POS     INPUT   0       -1      0       0       0
DEP     STACK   0       0       1       0       0
DEP     STACK   0       0       0       -1      0
The feature defined on the first line is simply the part-of-speech of the token on top of the stack (TOP). The second feature is the part-of-speech of the token immediately after the next input token (NEXT) in the input list, while the third feature is the part-of-speech of the token immediately before NEXT in the original input string (which may no longer be present in either the INPUT list or the STACK). The fourth feature is the dependency type of the head of TOP (zero steps down the stack, zero steps forward/backward in the input string, one step up to the head). The fifth and final feature is the dependency type of the leftmost dependent of TOP (zero steps down the stack, zero steps forward/backward in the input string, zero steps up through heads, one step down to the leftmost dependent). Using the syntactic sugar of truncating all remaining zeros, these five features can also be specified more succinctly:
POS     STACK
POS     INPUT   1
POS     INPUT   0       -1
DEP     STACK   0       0        1
DEP     STACK   0       0        0       -1
The only difference between lexical features (LEX, LEMMA and [for the time being] FEATS) and non-lexical features (POS, DEP, CPOS) is that the specification of a lexical feature may contain an eighth column specifying a suffix length n. By convention, if n = 0, the entire word form is included; otherwise only the n last characters are included in the feature value. (Currently, only the MBL learner can handle suffixes.) Thus, the following specification defines a feature whose value is the four-character suffix of the word form of the next left sibling of the rightmost dependent of the head of the token immediately below TOP:
LEX     STACK   1       0        1       1       -1      4
Finally, it is worth noting that if any of the offsets is undefined in a given configuration, the feature is automatically assigned a null value.
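
Putting these pieces together, a complete feature specification file is simply a list of such lines, one feature per line with tab-separated columns. The following is a small hypothetical specification (not one of the distributed models), combining the five example features above with the full word forms of TOP and NEXT:

POS     STACK
POS     INPUT   1
POS     INPUT   0       -1
DEP     STACK   0       0       1
DEP     STACK   0       0       0       -1
LEX     STACK
LEX     INPUT

By the truncation convention, LEX STACK and LEX INPUT have suffix length 0 and therefore use the entire word form of TOP and NEXT, respectively.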

Feature Model Examples

Some of the features regularly used in feature models are depicted below. Red features are lexical features (LEX); blue features are part-of-speech features (POS); and green features are dependency features (DEP).

The following table shows three of the models provided with Version 0.1 of MaltParser (there called MBL2, MBL3 and MBL4 because MBL was the only learner type supported in that version). For each model we also give a link to the feature specification for that model.

Models Top Next T N TH TL TR NL TH TL TR NL L1 L2 L3 Feature specification
M2 +++++++m2.par
M3 +++++++++m3.par
M4 +++++++++++m4.par

Learning Algorithms

Inductive dependency parsing requires a learning algorithm to induce a mapping from parser histories, relative to a given feature model, to parser actions, relative to a given parsing algorithm. MaltParser 0.4 comes with two different learning algorithms, each with a wide variety of parameters: