Lab 4 - Statistical Machine Translation

Aim

In this lab, you will have the opportunity to train and examine a phrase-based SMT system and to explore how its dynamic programming beam search algorithm works.

Assignments

Model training

First, you will train a complete SMT system on the familiar Blockworld corpus to get acquainted with the Moses training pipeline. You can still find the data in /local/kurs/mt/lab3/data.

Copy it to your home directory if you haven't already done so.

Choose which of the two languages (English and Swedish) you want to use as your source language and which as your target language.

To begin with, you need a language model for the target language. You may still have one lying around from your earlier lab assignments; if not, follow the instructions from part 2 of lab 2 to train a model with SRILM. Use an n-gram order that you found worked well in the experiments of lab 2.

Next, use the training scripts provided with Moses to train your model:

/local/kurs/mt/mosesdecoder/scripts/training/train-model.perl --corpus CORPUS \
    --f SRC --e TRG --root-dir moses.output --lm 0:ORDER:LM-FILE \
    --external-bin-dir /local/kurs/mt/bin/ > logfile 2>&1
Here, --root-dir is the name of a directory that will be created to hold the models. The capitalised placeholders stand for the corpus name, the source and target language suffixes, the LM order, and the LM file. If your corpus is stored in corpus.parallel.swe and corpus.parallel.eng, you should give corpus.parallel as CORPUS; SRC and TRG are then eng and swe or vice versa, depending on which direction you want to translate. Note that you must give the full path to the LM file, i.e. /home/stp15/YOURNAME/Documents/....
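For example, if you translate from English into Swedish with a trigram LM, the call could look like this (the LM path and order are just an illustration; substitute the location and order of your own model):

/local/kurs/mt/mosesdecoder/scripts/training/train-model.perl --corpus corpus.parallel \
    --f eng --e swe --root-dir moses.output --lm 0:3:/home/stp15/YOURNAME/lm/blockworld.lm \
    --external-bin-dir /local/kurs/mt/bin/ > logfile 2>&1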

Once the training is done, take a look at the training log to see what happened and if everything went well.

- Models

Examine the files generated by the training process. Try to figure out what information they contain by looking at them. You may consult the Moses webpage to read about the training process, and the training log may help you understand what goes on during training. Try to relate the training log output to the training pipeline shown on the lecture slides.

1) Make a list of all the files generated and briefly describe (1-3 lines per file) their contents.

- Phrase table

For the following assignments, locate the phrase table and the decoder configuration file.

The phrase table contains five fields separated by " ||| " marks: source phrase, target phrase, feature values, word alignments, and some counts from the training corpus. Some of the feature values are probabilities summing to 1 over a certain set of alternatives; others are not.
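For illustration, an entry might look like this (the phrase pair is made up and all numbers are invented):

ta det röda blocket ||| take the red block ||| 0.8000 0.0472 0.6667 0.0234 ||| 0-0 1-1 2-2 3-3 ||| 5 6 4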

2) One feature is the probability of the source phrase given the target, and one is the probability of the target phrase given the source. Can you spot these two?

- Decoder configuration file

The Moses configuration file is divided into several sections. The [feature] section contains pointers to the phrase table file and the language model file.

The configuration file also contains the feature weights. Note that the phrase table has four weights, one for each feature it contains.

There are no questions to answer about this file, but take a good look at it and make yourself familiar with the main parameters defining a phrase-based SMT model.
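To give you an idea of the layout, here is an abridged sketch of what such a file can look like (the paths are placeholders, and the exact feature names and default weights depend on your Moses version and LM toolkit); compare it with your actual file:

[input-factors]
0

[mapping]
0 T 0

[distortion-limit]
6

[feature]
UnknownWordPenalty
WordPenalty
PhrasePenalty
PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/path/to/phrase-table.gz input-factor=0 output-factor=0
Distortion
SRILM name=LM0 factor=0 path=/path/to/lm order=3

[weight]
UnknownWordPenalty0= 1
WordPenalty0= -1
PhrasePenalty0= 0.2
TranslationModel0= 0.2 0.2 0.2 0.2
Distortion0= 0.3
LM0= 0.5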

Testing your Blockworld model

Try running Moses with the model you've just trained (the training script places moses.ini under moses.output/model/):
/local/kurs/mt/mosesdecoder/bin/moses -f moses.ini
You can find the test sentences from lab 2 in /local/kurs/mt/lab2/data/test_meningar.language, where language is the suffix of the source language you chose (swe or eng).

3) Feed them into the decoder and examine the output. How does it compare to the word-based systems you used in earlier labs?
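For example, assuming Swedish is your source language (swap the suffix otherwise), you can translate the whole file at once:

/local/kurs/mt/mosesdecoder/bin/moses -f moses.output/model/moses.ini \
    < /local/kurs/mt/lab2/data/test_meningar.swe > translations.txt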

Exploring the search algorithm

For the rest of the assignments, we're going to use a real-world Swedish-English model trained on Europarl data. You can find the model in /local/kurs/mt/lab-moses/europarl.sv-en. It's substantially larger than the Blockworld model. There's a ready-made moses.ini for you to use.

Copy it to your directory. Note that this configuration file was made with an earlier version of Moses, so it probably looks a bit different from the one you created in the previous section.

The model works with lowercased and tokenised text. You can use the script preprocess.sh in the model directory to preprocess your test sentences in the same way.
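The script's exact interface isn't documented here, but if it reads standard input like most such wrapper scripts (check the script itself to confirm), the call would look something like:

sh /local/kurs/mt/lab-moses/europarl.sv-en/preprocess.sh < sentences.txt > sentences.prep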

Start the decoder with this model, try entering a few sentences and look at the translations you get. You can make up your own sentences or copy some sentences from a newspaper website such as DN or Svenska Dagbladet. You can quit the decoder by pressing Control-D.

Look at the BEST TRANSLATION line to see the scores. The decoder outputs the total score as well as the vector of the individual core feature scores. If you wonder which score corresponds to which feature, stop the decoder and run it again as

/local/kurs/mt/mosesdecoder/bin/moses -f moses.ini -show-weights

This will output the feature names and their corresponding weights.

You can increase the decoder's verbosity level to see what it does. If you run the decoder with the -v 2 option, it will tell you how many hypotheses were expanded, recombined, etc. With the -v 3 option, the decoder will dump information about all the hypotheses it expands to standard error. The -report-segmentation option will show you how the input sentence was segmented into phrases.

Another way to gather information about how decoder parameters affect the output is to look at n-best lists containing the n best hypotheses in the part of the search space explored by the decoder. To generate n-best output, start the decoder with the -n-best-list FILE SIZE option. This will output n-best lists of the given size to the file you specify. Use an n-best size of around 100 to obtain a fair impression of the best output hypotheses.
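For example, to translate a preprocessed test file and write a 100-best list to nbest.txt (the file names are just examples):

/local/kurs/mt/mosesdecoder/bin/moses -f moses.ini -n-best-list nbest.txt 100 \
    < test.prep > translations.txt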

Here are some options you can use to influence the search:

-stack S sets the stack size S for histogram pruning (default: 100)
-beam-threshold eta sets the beam threshold eta for threshold pruning (default: 0.00001, which effectively disables threshold pruning in most cases!)
-max-phrase-length p segments the input into phrases of length at most p (default: 10, which is more than the maximum phrase length in our phrase table!)
-distortion-limit d sets the distortion limit (maximum jump) to d (default: 6; 0 means no reordering permitted, -1 means unlimited)

You can also change the ttable-limit directly in moses.ini - this affects how many translation options are loaded for each span.
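As an illustration, the following run (parameter values chosen arbitrarily) combines heavy histogram pruning, aggressive threshold pruning and monotone decoding, and prints search statistics:

/local/kurs/mt/mosesdecoder/bin/moses -f moses.ini -stack 10 -beam-threshold 0.1 \
    -distortion-limit 0 -v 2 < test.prep > translations.txt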

4) Experiment with these options and with input sentences of varying length, and find out how they affect the number of hypotheses expanded by the decoder and your subjective perception of translation quality. Again, you can make up your own sentences or copy them from a newspaper website. Report your observations.

5) Pick an input sentence of 4-5 tokens and adjust the search parameters so that the decoder only expands around 15-25 hypotheses; you will need heavy pruning and/or strict limits to achieve this. Report what settings you used. Use the -v 3 flag to output all hypotheses and draw the search graph explored by the decoder on a sheet of paper (handwriting is quite OK for this task). The option --output-search-graph can be useful for navigating through the output!
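A possible starting point for such a heavily restricted run might be the following (the parameter values will need tuning for your sentence; since the -v 3 trace goes to standard error, the redirection collects it in search.log):

/local/kurs/mt/mosesdecoder/bin/moses -f moses.ini -stack 3 -distortion-limit 0 \
    -v 3 --output-search-graph graph.txt 2> search.log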

6) Using a longer input sentence and more normal parameter settings, find two decoder configurations that produce different output. The two configurations should not be too similar; they should differ substantially in several parameters. Report your test sentence of choice.

7) Compare the model scores of the two translations of your test sentence. Since the model is the same in both cases, the translation with the lower score is the manifestation of a search error. Let's call the system with the lower score the target system and the system with the higher score the reference system.

8) Use the -v 3 output of the target system to find out where the search error occurs. Then try to adjust the target system's search parameters in such a way that the better solution output by the reference system is found, while expanding as few additional hypotheses as possible. Report how you proceeded and what you were able to achieve.

9) Then try translating some 10-20 different sentences with this optimised target system and your original reference system and find out how these two systems compare with each other in terms of model scores.

As for finding the search error (assignment 8), first find out what the best solution in your reference system looks like (segmentation, phrase translations, ordering). Then look at the search log (the -v 3 output) of the target system, starting from the empty hypothesis at the beginning (number 0), and try to follow the search path that would generate the same solution. I suggest you load the search log into a text editor so you can use the search function to look up hypothesis numbers and see how they're expanded.

The search error occurs at the point where the last hypothesis that is a prefix of the correct solution stops being expanded, because it's pruned or removed from the stack in some other way. Depending on how exactly you set up your target system, it may also fail to generate the best solution in the first place, e.g. because the ttable-limit or the distortion limit prevents it; in that case, that restriction is the source of the search error.

Overall impression of phrase-based SMT

10) After doing this lab, what is your overall impression of phrase-based SMT? What are its advantages and disadvantages?

11) How does it compare to what you know about word-based SMT? What's the main advantage of the phrase-based approach over the word-based one? Does word-based SMT have any advantages?

12) Where would you start if you wanted to improve phrase-based SMT?

Guidelines for lab report

Report your results for assignments 1 to 12 above.
Include your name and the name of your lab partner in the report.
Upload your report in English and PDF format to Studentportalen.
You should also hand in your search graph drawing. Don't forget to put your name on it!
Deadline for handing in the report: May 26th, 2017
Last possible deadline for handing in the report: June 2nd, 2017