Finding hyphenations

find-hyphs.py is a Python program that finds hyphenations in a text file. Its output is a word list showing which hyphenation points have been used: - marks a point used once, = a point used twice, and # a point used more than twice.
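
The following is a minimal sketch of how such a word list could be built. It is not the actual source of find-hyphs.py. The names collect_breaks and mark are hypothetical, and the sketch assumes that a hyphenation is a line ending in letters plus a hyphen, continued by letters at the start of the next line.

```python
import re
from collections import Counter, defaultdict

# Assumption: a hyphenation is a run of letters followed by a line-final
# hyphen, continued by a run of letters at the start of the next line.
LINE_END = re.compile(r"([^\W\d_]+)-\s*$")
LINE_START = re.compile(r"\s*([^\W\d_]+)")


def collect_breaks(lines):
    """Map each rejoined (lowercased) word to a Counter of break offsets."""
    breaks = defaultdict(Counter)
    previous = None  # fragment left hanging by the previous line, if any
    for line in lines:
        if previous is not None:
            m = LINE_START.match(line)
            if m:
                head = previous.lower()
                word = head + m.group(1).lower()
                breaks[word][len(head)] += 1
        m = LINE_END.search(line)
        previous = m.group(1) if m else None
    return breaks


def mark(word, counts):
    """Insert -, = or # at each offset seen once, twice, or more often."""
    out = []
    for i, ch in enumerate(word):
        n = counts.get(i, 0)
        if n:
            out.append("-" if n == 1 else "=" if n == 2 else "#")
        out.append(ch)
    return "".join(out)
```

Given the four lines "A long sen-", "tence with hyphen-", "ation, and hy-" and "phenation again.", collect_breaks finds the words sentence and hyphenation, and mark renders them as sen-tence and hy-phen-ation. Passing sys.stdin to collect_breaks would process piped text in the same way.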

To use it with a PDF file, combine it with pdftotext, for instance:

pdftotext -layout foo.pdf - | python find-hyphs.py

or in a Makefile with

%-hy.txt : %.txt
	python find-hyphs.py <$^ | sort >$@

%.txt : %.pdf
	pdftotext -layout $^

The -layout option is needed to keep the hyphenations. (This won't work with two-column text.)

Per Starbäck, starback@stp.lingfil.uu.se