Thousands of enciphered historical manuscripts are buried in libraries and archives. Examples include diplomatic correspondence, intelligence reports, private letters and diaries, and manuscripts related to secret societies or other (religious) groups on the margins of society. The bulk of these historical manuscripts will remain undeciphered unless we can automate, in part or in full, the processes involved in decoding them.
Our aim is to develop computer-aided tools for automatic and semi-automatic decoding of historical source material through cross-disciplinary research involving computer science, language technology, linguistics and philology. DECODE involves the collection of ciphertexts and keys from Early Modern times, the systematic, automatic detection of cipher types, the development of algorithms for (semi-)automatic decryption of different types of ciphers, and the creation of language models and pattern dictionaries for early variants of fifteen European languages: Czech, Dutch, English, Finnish, French, German, Hungarian, Icelandic, Italian, Latin, Polish, Portuguese, Russian, Spanish, and Swedish.
The cipher database, the historical language collections, and the language models will be available soon.
Principal Investigator: Beáta Megyesi, Department of Linguistics and Philology, Uppsala University
Participants: Kevin Knight and Nada Aldarrab, ISI, University of Southern California; Nicklas Bergman, Nils Blomqvist, and Eva Pettersson, Department of Linguistics and Philology, Uppsala University