Thesis-related stuff
Motivation
The purpose of my thesis is to investigate whether, and how, automated text-linguistic methods can yield more relevant hits in information retrieval and produce coherent summaries that are better adapted to the query and the user than those usually given in information systems.
A lexical cohesion analysis is used as a basis for indexing, searching and short summaries in an information system. The analysis is based on a number of knowledge bases containing linguistic or world knowledge, and the result depends mainly on what knowledge is available.
By combining the lexical cohesion analysis with a Rhetorical Structure Theory (RST) analysis, it should be possible to come to terms with some of the coherence problems found in summaries based only on lexical cohesion analysis. At the same time, the less computationally costly lexical cohesion analysis could reduce the number of possible RST analyses, since it also gives an estimate of how closely sentences are related.
Unfortunately, the RST part turned out to be too unwieldy to fit into the thesis, and had to be put off to a later date.
Some of the resources developed for the thesis are also relevant for other projects, e.g. Text and language assessment of mathematics and science and SNK/BLARK - Svensk nationell korpus [Swedish National Corpus]/Basic LAnguage Resource Kit.
Lexical cohesion analysis
Some resources developed for use in lexical cohesion analysis, and related papers and presentations:
- General presentations:
- Eva Forsbom. 2008. Text-specific thesauri for information access. PhD seminar at Uppsala, November 28. (Abstract txt.)
- Eva Forsbom. 2007. Dynamic Text-Centered Thesaurus. Poster presentation in connection with research evaluation panel visit Kvalitet och förnyelse [Quality and Renewal]. Uppsala, May 8-10.
- Eva Forsbom. 2006. Dynamic text-specific thesauri for information access. PhD seminar at GSLT. Göteborg, September 13. (Abstract txt.)
- Eva Forsbom. 2006. Dynamiska textspecifika tesaurer för informationsåtkomst. PhD seminar at Uppsala, September 1. (See above.)
- Swedish part-of-speech tagging models:
- Resources: tagger models
- Eva Forsbom. 2009. Extending the View: Explorations in Bootstrapping a Swedish PoS Tagger. In Proceedings of the 17th Nordic Conference on Computational Linguistics NODALIDA 2009, Odense, Denmark, May 15-16. NEALT Proceedings Series, Vol. 4, pp. 34-40. (URI)
- Eva Forsbom. 2008. Size is not Everything. Genre Balance in Bootstrapping a Swedish PoS Tagger. In Proceedings of the Swedish Language Technology Conference (SLTC'08), pp. 43-44. Stockholm, November 20-21. (abstract)
- Eva Forsbom. 2008. Good Tag Hunting: Tagability of Granska Tags. In Joakim Nivre, Mats Dahllöf and Beáta Megyesi (eds.), Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein, ACTA UNIVERSITATIS UPSALIENSIS Studia Linguistica Upsaliensia 7, pp. 77-85. (URN and pdf)
- Eva Forsbom. 2006. Big is beautiful: Bootstrapping a PoS tagger for Swedish. Poster presentation at the GSLT retreat. Gullmarsstrand, January 27-29. (pdf, cross_validation_sets.txt)
- A base vocabulary pool derived from the Stockholm-Umeå corpus:
- Software: BaseVocabulary package
- Eva Forsbom. 2006. A Swedish Base Vocabulary Pool. Presentation at the Swedish Language Technology Conference. Göteborg, October 27-28. (Extended abstract pdf.)
- Statistical data analysis term paper: Deriving a base vocabulary pool from the Stockholm-Umeå Corpus (pdf)
- Wordform-baseform mapper models based on the base vocabulary pool:
- Software: BaseModel package
- Demo
- Eva Forsbom. 2007. Deriving a base vocabulary pool from a categorised corpus. Presentation at Corpus Linguistics Workshop. Department of English, Uppsala, December 6-7.
- Eva Forsbom. 2007. Inducing Baseform Models from a Swedish Vocabulary Pool. In Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007, pp. 51-58. Tartu, Estonia, May 25-26. (pdf)
- Minimally supervised induction of morphology term paper: Inducing baseform models from a Swedish vocabulary pool (pdf, longer, and older, version of the above paper)
- A morphological classifier:
- Java development for HLT term paper (literature review): Återanvändbarhet för språkvara [Reusability of lingware] (gzipped postscript, in Swedish)
Rhetorical Structure Theory and Veins Theory
Some resources developed for use in RST and VT analysis, and related course papers:
- General presentations:
- Eva Forsbom. 2005. Rhetorical structure analysis as a basis for summarisation in information retrieval. PhD seminar at GSLT. Göteborg, January 31. (Abstract txt.)
- Eva Forsbom. 2004. Analys av retorisk struktur som grund för sammanfattningar vid informationssökning [Rhetorical structure analysis as a basis for summarisation in information retrieval]. PhD seminar at Uppsala, December 10. (Abstract txt.)
- Genre classifiers (decision trees):
- Statistical methods term paper: Feature Extraction for Genre Classification (pdf)
- Genre classifiers (artificial neural networks):
- Eva Forsbom. 2007. Feature Combination for Genre Classification. Poster presentation at the GSLT retreat. Gullmarsstrand, January 27.
- Artificial neural networks term paper: Feature Combination for Genre Classification (continuation of decision tree genre classification, pdf, cross_validation_sets.txt)
- Theory of science term paper: Tycker du som jag? eller Vem ska man tro på? Om mätning av samstämmighet vid annotering [Do you think as I do? or Whom should you believe? On measuring agreement in annotation] (pdf, on the disadvantages of Kappa statistics, in Swedish)
- Natural language generation assignment (literature review): Rhetorical Structure Theory in Natural Language Generation (pdf)
- Natural language generation term paper: Focussing Subject-Specific Summaries from RST and VT Trees (pdf)
- Rhetorical parser (for manual annotations):
- Parsing methods term paper: Rhetorical parsing (available on request)
- Machine learning 2 project: Clause Segmentation and Classification (work in progress)
