Allomorfessor: Towards Unsupervised Morpheme Analysis

Reference:

Oskar Kohonen, Sami Virpioja, and Mikaela Klami. Allomorfessor: Towards unsupervised morpheme analysis. In Working Notes of the CLEF 2008 Workshop, Aarhus, Denmark, 2008.

Abstract:

Many modern natural language processing applications would benefit from automatic morphological analysis of words, especially when dealing with morphologically rich languages. Consequently, there has been an increasing amount of research on the task of unsupervised segmentation of word forms into smaller useful units, i.e. morphs or morphemes. The linguistic phenomenon of allomorphy, where one morpheme has several different surface forms, places limits on the quality of morpheme analysis achievable by segmentation alone. We extend the morphological segmentation method, Morfessor Baseline, to model allomorphy. Our unsupervised method discovers common base forms for allomorphs from an unannotated corpus. We evaluate the method by participating in the Morpho Challenge 2008 competition, where automatic morphological analyses of corpora in English, German, Turkish and Finnish are compared against a linguistic gold standard. Our method achieves high precision, but low recall, and therefore low F-measure scores. We conclude that our method currently undersegments, but that the main approach is promising.

Suggested BibTeX entry:

@inproceedings{okohonenvirpiojaklami_2008,
    address = {Aarhus, Denmark},
    author = {Oskar Kohonen and Sami Virpioja and Mikaela Klami},
    booktitle = {Working Notes of the CLEF 2008 Workshop},
    title = {Allomorfessor: Towards Unsupervised Morpheme Analysis},
    year = {2008},
}

PDF (148 kB)
See www.clef-campaign.org ...