Discriminative clustering of text documents

Reference:

Jaakko Peltonen, Janne Sinkkonen, and Samuel Kaski. Discriminative clustering of text documents. In Lipo Wang, Jagath C. Rajapakse, Kunihiko Fukushima, Soo-Young Lee, and Xin Yao, editors, Proceedings of ICONIP'02, 9th International Conference on Neural Information Processing, pages 1956–1960, Piscataway, NJ, 2002. IEEE. Preprint postscript at http://www.cis.hut.fi/projects/mi/papers/iconip02.ps.gz.

Abstract:

Vector-space and distributional methods for text document clustering are discussed. Discriminative clustering, a recently proposed method, uses external data to find task-relevant characteristics of the documents, yet the clustering is defined even with no external data. We introduce a distributional version of discriminative clustering that represents text documents as probability distributions. The methods are tested in the task of clustering scientific document abstracts, and the ability of the methods to predict an independent topical classification of the abstracts is compared. The discriminative methods found topically more meaningful clusters than the vector space and distributional clustering models.

Suggested BibTeX entry:

@inproceedings{Peltonen02,
    address = {Piscataway, NJ},
    author = {Jaakko Peltonen and Janne Sinkkonen and Samuel Kaski},
    booktitle = {Proceedings of ICONIP'02, 9th International Conference on Neural Information Processing},
    editor = {Lipo Wang and Jagath C. Rajapakse and Kunihiko Fukushima and Soo-Young Lee and Xin Yao},
    note = {Preprint postscript at \url{http://www.cis.hut.fi/projects/mi/papers/iconip02.ps.gz}},
    pages = {1956--1960},
    publisher = {IEEE},
    title = {Discriminative clustering of text documents},
    year = {2002},
}

See ieeexplore.ieee.org ...