Pagerank based clustering of hypertext document collections

Abstract: Clustering hypertext document collections is an important task in information retrieval. Most clustering methods rely on document content and do not take hypertext links into account. Here we propose a novel PageRank-based clustering (PRC) algorithm that uses the hypertext structure. The PRC algorithm produces graph partitions with high modularity and coverage. A comparison of the PRC algorithm with two content-based clustering algorithms shows a good match between the PRC clustering and the content-based clusterings.
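
The abstract judges a partition by its modularity and coverage. As a minimal sketch of what those two scores measure, the Python below computes them for a given partition, using the standard definitions: coverage is the fraction of links that fall inside clusters, and modularity is the Newman-Girvan score. This is not the PRC algorithm, which the abstract does not describe. Treating links as undirected is an assumption made here, and the function name, the variable names, and the toy graph are all hypothetical.

```python
from collections import defaultdict


def coverage_and_modularity(edges, cluster_of):
    """Score a partition of an undirected graph.

    edges: list of (u, v) pairs, one per link (assumed undirected).
    cluster_of: dict mapping each node to its cluster label.
    Returns (coverage, modularity).
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    intra = defaultdict(int)       # links with both ends in the cluster
    degree_sum = defaultdict(int)  # sum of node degrees per cluster

    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1

    # Coverage: share of all links that stay inside a cluster.
    coverage = sum(intra.values()) / m

    # Modularity: intra-cluster share minus the share expected
    # if links were placed at random with the same degrees.
    modularity = sum(
        intra[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )
    return coverage, modularity


# Toy graph: two triangles joined by a single link.
example_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
example_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
example_scores = coverage_and_modularity(example_edges, example_clusters)
```

Both scores rise when clusters keep most links internal, but coverage alone is maximized by putting every node in one cluster, which is why it is usually reported together with modularity.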
Document type:
Conference paper
International ACM SIGIR Conference on Research & Development in Information Retrieval, Jul 2008, Singapore, Singapore. ACM, pp.873--874, 2008, SIGIR'08: the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 〈10.1145/1390334.1390549〉

https://hal.inria.fr/inria-00565355
Contributor: Konstantin Avrachenkov
Submitted on: Friday, February 11, 2011 - 18:43:34
Last modified on: Sunday, February 25, 2018 - 14:48:02

Citation

Konstantin Avrachenkov, Vladimir Dobrynin, Danil Nemirovsky, Son Kim Pham, Elena Smirnova. Pagerank based clustering of hypertext document collections. International ACM SIGIR Conference on Research & Development in Information Retrieval, Jul 2008, Singapore, Singapore. ACM, pp.873--874, 2008, SIGIR'08: the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 〈10.1145/1390334.1390549〉. 〈inria-00565355〉