PPLSA: Parallel Probabilistic Latent Semantic Analysis Based on MapReduce

Abstract: PLSA (Probabilistic Latent Semantic Analysis) is a popular topic modeling technique for exploring document collections. As large datasets become increasingly prevalent, PLSA computation needs to scale better. In this paper, we propose a parallel PLSA algorithm, called PPLSA, that accommodates large corpora within the MapReduce framework. Our solution distributes computation efficiently and is relatively simple to implement.
Document type:
Conference paper
Zhongzhi Shi; David Leake; Sunil Vadera. 7th International Conference on Intelligent Information Processing (IIP), Oct 2012, Guilin, China. Springer, IFIP Advances in Information and Communication Technology, AICT-385, pp.40-49, 2012, Intelligent Information Processing VI. 〈10.1007/978-3-642-32891-6_8〉

Cited literature: 10 references

https://hal.inria.fr/hal-01524958
Contributor: Hal Ifip
Submitted on: Friday, May 19, 2017 - 10:43:18
Last modified on: Friday, November 3, 2017 - 22:24:07

File

978-3-642-32891-6_8_Chapter.pd...
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution 4.0 International License

Citation

Ning Li, Fuzhen Zhuang, Qing He, Zhongzhi Shi. PPLSA: Parallel Probabilistic Latent Semantic Analysis Based on MapReduce. Zhongzhi Shi; David Leake; Sunil Vadera. 7th International Conference on Intelligent Information Processing (IIP), Oct 2012, Guilin, China. Springer, IFIP Advances in Information and Communication Technology, AICT-385, pp.40-49, 2012, Intelligent Information Processing VI. 〈10.1007/978-3-642-32891-6_8〉. 〈hal-01524958〉

Metrics

Record views: 31

File downloads: 58