Conference Papers, Year: 2012

PPLSA: Parallel Probabilistic Latent Semantic Analysis Based on MapReduce

Ning Li, Fuzhen Zhuang, Qing He, Zhongzhi Shi

Abstract

Probabilistic Latent Semantic Analysis (PLSA) is a popular topic modeling technique for exploring document collections. As large datasets become increasingly common, the computation in PLSA needs to scale better. In this paper, we propose a parallel PLSA algorithm, called PPLSA, that handles large corpora within the MapReduce framework. Our solution distributes computation efficiently and is relatively simple to implement.
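The abstract does not spell out how the computation is split, so the following is only a minimal sketch of one common way to phrase a PLSA EM iteration as a map step and a reduce step; it should not be read as the PPLSA algorithm from the paper. It assumes the standard PLSA model with parameters P(z|d) and P(w|z), runs in a single process to simulate the mappers and the reducer, and uses hypothetical names throughout (`map_document`, `reduce_word_topic`, `pplsa_sketch`).

```python
import random
from collections import defaultdict


def map_document(doc_counts, p_z_d, p_w_z):
    """Map step for one document (hypothetical helper, not from the paper).

    doc_counts: {word: count}; p_z_d: list of P(z|d) for this document;
    p_w_z: one {word: P(w|z)} dict per topic.
    Returns the updated P(z|d) and a list of ((topic, word), expected_count).
    """
    k = len(p_z_d)
    topic_totals = [0.0] * k
    emitted = []
    for word, n in doc_counts.items():
        # E-step: posterior P(z|d,w) is proportional to P(z|d) * P(w|z).
        joint = [p_z_d[z] * p_w_z[z].get(word, 0.0) for z in range(k)]
        norm = sum(joint)
        if norm == 0.0:
            continue
        for z in range(k):
            expected = n * joint[z] / norm
            topic_totals[z] += expected
            emitted.append(((z, word), expected))
    total = sum(topic_totals)
    # P(z|d) depends only on this document, so it can be updated locally.
    new_p_z_d = [t / total for t in topic_totals] if total > 0 else list(p_z_d)
    return new_p_z_d, emitted


def reduce_word_topic(pairs, k):
    """Reduce step: sum expected counts per (topic, word), then normalize per topic."""
    sums = [defaultdict(float) for _ in range(k)]
    for (z, word), value in pairs:
        sums[z][word] += value
    result = []
    for z in range(k):
        total = sum(sums[z].values()) or 1.0
        result.append({w: v / total for w, v in sums[z].items()})
    return result


def pplsa_sketch(corpus, k, iterations=20, seed=0):
    """Run EM for PLSA, with each iteration written as map over documents + reduce."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in corpus for w in doc})

    def random_dist(size):
        weights = [rng.random() + 0.1 for _ in range(size)]
        s = sum(weights)
        return [x / s for x in weights]

    p_w_z = [dict(zip(vocab, random_dist(len(vocab)))) for _ in range(k)]
    p_z_d = [random_dist(k) for _ in corpus]
    for _ in range(iterations):
        emitted = []
        # In a real MapReduce job each call below would run on a separate mapper.
        for d, doc in enumerate(corpus):
            p_z_d[d], pairs = map_document(doc, p_z_d[d], p_w_z)
            emitted.extend(pairs)
        p_w_z = reduce_word_topic(emitted, k)
    return p_z_d, p_w_z
```

Calling `pplsa_sketch([{"apple": 3, "pear": 2}, {"cpu": 4, "ram": 1}], k=2)` returns one topic distribution per document and one word distribution per topic. Keying the emitted pairs by (topic, word) is a design choice of this sketch: it keeps the document-level update inside the mapper and leaves only the word-topic sums for the reducer.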
Main file
978-3-642-32891-6_8_Chapter.pdf (327.95 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01524958, version 1 (19-05-2017)

Licence

Attribution - CC BY 4.0

Cite

Ning Li, Fuzhen Zhuang, Qing He, Zhongzhi Shi. PPLSA: Parallel Probabilistic Latent Semantic Analysis Based on MapReduce. 7th International Conference on Intelligent Information Processing (IIP), Oct 2012, Guilin, China. pp.40-49, ⟨10.1007/978-3-642-32891-6_8⟩. ⟨hal-01524958⟩