
Short Text Feature Extension Based on Improved Frequent Term Sets

Abstract : A short text feature extension algorithm based on improved frequent term sets is proposed. Support and confidence are computed to extract frequent term sets whose terms share the same category tendency. Correlation-based frequent term sets are then defined to further extend the term sets. Meanwhile, information gain is introduced into traditional TF-IDF so that term weights better express the category distribution and the weight of each word for each category is enhanced. All term pairs with external relations are extracted and used to expand the frequent term sets. Finally, a word similarity matrix is constructed from the frequent term sets, and symmetric non-negative matrix factorization is applied to extend the feature space. Experiments show that the resulting short text model improves the performance of short text clustering.
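The support and confidence step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so this assumes the standard association-rule ones: support is the fraction of documents containing a term pair, and confidence is the fraction of those documents that belong to a given category. The function names, thresholds, and toy corpus are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(docs, min_support=0.5):
    """Return {pair: support} for term pairs whose support meets min_support.

    Support (assumed definition) = documents containing both terms / all documents.
    """
    n = len(docs)
    pair_counts = Counter()
    for doc in docs:
        # Sort the unique terms so each pair has one canonical ordering.
        terms = sorted(set(doc))
        pair_counts.update(combinations(terms, 2))
    return {pair: count / n
            for pair, count in pair_counts.items()
            if count / n >= min_support}


def confidence(docs, labels, pair, category):
    """Fraction of documents containing the pair that carry the given label."""
    containing = [label for doc, label in zip(docs, labels)
                  if set(pair) <= set(doc)]
    if not containing:
        return 0.0
    return containing.count(category) / len(containing)


# Toy corpus of tokenized short texts with category labels (illustrative only).
docs = [
    ["stock", "market", "rise"],
    ["stock", "market", "fall"],
    ["football", "match", "win"],
    ["stock", "match"],
]
labels = ["finance", "finance", "sport", "sport"]

pairs = frequent_pairs(docs, min_support=0.5)
conf = confidence(docs, labels, ("market", "stock"), "finance")
```

In this toy corpus only the pair ("market", "stock") reaches the support threshold, and every document containing it is labeled "finance", so the pair would count as having a strong tendency toward that category under these assumed definitions.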
Document type :
Conference papers

Cited literature [11 references]
Contributor : Hal Ifip
Submitted on : Wednesday, October 11, 2017 - 4:57:54 PM
Last modification on : Thursday, March 5, 2020 - 5:43:16 PM
Long-term archiving on : Friday, January 12, 2018 - 3:25:22 PM


Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License



Huifang Ma, Lei Di, Xiantao Zeng, Li Yan, Yuyi Ma. Short Text Feature Extension Based on Improved Frequent Term Sets. 9th International Conference on Intelligent Information Processing (IIP), Nov 2016, Melbourne, VIC, Australia. pp.169-178, ⟨10.1007/978-3-319-48390-0_18⟩. ⟨hal-01614992⟩


