Short Text Feature Extension Based on Improved Frequent Term Sets

Abstract : A short text feature extension algorithm based on improved frequent term sets is proposed. By calculating support and confidence, frequent term sets that share the same category tendency are extracted. Correlation-based frequent term sets are then defined to further extend the term set. Meanwhile, information gain is introduced into traditional TF-IDF so that term weights better express category distribution information and the weight of a word for each category is enhanced. All term pairs with external relations are extracted and the frequent term set is expanded. Finally, a word similarity matrix is constructed from the frequent term sets, and symmetric non-negative matrix factorization is applied to extend the feature space. Experiments show that the resulting short text model can improve the performance of short text clustering.
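As a rough illustration of the support and confidence step mentioned in the abstract, the sketch below uses the standard association-rule definitions: support is the fraction of documents containing a term pair, and confidence is the fraction of those documents that carry a given category label. The paper's exact definitions and thresholds are not given on this page, so the function names (`frequent_pairs`, `category_confidence`), the toy corpus, and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(docs, min_support):
    """Return {pair: support} for term pairs with support >= min_support.

    Support is taken here as the fraction of documents containing both
    terms (an assumed, standard definition).
    """
    n = len(docs)
    counts = Counter()
    for terms in docs:
        # Count each unordered pair at most once per document.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def category_confidence(docs, labels, pair):
    """Confidence of the rule pair -> category, for every category seen.

    Computed as (docs containing the pair with that label) divided by
    (docs containing the pair).
    """
    hits = [lab for terms, lab in zip(docs, labels) if set(pair) <= set(terms)]
    total = len(hits)
    return {lab: c / total for lab, c in Counter(hits).items()} if total else {}


# Hypothetical toy corpus of tokenized short texts with category labels.
docs = [
    ["apple", "iphone", "launch"],
    ["apple", "iphone", "price"],
    ["apple", "pie", "recipe"],
    ["iphone", "price", "drop"],
]
labels = ["tech", "tech", "food", "tech"]

pairs = frequent_pairs(docs, min_support=0.5)
conf = category_confidence(docs, labels, ("apple", "iphone"))
```

In this toy corpus only the pairs ("apple", "iphone") and ("iphone", "price") reach the assumed support threshold, and both occur exclusively in "tech" documents, which is the kind of shared category tendency the abstract describes.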
Document type :
Conference papers

https://hal.inria.fr/hal-01614992
Contributor : Hal Ifip
Submitted on : Wednesday, October 11, 2017 - 4:57:54 PM
Last modification on : Wednesday, October 11, 2017 - 5:00:31 PM
Long-term archiving on: Friday, January 12, 2018 - 3:25:22 PM

File

433802_1_En_18_Chapter.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Huifang Ma, Lei Di, Xiantao Zeng, Li Yan, Yuyi Ma. Short Text Feature Extension Based on Improved Frequent Term Sets. 9th International Conference on Intelligent Information Processing (IIP), Nov 2016, Melbourne, VIC, Australia. pp.169-178, ⟨10.1007/978-3-319-48390-0_18⟩. ⟨hal-01614992⟩
