Short Text Feature Extension Based on Improved Frequent Term Sets

Abstract: A short text feature extension algorithm based on improved frequent term sets is proposed. Support and confidence are calculated to extract frequent term sets whose terms share the same category tendency. Correlation-based frequent term sets are defined to further extend the term set. Meanwhile, information gain is introduced into traditional TF-IDF so that the weights better express category distribution information and the weight of a word for each category is enhanced. All term pairs with external relations are extracted, and the frequent term set is expanded. Finally, a word similarity matrix is constructed from the frequent term sets, and symmetric non-negative matrix factorization is applied to extend the feature space. Experiments show that the resulting short text model can improve the performance of short text clustering.
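
To make the first step of the abstract concrete, the following is a minimal sketch of computing support and confidence for co-occurring term pairs in a small corpus of short texts. It uses the standard association-rule definitions; the paper's exact definitions, its thresholds, and its category-tendency test are not given in the abstract, so the function name `pair_support_confidence`, the parameters `min_support` and `min_confidence`, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_support_confidence(docs, min_support=0.5, min_confidence=0.6):
    """Return frequent term pairs with their support and confidences.

    Hypothetical helper, not the paper's implementation.
    docs: list of token lists (one list per short text).
    support(a, b)     = fraction of documents containing both a and b
    confidence(a->b)  = (docs containing a and b) / (docs containing a)
    """
    n_docs = len(docs)
    term_counts = Counter()
    pair_counts = Counter()
    for tokens in docs:
        terms = sorted(set(tokens))  # count each term once per document
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))

    result = {}
    for (a, b), count in pair_counts.items():
        support = count / n_docs
        if support < min_support:
            continue  # pair is not frequent enough
        conf_ab = count / term_counts[a]
        conf_ba = count / term_counts[b]
        # Keep the pair if the rule holds strongly in at least one direction.
        if max(conf_ab, conf_ba) >= min_confidence:
            result[(a, b)] = (support, conf_ab, conf_ba)
    return result


# Toy corpus (assumed data, for illustration only).
docs = [
    ["stock", "market", "price"],
    ["stock", "market", "crash"],
    ["football", "match", "goal"],
    ["stock", "price"],
]
frequent_pairs = pair_support_confidence(docs)
```

On this toy corpus only the pairs (market, stock) and (price, stock) pass both thresholds. In a full pipeline, pairs like these would feed the word similarity matrix that the abstract describes; that later step is not sketched here.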
Document type: Conference paper
9th International Conference on Intelligent Information Processing (IIP), Nov 2016, Melbourne, VIC, Australia. IFIP Advances in Information and Communication Technology, AICT-486, pp.169-178, 2016, Intelligent Information Processing VIII. 〈10.1007/978-3-319-48390-0_18〉

Cited literature: 12 references

https://hal.inria.fr/hal-01614992
Contributor: Hal Ifip
Submitted on: Wednesday, October 11, 2017 - 16:57:54
Last modified on: Wednesday, October 11, 2017 - 17:00:31
Document(s) archived on: Friday, January 12, 2018 - 15:25:22

File

Restricted access
File visible on: 2019-01-01


License

Distributed under a Creative Commons Attribution 4.0 International License

Citation

Huifang Ma, Lei Di, Xiantao Zeng, Li Yan, Yuyi Ma. Short Text Feature Extension Based on Improved Frequent Term Sets. 9th International Conference on Intelligent Information Processing (IIP), Nov 2016, Melbourne, VIC, Australia. IFIP Advances in Information and Communication Technology, AICT-486, pp.169-178, 2016, Intelligent Information Processing VIII. 〈10.1007/978-3-319-48390-0_18〉. 〈hal-01614992〉

Metrics

Record views

31