Effectively Constructing Reliable Data for Cross-Domain Text Classification

Abstract: Traditional classification algorithms often fail when the independent and identically distributed (i.i.d.) assumption does not hold, and cross-domain learning has recently emerged to deal with this problem. We observe that although a model trained on the training data may not perform well over all of the test data, it gives much better predictions on the subset of the test data for which its prediction confidence is high. This subset, being drawn from the test data set, may also have a distribution more similar to that of the test data. In this study, we propose to construct a reliable data set from the test instances predicted with high confidence, and to use this reliable data as training data. Furthermore, we develop an EM algorithm to refine the model trained on the reliable data. Extensive experiments on text classification verify the effectiveness and efficiency of our methods. It is worth mentioning that the model trained on the reliable data achieves a significant performance improvement over the one trained on the original training data, and that our methods outperform all the baseline algorithms.
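
The reliable-data idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the multinomial naive Bayes base classifier, the 0.8 confidence threshold, the toy documents, and the helper names train_nb, predict_proba and select_reliable. The EM refinement step is not shown.

```python
import math
from collections import Counter


def train_nb(docs, labels, vocab):
    """Fit a multinomial naive Bayes model with add-one smoothing."""
    log_prior, log_like = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d if w in vocab)
        total = sum(counts.values()) + len(vocab)
        log_like[c] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return log_prior, log_like


def predict_proba(model, doc):
    """Return the posterior P(class | doc) as a dict keyed by class."""
    log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in doc if w in log_like[c])
        for c in log_prior
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def select_reliable(model, docs, threshold):
    """Keep documents whose most probable class reaches the threshold."""
    reliable_docs, pseudo_labels = [], []
    for d in docs:
        proba = predict_proba(model, d)
        label = max(proba, key=proba.get)
        if proba[label] >= threshold:
            reliable_docs.append(d)
            pseudo_labels.append(label)
    return reliable_docs, pseudo_labels


# Hypothetical toy data: labeled source domain, unlabeled target domain.
source_docs = [["goal", "match", "team"], ["team", "win", "match"],
               ["vote", "election", "party"], ["party", "policy", "vote"]]
source_labels = ["sport", "sport", "politics", "politics"]
target_docs = [["match", "team", "league"], ["election", "vote", "senate"],
               ["league", "cup", "team"], ["senate", "policy", "bill"]]
vocab = {w for d in source_docs + target_docs for w in d}

# Step 1: train on the source domain and predict the target domain.
source_model = train_nb(source_docs, source_labels, vocab)
# Step 2: keep only the confidently predicted target documents.
reliable_docs, pseudo_labels = select_reliable(source_model, target_docs, 0.8)
# Step 3: retrain on the reliable target data with its predicted labels.
refined_model = train_nb(reliable_docs, pseudo_labels, vocab)

for doc in target_docs:
    proba = predict_proba(refined_model, doc)
    print(doc, max(proba, key=proba.get))
```

In this toy setting the retrained model sees target-only words such as "league" and "senate", which the source model knew only through smoothing. The threshold trades the size of the reliable set against the accuracy of its labels, so its value would need tuning on real data.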
Document type:
Conference paper
Zhongzhi Shi; David Leake; Sunil Vadera. 7th International Conference on Intelligent Information Processing (IIP), Oct 2012, Guilin, China. Springer, IFIP Advances in Information and Communication Technology, AICT-385, pp.16-27, 2012, Intelligent Information Processing VI. 〈10.1007/978-3-642-32891-6_6〉


https://hal.inria.fr/hal-01524990
Contributor: Hal Ifip
Submitted on: Friday, May 19, 2017 - 10:43:43
Last modified on: Friday, November 3, 2017 - 22:24:07

File

978-3-642-32891-6_6_Chapter.pd...
Files produced by the author(s)

License


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Fuzhen Zhuang, Qing He, Zhongzhi Shi. Effectively Constructing Reliable Data for Cross-Domain Text Classification. Zhongzhi Shi; David Leake; Sunil Vadera. 7th International Conference on Intelligent Information Processing (IIP), Oct 2012, Guilin, China. Springer, IFIP Advances in Information and Communication Technology, AICT-385, pp.16-27, 2012, Intelligent Information Processing VI. 〈10.1007/978-3-642-32891-6_6〉. 〈hal-01524990〉
