Effectively Constructing Reliable Data for Cross-Domain Text Classification

Fuzhen Zhuang; Qing He; Zhongzhi Shi

doi:10.1007/978-3-642-32891-6_6

Conference paper, Year: 2012


Fuzhen Zhuang

Role: Author
PersonId : 1008459

Key Laboratory of Intelligent Information Processing, Institute of Computing Technology [Beijing]

Qing He

Role: Author
PersonId : 1008482

Key Laboratory of Intelligent Information Processing, Institute of Computing Technology [Beijing]

Zhongzhi Shi

Role: Author
PersonId : 990754

Key Laboratory of Intelligent Information Processing, Institute of Computing Technology [Beijing]

Abstract

Traditional classification algorithms often fail when the independent and identically distributed (i.i.d.) assumption does not hold, and cross-domain learning has recently emerged to deal with this problem. We observe that although a model trained on the training data may not perform well over all of the test data, it can give much better predictions on the subset of the test data for which its prediction confidence is high. Moreover, this subset, being drawn from the test set, may have a distribution more similar to that of the test data. In this study, we propose to construct a reliable data set from the test instances with high prediction confidence and to use this reliable data as training data. Furthermore, we develop an EM algorithm to refine the model trained on the reliable data. Extensive experiments on text classification verify the effectiveness and efficiency of our methods. It is worth mentioning that the model trained on the reliable data achieves a significant performance improvement over the one trained on the original training data, and that our methods outperform all the baseline algorithms.
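The two steps described in the abstract (selecting high-confidence test predictions as reliable data, then refining with EM) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the base classifier, the confidence threshold, or the exact EM formulation, so the multinomial naive Bayes model, the `threshold=0.8` value, the clamping of reliable labels during EM, the toy documents, and all function names below are assumptions made for illustration only.

```python
import math
from collections import Counter

def train_nb(docs, weights, vocab, n_classes=2, alpha=1.0):
    """Multinomial naive Bayes from soft labels; weights[i][c] ~ P(class c | doc i)."""
    class_mass = [alpha] * n_classes
    word_mass = [Counter() for _ in range(n_classes)]
    for doc, w in zip(docs, weights):
        for c in range(n_classes):
            class_mass[c] += w[c]
            for tok in doc:
                word_mass[c][tok] += w[c]
    total = sum(class_mass)
    log_prior = [math.log(m / total) for m in class_mass]
    log_lik = []
    for c in range(n_classes):
        denom = sum(word_mass[c].values()) + alpha * len(vocab)
        log_lik.append({t: math.log((word_mass[c][t] + alpha) / denom) for t in vocab})
    return log_prior, log_lik

def predict_proba(model, doc):
    """Class posteriors for one tokenized document (tokens outside vocab are ignored)."""
    log_prior, log_lik = model
    scores = [lp + sum(ll.get(t, 0.0) for t in doc) for lp, ll in zip(log_prior, log_lik)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def argmax(p):
    return max(range(len(p)), key=p.__getitem__)

def one_hot(y, n_classes=2):
    return [1.0 if c == y else 0.0 for c in range(n_classes)]

def build_reliable_set(model, target_docs, threshold=0.8):
    """Map index -> predicted label for test docs predicted with high confidence."""
    reliable = {}
    for i, doc in enumerate(target_docs):
        p = predict_proba(model, doc)
        if max(p) >= threshold:  # threshold is an assumed hyperparameter
            reliable[i] = argmax(p)
    return reliable

def em_refine(reliable, target_docs, vocab, n_iter=5):
    """Start from the reliable data, then alternate E and M steps over all test docs."""
    model = train_nb([target_docs[i] for i in reliable],
                     [one_hot(y) for y in reliable.values()], vocab)
    for _ in range(n_iter):
        # E-step: soft labels for test docs; reliable docs keep their labels (assumption).
        soft = [one_hot(reliable[i]) if i in reliable else predict_proba(model, d)
                for i, d in enumerate(target_docs)]
        # M-step: re-estimate the model from the soft-labelled test docs.
        model = train_nb(target_docs, soft, vocab)
    return model

# Toy source (training) and target (test) domains with partly different vocabulary.
source = [(["ball", "team", "win"], 0), (["team", "score", "ball"], 0),
          (["code", "bug", "software"], 1), (["software", "code", "server"], 1)]
target_docs = [["ball", "team", "goal", "keeper"], ["code", "software", "gpu", "kernel"],
               ["goal", "keeper", "pitch"], ["gpu", "kernel", "driver"]]
vocab = {t for d, _ in source for t in d} | {t for d in target_docs for t in d}

source_model = train_nb([d for d, _ in source], [one_hot(y) for _, y in source], vocab)
reliable = build_reliable_set(source_model, target_docs, threshold=0.8)
refined = em_refine(reliable, target_docs, vocab)
print(reliable)
print([argmax(predict_proba(refined, d)) for d in target_docs])
```

In this toy setup the source model is confident only about the two test documents that share words with the source domain. Training on those documents exposes target-specific words, which lets the refined model label the remaining test documents that the source model could not separate.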

Keywords

Cross-domain Learning, Reliable Data, EM Algorithm

Domains

Computer Science [cs]

Main file

978-3-642-32891-6_6_Chapter.pdf (392.12 KB)

Source: Files produced by the author(s)

https://inria.hal.science/hal-01524990

Submitted on: Friday, May 19, 2017, 10:43:43

Last modified on: Tuesday, March 26, 2024, 16:24:04

Dates and versions

hal-01524990 , version 1 (19-05-2017)

License

Attribution

Identifiers

HAL Id : hal-01524990 , version 1
DOI : 10.1007/978-3-642-32891-6_6

Cite

Fuzhen Zhuang, Qing He, Zhongzhi Shi. Effectively Constructing Reliable Data for Cross-Domain Text Classification. 7th International Conference on Intelligent Information Processing (IIP), Oct 2012, Guilin, China. pp.16-27, ⟨10.1007/978-3-642-32891-6_6⟩. ⟨hal-01524990⟩

Collections

IFIP IFIP-AICT IFIP-TC IFIP-TC12 IFIP-AICT-385
