Conference papers

Related Text Discovery Through Consecutive Filtering and Supervised Learning

Abstract : In a related (topic-based) text discovery task, there are usually only a small number of related (positive) texts compared with a large number of unrelated (negative) texts. The two classes are therefore strongly imbalanced, which makes classification or detection difficult because the recall of the positive class is very low. To overcome this difficulty, we propose a consecutive filtering and supervised learning method, i.e., consecutive supervised bagging. In each consecutive learning stage, we first remove some of the negative texts that the classifier trained in the previous stage identifies as negative with a high degree of confidence, and then train a new classifier on the retained texts. We repeat this procedure until the ratio of negative to positive texts becomes reasonable, finally obtaining a tree-like filtering and recognition system. Experimental results on the 20NewsGroups (English) and THUCNews (Chinese) datasets demonstrate that the proposed method performs much better than AdaBoost and Rocchio.
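The abstract describes the stage-by-stage training loop only at a high level. The following is a minimal Python sketch of that filter-then-retrain idea under several assumptions that do not come from the paper: texts are already turned into numeric feature vectors; each stage uses a hypothetical centroid-difference scorer (`train_scorer`) rather than the authors' classifier, with no bootstrap resampling (so this is not bagging); and the parameters `drop_frac` (share of negatives removed per stage), `target_ratio` (acceptable negative/positive ratio) and `max_stages` are illustrative. The result is a linear cascade of filters, a simplification of the tree-like system the abstract mentions.

```python
from dataclasses import dataclass
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


@dataclass
class Stage:
    """A linear scorer plus a cut-off; scores <= threshold count as negative."""
    weights: List[float]
    threshold: float

    def score(self, x: Vector) -> float:
        return dot(self.weights, x)

    def rejects(self, x: Vector) -> bool:
        return self.score(x) <= self.threshold


def train_scorer(pos, neg) -> List[float]:
    # Hypothetical stand-in classifier (assumption): difference of the class
    # centroids, so a higher score means "more positive-like".
    return [p - n for p, n in zip(centroid(pos), centroid(neg))]


@dataclass
class Cascade:
    filters: List[Stage]  # consecutive filtering stages
    final: Stage          # classifier trained on the retained texts

    def predict(self, x: Vector) -> bool:
        """True if x survives every filter and the final classifier accepts it."""
        if any(stage.rejects(x) for stage in self.filters):
            return False
        return not self.final.rejects(x)


def train_cascade(pos, neg, target_ratio=2.0, drop_frac=0.5, max_stages=10):
    """Drop confidently negative texts stage by stage until neg/pos <= target_ratio.

    Assumes pos and neg are non-empty lists of equal-length vectors.
    """
    neg, filters = list(neg), []
    while len(neg) > target_ratio * len(pos) and len(filters) < max_stages:
        w = train_scorer(pos, neg)
        min_pos = min(dot(w, p) for p in pos)
        ranked = sorted(neg, key=lambda x: dot(w, x))  # most negative first
        n_drop = int(drop_frac * len(neg))
        # Drop only negatives scoring below every training positive. Because
        # `ranked` is ascending, the dropped texts form a prefix of it.
        dropped = [x for x in ranked[:n_drop] if dot(w, x) < min_pos]
        if not dropped:
            break  # no confidently negative texts left to filter
        # At prediction time, reject anything scoring no higher than the
        # least confident negative this stage removed.
        filters.append(Stage(w, max(dot(w, x) for x in dropped)))
        neg = ranked[len(dropped):]
    # Final classifier on the retained, less imbalanced training set.
    w = train_scorer(pos, neg)
    midpoint = (dot(w, centroid(pos)) + dot(w, centroid(neg))) / 2
    return Cascade(filters, Stage(w, midpoint))


# Usage on a tiny made-up 2-D example (2 positives, 8 negatives).
pos = [[1.0, 1.0], [1.2, 0.9]]
neg = [[-1.0, -1.0], [-1.2, -0.8], [-0.9, -1.1], [-1.1, -1.0],
       [-0.8, -0.9], [0.2, 0.1], [-1.3, -1.2], [-1.0, -1.4]]
cascade = train_cascade(pos, neg)
print(len(cascade.filters))            # 1
print(cascade.predict([1.0, 1.0]))     # True
print(cascade.predict([-1.3, -1.2]))   # False (removed by the first filter)
print(cascade.predict([0.2, 0.1]))     # False (passes the filter, rejected at the end)
```

The design choice here is that a filter stage may only discard negatives scoring below every training positive, so the filtering never throws away known positives; in this toy run one stage removes four negatives, which brings the ratio down to 4:2, and the final classifier is trained on the remaining texts.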

Cited literature: 18 references

https://hal.inria.fr/hal-02118810
Contributor : Hal Ifip
Submitted on : Friday, May 3, 2019 - 1:25:15 PM
Last modification on : Friday, May 3, 2019 - 3:40:19 PM

File

 Restricted access
To satisfy the distribution rights of the publisher, the document is embargoed until : 2021-01-01


Licence


Distributed under a Creative Commons Attribution 4.0 International License


Citation

Daqing Wu, Jinwen Ma. Related Text Discovery Through Consecutive Filtering and Supervised Learning. 2nd International Conference on Intelligence Science (ICIS), Nov 2018, Beijing, China. pp.211-220, ⟨10.1007/978-3-030-01313-4_22⟩. ⟨hal-02118810⟩
