
Related Text Discovery Through Consecutive Filtering and Supervised Learning

Abstract: In a related (topic-based) text discovery task, the related (positive) texts are usually few compared with the large number of unrelated (negative) texts. The two classes can therefore be strongly imbalanced, which makes classification or detection very difficult because the recall of the positive class is very low. To overcome this difficulty, we propose a consecutive filtering and supervised learning method, i.e., consecutive supervised bagging. In each consecutive learning stage, we first delete some of the negative texts, namely those that the classifier trained in the previous stage labels as negative with higher confidence, and then train a new classifier on the retained texts. We repeat this procedure until the ratio of negative to positive texts becomes reasonable, and finally obtain a tree-like filtering and recognition system. Experimental results on the 20NewsGroups data (English) and the THUCNews data (Chinese) demonstrate that the proposed method performs much better than AdaBoost and Rocchio.
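The staged procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the base classifier, so a simple nearest-centroid scorer over bag-of-words counts stands in for it, and the function names and the parameters `target_ratio`, `drop_fraction`, and `max_stages` are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def centroid(texts):
    """Summed term counts of a class (scale is irrelevant for cosine)."""
    total = Counter()
    for text in texts:
        total.update(bag_of_words(text))
    return total


def cosine(a, b):
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(positives, negatives):
    """Stand-in classifier (assumption): one centroid per class."""
    return centroid(positives), centroid(negatives)


def negative_confidence(model, text):
    """Higher value means the text looks more clearly negative."""
    pos_centroid, neg_centroid = model
    vec = bag_of_words(text)
    return cosine(vec, neg_centroid) - cosine(vec, pos_centroid)


def consecutive_filter(positives, negatives, target_ratio=2.0,
                       drop_fraction=0.5, max_stages=10):
    """Alternate filtering and retraining until the class ratio is acceptable.

    Returns the list of per-stage models and the retained negatives.
    """
    negatives = list(negatives)
    keep_min = int(target_ratio * len(positives))
    model = train(positives, negatives)  # initial stage: all texts
    stages = [model]
    while len(negatives) > keep_min and len(stages) < max_stages:
        # Rank negatives by how confidently the previous model rejects them.
        ranked = sorted(negatives,
                        key=lambda t: negative_confidence(model, t),
                        reverse=True)
        # Drop the most confident negatives, never going below keep_min.
        n_drop = max(1, min(int(len(ranked) * drop_fraction),
                            len(ranked) - keep_min))
        negatives = ranked[n_drop:]
        # Retrain on the retained texts.
        model = train(positives, negatives)
        stages.append(model)
    return stages, negatives
```

Under these assumptions, each stage removes the negatives that are easiest to reject, so later models are trained on a more balanced set that contains the harder negatives. At prediction time the stored stages could be applied in order, which gives the cascade a tree-like shape.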
Document type: Conference papers


Contributor: Hal Ifip
Submitted on: Friday, May 3, 2019 - 1:25:15 PM
Last modification on: Friday, May 3, 2019 - 3:40:19 PM


Distributed under a Creative Commons Attribution 4.0 International License



Daqing Wu, Jinwen Ma. Related Text Discovery Through Consecutive Filtering and Supervised Learning. 2nd International Conference on Intelligence Science (ICIS), Nov 2018, Beijing, China. pp.211-220, ⟨10.1007/978-3-030-01313-4_22⟩. ⟨hal-02118810⟩


