Conference papers

Memory-Based Active Learning for French Broadcast News

Frédéric Tantini (1), Christophe Cerisara (1), Claire Gardent (2)
(1) PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
(2) TALARIS - Natural Language Processing: representation, inference and semantics
Inria Nancy - Grand Est, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : Stochastic dependency parsers can achieve very good results when they are trained on large, manually annotated corpora, but producing such annotations is costly. Active learning aims to reduce this annotation cost by selecting as few sentences as possible for annotation while still producing the best possible parser. We propose a new selective sampling function for active learning that exploits two memory-based distances to find a good compromise between parser uncertainty and sentence representativeness. Because the parsing and selection models depend only weakly on each other, the approach opens interesting perspectives for future model combination. The approach is validated on a corpus creation task for French broadcast news dedicated to dependency parsing, where it outperforms the baseline entropy-based uncertainty selective sampling. We plan to extend this work with self-training and co-training methods in order to enlarge this corpus and produce the first French broadcast news treebank.
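Illustration : The abstract contrasts entropy-based uncertainty sampling with a selection function that also accounts for sentence representativeness. The following is a minimal sketch of that general idea, not the selection function of the paper: the record does not give the two memory-based distances, so they are not reproduced here. All names (`entropy`, `select_uncertain`, `select_balanced`, `pool`, `representativeness`, `weight`) and all numbers are hypothetical, and the linear combination is an assumption chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution over candidate parses."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_uncertain(pool, k):
    """Baseline-style selection: the k sentences whose parse distribution
    has the highest entropy, i.e. where the parser is least certain."""
    return sorted(pool, key=lambda sid: entropy(pool[sid]), reverse=True)[:k]


def select_balanced(pool, representativeness, k, weight=0.5):
    """Hypothetical trade-off (an assumption, not the paper's function):
    weighted sum of normalized entropy and a representativeness score in [0, 1]."""
    def score(sid):
        n = len(pool[sid])
        max_h = math.log(n) if n > 1 else 1.0  # entropy of the uniform distribution
        return (weight * entropy(pool[sid]) / max_h
                + (1.0 - weight) * representativeness[sid])
    return sorted(pool, key=score, reverse=True)[:k]


# Toy data (invented): sentence id -> probabilities of its candidate parses.
pool = {
    "s1": [0.98, 0.01, 0.01],
    "s2": [0.40, 0.35, 0.25],
    "s3": [0.70, 0.20, 0.10],
}
# Invented representativeness scores: higher means closer to the rest of the corpus.
representativeness = {"s1": 0.9, "s2": 0.1, "s3": 0.8}

most_uncertain = select_uncertain(pool, 2)
best_balanced = select_balanced(pool, representativeness, 1)
```

On this toy data, pure uncertainty ranks the atypical sentence `s2` first, whereas the balanced score prefers `s3`, which is moderately uncertain but far more representative. This is the kind of compromise the abstract describes, although the actual scoring in the paper may differ substantially.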
Document type :
Conference papers
Contributor : Christophe Cerisara
Submitted on : Friday, November 26, 2010 - 4:33:19 PM
Last modification on : Friday, February 26, 2021 - 3:28:08 PM
HAL Id : inria-00540423, version 1

Frédéric Tantini, Christophe Cerisara, Claire Gardent. Memory-Based Active Learning for French Broadcast News. INTERSPEECH 2010, Sep 2010, Tokyo, Japan. pp.1377-1380. ⟨inria-00540423⟩