Memory-Based Active Learning for French Broadcast News

Frédéric Tantini 1, Christophe Cerisara 1, Claire Gardent 2
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
2 TALARIS - Natural Language Processing: representation, inference and semantics
Inria Nancy - Grand Est, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: Stochastic dependency parsers can achieve very good results when trained on large, manually annotated corpora, but such annotation is costly. Active learning aims to reduce this annotation cost by selecting as few sentences as possible to annotate while still producing the best possible parser. We propose a new selective sampling function for active learning that exploits two memory-based distances to find a good compromise between parser uncertainty and sentence representativeness. The reduced dependency between the parsing and selection models opens interesting perspectives for future model combination. The approach is validated on a French broadcast news corpus creation task dedicated to dependency parsing, where it outperforms the baseline entropy-based uncertainty selective sampling. We plan to extend this work with self-training and co-training methods in order to enlarge this corpus and produce the first French broadcast news treebank.
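The baseline named in the abstract, entropy-based uncertainty sampling, can be sketched roughly as follows. This is a generic illustration, not the authors' implementation: the input format (one probability distribution over candidate heads per token), the function names, and the choice of mean per-token entropy as the sentence score are all assumptions made here. The proposed memory-based distances are not reproduced, since the abstract does not define them.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool, k):
    """Return the ids of the k sentences the parser is least sure about.

    `pool` is a hypothetical structure: it maps a sentence id to a
    non-empty list of per-token distributions over candidate heads.
    A sentence is scored by its mean per-token entropy (an assumption).
    """
    def score(sent_id):
        dists = pool[sent_id]
        return sum(entropy(d) for d in dists) / len(dists)

    # Highest uncertainty first; these sentences go to the annotator.
    return sorted(pool, key=score, reverse=True)[:k]


# Toy pool of three unlabeled sentences with two tokens each.
pool = {
    "s1": [[0.9, 0.1], [0.8, 0.2]],
    "s2": [[0.5, 0.5], [0.6, 0.4]],
    "s3": [[1.0, 0.0], [0.7, 0.3]],
}
selected = select_most_uncertain(pool, 2)
```

In this toy pool, "s2" has near-uniform distributions and is therefore ranked first for annotation. A selection function that also accounts for sentence representativeness, as the abstract describes, would add a second term to `score`.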
Document type:
Conference paper
INTERSPEECH 2010, Sep 2010, Tokyo, Japan. pp.1377-1380, 2010

https://hal.inria.fr/inria-00540423
Contributor: Christophe Cerisara
Submitted on: Friday, 26 November 2010 - 16:33:19
Last modified on: Friday, 9 February 2018 - 13:20:01

Identifiers

  • HAL Id: inria-00540423, version 1

Citation

Frédéric Tantini, Christophe Cerisara, Claire Gardent. Memory-Based Active Learning for French Broadcast News. INTERSPEECH 2010, Sep 2010, Tokyo, Japan. pp.1377-1380, 2010. 〈inria-00540423〉
