Conference paper, Year: 2010

Memory-Based Active Learning for French Broadcast News

Frédéric Tantini, Christophe Cerisara, Claire Gardent

Abstract

Stochastic dependency parsers can achieve very good results when trained on large, manually annotated corpora. Active learning aims to reduce this annotation cost by selecting as few sentences as possible for annotation while still producing the best possible parser. We propose a new selective sampling function for active learning that exploits two memory-based distances to find a good compromise between parser uncertainty and sentence representativeness. The reduced dependency between the parsing and selection models opens interesting perspectives for future model combination. The approach is validated on a French broadcast news corpus creation task dedicated to dependency parsing, where it outperforms the baseline entropy-based uncertainty selective sampling. We plan to extend this work with self- and co-training methods in order to enlarge this corpus and produce the first French broadcast news treebank.
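For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of generic entropy-based uncertainty sampling. It is not the authors' memory-based selection function, whose details are not given on this page. It assumes the parser exposes, for each token, a probability distribution over candidate attachments; the function names and the toy pool are hypothetical and purely illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def sentence_uncertainty(token_distributions):
    """Mean per-token entropy, so that sentence length alone does not drive the score."""
    if not token_distributions:
        return 0.0
    return sum(entropy(d) for d in token_distributions) / len(token_distributions)

def select_for_annotation(pool, k):
    """Return the ids of the k sentences the parser is least certain about."""
    ranked = sorted(pool, key=lambda sid: sentence_uncertainty(pool[sid]), reverse=True)
    return ranked[:k]

# Toy pool (made-up numbers): sentence id -> one distribution per token.
pool = {
    "s1": [[0.9, 0.1], [0.8, 0.2]],
    "s2": [[0.5, 0.5], [0.6, 0.4]],
    "s3": [[1.0, 0.0], [0.99, 0.01]],
}
selected = select_for_annotation(pool, 2)
```

A pure uncertainty criterion like this one tends to pick outliers, which is the motivation the abstract gives for also weighing sentence representativeness.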

Dates and versions

inria-00540423, version 1 (26-11-2010)

Identifiers

  • HAL Id: inria-00540423, version 1

Cite

Frédéric Tantini, Christophe Cerisara, Claire Gardent. Memory-Based Active Learning for French Broadcast News. INTERSPEECH 2010, Sep 2010, Tokyo, Japan. pp.1377-1380. ⟨inria-00540423⟩