Memory-Based Active Learning for French Broadcast News

Frédéric Tantini; Christophe Cerisara; Claire Gardent

Conference paper, Year: 2010


Frédéric Tantini

Role: Author
PersonId : 883798

Analysis, perception and recognition of speech

Christophe Cerisara

Role: Author
PersonId : 2353
IdHAL : christophe-cerisara
IdRef : 102700168

Analysis, perception and recognition of speech

Claire Gardent

Role: Author
PersonId : 3949
IdHAL : claire-gardent
ORCID : 0000-0002-3805-6662
IdRef : 034104593

Natural Language Processing: representation, inference and semantics

Abstract

Stochastic dependency parsers can achieve very good results when trained on large, manually annotated corpora. Active learning aims to reduce this annotation cost by selecting as few sentences as possible for annotation while still producing the best possible parser. We propose a new selective sampling function for active learning that exploits two memory-based distances to find a good compromise between parser uncertainty and sentence representativeness. The reduced dependency between the parsing and selection models opens interesting perspectives for future model combination. The approach is validated on a corpus creation task for dependency parsing of French broadcast news, where it outperforms the baseline entropy-based uncertainty selective sampling. We plan to extend this work with self-training and co-training methods in order to enlarge this corpus and produce the first French broadcast news treebank.
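As a rough illustration of the kind of selective sampling the abstract refers to, the sketch below ranks candidate sentences by a weighted mix of entropy-based uncertainty and a representativeness score. It is not the authors' memory-based function, which this record does not specify: the names `entropy` and `select_batch`, the `weight` parameter, the linear combination, and the representativeness score in [0, 1] are all assumptions made for illustration. With `weight=1.0` it reduces to plain entropy-based uncertainty sampling, the baseline named in the abstract.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_batch(candidates, k, weight=0.5):
    """Return the ids of the k highest-scoring sentences.

    candidates: list of (sentence_id, parse_probs, representativeness).
      parse_probs is a distribution over candidate parses of the sentence;
      representativeness is a hypothetical score in [0, 1] (an assumption,
      not something defined in the source).
    weight: share given to uncertainty; 1.0 is pure entropy-based sampling.
    """
    scored = []
    for sent_id, probs, rep in candidates:
        # Normalise entropy by its maximum so both terms lie in [0, 1].
        max_h = math.log(len(probs)) if len(probs) > 1 else 1.0
        uncertainty = entropy(probs) / max_h
        score = weight * uncertainty + (1.0 - weight) * rep
        scored.append((score, sent_id))
    # Highest score first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sent_id for _, sent_id in scored[:k]]
```

For example, `select_batch(pool, k=10, weight=1.0)` would pick the ten sentences the parser is least sure about, while lower values of `weight` favour sentences that score as more typical of the pool.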


https://inria.hal.science/inria-00540423

Submitted on: Friday, 26 November 2010, 16:33:19

Last modified on: Friday, 24 March 2023, 14:52:53

Dates and versions

inria-00540423 , version 1 (26-11-2010)

Identifiers

HAL Id : inria-00540423 , version 1

Cite

Frédéric Tantini, Christophe Cerisara, Claire Gardent. Memory-Based Active Learning for French Broadcast News. INTERSPEECH 2010, Sep 2010, Tokyo, Japan. pp.1377-1380. ⟨inria-00540423⟩


Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA

