INTERNET EVOLUTION AND PROGRESS IN FULL AUTOMATIC FRENCH LANGUAGE MODELLING

Abstract: The World Wide Web is the largest information space ever seen, distributed all over the world, in many languages, on many different topics. In the first part of this paper, we study the evolution of a French subset of this space over the last 3 years. During this period, the amount of text automatically extracted for language modelling was multiplied by 6.5. Moreover, French lexical coverage grew from 140,000 to 200,000 lexical forms. This shows that increasingly reliable data can be obtained to train our trigram models. Finally, recognition experiments on a French "state of the art" evaluation set show that word accuracy increases from 51% to 62.30% between two models computed automatically from Web corpora. The first corpus was gathered at the beginning of 1999 and the second at the end of 2000.
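To illustrate what "training a trigram model" on extracted text involves, here is a minimal sketch, not the authors' method. It counts word trigrams and their two-word histories, then gives the unsmoothed maximum-likelihood estimate P(w3 | w1, w2). The function names, the "<s>"/"</s>" padding markers and the toy French sentences are hypothetical. A real system trained on Web corpora would also apply smoothing or back-off and vocabulary filtering, which this sketch leaves out.

```python
from collections import Counter

def train_trigrams(sentences):
    """Count trigrams and their two-word histories over whitespace-tokenized sentences.

    Each sentence is padded with two hypothetical start markers "<s>" and one
    end marker "</s>" so that every word has a full two-word history.
    """
    trigrams = Counter()
    histories = Counter()
    for sentence in sentences:
        tokens = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        for i in range(2, len(tokens)):
            history = (tokens[i - 2], tokens[i - 1])
            trigrams[history + (tokens[i],)] += 1
            histories[history] += 1
    return trigrams, histories

def trigram_prob(trigrams, histories, w1, w2, w3):
    """Unsmoothed maximum-likelihood estimate of P(w3 | w1, w2); 0.0 for unseen histories."""
    denom = histories[(w1, w2)]
    return trigrams[(w1, w2, w3)] / denom if denom else 0.0

# Toy corpus (hypothetical), standing in for text extracted from French Web pages.
corpus = ["le chat dort", "le chat mange", "le chien dort"]
tri, hist = train_trigrams(corpus)
print(trigram_prob(tri, hist, "le", "chat", "dort"))  # 0.5
```

The history "le chat" occurs twice and is followed by "dort" once, so the estimate is 0.5. With larger corpora, more histories are observed often enough for such estimates to become reliable. This is the intuition behind the abstract's point that a growing Web yields better training data.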
Document type:
Conference paper
IEEE Workshop ASRU'01 (Automatic Speech Recognition and Understanding), Dec 2001, Madonna di Campiglio, Italy. pp.CD-ROM, 2001

https://hal.inria.fr/inria-00326163
Contributor: Dominique Vaufreydaz
Submitted on: Thursday, 2 October 2008, 09:10:44
Last modified on: Thursday, 11 January 2018, 06:14:32
Archived on: Thursday, 3 June 2010, 20:02:29

File

Vaufreydaz01b.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00326163, version 1

Collections

IMAG | UGA

Citation

Dominique Vaufreydaz, Mathias Géry. INTERNET EVOLUTION AND PROGRESS IN FULL AUTOMATIC FRENCH LANGUAGE MODELLING. IEEE Workshop ASRU'01 (Automatic Speech Recognition and Understanding), Dec 2001, Madonna di Campiglio, Italy. pp.CD-ROM, 2001. 〈inria-00326163〉