INTERNET EVOLUTION AND PROGRESS IN FULL AUTOMATIC FRENCH LANGUAGE MODELLING - Inria - National Institute for Research in Digital Science and Technology
Conference paper, Year: 2001

INTERNET EVOLUTION AND PROGRESS IN FULL AUTOMATIC FRENCH LANGUAGE MODELLING

Abstract

The World Wide Web is the largest information space ever seen: it is distributed all over the world, written in many languages, and covers a wide variety of topics. In the first part of this paper, we study how a French subset of this space evolved over the last 3 years. During this time, the amount of text automatically extracted for language modelling was multiplied by 6.5, and French coverage grew from 140,000 to 200,000 lexical forms. We thus show that increasingly reliable data can be obtained to train our trigram models. Finally, recognition experiments on a French “state of the art” evaluation set show that word accuracy increases from 51% to 62.30% using two different models computed automatically from Web corpora. The first corpus was gathered at the beginning of 1999 and the last one at the end of 2000.
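
As a rough illustration of the two quantities the abstract tracks, trigram statistics and lexical coverage, the following minimal Python sketch counts trigrams in a toy corpus, derives an unsmoothed maximum-likelihood trigram probability, and measures how many test tokens fall inside a vocabulary. It is not the authors' pipeline: the function names, the toy sentences, and the absence of smoothing are all assumptions made for the example.

```python
from collections import Counter


def trigram_counts(tokens):
    """Count every run of three consecutive tokens."""
    return Counter(zip(tokens, tokens[1:], tokens[2:]))


def trigram_mle(tri, w1, w2, w3):
    """Unsmoothed estimate of P(w3 | w1, w2).

    The history count is summed over the trigrams that start with
    (w1, w2), so the probabilities for a given history sum to 1.
    Returns 0.0 for an unseen history (assumption: no back-off).
    """
    history = sum(c for (a, b, _), c in tri.items() if (a, b) == (w1, w2))
    return tri[(w1, w2, w3)] / history if history else 0.0


def lexical_coverage(vocabulary, test_tokens):
    """Fraction of test tokens that appear in the vocabulary."""
    if not test_tokens:
        return 0.0
    return sum(t in vocabulary for t in test_tokens) / len(test_tokens)


# Hypothetical toy corpus standing in for text extracted from the Web.
tokens = "le chat dort le chat mange le chat dort".split()
tri = trigram_counts(tokens)

print(trigram_mle(tri, "le", "chat", "dort"))
print(lexical_coverage(set(tokens), "le chien dort".split()))
```

A real system would add smoothing or back-off for unseen trigrams and would measure coverage against a held-out evaluation set rather than a hand-written sentence.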
Main file
Vaufreydaz01b.pdf (55.71 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00326163, version 1 (02-10-2008)

Identifiers

  • HAL Id: inria-00326163, version 1

Cite

Dominique Vaufreydaz, Mathias Géry. INTERNET EVOLUTION AND PROGRESS IN FULL AUTOMATIC FRENCH LANGUAGE MODELLING. IEEE Workshop ASRU'01 (Automatic Speech Recognition and Understanding), IEEE, Dec 2001, Madonna di Campiglio, Italy. pp.CD-ROM. ⟨inria-00326163⟩