The WWW as a Resource for Lexicography

Abstract: Until the appearance of the Brown Corpus with its 1 million words in the 1960s and then, on a larger scale, the British National Corpus (the BNC) with its 100 million words, the lexicographer had to rely pretty much on his or her intuition (and amassed scraps of paper) to describe how words were used. Since the task of a lexicographer was to summarize the senses and usages of a word, that person was called upon to be very well read, with a good memory and a great sensitivity to nuance. These qualities are still and always will be needed when one must condense the description of a great variety of phenomena into a fixed amount of space. But what if this last constraint, a fixed amount of space, disappears? One can then imagine fuller descriptions of how words are used. Taking this imaginative step, the FrameNet project has begun collecting new, fuller descriptions into a new type of lexicographical resource in which '[e]ach entry will in principle provide an exhaustive account of the semantic and syntactic combinatorial properties of one "lexical unit" (i.e., one word in one of its uses)' (Fillmore & Atkins 1998). This ambition to provide an exhaustive accounting of these properties implies access to a large number of examples of words in use. Though the Brown Corpus and the British National Corpus can provide a certain number of these, the World Wide Web (WWW) presents a vastly larger collection of examples of language use. The WWW is a new resource for lexicographers in their task of describing word patterns and their meanings. In this chapter, we look at the WWW as a corpus, and see how this will change how lexicographers model word meaning.
Document type: Book chapter
Marie-Hélène Corréard. Lexicography and Natural Language Processing: A Festschrift in Honour of B.T.S. Atkins, Euralex, 2002, 2-9518583-0-2

https://hal.inria.fr/hal-01081131
Contributor: Gregory Grefenstette
Submitted on: Friday, 7 November 2014 - 09:59:55
Last modified on: Thursday, 9 February 2017 - 15:47:17
Document(s) archived on: Sunday, 8 February 2015 - 10:15:36

File

Gregory Grefenstette - The WWW...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01081131, version 1

Citation

Gregory Grefenstette. The WWW as a Resource for Lexicography. Marie-Hélène Corréard. Lexicography and Natural Language Processing: A Festschrift in Honour of B.T.S. Atkins, Euralex, 2002, 2-9518583-0-2. 〈hal-01081131〉
