The WWW as a Resource for Lexicography - Inria - Institut national de recherche en sciences et technologies du numérique
Book chapter, Year: 2002

Gregory Grefenstette

Abstract

Until the appearance of the Brown Corpus, with its 1 million words, in the 1960s, and then, on a larger scale, of the British National Corpus (the BNC), with its 100 million words, the lexicographer had to rely largely on his or her intuition (and on amassed scraps of paper) to describe how words were used. Since the task of a lexicographer was to summarize the senses and usages of a word, that person was called upon to be very well read, with a good memory and a great sensitivity to nuance. These qualities are still needed, and always will be, when one must condense the description of a great variety of phenomena into a fixed amount of space. But what if this last constraint, a fixed amount of space, disappears? One can then imagine fuller descriptions of how words are used. Taking this imaginative step, the FrameNet project has begun collecting new, fuller descriptions into a new type of lexicographical resource in which '[e]ach entry will in principle provide an exhaustive account of the semantic and syntactic combinatorial properties of one "lexical unit" (i.e., one word in one of its uses)' (Fillmore & Atkins 1998). This ambition to provide an exhaustive account of these properties implies access to a large number of examples of words in use. Though the Brown Corpus and the British National Corpus can provide a certain number of these, the World Wide Web (WWW) offers a vastly larger collection of examples of language use. The WWW is thus a new resource for lexicographers in their task of describing word patterns and their meanings. In this chapter, we look at the WWW as a corpus and consider how it will change the way lexicographers model word meaning.
Main file
Gregory Grefenstette - The WWW as a Resource for Lexicography.pdf (2.09 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01081131, version 1 (7 November 2014)

Identifiers

  • HAL Id: hal-01081131, version 1

Cite

Gregory Grefenstette. The WWW as a Resource for Lexicography. In Marie-Hélène Corréard (ed.), Lexicography and Natural Language Processing: A Festschrift in Honour of B.T.S. Atkins. Euralex, 2002. ISBN 2-9518583-0-2. ⟨hal-01081131⟩

Collections

INRIA INRIA2