Estimating the Number of Concepts

Abstract : Most Natural Language Processing systems have been built around the idea of a word being something found between white spaces and punctuation. This is a normal and efficient way to proceed. Tasks such as Word Sense Disambiguation, Machine Translation, or even indexing rarely go beyond the single word. Language models used in NLP applications are built on the word, with a few multiword expressions taken as exceptions. But future NLP systems will necessarily venture out into the uncharted areas of multiword expressions. The dimensions and the topology of multiword concepts are unknown: Are there hundreds of thousands or tens of millions? Which words participate in multiword concepts and which do not? As the corpus grows, will their number keep on increasing? In this paper, I estimate the number of multiword concepts that are used in English, systematically probing the Web as our corpus.
Document type :
Book sections

Cited literature : 13 references

https://hal.inria.fr/hal-01081033
Contributor : Gregory Grefenstette
Submitted on : Thursday, November 6, 2014 - 5:11:31 PM
Last modification on : Thursday, February 9, 2017 - 3:47:09 PM
Long-term archiving on : Saturday, February 7, 2015 - 11:17:03 AM

File

8._Grefenstette-libre (1).pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01081033, version 1

Citation

Gregory Grefenstette. Estimating the Number of Concepts. A Way with Words: Recent Advances in Lexical Theory and Analysis: A Festschrift for Patrick Hanks, Menha Publishers, 2010, 978-9970-10-101-6. ⟨hal-01081033⟩
