inria-00116910, version 4
Extraction d'entités dans des collections évolutives (Entity extraction in evolving collections)
Thierry Despeyroux, Eduardo Fraschini, Anne-Marie Vercoustre
7ièmes Journées francophones Extraction et Gestion des Connaissances EGC 2007 76300 (2007) 533-538
Abstract: The goal of our work is to extract named entities from a set of reports, in our case the names of industrial or academic partners. Starting from an initial list of entities, we use a first set of documents to identify syntactic patterns, which are then validated in a supervised learning phase on a set of annotated documents. The complete collection is then explored. This approach is similar to those used for data extraction from semi-structured documents (wrappers) and needs neither linguistic resources nor a large training set. Because our collection of documents will evolve over the years, we hope that extraction performance will improve as the training set grows.
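The three phases named in the abstract (pattern identification from a seed list, validation on annotated documents, exploration of the full collection) can be illustrated with a minimal Python sketch. It is not the authors' implementation: representing a pattern as a fixed-width window of surrounding words, filtering by a precision threshold, and all function and parameter names (`learn_patterns`, `validate_patterns`, `extract_entities`, `width`, `min_precision`) are assumptions made for illustration only.

```python
import re
from collections import Counter


def context_regex(left, right):
    """Build a regex that captures whatever lies between two word contexts."""
    lhs = r"\s+".join(re.escape(w) for w in left)
    rhs = r"\s+".join(re.escape(w) for w in right)
    return re.compile(lhs + r"\s*(.+?)\s*" + rhs)


def learn_patterns(documents, seed_entities, width=2):
    """Phase 1 (assumed form): count the word contexts around known entities."""
    contexts = Counter()
    for text in documents:
        for entity in seed_entities:
            for m in re.finditer(re.escape(entity), text):
                left = tuple(text[:m.start()].split()[-width:])
                right = tuple(text[m.end():].split()[:width])
                if left and right:  # ignore entities at a document boundary
                    contexts[(left, right)] += 1
    return contexts


def validate_patterns(patterns, annotated_docs, min_precision=0.8):
    """Phase 2 (assumed form): keep patterns that are precise on annotated text.

    annotated_docs is a list of (text, set_of_true_entities) pairs.
    """
    kept = []
    for left, right in patterns:
        regex = context_regex(left, right)
        found = correct = 0
        for text, true_entities in annotated_docs:
            for candidate in regex.findall(text):
                found += 1
                correct += candidate in true_entities
        if found and correct / found >= min_precision:
            kept.append((left, right))
    return kept


def extract_entities(patterns, documents):
    """Phase 3: apply the validated patterns to the whole collection."""
    entities = set()
    for left, right in patterns:
        regex = context_regex(left, right)
        for text in documents:
            entities.update(regex.findall(text))
    return entities
```

In this sketch, newly extracted entities could be added to the seed list when the collection grows, which is one simple way to picture the hoped-for improvement over time; the abstract itself does not specify how that feedback is organized.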
- Domain : Computer Science/Document and Text Processing; Computer Science/Information Retrieval
- Keywords : Entity extraction – wrapping method – extraction pattern
- Internal note : http://www.cepadues.com/livre_details.asp?l=76300
- Comment : The BibTeX file has been replaced with the correct one.
- Available versions : v1 (2006-11-28) v2 (2007-06-19) v3 (2007-07-13) v4 (2007-07-20)
- http://hal.inria.fr/inria-00116910
- oai:hal.inria.fr:inria-00116910
- From: Anne-Marie Vercoustre
- Submitted on: Friday, 20 July 2007 16:46:29
- Updated on: Thursday, 21 April 2011 10:55:52





