Entity extraction in evolving collections
Abstract
The goal of our work is to extract named entities, in our case the names of partners, from a set of reports. Starting from an initial list of entities, we use a first set of documents to identify syntactic patterns. These patterns are then validated in a supervised learning phase on a set of annotated documents, which also serves to measure performance. The complete collection is then explored. This approach derives from the one used for data extraction from semi-structured documents (wrappers) and requires neither linguistic resources nor a large training set. As our collection of documents evolves, we hope that extraction performance will improve year after year.
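The pipeline described in the abstract (seed list, pattern identification, validation on annotated documents, exploration of the full collection) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of the two preceding words as a "pattern", the capitalized-word heuristic for candidate entities, and the 0.8 precision threshold are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a candidate entity is a run of capitalized words.
ENTITY = r"((?:[A-Z][\w&-]*)(?:\s+[A-Z][\w&-]*)*)"


def induce_patterns(docs, seeds, width=2):
    """Count the `width` words that precede each seed occurrence.

    Each distinct left context is treated as a candidate pattern.
    """
    patterns = Counter()
    for doc in docs:
        for seed in seeds:
            for match in re.finditer(re.escape(seed), doc):
                left = doc[:match.start()].split()[-width:]
                if len(left) == width:
                    patterns[" ".join(left)] += 1
    return patterns


def apply_pattern(pattern, doc):
    """Return the candidate entities that directly follow `pattern` in `doc`."""
    regex = re.compile(re.escape(pattern) + r"\s+" + ENTITY)
    return {match.group(1) for match in regex.finditer(doc)}


def validate(patterns, annotated, min_precision=0.8):
    """Keep patterns whose precision on annotated documents is high enough.

    `annotated` is a list of (text, set_of_gold_entities) pairs.
    The threshold is a hypothetical value, not one taken from the paper.
    """
    kept = []
    for pattern in patterns:
        found = correct = 0
        for text, gold in annotated:
            hits = apply_pattern(pattern, text)
            found += len(hits)
            correct += len(hits & gold)
        if found and correct / found >= min_precision:
            kept.append(pattern)
    return kept


def extract(docs, patterns):
    """Apply every validated pattern to every document of the collection."""
    entities = set()
    for doc in docs:
        for pattern in patterns:
            entities |= apply_pattern(pattern, doc)
    return entities
```

In this sketch, entities extracted from one year's reports could be added to the seed list before the next run, which is one way the evolving collection might improve extraction over time.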
Origin: Files produced by the author(s)