


Modeling Tree Structures, Machine Learning, and Information Extraction


During the last decade, the World Wide Web has evolved into the most important public data store in the world. The data formats currently used on the Web are heterogeneous and still evolving. The web community is therefore highly interested in adequate information representation, so that information on the Web can be accessed and extracted more easily. A major challenge from that perspective is adaptive information extraction that can exploit the tree structure of web documents. Recent Web formats, such as HTML and XML, provide a tree structure that encloses the textual information. In this project, we want to integrate tree structures and emerging machine learning techniques into adaptive information extraction systems.


Mostrare is a project of the INRIA Lille - Nord Europe research center and a group of LIFL, the computer science laboratory of Lille. The members of Mostrare are employed by Lille University 1, Lille University 3, and INRIA.

Mostrare team 2009
The picture above was taken on January 9th, 2009.