M. Abolhassani, N. Fuhr, and N. Gövert, Information Extraction and Automatic Markup for XML Documents, pp.159-174, 2003.
DOI : 10.1007/978-3-540-45194-5_11

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.83.465

T. Berners-Lee, J. Hendler, and O. Lassila, The semantic web, Scientific American, vol.45, p.10, May 2001.

J. Clark and M. Murata, RELAX NG specification, ACM Symposium on Document Engineering, 2001.

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B, vol.39, issue.1, pp.1-38, 1977.

O. Etzioni, The World-Wide Web: quagmire or gold mine?, Communications of the ACM, vol.39, issue.11, pp.65-68, 1996.
DOI : 10.1145/240455.240473

S. Eyheramendy, D. Lewis, and D. Madigan, On the naive Bayes model for text categorization, 9th International Workshop on Artificial Intelligence and Statistics, 2003.

D. Fallside and P. Walmsley, XML Schema Part 0: Primer, Second Edition, W3C Recommendation, 2004.

P. Fankhauser and Y. Xu, MarkItUp! An incremental approach to document structure recognition, Conference on Electronic Publishing, 1994.

E. Gaussier, Contributions à l'accès à l'information documentaire, 2005.

E. Gaussier, C. Goutte, K. Popat, and F. Chen, A Hierarchical Model for Clustering and Categorising Documents, 24th European Colloquium on Information Retrieval Research (ECIR-02), LNCS 2291, 2002.
DOI : 10.1007/3-540-45886-7_16

R. Kosala and H. Blockeel, Microformats: the next (small) thing on the semantic web?, IEEE Internet Computing, vol.10, issue.1, pp.68-75, 2006.

L. Ma, J. Shepherd, and A. Nguyen, Document classification via structure synopses, ADC, pp.59-65, 2003.

V. Quint and I. Vatton, Techniques for authoring complex XML documents, Proceedings of the 2004 ACM Symposium on Document Engineering, DocEng '04, pp.115-123, 2004.
DOI : 10.1145/1030397.1030422

URL : https://hal.archives-ouvertes.fr/inria-00423365

M. Sifer, Y. Peres, and Y. Maarek, Browsing and Editing XML Schema Documents with an Interactive Editor, Proceedings of DNIS 2003, pp.97-111, 2003.
DOI : 10.1007/978-3-540-39845-5_9

S. Soderland, Learning information extraction rules for semi-structured and free text, Machine Learning, vol.34, issue.1/3, pp.233-272, 1999.
DOI : 10.1023/A:1007562322031

G. Stumme, A. Hotho, and B. Berendt, Semantic web mining: state of the art and future directions, Journal of Web Semantics, vol.4, pp.1-37, 2006.