Book sections

Mining XML Documents

Abstract : XML documents are becoming ubiquitous because their rich and flexible format can be used for a variety of applications. Given the increasing size of XML collections as information sources, mining techniques that traditionally exist for text collections or databases need to be adapted, and new methods need to be invented, to exploit the particular structure of XML documents. XML documents can essentially be seen as trees, which are well known to be complex structures. This chapter describes various ways of using and simplifying this tree structure to model documents and support efficient mining algorithms. We focus on three mining tasks: classification and clustering, which are standard for text collections, and the discovery of frequent tree structures, which is especially important for heterogeneous collections. The chapter presents some recent approaches and algorithms that support these tasks, together with experimental evaluations on a variety of large XML collections.

Cited literature : 27 references

Contributor : Anne-Marie Vercoustre
Submitted on : Monday, November 19, 2007 - 3:55:34 PM
Last modification on : Wednesday, April 6, 2022 - 3:48:20 PM
Long-term archiving on : Monday, April 12, 2010 - 2:42:34 AM





Laurent Candillier, Ludovic Denoyer, Patrick Gallinari, Marie-Christine Rousset, Alexandre Termier, et al. Mining XML Documents. P. Poncelet, F. Masseglia, M. Teisseire. Data Mining Patterns: New Methods and Applications, Information Science Reference, pp.198-219, 2007, ⟨10.4018/978-1-59904-162-9.ch009⟩. ⟨inria-00188899⟩


