D. Th, Practical Semantic Analysis of Web Sites and Documents, Proceedings of the 13th World Wide Web Conference, pp.685-693, 2004.

D. Guillaume and F. Murtagh, Clustering of XML documents, Computer Physics Communications, vol.127, issue.2-3, pp.215-227, 2000.
DOI : 10.1016/S0010-4655(99)00511-1

L. Hubert and P. Arabie, Comparing partitions, Journal of Classification, vol.2, issue.1, pp.193-218, 1985.
DOI : 10.1007/BF01908075

B. Larsen and C. Aone, Fast and effective text mining using linear-time document clustering, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '99, pp.16-22, 1999.
DOI : 10.1145/312129.312186

H. Schmid, Probabilistic Part-of-Speech Tagging Using Decision Trees, Proc. of the International Conference on New Methods in Language Processing, pp.44-49, 1994.

J. Yi and N. Sundaresan, A classifier for semi-structured documents, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '00, pp.340-344, 2000.
DOI : 10.1145/347090.347164