Wrapping Web Pages into XML Documents: A Practical Experience and Comparison of Two Tools

Abstract: The notion of wrapping a web server to produce XML documents from unstructured web pages is driven by the need to produce structured data that can be used by a variety of applications. The web contains vast amounts of information that most applications cannot use because it targets a human audience. A solution is to automate the browsing process and convert the extracted unstructured information into a more structured format such as XML. This is called wrapping. We have used two different tools to wrap several tourist sites into XML. The tools are Norfolk, a system developed by the CSIRO TED group, and W4F, initially developed at the University of Pennsylvania and now a commercial product. This report describes our practical experience with the tools and compares them. The comparison highlights features required by a wrapper system to support real applications.
Document type:
Conference paper
Allan Ellis. The Eighth Australian World Wide Web Conference, Jul 2002, Sunshine Coast, Queensland, Australia, 2002

https://hal.inria.fr/inria-00092248
Contributor: Anne-Marie Vercoustre
Submitted on: Friday, 8 September 2006 - 15:44:27
Last modified on: Thursday, 11 January 2018 - 17:22:01
Document(s) archived on: Monday, 5 April 2010 - 23:35:57

Identifiers

  • HAL Id: inria-00092248, version 1

Citation

Sabine Jabbour, Anne-Marie Vercoustre. Wrapping Web Pages into XML Documents: A Practical Experience and Comparison of Two Tools. Allan Ellis. The Eighth Australian World Wide Web Conference, Jul 2002, Sunshine Coast, Queensland, Australia, 2002. 〈inria-00092248〉

