Deep Neural Networks for Web Page Information Extraction

Abstract: Web wrappers are systems for extracting structured information from web pages. Currently, wrappers need to be adapted to a particular website template before they can start the extraction process. In this work we present a new method, which uses convolutional neural networks to learn a wrapper that can extract information from previously unseen templates. Therefore, this wrapper does not need any site-specific initialization and is able to extract information from a single web page. We also propose a method for spatial text encoding, which allows us to encode visual and textual content of a web page into a single neural net. The first experiments with product information extraction showed very promising results and suggest that this approach can lead to a general site-independent web wrapper.
Document type:
Conference paper
Lazaros Iliadis; Ilias Maglogiannis. 12th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI), Sep 2016, Thessaloniki, Greece. IFIP Advances in Information and Communication Technology, AICT-475, pp.154-163, 2016, Artificial Intelligence Applications and Innovations. 〈10.1007/978-3-319-44944-9_14〉

Cited literature [14 references]

https://hal.inria.fr/hal-01557648
Contributor: Hal Ifip
Submitted on: Thursday, July 6, 2017 - 13:55:42
Last modified on: Friday, December 1, 2017 - 01:16:25
Document(s) archived on: Wednesday, January 24, 2018 - 01:11:40

File

Restricted access
File visible on: 2019-01-01


License


Distributed under a Creative Commons Attribution 4.0 International License


Citation

Tomas Gogar, Ondrej Hubacek, Jan Sedivy. Deep Neural Networks for Web Page Information Extraction. Lazaros Iliadis; Ilias Maglogiannis. 12th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI), Sep 2016, Thessaloniki, Greece. IFIP Advances in Information and Communication Technology, AICT-475, pp.154-163, 2016, Artificial Intelligence Applications and Innovations. 〈10.1007/978-3-319-44944-9_14〉. 〈hal-01557648〉


Metrics

Record views

503