Extracting linked data from statistic spreadsheets

Abstract: Statistical data is an important sub-category of open data. It is of interest to many applications, including but not limited to data journalism, because such data is typically of high quality and reflects (in aggregated form) important aspects of a society's life, such as births, immigration, and economic output. However, such open data is often not published as Linked Open Data (LOD), which limits its usability. We provide a conceptual model for the open data contained in statistical files published by INSEE, the leading French economic and societal statistics institute. Then, we describe a novel method for extracting RDF LOD that populates an instance of this conceptual model. Our method was used to produce RDF data from 20k+ Excel spreadsheets, and our validation indicates a 91% rate of successful extraction.
Document type:
Conference paper
International Workshop on Semantic Big Data, May 2017, Chicago, United States. pp.1 - 5, 2017, International Workshop on Semantic Big Data. 〈https://www.ifis.uni-luebeck.de/~groppe/sbd/〉. 〈10.1145/3066911.3066914〉

Cited literature: 12 references

https://hal.inria.fr/hal-01583975
Contributor: Ioana Manolescu
Submitted on: Friday, September 8, 2017 - 10:52:15
Last modified on: Thursday, April 12, 2018 - 01:50:51

File

paper-HAL.pdf
Files produced by the author(s)

Citation

Tien Duc Cao, Ioana Manolescu, Xavier Tannier. Extracting linked data from statistic spreadsheets. International Workshop on Semantic Big Data, May 2017, Chicago, United States. pp.1 - 5, 2017, International Workshop on Semantic Big Data. 〈https://www.ifis.uni-luebeck.de/~groppe/sbd/〉. 〈10.1145/3066911.3066914〉. 〈hal-01583975〉
