Extracting Linked Data from statistic spreadsheets

Abstract: Statistical data is an important sub-category of open data. It is of interest for many applications, including but not limited to data journalism, because such data is typically of high quality and reflects, in aggregated form, important aspects of a society's life such as births, immigration, and economic output. However, such open data is often not published as Linked Open Data (LOD), which limits its usability. We provide a conceptual model for the open data contained in statistical files published by INSEE, the leading French economic and societal statistics institute. We then describe a novel method for extracting RDF LOD that populates an instance of this conceptual model. Our method was used to produce RDF data from more than 20,000 Excel spreadsheets, and our validation indicates a 91% rate of successful extraction.
Document type:
[Research Report] Inria Saclay Ile de France. 2017

Contributor: Tien-Duc Cao
Submitted on: Tuesday, March 28, 2017 - 10:30:24
Last modified on: Wednesday, February 13, 2019 - 01:26:59
Document(s) archived on: Thursday, June 29, 2017 - 16:49:08


paper (1).pdf
Files produced by the author(s)



Tien Duc Cao, Ioana Manolescu, Xavier Tannier. Extracting Linked Data from statistic spreadsheets. [Research Report] Inria Saclay Ile de France. 2017. 〈hal-01496700〉


