Reports (Research Report), Year: 2017

Extracting Linked Data from statistic spreadsheets

Tien Duc Cao, Ioana Manolescu, Xavier Tannier

Abstract

Statistical data is an important sub-category of open data. It is of interest for many applications, including but not limited to data journalism, because such data is typically of high quality and reflects (in aggregated form) important aspects of a society's life, such as births, immigration, and economic output. However, such open data is often not published as Linked Open Data (LOD), which limits its usability. We provide a conceptual model for the open data contained in the statistic files published by INSEE, the leading French economic and societal statistics institute. We then describe a novel method for extracting RDF LOD that populates an instance of this conceptual model. Our method was used to produce RDF data from more than 20,000 Excel spreadsheets, and our validation indicates a 91% rate of successful extraction.
Origin : Files produced by the author(s)

Dates and versions

hal-01496700, version 1 (28-03-2017)

Cite

Tien Duc Cao, Ioana Manolescu, Xavier Tannier. Extracting Linked Data from statistic spreadsheets. [Research Report] Inria Saclay Ile de France. 2017. ⟨hal-01496700⟩