Extracting linked data from statistic spreadsheets

Abstract: Statistical data is an important sub-category of open data. It is of interest to many applications, including but not limited to data journalism, because such data is typically of high quality and reflects, in aggregated form, important aspects of a society's life such as births, immigration, and economic output. However, such open data is often not published as Linked Open Data (LOD), which limits its usability. We provide a conceptual model for the open data contained in the statistical files published by INSEE, the leading French economic and societal statistics institute. We then describe a novel method for extracting RDF LOD that populates an instance of this conceptual model. Our method was used to produce RDF data from more than 20,000 Excel spreadsheets, and our validation indicates a 91% rate of successful extraction.
Document type: Conference paper
Cited literature: 9 references

https://hal.inria.fr/hal-01583975
Contributor: Ioana Manolescu
Submitted on: Friday, September 8, 2017 - 10:52:15 AM
Last modification on: Wednesday, March 27, 2019 - 4:41:28 PM

Citation

Tien Duc Cao, Ioana Manolescu, Xavier Tannier. Extracting linked data from statistic spreadsheets. International Workshop on Semantic Big Data, May 2017, Chicago, United States. pp. 1-5, ⟨10.1145/3066911.3066914⟩. ⟨hal-01583975⟩
