Searching for Truth in a Database of Statistics

Abstract : The proliferation of falsehood and misinformation, in particular through the Web, has led to increasing energy being invested in journalistic fact-checking. Fact-checking journalists typically check the accuracy of a claim against some trusted data source. Statistical databases such as those compiled by state agencies are often used as trusted data sources, as they contain valuable, high-quality information. However, their usability is limited when they are shared in formats such as HTML or spreadsheets: this makes it hard to find the most relevant dataset for checking a specific claim, or to quickly extract from a dataset the best answer to a given query. We present a novel algorithm enabling the exploitation of such statistical tables, by (i) identifying the statistical datasets most relevant to a given fact-checking query, and (ii) extracting from each dataset the best specific (precise) query answer it may contain. We have implemented our approach and experimented on the complete corpus of statistics obtained from INSEE, the French national statistics institute. Our experiments and comparisons demonstrate the effectiveness of our proposed method.
Document type :
Conference papers
https://hal.inria.fr/hal-01745768
Contributor : Tien-Duc Cao
Submitted on : Wednesday, March 28, 2018 - 3:16:14 PM
Last modification on : Thursday, June 13, 2019 - 11:34:02 AM
Long-term archiving on : Thursday, September 13, 2018 - 9:55:39 AM

File

paper-hal.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01745768, version 1

Citation

Tien-Duc Cao, Ioana Manolescu, Xavier Tannier. Searching for Truth in a Database of Statistics. WebDB 2018 - 21st International Workshop on the Web and Databases, Jun 2018, Houston, United States. pp.1-6. ⟨hal-01745768⟩
