Découverte de définitions dans le web des données (Discovery of definitions in the web of data)

Justine Reynaud 1, 2
1 ORPAILLEUR - Knowledge representation, reasoning
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: This thesis studies the web of data and the knowledge units that can be discovered in it. The web of data can be viewed as a very large graph of interconnected RDF triple databases. An RDF triple, written (subject, predicate, object), represents a relation (the predicate) between two resources (the subject and the object). Resources can belong to one or more classes, where a class groups resources that share common characteristics. These RDF triple databases can thus be seen as interconnected knowledge bases. Most of these knowledge bases are built collaboratively by human users. This is notably the case for DBpedia, a central knowledge base in the web of data, which encodes Wikipedia content in RDF. DBpedia is built from two types of Wikipedia data: (semi-)structured data such as infoboxes, and categories, which are manually created thematic clusters of pages. However, the semantics of categories in DBpedia, that is, the reason why a human agent grouped the resources together, is rarely made explicit. Given a class, a software agent has access to the resources it groups, i.e. the class extension, but generally not to the "reasons" behind the grouping, i.e. the class intension. Treating a category as a class of resources, we aim to discover an intensional description of the category: given a class extension, we search for the corresponding intension. The resulting pair (extension, intension) provides a definition of the class and enables classification-based reasoning by software agents. Such a definition can be expressed in terms of necessary and sufficient conditions: if x belongs to the class C, then x has the property P (necessary condition), and if x has the property P, then x belongs to the class C (sufficient condition).
Two complementary data mining methods are used to discover definitions: association rule mining and redescription mining. In this thesis, we first present the state of the art on association rules and redescriptions. Next, we propose an adaptation of each method to the task of definition discovery. We then detail a set of experiments on DBpedia and compare the two approaches qualitatively and quantitatively. Finally, we discuss how the discovered definitions can be added to DBpedia to improve its quality in terms of consistency and completeness.
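The necessary and sufficient conditions described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy triples, the class name FrenchCity, the predicate country, and all function names are hypothetical. It uses two standard measures, rule confidence (as in association rule mining) and the Jaccard coefficient (commonly used in redescription mining), to check how well a property P characterizes a class C.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy data; real DBpedia triples use full URIs.
TRIPLES = [
    ("Nancy", "rdf:type", "FrenchCity"),
    ("Nancy", "country", "France"),
    ("Metz", "rdf:type", "FrenchCity"),
    ("Metz", "country", "France"),
    ("Lyon", "country", "France"),
    ("Berlin", "country", "Germany"),
]


def class_extension(triples: Iterable[Triple], cls: str) -> Set[str]:
    """Resources explicitly typed with the class `cls`."""
    return {s for s, p, o in triples if p == "rdf:type" and o == cls}


def property_extension(triples: Iterable[Triple], pred: str, obj: str) -> Set[str]:
    """Resources that have the property (pred, obj)."""
    return {s for s, p, o in triples if p == pred and o == obj}


def confidence(antecedent: Set[str], consequent: Set[str]) -> float:
    """Confidence of the rule antecedent -> consequent."""
    if not antecedent:
        return 0.0
    return len(antecedent & consequent) / len(antecedent)


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient between two sets of resources."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


c_ext = class_extension(TRIPLES, "FrenchCity")
p_ext = property_extension(TRIPLES, "country", "France")

# Necessary condition: x in C  =>  x has P
necessary = confidence(c_ext, p_ext)
# Sufficient condition: x has P  =>  x in C
sufficient = confidence(p_ext, c_ext)
# Symmetric agreement between the two descriptions
similarity = jaccard_similarity(c_ext, p_ext)

print(f"necessary={necessary:.2f} sufficient={sufficient:.2f} jaccard={similarity:.2f}")
```

In this toy example the necessary condition holds for every member of the class, while the sufficient condition fails for Lyon, which has the property but lacks the type. Under the assumption that the definition is trusted, such a resource is a candidate for completing the class.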
Document type: Theses

Cited literature [122 references]

https://hal.inria.fr/tel-02426421
Contributor: Justine Reynaud
Submitted on: Thursday, January 2, 2020 - 1:24:00 PM
Last modification on: Friday, June 19, 2020 - 10:28:08 AM
Long-term archiving on: Monday, April 6, 2020 - 7:28:07 PM

File

pdf2star-1576506202-manuscrit....
Files produced by the author(s)

Identifiers

  • HAL Id: tel-02426421, version 1

Citation

Justine Reynaud. Découverte de définitions dans le web des données. Informatique [cs]. Université de Lorraine, 2019. Français. ⟨NNT : 2019LORR0160⟩. ⟨tel-02426421⟩

Metrics

Record views: 269

File downloads: 484