Datasets

Results

Experiment 1: Comparison of the three approaches

Conclusion

Experiments 3 and 4: Exploring redescriptions

The last two experiments explore the potential of redescriptions. First, we examine the expressiveness of redescriptions by allowing disjunctions and negations in the rules. Second, we look for incompatible categories, that is, categories that have no resource in common.

Experiment 3: Towards more expressive rules. In this experiment, we define categories using conjunctions, disjunctions and negations. To do so, we run ReReMi on the Smartphones dataset in two settings. First setting: only disjunctions are allowed.

Second setting: both disjunctions and negations are allowed.

Because this is an exploratory study, we use loose thresholds: the minimum Jaccard coefficient is set to 0.4 and the p-value to 0… Table 7.8 presents some examples of definitions that use conjunctions, disjunctions and negations. Rules 51 to 55 were obtained in the first setting, and rules 56 to 60 in the second.
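Each rule in Table 7.8 pairs a category with a description and is kept when the Jaccard coefficient of the two supports reaches the threshold. The following is a minimal sketch of that acceptance test only, not of ReReMi's search procedure. It assumes that every resource is a set of boolean attributes and that queries are nested tuples built with "and", "or" and "not". All resource and attribute names (`p1`, `cat:Android_devices`, `brand:Apple`, ...) are hypothetical and do not come from the Smartphones dataset.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

# A query is an attribute name, or a tuple (op, *subqueries) with op in {"and", "or", "not"}.
Query = Union[str, Tuple]


def holds(query: Query, attrs: FrozenSet[str]) -> bool:
    """True if a resource described by the attribute set `attrs` satisfies `query`."""
    if isinstance(query, str):
        return query in attrs
    op, *args = query
    if op == "and":
        return all(holds(q, attrs) for q in args)
    if op == "or":
        return any(holds(q, attrs) for q in args)
    if op == "not":
        return not holds(args[0], attrs)
    raise ValueError(f"unknown operator: {op!r}")


def support(query: Query, data: Dict[str, FrozenSet[str]]) -> Set[str]:
    """Set of resources that satisfy `query`."""
    return {r for r, attrs in data.items() if holds(query, attrs)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|a ∩ b| / |a ∪ b|, with 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def accept(left: Query, right: Query, data, min_jaccard: float = 0.4) -> bool:
    """Keep the pair (left, right) only if the Jaccard of their supports reaches the threshold."""
    return jaccard(support(left, data), support(right, data)) >= min_jaccard


# Hypothetical toy data: category labels ("cat:") and descriptive attributes in one dict.
data = {
    "p1": frozenset({"cat:Android_devices", "os:Android", "brand:Samsung"}),
    "p2": frozenset({"cat:Android_devices", "os:Android", "brand:HTC"}),
    "p3": frozenset({"os:Android", "brand:Samsung"}),  # category missing in the data
    "p4": frozenset({"os:iOS", "brand:Apple"}),
}
left = "cat:Android_devices"
right = ("and", "os:Android", ("not", "brand:Apple"))  # uses a negation (second setting)
print(round(jaccard(support(left, data), support(right, data)), 3), accept(left, right, data))
```

The nested-tuple encoding keeps conjunction, disjunction and negation in one recursive evaluator. Under this assumption, switching from the first setting to the second only means allowing `"not"` in the queries.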

… contain disjunctions. Using them makes it possible to "refine" some rules: for example, rule 51 refines rule 49, found earlier. In nearly half of the cases, the disjunction is ∃.manufacturer{Samsung ∨ Samsung_Electronics}, as rules 51 to 54 show. Here, the disjunction lets the rule treat two synonymous attributes as one. In the other cases, the disjunction enumerates many attributes that qualify different objects. For example, rule 56 gives a definition that enumerates the parts of a set: the category Android_(operating_system)_devices consists of all phones from brands that use the Android OS (a brand rarely offers two different operating systems). In this case, the disjunction represents a partition.
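The synonym case shows why a disjunction can refine a rule. Suppose the members of a category are split between two attribute values that denote the same manufacturer. Each value alone misses part of the category, and their disjunction recovers those members, which raises the Jaccard coefficient. The resources and memberships below are invented for illustration and are not the DBpedia data behind rules 49 and 51.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |a ∩ b| / |a ∪ b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical extensions, as sets of resource identifiers.
category = {"p1", "p2", "p3", "p4", "p5"}
made_by_samsung = {"p1", "p2", "p3"}              # ∃.manufacturer{Samsung}
made_by_samsung_electronics = {"p4", "p5", "p6"}  # ∃.manufacturer{Samsung_Electronics}

single = jaccard(category, made_by_samsung)
disjunction = jaccard(category, made_by_samsung | made_by_samsung_electronics)
print(f"{single:.2f} -> {disjunction:.2f}")  # 0.60 -> 0.83
```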

In the second setting, we obtain 76 rules. However, the algorithm's interface ignores some constraints specified in the configuration file. As a result, some rules contain several categories and/or have a Jaccard coefficient below the set threshold.

Among the 38 remaining rules, 31 have the category in negative form, and 14 are "double negations": every attribute on both the left-hand and right-hand sides is negated. This happens because jacc(A,B) ≠ jacc(¬A,¬B) in general, and in some cases the Jaccard coefficient is higher when the attributes are negated. Rule 57 is one such case: its Jaccard coefficient …
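On a universe of n resources, De Morgan's laws give ¬A ∩ ¬B = ¬(A ∪ B) and ¬A ∪ ¬B = ¬(A ∩ B). Therefore jacc(¬A,¬B) = (n − |A ∪ B|) / (n − |A ∩ B|). When A and B are small relative to n and overlap imperfectly, their complements overlap far more, so the negated form scores higher. The sketch below checks this on hypothetical sets. The numbers are illustrative and are not those of rule 57.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |a ∩ b| / |a ∪ b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


n = 10
universe = set(range(n))  # hypothetical resources 0..9
A = {0, 1}                # e.g. the (small) extension of a category
B = {1, 2}                # e.g. the resources having a given attribute
not_A, not_B = universe - A, universe - B

positive = jaccard(A, B)          # |{1}| / |{0, 1, 2}| = 1/3
negative = jaccard(not_A, not_B)  # |{3..9}| / |{0, 2, ..., 9}| = 7/9
# Closed form obtained from De Morgan's laws: (n - |A ∪ B|) / (n - |A ∩ B|)
closed_form = (n - len(A | B)) / (n - len(A & B))
print(f"{positive:.3f} {negative:.3f} {closed_form:.3f}")  # 0.333 0.778 0.778
```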

Overall, association rules extract a very large number of definitions, but with many redundancies. Translation rules, in contrast, yield few definitions with very good coverage of the data, although the extracted definitions contain many attributes. Finally, redescriptions are a compromise in terms of the number of extracted definitions: there are far fewer of them than association rules.

Conferences

J. Reynaud, Y. Toussaint, and A. Napoli, Using redescriptions and formal concept analysis for mining definitions in linked data, Formal Concept Analysis, 15th International Conference, ICFCA 2019, pp. 241-256, 2019.

J. Reynaud, Y. Toussaint, and A. Napoli, Redescription mining for learning definitions and disjointness axioms in linked open data, Graph-Based Representation and Reasoning, 24th International Conference on Conceptual Structures, ICCS 2019, pp. 175-189, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02170763

J. Reynaud, M. Alam, Y. Toussaint, and A. Napoli, A proposal for classifying the content of the web of data based on FCA and pattern structures, Proc. of the 6th International Workshop "What can FCA do for Artificial Intelligence?", co-located with the International Joint Conference on Artificial Intelligence and the European Conference on Artificial Intelligence, pp. 21-32, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01667437

National publications

J. Reynaud, Y. Toussaint, and A. Napoli, Trois approches pour classifier les données du web des données, Actes de la Conférence Nationale d'Intelligence Artificielle et Rencontres des Jeunes Chercheurs en Intelligence Artificielle (CNIA+RJCIA 2018), pp. 94-101, 2018.

J. Reynaud, E. Galbrun, M. Alam, Y. Toussaint, and A. Napoli, Définir les catégories de DBpedia avec des règles d'associations et des redescriptions, 18ème Conférence Extraction et Gestion de Connaissances (EGC 2018), 2018.


We are interested in the web of data and the "knowledge" it potentially contains. The web of data is a very large graph made of interconnected stores of RDF triples. An RDF triple, written (subject, predicate, object), represents a relation (the predicate) between two resources (the subject and the object). Resources can belong to one or more classes.

These RDF triple stores can therefore be seen as interconnected knowledge bases. Most of the time, users build these knowledge bases collaboratively. A notable example is DBpedia, a central knowledge base of the web of data, which encodes the content of Wikipedia in RDF. DBpedia is built from two kinds of Wikipedia data: (semi-)structured data such as infoboxes, and categories.

However, in DBpedia, the meaning of a category, that is, the reason why a human agent grouped certain resources together, is not explicit. Given a class, a software agent can see the resources grouped in it, which is the so-called extensional definition. It generally cannot see the "motives" for this grouping, which would be the so-called intensional definition.
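The following minimal sketch makes the extension concrete. It assumes that triples are stored as Python tuples with prefixed names, and that category membership uses `dct:subject`, the predicate DBpedia uses to link a resource to a Wikipedia category. The resources and triples are invented for illustration. The extension of a category is a simple selection over the triples, while the remaining triples only supply candidate material for an intension. Nothing in the data states why the resources were grouped.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples with prefixed names.
triples = [
    ("dbr:Galaxy_S5", "dct:subject", "dbc:Samsung_mobile_phones"),
    ("dbr:Galaxy_S5", "dbo:manufacturer", "dbr:Samsung"),
    ("dbr:Galaxy_Note_3", "dct:subject", "dbc:Samsung_mobile_phones"),
    ("dbr:Galaxy_Note_3", "dbo:manufacturer", "dbr:Samsung_Electronics"),
    ("dbr:Nokia_3310", "dbo:manufacturer", "dbr:Nokia"),
]


def extension(category, triples, predicate="dct:subject"):
    """Resources grouped under `category`: all a software agent can directly see."""
    return {s for s, p, o in triples if p == predicate and o == category}


def descriptions(triples, predicate="dct:subject"):
    """Map each resource to its (predicate, object) pairs, category links excluded."""
    desc = defaultdict(set)
    for s, p, o in triples:
        if p != predicate:
            desc[s].add((p, o))
    return dict(desc)


print(sorted(extension("dbc:Samsung_mobile_phones", triples)))
# ['dbr:Galaxy_Note_3', 'dbr:Galaxy_S5']
```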

In this thesis, we associate a definition with a category by treating the category as a class of resources. More precisely, we associate an intension with a class given in extension. The resulting (extension, intension) pair provides the sought definition and lets a software agent reason by classification. This can be stated in terms of necessary and sufficient conditions. If x belongs to class C, then x has property P (necessary condition). If x has property P, then x belongs to class C (sufficient condition). Two complementary data mining methods, association rule mining and redescription mining, carry out the discovery of definitions.
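The two conditions become set inclusions once both the class and the candidate property are represented by their extensions. The necessary condition is C ⊆ ext(P), the sufficient condition is ext(P) ⊆ C, and a definition requires both, i.e. equality. Real data are noisy, so the sketch also reports approximate scores. Each direction's confidence corresponds roughly to one association rule (C → P or P → C), and the Jaccard coefficient scores both directions at once, as redescriptions do. The function name and dictionary keys are hypothetical and are not taken from the thesis.

```python
def check_definition(class_ext: set, prop_ext: set) -> dict:
    """Compare a class C given in extension with a candidate intension P,
    represented here by the set of resources that satisfy P."""
    necessary = class_ext <= prop_ext   # x in C   =>  x has P
    sufficient = prop_ext <= class_ext  # x has P  =>  x in C
    inter, union = class_ext & prop_ext, class_ext | prop_ext
    return {
        "necessary": necessary,
        "sufficient": sufficient,
        "definition": necessary and sufficient,  # equivalent to class_ext == prop_ext
        # Approximate scores for noisy data:
        "conf_C_to_P": len(inter) / len(class_ext) if class_ext else 0.0,
        "conf_P_to_C": len(inter) / len(prop_ext) if prop_ext else 0.0,
        "jaccard": len(inter) / len(union) if union else 0.0,
    }


C = {"a", "b", "c"}       # hypothetical extension of a class
P = {"a", "b", "c", "d"}  # hypothetical resources having property P
print(check_definition(C, P))
# {'necessary': True, 'sufficient': False, 'definition': False,
#  'conf_C_to_P': 1.0, 'conf_P_to_C': 0.75, 'jaccard': 0.75}
```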

In this manuscript, we first review the state of the art on association rules and redescriptions. We then adapt each method to the definition discovery task. Next, we detail a set of experiments on DBpedia that compare the two approaches qualitatively and quantitatively. Finally, the discovered definitions could be added to DBpedia to improve its quality.

Keywords: Knowledge discovery; Formal Concept Analysis; Redescription mining; Rule mining; Definition construction; Classification in the web of data