Semantic Knowledge Bases from Web Sources

Fabian Suchanek; Martin Theobald; Gerhard Weikum; Hady Lauw; Ralf Schenkel

Conference paper, Year: 2011

Fabian Suchanek

Role: Author
PersonId: 12540
IdHAL: fabian-suchanek
ORCID: 0000-0001-7189-2796
IdRef: 203477707

Distributed and heterogeneous data and knowledge

Martin Theobald

Role: Author

Max-Planck-Institut für Informatik

Gerhard Weikum

Role: Author

Max-Planck-Institut für Informatik

Hady Lauw

Role: Author

Institute for Infocomm Research - I²R [Singapore]

Ralf Schenkel

Role: Author

Saarland University [Saarbrücken]

Abstract

The Web bears the potential of being the world's greatest encyclopedic source, but we are far from fully exploiting this potential. Valuable scientific and cultural content is interspersed with a huge amount of noisy, low-quality, unstructured text and media. The proliferation of knowledge-sharing communities like Wikipedia and the advances in automated information extraction from Web pages give rise to an unprecedented opportunity: Can we systematically harvest facts from the Web and compile them into a comprehensive machine-readable knowledge base? Such a knowledge base would contain not only the world's entities, but also their semantic properties, and their relationships with each other. Imagine a “Structured Wikipedia” that has the same scale and richness as Wikipedia itself, but offers a precise and concise representation of knowledge, e.g., in the RDF format. This would enable expressive and highly precise querying, e.g., in the SPARQL language (or appropriate extensions), with additional capabilities for informative ranking of query results.

The benefits from solving the above challenge would be enormous. Potential applications include
1) a formalized machine-readable encyclopedia that can be queried with high precision like a semantic database;
2) a key asset for disambiguating entities by supporting fast and accurate mappings of textual phrases onto named entities in the knowledge base;
3) an enabler for entity-relationship-oriented semantic search on the Web, for detecting entities and relations in Web pages and reasoning about them in expressive (probabilistic) logics;
4) a backbone for natural-language question answering that would aid in dealing with entities and their relationships in answering who/where/when/etc. questions;
5) a key asset for machine translation (e.g., English to German) and interpretation of spoken dialogs, where world knowledge provides essential context for disambiguation;
6) a catalyst for acquisition of further knowledge and largely automated maintenance and growth of the knowledge base.

While these application areas cover a broad, partly AI-flavored ground, the most notable one from a database perspective is semantic search: finally bringing DB methodology to Web search! For example, users (or tools on behalf of users) would be able to formulate queries about succulents that grow both in Africa and America, politicians who are also scientists or are married to singers, or flu medication that can be taken by people with high blood pressure. The search engine would return precise and concise answers: lists of entities or entity pairs (depending on the question structure), for example, Angela Merkel, Benjamin Franklin, etc., or Nicolas Sarkozy for the questions about scientists. This would be a quantum leap over today's search where answers are embedded if not buried in lots of result pages, and the human users would have to read them to extract entities and connect them to other entities. In this sense, the envisioned large-scale knowledge harvesting [42] from Web sources may also be viewed as machine reading [13].
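
The idea of storing facts as RDF-style triples and answering entity queries by matching conjunctive patterns, as SPARQL does, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the in-memory set of tuples stands in for a real triple store, the identifiers and predicate names (`type`, `marriedTo`) are hypothetical, and the naive matcher is not the authors' system or a real SPARQL engine.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple,
# mirroring the RDF data model. Identifiers and predicates are illustrative.
TRIPLES = {
    ("Angela_Merkel", "type", "politician"),
    ("Angela_Merkel", "type", "scientist"),
    ("Benjamin_Franklin", "type", "politician"),
    ("Benjamin_Franklin", "type", "scientist"),
    ("Nicolas_Sarkozy", "type", "politician"),
    ("Nicolas_Sarkozy", "marriedTo", "Carla_Bruni"),
    ("Carla_Bruni", "type", "singer"),
}


def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match(pattern, triple, binding):
    """Return binding extended so that pattern equals triple, else None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or reject if it is already bound differently.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def query(patterns, triples=TRIPLES):
    """Evaluate a conjunction of triple patterns by naive nested matching."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for triple in triples
            if (extended := match(pattern, triple, b)) is not None
        ]
    return bindings


# "Politicians who are also scientists"
scientist_politicians = sorted(
    b["?x"]
    for b in query([("?x", "type", "politician"), ("?x", "type", "scientist")])
)

# "Politicians who are married to singers" (returns entity pairs)
married_to_singer = sorted(
    (b["?x"], b["?y"])
    for b in query([
        ("?x", "type", "politician"),
        ("?x", "marriedTo", "?y"),
        ("?y", "type", "singer"),
    ])
)

print(scientist_politicians)
print(married_to_singer)
```

The first query returns a list of entities and the second a list of entity pairs, which matches the abstract's point that the answer shape depends on the question structure. A real knowledge base would add indexing, ranking, and uncertainty handling, none of which this sketch attempts.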

Domains

Artificial Intelligence [cs.AI]

Main file

ijcai2011t.pdf (136.58 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/inria-00539623

Submitted on: Monday, 13 December 2010, 16:51:47

Last modified on: Wednesday, 19 April 2023, 04:22:29

Long-term archiving on: Monday, 5 November 2012, 13:20:23

Dates and versions

inria-00539623, version 1 (13-12-2010)

Identifiers

HAL Id: inria-00539623, version 1

Cite

Fabian Suchanek, Martin Theobald, Gerhard Weikum, Hady Lauw, Ralf Schenkel. Semantic Knowledge Bases from Web Sources. IJCAI, SIG, 2011, Barcelona, Spain. pp.0. ⟨inria-00539623⟩

Collections

EC-PARIS CNRS INRIA UMR8623 INRIA2 UNIV-PARIS-SACLAY

274 views

219 downloads
