Semantic Knowledge Bases from Web Sources

Abstract : The Web bears the potential of being the world's greatest encyclopedic source, but we are far from fully ex- ploiting this potential. Valuable scientific and cultural content is interspersed with a huge amount of noisy, low- quality, unstructured text and media. The proliferation of knowledge-sharing communities like Wikipedia and the advances in automated information extraction from Web pages give rise to an unprecedented opportunity: Can we systematically harvest facts from the Web and compile them into a comprehensive machine-readable knowledge base? Such a knowledge base would contain not only the world's entities, but also their semantic properties, and their relationships with each other. Imagine a “Structured Wikipedia” that has the same scale and richness as Wikipedia itself, but offers a precise and concise representation of knowledge, e.g., in the RDF format. This would enable expressive and highly precise querying, e.g., in the SPARQL language (or appropriate extensions), with additional capabilities for informative ranking of query results. The benefits from solving the above challenge would be enormous. Potential applications include 1) aformalizedmachine-readableencyclopediathatcanbequeriedwithhighprecisionlikeasemanticdatabase; 2) a key asset for disambiguating entities by supporting fast and accurate mappings of textual phrases onto named entities in the knowledge base; 3) an enabler for entity-relationship-oriented semantic search on the Web, for detecting entities and relations in Web pages and reasoning about them in expressive (probabilistic) logics; 4) a backbone for natural-language question answering that would aid in dealing with entities and their rela- tionships in answering who/where/when/ etc. questions; 5) a key asset for machine translation (e.g., English to German) and interpretation of spoken dialogs, where world knowledge provides essential context for disambiguation; 6) acatalystforacquisitionoffurtherknowledgeandlargelyautomatedmaintenanceandgrowthoftheknowl- edge base. While these application areas cover a broad, partly AI-flavored ground, the most notable one from a database perspective is semantic search: finally bringing DB methodology to Web search! For example, users (or tools on behalf of users) would be able to formulate queries about succulents that grow both in Africa and America, politicians who are also scientists or are married to singers, or flu medication that can be taken by people with high blood pressure. The search engine would return precise and concise answers: lists of entities or entity pairs (depending on the question structure), for example, Angela Merkel, Benjamin Franklin, etc., or Nicolas Sarkozy for the questions about scientists. This would be a quantum leap over today's search where an- swers are embedded if not buried in lots of result pages, and the human users would have to read them to extract entities and connect them to other entities. In this sense, the envisioned large-scale knowledge harvesting [42] from Web sources may also be viewed as machine reading [13].
Type de document :
Communication dans un congrès
IJCAI, 2011, Barcelona, Spain. pp.0, 2011
Liste complète des métadonnées

Littérature citée [2 références]  Voir  Masquer  Télécharger
Contributeur : Fabian Suchanek <>
Soumis le : lundi 13 décembre 2010 - 16:51:47
Dernière modification le : jeudi 20 septembre 2018 - 07:54:02
Document(s) archivé(s) le : lundi 5 novembre 2012 - 13:20:23


Fichiers produits par l'(les) auteur(s)


  • HAL Id : inria-00539623, version 1



Fabian Suchanek, Martin Theobald, Gerhard Weikum, Hady Lauw, Ralf Schenkel. Semantic Knowledge Bases from Web Sources. IJCAI, 2011, Barcelona, Spain. pp.0, 2011. 〈inria-00539623〉



Consultations de la notice


Téléchargements de fichiers