Exploiting web ontologies for automated critical infrastructure data retrieval

Semantic web technologies play a significant role in many open data initiatives, including geo-mapping projects and platforms. At the same time, semantic principles are also promoted as a key enabling factor for multi-domain analyses of critical infrastructures as well as for improved emergency response. This chapter reviews the recent literature on ontology-based analysis and management of critical infrastructures, and proposes the use of ontology processing techniques to bridge the gap between infrastructure knowledge representation and available (often general-purpose) open data sources. In particular, it discusses an approach for matching a given critical infrastructure ontology to an ontology built on OpenStreetMap (OSM) tags that enables structured access to the associated geographical dataset.


Introduction
Directive 2003/98/EC [19] and revised Directive 2013/37/EU [20] are fundamental policy references for the implementation of European Union initiatives on public data sharing. Furthermore, Commission Decision 2011/833/EU [18] enforces the principles of extended accessibility, economy of access to data, reusability and expanded data reach. Accordingly, the European Union Open Data Portal [22] was established as "the single point of access to a growing range of data from the institutions and other bodies of the European Union (EU)." The portal offers searching, exploration and downloading functionalities, and supports semantic technologies (e.g., SPARQL queries) based on linked data principles. Data is free for use and reuse for commercial and non-commercial purposes, and users can provide suggestions and feedback.
Directive 2007/2/EC [21] established the Infrastructure for Spatial Information in the European Community (INSPIRE), which promotes extensive geographic data sharing. The data infrastructure under development [30] has the objective to support European Union environmental and environment-related policies and activities by enabling data sharing and access by public organizations and the community. A series of implementation stages is planned for completion by 2021 [32]. Related legislation [31] focuses on data specifications, metadata, network services, data and service sharing, monitoring and reporting.
Several other governmental and non-governmental open data initiatives are also in place [44]. An interesting phenomenon is represented by the surge of voluntary open data projects, many of which focus on geographic knowledge. Ballatore et al. [3] provide an overview of a number of these efforts, mainly focusing on global-scoped and mostly crowdsourced projects based on popular semantic web formats specific to geographic data and to broader knowledge realms.
Over time, the original concepts informing the creation of the first geographic portals have been evolving towards a next-generation vision involving "multiple connected infrastructures based on open access and participation across multiple technological platforms that will address the needs of different audiences" [26]. Additionally, Digital Earth [27] is expected to evolve into "a digital nervous system of the globe, actively informing about events happening on (or close to) the Earth's surface by connecting to sensor networks and situation-aware systems" [26]. Semantic heterogeneity is acknowledged as one of the key challenges to achieving this objective. Semantic web principles are being applied to address this issue and the Semantic Geospatial Web initiative, supported by the W3C Geospatial Semantic Web Community Group [78], promotes the use of geospatial ontologies, semantic gazetteers and geographic vocabularies.
Semantic-web-oriented efforts are accompanying the development of OpenStreetMap (OSM) [46], a collaborative mapping project that aggregates geographic information collected on a voluntary basis. OpenStreetMap data is organized into different element types and users complement them with freely chosen tags in order to associate meaning and supporting information with geographic items. Each tag is defined by a (key, value) pair and, while the use of existing tags is encouraged, contributors are allowed to introduce new tags.
As a result of this intrinsic flexibility, the set of tags in use dynamically evolves in time along with geographic entries. Therefore, in addition to the tag reference pages provided by the OpenStreetMap Wiki [49], projects such as Taginfo [48] exist to keep track of the tags currently represented in OpenStreetMap and to provide statistics about their usage. Methods and tools based on semantic web principles have been proposed to overcome tag heterogeneity and enable structured access to the OpenStreetMap dataset and related resources. The LinkedGeoData portal [38] accommodates OpenStreetMap-sourced dataset information and links to third-party projects in a semantic web format [2,60].
A Resource Description Framework (RDF) graph representation based on the OpenStreetMap Wiki contents is provided by the OpenStreetMap Semantic Network [47] that also maps OpenStreetMap tags to corresponding concepts in the WordNet lexical database [77] and LinkedGeoData [3]. An ontology constructed over the set of OpenStreetMap tags is provided by the OSMonto Project [50], enabling hierarchically structured access to tags and providing a baseline reference to interface with other types of ontologies in order to perform, for instance, semantic analysis [8].
These efforts are part of a wide landscape of general ontology application scenarios. Uschold and Gruninger [68] identify the following four categories:
Neutral Authoring: Information artifacts are authored in a unified, ontology-based language, supporting conversions to multiple target formats and ultimately overcoming interoperability constraints imposed by ad hoc approaches.
Common Access to Information: Ontologies are exploited to translate information between various formats and representations.
Ontology-Based Specification: Ontologies provide a basis for specifying and developing applications.
Ontology-Based Search: Ontologies are used to support structured access to information repositories, enabling their organization and classification at appropriate levels of abstraction.
Ontologies can be expressed using formal languages, often based on first-order logic or description logic. Researchers [13,25] have examined the development of ontology languages, in general, and web ontology languages, in particular. The OWL Web Ontology Language is one of the most common standards in use today [1].
As discussed in this chapter, ontologies and semantic (web) technologies are also being promoted in the literature to support the analysis and management of critical infrastructures. For instance, they help provide a systematic representation of heterogeneous systems in terms of entities and their interconnections for study and simulation purposes. They can also portray threat types, targets and actors involved in disaster response, as well as enhance emergency management and information sharing [39]. The emergence of semantic-oriented approaches in the geospatial information and critical infrastructure protection communities provides opportunities to create enhanced critical infrastructure analysis and management solutions. Indeed, geo-information sources such as OpenStreetMap contain valuable critical infrastructure information and they stimulate community efforts to overcome crises [29,58].
This chapter considers all the aspects discussed above and relates the use of ontological representations of critical infrastructure concepts (e.g., assets, threats and interdependencies) to information collection from data sources. Several ontology-based methods for critical infrastructure analysis and management are reviewed, and a method for critical infrastructure information retrieval from open data sources with an emphasis on OpenStreetMap is proposed. The method exploits ontology mapping as a means to interface an assumed ontology describing a critical infrastructure system to a second ontology constructed on the OpenStreetMap tag system.

Ontological Approaches
Ontologies play a significant role in the development of models of critical infrastructures and their management during and after adverse events. This section reviews recent literature in the areas of conceptual modeling of critical infrastructures, critical infrastructure simulation and information sharing.

Critical Infrastructure Modeling
Systematizing knowledge about critical infrastructures requires the establishment of a consistent semantics and a means for addressing the diversity that stems from an inherently multi-disciplinary investigation area. At the same time, it opens many opportunities for analysis, including automated reasoning and decision support. The construction of taxonomies is a fundamental step in this direction. Drawing on the work by Perrow [53], Rinaldi et al. [54] have proposed a seminal taxonomy of critical infrastructure elements and their interdependencies. Another notable work is the Infrastructure Data Taxonomy of the U.S. Department of Homeland Security [69] that has been used to guide and structure analyses [64]. Taxonomies have also been employed to categorize critical infrastructure threats and attacks (see, e.g., [35] for cyber-related threats).
Wolthusen [76] has proposed a method for representing critical infrastructure systems that is oriented towards data collection and exchange as well as modeling and simulation. The approach starts with a high level description of entities and dependencies, and exploits multigraphs to handle different types of dependencies. An ontological model and exchange mechanism data format are introduced based on RDF and OWL, and a multi-domain critical infrastructure representation sourced from expert knowledge is formalized.
Lee and Gandhi [36] have introduced an ontology-based active requirements engineering framework for software-intensive systems analysis based on a hierarchical representation that includes top-level generic requirements, mid-level domain spanning requirements and leaf-node subdomain requirements.
Sotoodeh and Kruchten [59] present a conceptual modeling framework for disaster management that comprises three ontologies: an emergency operation center ontology related to disaster response components and two disaster affecting infrastructure ontologies that represent infrastructures and their reference communities along with their relationships at a high level of abstraction. Concepts such as regions and people with associated wellness characterizations are included together with infrastructure and resource characterizations. Interdependencies are described at the physical and social levels.
Sicilia and Santos [57] have introduced an infrastructure incident assessment ontology for the high-level representation of infrastructures, incidents and their causes. It involves a service-dependent semantics of connections. Interdependencies (physical, connectivity-based, policy-based and procedure-based) can be specified a priori or inferred using Semantic Web Rule Language (SWRL) rules. The use of reasoning techniques to support emergency response is also demonstrated.
A network security framework has been developed by the INTERSECTION Project [11] for identifying and classifying vulnerabilities in heterogeneous networks [7]. The framework comprises an ontology that extends beyond single-domain networks and focuses on resources and vulnerabilities as the key components. Four resource subclasses are identified: (i) physical resources; (ii) logical resources; (iii) software; and (iv) services. In addition, three vulnerability subclasses are identified: (i) physical resource vulnerabilities; (ii) logical resource vulnerabilities; and (iii) software vulnerabilities. An OWL-based decision support architecture is presented as well.
An ontology handling tool created by the INSPIRE Project [10] and described in [4] is a standards-based instrument for enabling automatic audits of the security and criticality levels associated with information systems. Its infrastructure discovery component, ontology repository and expert visualization tool combine to facilitate analyses of critical infrastructure vulnerabilities while considering the associated information and communications technology components.
Creese et al. [14] have used automated reasoning for critical infrastructure resilience assessment problems based on a top-down, layered conceptual mapping of assets, controls, vulnerabilities and risk. An ad hoc dependency modeling language that exploits a natural-language-like semantics is used to enable automated reasoning and what-if analyses with a focus on organizational aspects.
El-Diraby and Osman [17] have used domain ontologies to depict urban infrastructures in terms of processes, actors and products. Physical products are classified into generic and sector-specific products. Infrastructure products are characterized in terms of the functions they perform (i.e., conveyance, control, protection, access, measuring, storage and locating products). The composition of products is allowed and an extended set of attributes (i.e., dimensional, spatial, material, shape, cost, performance, surrounding soil, dependency, redundancy and state of operation attributes) is specified for each product. The construction of the ontology is subject to formal consistency checks and an expert elicitation-based assessment.
De Nicola et al. [15] have presented an ontology and semantic rules for emergency management in smart cities. CEML is used as the reference modeling language. The knowledge modeling strategy employs an upper-level ontology that extends the domain ontology with CEML concepts, along with an emergency ontology that enables automated knowledge management and reasoning. A tool exploiting SPARQL provides automated support for defining emergency management plans.
Xu et al. [79] employ geo-ontologies for earthquake emergency response. Knowledge is organized into four classes: (i) factual knowledge; (ii) rule-based knowledge; (iii) procedural knowledge; and (iv) meta-knowledge. The content types include emergency response and rescue knowledge, disaster information estimation, emergency information and terms, and emergency foundation data.
Takahashi and Kadobayashi [62] provide a list of industry specifications and a reference ontology for cyber security operational information. Conceptual models that target cyber dependencies are considered in [45], where a human factors ontology is employed to specify a cyber security framework. Other researchers [66,67] discuss an ontology-based approach for vulnerability and interdependency representation as well as disruption scenario generation for critical infrastructures. Luo et al. [40] have developed a knowledge modeling formalism for emergency situations and planning in metropolitan areas, and have implemented it in a training tool.

Critical Infrastructure Simulation
Ontologies are widely used in critical infrastructure simulation architectures. In particular, they provide model specifications and enable the integration and combination of multiple analysis techniques and tools. Conceptual interoperability has been formally studied in the context of simulation theory. Wang et al. [75] have introduced a conceptual interoperability model for determining the degree of interoperability of a system. Various aspects of integration, interoperability and composability are discussed in [52].
Critical infrastructure taxonomies are leveraged in creating complex critical infrastructure simulators. For example, Tolk et al. [65] have presented a modeling and simulation development framework, and have used it in a case study involving the Infrastructure Data Taxonomy of the U.S. Department of Homeland Security [69].
Van Dam and Lukszo [71] have employed agent-based models for the energy and transportation infrastructures. Their bottom-up methodology, which is motivated by the presence of multiple decision makers, distributed nature of the problem and dynamic operational environments, involves a generic ontology that is customized to the domains of interest based on expert opinion.
The IRRIIS Project [12] has developed an ontological information model for vulnerability analyses of large and complex critical infrastructures [34]. It is implemented in a federated simulation environment and supports the development of a risk estimator for determining if specific conditions in an infrastructure are critical singly or in combination.
The DIESIS Project [9] has adopted a layered approach for federated critical infrastructure simulation. It employs ontologies to describe the dynamic bindings between subsystems [55]. An ontology component is used to express meta-knowledge (abstract representations of basic system concepts and relationships); this is accompanied by an infrastructure ontology (domain-specific critical infrastructure ontology) and a federation ontology (for specifying semantically-coherent interconnections and rules).
Masucci et al. [42] discuss the derivation of ontology components and their relationships using OWL and SWRL. Castorini et al. [5] describe an application involving the power grid, railway and telecommunication domains, and their mutual relationships. Interested readers are referred to [63,70] for more details.
The I2Sim interdependency simulator is a key contribution to ontology-based simulations of critical infrastructures [41]. To support I2Sim development, an ontology is presented for modeling temporal dependencies between infrastructures based on tokens (goods and services provided by one entity to another), cells (entities that perform functions), nodes (token generators) and transportation channels (flows of tokens subject to capacity and time delay constraints).
Ventura et al. [73] expand on this approach and introduce a taxonomy for classifying infrastructure interdependencies based on criteria such as the nature of the involved entities (human-object, object-object or human-human), directions of relationships (unidirectional or bidirectional), nature of relationships (information, physical, geographical or organizational/human/societal), states of relationships (static or dynamic) and type of failure if disrupted (cascading failure in associated entities, escalating failure or common origin failure).
Grolinger et al. [28] explore ontologies associated with a water distribution simulator and power system simulator, and map them to the I2Sim ontology. According to Grolinger et al., while federated simulation approaches as used in the DIESIS Project attempt to "integrate existing domain simulators by enabling their coordination and collaboration," I2Sim belongs to the architectural type that "includes simulation frameworks that enable the modeling of different infrastructures and their interdependencies."

Information Sharing
Another research topic covers information sharing and the related interoperability concerns, especially the need to manage the diversity of data sources and formats in emergency response procedures that often involve a number of actors. The literature in this field is extensive and has strong relationships with the conceptual modeling and simulation of critical infrastructures. Some of the literature discussed above addresses this aspect as well. However, certain recent contributions related to information sharing remain to be discussed. Kim et al. [33] present an information sharing mechanism based on ontologies that addresses cyber dependencies between infrastructures. Di Maio [16] has proposed an open ontology approach that improves the performance of emergency response systems based on the principle of collaboration. Di Maio also discusses different levels of conceptual interoperability and the important notion of resilience in emergency response systems.
The interoperability gap affecting emergency planning systems is addressed in [74] using an emergency planning ontology. This ontology, which is based on the Suggested Upper Merged Ontology, is formally specified in terms of concepts, relations, functions, axioms and instances.
Galton and Worboys [24] have proposed architectural specifications for interoperability that take into account sensor networks and crowdsourced information collection procedures, together with related spatio-temporal data distribution considerations. Galton and Worboys also note that open-source geospatial information can be very useful in an emergency management framework.
Li et al. [37] describe a cloud computing platform for emergency management that relies on crowdsourced information. Drawing on existing emergency management ontologies, Li et al. introduce a novel ontology that considers additional information such as the types of hazards and emergencies, as well as meteorological factors. Castrucci et al. [6] have developed a mediation system that enables secure communications between critical infrastructures, implements fault mitigation strategies and supports information discovery while overcoming information exchange and data heterogeneity problems.

Ontology-Based Information Retrieval
The literature review in the previous section covers conceptualizations of various aspects of critical infrastructure protection and emergency management. An important point is that structured descriptions of critical infrastructures can also enhance information retrieval processes. As a consequence, this section focuses on an information retrieval approach for critical infrastructures based on ontology matching or alignment [23,56].
Ontology matching techniques seek to find relationships between elements of different ontologies under analysis and provide similarity measures. An automated procedure for performing the alignment is important when dealing with large ontologies, multi-step information exchange chains and real-time processing. The complexity of multiple ontology matching applications can be managed by combining the alignment criteria and the resulting similarity measures using expert judgment or artificial intelligence [72].
The proposed matching-based approach has three sub-tasks: (i) ontology population; (ii) ontology matching; and (iii) ontology-driven data retrieval. These sub-tasks are described below.

Ontology Population
In the first step, two starting ontologies CI Ont and T Ont are created and populated. CI Ont is a domain ontology that describes a set of critical infrastructure components, threats and their relationships. T Ont is a target ontology built on a reference dataset from which information is to be retrieved. The application focuses on the OpenStreetMap geographic dataset.
The following are the key aspects involved in constructing the two ontologies:
CI Ont: An OWL ontology CI Ont is constructed to include classes that describe the set of critical infrastructure sectors, sub-sectors and asset types (i.e., critical infrastructure classes) relevant to the geographic information retrieval problem and classes that describe the set of threats (threat classes). Subclass relationships are established between the elements of the critical infrastructure classes based on general-purpose and specialized glossaries and taxonomies. Interdependency relationships are also specified between critical infrastructure classes based on information collected from the technical literature. Furthermore, threat-to-critical-infrastructure class relationships are specified to express the significance of threats to the various infrastructure elements.
T Ont: The T Ont ontology is populated with tag information via a semi-automated process based on the Taginfo system. A set of significant reference keys (e.g., building, highway, natural, landuse, surface, power, waterway, wall, amenity, leisure, railway) is first identified for the analysis domain of interest and consistent with CI Ont. A set of values is extracted for each key from the same source based on a filtering criterion. As in the case of the OSMonto ontology, the filtering criterion only includes values used a sufficiently high number of times according to the statistics provided by Taginfo. The resulting keys and values are arranged into the OWL ontology T Ont to express the hierarchical relationships between keys and their associated values.
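The filtering criterion used to populate T Ont can be illustrated with a minimal sketch. The usage counts, the threshold and the function name `build_tag_hierarchy` are hypothetical; the chapter does not state the threshold that was applied, and the counts below are not real Taginfo statistics.

```python
from collections import defaultdict

# Hypothetical usage counts in the style of Taginfo statistics:
# (key, value) -> number of tagged objects. Illustrative numbers only.
TAG_COUNTS = {
    ("waterway", "river"): 1_500_000,
    ("waterway", "canal"): 300_000,
    ("waterway", "my_private_ditch"): 3,
    ("highway", "motorway"): 900_000,
    ("highway", "residential"): 60_000_000,
    ("power", "substation"): 500_000,
    ("shop", "bakery"): 250_000,
}


def build_tag_hierarchy(tag_counts, reference_keys, min_count):
    """Keep values of the reference keys that are used at least min_count times."""
    hierarchy = defaultdict(list)
    for (key, value), count in tag_counts.items():
        if key in reference_keys and count >= min_count:
            hierarchy[key].append(value)
    # Each key becomes a parent node, each retained value a child node.
    return {key: sorted(values) for key, values in hierarchy.items()}


# Assumed reference keys and threshold for a water/road/power analysis domain.
t_ont = build_tag_hierarchy(
    TAG_COUNTS, {"waterway", "highway", "power"}, min_count=1000
)
```

The resulting key-to-values mapping mirrors the two-level hierarchy described above; serializing it to OWL classes would be a separate step.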

Ontology Matching
An alignment is computed based on the CI Ont and T Ont ontologies and additional lexical resources. A preprocessing step and a multi-step alignment procedure are involved.
Preprocessing. In this step, the CI Ont and T Ont ontologies are processed to adhere to established orthographic rules and standards (e.g., hyphenation and capitalization of labels). Furthermore, CI Ont undergoes an ontology enrichment stage. This is accomplished by defining a lexicon $L_{\text{CI Ont}}$ by collecting the labels associated with the CI Ont classes. The lexicon is partitioned into $L^{CI}_{\text{CI Ont}}$ and $L^{th}_{\text{CI Ont}}$ by collecting the labels of the critical infrastructure and threat classes, respectively.
For each element in $L^{CI}_{\text{CI Ont}}$, a set of relevant synonyms is fetched from the WordNet database via an automated routine that uses the MIT Java Wordnet Interface [43]. Correspondingly, the enriched critical infrastructure ontology CI Ont,e is constructed by extending CI Ont with the synonym entries and establishing consistent equivalence relationships. The associated enriched lexicon $L_{\text{CI Ont},e}$ collects the labels of the extended set of classes so that $L_{\text{CI Ont}} \subseteq L_{\text{CI Ont},e}$. The lexicon $L_{\text{CI Ont},e}$ is partitioned into $L^{CI}_{\text{CI Ont},e}$ and $L^{th}_{\text{CI Ont}}$, where $L^{CI}_{\text{CI Ont},e}$ collects the extended set of labels of the critical infrastructure classes in the enriched ontology. Finally, in the case of T Ont, a lexicon $L_{\text{T Ont}}$ is created based on OpenStreetMap keys and values expressed in the ontology.
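The preprocessing stage can be sketched as follows. This is an illustration under stated assumptions, not the routine used in the chapter: the normalization rules in `normalize_label` are one plausible reading of "orthographic rules", and the small `SYNONYMS` table is a hypothetical stand-in for the WordNet look-ups performed through the Java interface.

```python
import re


def normalize_label(label: str) -> str:
    """Lower-case a label and unify separators so lexicons are comparable."""
    # Split CamelCase boundaries: "PowerPlant" -> "Power Plant".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    # Treat hyphens, underscores and runs of whitespace as a single space.
    spaced = re.sub(r"[-_\s]+", " ", spaced)
    return spaced.strip().lower()


# Hypothetical synonym table standing in for WordNet queries.
SYNONYMS = {
    "road": ["route"],
    "waterway": ["watercourse"],
}


def enrich_lexicon(ci_labels, synonyms):
    """Return the enriched lexicon and a map from each synonym to its source label."""
    enriched = set(ci_labels)
    equivalent_to = {}
    for label in ci_labels:
        for synonym in synonyms.get(label, []):
            enriched.add(synonym)
            # Record the equivalence so a match on a synonym can be
            # traced back to the original critical infrastructure class.
            equivalent_to.setdefault(synonym, label)
    return enriched, equivalent_to


# Illustrative critical infrastructure labels (not taken from the chapter).
l_ci = {normalize_label(raw) for raw in ["Road", "Waterway", "Power-Plant"]}
l_ci_e, equivalent_to = enrich_lexicon(l_ci, SYNONYMS)
```

By construction the original lexicon is a subset of the enriched one, which is the inclusion property stated above.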
Multi-Step Alignment. The alignment procedure involves the following steps that are performed in sequence:
Lexical Matching: Exact element matches between $L^{CI}_{\text{CI Ont},e}$ and $L_{\text{T Ont}}$ are determined.
String Matching: String similarity metrics are employed to compare terms from the two lexicons. One of the metrics used is the Levenshtein distance (e.g., [61]).
The two matching methods are applied sequentially so that the alignments found via lexical matching are included directly in the final result while string matching is used to search for additional relevant bindings. The final alignment is produced using a similarity aggregation criterion encoded in a matching matrix $M(L^{CI}_{\text{CI Ont},e}, L_{\text{T Ont}}) \in [0, 1]^{|L^{CI}_{\text{CI Ont},e}| \times |L_{\text{T Ont}}|}$ based on similarity thresholding. To increase the confidence levels when applying string matching, it is possible to consider the presence of multiple matches among CI Ont synonym classes and T Ont components that are associated with OpenStreetMap tag values that refer to the same keys.
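A minimal sketch of the two-step alignment is given below. It assumes a length-normalized Levenshtein similarity and a threshold of 0.8; both the normalization and the threshold value are illustrative choices, since the chapter does not specify the aggregation criterion in detail. The example labels are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def matching_matrix(ci_lexicon, t_lexicon, threshold=0.8):
    """Rows follow ci_lexicon, columns follow t_lexicon, entries lie in [0, 1]."""
    matrix = []
    for ci_label in ci_lexicon:
        row = []
        for t_label in t_lexicon:
            if ci_label == t_label:
                # Step 1: lexical (exact) matches enter the result directly.
                row.append(1.0)
            else:
                # Step 2: string similarity, kept only above the threshold.
                score = similarity(ci_label, t_label)
                row.append(score if score >= threshold else 0.0)
        matrix.append(row)
    return matrix


M = matching_matrix(
    ["waterway", "motorways", "road"],
    ["waterway", "motorway", "river"],
)
```

In this toy run, "waterway" matches exactly, "motorways" is bound to "motorway" by string similarity, and "road" finds no counterpart.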

Data Retrieval
The inclusion of threat classes, threat-to-critical-infrastructure class relationships and interdependency relationships in CI Ont enables the discovered alignment to be used for targeted data extraction. This is accomplished by defining a relationship map in terms of three adjacency matrices, including a matrix for all threat-to-critical-infrastructure class relationships, where entry (i, j) = 1 means that threat i affects critical infrastructure component j based on lexical indexing. The three adjacency matrices are then combined with the matching matrix described above. The set of T Ont items of interest is then obtained starting from each considered threat component. Thus, starting from a specified threat scenario and a geographical area of interest, it is possible to use the composition to query OpenStreetMap based on the significant (key, value) items that are found. The operation can be performed, for instance, by using the Overpass API [51] via the Overpass Turbo interface (see overpass-turbo.eu).
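One way to read the composition step is as a chain of Boolean matrix products. The sketch below is an assumption-laden illustration: it uses only a threat-to-infrastructure matrix and an interdependency matrix, propagates effects over a single interdependency hop, and works on an already thresholded matching matrix. All labels and matrix entries are hypothetical.

```python
def bool_matmul(a, b):
    """Boolean matrix product: result[i][j] is 1 if some k links i to j."""
    return [
        [int(any(a[i][k] and b[k][j] for k in range(len(b))))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def bool_or(a, b):
    """Element-wise OR of two equally sized 0/1 matrices."""
    return [[int(x or y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


threats = ["flood"]
ci_classes = ["waterway", "road", "power plant"]
tags = [("waterway", "river"), ("highway", "motorway"), ("power", "plant")]

# threat_to_ci[i][j] = 1: threat i affects critical infrastructure class j.
threat_to_ci = [[1, 0, 0]]
# interdependency[j][k] = 1: a disruption of class j may propagate to class k.
interdependency = [[0, 1, 0],
                   [0, 0, 0],
                   [0, 0, 0]]
# Thresholded matching matrix between CI classes and tags (1 = aligned).
matching = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]

# Classes hit directly, plus classes reached through one interdependency hop.
affected = bool_or(threat_to_ci, bool_matmul(threat_to_ci, interdependency))
# Map the affected classes to the aligned OpenStreetMap (key, value) pairs.
threat_to_tags = bool_matmul(affected, matching)
selected = [tags[j] for j, flag in enumerate(threat_to_tags[0]) if flag]
```

Starting from the "flood" threat, the selection contains the waterway tag (direct effect) and the road tag (reached through the interdependency), while the power tag is left out.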
As an example, consider a scenario where waterways are affected by a specified threat and the road infrastructure may be affected due to the interdependencies. For both components, the matcher identifies a set of relevant (key, value) pairs to include in a query, whose output is presented in Figure 1. The identified pairs include several facilities that are related to the waterway and road sectors.
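A query of this kind could be assembled as in the following sketch, which builds an Overpass QL string that can be pasted into Overpass Turbo. The (key, value) pairs and the bounding box are hypothetical and are not the ones behind Figure 1; the helper name `build_overpass_query` is likewise an assumption.

```python
def build_overpass_query(pairs, bbox, timeout=25):
    """Build an Overpass QL query for (key, value) pairs inside a bounding box.

    bbox is (south, west, north, east) in decimal degrees.
    """
    south, west, north, east = bbox
    box = f"({south},{west},{north},{east})"
    lines = [f"[out:json][timeout:{timeout}];", "("]
    for key, value in pairs:
        # nwr selects nodes, ways and relations carrying the tag.
        lines.append(f'  nwr["{key}"="{value}"]{box};')
    lines += [");", "out center;"]
    return "\n".join(lines)


# Illustrative waterway and road tags and an arbitrary area of interest.
pairs = [("waterway", "river"), ("waterway", "canal"), ("highway", "primary")]
query = build_overpass_query(pairs, bbox=(45.0, 9.0, 45.1, 9.1))
print(query)
```

The union block `( ... );` collects the results of every tag filter, so a single request covers both the waterway and the road items selected by the matcher.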

Conclusions
This research has focused on the use of semantic technologies in critical infrastructure analysis with an emphasis on information retrieval. Of particular interest has been the use of ontologies for critical infrastructure modeling and simulation as well as for emergency response. The proposed approach for ontology-based information retrieval from open geographic data sources (especially, OpenStreetMap) is applicable to critical infrastructure protection. The target ontology describing the OpenStreetMap content is constructed based on statistics about the actual use of tags, which evolves over time. An ontology specified for critical infrastructures incorporates threat and interdependency information and is enriched during processing. The matching procedure used for querying information is layered in terms of alignment methods and comprises lexical and string matching components. The principal contribution of this research is its ability to foster critical infrastructure tool integration, interoperability and composability. Moreover, the structured access to open-source information facilitated by the proposed approach can enhance multi-sector knowledge advancement, especially in conjunction with expertise from various technical fields.
Future research will incorporate interactive matching and special taxonomies and dictionaries devoted to critical infrastructures for ontology enrichment and improved similarity aggregation. Additionally, efforts will focus on inference mechanisms for incrementally improving the reference critical infrastructure ontology and on incorporating data quality checking mechanisms.