Using Ontologies to Access Complex Data: Applications on Bio-Imaging

Information systems, used to share information, lead to the growth of heterogeneous data and of the dependencies between them. As a result, the links and dependencies among heterogeneous and distributed data become increasingly complex in the daily activities of users (researchers, engineers, etc.). Our contribution is a methodology to facilitate the exploitation (querying and sharing) of complex data in an organization. The system we propose combines a semantic approach with data management.


Introduction
Information systems, used to share information, lead to the growth of heterogeneous data and of the dependencies between them. As a result, the links and dependencies among heterogeneous and distributed data become increasingly complex in the daily activities of users (researchers, engineers, etc.). Data exploitation (querying and sharing) has to be adapted to this context of large data volumes and complex dependencies. To overcome this difficulty, a growing body of research has investigated Semantic Web (SW) concepts and techniques to give cognitive access to information and data. Our contribution is a methodology to facilitate the exploitation (querying and sharing) of data in an organization. Techniques for structuring heterogeneous and complex data are defined in order to support the dynamic evolution of this type of data. A query system using an ontology and semantic mechanisms is also developed in order to offer a user-friendly data management system. This work is applied to the Bio-Imaging domain.

Related Work
Data access, or data querying, is the process by which users search data to answer a specific question. It is an important function of any information system. Since the earliest database technologies, querying data has often been a function reserved for Information Technology (IT) experts. Providing non-IT end-users with an efficient way to query a database, semantic querying for instance, has always been a challenging topic. In current BMI data management systems, several methodologies have been proposed to enable data access for end-users.
Riazanov et al. [17] proposed semantic querying of relational data for clinical intelligence. For the authors, "self-service ad-hoc querying of clinical is problematic as it requires specialized technical skills and the knowledge of particular data schemas". Semantic querying allows end-users (clinical researchers, surveillance practitioners, health care managers, etc.) to formulate queries in terms of domain ontologies, which are more understandable than data schemas. User queries are then translated into queries on the data sources by using a mapping between the ontologies' terms and the data sources.
LORIS [3] is a web-based data management system for multi-center studies, from data acquisition to processing and dissemination. The main querying function of LORIS is the Data Querying GUI (DQG). Through a web-based interface, DQG "allows researchers to design, execute, and save queries in a simple and intuitive manner, without having to write complex SQL queries". Ping et al. [15] also developed a web-based data-querying tool using an ontology-driven methodology and a flowchart-based model. The former is used to formulate the query task through a Protégé plugin; the latter executes queries and presents the results through a visual and graphical interface.
In BIRN [11], the BIRN Mediator helps users easily query a collection of data sources (such as relational or XML databases, or web services) by providing them with a single consistent virtual schema, while the actual data resides in remote sources under their original schemas. "The mediator maps the sources schemas to the common domain schema, using declarative logical formulas, transparently to the user. The mediator provides the most recent data available, since the user queries are translated into source queries and executed at the sources in real-time".
All these methodologies are based on the well-known approach called Ontology-Based Data Access (OBDA). This approach uses an ontology, the pivot component of the Semantic Web (https://www.w3.org/standards/semanticweb/), to enable semantic data access for end-users by providing them with a semantic representation of the data. In this chapter, we propose to use these techniques to support knowledge sharing in Bio-Imaging at the GIN lab.

Knowledge sharing in Bio-Imaging
Researchers in Bio-Imaging need to share their results and the data they use. They manipulate heterogeneous data: information about human subjects, brain images, and disease description files. They in turn produce more images, statistical data, and disease and brain descriptions. Information grows very quickly, and they need a support that handles evolving data, which a classic database cannot tackle.
Bio-Imaging researchers at the GIN lab were interviewed in order to identify their real needs and their difficulties in sharing and using information with the existing information system during their daily activities. Most scientists have difficulties with data querying and can hardly accomplish this task without the help of database technicians. In fact, their database is very complex: there are many links between data, and it is not easy for them to discover the database structure in order to understand where a piece of data can be inserted. Moreover, the data names in the database do not reflect their use. So, when they want to obtain information from the database, they describe their needs and a technician tries to answer them. The problem lies in the interaction with the database. They need to use incremental queries in order to discover existing data and apply processing to it. Sometimes they repeat processing already done by another researcher; they need to discover their colleagues' results and use them. Finally, technicians cannot update the database because of its complexity.
Currently, they extract a copy of the database into Excel files (example in Fig 1) and each researcher works with his or her own Excel file. The problem is that each one uses a specific name for the same data. Results are neither shared nor reused. The database and its structure are duplicated. In conclusion, there is no information sharing between researchers, due to the complexity and the dynamic updating of the database.

Example of data Excel File
To answer these problems, we propose:
• To organize data in an incremental way, based on the main object. In the Bio-Imaging case, it is the studied person (called the subject). The data identified and produced thus concern this subject.
• To use a database management system that makes it easy to manage and update this type of data organization. For instance, in our application, we use a Product Lifecycle Management (PLM) tool to handle this type of organization. In fact, data in a PLM system are structured to follow the evolution of the product from the idea to the concept and the product characteristics. In addition, several forms of data can be supported: data, images, text files, etc. We know that different data management systems (SQL, NoSQL, etc.) and data organizations (object, relational, etc.) exist [16]. The PLM system we used is based on a relational database. We prefer to use a PLM system in which the organization of data corresponds to the needs of GIN lab Bio-Imaging researchers: keeping data as an evolution of studies around a subject.
• To develop a query interface that supports the domain vocabulary and its links to the database. To this end, a domain ontology has been defined and a query system has been developed in order to handle knowledge sharing between Bio-Imaging researchers at the GIN lab.

Structuring data
As mentioned above, complex data are structured as relations of a main object. In Bio-Imaging studies, the subject (the person who follows a medical protocol) is central. Data are therefore structured around the subject as follows:
• subject: number, name, birthdate
• exam: investigator, type, tool, protocol, results
• process: investigator, type, tool, results, references
To support the dynamic evolution of data, the PLM tool TeamCenter is used.
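The subject-centred structure listed above can be sketched with plain Python data classes. This is only an illustration: the field types, the ISO date string and the sample values are assumptions, and the actual TeamCenter objects are not described at this level in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Exam:
    # Attributes taken from the "exam" item of the list above.
    investigator: str
    type: str
    tool: str
    protocol: str
    results: List[str] = field(default_factory=list)


@dataclass
class Process:
    # Attributes taken from the "process" item of the list above.
    investigator: str
    type: str
    tool: str
    results: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)


@dataclass
class Subject:
    # The subject is the main object; exams and processes hang off it.
    number: int
    name: str
    birthdate: str  # assumed ISO format "YYYY-MM-DD"
    exams: List[Exam] = field(default_factory=list)
    processes: List[Process] = field(default_factory=list)


# Hypothetical sample data: the structure grows incrementally around the subject.
subject = Subject(number=1, name="anonymous", birthdate="1980-05-17")
subject.exams.append(
    Exam("investigator A", "MRI", "scanner X", "protocol P", ["image_001"])
)
subject.processes.append(
    Process("investigator B", "segmentation", "tool Y", ["mask_001"], ["ref_1"])
)
```

New exams and processes are appended to the same subject over time, which is the incremental organization the text argues for.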

Product Lifecycle Management
Product Lifecycle Management (PLM) systems continuously integrate all the information produced throughout all phases of a product's lifecycle and make it available to everyone in an organization at every level (managerial, technical, etc.) [19]. This type of tool is developed to help product designers and to provide traceability support for the evolution of an artefact from the requirements to the product use and even recycling (Fig 3). We can identify some key advantages of PLM systems [10]:
• Establishing an effective PLM system reduces the enormous data resources to a coherent data flow and avoids redundancies and heterogeneities.
• PLM enables collaboration across distributed and virtual/extended enterprises (workflow and process management, communication and notifications, secure data exchange, etc.).
• PLM permits the management of the product structure and its evolution during the different steps, and the tracking of performed modifications.
• PLM is a mature solution to tackle the heterogeneity, growth and complexity of the data and its processing methods, as well as some of the traceability and confidentiality issues.
A PLM system thus brings together products, services, activities, processes, people, skills, data, knowledge, procedures and standards. It provides an efficient solution to handle complex and heterogeneous data resources and a mature method to track the evolution and modification of these data. However, along with these advantages, some issues also exist:
• The lack of strong stakeholders and ICT tools, as well as of a common standard between PLM systems, causes data integrity problems and limits access to, and sharing of, distributed product information and knowledge.
• Another issue of the PLM community is the increasing need for product lifecycle knowledge capitalization and reuse in order to reduce time and cost.
• Database exploitation requires a good understanding of the database structure as well as of the data model, especially in a context where the data is heterogeneous and the links and dependencies among data are complex.

Classification corresponding to the BMI-LM data model
In this classification, the nature of data is repeated. In fact, image classes appear both as results and as input data. The same holds for processes, descriptions, etc. The low-level expression of the UML schema and the complex relations among classes in the classification also make it difficult for users to query the database. To overcome this issue, we build an ontology based on both the data model and the classification. This ontology gives a logical representation of information in Bio-Imaging: it provides an overview of the concepts in the data model and of the relationships among them, now expressed in natural language, and therefore allows end-users to create queries close to their own reasoning.

Data access using ontology
The concept of "ontology" has long been used in different communities. In the area of computer science, "an ontology is a special kind of information object or computational artifact" [9]. In 1993, Gruber [8] defined an ontology as an "explicit specification of a conceptualization", while four years later Borst [2] defined it as a "formal specification of a shared conceptualization". This definition implies that the conceptualization should be machine-readable (formal format) and should express a view shared between several parties, that is, a consensus rather than an individual view. Merging Gruber's and Borst's definitions, Studer [21] stated that: "An ontology is a formal, explicit specification of a shared conceptualization". The use of an ontology brings some benefits:
• Support communication and cooperation among systems: an ontology enables interoperability and the integration of heterogeneous data sources.
• Enable knowledge sharing and knowledge reuse.
• Enable content-based access and provide automated, machine-based services (the key component of the Semantic Web).
Data access using an ontology, or Ontology-Based Data Access (OBDA), is a new paradigm for data integration and for accessing data sources with complex structure [13]. It aims to provide end-users with semantic access to databases by using a three-level architecture [14] containing:
• The conceptual layer (ontology layer),
• The data sources,
• The mapping between the ontology and the data sources.
The ontology acts as a mediator between the user and the data source. It aims to provide the user with a semantic representation of the data sources by using a set of concepts in the domain of interest and all the relations among them. The data sources are the repositories of data stored in relational or non-relational databases. The mapping layer maps the domain concepts to the data sources.
The queries formulated by users using concepts of the ontology are translated into queries on the data sources by using this explicit mapping (Fig 7).
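To make the role of the mapping layer concrete, the translation step can be sketched as follows. This is a minimal sketch under stated assumptions, not the system described here: the concept name `subject`, the table `study_subject` and the column names are hypothetical, and a real OBDA mapping would cover joins across several sources.

```python
# Hypothetical mapping layer: ontology concept -> table and column names.
MAPPING = {
    "subject": {
        "table": "study_subject",
        "attributes": {"gender": "subj_gender", "age": "subj_age"},
    },
}

ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">="}


def translate(concept, conditions):
    """Translate conditions expressed on ontology terms into a
    parameterised SQL string plus its list of parameters."""
    entry = MAPPING[concept]
    clauses, params = [], []
    for attribute, operator, value in conditions:
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator}")
        column = entry["attributes"][attribute]  # ontology term -> column
        clauses.append(f"{column} {operator} ?")
        params.append(value)
    sql = f"SELECT * FROM {entry['table']}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# The user only manipulates ontology vocabulary ("subject", "gender", "age").
sql, params = translate("subject", [("gender", "=", "male"), ("age", "<", 45)])
print(sql)
print(params)
```

The user-facing vocabulary never mentions table or column names; only the mapping dictionary has to change when the underlying schema evolves.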

Architecture of OBDA systems
These principles are used to define an ontology-based system to support data access in Bio-Imaging at the GIN lab. Before describing the architecture of this system, let us define the ontology used.

Bio-Imaging Ontology definition
Many ontologies have been defined in medicine, for instance in oncology and neurology [6].
However, few works study the representation of bio-imaging. Gibaud et al. [7] define concepts used in Bio-Imaging such as Dataset, Processing, Investigators, Medical Image files, Equipment, Subjects, etc. (Fig 8).
Conceptual tree of GIN lab ontology

Ontology -Data link
The low-level concepts of the ontology have to be linked to data in the database, or at least respect the variable names of these data. An inference engine can help to build a data query and generate links by propagating relations between concepts (Fig 11).
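The idea of propagating relations between concepts can be illustrated with a simple graph traversal. The sketch below uses a plain breadth-first search rather than an inference engine, and the concept names and relations are hypothetical; it only shows how a chain of links between two concepts could be recovered.

```python
from collections import deque

# Hypothetical relations between ontology concepts (directed edges).
RELATIONS = {
    "Subject": ["Exam"],
    "Exam": ["Acquisition"],
    "Acquisition": ["Processing"],
    "Processing": ["ImageResult"],
    "ImageResult": [],
}


def find_path(graph, start, goal):
    """Breadth-first search: return the shortest chain of concepts
    linking start to goal, or None if the two are not connected."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


path = find_path(RELATIONS, "Subject", "ImageResult")
print(" -> ".join(path))
```

Each step of the returned chain corresponds to one link that the generated query has to follow in the database.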
To generate a query on the database, relations between concepts and data types are defined.
For each concept, several data types are linked. For instance, the acquisition protocol concept is related to the different protocols of the magnetic resonance imaging acquisition process.
These links are stored in an XML file that serves as an interface between the inference engine and the database management tool (TeamCenter at the GIN lab).
A query system is then developed in order to provide user-friendly data querying.
Links between Ontology and database model.

Ontology and classification mapping.
A mapping table between ontology concepts and classes is required, since only the classes of the data classification are connected to data. Using this mapping table, the query formulated by users with the vocabulary of the ontology is translated into one that is understandable and executable by a Query Processor.
One concept can be connected with many classes and vice versa. To simplify the mapping, we first used visual and interactive tools (FreeMind, http://freemind.sourceforge.net, for instance) to map classes to concepts. The mapping was then transformed into XML and integrated into the query system. Fig 12 illustrates the mapping between the concept "image-acquisition-protocol" of the imaging ontology and all leaf classes in the branch "AcquisitionDefinition-Branch/ Imaging" of the classification of data.
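A many-to-many mapping table of this kind, and its serialization to XML, can be sketched as follows. The XML element names (`mappings`, `concept`, `class`) and the leaf class names are hypothetical, since the actual file format and class names are not given here; only the concept name "image-acquisition-protocol" comes from the text.

```python
import xml.etree.ElementTree as ET

# Concept -> classes. The class names are placeholders, not the real leaf
# classes of the classification.
concept_to_classes = {
    "image-acquisition-protocol": ["ProtocolClassA", "ProtocolClassB"],
    "processing": ["ProtocolClassB"],
}


def mapping_to_xml(mapping):
    """Serialize the mapping table to an XML string."""
    root = ET.Element("mappings")
    for concept, classes in mapping.items():
        concept_el = ET.SubElement(root, "concept", name=concept)
        for class_name in classes:
            ET.SubElement(concept_el, "class", name=class_name)
    return ET.tostring(root, encoding="unicode")


def xml_to_mapping(xml_text):
    """Read the mapping table back from its XML form."""
    root = ET.fromstring(xml_text)
    return {
        concept_el.get("name"): [c.get("name") for c in concept_el.findall("class")]
        for concept_el in root.findall("concept")
    }


def classes_to_concepts(mapping):
    """Invert the table, since one class may serve several concepts."""
    inverse = {}
    for concept, classes in mapping.items():
        for class_name in classes:
            inverse.setdefault(class_name, []).append(concept)
    return inverse


xml_text = mapping_to_xml(concept_to_classes)
restored = xml_to_mapping(xml_text)
inverse = classes_to_concepts(restored)
```

Keeping the table in both directions lets the query system go from a user's concept to the classes holding data, and from a result class back to the concepts to display.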

Ontology based query system
Through the ontology tree and the ontology graph, the ontology-based graph query interface helps users build a query more easily: they can understand the relationships among concepts and directly choose query parameters.
Users can also choose a query in the query history to re-execute, modify or complete it. When a user completes a query, our system proceeds as follows:
1. Identifying node links, following relations in the ontologies.
2. Generating an output query in a format understandable and executable by a Query Processor, an XQuery engine for example.
With the support of these graphs, users understand the relationships among data in the database and define the conditions of the query according to their purpose. We take here an example of a query frequently used by scientists at the GIN lab: "Querying about Brain image results from Retin Treatment using Dash protocol corresponding to men less than 45 years old, left-handed and passed exam before January 2013". If a researcher tries to define an SQL query on the PLM database for this question without using the ontology, he or she has to know the relations between the Study Subject, Exam Results, Processing Definition and Processing Results data in the PLM database. The names of objects in the database are computer-oriented, and the researcher has to follow all the links in order to identify the ones corresponding to the query, which can then be:

SELECT Image Results FROM Processing Results
WHERE Processing Protocol = "Dash" AND Processing Treatment = "Retin"
AND FROM Acquisition Result WHERE Acquisition Date < "January 2013"
AND FROM Study Subject WHERE Study Subject Gender = "men" AND Study Subject Handedness = "left-handed" AND Study Subject Age < "45".
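The query above is schematic. To show what the researcher would actually have to write against a relational store, here is a runnable sketch using SQLite. The table names, column names, foreign keys and sample rows are all assumptions made for illustration; the real PLM schema is not given in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical three-table schema standing in for the PLM objects.
conn.executescript("""
CREATE TABLE study_subject (
    id INTEGER PRIMARY KEY, gender TEXT, handedness TEXT, age INTEGER);
CREATE TABLE acquisition_result (
    id INTEGER PRIMARY KEY,
    subject_id INTEGER REFERENCES study_subject(id),
    acquisition_date TEXT);  -- ISO dates compare correctly as text
CREATE TABLE processing_result (
    id INTEGER PRIMARY KEY,
    acquisition_id INTEGER REFERENCES acquisition_result(id),
    protocol TEXT, treatment TEXT, image_result TEXT);

INSERT INTO study_subject VALUES
    (1, 'male', 'left', 38), (2, 'male', 'right', 41), (3, 'female', 'left', 30);
INSERT INTO acquisition_result VALUES
    (10, 1, '2012-11-05'), (11, 2, '2012-06-20'),
    (12, 1, '2013-03-14'), (13, 3, '2012-02-01');
INSERT INTO processing_result VALUES
    (100, 10, 'Dash', 'Retin', 'img_100.nii'),
    (101, 11, 'Dash', 'Retin', 'img_101.nii'),
    (102, 12, 'Dash', 'Retin', 'img_102.nii'),
    (103, 13, 'Dash', 'Retin', 'img_103.nii');
""")

# The user must know every join to express the question.
query = """
SELECT p.image_result
FROM processing_result AS p
JOIN acquisition_result AS a ON a.id = p.acquisition_id
JOIN study_subject AS s ON s.id = a.subject_id
WHERE p.protocol = ? AND p.treatment = ?
  AND a.acquisition_date < ?
  AND s.gender = ? AND s.handedness = ? AND s.age < ?
ORDER BY p.id
"""
rows = conn.execute(
    query, ("Dash", "Retin", "2013-01-01", "male", "left", 45)
).fetchall()
print(rows)
```

Even in this reduced three-table form, the query requires knowledge of two foreign keys and of the internal column names, which is the burden the ontology-based interface is meant to remove.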

CONCLUSION
In this work, the problem of complex and heterogeneous data management is studied: how to search data without knowing the data structure, how to discover data defined by actors in an organization, and how to share knowledge and data in a user-friendly way. A general approach for ontological model construction and an ontology-based query interface are presented as a solution to the difficulties of querying a complex database. Data are organized around objects. For that, specific data organization systems such as PLM can be used. PLM helps to organize data as a description of the evolution of product parts. An architecture and a query system are developed in order to build links between data management and the semantic system. A use case in the Bio-Imaging domain has also been used to illustrate the abilities of the proposed interface.
As future work, we will focus on testing the proposed query interface with various query sets, in the Bio-Imaging domain and in engineering design (in PLM). The ontology tree and ontology graph must also be extended to cover all concepts of the Bio-Imaging domain. The ontology will be implemented with Semantic Web languages (RDF, SPARQL) in order to use an inference engine for information search. We also propose to build an alert system triggered when new data are added, with a semantic annotation of these data that shows the reason for their creation.
Fig 1. Example of data Excel File

Fig 2 illustrates the needs for knowledge sharing at the GIN lab.

Fig 4 presents the BMI-LM (Bio Medical Imaging Life cycle Management) data model used in the PLM "TeamCenter 9.1" [1]. By adopting PLM solutions in the context of Bio-Imaging, this PLM-oriented data model covers all stages of a BMI study, from specifications to publications, and enables flexibility in data management.

Fig 4. BMI-LM data model implemented in Teamcenter 9.1
BMI-LM contains three types of objects: Result objects (Exam, Acquisition, Data Unit, Processing), Definition objects (Exam, Acquisition, Data Unit, Processing) and Reference objects (Bibliographical, Data). "Definition" concepts are used in order to enable the reuse of data. For example, all the Processing results computed by using the same Acquisition device and the same Processing parameters can be attached to the same corresponding Processing definition. The classification (Fig 5) has been built based on the data model. BMI data have thus been classified into branches, classes and subclasses. The classification allows a specific class to be added to a generic item (an object in the data model). Compared with the data model, the classification and its attributes are easier for users to modify than object attributes; this suits the requirement for model flexibility and helps users take ownership of the database [1].

Fig 10. Conceptual graph of GIN lab ontology

Fig 11. Architecture of Knowledge sharing in PLM

Fig 12. Ontology concepts tree and the classification of data mapping.

3. Executing the output query on the PLM data file (.xml or .json format).
4. Results are then visualized as a graph and as data in the query interface.
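Steps 3 and 4 can be sketched on a small JSON export. This is only an illustration under assumptions: the record fields (`subject`, `protocol`, `treatment`, `image`) and the flat export layout are hypothetical, and the edge list stands in for whatever graph structure the interface actually draws.

```python
import json

# Hypothetical extract of a PLM export in JSON form.
export_text = """
[
  {"subject": "S1", "protocol": "Dash", "treatment": "Retin", "image": "img_100"},
  {"subject": "S2", "protocol": "Other", "treatment": "Retin", "image": "img_200"},
  {"subject": "S1", "protocol": "Dash", "treatment": "Retin", "image": "img_101"}
]
"""


def execute(records, conditions):
    """Step 3: keep the records that satisfy every field = value condition."""
    return [
        record for record in records
        if all(record.get(name) == value for name, value in conditions.items())
    ]


def to_edges(records, source_field, target_field):
    """Step 4: turn matching records into (source, target) graph edges."""
    return [(record[source_field], record[target_field]) for record in records]


records = json.loads(export_text)
matches = execute(records, {"protocol": "Dash", "treatment": "Retin"})
edges = to_edges(matches, "subject", "image")
print(edges)
```

An edge list of this form can be handed to any graph drawing component, so that selecting a subject node highlights the images linked to it.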

Fig 13. Ontology-based graph query interface
Fig 14 presents an extract of results represented in a graph.

Fig 14. Graphical representation of results. Click on a node to highlight all related nodes.