OGDL4M Ontology: Analysis of EU Member States National PSI Law

Developers of Open Government Data mash-ups face the following legal barriers: different licenses, legal notices, terms of use and legal rules from different jurisdictions that are applied to open datasets. This paper analyzes the implementation of the Revised PSI Directive in EU Member States and highlights the resulting legal problems. It also analyzes how Public Sector Information is defined by national law and what requirements are applied to the datasets released by public sector institutions. The results show that PSI regulation differs widely among EU Member States and that the implementation of the Revised PSI Directive has not been successful. These problems limit the reuse of Open Government Datasets. The paper proposes an ontology for understanding the requirements that originate from the national law of EU Member States and that are applied to Open Government Datasets. The ontology also models the different implementations of the EU PSI Directive in the Member States.


Problem and motivation
Definitions and principles of open data and open government data were presented in our previous work [1]. This paper focuses on how technology could be used to deal with the differing regulation of one important subject: open government data (OGD).
In general, data is the fuel of Enterprise Information Systems. According to the Report [1], the EU economy could potentially grow by 1.9 per cent of GDP by 2020 as a result of reusing big and open data. In an ideal world the idea of Linked Open Data [2] could be realized easily, but the law and the regulation of data make this idea hard to accomplish in real life. Governments, municipalities and other public bodies release Public Sector Information (PSI) under different legal and technical conditions, which are unstable and create artificial barriers to benefiting from the re-use of information. The most valuable results from the use of open government data can probably be obtained when the data is merged, connected, combined, mixed, enriched or analyzed in other ways. However, legal problems prevent this from being done smoothly and stand in the way of the expected economic benefits.
Open data licenses (and other forms of regulation, such as legal notices and terms of use) are not unified. As a result, every developer has to analyze open data licenses in depth before starting to connect different datasets in a mash-up model. The results of the Survey of the Licensing of Open Government Data [3] revealed a critical situation concerning the regulation (licensing) regime: the datasets on national open government data portals are protected by a varying number of licensing regimes, ranging from 33 (Spain) and 16 (Germany, Italy) down to 1-2 (Austria, EC, Moldova, Portugal, UK).
Different licensing terms mean, first of all, that it is not clear whether the datasets can be merged or used for commercial purposes, whether any limitations apply to the protection of the mash-up work, and whether different Adapter's licenses can be used. The Survey [3] found that OGD portals contain datasets which indicate the wrong licensing regime, or do not indicate any licensing regime at all (it is not clear whether the link to the regulation is missing or no regulation is applied), or do not reproduce the rules that come from national PSI law. This situation creates a risk that a government (the owner of the OGD) could start legal proceedings against developers using OGD for violating the national PSI rules, even when the government itself has provided incorrect information about the licensing regime.
So how could the developers of Enterprise Information Systems which use OGD avoid investing in legal analysis of OGD regulation and reduce the risk of misinterpreting national law in a global environment? One possible solution is to force governments to withdraw all regulation of OGD; an alternative is a tool which provides legal analysis of OGD automatically, or at least semi-automatically.
We believe that it is possible to create such a tool. We decided to address the legal problems coming from EU Member States as follows: 1) we identified the general problems existing in the PSI domain of the EU (the object of regulation differs in national law; the PSI Directive and the Revised PSI Directive are not fully implemented); 2) we determined what specific legal requirements national PSI law applies to open government datasets; and 3) we tried to model those requirements in an ontology, aiming to create a useful tool for understanding the complexity of OGD regulation at EU level. This paper is organized as follows: 1) introduction to the problem and motivation; 2) analysis of the implementation of the Revised PSI Directive; 3) analysis of the national PSI law of EU Member States; 4) ontology for the legal requirements of OGD; 5) conclusions and future work.

Open Government Data: legal problems coming from EU in re-use of PSI domain
In the European Union, the philosophy of the re-use of public information and the main legal requirements applied to Open Government Data come from the PSI Directive. If the concept of the PSI Directive [2] (including the Revised PSI Directive [3]) worked as planned, legal problems concerning the re-use of open datasets would not exist. Unfortunately, the reality is different. The EU Commission still has a lot of work to do in order to change the existing opinion that the information held by a public institution is the property of the state and "no one can touch it". Our investigation has found that the development of the PSI concept supported by the EU Commission can be divided into: 1) the period before the PSI Directive was adopted; 2) the period of implementation of the PSI Directive (~2003/2005-2013/2015); 3) the period of revision of the PSI Directive in 2013 and its implementation. Before the PSI Directive was adopted, the concept of PSI developed in a decentralized way in EU member and pre-member countries. Every country had its own independent concept, which created a "Tower of Babel" effect. In 2003 the PSI Directive was published; it was to be implemented by 2005. The PSI Directive sets a minimum harmonisation of national rules and practices concerning PSI and its re-use. Implementation of the PSI Directive in the Community was not sufficiently successful, and the Directive was revised after 10 years. The Revised PSI Directive gives the EU Commission tools to control the implementation of the PSI Directive, and hopefully a united concept of PSI in the EU can be reached in the coming years, if the EU Commission uses those tools effectively.

Implementation of Revised PSI Directive
The survey investigated the national PSI laws of the Member States published on the Portal of the European Commission [4]. Some explanations of Table 1: 1) in Spain different charges for commercial re-use may apply, while the Revised PSI Directive does not allow such an option; 2) in Latvia re-use is allowed only for private individuals; 3) in Denmark charging principles are not applied; 4) in Hungary different terms for exclusive arrangements apply from 1 January 2016 instead of 17 July 2013, and Hungary exempts libraries, museums, archives and university libraries from the duty to provide information for re-use; 5) Finland has not implemented the PSI Directive because it had already implemented its own unique concept: PSI belongs to the public domain.

Analysis of National PSI Law
Having found that the implementation of the Revised PSI Directive was not successful, we continued with an analysis of national PSI law to get a clear view of the legal framework and to discover the differences that follow from OGD regulation.
We asked two questions to start the legal analysis of national PSI laws in EU Member States: 1) Is the object of investigation, public sector information, understood in the same way as it is defined in the EU PSI Directive? If not, how does it differ? 2) What are the legal requirements applied to OGD licensing?

Analysis of PSI term used in legal domain of EU Member countries
Analysis of the legal domain in the EU and its member countries indicates that the main problem is that the term "public sector information" is understood differently in EU member countries, while EU legislation tries to gather the different concepts into one united concept of PSI. In a wider approach, the PSI concept can be not only decentralized or united, but also direct or expanded. The direct concept covers what follows exactly from the term "public sector information" and includes the different forms of information managed by the public sector. The expanded concept supplements the direct concept with extra rules, exceptions and tasks. A good example of a direct PSI definition is the one published by the Organisation for Economic Co-operation and Development (OECD): public sector information is "information, including information products and services, generated, created, collected, processed, preserved, maintained, disseminated, or funded by or for the Government or public institution" [5]. The OECD definition is clear enough and describes PSI basically as all the information that a public institution holds. The EU PSI Directive represents the expanded form of the PSI concept and presents a slightly different concept of PSI (compared to the OECD), because it was developed from "the right to get access to public information"; it can be described briefly as information that is accessible to the public, can be re-used by the public and is held by a public institution. Over 10 years this concept has shifted from "can be re-usable" (in the PSI Directive, 2003) to "must be re-usable" (in the Revised PSI Directive, 2013). The term "information" has acquired an expansive meaning nowadays and is usually used as a synonym for data, records, documents, etc.
Erik Borglund and Tove Engvall investigated how the open data discourse is communicated in legal text and found that there is no single term; the principal words are record, information, document and data [6].
It is not surprising that these terminology problems reach the European Union and especially the legislation of its Member States, where the definition of public sector information (PSI) is understood differently.
In Directive 2003/98/EC (the PSI Directive) PSI is understood as a "document"; during the revision of the directive the definition was not changed, but the concept was expanded in Directive 2013/37/EU (the Revised PSI Directive). Implementation of the PSI Directive and the Revised PSI Directive in the EU Member States is still in progress, so the PSI definition is not yet harmonized across the national law of EU Member States.
The definition of a document is provided by the Directive in Article 2 Para 1 Sec 3: "'Document' means: (a) any content whatever its medium (written on paper or stored in electronic form or as a sound, visual or audiovisual recording); (b) any part of such content." [2] So basically, public sector information is understood as a document or part of a document, whatever its form or content. In the preamble of the Directive the term "document" is used as a synonym for information and also includes data. In legal interpretation the term "document" is more closely related to the legal responsibility of the institution or information holder than other terms such as "information" or "data". Also, the concept of "access to documents" comes from the "right to get information from the public sector", which was understood as a right to get certain specific documents. Secondly, after 10 years the PSI Directive was revised with the intention of further harmonizing the PSI definition in the Member States. The legislators of Directive 2013/37/EU (the Revised PSI Directive) noted: "since the first set of rules on re-use of public sector information was adopted in 2003, the amount of data in the world, including public data, has increased exponentially and new types of data are being generated and collected" (recital 5) [3]. "At the same time, Member States have now established re-use policies under Directive 2003/98/EC and some of them have been adopting ambitious open data approaches to make re-use of accessible public data easier for citizens and companies beyond the minimum level set by that Directive. To prevent different rules in different Member States acting as a barrier to the cross-border offer of products and services, and to enable comparable public data sets to be re-usable for pan-European applications based on them, a minimum harmonization is required to determine what public data are available for re-use in the internal information market, consistent with the relevant access regime" (recital 6) [3]. On the one hand, the legislators expressed their good will to harmonize "public data" (which affects the internal European information market) in the preamble of the Revised PSI Directive; on the other hand, no important changes were made to the definition in Article 2 of the PSI Directive, and only the concept of PSI was updated.
Thirdly, the PSI Directive 2003/98/EC has been implemented in all EU member countries and EEA countries (Iceland, Liechtenstein and Norway). The problem is that "EU Member States have implemented the PSI Directive in different ways. 13 Member States have adopted specific PSI re-use measures: Belgium, Cyprus, Germany, Greece, Hungary, Ireland, Italy, Luxembourg, Malta, Romania, Spain, Sweden, United Kingdom. 3 Member States have used the combination of new measures specifically addressing re-use and legislation predating the Directive: Austria, Denmark and Slovenia. 9 Member States have adapted their legislative framework for access to documents to include re-use of PSI: Bulgaria, Croatia, Czech Republic, Estonia, Finland, France, Latvia, Lithuania, Netherlands, Poland, Portugal, Slovak Republic." [4] Deeper investigation of the national law of EU Member States reveals differences in the PSI definition: countries define PSI as "document", "information", "data" or in other terms.
National definitions can be classified into: 1) those which use the same definition of PSI as the PSI Directive (Austria (including the lands of Vienna, Vorarlberg, Lower Austria, Tyrol, Styria, Salzburg and Upper Austria), Cyprus, Slovak Republic (from 2012), Greece (from 2006 till 2014), Luxembourg and Spain) and 2) those which have adopted a specific definition (all others).
They can also be classified into 4 groups: a document group (the definition of PSI is strongly related to a document), an information group (PSI is understood as some kind of information), a data group (PSI is understood as data, a record, a file, etc.) and an "other" group (PSI is understood as a representation of content, knowledge, matters and the like).
The document group can be divided into smaller parts:
1) Document: Austria (including the lands of Vienna, Vorarlberg, Lower Austria, Tyrol, Styria, Salzburg and Upper Austria), Cyprus, Slovak Republic (from 2012), Greece (from 2006 till 2014), Luxembourg and Spain use the same definition as the PSI Directive.
2) Documented information: Estonia defines PSI as information which is recorded and documented, which means that information which is not documented falls outside the scope of PSI; Latvia defines it as "documented information - information whose entry into circulation can be identified".
3) Administrative documents: France and Portugal define PSI as "administrative documents".
4) Documents, information and data: Greece (from 2014) implements the Revised PSI Directive and provides an updated conception of PSI: documents, information and data which are made available online as a dataset or via programming interfaces in an open machine-readable format which complies with open standards.
5) Documents, records and data: Ireland defines PSI as a document, meaning all or part of any form of document, record or data.
6) Document and any content: Romania defines PSI as a document, meaning any content or part of such content.
The information group can be divided into:
1) Information and metadata: the Czech Republic defines PSI as "publicly disclosed information", which also includes metadata, named "accompanying information".
2) Any information: Bulgaria defines PSI as any information collected or created by a public sector body.
3) Public information: PSI is defined as public information in the Netherlands and Poland (all information about public matters constitutes public information); the Slovak Republic (till 2012) used a very narrow definition of PSI limited to information about public money, state/municipality property and concluded agreements.
4) Information in the form of a document, case, register, record or other documentary material: Slovenia defines PSI as information originating from the field of work of the body and occurring in the form of a document, a case, a dossier, a register, a record or other documentary material drawn up by the body, by the body in cooperation with another body, or acquired from other persons.
5) Information means content: the UK (2015-2015) defines PSI as information, meaning any content or part of such content.
The data group can be divided into:
1) Data: Croatia defines PSI as any data owned by a public authority, which means that ownership of the rights to the data is important; Hungary (2005-2015) defines it as data of public interest and data made public on grounds of public interest.
2) Data collections: Denmark (from 2005) granted access not only to documents but also to data collections. An exception was made for information produced for the commercial activities of a public sector body, or in which third parties hold a non-material right. "Data collection" means registers or other systematic lists for which use is made of electronic data processing.
3) Files: Denmark (till 1985) granted access to files only if a) they were the substance of the authority's final decision on the outcome of a case; b) the documents contained only information that the authority had a duty to record; c) the documents were self-contained instruments drawn up by an authority to provide proof or clarity concerning the actual facts of a case; or d) the documents contained general guidelines for the consideration of certain types of cases.
4) Any record: Germany defines PSI as any record stored in any way.
The "other" group consists of:
1) Presentation and message: Finland defines PSI as a "written or visual presentation, and also as a message".
2) Presentation of acts, facts and information: Italy defines PSI as a document, meaning the presentation of acts, facts and information.
3) Any representation of content: the land of Vorarlberg (Austria) till 2015 defined PSI as any representation of content, or part of it, for which a public-sector body may decide whether to allow re-use.
4) Representation of acts, facts or information, and any compilation: Malta till 2015 defined PSI as a document, meaning any representation of acts, facts or information and any compilation of such acts, facts or information.
5) Knowledge: Lithuania provides that "document shall mean any information; information shall mean knowledge available to a State or local authority institution or body".
6) Known factual statements on matters: the lands of Carinthia and Burgenland (Austria) define PSI as factual statements on matters which are known to the body at the time of the request for information.
7) Matter or recording and compilation of information: Sweden defines PSI as a document, meaning any written or pictorial matter or recording which may be read, listened to, or otherwise comprehended only using technical aids; it also includes a compilation of information taken from material recorded for automatic data processing.
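The four-way grouping of national PSI terms can be held in a simple lookup table. The following is a minimal Python sketch, assuming an illustrative subset of the countries named in this section; the table `PSI_TERM_GROUP` and the helper functions are hypothetical names introduced here for illustration and are not part of OGDL4M.

```python
from collections import defaultdict

# Illustrative subset only: group labels follow the four-way classification
# in the text; this is not the full survey data.
PSI_TERM_GROUP = {
    "Cyprus": "document",
    "Luxembourg": "document",
    "Spain": "document",
    "Estonia": "document",      # "documented information"
    "France": "document",       # "administrative documents"
    "Bulgaria": "information",  # "any information"
    "Poland": "information",    # "public information"
    "Slovenia": "information",
    "Croatia": "data",          # "any data owned by a public authority"
    "Germany": "data",          # "any record"
    "Finland": "other",         # "presentation and message"
    "Lithuania": "other",       # "knowledge"
    "Sweden": "other",          # "matter or recording"
}


def countries_by_group(term_group):
    """Invert the country -> group table into group -> sorted country list."""
    groups = defaultdict(list)
    for country, group in term_group.items():
        groups[group].append(country)
    return {group: sorted(countries) for group, countries in groups.items()}


def same_group(a, b, term_group=PSI_TERM_GROUP):
    """True if two countries define PSI with a term from the same group."""
    return term_group[a] == term_group[b]
```

A mash-up tool could use such a table as a first, coarse signal that two datasets come from jurisdictions with differently scoped PSI definitions; it does not replace analysis of the definitions themselves.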
The analysis of definitions shows that most EU Member States use different terms to describe public sector information. From the open government data perspective it is less important which term is used, "document" or "data", than whether the definition sets extra limits which go beyond the scope of the PSI Directive.
Firstly, it is risky to limit the PSI definition to administrative documents or documented information, because public bodies hold plenty of information which is not an administrative document, a "document" or "documented information" in bureaucratic terms. For example, live traffic data from a municipality's sensors or cameras does not fit the requirements of administrative documents.
Secondly, tying the definition to ownership of information (e.g. information that belongs to a public sector institution) should also be avoided, because some works belong to the public domain and, according to the Revised PSI Directive, should be provided (e.g. by archives and museums) as public domain works. There are also discussions [7] in the open data community about whether PSI belongs to the public sector or to the public domain (because it was produced with public money).
Thirdly, it is a common mistake to define PSI as information given for re-use, e.g. "Document held by a public sector body: a "document" regarding which the public sector body is entitled to allow re-use" [8]. Limiting PSI to information which the institution provides for re-use should be avoided, because it limits the right to get access to information and the initiative to ask for new information which the institution does not yet provide. On the other hand, such a limitation is the right of each EU member country according to recital 9 of the PSI Directive: "This Directive does not contain an obligation to allow re-use of documents. The decision whether or not to authorise re-use will remain with the Member States or the public sector body concerned. This Directive should apply to documents that are made accessible for re-use when public sector bodies license, sell, disseminate, exchange or give out information." [2] Finally, implementation of the Revised PSI Directive changes PSI terminology, because the PSI concept was updated to include metadata, open and machine-readable formats, and the emerging understanding of what open data is. For example, Spain's PSI regulation from 2015 defines: "Document: All information or part thereof, whatever the medium or form of expression, whether textual, graphic, audio visual or audiovisual, including associated metadata and data content with the highest levels of accuracy and disaggregation." [9] There is hope that the implementation of the Revised PSI Directive will help the Community to adopt definitions of PSI that are constructed to support the open data concept, as Greece has done [10].

Analysis of the legal requirements applied to OGD licensing in national PSI law
In each country all public sector data which is released as Open Government Data (in other words, PSI ready for re-use) is regulated by national PSI law. Depending on the country, there may also be PSI laws of a land (e.g. the Wiener Informationsweiterverwendungsgesetz (WIWG)), a municipality or a public institution, but those laws follow the federal or national PSI regulation. Our analysis is limited to the main national PSI regulation.
The analysis has found that the legal requirements applied to OGD licensing differ among EU Member States. In most cases the differences are not significant and follow the rules of the EU PSI Directive, but there are some contrasting ones: in Spain a re-user of PSI can be fined up to 100000 EUR for violating the re-use policy, while in Croatia a public authority which prevents or restricts the exercise of the right of access to information and re-use of information can be fined up to 100000 HRK (~13000 EUR).
In order to make those requirements understandable in a machine-readable format, a first version of the ontology has been developed.

The Ontology of Open Government Data Licenses Framework for a Mashup Model (OGDL4M)
The Ontology of Open Government Data Licenses Framework for a Mashup Model (OGDL4M) is an OWL ontology that formalizes legal knowledge of the Open Government Data licensing framework in order to represent the legal requirements applied to open government datasets in a mash-up model. OGDL4M is still under development and we expect to present it by the end of 2016. This section describes the part of OGDL4M which is dedicated to the legal requirements for open government data licensing, the terms of use and the sanctions for violations which come from the national laws on re-use of public sector information (PSI) of EU Member States.

Merged ontologies
The OGDL4M ontology re-uses some elements of other ontologies:

Objective
The objective of this part of the ontology is to help create a theoretical model which can inspire an automatic or semi-automatic computational model representing the national PSI rules of EU Member countries, especially when the licensing regime is not clear or when conditions for re-use are not provided.

Formation of list of all the relevant terminology and production of glossary
We have developed a table in which we indicate the terms and provide the legal description, the legal source and a normalized definition.
Berne Convention §12: The act or process of modifying the content of the database.

Overview
OGDL4M consists of a core part, which presents the general concept, and other parts based on each country profile.
Fig. 1 presents a fragment of the core part of OGDL4M. The class LKIF:LegalSource stands for all possible regulatory sources which could apply to a dataset released by the public sector. For example, if an information system wants to evaluate which legal requirements (class ConditionsOfPSIReuse) apply to a dataset (class OpenGovDatasets), it must investigate all possible legal sources (class LKIF:LegalSource). Fig. 2 presents the fragment of the core part of OGDL4M which models how different national PSI regulations can be described. National PSI regulation provides rules which state whether the PSI re-use requirements are obligatory or only recommended, or whether some, all or none of those requirements are left unregulated by national law and must or could be regulated by local PSI law.
The class NationalPSILaw represents national PSI law, which is legally binding and sets the country's general legal rules applied to the conditions of PSI re-use. The class GeneralRequirements is a subclass of NationalPSILaw and represents those general rules. They can be obligatory (class ObligatoryGR) or only recommended (class RecommendedGR). Where the rules are obligatory, any contrary legal rules set on a dataset are not valid. For example, in Finland OGD can be released only as part of the public domain, so no other rules can apply to OGD released by a public institution in Finland, in particular no license which does not represent the public domain (like CC BY); if the licence is missing, it is clear that the dataset is part of the public domain.
In other cases, when national PSI regulation only recommends following some rules, PSI policy is usually delegated to a lower authority. The class SpecialRequirements is used to link to the local PSI law (of a land, municipality, institution or other public authority) and to indicate that the ontology can be used only to a limited extent (without deeper analysis) for the country profile in question.
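The distinction between obligatory and recommended general requirements can be sketched as a small decision procedure. The Python sketch below is an illustration under stated assumptions, not the OGDL4M implementation: only the names GeneralRequirements, ObligatoryGR and RecommendedGR come from the ontology, while `Binding`, `resolve_regime`, the regime labels and the `FINLAND` profile are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional, Tuple


class Binding(Enum):
    # Values mirror the ontology classes ObligatoryGR and RecommendedGR.
    OBLIGATORY = "ObligatoryGR"
    RECOMMENDED = "RecommendedGR"


@dataclass(frozen=True)
class GeneralRequirements:
    country: str                          # ISO 3166 code of the country profile
    binding: Binding
    allowed_regimes: FrozenSet[str]       # regimes the national law permits
    default_regime: Optional[str] = None  # applies when the licence is missing or contrary


def resolve_regime(req: GeneralRequirements,
                   declared: Optional[str]) -> Tuple[Optional[str], bool]:
    """Return (effective regime, whether local PSI law must still be analyzed)."""
    if req.binding is Binding.RECOMMENDED:
        # National law only recommends: the declared regime stands, but local
        # PSI law (class SpecialRequirements) has to be checked separately.
        return declared, True
    if declared in req.allowed_regimes:
        return declared, False
    # Obligatory rules override a missing or contrary licence.
    return req.default_regime, False


# Hypothetical profile for Finland as described in the text: public domain only.
FINLAND = GeneralRequirements(
    country="FI",
    binding=Binding.OBLIGATORY,
    allowed_regimes=frozenset({"public-domain"}),
    default_regime="public-domain",
)
```

With this profile, a Finnish dataset labelled with a CC BY licence, or with no licence at all, resolves to the public domain, which matches the Finnish example given for obligatory rules.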

OGDL4M model for the country profile
The legal requirements applied to OGD licensing in national PSI law are modeled by identifying which requirements are obligatory and which are recommended. Each requirement is presented together with its legal source (the specific part of the law), which is necessary for quickly cross-checking whether the norm is still valid. If there are sanctions for violating the PSI re-use policy, the class SanctioningRegime is used. In a country profile the ISO 3166 code is attached to the PSILaw, Jurisdiction and GeneralRequirements classes. Fig. 3 presents the OGDL4M model for Spain. The class PSILawES represents Spain's legally binding PSI law, the Law on the re-use of public sector information and its amendments [9]. The general requirements (class GeneralRequirementsES) are obligatory. The model states that OGD can either 1) be released without conditions or a license (class NoConditionsForReuse) or 2) be regulated only by a standard license. A standard license has a set of conditions: the license should be open, should not limit competition, should not restrict re-use, etc. According to the model there can be only two licensing regimes in Spain, but in reality we found 33 during the Survey.
Licensing regimes which do not follow the regulation of Spain's PSI law are not correctly applied. Fig. 4 presents the specific conditions for re-use. Those conditions are basically similar to the conditions of a non-derivative license (the content cannot be altered). This means that licensed OGD released by a public authority in Spain cannot be used in mash-ups. There is a conflict between the legal norm which requires that re-use of PSI not be limited and the norm which requires that PSI not be altered. The conditions which limit PSI re-use are backed by sanctions.
Fig. 5 explains the sanctioning regime. If OGD is released in Spain with a license, those sanctions apply; e.g. failure to indicate the date of the latest update of the information will cost the developer from 1000 to 10000 EUR.
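The Spanish country profile can be illustrated with a short check of a dataset's declared regime against the two regimes the model allows, together with the one sanction quoted above. This is a minimal Python sketch under assumptions: NoConditionsForReuse is a class name from the model, whereas `StandardLicense`, `Sanction`, `ES_SANCTIONS`, the violation key and `check_dataset_es` are hypothetical names, and only the single fine range stated in the text is encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sanction:
    violation: str
    min_eur: int
    max_eur: int


# The model allows two regimes in Spain. "StandardLicense" is a hypothetical
# label for the standard-license regime; NoConditionsForReuse is from the model.
ES_ALLOWED_REGIMES = {"NoConditionsForReuse", "StandardLicense"}

# Only the sanction quoted in the text is encoded; the key is hypothetical.
ES_SANCTIONS = {
    "missing_last_update_date": Sanction(
        "failure to indicate the date of the latest update", 1000, 10000
    ),
}


def check_dataset_es(regime, violations=()):
    """Return (regime_is_valid, applicable_sanctions) for a Spanish dataset."""
    regime_ok = regime in ES_ALLOWED_REGIMES
    sanctions = []
    if regime == "StandardLicense":
        # Sanctions back the license conditions, so they are looked up only
        # for datasets released under the license.
        sanctions = [ES_SANCTIONS[v] for v in violations if v in ES_SANCTIONS]
    return regime_ok, sanctions
```

A dataset declared under any of the other licensing regimes found on the Spanish portal would be reported as not following Spain's PSI law, which is the kind of inconsistency the country profile is meant to expose.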

Conclusions and future work
The legal analysis of the national PSI law of EU Member States has revealed the main problem: national law is not harmonized with EU law, which is why the situation differs in most EU countries and requires deeper analysis of the national legal domain. The OGDL4M ontology could be a very useful tool for evaluating a country's PSI policy, and could in the future be used for automatic or semi-automatic evaluation of the legal regulation of datasets released by the public bodies of EU Member countries. Moving forward, we expect to enrich the ontology and present the completed version of OGDL4M by the end of 2016.

Table 3. Example of the glossary