Using SPEM to Analyze Open Data Publication Methods

Open Data is a current trend in sharing data on the Web. Public sector bodies maintain large amounts of data that, if re-used, could be a source of significant benefits. Therefore, Open Government Data initiatives have been launched in many countries in order to increase the availability of openly licensed and machine-readable government data. Because Open Data publishers face various challenges, methods for publication of Open Data are emerging. However, these methods differ in focus, scope and structure, which might complicate the selection of a method that would suit the specific needs of an organization. In this paper we discuss the possible benefits of constructing Open Data publication methods from a meta-model and we use the Software and Systems Process Engineering Meta-Model version 2.0 to analyze similarities and differences in the structure of three Open Data publication methods.


Introduction
Open Data is data "that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike" [22]. Further details on what "open" means are provided by the Open Definition [21]. Legal and technical openness are the key aspects of ensuring reusability of data [19]. Legal openness is achieved by open licensing of data, i.e. by making data available under a license that permits its free re-use and redistribution. In order to minimize technical obstacles, Open Data should be made available for free download as a complete dataset in a machine-readable format. Re-use of data held by public sector bodies could be a source of social and economic value [1]. Although a number of countries have already launched Open Government Data initiatives, many important datasets remain closed [30]. Publishing Open Government Data can be a challenging task, and publishers often face organizational, legal, technical and other barriers [11], [29].
In order to help Open Data publishers overcome these barriers and to promote the recommended practices for publication of Open Data, various Open Data publication methods have been developed [23], [27], [28]. On the one hand, knowledge about how to open up data is being gathered; on the other hand, this knowledge is documented in different methods whose heterogeneity might make integrating their content difficult. Zuiderwijk et al. [31] also point out that the Open Data publication process should be standardized across an organization. Such standardization requires sharing information about the Open Data publication process across the organization.
In the software engineering domain practitioners also struggle with combining and integrating content about development processes, because the sources of this content are heterogeneous, and with providing development teams with access to a shared body of information about the development process [18]. This situation led to the development of the Software and Systems Process Engineering Meta-Model (SPEM), a conceptual framework and meta-model providing concepts that allow "modeling, documenting, presenting, managing, interchanging, and enacting development methods and processes" [18].
The goal of this paper is twofold: to discuss the possible benefits of constructing Open Data publication methods from a meta-model, and of using SPEM 2.0 in particular, and to analyze similarities and differences in the structure of three Open Data publication methods using the SPEM 2.0 meta-model elements. Based on this analysis we assess how the analyzed methods are constructed.
This paper is structured as follows. In the following section the term Open Data publication method is defined and examples of the existing methods are provided. Then the potential benefits of constructing an Open Data publication method from a meta-model in general and the benefits of using SPEM 2.0 in particular are discussed. Related work is described in the next section. In the following section a short overview of the SPEM 2.0 meta-model elements is provided. Then the results of the structural analysis of the three selected Open Data publication methods are presented. Conclusions are summarized at the end of this paper.

Open Data Publication Methods
Brinkkemper [3] provides definitions of the terms method, technique, tool and methodology in the information systems development domain. He defines a method as "an approach to perform a systems development project, based on a specific way of thinking, consisting of directions and rules, structured in a systematic way in development activities with corresponding development products", whereas he views a methodology of information systems development as "scientific theory building about methodical information systems development" [3]. He also points out that the term methodology is sometimes incorrectly used in place of method. We share Brinkkemper's view that the term methodology should be used to refer to the theory of methodical aspects of some particular field. Therefore, in this paper we use the term Open Data publication (ODP) method, which we broadly define as an approach to the publication of Open Data consisting of recommendations about what should be done or achieved when publishing Open Data or how it should be implemented.
A number of ODP methods have already been developed. For example, Project Open Data [23] provides guidance, tools and case studies in order to help agencies in the USA implement the Open Data policy. Socrata, a provider of solutions for publication of Open Data, also provides its own ODP method called "Open Data Field Guide" [28].
As of September 2016 a list of forty guides for implementation of the revised PSI (Public Sector Information) Directive (Directive 2003/98/EC [7] amended by Directive 2013/37/EU [6]) and for publication of Open Data has been collected during the Share-PSI 2.0 project [27]. This list contains both international and national ODP methods of European states. The national ODP methods are usually written in the local language of the particular country, and the list [27] also shows that they differ in which practices for publication of Open Data and PSI they recommend. These methods differ not only in language and content but also in format and structure. For example, the Open Data Handbook of Flanders [9] is a PDF document structured into chapters. On the other hand, the DCAT application profile implementation guidelines [5] are presented as web pages with a common structure.

Benefits of Constructing Open Data Publication Methods from a Meta-Model
Brinkkemper [3] introduced the term method engineering and he points out that metamodelling techniques are needed for design and evaluation of methods. Gonzalez-Perez et al. [8] argue that software development methods constructed from a metamodel "usually offer a higher degree of formalisation and better support for consistent extension and customisation, since the concepts that make their foundations are explicitly defined". Making data available for re-use requires adequate workflows [29]. These workflows could be set up by implementing a suitable ODP method. However, as the examples of the existing ODP methods indicate, these methods might differ in scope, focus or structure. This might complicate the selection of a method that would suit the needs of a particular Open Data publisher, or the search for compatible ODP methods in situations where more than one method needs to be applied.
Explicit definition of the concepts that the ODP methods are built from could make identification of the same or similar concepts across different ODP methods easier. This in turn could help the Open Data publishers in assessing, selecting and customizing the relevant ODP methods. Development and implementation of the ODP methods should therefore benefit from use of meta-models.
The Software and Systems Process Engineering Meta-Model [18] is an Object Management Group (OMG) specification. It tries to address some of the problems that organizations face when developing systems: lack of easy access to a shared body of information about the development process, difficulties in combining content from different sources describing methods and practices due to their different presentation and style, and difficulties in defining a systematic development approach that fits the specific needs of an organization. The primary focus of SPEM is software development processes, but it allows representing processes in other domains as well, which is demonstrated in the specification with a case study describing a process for investment clubs [18].
Representing the Open Data publication methods as SPEM method content and processes could bring Open Data practitioners benefits similar to those it brings to software development organizations. Possible benefits to the ODP methods resulting from the key SPEM 2.0 capabilities are summarized in table 1.
Table 1. Possible benefits to the ODP methods resulting from the key SPEM 2.0 capabilities
SPEM 2.0 capability: Ability to represent processes based on different lifecycle models and approaches.
Possible benefit: Standardized ODP method content and processes could be configured for use in specific projects or environments, e.g. ODP processes could be configured to be in line with the approaches of different types of Open Data publishers.
SPEM 2.0 capability: Plug-in mechanism that enables processes to be extended or customized without modifying the original content.
Possible benefit: Generally applicable recommendations for publication of Open Data could be extended or customized with specific guidelines, e.g. guidelines for publication of a specific category of data.
SPEM 2.0 capability: New processes could be assembled from reusable process patterns.
Possible benefit: Process patterns for implementing the recommendations provided by an ODP method could be developed. Open Data practitioners following the given ODP method could re-use the patterns in their processes.
SPEM 2.0 capability: Process components might be linked with inputs and outputs but the development team could be allowed to choose the appropriate activities and techniques.
Possible benefit: If appropriate, ODP methods could focus on the required or recommended outputs rather than on the activities of the Open Data publication process. Open Data practitioners might be allowed to select the most appropriate activities or techniques for achieving the outputs depending on the situation.

Related Work
Several authors discussed or used SPEM in various contexts. Bendraou et al. [2] compared six UML-based languages for software process modeling including SPEM 1.1 and SPEM 2.0. Henderson-Sellers [10] analyzed differences in granularity and ontologies of several standards including SPEM.
Martínez-Ruiz et al. [13] propose an extension to SPEM that would allow better modelling of software process variability. Rodríguez-Elias et al. [24] adapted SPEM for modelling and analysis of knowledge flows in software processes.
Moraitis and Spanoudakis [15] present the Gaia2JADE process for multi-agent systems development, which is described using the SPEM specification. Other examples of SPEM use can be found in the work of Brusa et al. [4], where a process for building a public domain ontology is based on SPEM, and in the work of Loucopoulos and Kadir [12], where the BROOD (Business Rules-driven Object Oriented Design) process is represented using SPEM. Saldaña-Ramos et al. [25] proposed a competence model for testing teams and represented it using SPEM.

SPEM 2.0 Meta-Model Elements
A key feature of SPEM is the separation of method content definitions from their application in the development process [18]. Method content represents libraries of reusable content, such as definitions of tasks, roles, tools or work products, that is independent of its application in a specific step of a development lifecycle. In SPEM, a process represents a specific way of performing some project, e.g. a software development project using a specific technology. Separating the reusable method content from the development processes allows defining various processes with their own lifecycles and work breakdowns that build upon the same base components providing recommendations about how to achieve the common development goals. SPEM also reflects the fact that projects are unique and allows configuration of the method content and processes to fit the needs of a specific project.
SPEM provides meta-model classes as well as the UML stereotypes (SPEM 2.0 UML 2 Profile) for representing elements of both method content and processes [18]. According to [18] the key method content elements are Task Definitions, Work Product Definitions, Role Definitions and Guidance.
Task Definition represents an assignable unit of work and it is assigned to specific Role Definitions [18]. A Task Definition could be broken down into Steps. Work Product Definition represents work products that are consumed, produced or modified by Task Definitions. Role Definition is "a set of related skills, competencies, and responsibilities of an individual or a set of individuals" [18]. Categories can be used to categorize the content into logical groups such as requirements management.
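The relationships among the method content elements described above can be illustrated with a minimal Python sketch. It is not part of the SPEM 2.0 specification or its UML profile; the class layout is a simplification, and the role, task and work product names are hypothetical examples from the Open Data domain.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class RoleDefinition:
    """A set of related skills, competencies and responsibilities."""
    name: str


@dataclass(frozen=True)
class WorkProductDefinition:
    """Something consumed, produced or modified by Task Definitions."""
    name: str


@dataclass
class TaskDefinition:
    """An assignable unit of work, optionally broken down into Steps."""
    name: str
    performed_by: List[RoleDefinition] = field(default_factory=list)
    inputs: List[WorkProductDefinition] = field(default_factory=list)
    outputs: List[WorkProductDefinition] = field(default_factory=list)
    steps: List[str] = field(default_factory=list)


@dataclass
class Category:
    """A logical grouping of method content elements."""
    name: str
    members: List[TaskDefinition] = field(default_factory=list)


# Hypothetical ODP method content; all names are illustrative assumptions.
coordinator = RoleDefinition("Open Data coordinator")
inventory = WorkProductDefinition("Data inventory")
plan = WorkProductDefinition("Open Data publication plan")

select_datasets = TaskDefinition(
    name="Select datasets for opening up",
    performed_by=[coordinator],
    inputs=[inventory],
    outputs=[plan],
    steps=["Identify candidate datasets", "Prioritize datasets"],
)
planning = Category("Publication planning", members=[select_datasets])

print(planning.name, "->", [task.name for task in planning.members])
```

Note that nothing in this sketch says when in a publication process the task is performed; that information belongs to the process elements.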
The key process elements are Activities and "use" elements for representing use of the method content elements in the context of a specific process. Activity represents a unit of work within a Process [18]. Activities can be nested to form breakdown structures. Although the Process has a distinct symbol in SPEM 2.0, it is represented by the Activity class in the SPEM 2.0 UML profile [18]. Therefore only the Activity is taken into account in the analysis described in the following section.
Task Use, Role Use and Work Product Use are specializations of the abstract Method Content Use element, which represents a use of a particular method content element in the context of some Activity. The Method Content Use element ensures the separation of the method content from a process and it allows overriding the method content elements with the specifics of the given process.
Role Use and Task Use instances are linked to the corresponding Activity instances with instances of the Process Performer, which can also be used to distinguish how a particular role is involved in the process, e.g. it can be used to present the RACI (responsible, accountable, consulted, informed) relationships [18]. Similarly, a Process Parameter links an Activity or a Task Use with a Work Product Use to indicate whether the Work Product Use is an input or an output of the Activity/Task Use or both. However, in the SPEM 2.0 UML profile the Process Parameter instances are not represented as classes but as associations with the ParameterIn (input), ParameterOut (output) or ParameterInOut (input and output) stereotypes.
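The process elements described above can be sketched in Python as follows. This is a simplified illustration under our own assumptions, not the SPEM 2.0 meta-model itself: method content elements are referenced only by name, the Process Performer and Process Parameter are stored directly on the Activity, and all activity, role and work product names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Direction(Enum):
    """Mirrors the ParameterIn / ParameterOut / ParameterInOut stereotypes."""
    IN = "ParameterIn"
    OUT = "ParameterOut"
    IN_OUT = "ParameterInOut"


@dataclass
class MethodContentUse:
    """Use of a method content element within an Activity."""
    definition_name: str   # name of the referenced method content element
    local_name: str = ""   # optional process-specific override

    @property
    def name(self) -> str:
        # The process-specific name, if given, overrides the definition name.
        return self.local_name or self.definition_name


class TaskUse(MethodContentUse):
    pass


class RoleUse(MethodContentUse):
    pass


class WorkProductUse(MethodContentUse):
    pass


@dataclass
class ProcessPerformer:
    role: RoleUse
    task: TaskUse
    kind: str = "responsible"   # e.g. one of the RACI values


@dataclass
class ProcessParameter:
    task: TaskUse
    work_product: WorkProductUse
    direction: Direction


@dataclass
class Activity:
    name: str
    sub_activities: List["Activity"] = field(default_factory=list)
    performers: List[ProcessPerformer] = field(default_factory=list)
    parameters: List[ProcessParameter] = field(default_factory=list)

    def outputs(self) -> List[str]:
        """Names of work products produced directly in this Activity."""
        return [p.work_product.name for p in self.parameters
                if p.direction in (Direction.OUT, Direction.IN_OUT)]


# Hypothetical process fragment; all names are illustrative assumptions.
catalogue = TaskUse("Catalogue dataset",
                    local_name="Register dataset in the national catalogue")
publisher = RoleUse("Data publisher")
record = WorkProductUse("Catalogue record")

cataloguing = Activity(
    name="Cataloguing",
    performers=[ProcessPerformer(publisher, catalogue, kind="responsible")],
    parameters=[ProcessParameter(catalogue, record, Direction.OUT)],
)
publication = Activity(name="Publication phase", sub_activities=[cataloguing])

print(cataloguing.outputs())
```

The `local_name` override illustrates how a use element can adapt a generic task to the specifics of one process without changing the reusable definition.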
Additional information about both the method content and the process elements could be provided by Guidance. In order to distinguish various types of guidelines Guidance can be classified with Kinds. SPEM 2.0 specification [18] also contains a Base Plug-in which provides instances of Kinds for Guidance as well as for Activity, Category, Work Product Definition and Work Product Relationship.

Analyzing Open Data Publication Methods Using SPEM
In this section we use the SPEM 2.0 meta-model to analyze the structure of three Open Data publication methods. First the analyzed ODP methods are briefly introduced, then the analysis approach is explained. Results of the analysis are discussed at the end of this section.

Analyzed Open Data Publication Methods
We selected three ODP methods in whose development we were involved because we are familiar with their structure and semantics. The following methods were analyzed: the Share-PSI 2.0 Best Practices, the COMSODE method [16] and the Czech OGD standards [14]. The COMSODE method [16] represents a process-oriented approach to publication of Open Data. Both the Share-PSI 2.0 Best Practices and the COMSODE method target an international audience and thus they provide no recommendations specific to a particular region. The Czech OGD standards [14] represent a national ODP method that should be followed by the public sector organizations in the Czech Republic.

Analysis Approach
None of the analyzed ODP methods is based on the SPEM meta-model. For each of these methods we identified the SPEM 2.0 elements that were considered appropriate to represent the content of the given ODP method based on their semantics. Only elements for which stereotypes are defined and summarized in Annex A of the SPEM 2.0 specification [18] were considered in the analysis. If the content of the analyzed ODP methods was described or represented in a way that is independent of the process, appropriate SPEM method content elements were chosen. If it was not possible to separate the content from the process, e.g. in cases where the description referenced a particular part of the process, the SPEM process elements were selected. The Czech OGD standards are represented as a set of web pages. Sometimes one page contained both process-independent and process-dependent content. In such cases more than one SPEM meta-model element was considered to represent the content.
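The selection rule described above can be expressed as a short Python sketch. It only illustrates the decision logic; the analysis itself was a manual assessment of semantics, and the `Fragment` structure, its field names and the example page are hypothetical.

```python
from dataclasses import dataclass
from typing import Set

# Method content element -> its process-side "use" counterpart in SPEM 2.0.
USE_COUNTERPART = {
    "Task Definition": "Task Use",
    "Role Definition": "Role Use",
    "Work Product Definition": "Work Product Use",
}


@dataclass
class Fragment:
    """A piece of an ODP method, e.g. a web page (hypothetical structure)."""
    title: str
    describes: str                  # key of USE_COUNTERPART
    process_independent: bool       # content usable outside the process
    process_dependent: bool         # content referencing a part of the process


def candidate_elements(fragment: Fragment) -> Set[str]:
    """Return the SPEM 2.0 elements considered suitable for a fragment."""
    elements = set()
    if fragment.process_independent:
        elements.add(fragment.describes)
    if fragment.process_dependent:
        elements.add(USE_COUNTERPART[fragment.describes])
    return elements


# A page mixing both kinds of content maps to more than one element.
page = Fragment(
    title="Cataloguing of datasets",   # illustrative title
    describes="Task Definition",
    process_independent=True,
    process_dependent=True,
)
print(sorted(candidate_elements(page)))
```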
Because all of the analyzed ODP methods contain guidance, we further analyzed what kind of guidance is provided by mapping it to the guidance kinds specified in the SPEM 2.0 Base Plug-in. SPEM 2.0 [18] also provides means for managing whole libraries of method content and processes, i.e. Method Plugins. However, this part of the SPEM 2.0 specification was not considered in the analysis because it focuses on the extensibility and variability mechanism rather than on the structure of the content.

Analysis Results
[Table 2: SPEM 2.0 meta-model elements identified in the analyzed ODP methods (rows include Step, Task Definition, Task Use, Work Product Definition and Work Product Use)]
Compared to the Share-PSI 2.0 Best Practices, the COMSODE method as well as the Czech OGD standards are constructed from a broader set of concepts. They specify not only what should be done in order to publish Open Data but also who should be involved and what the expected outcomes are in terms of work products. They both define a process for publication of Open Data that is broken down into phases and activities. The COMSODE method also specifies the recommended steps for achieving the specified tasks.
The COMSODE method, especially in annex 2 [17], clearly separates elements such as activities, phases or performers (roles) and links them with relationships (for example, activities and performers are linked with responsibility relationships using the RACI chart). The Czech OGD standards are also highly structured; however, the description of phases or individual activities sometimes presumes a certain sequence of work. The Czech OGD standards are intended to provide the recommended process that should be followed within the Czech public administration. The process orientation of the Czech OGD standards is therefore in line with this purpose. However, extracting knowledge applicable in other contexts would require separation of the method content from the process itself.
Table 3 summarizes kinds of guidance provided by the analyzed methods. The following kinds of guidance were not identified in the analyzed methods: Checklist, Estimate (metric kind), Estimation Considerations (metric kind), Estimating Metric (metric kind), Example, Report, Reusable Asset, Supporting Material and Roadmap.
As the name suggests, practices are the main kind of guidance provided by the Share-PSI 2.0 Best Practices. However, external sources are referenced as well; these were classified as SPEM whitepapers. The COMSODE method explains the concept of Open Data and provides a glossary of terms as well as a wide range of practices for conducting the tasks and activities. A Guideline provides "additional detail on how to perform a particular task or grouping of tasks" [18]. This additional detail on how Open Data should be published is provided by a reference internal directive that is a part of the Czech OGD standards. The Czech OGD standards also include reference Open Data publication plans that can be used as templates. Guidance on how to register datasets in the Czech National Open Data Catalogue is provided as well, which corresponds to the SPEM tool mentor element.
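Once guidance has been classified by kind, the methods can be compared with simple set operations, as the following Python sketch illustrates. The mapping contains only the kinds explicitly mentioned in the prose of this section and should not be read as a reproduction of the full table; treating the explanation of Open Data as a Concept and the glossary as a Term Definition is our assumption about the corresponding guidance kind names.

```python
from typing import Dict, List, Set

# Guidance kinds that were not identified in any analyzed method.
NOT_IDENTIFIED: Set[str] = {
    "Checklist", "Estimate", "Estimation Considerations", "Estimating Metric",
    "Example", "Report", "Reusable Asset", "Supporting Material", "Roadmap",
}

# Kinds mentioned in the prose only (partial, illustrative mapping).
# "Concept" and "Term Definition" are assumed names for the explanation of
# Open Data and the glossary of terms in the COMSODE method.
MENTIONED: Dict[str, Set[str]] = {
    "Share-PSI 2.0 Best Practices": {"Practice", "Whitepaper"},
    "COMSODE method": {"Concept", "Term Definition", "Practice"},
    "Czech OGD standards": {"Guideline", "Template", "Tool Mentor"},
}


def methods_providing(kind: str) -> List[str]:
    """Return the methods for which the given guidance kind is mentioned."""
    return sorted(m for m, kinds in MENTIONED.items() if kind in kinds)


def shared_kinds(first: str, second: str) -> Set[str]:
    """Return the guidance kinds mentioned for both methods."""
    return MENTIONED[first] & MENTIONED[second]


print(methods_providing("Practice"))
print(shared_kinds("COMSODE method", "Czech OGD standards"))
```

Comparisons of this kind become straightforward only after the guidance has been mapped to a common vocabulary, which is the role the SPEM 2.0 guidance kinds play in the analysis.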

Conclusions
Openly licensed machine-readable data could be a source of social and economic value [1], [29]. The Open Data movement is strong in the public sector domain and the release of data held by public sector bodies for re-use is sometimes even encouraged by legislative means such as the European PSI directive [6]. Methods that provide publishers with recommendations on how to overcome the problems commonly faced when publishing Open Data are emerging. Use of metamodels could help Open Data practitioners when assessing, selecting and customizing Open Data publication methods because the concepts that form the building blocks of these methods are more likely to be explicitly defined.
The Software and Systems Process Engineering Meta-Model [18] is a common metamodel for representing development methods and processes which is intended to make their development, maintenance and interchange easier. In this paper we analyzed the structure of three ODP methods by identifying the SPEM 2.0 meta-model concepts that were considered suitable for representing the content of the analyzed methods.
This paper presents results of ongoing research. In future research we will further assess the suitability of SPEM 2.0 as the meta-model for engineering of ODP methods. Zuiderwijk et al. [31] point out that multiple versions of the processes for publication of Open Data might be required for different types of data. Therefore we will also focus on the extension and variability mechanism offered by SPEM and its potential application for building bodies of information about publication of Open Data that could be shared and customized to fit the needs of specific organizations and the types of data they manage and publish.