Data-Driven Process Discovery and Analysis

The lion’s share of process mining research focuses on the discovery of end-to-end process models describing the characteristic behavior of observed cases. The notion of a process instance (i.e., the case) plays an important role in process mining. Pattern mining techniques (such as traditional episode mining, i.e., mining collections of partially ordered events) do not consider process instances. In this paper, we present a new technique (and corresponding implementation) that discovers frequently occurring episodes in event logs, thereby exploiting the fact that events are associated with cases. Hence, the work can be positioned between process mining and pattern mining. Episode discovery has applications in, among others, discovering local patterns in complex processes and conformance checking based on partial orders. We also discover episode rules to predict behavior and discover correlated behaviors in processes, and we apply our technique to other perspectives present in event logs. We have developed a ProM plug-in that exploits efficient algorithms for the discovery of frequent episodes and episode rules. Experimental results based on real-life event logs demonstrate the feasibility and usefulness of the approach.
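The following is a rough illustration of what it can mean to exploit case information when counting episodes. The sketch makes several assumptions. An episode is taken to be a set of labeled nodes plus a partial order over them. Support is counted as the fraction of cases in which the episode can be embedded at least once. The names `occurs_in` and `episode_support`, the toy log, and this support definition are all hypothetical. This is a brute-force check, not the efficient discovery algorithm implemented in the ProM plug-in.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: an episode maps node ids to activity labels,
# plus a set of (u, v) pairs meaning "node u must occur before node v".
Nodes = Dict[str, str]
Order = Set[Tuple[str, str]]


def occurs_in(trace: List[str], nodes: Nodes, order: Order) -> bool:
    """Return True if the episode can be embedded in one case's trace.

    Each episode node is mapped to a distinct trace position carrying the
    same label, and every ordered pair (u, v) must satisfy pos[u] < pos[v].
    Brute-force backtracking: fine for small episodes, not for large logs.
    """
    node_ids = list(nodes)
    assignment: Dict[str, int] = {}

    def consistent(node: str, pos: int) -> bool:
        # Check ordering constraints against nodes that are already placed.
        for u, v in order:
            if v == node and u in assignment and not assignment[u] < pos:
                return False
            if u == node and v in assignment and not pos < assignment[v]:
                return False
        return True

    def place(i: int) -> bool:
        if i == len(node_ids):
            return True  # every node placed without violating the order
        node = node_ids[i]
        used = set(assignment.values())
        for pos, label in enumerate(trace):
            if label == nodes[node] and pos not in used and consistent(node, pos):
                assignment[node] = pos
                if place(i + 1):
                    return True
                del assignment[node]  # backtrack and try the next position
        return False

    return place(0)


def episode_support(log: Dict[str, List[str]], nodes: Nodes, order: Order) -> float:
    """Fraction of cases in which the episode occurs at least once (assumed notion)."""
    if not log:
        return 0.0
    hits = sum(occurs_in(trace, nodes, order) for trace in log.values())
    return hits / len(log)


if __name__ == "__main__":
    # Toy event log, invented for illustration: case id -> activity sequence.
    log = {
        "c1": ["a", "b", "c", "d"],
        "c2": ["a", "c", "b", "d"],
        "c3": ["a", "d"],
        "c4": ["b", "a", "c"],
    }
    # Episode: "a" precedes both "b" and "c"; "b" and "c" are unordered.
    nodes = {"n1": "a", "n2": "b", "n3": "c"}
    order = {("n1", "n2"), ("n1", "n3")}
    print(episode_support(log, nodes, order))  # 0.5 (cases c1 and c2)
```

Because the check is done per case, a pattern counts only once per case no matter how often it repeats inside that case. Case c2 still matches because the partial order leaves "b" and "c" unordered. A frequent-episode search could then keep only those candidate episodes whose support meets a chosen threshold.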


Preface
The rapid growth of organizational and business processes managed via information systems has made a wide variety of data available. This, in turn, has created a strong demand for more effective and valuable data analysis techniques. The second edition of the International Symposium on Data-Driven Process Discovery and Analysis (SIMPDA 2012) was conceived to offer a forum where researchers from different communities and industry could share their insights in this hot new field. The symposium featured a number of advanced keynotes illustrating new approaches, presentations on recent research, a competitive PhD seminar, and selected research and industrial demonstrations. The goal is to foster exchanges among academic researchers, industry, and a wider audience interested in process discovery and analysis. The event was organized jointly by IFIP WG 2.6 and WG 2.12.
Submissions covered theoretical issues related to process representation, discovery, and analysis, or provided practical and operational experiences in process discovery and analysis. To improve the quality of the contributions, the symposium fostered discussion during the presentations, giving authors the opportunity to improve their work by extending the presented results. For this reason, authors of accepted papers and keynote speakers were invited to submit extended articles to this post-symposium volume of LNBIP. There were 17 submissions, and 6 papers were accepted for publication.
In the first paper, "Lightweight RDF Data Model for Business Processes Analysis," Marcello Leida et al. present a lightweight data representation model that allows business process monitoring to be implemented transparently to the data creation process.
The second paper by Santiago Aguirre et al., "Combination of Process Mining and Simulation Techniques for Business Process Redesign: a Methodology Approach," addresses the problem of using simulation to organize process redesign. In particular, the simulation model is constructed from the results of process discovery and from waiting times calculated through a statistical analysis of the event log data.
The third paper by Wil van der Aalst et al., "Improving Business Process Models Using Observed Behavior," proposes a technique for aligning reference process models with observed behaviours, using five quality dimensions that are balanced against each other to enrich the reference model. In particular, this work analyses the effect of introducing the similarity dimension in addition to the other dimensions previously adopted in the literature.
The fourth paper by Sjoerd van der Spoel et al., "Process Prediction in Noisy Data Sets: A Case Study in a Dutch Hospital," applies classification algorithms to predict the outcome and duration of a process, with the objective of improving the ability of health care organizations to predict their cash flow.
The fifth paper by Andreas Wombacher et al., "Towards Automatic Capturing of Semi-Structured Process Provenance," discusses the problem of integrating provenance information with process monitoring systems, with the aim of increasing the automation of this procedure. The authors propose to achieve this by using the access logs of the files used in the execution of the process, and they demonstrate the analysis that can be performed with this approach.
The sixth paper by Jan Mendling, "Managing Structural and Textual Quality of Business Process Models," gives an overview of how empirical research informs structural and textual quality assurance of process models.
We gratefully acknowledge the strong research community that gathered around the research problems related to process data analysis and the high quality of their research work, which is hopefully reflected in the papers of this volume. We would also like to express our deep appreciation for the referees' hard work and dedication. Above all, thanks are due to the authors for submitting the best results of their work to the symposium on Data-Driven Process Discovery and Analysis.
We are very grateful to the Università degli Studi di Milano and to IFIP for their financial support, and to the University of Fribourg and the University of Athabasca.