Greedy Heuristics for Automatic Synthesis of Efficient Block-Structured Scheduling Processes from Declarative Specifications

Abstract. This paper introduces a new Greedy heuristic algorithm for the automatic synthesis of block-structured scheduling processes that satisfy a given set of declarative ordering constraints, as well as basic theoretical results that support the correctness of this algorithm. We propose two heuristics that can be used with this algorithm: a hierarchical decomposition heuristic and a critical path heuristic. We also present initial experimental results supporting the effectiveness and efficiency of our proposed algorithm and heuristics.


Introduction
There are many formalisms for the specification of business process models. Block-structured models have certain advantages compared with other approaches [3].
It is useful and quite intuitive to declaratively specify desired properties of process models. We are interested in constructing process models that are consistent with a given declarative specification [6]. This problem has practical applications in scheduling tasks encountered in manufacturing systems [4].
Manual construction of large process models satisfying a set of ordering constraints is almost impossible, or at least not scalable. Automatic generation based on exhaustive exploration of the space of possibilities is difficult because of the huge number of potential candidates. The only feasible solution is to design automatic approaches based on efficient heuristic algorithms that are able to drastically prune the huge search space.
In this paper we focus on scheduling processes. In this case, the declarative specification defines the scheduling constraints. We are interested in determining optimal, or at least as efficient as possible, block-structured scheduling processes that satisfy the scheduling constraints. The optimization criterion requires the minimization of the total completion time. Optionally, we can add other constraints, such as imposing upper bounds on the amount of parallel work. This constraint may result from practical restrictions regarding the limited availability of certain resources. In particular: i) our processes are defined using only sequential and parallel composition; ii) each activity must have exactly one instance in the schedule.
Our work was mainly influenced by the previous results of [4,5]. Nevertheless, our results differ in many respects. Most importantly, our heuristics are deterministic and different: we use the hierarchical decomposition of a graph, while [4,5] are based on the more complex modular decomposition. We also provide theoretical results to support our work. Finally, we performed experiments with larger graphs, and our preliminary results suggest that our algorithm might be faster.
Note that there are many theoretical studies on evolutionary algorithms and randomized (meta/hyper) heuristics applied to combinatorial optimization problems [1]. Such works could be considered for the further expansion of our results by comparing our method with different, but related, approaches.

Process Models
Let us consider a finite nonempty set of activities Σ. A trace t ∈ Σ* is a sequence of zero or more activities. The length of a trace t = a₁a₂…aₙ is n, and this is denoted as |t| = n. The empty trace is denoted by ε and |ε| = 0. For each nonempty trace t = a₁a₂…aₙ we define: i) the head of t as head(t) = a₁, and ii) the tail of t as tail(t) = a₂…aₙ.
A language L ⊆ Σ* is defined as a set of traces. We can define certain operations with languages. The sequential composition of two languages L₁ and L₂, denoted by L₁ → L₂, is defined as follows:

L₁ → L₂ = {t₁t₂ | t₁ ∈ L₁ and t₂ ∈ L₂}

This notation can be extended for a trace t and a language L as: t → L = {t} → L. The parallel composition of two traces t₁ and t₂, denoted by t₁ ∥ t₂, is defined as:
- For each nonempty trace t we have: t ∥ ε = ε ∥ t = {t}
- For nonempty traces t₁ and t₂ we have: t₁ ∥ t₂ = head(t₁) → (tail(t₁) ∥ t₂) ∪ head(t₂) → (t₁ ∥ tail(t₂))

The parallel composition L₁ ∥ L₂ of two languages L₁ and L₂ is now defined as:

L₁ ∥ L₂ = ⋃ {t₁ ∥ t₂ | t₁ ∈ L₁ and t₂ ∈ L₂}

Let us consider the set {→, |, ∥} of three binary operators used for constructing block-structured processes. The operator → denotes sequential composition, the operator | denotes nondeterministic choice, and the operator ∥ denotes parallel composition.
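As a concrete reading of these operations, the following sketch represents traces as Python strings and languages as sets of strings; the function names `seq`, `shuffle`, and `par` are our own, not from the paper.

```python
def seq(L1, L2):
    """Sequential composition L1 -> L2: every trace of L1 followed by every trace of L2."""
    return {t1 + t2 for t1 in L1 for t2 in L2}

def shuffle(t1, t2):
    """Parallel composition of two traces: all interleavings that preserve
    the internal order of each trace."""
    if not t1:
        return {t2}
    if not t2:
        return {t1}
    return ({t1[0] + s for s in shuffle(t1[1:], t2)} |
            {t2[0] + s for s in shuffle(t1, t2[1:])})

def par(L1, L2):
    """Parallel composition of two languages: union of all pairwise shuffles."""
    return {s for t1 in L1 for t2 in L2 for s in shuffle(t1, t2)}
```

For example, `shuffle("ab", "c")` yields the three interleavings abc, acb and cab.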
Let us denote by a, b, c, … the activities of Σ and by P, Q, R, … process terms. Process terms can be defined recursively as follows:
- each activity a ∈ Σ is a process term;
- if P and Q are process terms then P → Q, P | Q and P ∥ Q are process terms.

The language L(P) of a process P is recursively defined as follows: L(a) = {a}, L(P → Q) = L(P) → L(Q), L(P | Q) = L(P) ∪ L(Q), and L(P ∥ Q) = L(P) ∥ L(Q). Operator ∥ has the highest precedence, operator → has middle precedence, and operator | has the lowest precedence. All operators are associative, while ∥ and | are also commutative.
Process terms represent models of processes and they can be graphically depicted as trees or as block-structured flowcharts, as shown in Figure 1. In what follows we focus on process models with the following particularities:
- They represent sets of possible activity schedules. A schedule must contain exactly one instance of each activity.
- They use only the sequential (→) and parallel (∥) operators. Scheduling processes are deterministic, which explains why nondeterministic choice is not used in their definition.
Rigorously defining scheduling processes requires the introduction of the support set supp(P) of a process P that denotes the set of activities that occur in process P.
A block-structured scheduling process is recursively defined as follows:
- If a is an activity then a is also a process such that supp(a) = {a}.
- If P and Q are processes such that supp(P) ∩ supp(Q) = ∅ then P → Q and P ∥ Q are processes with supp(P → Q) = supp(P ∥ Q) = supp(P) ∪ supp(Q).
For example, processes a ∥ c → b and a ∥ (c → b) are well-formed, and:

L(a ∥ c → b) = {acb, cab} and L(a ∥ (c → b)) = {acb, cab, cba}

It is not difficult to observe that if P is a well-formed block-structured scheduling process then all its traces t ∈ L(P) have the same length |t| = |supp(P)|.
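The support set and the disjointness condition on well-formed terms can be sketched as follows, with process terms encoded as nested tuples ('seq', P, Q) and ('par', P, Q); this encoding is our own illustration, not the paper's notation.

```python
def supp(P):
    """Support set of a process term: the activities occurring in P.
    Terms are an activity name (str), ('seq', P, Q), or ('par', P, Q)."""
    if isinstance(P, str):
        return {P}
    _, A, B = P
    return supp(A) | supp(B)

def well_formed(P):
    """A term is a block-structured scheduling process iff the supports of
    the two subterms are disjoint at every operator."""
    if isinstance(P, str):
        return True
    _, A, B = P
    return well_formed(A) and well_formed(B) and not (supp(A) & supp(B))
```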
Declarative Specification of Ordering Constraints

Activity Ordering Graph
Based on domain-specific semantics, one can impose ordering constraints on the activities of a process. For example, if two activities are independent and there are enough resources to be allocated to each of them, then those activities can be scheduled for parallel execution. However, if an activity depends on the output produced by another activity, then the first activity can be scheduled for execution only after the completion of the second activity, i.e. there is a sequencing constraint on their execution order. Finally, if two activities define distinct action options then their execution is incompatible, so they cannot occur within the same schedule, i.e. they are mutually exclusive.
The ordering constraints imposed on each trace of a scheduling process are declaratively specified using an activity ordering graph G = (V, E) [5] such that:
- V is the set of nodes and each node represents an activity.
- E ⊆ V × V is the set of edges. Each edge represents an ordering constraint. Set E is partitioned into two disjoint sets E→ and E# with the following meaning:
  • Set E→ specifies sequential ordering constraints. If (u, v) ∈ E→ then activity v cannot occur in a schedule without being preceded by activity u. E→ is a partial ordering, i.e. it is transitive and antisymmetric, so it cannot define cycles.
  • Set E# specifies mutual exclusion constraints. If (u, v) ∈ E# then activities u and v are incompatible, so they cannot occur within the same schedule. Set E# defines a symmetric relation.
Intuitively, satisfaction of mutual exclusion constraints requires the availability of the nondeterministic choice operator in process definitions. As we assumed that this operator is not available for scheduling processes, we will now focus only on sequential ordering constraints, i.e. we assume that E# = ∅ so E = E→. This means that the ordering graph is a directed acyclic graph with arcs defining sequential ordering constraints.
If t = a₁a₂…aₙ is a trace of a scheduling process and u, v are two activities of t, then u precedes v in t, denoted u →_t v, if there are 1 ≤ i < j ≤ n such that aᵢ = u and aⱼ = v. Let G = (V, E) be an ordering graph and let t be a trace containing all the activities of V with no repetition. Then t satisfies G, written as t ⊨ G, if and only if E→ ⊆ →_t. This means that trace t cannot contain activities ordered differently than as specified by G.
The language L(G) of an ordering graph G is the set of all traces that satisfy G, i.e.:

L(G) = {t | t contains each activity of V exactly once and t ⊨ G}

Let P be a scheduling process and let G = (V, E) be an ordering graph. P satisfies G, written as P ⊨ G, if and only if:
- L(P) ⊆ L(G), i.e. each trace of P satisfies G, and
- supp(P) = V, i.e. all the activities of V are relevant and occur in P.
The set of processes P such that P ⊨ G is nonempty, as it contains at least one sequential process defined by a topological sorting of G.
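Both the satisfaction check and the sequential fallback are straightforward to compute; a sketch (our own, with an ordering graph given as a node list and a set of arc pairs):

```python
def satisfies(t, E):
    """t |= G iff every arc (u, v) of E orders u before v in trace t."""
    pos = {a: i for i, a in enumerate(t)}
    return all(pos[u] < pos[v] for (u, v) in E)

def topological_trace(V, E):
    """Kahn's algorithm: one sequential schedule satisfying any acyclic G."""
    indeg = {v: 0 for v in V}
    for (_, v) in E:
        indeg[v] += 1
    ready = [v for v in V if indeg[v] == 0]
    t = []
    while ready:
        u = ready.pop()
        t.append(u)
        for (x, y) in E:          # release successors of u
            if x == u:
                indeg[y] -= 1
                if indeg[y] == 0:
                    ready.append(y)
    return t
```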

Optimal Scheduling Processes
Each activity has an estimated duration of execution, represented using a function d : Σ → ℝ₊. The duration of execution d(P) of a process P is defined as follows:
- d(a) is given by the function d for each activity a ∈ Σ;
- d(P → Q) = d(P) + d(Q);
- d(P ∥ Q) = max{d(P), d(Q)}.

The minimum duration of execution of a process that satisfies a given ordering graph G, denoted by d_MIN(G), is defined as:

d_MIN(G) = min{d(P) | P ⊨ G}

An optimal scheduling process that satisfies a given ordering graph G is a process P* with a minimum duration of execution, i.e. it satisfies P* ⊨ G and d(P*) = d_MIN(G). There is a finite and nonempty set of processes that satisfy an ordering graph G, so the optimal scheduling process trivially exists. Moreover, as there is an exponential number of candidate processes satisfying G, we postulate that the computation of the optimal scheduling process is generally an intractable problem. Therefore, we focus on developing efficient heuristic algorithms that are able to produce "suboptimal" or "good enough" scheduling processes using a reasonable computational effort.
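A natural reading of d(P), consistent with the value max{d(a) + d(b), d(c)} discussed for Figure 2, adds durations under sequential composition and takes the maximum under parallel composition; a sketch using the tuple encoding of process terms:

```python
def duration(P, d):
    """Duration of execution of a process term.
    Sequence adds durations; parallel takes the maximum.
    Terms are an activity name (str), ('seq', P, Q), or ('par', P, Q)."""
    if isinstance(P, str):
        return d[P]
    op, A, B = P
    dA, dB = duration(A, d), duration(B, d)
    return dA + dB if op == 'seq' else max(dA, dB)
```

With d(a) = 2, d(b) = 3, d(c) = 4, the process (a → b) ∥ c has duration max(2 + 3, 4) = 5, while (a ∥ c) → b has duration max(2, 4) + 3 = 7.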

Heuristics for Suboptimal Processes
We introduce two heuristics that are used to derive an efficient Greedy heuristic algorithm for computing a suboptimal scheduling process satisfying an ordering graph.

Hierarchical Decomposition Heuristic
Let G = (V, E) be an ordering graph. Recall that G is a directed acyclic graph defining the sequential ordering constraints imposed on a scheduling process.
- For each node v ∈ V we define the set I(v) of input neighbors of v as follows: I(v) = {u ∈ V | (u, v) ∈ E}.
- For each node v ∈ V we define the level l(v) of v as a function l : V → ℕ such that: l(v) = 0 if I(v) = ∅, and l(v) = 1 + max{l(u) | u ∈ I(v)} otherwise.
- The sets Vᵢ = {v ∈ V | l(v) = i} for 0 ≤ i ≤ m, where m is the maximum level, define a partition {V₀, V₁, …, Vₘ} of V, called the hierarchical decomposition of G.

Proposition 1. (Hierarchical Decomposition Process) Let G = (V, E) be an ordering graph. The hierarchical decomposition process P_HD(G) associated to G is defined as the sequential composition, in increasing order of levels, of the parallel compositions of the activities of each level:

P_HD(G) = (∥ v ∈ V₀) → (∥ v ∈ V₁) → ⋯ → (∥ v ∈ Vₘ)

Then P_HD(G) ⊨ G and its duration of execution d_HD(G) = d(P_HD(G)) is an upper bound of d_MIN(G), i.e. d_HD(G) ≥ d_MIN(G).

Figure 2 shows an ordering graph G₁ and two processes P₁ and P₂ such that P₁ ⊨ G₁ and P₂ ⊨ G₁. The hierarchical decomposition of G₁ is induced by the partition of its vertices {{a, c}, {b}}, so we can easily notice that P₁ is the hierarchical decomposition process of G₁. Observe that d(P₁) = max{d(a), d(c)} + d(b) and d(P₂) = max{d(a) + d(b), d(c)}.
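Assuming the level of a node is 0 when it has no input neighbours and otherwise one more than the maximum level of its input neighbours, the hierarchical decomposition can be sketched as:

```python
def levels(V, E):
    """Hierarchical decomposition: partition the nodes of an acyclic ordering
    graph by level, i.e. longest-path depth from the source nodes."""
    I = {v: {u for (u, w) in E if w == v} for v in V}  # input neighbours
    l = {}
    def lev(v):
        if v not in l:
            l[v] = 0 if not I[v] else 1 + max(lev(u) for u in I[v])
        return l[v]
    for v in V:
        lev(v)
    m = max(l.values())
    return [sorted(v for v in V if l[v] == i) for i in range(m + 1)]
```

For the graph G₁ with the single arc (a, b), this reproduces the partition {{a, c}, {b}} given in the text.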

Critical Path Heuristic
Observe that an activity u cannot start unless all the neighboring activities from the input set I(u) are finished. This time point is denoted by start(u). Activity u that started at start(u) will finish at time finish(u) = start(u) + d(u). The values start(u) and finish(u) for each activity u ∈ V can be computed using the critical path method [2], as follows: start(u) = 0 if I(u) = ∅, and start(u) = max{finish(v) | v ∈ I(u)} otherwise. The maximum value of the finishing time over all activities, known as the critical path length, is a lower bound for the duration of execution of the optimal scheduling process.
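A sketch of this computation, taking start(u) as the maximum finish time over I(u) (0 for source nodes):

```python
def critical_path_length(V, E, d):
    """Critical path method: finish(u) = start(u) + d(u), where start(u) is
    the maximum finish time over the input neighbours of u (0 if none).
    Returns the maximum finish time over all activities."""
    I = {v: {u for (u, w) in E if w == v} for v in V}
    finish = {}
    def fin(v):
        if v not in finish:
            start = max((fin(u) for u in I[v]), default=0)
            finish[v] = start + d[v]
        return finish[v]
    return max(fin(v) for v in V)
```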

Reducing the Duration of Execution
Analyzing Figure 2, we can observe that the duration of execution of the hierarchical decomposition process can be reduced by a transformation that pushes the parallel composition operations higher in the process tree. However, this transformation is not always possible. We now provide sufficient conditions that enable the transformation and guarantee that the duration of execution of the resulting process is lower than that of the original process. Referring to Figure 2, the key observation is that the set of nodes of graph G₁ can be partitioned into two subsets U₀ = {a, b} and U₁ = {c} such that there are no arcs cross-linking nodes in U₀ to nodes in U₁ or nodes in U₁ to nodes in U₀. Note that such a decomposition is not possible for the graph G₂ from Figure 2. We consider the most general situation of reducing the duration of execution of a process of the form (P₁ ∥ P₂) → (Q₁ ∥ Q₂); similar results can be obtained for processes of related forms. Proposition 3. Let P = (P₁ ∥ P₂) → (Q₁ ∥ Q₂) and let G = (V, E) be an ordering graph such that P ⊨ G. Let us also assume that there are no arcs of E linking activities of supp(P₁) ∪ supp(Q₁) with activities of supp(P₂) ∪ supp(Q₂). Then it follows that the process P′ = (P₁ → Q₁) ∥ (P₂ → Q₂) satisfies P′ ⊨ G and d(P′) ≤ d(P).

Automatic Synthesis Algorithm
Let G = (V, E) be an ordering graph and let U be the undirected graph obtained by removing the orientation of the arcs of G. We denote by G(W) and U(W) the subgraphs of G and U induced by a subset W ⊆ V of nodes.
Let {V₀, V₁, …, Vₘ} be the partition of the node set V defined by the hierarchical decomposition of G. We define the following sets of nodes: Wᵢ = V₀ ∪ V₁ ∪ ⋯ ∪ Vᵢ for each 0 ≤ i ≤ m, together with Gᵢ = G(Wᵢ) and Uᵢ = U(Wᵢ). Each undirected graph Uᵢ can be partitioned into connected components that induce a partition {Y₁, Y₂, …, Y_{kᵢ}} of the set Wᵢ of nodes; the interesting case is when kᵢ > 1. This situation is intuitively described in Figure 5.
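The component-based partition of a node set can be computed with a standard traversal of the undirected restriction; a minimal sketch:

```python
def components(W, E):
    """Connected components of the undirected graph induced on node set W
    by the arcs E (orientation ignored)."""
    adj = {v: set() for v in W}
    for (u, v) in E:
        if u in W and v in W:
            adj[u].add(v)
            adj[v].add(u)
    seen, parts = set(), []
    for v in W:
        if v not in seen:
            comp, stack = set(), [v]   # depth-first flood fill
            while stack:
                x = stack.pop()
                if x not in seen:
                    seen.add(x)
                    comp.add(x)
                    stack.extend(adj[x] - seen)
            parts.append(comp)
    return parts
```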
Following the result of Proposition 3, the hierarchical decomposition process P defined for subgraph Gᵢ can be transformed into a process P′ such that P′ ⊨ Gᵢ and d(P′) ≤ d(P). Consider for example the sample ordering graph G₃ from Figure 6. The partition of nodes corresponding to the hierarchical decomposition of G₃ induces the hierarchical decomposition process P₆, whose duration of execution is d(P₆) = 46.
We observe that for i = 1 the set W₁ = {a, b, c, d} can be partitioned into {{a, b}, {c, d}}, so k₁ = 2. Using this observation we determine the transformed process P₇. It follows that by applying our proposed transformation we were able to significantly reduce the duration of execution, from d(P₆) = 46 to d(P₇) = 36. We can combine this transformation with the hierarchical decomposition heuristic d_HD provided by Proposition 1 or with the critical path heuristic d_CP provided by Proposition 2 to design an efficient Greedy algorithm for the automatic synthesis of a suboptimal scheduling process that is consistent with a declarative specification.
Let G = (V, E) be an ordering graph. The algorithm can be defined as a function proc(W, G(W)) that takes a subset W ⊆ V of nodes and the subgraph G(W) of G induced by W, and returns a suboptimal process that satisfies G(W).
Let {V₀, V₁, …, Vₘ} be the partition of the node set V defined by the hierarchical decomposition of G. Function proc(V, G(V)) is recursively defined as follows:
- If m > 0 and V₀ has at least two elements then for each 0 ≤ i ≤ m determine the number kᵢ of sets of the partition of Wᵢ induced by the connected components of the undirected graph Uᵢ obtained from the directed graph Gᵢ. Let i be the largest index for which kᵢ > 1; such an index always exists, as k₀ = |V₀| > 1. Select an index 0 ≤ j ≤ i for which the estimated duration of execution of the "synthesized process" (to be defined in what follows) is minimized.
We now recursively define the "synthesized process" and its estimated duration of execution, in terms of function proc. Let G = (V, E) be an ordering graph, let {V₀, V₁, …, Vₘ} be the partition of the node set V defined by the hierarchical decomposition of G, and let us assume that m > 0 and |V₀| > 1. The "synthesized process" Pⱼ and its estimated duration of execution d_G-EST(Pⱼ), with EST ∈ {HD, CP}, are defined as follows:
- If 0 < j < m then let us consider the partition {Y₁, Y₂, …, Y_{kⱼ}} of Wⱼ. Then Pⱼ = (proc(Y₁, G(Y₁)) ∥ ⋯ ∥ proc(Y_{kⱼ}, G(Y_{kⱼ}))) → proc(Zⱼ, G(Zⱼ)), where Zⱼ = V \ Wⱼ, and its duration of execution is estimated as d_G-EST(Pⱼ) = max over 1 ≤ i ≤ kⱼ of {d_G-EST(Yᵢ)} + d_EST(G(Zⱼ)).

Experimental Evaluation
We implemented our algorithm in Standard C using the 64-bit GCC compiler, version 5.1.0, and tested it on an x64-based PC with an Intel(R) Core(TM) i7-5500U CPU at 2.40 GHz running Windows 10. In this section we present the experimental results that we obtained with this implementation. The experiment was organized as follows:
- We randomly generated a number of directed acyclic graphs of increasing sizes representing ordering constraints, as well as random durations of execution for each activity of the graph. The parameters of a data set are: the number n of graph nodes, the number ng of generated graphs, the minimum and maximum durations dmin and dmax of each activity, and the density factor f ∈ [0, 1] of the graph. The higher this factor, the denser the graph. The value of f is given as a percentage.
- For each graph G we estimated the basic metrics given by the hierarchical decomposition heuristic d_HD(G) and by the critical path heuristic d_CP(G).
- For each graph G we computed the suboptimal scheduling process that satisfies G using the Greedy heuristic algorithm proposed in Section 4.4, in two variants: using the hierarchical decomposition heuristic and using the critical path heuristic, respectively, to confirm the result claimed by Proposition 4 and to compare the results obtained for d_G-HD and d_G-CP.
The graph data sets were generated for the following values of the parameters: ng = 100, n ∈ {10, 50, 150, 300, 500, 700}, dmin = 1, dmax = 20, and density factor f ∈ {15%, 30%, 45%, 60%, 75%}. For each test we recorded the total execution time and the values of the metrics of interest. We labelled each data set to reflect its number of nodes and density; for example, if n = 500 and f = 30% then the label is 500-30. Table 1 presents the total execution time of running the synthesis algorithm for each data set. First, we observe that increasing the number of nodes, as well as the density, increases the execution time. Note that these times cover the processing of batches of 100 graphs; this means, for example, that the average time to process one graph of the 700-75 data set is approximately 1 second, i.e. our algorithm is quite fast. Second, the results of both experiments show that the CP heuristic performs better than the HD heuristic for almost all the graphs of the data set (there are a few exceptions, difficult to observe in the figures). Third, the heuristics CP and HD tend to give closer results for higher-density ordering graphs, as can be noticed by comparing the "closeness" of the cost values obtained for G-CP and G-HD for the data sets 700-30 and 700-60.
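The paper does not spell out the generator; one plausible scheme (an assumption on our part) samples each forward arc (i, j) with i < j independently with probability f, which guarantees acyclicity by construction:

```python
import random

def random_dag(n, f, dmin, dmax, seed=None):
    """Random ordering graph: nodes 0..n-1, each forward pair (i, j), i < j,
    becomes an arc with probability f (the density factor, as a fraction);
    integer durations are drawn uniformly from [dmin, dmax].
    Parameter names follow the paper; the sampling scheme is an assumption."""
    rng = random.Random(seed)
    V = list(range(n))
    E = {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < f}
    d = {v: rng.randint(dmin, dmax) for v in V}
    return V, E, d
```

Since every arc points from a smaller to a larger node index, the identity order is a topological order of the generated graph.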

Conclusions
We proposed a new Greedy algorithm for the automatic synthesis of block-structured scheduling processes that satisfy given declarative ordering constraints. We presented basic theoretical results that support the correctness of this algorithm. We proposed two heuristics that can be used with this algorithm: hierarchical decomposition and critical path. Our initial experimental results support the effectiveness of our proposals and suggest that the critical path heuristic performs better.

Fig. 1. Tree model of process a → b ∥ c (left) and its equivalent block-structured model (right)

Fig. 2. Ordering graph G₁ (left), process P₁ (middle) and process P₂ (right)

Clearly d(P₁) ≥ d(P₂), and P₂ is optimal (the other satisfying processes are strictly sequential, incurring a higher duration of execution). But note that if d(a) ≥ d(c) then d(P₁) = d(P₂) = d(a) + d(b), so the optimal scheduling process has duration d_HD(G₁), which shows that we can have equality in the inequality resulting from Proposition 1. However, if d(a) < d(c) the optimal scheduling process has duration d(P₂) = max{d(a) + d(b), d(c)} < d_HD(G₁) = d(c) + d(b).

Proposition 2. (Critical Path) Let G = (V, E) be an ordering graph and let d_CP(G) be its critical path length. Then d_CP(G) is a lower bound of the duration of execution of the optimal scheduling process d_MIN(G), i.e. d_MIN(G) ≥ d_CP(G).
- If j = m then Pₘ = proc(Y₁, G(Y₁)) ∥ ⋯ ∥ proc(Y_{kₘ}, G(Y_{kₘ})) and its duration of execution is estimated as d_G-EST(Pₘ) = max over 1 ≤ i ≤ kₘ of {d_G-EST(Yᵢ)}.

Proposition 4. (Duration of Execution of Greedy Suboptimal Processes) Let d_G-EST(G) be the duration of execution of the suboptimal process that was computed with the Greedy algorithm using heuristic EST ∈ {HD, CP}. Then this process satisfies ordering graph G and d_G-EST(G) ≤ d_HD(G).

Table 1. Total execution time in seconds for processing each data set