Sampling Method for the Flow Shop with Uncertain Parameters

In the classic approach to modelling optimization problems, well-defined parameters are assumed. However, in real-life problems we often find ourselves in a situation where parameters are not defined precisely. This may have many sources, such as inaccurate measurement, inability to establish precise values, randomness, inconsistent information, or subjectivity. In this paper we propose a sampling method for solving optimization problems with uncertain parameters modeled by random variables. Moreover, by applying confidence interval theory, the execution time has been significantly reduced. We also show an application of the method to the flow shop problem with due dates and parameters modeled by random variables with the normal distribution.


Introduction
Practical machine scheduling problems are numerous and varied. They arise in diverse areas such as flexible manufacturing systems, production planning, communication, computer design, etc. A scheduling problem consists in finding a sequence of jobs on given machines with the objective of minimizing some function. In a simpler version of the problem, flow shop scheduling, all jobs pass through all machines in the same order. In this paper we deal with a specific version of the problem, the permutation flow shop scheduling problem, where each machine processes the jobs in the same order ($F\|\sum w_i U_i$).
Research concerning scheduling problems refers mainly to deterministic models [1]. To solve such problems, which in most cases belong to the strongly NP-hard class, approximate algorithms are applied successfully [2], [6], [8]. They are mainly based on local optimization methods: simulated annealing, tabu search and genetic algorithms. Solutions determined by these algorithms differ only slightly from the best known solutions. However, in practice, during the realisation of a process (according to the fixed schedule), it often turns out that certain parameters (e.g. the task completion times) differ from the initial ones. If the solutions in the fixed schedule are not stable, a large error may occur, which makes such a schedule unacceptable. That is why it is necessary to construct models, and methods of solving them, that take into account potential changes of the process parameters during realisation and generate stable solutions [4], [9], [15].
Scheduling problems with uncertain data may be solved using methods based on elements of probability calculus [13], [14], [5]. In this work we deal with the flow shop scheduling problem with due dates and the minimization of the sum of costs of tardy tasks [7], [10]. On the basis of this problem we examine the robustness, to random disturbances of the parameters, of solutions constructed according to the tabu search metaheuristic.
In this study, the permutation flow shop scheduling problem, typical of the flexible production systems that occupy a very important place in modern production, is considered with due dates modeled as random variables.

Flowshop problem
Let $J = \{1, \ldots, N\}$ be a set of jobs to be executed on $M$ machines from the set $\mathcal{M} = \{1, \ldots, M\}$. At any given moment a machine can execute at most one job, and all jobs need to be executed without preemption. Each job $j \in J$ needs to be executed in sequence on every machine: if a job is being executed on machine $k$ ($k = 2, 3, \ldots, M$), then it has already been executed on machine $k-1$. Jobs are executed in an order determined by a permutation, with the constraint that the same permutation is applied on all machines. The execution of a job on a machine is called an operation. In this section we consider the flow shop problem with due dates defined by a set $(p_{i,j}, w_i, \tilde{d}_i)$ ($i = 1, \ldots, N$, $j = 1, \ldots, M$), where $p_{i,j}$ are the processing times of the operations, $w_i$ are the job weights and $\tilde{d}_i$ are the job due dates, the latter defined as random variables with the distribution $N(d_i, c \cdot d_i)$.
Let $\Pi$ be the set of all permutations of the set $J$. For every permutation $\pi \in \Pi$ we define $C_{\pi(i),j}$ as the completion time of the job $\pi(i)$ on machine $j$ in reference to the permutation $\pi$.
The cost of execution of the operations determined by a permutation $\pi$ is as follows:

$$W(\pi) = \sum_{i=1}^{N} w_{\pi(i)} \tilde{U}_{\pi(i)},$$

where $\tilde{U}_{\pi(i)} = 1$ if $C_{\pi(i),M} > \tilde{d}_{\pi(i)}$ and $\tilde{U}_{\pi(i)} = 0$ otherwise. We consider the optimization problem where the goal is to find a permutation $\pi^* \in \Pi$ which minimizes the cost of execution of all operations.
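The completion times follow the standard flow shop recurrence $C_{\pi(i),j} = \max(C_{\pi(i-1),j},\, C_{\pi(i),j-1}) + p_{\pi(i),j}$, which yields the cost above once the due dates are realized. A minimal Python sketch (function and variable names are ours, not from the paper; due dates are taken as already-realized deterministic values):

```python
def flowshop_cost(perm, p, w, d):
    """Weighted number of tardy jobs for a permutation flow shop.

    perm -- processing order of the jobs (0-based indices),
    p[i][j] -- processing time of job i on machine j,
    w[i] -- weight of job i, d[i] -- (realized) due date of job i.
    """
    m = len(p[0])
    C = [0] * m                      # C[j]: completion time of the last
    cost = 0                         # scheduled job on machine j
    for i in perm:
        C[0] += p[i][0]
        for j in range(1, m):
            # a job starts on machine j only when both the machine is
            # free and its operation on machine j-1 has finished
            C[j] = max(C[j], C[j - 1]) + p[i][j]
        if C[m - 1] > d[i]:          # job i finishes after its due date
            cost += w[i]
    return cost
```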

Tabu search
Approximate algorithms are used mainly to solve NP-hard problems of discrete optimization. Solutions determined by these algorithms are found to be fully satisfactory (very often they differ from the best known solutions by less than a few percent). One realization of such methods is the tabu search algorithm, whose basic elements are:
• movement – a function which transforms one solution into another,
• neighborhood – a subset of the set of feasible solutions,
• tabu list – a list which contains attributes of a number of already examined solutions.
Let $\pi \in \Pi$ be a starting permutation, $L_{TS}$ a tabu list and $\pi^*$ the best solution found so far.

Movement and Neighborhood
Let $\pi = (\pi(1), \ldots, \pi(n))$ be any permutation from the set $\Pi$. By $\pi_l^k$ ($l = 1, 2, \ldots, k-1, k+1, \ldots, n$) we denote the permutation obtained from $\pi$ by swapping the positions of $\pi(k)$ and $\pi(l)$. We say, in such a case, that the permutation $\pi_l^k$ was generated from $\pi$ by a swap move $s_l^k$ (i.e. $\pi_l^k = s_l^k(\pi)$). Let $M(\pi(k))$ be the set of swap moves of the element $\pi(k)$; the set of all such moves is

$$M(\pi) = \bigcup_{k=1}^{n} M(\pi(k)).$$

The neighborhood of an element $\pi \in \Pi$ is the set of permutations

$$N(\pi) = \{ s(\pi) : s \in M(\pi) \}.$$

During the execution of the algorithm, permutations whose attributes are on the tabu list $L_{TS}$ are removed from the neighborhood.
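The swap move and the neighborhood it induces can be sketched as a small Python generator (a hypothetical helper, not from the paper; it enumerates each unordered pair of positions once, so for a permutation of length $n$ it yields $n(n-1)/2$ neighbors):

```python
def swap_neighborhood(perm):
    """Enumerate all swap moves s_l^k and the permutations they generate.

    Yields ((k, l), neighbor) for every pair of positions k < l.
    """
    n = len(perm)
    for k in range(n):
        for l in range(k + 1, n):
            neighbor = list(perm)
            # swap positions k and l to obtain the neighbor permutation
            neighbor[k], neighbor[l] = neighbor[l], neighbor[k]
            yield (k, l), neighbor
```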

The Tabu Moves List
To prevent a cycle from arising, some attributes of each movement are put on the list of tabu moves. The list is maintained as a FIFO queue. After performing a move $s_j^r \in M(\pi)$ (i.e. generating from $\pi \in \Pi$ the permutation $\pi_j^r$), the attributes of this move, i.e. the triple $(\pi(r), j, F(\pi_j^r))$, where $F$ is the cost function, are put on the tabu list $L_{TS}$.
Assume that we examine a move $s_l^k \in M(\beta)$ generating from $\beta \in \Pi$ a permutation $\beta_l^k$. If the list $L_{TS}$ contains a triple $(r, j, \Psi)$ such that $\beta(k) = r$, $l = j$ and $F(\beta_l^k) \leq \Psi$, then such a move is forbidden and is removed from the set $M(\beta)$.
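The FIFO tabu list and the forbidding rule above can be sketched in Python (the names and the list length 7 are illustrative assumptions, not values taken from the paper):

```python
from collections import deque

def is_tabu(tabu_list, job, pos, cost):
    """A move is forbidden when the list holds a triple (r, j, Psi)
    with job == r, pos == j and cost <= Psi."""
    return any(job == r and pos == j and cost <= psi
               for (r, j, psi) in tabu_list)

# FIFO tabu list of a fixed length; the oldest entry falls off
# automatically when a new one is appended.
tabu = deque(maxlen=7)
tabu.append((3, 1, 120))   # attributes (job, target position, cost) of a move
```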

Robustness
Because we consider an uncertain environment and the actual values are not known at the moment of the algorithm's execution, we need a way to measure the quality of solutions. We assume that we have a set of reference test data and two algorithms: the examined one and the reference one (the classic one in our paper). The verification scenario is as follows. For a specific test instance both algorithms propose solutions $\pi_p$ (examined) and $\pi_d$ (reference), which we expect to be robust. Then we generate a set of disturbed subinstances based on the test instance, and for every subinstance we calculate the cost of execution with reference to $\pi_p$ (cost $w_p$) and $\pi_d$ (cost $w_d$). We also calculate an "almost optimal" solution cost $w^*$ for the subinstance. Having that, we calculate a relative error for every subinstance, then aggregate the relative errors over all instances, which allows us to draw conclusions about the algorithm. We do that for both algorithms and compare the final values.
More formally, let us define the basic robustness coefficient as a relative distance between the examined and the reference solution: let $w$ be the cost of a "robust" solution ($w_p$ or $w_d$) and $w^*$ be the reference "almost optimal" solution cost. Then the relative error is

$$\delta(w) = \frac{w - w^*}{w^*} \cdot 100\%,$$

and it shows by how many percent $w$ is worse than $w^*$. In some scenarios we need to compare sets of values based on the disturbed data, so we propose an extension of the basic error definition. Consider $s$ disturbed data instances, let $w_1, \ldots, w_s$ be the cost values obtained by the examined algorithm and $w_1^*, \ldots, w_s^*$ be the reference cost values. Then we define the extended relative error as follows:

$$\Delta(w_1, \ldots, w_s) = \frac{1}{s} \sum_{i=1}^{s} \frac{w_i - w_i^*}{w_i^*} \cdot 100\%.$$

Let $\psi$ be a data instance and $D(\psi)$ a set of disturbed data subinstances obtained from $\psi$ based on the random variables $\tilde{d}_i$. Then we define the robustness $r(A, \psi)$ of the solution $\pi_{A,\psi}$ (obtained by the algorithm $A$ for the instance $\psi$) as the extended relative error computed over the set of disturbed data $D(\psi)$.
Let $\Omega$ be a set of test data for the examined problem. Then we define

$$r(A, \Omega) = \frac{1}{|\Omega|} \sum_{\psi \in \Omega} r(A, \psi)$$

as the robustness coefficient for the algorithm $A$ on the set of test data $\Omega$. The lower the value, the better the algorithm, i.e. the solutions obtained by the examined algorithm are more robust and random changes in the actual data do not significantly affect the final execution cost.
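The error measures can be sketched in Python, assuming (our reading of the definitions) that the extended error averages the per-subinstance relative errors:

```python
def relative_error(w, w_star):
    """How many percent the cost w is worse than the reference w_star."""
    return (w - w_star) / w_star * 100.0

def extended_relative_error(costs, ref_costs):
    """Mean relative error over s disturbed subinstances."""
    pairs = zip(costs, ref_costs)
    return sum(relative_error(w, w_star) for w, w_star in pairs) / len(costs)
```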

Sampling method
The idea of the method is as follows. In every iteration of the tabu search algorithm we test different candidate solutions from the neighbourhood to find the best one and improve the current global best solution. Assume an instance $(p_{i,j}, w_i, \tilde{d}_i)$ and that we examine a candidate solution, a permutation $\pi$. Because $\tilde{d}_i$ is defined as a random variable, we do not know the actual data that may occur. What we propose in the sampling method is to simulate this actual scenario by testing the candidate solution on a sample of disturbed data generated from $\tilde{d}_i$. We can describe the method in the following main steps:
1. For every job $i$ we generate realizations $d_i^k$ of the random variable $\tilde{d}_i$ for $k \in \{1, \ldots, l\}$. By that we get $l$ deterministic instances $(p_{i,j}, w_i, d_i^k)$.
2. For every deterministic instance the cost of the candidate solution $\pi$ is calculated. By that we obtain the sample $W_1^\pi, \ldots, W_l^\pi$.
3. We calculate the mean $\bar{x}$ and the standard deviation $s$ of the sample, which are used in the comparison by the tabu search; of course, the lower the better.
One can easily notice that in the above description the size of the sample, i.e. the value of $l$, is missing. We want $l$ to be as small as possible, yet still meaningful. To determine it we apply confidence interval theory with the standard significance level $\alpha = 5\%$. Note that we do not know the distribution of the sample $W_1^\pi, \ldots, W_l^\pi$; therefore we apply the following variant of the confidence interval half-width formula:

$$d = \mu_\alpha \cdot \frac{s}{\sqrt{l}},$$

where $l$ is the sample size (at least 30, so that the normal approximation applies), $\bar{x}$ is the sample mean, $s$ is the sample standard deviation and $\mu_\alpha$ is the value of a random variable $N(0, 1)$ under the condition $\Phi(\mu_\alpha) = 1 - \frac{\alpha}{2}$, which, according to the assumptions, gives $\mu_\alpha = 1.96$.
To sum up, the comparison criterion in the tabu search method needs to be extended by the following procedure:

3: Generate $l$ disturbed deterministic instances based on the random variables $\tilde{d}_i$.
4: For every new instance calculate the cost in the context of the candidate solution $\pi$. We obtain the sample $W_1^\pi, \ldots, W_l^\pi$.
5: Calculate the mean $\bar{x}$ and the standard deviation $s$ of the sample.
6: $z \leftarrow l$
7: if $d \leq 5\% \cdot \bar{x}$ or $l > N \cdot M$ then
8: return $(\bar{x}, s)$
9: else
10: $l \leftarrow l + 10\% \cdot l$
11: Go to point 3

A remark: the random sample does not need to be generated with every calculation of the comparison criterion. It is enough to generate it once for a specific data instance, and this is how it has been implemented in the computational experiments.
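The adaptive sample-size procedure above can be sketched in Python; `draw_cost` is a hypothetical callback that evaluates the candidate permutation on one freshly disturbed instance, and all names are ours:

```python
import math

def sample_size_mean(draw_cost, n_jobs, n_machines, l0=30, z=1.96):
    """Grow the sample until the confidence-interval half-width
    d = z * s / sqrt(l) drops below 5% of the mean, or until l
    exceeds n_jobs * n_machines (the N * M cap)."""
    sample = [draw_cost() for _ in range(l0)]
    while True:
        l = len(sample)
        mean = sum(sample) / l
        var = sum((x - mean) ** 2 for x in sample) / (l - 1)
        s = math.sqrt(var)                # sample standard deviation
        d = z * s / math.sqrt(l)          # 95% CI half-width
        if d <= 0.05 * mean or l > n_jobs * n_machines:
            return mean, s, l
        # enlarge the sample by 10% and re-check the stopping rule
        sample.extend(draw_cost() for _ in range(max(1, l // 10)))
```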

Computational experiments
In this section we describe the method for generating random data and evaluate the efficiency of the proposed method. The tabu search algorithm presented in Section 3 has been appropriately applied. As a reference algorithm we use a classic deterministic implementation of tabu search, which we compare with the adaptation of tabu search to the sampling method. Moreover, the following configuration has been applied: start permutation $\pi = (1, 2, \ldots, n)$, tabu list size $n$, number of algorithm iterations $n$.
In order to measure the efficiency of the proposed method we examine the computational complexity (by checking the sample sizes) and the robustness.

Test data
Both implemented algorithms have been examined on commonly used reference test data from [12], covering 90 instances with various numbers of jobs and machines. The due dates of every instance were turned into random variables for five values of the parameter $c$, and for every such random instance we generated 100 disturbed deterministic subinstances according to the distribution of the random variables $\tilde{d}_i$; in total we obtained $90 \cdot 5 \cdot 100 = 54000$ subinstances. The robustness coefficient has been determined for both algorithms and the results are presented in the next section.
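Generating one disturbed subinstance can be sketched as follows, assuming, per the model, that each due date is drawn from $N(d_i, c \cdot d_i)$ with $c \cdot d_i$ acting as the standard deviation; clamping at zero is our own assumption:

```python
import random

def disturb_due_dates(d, c, rng=random):
    """One disturbed subinstance: each due date d_i is redrawn from
    the normal distribution N(d_i, c * d_i)."""
    return [max(0.0, rng.gauss(d_i, c * d_i)) for d_i in d]
```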

Results
Before performing the complete set of tests we checked whether measuring the sample by the mean only is good enough, or whether it is worth introducing the standard deviation as well. It turned out that introducing the standard deviation has a negligible influence on the final result, so all the tests have been executed using the mean only.
We executed tests for the two main algorithms, but we examined the proposed method in more detail to get a better insight into the value it brings. Tables 1 and 2 and the following tables contain a complete summary of all tested variants with the main results. A quick observation leads to the conclusion that the proposed method gives much better results than the classic approach. Moreover, in all cases the results obtained by applying the sampling method are better, in the sense of statistical significance, than the results obtained the classic way. The only puzzling fact is that the larger the value of $c$ is, the smaller the advantage of the sampling method becomes. Another general observation is that for almost all test instances the sampling method gives better robustness than the classic approach. When the sample size is based on confidence interval theory, we obtain an advantage of 98.5%. The best result is for the sample size $N \cdot M$, where we get the value 99.4%. Even for a small sample size $0.03 \cdot N \cdot M$ the sampling method gives an advantage of 92.3%; we finally lose the advantage for a very small sample size $0.01 \cdot N \cdot M$.
Let us take a closer look at the relationship between the classic approach, the sampling approach with sample size $N \cdot M$, and the sampling approach with the sample size based on confidence intervals. The robustness level is shown in Figure 1. We can easily observe that for all values of the parameter $c$ the advantage of the sampling method is indisputable. We can also observe that the best robustness is obtained when the sample size is $N \cdot M$. Finally, let us discuss the relationship between the algorithms' results for different sample sizes with reference to the robustness level (Figure 2). We can see that within the range $[0.1, 1] \cdot N \cdot M$ the robustness levels are very close to each other. Only when the sample size gets smaller (below $0.05 \cdot N \cdot M$) does the robustness level get significantly worse.

Conclusions
In this paper we proposed a sampling method for solving optimization problems with uncertain parameters modeled by random variables. By applying confidence intervals we aimed to keep a good balance between the execution time and the robustness level. We presented an application of the method to the flow shop problem with due dates and parameters modeled by random variables with the normal distribution. Based on the performed computational experiments we can conclude that the proposed method gives substantially more robust solutions than the classic approach, and that by applying confidence interval theory we achieve the goal of keeping the balance between the execution time and the robustness level.