A Study on Self-adaptation in the Evolutionary Strategy Algorithm

Abstract. Nature-inspired algorithms attract many researchers worldwide for solving the hardest optimization problems. One of the well-known members of this extensive family is the evolution strategy (ES) algorithm. To date, many variants of this algorithm have emerged for solving continuous as well as combinatorial problems. One of the more promising variants, the self-adaptive evolution strategy, has recently been proposed; it enables the self-adaptation of its control parameters. In this paper, we discuss and evaluate popular common and self-adaptive evolution strategy (ES) algorithms. In particular, we present an empirical comparison between three self-adaptive ES variants and common ES methods. To ensure a fair comparison, we test the methods on a number of well-known unimodal and multimodal, separable and non-separable benchmark optimization problems, for different dimensions and population sizes. The results of this experimental study were promising and have encouraged us to invest more effort in this direction.


Introduction
The most successful methods in global optimization are based on stochastic components, which allow escaping from local optima and overcoming premature stagnation. A famous class of global optimization methods is evolution strategies (ES). They are exceptionally successful in continuous solution spaces. ES belong to the most famous evolutionary methods for black-box optimization, i.e., for optimization scenarios where no functional expressions are explicitly given and no derivatives can be computed [1][2]. ES imitate the biological principle of evolution [3] and can serve as an excellent introduction to learning and optimization. They are based on three main mechanisms oriented toward the process of Darwinian evolution, which led to the development of all species. Evolutionary concepts are translated into algorithmic operators, i.e., recombination, mutation, and selection. This paper focuses on self-adaptation in evolution strategies. Self-adaptation has been gaining popularity primarily due to its flexibility in adapting to different fitness landscapes [8]. This method enables an implicit learning of mutation strengths in real-valued search spaces. Self-adaptation is based on a mutation operator that modifies the problem variables using the strategy parameters, searching the problem and parameter spaces simultaneously [8]. In line with this, the best values of both problem variables and strategy parameters survive during the evolutionary process.
Evolutionary algorithms (EAs) are intrinsically dynamic, adaptive processes [21]. Keeping the strategy parameters that control the behavior of these algorithms fixed during the run is therefore in contrast with the idea of evolution on which evolutionary computation (EC) is based. In line with this, the "evolution of evolution" was developed by Rechenberg [10] and Schwefel [6], where the strategy parameters are placed into the representation of individuals and undergo the operations of the variation operators. As a result, the values of the strategy parameters that modify the problem variables, and that are best adapted to the fitness landscape as determined by the fitness function of the problem being solved, survive the evolutionary search process. This simultaneous evolution of the strategy parameters together with the problem variables is known as self-adaptation in EC.

Evolution Strategies
Evolution strategies derive inspiration from the principles of biological evolution. We assume a population, P, of so-called individuals. Each individual consists of a solution or object parameter vector x ∈ R^n (the visible traits), further endogenous parameters, s (the hidden traits), and an associated fitness value, f(x). In some cases the population contains only one individual. Individuals are also denoted as parents or offspring, depending on the context. The population evolves in a generational procedure [4]. First, one or several parents are picked from the population (mating selection) and new offspring are generated by duplication and recombination of these parents; then, the new offspring undergo mutation and become new members of the population; finally, environmental selection reduces the population to its original size.

Pseudo code For Evolutionary Algorithms
Using these ideas, a computer algorithm can be developed that analyzes a problem and its data to reach an optimal solution, as shown in Fig. 1. First, an initial population P(t) is generated randomly and evaluated. Second, a mutation technique is applied to produce a new set of children P'(t). The fitness of those children is evaluated, and the next parent set P(t+1) is then chosen from the set of parents P(t) and children P'(t). The loop terminates when either the maximum number of iterations is reached or the desired solution is found. As outlined by Castro [17]:

  Initialize P(t)
  Evaluate P(t)
  While not Terminate do
    P'(t) = mutate P(t)
    Evaluate P'(t)
    P(t+1) = P'(t) or best of {P'(t) U P(t)}
    t = t + 1
  Loop
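The generational loop above can be sketched as a runnable Python function. This is a minimal comma-strategy variant with a fixed mutation strength; the domain bounds [-5, 5] and the default parameter values are illustrative assumptions, not taken from the paper's experiments.

```python
import random

def evolve(f, n, mu=15, lam=100, sigma=0.1, max_iters=200):
    """Minimal (mu, lambda)-ES loop: f is the objective to minimize,
    n the problem dimension, sigma a fixed mutation step size."""
    # Initialize P(t) randomly and evaluate it implicitly via selection.
    pop = [[random.uniform(-5, 5) for _ in range(n)] for _ in range(mu)]
    for _ in range(max_iters):
        # P'(t) = mutate P(t): each child is a Gaussian perturbation
        # of a randomly chosen parent.
        children = [[x + random.gauss(0, sigma) for x in random.choice(pop)]
                    for _ in range(lam)]
        # Evaluate P'(t); comma strategy: P(t+1) = best mu children only.
        children.sort(key=f)
        pop = children[:mu]
    return min(pop, key=f)
```

With self-adaptation absent, the fixed sigma limits final precision; the mutation operators discussed later replace this constant with evolving strategy parameters.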

Mutation And Parameter Control
Mutation introduces small, random, and unbiased changes to an individual. These changes typically affect all variables. The average size of these changes depends on endogenous parameters that change over time. These parameters, also called control parameters or endogenous strategy parameters, define the notion of "small", for example via the step size σ. In contrast, exogenous strategy parameters are fixed once and for all, for example the parent number µ. Parameter control is not always directly inspired by biological evolution, but it is an indispensable and central feature of evolution strategies [4]. With self-adaptive parameter control, the control parameters are encoded into the chromosomes and undergo the actions of the variation operators (e.g., crossover and mutation). Better values of the problem variables and control parameters have a higher chance to survive and to propagate their genetic material into the next generations. This phenomenon makes EAs more flexible and closer to natural evolution [5]. This feature was first introduced into ES by Schwefel [6]. Parameter control addresses only one side of parameter setting, where the strategy parameters are changed during the run. In contrast, when the parameters are fixed during the run, an optimal parameter setting needs to be found by the algorithm's developer. Typically, these optimal parameters are searched for during tuning. In general, the taxonomy of parameter setting according to Eiben and Smith [7] is as illustrated in Fig. 2.

Deterministic parameter control
Deterministic parameter control [23] means that the parameters are adjusted according to a fixed time scheme, depending explicitly on the number of generations t. It may be useful, for example, to reduce the mutation strengths during the evolutionary search in order to allow convergence of the population. There are many examples where parameters are controlled deterministically. Efrén Mezura-Montes et al. [22] make use of deterministic parameter control for differential-evolution-based constraint handling.

Adaptive parameter control
Adaptive parameter control methods make use of rules the practitioner defines for the heuristic. Feedback from the search determines the magnitude and direction of the parameter change. An example of adaptive control of endogenous strategy parameters is Rechenberg's 1/5-th success rule for the mutation strengths [10]. Rechenberg's rule adapts the mutation strengths in the following way: the whole population makes use of a single global mutation strength σ for all individuals. If the ratio of successful candidate solutions is greater than 1/5-th, the step size should be increased, because bigger steps towards the optimum are possible, while small steps would be a waste of time. If the success ratio is less than 1/5-th, the step size should be decreased. This rule is applied every g generations [3].
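The 1/5-th success rule can be written as a small update function. The multiplicative factor 0.85 is a common textbook choice and is an assumption here, not a value given in this paper.

```python
def one_fifth_rule(sigma, successes, trials, factor=0.85):
    """Rechenberg's 1/5-th success rule, applied every g generations.

    If more than 1/5 of the mutations in the observation window were
    successful, the step size is increased; if fewer, it is decreased.
    """
    rate = successes / trials
    if rate > 1 / 5:
        return sigma / factor   # bigger steps toward the optimum pay off
    if rate < 1 / 5:
        return sigma * factor   # steps too large; search more locally
    return sigma                # exactly 1/5: leave sigma unchanged
```

This is adaptive (feedback-driven) rather than self-adaptive control: the rule is fixed by the practitioner, and only its input comes from the search.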

Self-Adaptation
As noted in the introduction, self-adaptation has been gaining popularity primarily due to its flexibility in adapting to different fitness landscapes [8]. It enables an implicit learning of mutation strengths in real-valued search spaces: the mutation operator modifies the problem variables using the strategy parameters, so that the problem and parameter spaces are searched simultaneously [9]. In line with this, the best values of both problem variables and strategy parameters survive during the evolutionary process.
There are different strategy parameters in self-adaptive EAs, e.g., the probability of mutation pm, the probability of crossover pc, the population size Np, etc. In order to better adapt the parameter setting to the fitness landscape of the problem, the self-adaptation of strategy parameters has emerged; it is tightly connected with the development of self-adaptive evolution strategies (SA-ES) [10,11,9]. SA-ES have proven especially useful for solving continuous optimization problems [12].
Like every EA, the SA-ES consists of the following components [7]:
- representation of individuals,
- evaluation function,
- mutation,
- crossover,
- survivor selection.

Next Generation Selection Strategies
Two different selection processes were used to pick the next generation. In the standard evolution strategy literature, "given μ parents generating λ offspring (λ >= μ)" [17], the simplest way is the (μ,λ)-ES technique, where the new set of children is used as the parents for the next iteration; the algorithm in Fig. 1 would use P(t+1) = P'(t). A second technique is the (μ+λ)-ES technique [17]: the fitnesses of the parents and of the new children are sorted together from best to worst, and the next generation is created from the top μ best parents and children; the algorithm in Fig. 1 would use P(t+1) = best of {P'(t) U P(t)}.
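The two selection schemes differ only in which pool the survivors are drawn from, which makes them easy to contrast in code:

```python
def comma_selection(parents, children, mu, f):
    """(mu, lambda)-ES: P(t+1) = best mu of P'(t); parents are discarded."""
    return sorted(children, key=f)[:mu]

def plus_selection(parents, children, mu, f):
    """(mu + lambda)-ES: P(t+1) = best mu of {P'(t) U P(t)}; elitist."""
    return sorted(parents + children, key=f)[:mu]
```

Plus selection is elitist: the best solution found so far can never be lost. Comma selection can forget good parents, which sounds wasteful but helps a self-adaptive ES discard individuals whose strategy parameters have stagnated.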

Mutation Operators
In classical ES, three types of mutation operators have been applied: uncorrelated mutation with one step size, uncorrelated mutation with n step sizes, and correlated mutation using a covariance matrix [7]. In the first mutation, the same distribution is used to mutate each problem variable; in the second, different step sizes are used for different dimensions; while in the last, not only the magnitude in each dimension but also the direction (i.e., the rotation angles) is taken into account [13,14]. Self-adaptation of control parameters is realized in classical evolution strategies (ES) using appropriate mutation operators controlled by strategy parameters (i.e., mutation strengths) that are embedded into the representation of individuals. The mutation strengths determine the direction and the magnitude of the changes, on the basis of which the new position of the individuals in the search space is determined. This article analyzes the characteristics of classical mutation operators, namely uncorrelated mutation with one step size and uncorrelated mutation with n step sizes [15], and compares them with the newest approaches and improvements of self-adaptation strategies in the recent literature, such as correlation and hybridization.
As schematically illustrated in Table 1, parameters that yield good solutions with high probability are selected as survivors. This is quite reasonable, at least for local optimization. Concerning the selection of the set of parameters to be adapted, we chose three options from Bäck (1996) for our experiments:

Uncorrelated Mutation with One Step Size (Type 1)
As this algorithm loops through each iteration, all parents are mutated with the same single step size. Using the same standard deviation σ to create all children has the result that "lines of equal probability density of the normal distribution are hyper-spheres in an l-dimensional space" [17].
─ Chromosomes: ⟨x1, …, xn, σ⟩
─ σ' = σ · exp(τ · N(0,1))
─ x'i = xi + σ' · N(0,1)
─ Typically the "learning rate" τ ∝ 1/√n
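The type-1 update above translates directly into code. This is a minimal sketch of one mutation step; names and defaults are illustrative.

```python
import math
import random

def mutate_one_step(x, sigma):
    """Uncorrelated mutation with one step size (type 1).

    sigma' = sigma * exp(tau * N(0,1)),  with tau = 1/sqrt(n)
    x'_i   = x_i + sigma' * N_i(0,1)
    """
    n = len(x)
    tau = 1.0 / math.sqrt(n)
    # Mutate the strategy parameter first, then use it on the variables,
    # so the new sigma is immediately exposed to selection.
    sigma_new = sigma * math.exp(tau * random.gauss(0, 1))
    x_new = [xi + sigma_new * random.gauss(0, 1) for xi in x]
    return x_new, sigma_new
```

Note the order: σ is mutated before x, so that an individual's fitness reflects the quality of its new step size, which is what lets selection learn good mutation strengths implicitly.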

Uncorrelated Mutation with Individual Step Sizes (Type 2)
As this algorithm loops through each iteration, each parent is mutated with its own individual step sizes. Using a different standard deviation σi for each coordinate has the result that "lines of equal probability density of the normal distribution are hyper-ellipsoids" [17].
─ Chromosomes: ⟨x1, …, xn, σ1, …, σn⟩
─ σ'i = σi · exp(τ' · N(0,1) + τ · Ni(0,1))
─ x'i = xi + σ'i · Ni(0,1)
─ Two learning rate parameters: τ' (overall learning rate) and τ (coordinate-wise learning rate)
─ τ' ∝ 1/√(2n) and τ ∝ 1/√(2√n)
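A sketch of the type-2 update follows; the lower bound eps that keeps step sizes from collapsing to zero is a common safeguard and an assumption here.

```python
import math
import random

def mutate_n_steps(x, sigmas, eps=1e-8):
    """Uncorrelated mutation with n step sizes (type 2).

    sigma'_i = sigma_i * exp(tau' * N(0,1) + tau * N_i(0,1))
    x'_i     = x_i + sigma'_i * N_i(0,1)
    with tau' = 1/sqrt(2n) and tau = 1/sqrt(2*sqrt(n)).
    """
    n = len(x)
    tau_prime = 1.0 / math.sqrt(2 * n)
    tau = 1.0 / math.sqrt(2 * math.sqrt(n))
    shared = random.gauss(0, 1)  # the overall factor, drawn once per individual
    sigmas_new = [max(s * math.exp(tau_prime * shared + tau * random.gauss(0, 1)), eps)
                  for s in sigmas]
    x_new = [xi + si * random.gauss(0, 1) for xi, si in zip(x, sigmas_new)]
    return x_new, sigmas_new
```

The shared draw τ'·N(0,1) scales all step sizes together, while the per-coordinate draws let individual σi diverge, stretching the mutation ellipsoid along different axes.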

Correlated Mutation (Type 3)
As this algorithm loops through each iteration, each parent is mutated with its own individual step sizes. Just like the previous uncorrelated mutation with individual step sizes, the formulas use a different standard deviation σi for each coordinate. The correlated mutation adds the additional step of introducing rotation angles to "describe the coordinate rotations necessary to transform the uncorrelated mutation vector to a correlated mutation vector. Now the previous hyper-ellipsoids can be rotated randomly at angle α(t) to give the data more freedom of movement through the plane." [17]
─ Chromosomes: ⟨x1, …, xn, σ1, …, σn, α1, …, αk⟩ where k = n·(n−1)/2
─ The covariance matrix C is defined by cii = σi² on the diagonal and cij = ½·(σi² − σj²)·tan(2αij) for i ≠ j
─ The mutation mechanism is then:
  σ'i = σi · exp(τ' · N(0,1) + τ · Ni(0,1))
  α'j = αj + β · Nj(0,1)
  x' = x + N(0, C'), where C' is the covariance matrix C after mutation of the σ and α values
─ τ' ∝ 1/√(2n), τ ∝ 1/√(2√n), and β ≈ 5°
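A pure-Python sketch of correlated mutation follows. Instead of building the covariance matrix explicitly, it uses the classical equivalent formulation: draw an uncorrelated vector zi ~ N(0, σi²) and pass it through successive plane rotations R(αij). The σ self-adaptation step is omitted here for brevity (it is identical to type 2), which is an assumption of this sketch.

```python
import math
import random

def correlated_mutation(x, sigmas, alphas, beta=math.radians(5)):
    """Correlated mutation (type 3) via successive plane rotations.

    alphas holds the k = n*(n-1)/2 rotation angles; rotating an
    uncorrelated draw z_i ~ N(0, sigma_i^2) through the planes (i, j)
    is equivalent to sampling from N(0, C) with a rotated ellipsoid.
    """
    n = len(x)
    # Mutate the rotation angles: alpha'_j = alpha_j + beta * N_j(0,1).
    alphas_new = [a + beta * random.gauss(0, 1) for a in alphas]
    # Uncorrelated draw with the individual step sizes.
    z = [s * random.gauss(0, 1) for s in sigmas]
    # Apply the plane rotations pairwise, in a fixed (i < j) order.
    k = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            c, s = math.cos(alphas_new[k]), math.sin(alphas_new[k])
            z[i], z[j] = c * z[i] - s * z[j], s * z[i] + c * z[j]
            k += 1
    return [xi + zi for xi, zi in zip(x, z)], alphas_new
```

With all angles at zero this reduces exactly to the type-2 update, which is why the type-2 operator can be seen as the special case of an axis-aligned ellipsoid.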

Numerical Experiments
As discussed above, the ES variants achieve self-adaptive search in different manners. To understand the similarities and differences of these mechanisms, a comparison through numerical experiments was first carried out. In the experiments, the ES types are applied to several test functions picked from [20]. Usually, a comparative study of different optimization methods is performed using a set of test functions from the literature. In this paper, five well-known benchmark functions are used. We chose two unimodal and three multimodal functions; the functions can also be grouped into separable and non-separable ones. These are the Sphere function, the generalized Rosenbrock function, the generalized Rastrigin function, the Ackley function, and the Griewank function. Their characteristics are as follows. The Sphere function is one of the simplest benchmarks; it is a continuous, unimodal, and separable problem. The global optimum of the generalized Rosenbrock function lies inside a parabolic-shaped flat valley; it is easy to find the valley, but convergence to the global optimum is difficult. This problem is unimodal and non-separable. The generalized Rastrigin function is a complex multimodal separable problem with many local optima. The Ackley function is a multimodal non-separable problem with many local optima and a narrow global optimum. The fifth function, Griewank, is highly multimodal and non-separable; it has many widespread local minima, which are regularly distributed.
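The five benchmarks in their standard textbook forms (each with minimum value 0 at the usual optimum) can be written as:

```python
import math

def sphere(x):
    return sum(v * v for v in x)

def rosenbrock(x):
    # Global optimum at x = (1, ..., 1), inside the parabolic valley.
    return sum(100 * (x[i + 1] - x[i] ** 2) ** 2 + (1 - x[i]) ** 2
               for i in range(len(x) - 1))

def rastrigin(x):
    # Many regularly spaced local optima around the minimum at 0.
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)

def ackley(x):
    n = len(x)
    return (-20 * math.exp(-0.2 * math.sqrt(sum(v * v for v in x) / n))
            - math.exp(sum(math.cos(2 * math.pi * v) for v in x) / n)
            + 20 + math.e)

def griewank(x):
    s = sum(v * v for v in x) / 4000
    p = math.prod(math.cos(v / math.sqrt(i + 1)) for i, v in enumerate(x))
    return s - p + 1
```

These are the common definitions from the benchmark literature; the exact forms used in [20] may differ in constants or domains.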

Algorithm Parameters Used for Experiments
The ES has many options that should be selected considering both the difficulty of the problems to be solved and the resources available for computation. Based on the recommendations by Bäck (1996), the following options are adopted:
- The comma strategy with 15 parents and 100 children, i.e., a (15,100)-ES, is used.
- Discrete recombination is used for the decision variables, and panmictic intermediate recombination for the standard deviations and the rotation angles.
- All the initial values of αij are set to π/4.
From Tables 1 and 2 we summarize how many times each algorithm obtained the best results:

Behavior in Higher-Dimensional Search Spaces
For problems in the five-dimensional search space, all the algorithms succeeded in finding good solutions. In particular, ES Types 2 and 3 show good performance, and the self-adaptation process outperforms the adaptive one. As the dimension of the search space increases, the performance of the ES algorithms deteriorates, because the non-separability of the test functions becomes more serious. The results clearly show that ES Types 2 and 3 are more effective and perform better when comparing the Sphere and Rosenbrock functions. Hence, using the recommended rotation angles forces the ES to reach good solutions, just as the strategy parameters with individual step sizes help the ES to improve its solutions, as shown by the difference between ES Type 1 and Type 2.

Behaviors for Multi-Modal Objective Functions
The results show that the self-adaptive ES mostly has difficulty finding the optimum. The best solutions found by the ESs often stayed at points that are not even local optima. This result suggests that the mechanism of self-adaptive mutation may have some difficulty with multimodal objective functions. Since the ES has many options and parameters, and the above experiment uses only a single test function and the options and parameters recommended in the literature (Bäck, 1996), a more efficient search might be achieved by adjusting the options and parameters. This is a subject for further study, as is a more detailed examination of the behavior of the self-adaptive mechanism.

Conclusion and Future Work
The self-adaptation of strategy parameters is the most advanced mechanism in ES, where the mutation-strength strategy parameters control the magnitude and the direction of the changes generated by the mutation operator. There are three mutation operators in classical ES: the uncorrelated mutation with one step size, the uncorrelated mutation with n step sizes, and the correlated mutation. This paper analyzed the characteristics of these three mutation operators. In the last decades, several studies have focused on self-adaptation as a tool to improve the performance of evolution strategy algorithms. Many techniques have been used to this end, from mathematical ones, such as including geometric functions to improve solutions on multimodal functions [18], to incorporating machine learning techniques to predict suitable parameters leading to the best solutions [2]. Although the research domain of ES may seem already explored and, given the flood of new nature-inspired algorithms appearing every day, unattractive for developers, the results of our experiments showed the opposite. Additionally, hybridization adds new value to a classical self-adaptive ES. As a first step of our future work, we plan to extend this comparative study with other well-known meta-heuristics, such as PSO, DE, and HS, in order to recognize the weaknesses of self-adaptive ES for eventual improvement by hybridization with other local search techniques.
Regardless of the problems mentioned, self-adaptation is a state-of-the-art adaptation technique with a high degree of robustness, especially in real-coded search spaces and in environments with uncertain or noisy fitness information.