KPI Data Anomaly Detection Strategy for Intelligent Operation and Maintenance Under Cloud Environment

In the complex and changeable cloud environment, monitoring and anomaly detection of the cloud platform are very important. Because of the complex structure of the system, the characteristics of the monitoring data are constantly changing, and operators need to adjust the anomaly detection model to adapt to these changes. To solve the problem of dynamic KPI anomaly detection, this paper transforms the adjustment process of the anomaly detection model into a general Markov decision process by means of reinforcement learning, which could reduce the human cost caused by anomaly detection model adjustment and improve the effective detection rate of the anomaly detection model. The strategy is compared with other optimization strategies on three typical KPI curves, and the results verify its effectiveness.


Introduction
With the development of cloud computing technology, most enterprises move services into the cloud to achieve better performance and security. To ensure the reliability and stability of the cloud platform, operators collect a large amount of monitoring data from different levels of the cloud platform, forming real-time KPI (Key Performance Indicator) curves to observe the running state of the key components of the platform. They then use an anomaly detection model to analyse historical KPI data and build a prediction model of the KPI curve under normal conditions. In practice, enterprises formulate marketing strategies according to the market, which changes the data characteristics of the KPI curves monitored in the cloud. As a result, the anomaly detection model produces many false alarms, which form an alarm storm.
To solve the above problem, this paper proposes an intelligent KPI data anomaly detection strategy. Firstly, differential and autocorrelation functions are used to determine the characteristics of the KPI data.
The remaining part of this paper is organized as follows. The second section introduces the core idea of this paper. The third section introduces the intelligent anomaly detection framework constructed in this paper. The fourth section presents the anomaly detection comparison experiments on three typical KPI data sets, and the fifth section discusses work related to this paper. The sixth section summarizes the work and discusses prospects for the future.

The core idea
KPI data is essentially continuous time series data, and its characteristics can be periodic, stable, or unstable. To determine periodicity for a monitoring data set DS, this paper applies differencing to the global data and compares the global variance before and after differencing. If the monitoring data set is periodic, the global variance Var(DS) before differencing will be far greater than the global variance Var(DS′) after differencing, so Formula 1 can be used to determine whether the monitoring data set is periodic.
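The variance comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's Formula 1: the function names, the differencing lag (taken here to be a candidate period), and the ratio threshold of 10 are all assumptions introduced for the example.

```python
import math
import random
from statistics import pvariance


def difference(series, lag):
    """Return the lag-differenced series x[t] - x[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def looks_periodic(series, lag, ratio_threshold=10.0):
    """Assumed rule: periodic if the variance before differencing is
    far greater than the variance after differencing at the given lag."""
    var_before = pvariance(series)
    var_after = pvariance(difference(series, lag))
    if var_after == 0:
        return var_before > 0
    return var_before / var_after > ratio_threshold


rng = random.Random(0)
period = 24  # hypothetical: 24 samples per cycle

# Synthetic periodic KPI: a sine wave with a little measurement noise.
periodic_kpi = [
    math.sin(2 * math.pi * t / period) + 0.05 * rng.gauss(0, 1)
    for t in range(period * 10)
]
# Synthetic non-periodic KPI: pure noise.
noise_kpi = [rng.gauss(0, 1) for _ in range(period * 10)]

periodic_result = looks_periodic(periodic_kpi, lag=period)
noise_result = looks_periodic(noise_kpi, lag=period)
```

For the noisy sine wave the variance collapses after differencing at the period, while for pure noise differencing roughly doubles the variance, so only the first series passes the assumed threshold.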
To distinguish stable from unstable data, we calculate the autocorrelation function of the monitoring data set, as in Formula 2, which identifies whether the time series is stable. If the autocorrelation function of the KPI curve does not decrease rapidly to 0 as the interval between time points increases, the KPI curve is unstable; otherwise it is stable.
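A minimal Python sketch of this check follows. It uses the standard sample autocorrelation estimator, which may differ in detail from the paper's Formula 2, and the decision rule (the autocorrelation must fall below a cutoff of 0.2 within 10 lags) is an assumption chosen only for illustration.

```python
import random


def autocorrelation(series, lag):
    """Standard sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return num / denom


def looks_stable(series, max_lag=10, cutoff=0.2):
    """Assumed rule: stable if |ACF| drops below cutoff within max_lag lags."""
    return any(
        abs(autocorrelation(series, lag)) < cutoff
        for lag in range(1, max_lag + 1)
    )


rng = random.Random(42)
# Stable example: white noise, whose autocorrelation is near 0 at every lag.
noise = [rng.gauss(0, 1) for _ in range(1000)]
# Unstable example: a random walk, whose autocorrelation decays very slowly.
walk = []
level = 0.0
for _ in range(1000):
    level += rng.gauss(0, 1)
    walk.append(level)

noise_is_stable = looks_stable(noise)
walk_is_stable = looks_stable(walk)
```

The white-noise series is classified as stable and the random walk as unstable, which matches the qualitative rule in the text.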

Automatic adjustment of time series model
The Q-Learning [11] algorithm is the main method for model-free reinforcement learning. Its basic idea is to record, in a function table, the utility value of each action in each state, that is, the action state value. The action state value represents the validity and value of the action selected in the current state and serves as the basis for action selection in the next strategy. The action state value of the current state is updated through the action state value of the next state, as shown in Figure 2 (the data in the diagram is used for demonstration). The initial function table Q(s, a) is shown in Figure 2 (a). In one strategy, an action is selected randomly from the actions with non-negative value in the initial state s_0. In Figure 2, the selected action changes the state to s_1, and the utility value is 0.1 by Formula 3, where R_t is the immediate reward given by the reward function and Q(s_{t+1}, a_{t+1}) is the utility value of the next state, which is 0 in Figure 2. The function table is updated as shown in Figure 2 (c) at the end of a strategy.
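As a worked illustration of the update rule, the following Python sketch applies one tabular Q-Learning update. The learning rate alpha = 0.1, the discount gamma = 0.9, the reward of 1.0, and the choice of action a1 are assumptions made for this example and are not taken from Figure 2; with a next-state utility of 0 they happen to yield a utility of 0.1.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply Q(s,a) <- Q(s,a) + alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))."""
    next_values = q.get(next_state)
    best_next = max(next_values.values()) if next_values else 0.0
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


# Hypothetical table: two states, four parameter adjustment actions, all zero.
actions = ("a1", "a2", "a3", "a4")
q_table = {state: {a: 0.0 for a in actions} for state in ("s0", "s1")}

# One step: take a1 in s0, receive an assumed reward of 1.0, move to s1.
updated_value = q_update(q_table, "s0", "a1", reward=1.0, next_state="s1")
```

Only the entry for the visited state-action pair changes; all other entries keep their previous values until they are visited in a later strategy.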
During action selection, the Q-Learning algorithm chooses among the actions whose entries in the Q(s, a) function table are non-negative. For example, taking Figure 2 (c) as the basis for the next policy, the optional actions at s_0 are {a_1, a_2, a_3, a_4} and the optional actions at s_1 are {a_2, a_3, a_4}. The execution of each strategy updates the Q(s, a) function table until the table converges, as in Figure 3. At that point, the optimal strategy is the one with the maximum cumulative return, that is, the action with the maximum utility value is selected in each state. For example, the optimal strategy in Figure 3 is the sequence τ = {a_4, a_2, a_4, a_1, a_1}.
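The selection rule and the extraction of the final policy can be sketched as follows. The table values are hypothetical and only chosen so that a_1 has a negative entry at s_1, and the epsilon-greedy wrapper and the fallback for a state with no non-negative entries are assumptions added to keep the example runnable.

```python
import random


def candidate_actions(q, state):
    """Actions whose current Q-value is non-negative remain selectable."""
    options = [a for a, value in q[state].items() if value >= 0]
    # Assumed fallback: if every entry is negative, allow all actions.
    return options or list(q[state])


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice restricted to the candidate actions."""
    options = candidate_actions(q, state)
    if rng.random() < epsilon:
        return rng.choice(options)
    return max(options, key=lambda a: q[state][a])


def greedy_policy(q):
    """After convergence, pick the action with the largest Q-value per state."""
    return {state: max(values, key=values.get) for state, values in q.items()}


# Hypothetical table after one strategy.
q_table = {
    "s0": {"a1": 0.1, "a2": 0.0, "a3": 0.0, "a4": 0.0},
    "s1": {"a1": -0.1, "a2": 0.0, "a3": 0.0, "a4": 0.2},
}

options_s0 = candidate_actions(q_table, "s0")
options_s1 = candidate_actions(q_table, "s1")
picked = choose_action(q_table, "s1", epsilon=0.0, rng=random.Random(0))
policy = greedy_policy(q_table)
```

With epsilon set to 0 the choice is purely greedy; a positive epsilon keeps some exploration among the remaining candidates.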

Figure 3. Convergent function table Q(s, a)
In the Q-Learning algorithm, the reward function is static. It gives rewards or penalties depending on whether the current action makes the model state better than the initial state or better than the previous state. In this case, however, a state such as s_1 in Figure 3 will appear. In the dynamic reward function used in this paper, F_t is the F-Score value obtained from the anomaly detection model under the current parameter adjustment action, that is, the current state value. The target state value makes the reward larger as the current state value gets closer to it, and F_max is the maximum state value recorded during the execution of a policy.
The difference between F_t and F_max means that a reward is given only if the anomaly detection model reaches a better state value under the adjustment action; otherwise the action is punished, which helps a strategy reach the optimal state faster. Based on the above strategy, the pseudo code for obtaining the best policy with the Q-Learning algorithm is shown in Table 1:
4: Given initial state s, choose action a according to the ε-greedy strategy
5: Compute the anomaly detection model score F_max
6: Repeat (episode):
7: Select action a according to Q(s_t, a_t) in the state s
8: Q(s_t, a_t) ← Q(s_t, a_t) + α[R_t + γ max_a Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]
9: Calculate the current anomaly detection model score F_t
10: s_t ← s_{t+1}; a_t ← a_{t+1}
11: IF R_t > 0
13: until s is the terminal state
15: until all Q(s, a) converge
16: Output: π(s) = argmax_a Q(s, a)
Using the algorithm above, we compare the action selection process under the static reward function and the dynamic reward function, as shown in Figure 4.
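The contrast between a static and a dynamic reward can be illustrated with a short Python sketch. The exact form of the paper's reward formula is not recoverable here, so both functions below are hypothetical stand-ins: the static reward compares the current F-Score with the previous one, while the dynamic reward is positive only when the current F-Score exceeds the best score seen so far (f_max) and grows as the score approaches an assumed target of 1.0. The step penalty of 0.01 is likewise an assumption.

```python
def static_reward(f_t, f_prev):
    """Hypothetical static rule: reward any improvement over the previous state."""
    return 1.0 if f_t > f_prev else -1.0


def dynamic_reward(f_t, f_max, f_target=1.0, step_penalty=0.01):
    """Hypothetical dynamic rule: reward only a new best score.

    The reward is larger when f_t is closer to f_target; any action
    that does not beat f_max receives a strictly negative value.
    """
    improvement = f_t - f_max
    if improvement <= 0:
        return improvement - step_penalty
    closeness = 1.0 - min(abs(f_target - f_t), 1.0)
    return improvement * (1.0 + closeness)


# Assumed F-Scores: initial state, a worse state, then a partial recovery.
scores = [0.60, 0.50, 0.55]
f_max = scores[0]
f_prev = scores[0]
history = []
for f_t in scores[1:]:
    history.append((f_t, static_reward(f_t, f_prev), dynamic_reward(f_t, f_max)))
    f_max = max(f_max, f_t)  # track the best score seen in this policy
    f_prev = f_t

new_best_reward = dynamic_reward(0.70, 0.60)
```

In the last step of the loop the score rises from 0.50 to 0.55, so the static rule rewards the action even though the model is still worse than its best state, whereas the dynamic rule penalizes it. Fewer actions therefore keep a non-negative value for the next strategy.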

Figure 4. Q(s, a) table comparison
When state s_1 is updated to s_2 in Figure 4 (a), action a_4 makes the state of the model better than the previous state, so under the static reward function it receives a reward. However, because the state is punished, the whole model is not optimized. Under the dynamic reward function presented in this paper, the action receives a punishment instead, as shown in Figure 4 (b). In the next policy execution, there are 4 actions that can be attempted in state s_2 in table (a), while there are only 3 in table (b). Therefore, table (b) will arrive at the next state faster, and this advantage becomes more obvious as multiple strategies accumulate. Compared with the convergent function table Q(s, a) in Figure 3, under the proposed strategy the function table Q(s, a) converges as shown in Figure 5.
[...] parameter selection and estimation strategy in literature [3]. To verify the effectiveness of the strategy in anomaly detection, we observe and analyze how the recall and precision of the anomaly detection results change, and compare the F-Score values of the different anomaly detection models. To verify the optimization of the strategy in the iterative process, the number of iterations in each adjustment process is compared with that of the original Q-Learning algorithm.

Evaluation index
(1) Recall: the proportion of all true outliers that are detected, as shown in Formula 5.
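For reference, the evaluation indices used in the experiments (recall, precision, and F-Score) can be computed from point-wise labels as in the following sketch. It uses the standard textbook definitions; the weighting beta = 1 for the F-Score and the example labels are assumptions, since the paper's own formulas are not reproduced here.

```python
def detection_metrics(true_labels, predicted_labels, beta=1.0):
    """Return (recall, precision, f_score) for binary anomaly labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)        # anomalies detected
    fp = sum(1 for t, p in pairs if not t and p)    # false alarms
    fn = sum(1 for t, p in pairs if t and not p)    # anomalies missed
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = beta ** 2 * precision + recall
    f_score = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return recall, precision, f_score


# Hypothetical labels: 1 marks an anomalous point, 0 a normal point.
truth = [1, 0, 1, 1, 0, 0, 1, 0]
detected = [1, 0, 0, 1, 1, 0, 1, 0]
recall, precision, f_score = detection_metrics(truth, detected)
```

In this example three of the four true anomalies are detected and one of the four alarms is false, so recall, precision, and F-Score all equal 0.75.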

Figure 6. Comparison of recall and precision
From Figure 6, it can be seen that during anomaly detection the recall and precision of the manual adjustment method decrease further each time the data characteristics change, while the other strategies maintain good anomaly detection after adjustment. Regarding how quickly the anomaly detection model recovers after adjustment, the fourth, tenth, eighteenth, twenty-third and twenty-eighth days in Figure 6 show that the strategy proposed in this paper (RL) and the parameter selection method recover fastest when the data characteristics change. Although both the decision tree strategy and the k-means method rely heavily on the characteristics of the new data, the decision tree strategy uses the label updates of the expert system, so it restores the anomaly detection effect faster than k-means. In terms of overall detection effect, the decision tree strategy in Figure 6 is relatively stable but has a low overall average, whereas RL, k-means, and the parameter selection method fluctuate more but have a high overall average.

Optimization verification of iterative process
To compare the number of iterations of the original Q-Learning algorithm and the optimized algorithm of this paper when adjusting the parameters, we add an iteration counter to the iteration process. After the anomaly detection process, the data obtained are shown in Figure 7:

Figure 7. Iteration number comparison
The original Q-Learning algorithm rewards every positive parameter adjustment action. The dynamic reward function proposed in this paper rewards only the current optimal parameter adjustment action, so fewer actions obtain a reward and the number of optional adjustment actions in the next adjustment is also reduced. As can be seen from Figure 7, five anomaly detection model adjustments are made during the anomaly detection process. In each adjustment, the optimized strategy of this paper needs fewer iterations than the original Q-Learning algorithm. This verifies that the proposed strategy effectively optimizes the iterative process of obtaining the best parameters.

Related work
In the cloud environment, many researchers have studied anomaly detection algorithms. Some methods detect anomalies based on the data distribution, using an inconsistency test to compare the probability distribution of the detected data with the presumed probability distribution, such as literature [1]. Others are based on deviation, such as the ARIMA algorithm in literature [2], the Holt-Winters algorithm in literature [3], and the Wavelet algorithm in literature [5]. However, these algorithms do not handle changes in data characteristics well and rely on manual readjustment to achieve the desired detection efficiency.
To address changing data characteristics, researchers have studied adaptive detection models. Some are based on supervised learning, such as literature [8,9], and some on unsupervised learning, such as literature [6,7], but these algorithms usually need an extra expert system to label anomalous data and depend heavily on historical data. In literature [4], two strategies are used for parameter configuration. One enumerates a limited set of candidate parameters in advance from a reduced parameter sample space. The other uses a targeted parameter estimation algorithm to obtain appropriate parameters. However, this method cannot guarantee that the reduced sample space contains the optimal parameters for every data characteristic, and for complex anomaly detection algorithms a corresponding parameter estimation method must be tested for each algorithm. Building on the above work, this paper constructs an adaptive anomaly detection framework using reinforcement learning. By perceiving changes in the data characteristics, the framework automatically triggers the transformation of the anomaly detection model adjustment into a Markov decision process. In addition, it provides a strategy for selecting parameter adjustment actions for different anomaly detection algorithms. Together these realize the automatic adjustment of the anomaly detection model when data characteristics change and ensure a good anomaly detection effect in the cloud environment.

Conclusion
Anomaly detection is an important technology for ensuring the stability of the system services of the cloud platform. However, because of the complexity of data changes in the cloud environment, the anomaly detection model needs to be constantly adjusted. In this paper, we introduce an adaptive detection method based on reinforcement learning, which automatically triggers the transformation of the anomaly detection model adjustment into a Markov decision process by perceiving changes in the characteristics of the monitoring data. We also put forward a selection strategy for parameter adjustment actions and an optimization algorithm for obtaining the best parameters, realizing the automatic adjustment of the anomaly detection model when the data characteristics change. In future work, we will further optimize the iterative process of the parameters of the Markov decision process, reduce the time of the parameter selection process, and improve the adaptability and sensitivity of the model in the anomaly detection process.

ACKNOWLEDGMENT
This work is supported by the Natural Science Foundation of China (No. 61762008, 61363003), Natural Science Foundation Project of Guangxi (No. 2017GXNSFAA198141), Key R&D project of Guangxi (No. GuiKE AB17195014), and R&D Project of Nanning (No. 20173161).