Explainable AI in Manufacturing: A Predictive Maintenance Case Study

Abstract. This paper describes an example of explainable AI (Artificial Intelligence) (XAI) in the form of a Predictive Maintenance (PdM) scenario for manufacturing. Predictive maintenance has the potential to save a lot of money by predicting and reducing machine breakdowns. In this case study we work with generalized data to show what this scenario could look like with real production data. For this purpose, we created and evaluated a machine learning model based on a highly efficient gradient boosting decision tree in order to predict machine errors or tool failures. Although the case study is strictly experimental, we can conclude that explainable AI in the form of focused analytics and a reliable prediction model can reasonably contribute to the prediction of maintenance tasks.


Introduction
Predictive Maintenance (PdM) anticipates maintenance needs to avoid costs associated with unscheduled downtime. By connecting to devices and monitoring the data the devices produce, we can identify patterns that lead to potential problems or failures. Those insights can be used to address issues before they happen. This ability to predict when equipment or assets need maintenance allows us to optimize equipment lifetime and minimize downtime [1].
The fundamental litmus test for explainable AI (XAI), that is, machine learning algorithms and other Artificial Intelligence systems, is whether they produce outcomes that humans can readily understand and trace back to their origins [2].
In this case study we consider the field of maintenance in manufacturing. More precisely, we deal with PdM, using explainable AI outputs as the basis for our decisions and predictions.

Explainable AI - XAI
Recent successes of Machine Learning (ML) have led to a series of application scenarios for Artificial Intelligence (AI). Continued advances promise to produce autonomous systems that will perceive, learn, decide, and act on their own. However, the effectiveness of these systems is limited by the machines' current inability to explain their decisions and actions to human users.
The Explainable AI (XAI) program introduced by DARPA aims to create a suite of ML techniques that:
- Produce more explainable models, while maintaining a high level of learning performance (prediction accuracy); and
- Enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners.
For decision makers who rely upon Data Analytics and Data Science, explainability is a real issue. If the computational system relies on a simple decision model such as logistic regression, they can understand it and convince executives who have to sign off on a system because it seems reasonable and fair. They can justify the analytical results to shareholders, regulators, and other involved stakeholders. But for "Deep Nets" and ML systems, this is no longer possible. There is a need to find ways to explain the system to the decision makers so that they know that their decisions are going to be reasonable. The goal of explanation involves persuasion, but that comes only as a consequence of understanding how the AI works, the mistakes the system can make, and the safety measures surrounding it.
Meanwhile, AI is increasingly allowed to make decisions and take actions autonomously. Justifying these decisions will only become more crucial, and there is little doubt that this field will continue to rise in prominence and produce exciting and much needed work in the future [3].

Predictive Maintenance
PdM extracts insights from the data produced by the equipment on the shop floor and acts on these insights. The idea of PdM goes back to the early 1990s and augments regularly scheduled, preventive maintenance. PdM requires the equipment to provide data from sensors monitoring the equipment as well as other operational data. Humans act based on the analysis. Simply speaking, it is a technique to predict the failure of a machine component in the near future so that the component can be replaced according to the maintenance plan before it fails and stops the production process. PdM can improve the production process and increase productivity. By successfully applying PdM we are able to achieve the following goals:
- Reduce the operational risk of mission-critical equipment.
- Control the cost of maintenance by enabling just-in-time maintenance operations.
- Discover patterns connected to various maintenance problems.
- Provide Key Performance Indicators.
Usually PdM uses a descriptive, statistical or probabilistic approach to drive analysis and prediction. There are also several approaches which use Machine Learning (ML) [1,14]. In the literature [15] the following types of maintenance in production can be found: reactive, periodic, proactive and predictive (Figure 1).

Case Study
In order to apply this technique we need various data from the machines in production. In this case study we used freely available data generated as a test data set for PdM, containing information about telemetry, errors, failures and machine properties. The data can be found in Azure blob storage and is maintained as part of an Azure Gallery article. Once the data is downloaded from the blob storage, local copies are used for further observations in this contribution.

Methodology
Usually, every PdM technique should proceed through the following three main steps:
- Collect Data - collect all possible descriptions as well as historical and real-time data, usually by using IoT (Internet of Things) devices, various loggers, technical documentation, etc.
- Predict Failures - the collected data can be transformed into ML-ready data sets and used to build an ML model that predicts the failures of the components in the set of machines in production.
- React - knowing which components will fail in the near future, we can trigger the replacement process so that the component is replaced before it fails and the production process is not interrupted.

Data Preparation
In order to predict failures in the production process, a set of data transformations, cleaning, feature engineering, and selection must be performed to prepare the data for building an ML model. The data preparation part plays a crucial role in the model building process because the quality of the data and its preparation directly influence the model accuracy and reliability. The data used for this PdM use case can be classified into:
- Telemetry - historical data about machine behavior (voltage, vibration, etc.).
- Errors - data about warnings and errors in the machines.
- Maint - data about replacement and maintenance for the machines.
- Machines - descriptive information about the machines.
- Failures - data about when a certain machine is stopped due to component failure.
Errors data represents the most important information in every PdM system. The errors are non-breaking events recorded while the machine is still operational. In the experimental data set the error dates and times are rounded to the closest hour since the telemetry data is collected at an hourly rate. The resulting insight is shown in the left chart of Figure 2.
Failures data represents the replacements of components due to machine failures. Once a failure has happened the machine is stopped. This is the crucial difference between errors and failures. The distribution of failures caused by each component across machines is shown in the right chart of Figure 2.
Maintenance data tells us about scheduled and unscheduled maintenance. The data set contains records that correspond both to regular inspection of components and to failures. A record is added to the maintenance table when a component is replaced during a scheduled inspection or due to a breakdown. Records created due to breakdowns are called failures. The maintenance data covers the years 2014 and 2015.
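The rounding of error time stamps to the hourly telemetry grid can be sketched as follows. This is a minimal illustration, not the procedure used to produce the data set: the function name round_to_hour is hypothetical, and rounding exact half hours upwards is an assumption, since the text only says "closest hour".

```python
from datetime import datetime, timedelta


def round_to_hour(ts):
    """Round a time stamp to the closest full hour.

    Assumption: a time stamp exactly on the half hour is rounded up.
    """
    floor = ts.replace(minute=0, second=0, microsecond=0)
    if ts - floor >= timedelta(minutes=30):
        floor += timedelta(hours=1)
    return floor


# Hypothetical error time stamps (illustrative values only).
raw_errors = [
    datetime(2015, 1, 3, 7, 29, 59),
    datetime(2015, 1, 3, 7, 30, 0),
    datetime(2015, 12, 31, 23, 45, 0),
]
rounded_errors = [round_to_hour(ts) for ts in raw_errors]
for raw, rounded in zip(raw_errors, rounded_errors):
    print(raw, "->", rounded)
```

After rounding, each error event shares a time stamp with one hourly telemetry record, so the two sources can be joined on machine and time.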

Feature Engineering
First, several lagged telemetry features were created, since telemetry data are classic time series data. The rolling mean and standard deviation of the telemetry data over a 3-hour lag window were calculated every 3 hours. To capture longer-term effects, the rolling mean and standard deviation over a 24-hour lag window were also calculated. Once the rolling lag features were calculated, they were merged into one basic data frame, with which the previously calculated data frames were then merged. At the end of the merging process, the relevant columns were selected. Unlike telemetry, which has numerical values, errors have categorical values denoting the type of error that occurred at a time stamp. The error categories were therefore aggregated per error type over the lag window. The main task here was to create relevant features in order to obtain a quality data set for the machine learning part.
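The lag-window features described above can be illustrated with a small sketch. It is not the code used in the study: the function names rolling_features and error_counts, the sample readings and the error log are all hypothetical, and the standard library is used only to keep the example self-contained (a data-frame library would be the more likely choice in practice). The standard deviation is the sample standard deviation, which is also an assumption.

```python
from collections import Counter
from statistics import mean, stdev


def rolling_features(values, window, step):
    """Trailing-window mean and standard deviation of an hourly series.

    Returns a list of (last_hour_index, mean, std) tuples, one for every
    `step` hours, starting as soon as a full window of `window` hours exists.
    `window` must be at least 2 so that a sample deviation is defined.
    """
    features = []
    for end in range(window, len(values) + 1, step):
        chunk = values[end - window:end]
        features.append((end - 1, mean(chunk), stdev(chunk)))
    return features


def error_counts(error_log, last_hour, window):
    """Count errors per type in the `window` hours ending at `last_hour`."""
    return Counter(
        error_type
        for hour, error_type in error_log
        if last_hour - window < hour <= last_hour
    )


# Hypothetical hourly voltage readings for one machine (illustrative only).
voltage = [170.0, 172.0, 168.0, 171.0, 169.0, 173.0, 170.0, 174.0, 166.0]
# Hypothetical error log as (hour index, error type) pairs.
error_log = [(1, "error1"), (2, "error1"), (4, "error2"), (8, "error1")]

short_term = rolling_features(voltage, window=3, step=3)
for last_hour, avg, std in short_term:
    counts = error_counts(error_log, last_hour, window=3)
    print(last_hour, avg, std, dict(counts))
```

Calling the same functions with window=24 on a longer series would give the longer-term features; the per-window results can then be joined on the hour index to form one feature row per machine and time stamp.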
One of the useful features chosen was the number of replacements of each component in the last 3 months, which incorporates the frequency of replacements. Furthermore, we calculated how long it has been since a component was last replaced, as that would be expected to correlate better with component failures: the longer a component is used, the more degradation should be expected. The machine data set contains descriptive information about the machines, such as the machine type and age (years in service).
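The two maintenance features mentioned here can be sketched as below. The function names and replacement dates are hypothetical, and treating "3 months" as 90 days is an assumption made only for this illustration.

```python
from datetime import datetime, timedelta


def days_since_last_replacement(replacements, now):
    """Days between `now` and the latest replacement at or before `now`.

    Returns None if the component has not been replaced yet.
    """
    past = [d for d in replacements if d <= now]
    if not past:
        return None
    return (now - max(past)).total_seconds() / 86400.0


def replacements_in_window(replacements, now, window_days=90):
    """Number of replacements in the `window_days` days ending at `now`.

    Assumption: "the last 3 months" is approximated by 90 days.
    """
    start = now - timedelta(days=window_days)
    return sum(1 for d in replacements if start < d <= now)


# Hypothetical replacement dates of one component on one machine.
comp1_replacements = [
    datetime(2015, 1, 5, 6),
    datetime(2015, 2, 19, 6),
    datetime(2015, 6, 4, 6),
]
now = datetime(2015, 6, 14, 6)

print(days_since_last_replacement(comp1_replacements, now))
print(replacements_in_window(comp1_replacements, now))
```

Both values would be computed per machine, per component and per telemetry time stamp, and then appended as columns to the feature set.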
As the last step in feature engineering, we merged all features into one data set. The label in PdM should be the probability that a machine will fail in the near future due to the failure of a certain component. If we take 24 hours as the prediction period for this problem, the label construction consists of a new column in the feature data set which indicates whether a certain machine will fail in the next 24 hours due to the failure of one of several components.
In this way, we define the label as a categorical variable containing:
- none - if the machine will not fail in the next 24 hours,
- comp1 to comp4 - if the machine will fail in the next 24 hours due to the failure of the corresponding component.
Since we can experiment with the label construction by applying different conditions, we can implement methods that take several arguments in order to define the general problem.
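A parameterized label construction of this kind could look like the following sketch. The function build_labels and the sample data are hypothetical; the exact window boundaries (a failure strictly after the time stamp and at most horizon_hours later) and the choice of the earliest failure when several fall into the window are assumptions, not details taken from the study.

```python
from datetime import datetime, timedelta


def build_labels(timestamps, failures, horizon_hours=24):
    """Assign each time stamp the component that fails within the horizon.

    `failures` is a list of (failure_time, component) pairs for one machine.
    A time stamp is labeled with the component of the earliest failure that
    occurs after it and no later than `horizon_hours` afterwards, otherwise
    with "none".
    """
    horizon = timedelta(hours=horizon_hours)
    ordered = sorted(failures)
    labels = []
    for ts in timestamps:
        label = "none"
        for failure_time, component in ordered:
            if ts < failure_time <= ts + horizon:
                label = component
                break
        labels.append(label)
    return labels


# Hypothetical feature time stamps (every 12 hours) and one failure record.
t0 = datetime(2015, 1, 1, 0)
timestamps = [t0 + timedelta(hours=12 * i) for i in range(5)]
failures = [(t0 + timedelta(hours=30), "comp2")]

labels_24h = build_labels(timestamps, failures)
labels_48h = build_labels(timestamps, failures, horizon_hours=48)
print(labels_24h)
print(labels_48h)
```

Changing horizon_hours is one example of the "different conditions" mentioned above; other arguments, such as a minimum lead time before the failure, could be added in the same way.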

Preliminary Results
We analyzed 5 data sets with information about telemetry, machines, errors, maintenance and failures for 100 machines. The data were transformed and analyzed in order to create the final data set for building a machine learning model for PdM.
Once all features were created from the data sets, the final step was to create the label column so that it describes whether a certain machine will fail in the next 24 hours due to a failure of comp1, comp2, comp3 or comp4, or whether it will continue to work. In this part, we performed the ML task and trained an ML model to predict whether a certain machine will fail in the next 24 hours or will function normally in that time period.
The model we built was a multi-class classification model since it has 5 values to predict: comp1, comp2, comp3, comp4 or none (meaning the machine will continue to work). We used the DART booster with hyper-parameter tuning along with LightGBM [16], which is a gradient boosting framework that uses tree-based learning algorithms. It is especially efficient on small data sets. We first evaluated the trained model on the training data set (see Table 1) and then on the test data set (see Table 2). On the test data the model has an overall accuracy of 99% and an average per-class accuracy of 95%, which is very promising for an experimental case.
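The two reported figures can be illustrated with a small sketch of how such metrics are typically derived from a confusion matrix. The text does not define "average per class accuracy"; this example assumes it is the mean of the per-class recall values, and all function names and predictions below are hypothetical rather than results of the study.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def overall_accuracy(matrix):
    """Share of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def average_per_class_accuracy(matrix):
    """Mean of per-class recall; classes without true samples are skipped.

    Assumption: this is one possible reading of "average per class accuracy".
    """
    recalls = [row[i] / sum(row) for i, row in enumerate(matrix) if sum(row) > 0]
    return sum(recalls) / len(recalls)


classes = ["none", "comp1", "comp2", "comp3", "comp4"]
# Hypothetical labels: the "none" class dominates, as is typical for PdM data.
y_true = ["none"] * 6 + ["comp1"] * 2 + ["comp2"] * 2
y_pred = ["none"] * 6 + ["comp1", "none"] + ["comp2", "comp2"]

matrix = confusion_matrix(y_true, y_pred, classes)
print(overall_accuracy(matrix))
print(average_per_class_accuracy(matrix))
```

Because the "none" class dominates, a single missed component failure barely changes the overall accuracy but lowers the per-class average noticeably, which is why both figures are worth reporting.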

Conclusions, Limitations and Outlook
In this paper we conducted a case study in the field of Predictive Maintenance (PdM) with sample machine data to demonstrate how explainable AI can be achieved in the field of manufacturing. Although the study is strictly experimental, we can conclude that explainable AI in the form of a reliable prediction model and visualizations can reasonably contribute to avoiding unnecessary costs associated with unscheduled downtime caused by machine errors or tool failures.
The basic limitation of this contribution is that the experiment was conducted with a generic data set; however, the presented concept shows high maturity with promising results.
The next step would be to apply the trained model to data collected directly in real-world manufacturing settings, involving data from different manufacturers. In this way the reliability of the presented results could be confirmed through the comparison of results from different data sources and the adjustment of the prediction model.

Fig. 1. Different types of maintenance in production

Table 1. Results on training data set. As can be seen, the model predicts the values from the training data set correctly in most cases. In order to see how the model predicts unknown data entries, we used the test data. The result is shown in Table 2.

Table 2. Results on test data set.