A Review of Current Machine Learning Techniques Used in Manufacturing Diagnosis

Artificial intelligence applications are increasing due to advances in data collection systems and algorithms and to the affordability of computing power. Within the manufacturing industry, machine learning algorithms are often used to improve manufacturing system fault diagnosis. This study reviews recent fault diagnosis applications in manufacturing that are based on several prominent machine learning algorithms. Papers published from 2007 to 2017 were reviewed, and keywords were used to identify 20 articles spanning the most prominent machine learning algorithms. In most of the articles reviewed, training data were obtained from sensors attached to the equipment, and designed experiments were used to simulate different faulty and normal processing conditions. Areas of application ranged from cutting tool wear in computer numerical control (CNC) machines and surface roughness faults to the wafer etching process in semiconductor manufacturing. In all cases, high fault classification rates were obtained. As interest in smart manufacturing increases, this review addresses one of the cornerstones of emerging production systems.


Introduction
Timely diagnosis of process faults gives manufacturing companies a key competitive advantage by reducing machine downtime, as more customers require manufacturers to deliver products quickly, at low cost, and with high quality. Moreover, most of the time spent during machine downtime goes to localizing the fault rather than carrying out the actual remediation, because fault diagnosis is the most challenging phase of machine repair [1]. This has led companies to look for new ways to improve their fault root cause analysis (RCA) process. With current improvements in sensor technology, data storage, and internet speeds, factories are becoming smarter and generating more process data. Research projects are focusing on how to use this 'big data' to improve manufacturing competitiveness. One such effort is a project at the National Institute of Standards and Technology (NIST) titled prognosis and health management for smart manufacturing (PHM4SMS), which aims to develop the measurement science needed to enable and enhance condition monitoring, diagnostics, and prognostics [2]. Part of this effort is the use of machine learning techniques to improve fault detection (FD) in manufacturing.
The aim of this paper is to review recent applications of machine learning techniques to manufacturing process diagnosis. The review covers 20 articles, published from 2007 to 2017, that used machine learning techniques for manufacturing fault diagnosis. The search used the keywords "machine learning application in manufacturing process diagnosis" and was filtered to focus on artificial neural networks (ANN), Bayesian networks (BN), support vector machines (SVM), and hidden Markov models (HMM). The rest of the paper reviews the findings for each of these prominent techniques and provides conclusions along with future research directions.

Bayesian Networks
Bayesian networks are a commonly used machine learning technique for FD. A BN is a directed acyclic graph whose nodes represent random variables and whose directed arcs linking the nodes depict their conditional dependencies [3]. Modeling a problem with a BN requires specifying the network structure as well as the probabilities for each node. To generate the tree structure, authors have proposed several tools that depict the cause-and-effect relationships between the nodes. De et al [4] and Pradhan et al [5] proposed failure mode and effect analysis (FMEA). Pradhan et al [5] and Nguyen et al [6] used fishbone (cause-and-effect) diagrams. Pradhan et al [5] also used fault tree analysis and a variation sensitivity matrix, and Liu & Jin [7] used finite element analysis (FEA). Constructing the exact tree structure of a BN from data is an NP-hard optimization problem [8]. Yang & Lee [9] and Correa et al [10] used the K2 [11] and Chow-Liu [12] algorithms, respectively, to generate trees from data. Jeong et al [13] extracted the cause-effect relationships for the diagnosed equipment from the equipment's maintenance manual. Process data obtained from sensors and stored in manufacturing execution systems (MES) or maintenance databases are then used to generate the conditional probabilities of the network.
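To make the idea of generating a tree from data concrete, the sketch below follows the Chow-Liu approach cited above. It computes the empirical mutual information between every pair of discrete variables and keeps the maximum-weight spanning tree. This is a minimal illustration that assumes fully observed, already discretized process variables. The variable names and toy data are hypothetical, and it does not reproduce the setup of Correa et al [10].

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def chow_liu_tree(data):
    """Edges of a maximum-weight spanning tree (Kruskal's algorithm) whose
    edge weights are pairwise mutual informations.

    data: dict mapping variable name -> list of discrete observations.
    """
    names = list(data)
    edges = sorted(
        ((mutual_information(data[u], data[v]), u, v)
         for u, v in combinations(names, 2)),
        reverse=True,  # strongest dependencies first
    )
    parent = {name: name for name in names}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for _, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so no cycle is created
            parent[ru] = rv
            tree.append((u, v))
    return tree


if __name__ == "__main__":
    # Hypothetical discretized readings from a machining process
    data = {
        "spindle_speed": [0, 0, 0, 0, 1, 1, 1, 1],
        "vibration":     [0, 0, 0, 1, 1, 1, 1, 0],
        "roughness":     [0, 0, 0, 1, 1, 1, 1, 0],
        "shift":         [0, 1, 0, 1, 0, 1, 0, 1],
    }
    print(chow_liu_tree(data))
```

The resulting tree is undirected. A BN still needs a root to be chosen so that the edges can be oriented, and its conditional probability tables must then be estimated, for example from MES or maintenance records as described above.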
The areas of application of BN vary across manufacturing industries. In the semiconductor industry, Yang & Lee [9] and Nguyen et al [6] used BNs to evaluate the influence of process variables on wafer quality and to diagnose the root cause of defective wafers from historical process data. Other application areas include the automobile industry [7], where a BN was used to diagnose fixture faults in a taillight assembly, and machining [10], where a BN was used to diagnose surface roughness faults. Data sources include quality management systems (QMS), manufacturing execution systems (MES), recipe management systems (RMS), computerized maintenance management systems (CMMS), and coordinate measuring machines (CMM). Table 1 summarizes the data sources, the algorithms or methods used to determine the tree structure, and the case study or area of application for each of the BN papers surveyed.
A BN is a white box model: its graphical representation makes it intuitively easy for the user to understand the interactions between the model variables. It is useful for modeling uncertainty and can readily model hierarchical levels of multiple causes and effects with data from numerous sources, which is typical of manufacturing systems. The same BN model can be used for both prediction and diagnosis. The main challenge in training a BN is constructing the tree structure, and several methods, including expert opinion, have been proposed to mitigate this challenge [14].
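The same network can answer both kinds of query. The minimal sketch below shows this on a hypothetical three-node network in which tool wear (W) and fixture misalignment (F) are independent causes of a surface roughness defect (R). Prediction computes P(R=1). Diagnosis applies Bayes' rule to obtain P(cause=1 | R=1). All probabilities are invented for illustration, and exact enumeration is used because the network is tiny.

```python
from itertools import product

# Hypothetical network: W -> R <- F, all variables binary (1 = present).
P_W = {1: 0.10, 0: 0.90}
P_F = {1: 0.05, 0: 0.95}
P_R1_GIVEN = {  # P(R=1 | W, F)
    (0, 0): 0.02,
    (1, 0): 0.70,
    (0, 1): 0.60,
    (1, 1): 0.95,
}


def joint(w, f, r):
    """P(W=w, F=f, R=r) from the BN factorization P(W) P(F) P(R | W, F)."""
    p_r1 = P_R1_GIVEN[(w, f)]
    return P_W[w] * P_F[f] * (p_r1 if r == 1 else 1.0 - p_r1)


def predict_defect():
    """Prediction: marginal probability of a defect, P(R=1)."""
    return sum(joint(w, f, 1) for w, f in product((0, 1), repeat=2))


def posterior_cause(cause):
    """Diagnosis: P(cause=1 | R=1) by enumeration; cause is "W" or "F"."""
    numerator = sum(
        joint(w, f, 1)
        for w, f in product((0, 1), repeat=2)
        if (w if cause == "W" else f) == 1
    )
    return numerator / predict_defect()


if __name__ == "__main__":
    print(f"P(R=1)       = {predict_defect():.5f}")
    print(f"P(W=1 | R=1) = {posterior_cause('W'):.3f}")
    print(f"P(F=1 | R=1) = {posterior_cause('F'):.3f}")
```

With these numbers, P(R=1) = 0.11535. Given a defect, tool wear (about 0.62) is a more likely root cause than fixture misalignment (about 0.28).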

Artificial Neural Network
An artificial neural network is a non-parametric machine learning algorithm inspired by the functioning of the human central nervous system [15]. Its adaptive nature provides a powerful modeling capability suited to non-linear relationships among features. ANNs have been used in many manufacturing FD applications. For complex problems with multiple layers and nodes, the network's training time can be significant. To decrease this training time, Barakat et al [16] developed a self-adaptive fault diagnosis ANN that adjusts the number of nodes according to the network's input parameters and terminates the training process according to a set of criteria. The idea was illustrated on the detection and isolation of disturbances in a chemical reactor simulator. Demetgul et al [17] also proposed an optimal configuration algorithm for neural networks used for FD. The algorithm combined a genetic algorithm (GA) with an ANN to eliminate the trial-and-error selection of the fastest and most accurate ANN configuration. It used a fitness function that keeps the number of hidden layers and nodes to the minimum possible. The FD performance of the proposed system was evaluated using experimental data collected from a bottle-capping pneumatic work cell.
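For readers unfamiliar with how an ANN classifier is trained, the sketch below shows the basic mechanics on a toy three-class problem (normal condition and two hypothetical fault types). It uses a single tanh hidden layer with a softmax output, trained by full-batch gradient descent on cross-entropy loss. This is a minimal NumPy illustration under those assumptions. The layer size, learning rate, and epoch count are arbitrary choices, and it is not the self-adaptive network of Barakat et al [16] or the GA-tuned configuration of Demetgul et al [17].

```python
import numpy as np


def train_mlp(X, y, n_hidden=8, n_classes=3, lr=0.5, epochs=500, seed=0):
    """Train a one-hidden-layer tanh network with a softmax output by
    full-batch gradient descent on the mean cross-entropy loss."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        # Forward pass
        H = np.tanh(X @ W1 + b1)
        logits = H @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        # Backward pass (gradients of the mean cross-entropy)
        d_logits = (P - Y) / len(X)
        dW2 = H.T @ d_logits
        db2 = d_logits.sum(axis=0)
        dH = (d_logits @ W2.T) * (1.0 - H ** 2)  # tanh derivative
        dW1 = X.T @ dH
        db1 = dH.sum(axis=0)
        # Gradient descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2


def predict(params, X):
    """Return the most probable class index for each row of X."""
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)


if __name__ == "__main__":
    # Synthetic two-feature sensor data: class 0 = normal, 1 and 2 = faults
    rng = np.random.default_rng(1)
    centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
    X = np.vstack([rng.normal(c, 0.5, (50, 2)) for c in centers])
    y = np.repeat(np.arange(3), 50)
    params = train_mlp(X, y)
    print("training accuracy:", np.mean(predict(params, X) == y))
```

The number of hidden nodes and the epoch count in this sketch are fixed by hand. These are the kinds of choices that the self-adaptive and GA-based methods discussed above try to automate.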
To add FD capabilities to control charts, Zhao & Camelio [18] integrated a neural network (NN) with a statistical process control (SPC) chart. As a proof of concept, the authors incorporated process knowledge and measurements of a single sheet metal part to detect and diagnose fixture location faults in an assembly system. Zhao & Camelio [18] also applied SPC and the NN to an automotive assembly process. They used measurement data from the door subassembly to detect potential sources of variation during the assembly of the door to the vehicle.
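The detect-then-diagnose split described above can be sketched as follows. Shewhart-style 3-sigma control limits, estimated from in-control samples, flag an out-of-control part. A separate step then assigns the flagged deviation pattern to a fault class. To keep the example short, the diagnosis step here is a plain nearest-centroid rule standing in for the neural network of Zhao & Camelio [18]. The three measurement points and the fault signatures are hypothetical.

```python
import numpy as np


def control_limits(in_control, k=3.0):
    """Lower limit, center line, and upper limit for each measurement point,
    estimated from in-control (Phase I) samples of shape (n, p)."""
    center = in_control.mean(axis=0)
    sigma = in_control.std(axis=0, ddof=1)
    return center - k * sigma, center, center + k * sigma


def out_of_control(sample, lcl, ucl):
    """Detection: True if any measurement point falls outside its limits."""
    return bool(np.any((sample < lcl) | (sample > ucl)))


def diagnose_fault(sample, center, signatures):
    """Diagnosis stand-in (nearest centroid, not an NN): return the fault
    whose deviation pattern is closest to the sample's deviation."""
    deviation = sample - center
    return min(signatures,
               key=lambda name: np.linalg.norm(deviation - signatures[name]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical deviations (mm) at three measurement points on a part
    in_control = rng.normal(0.0, 0.05, (100, 3))
    lcl, center, ucl = control_limits(in_control)
    signatures = {  # hypothetical mean deviation patterns per fixture fault
        "locator_1_shift": np.array([0.5, 0.2, 0.0]),
        "locator_2_shift": np.array([0.0, 0.2, 0.5]),
    }
    part = np.array([0.48, 0.22, 0.01])
    if out_of_control(part, lcl, ucl):
        print("fault detected:", diagnose_fault(part, center, signatures))
```

Keeping the chart and the classifier separate means the classifier is only consulted once the chart signals, which mirrors how SPC is used on the shop floor.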
Using data directly from analog sensors or multivariate sensors for FD requires combining a machine learning technique with a signal processing technique or, for multivariate data, a dimension reduction technique. Hong et al [19] proposed an algorithm that combines principal component analysis (PCA), a modular neural network, and Dempster-Shafer (D-S) theory to detect faults in an etcher system in semiconductor manufacturing. Process data were acquired by sensors, and PCA reduced the dimensionality of the multivariate tool data set [19]. Zhang et al [20] also proposed a critical component monitoring method that uses the Fast Fourier Transform to extract features from sensor signals. It then trains an ANN with the transformed data to predict machine degradation and identify component faults.
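The two preprocessing routes mentioned above can be illustrated with a short NumPy sketch. PCA, computed through the singular value decomposition, compresses correlated multivariate tool data. An FFT turns a raw sensor signal into a few frequency features that a classifier could consume. The choice of "largest spectral peaks" as features and the synthetic signals are assumptions for illustration. They are not the feature definitions used by Hong et al [19] or Zhang et al [20].

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project mean-centered data onto its leading principal components
    (via SVD); also return the fraction of variance each one explains."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return Xc @ Vt[:n_components].T, explained[:n_components]


def fft_features(signal, fs, n_peaks=3):
    """Frequencies (Hz) of the n_peaks largest spectral lines of a real
    sensor signal sampled at fs Hz, excluding the DC bin."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    top = np.argsort(spectrum[1:])[::-1][:n_peaks] + 1  # +1 undoes DC skip
    return np.sort(freqs[top])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Five correlated (hypothetical) tool sensors driven by two latent factors
    latent = rng.normal(size=(200, 2))
    X = latent @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))
    Z, ratio = pca_reduce(X, 2)
    print("reduced shape:", Z.shape, "variance explained:", ratio.sum())

    # One second of a synthetic vibration signal with 50 Hz and 120 Hz lines
    t = np.arange(1000) / 1000.0
    sig = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
    print("dominant frequencies:", fft_features(sig, fs=1000, n_peaks=2))
```

Either output (the PCA scores `Z` or the peak frequencies) would then serve as the input vector for a downstream classifier such as an ANN.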
Yu et al [21] used clustering as an unsupervised procedure to obtain informative features from the signals of vibration sensors attached to the motor housing of a machine, in order to determine the bearing condition. An ANN was then used to diagnose machine faults based on these feature vectors [21]. Fernando & Surgenor [22] used an unsupervised ANN based on Adaptive Resonance Theory (ART) for FD and identification on an automated O-ring assembly machine testbed. Sensor data were collected while the machine operated under different conditions (normal as well as faulty), and features were extracted from the raw sensor data. The ART ANN was able to achieve excellent FD performance with minimal modeling requirements [22].
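Part of ART's appeal for FD is that it forms new categories on its own when an input does not resemble anything seen before. The sketch below illustrates this with Fuzzy ART, one common ART variant. It uses complement coding, the choice function |I ∧ w| / (alpha + |w|), and a vigilance test |I ∧ w| / |I| >= rho. The source does not say which ART variant Fernando & Surgenor [22] used, so Fuzzy ART is an assumption here. The parameter values and two-feature inputs (scaled to [0, 1]) are hypothetical.

```python
import numpy as np


class FuzzyART:
    """Minimal Fuzzy ART clustering with complement coding."""

    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho = rho      # vigilance: higher values give more, tighter categories
        self.alpha = alpha  # choice parameter (small positive constant)
        self.beta = beta    # learning rate (1.0 = fast learning)
        self.weights = []   # one weight vector per committed category

    def learn(self, a):
        """Present one input with features in [0, 1]; return its category."""
        I = np.concatenate([a, 1.0 - a])  # complement coding: |I| = len(a)
        norm_I = I.sum()
        # Choice function T_j = |I ^ w_j| / (alpha + |w_j|), with ^ = min
        scores = [np.minimum(I, w).sum() / (self.alpha + w.sum())
                  for w in self.weights]
        for j in np.argsort(scores)[::-1]:  # search categories by choice score
            w = self.weights[j]
            match = np.minimum(I, w).sum() / norm_I
            if match >= self.rho:  # resonance: refine the winning category
                self.weights[j] = (self.beta * np.minimum(I, w)
                                   + (1.0 - self.beta) * w)
                return int(j)
            # otherwise reset this category and try the next best one
        self.weights.append(I.copy())  # no match: commit a new category
        return len(self.weights) - 1


if __name__ == "__main__":
    art = FuzzyART(rho=0.75)
    # Hypothetical normalized features: normal near (0.2, 0.2), fault near (0.8, 0.9)
    samples = [[0.2, 0.2], [0.22, 0.18], [0.8, 0.9], [0.82, 0.88], [0.19, 0.21]]
    print([art.learn(np.array(s)) for s in samples])  # [0, 0, 1, 1, 0]
```

No fault labels are given to the network. The faulty readings simply fail the vigilance test against the "normal" category, which is the behavior that makes ART attractive when faulty training data are scarce.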
The ANN's non-parametric nature and its ability to model nonlinear, complex problems with a high degree of accuracy have made it well suited to FD problems. The model is easy to initialize, as there is no need to specify a tree structure as with a BN. However, its "black box" nature makes the model difficult to interpret. Also, ANNs often cannot deal with uncertainty in the inputs and are computationally intensive, so convergence during training is typically slow. ANNs are also prone to overfitting and require large, diverse training data to prevent it.

Support Vector Machine
SVM uses kernel functions such as the radial basis function (RBF) or polynomial kernel to find a hyperplane that best separates the data into their classes, and it has good classification performance even with small training sets [23]. Successful areas of application of SVM include face recognition, handwritten character recognition, speech recognition, image retrieval, and prediction [24]. Applications of SVM exist in fault localization, although they are not as common as those of BN and ANN [25]. Hsueg & Yang [26] used the technique to diagnose tool breakage in a face milling process under varying cutting conditions. Kumar et al [27] created a MapReduce framework for automatic diagnosis in cloud-based manufacturing using SVM as the classification algorithm. They validated it with a fault diagnosis case study on the steel plate manufacturing data available in the UCI Machine Learning Repository [28]. Demetgul [23] used SVM to classify 9 fault states in a modular production system (MPS) using data obtained from eight sensors. Four kernel functions were tested (RBF, sigmoid, polynomial, and linear). All achieved a 100% classification rate except the sigmoid kernel, which achieved 52.08%. Decision trees built with the QUEST (Quick, Unbiased and Efficient Statistical Tree), C&RT (Classification and Regression Tree), and C5.0 algorithms were also applied to the same dataset and achieved a 100% classification rate, while Chi-square automatic interaction detection (CHAID) achieved 95.83% [23]. Demetgul [23] concluded that SVM and decision tree algorithms are very effective monitoring and diagnostic tools for industrial production systems.
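To show what swapping kernel functions means in practice, the sketch below defines the four kernels compared by Demetgul [23] (linear, polynomial, RBF, sigmoid). It then trains a binary kernel SVM with the kernelized Pegasos algorithm, a stochastic sub-gradient method for the hinge-loss SVM objective. Pegasos omits the bias term and is used here only because it fits in a few lines. It is not the solver used in the papers reviewed, where a library SVM implementation would normally be used. The kernel parameters and synthetic normal/fault data are assumptions. The sigmoid kernel is not positive semi-definite in general, which is one known reason it can underperform.

```python
import numpy as np


def make_kernel(name, gamma=0.5, degree=3, coef0=1.0):
    """Return k(A, B): the Gram matrix between the rows of A and B."""
    if name == "linear":
        return lambda A, B: A @ B.T
    if name == "polynomial":
        return lambda A, B: (gamma * (A @ B.T) + coef0) ** degree
    if name == "sigmoid":
        return lambda A, B: np.tanh(gamma * (A @ B.T) + coef0)
    if name == "rbf":
        def rbf(A, B):
            sq_dist = ((A ** 2).sum(axis=1)[:, None]
                       + (B ** 2).sum(axis=1)[None, :] - 2.0 * (A @ B.T))
            return np.exp(-gamma * sq_dist)
        return rbf
    raise ValueError(f"unknown kernel: {name}")


def train_kernel_svm(X, y, kernel, lam=0.01, iters=3000, seed=0):
    """Kernelized Pegasos (no bias term). Labels y must be +1 / -1.
    Returns alpha, the number of margin violations per training sample."""
    rng = np.random.default_rng(seed)
    K = kernel(X, X)
    alpha = np.zeros(len(X))
    for t in range(1, iters + 1):
        i = rng.integers(len(X))
        score = (alpha * y) @ K[:, i] / (lam * t)
        if y[i] * score < 1.0:  # hinge-loss margin violated
            alpha[i] += 1.0
    return alpha


def predict_svm(X_train, y_train, alpha, kernel, X_new):
    """Class = sign of the kernel expansion sum_j alpha_j y_j k(x_j, x)."""
    return np.sign((alpha * y_train) @ kernel(X_train, X_new))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical two-sensor readings: -1 = normal, +1 = faulty
    X = np.vstack([rng.normal(-2.0, 0.5, (40, 2)), rng.normal(2.0, 0.5, (40, 2))])
    y = np.array([-1] * 40 + [1] * 40)
    for name in ("linear", "polynomial", "rbf", "sigmoid"):
        k = make_kernel(name)
        alpha = train_kernel_svm(X, y, k)
        acc = np.mean(predict_svm(X, y, alpha, k, X) == y)
        print(f"{name:>10}: training accuracy {acc:.2f}")
```

A multi-class problem such as the nine MPS fault states would typically be handled by training several binary classifiers (for example one-vs-rest) and combining their outputs.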
SVM is an excellent technique for modeling both linear and nonlinear relationships. Its computation time is relatively short compared with other nonparametric techniques such as ANN. The availability of large training datasets is a challenge in machine learning; however, SVM tends to generalize well even with a limited amount of training data. In addition, NIST is actively developing use cases representative of common manufacturing processes to support prognosis and health management research [29].

Hidden Markov Model
A hidden Markov model is an extension of the Markov chain model. It is used to estimate the probability distributions of state transitions and of measurement outputs in a dynamic process whose states are unobservable [30]. HMMs have been used in fault diagnostics of both continuous and discrete manufacturing systems. In the continuous case, Yu [31] proposed a new multiway discrete hidden Markov model (MDHMM) for FD and classification in complex batch or semi-batch production processes with inherent system uncertainty. Yu [31] applied the proposed MDHMM approach to the fed-batch penicillin fermentation process, where it classified different types of process faults with high fidelity. For the discrete case, Boutros & Liang [32] applied HMM to detect and diagnose tool wear/fracture and ball bearing faults. The model correctly detected the state of the tool (i.e., sharp, worn, or broken) and correctly classified the severity of the faults seeded in two different engine bearings [32]. In addition to fault severity classification, a location index was developed to determine the fault location (inner race, ball, or outer race) [32].
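A common way to use HMMs for condition classification is to fit one HMM per condition and assign a new observation sequence to the model under which it is most likely. The sketch below assumes that setup. It shows only the scaled forward algorithm for a discrete HMM and the likelihood comparison between two hand-specified models, "sharp" and "worn". The model parameters and the quantization of vibration into low/medium/high symbols are hypothetical. Training (for example with the Baum-Welch algorithm) is omitted, and this is neither the MDHMM of Yu [31] nor the exact procedure of Boutros & Liang [32].

```python
import numpy as np


def log_likelihood(obs, pi, A, B):
    """log P(obs | model) for a discrete HMM via the scaled forward algorithm.

    pi:  initial state probabilities, shape (N,)
    A:   transitions, A[i, j] = P(next state j | state i), shape (N, N)
    B:   emissions, B[i, k] = P(symbol k | state i), shape (N, M)
    obs: sequence of integer symbols in range(M)
    """
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    log_p = np.log(scale)
    alpha = alpha / scale  # rescale each step to avoid numerical underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        scale = alpha.sum()
        log_p += np.log(scale)
        alpha = alpha / scale
    return log_p


def classify(obs, models):
    """Return the name of the model with the highest likelihood for obs."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hypothetical two-state models over quantized vibration amplitude
# (symbol 0 = low, 1 = medium, 2 = high).
MODELS = {
    "sharp": (
        np.array([0.9, 0.1]),
        np.array([[0.95, 0.05], [0.10, 0.90]]),
        np.array([[0.80, 0.15, 0.05], [0.50, 0.40, 0.10]]),
    ),
    "worn": (
        np.array([0.5, 0.5]),
        np.array([[0.90, 0.10], [0.10, 0.90]]),
        np.array([[0.10, 0.30, 0.60], [0.05, 0.25, 0.70]]),
    ),
}

if __name__ == "__main__":
    print(classify([0, 0, 1, 0, 0, 0], MODELS))  # sharp
    print(classify([2, 2, 1, 2, 2], MODELS))     # worn
```

The per-step rescaling keeps `alpha` from underflowing on long sensor sequences while still yielding the exact log-likelihood as the sum of the log scale factors.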
As with the analog sensor signals used with ANNs, Yuwono et al [33] combined HMM with advanced signal processing techniques to discover the source of defects in a ball bearing. The algorithm was based on Swarm Rapid Centroid Estimation (SRCE) and HMM. Using defect frequency signatures extracted with the Wavelet Kurtogram and Cepstral Liftering, it achieved an average sensitivity, specificity, and error rate of 98.02%, 96.03%, and 2.65%, respectively, on the bearing fault vibration data provided by the Case School of Engineering, Case Western Reserve University [33].
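Cepstral analysis is useful here because repetitive events, such as the impacts produced by a bearing defect, show up as a peak in the cepstrum at the repetition period. The minimal sketch below computes the real cepstrum (inverse FFT of the log magnitude spectrum) and applies a simple high-pass lifter. It uses a synthetic signal made of noise plus a delayed, attenuated copy of itself, so the expected cepstral peak is known in advance. It covers only this one step, not the Wavelet Kurtogram or the SRCE/HMM stages of Yuwono et al [33]. The lifter cutoff and the test signal are assumptions.

```python
import numpy as np


def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.abs(np.fft.fft(x))
    return np.real(np.fft.ifft(np.log(spectrum + 1e-12)))  # epsilon avoids log(0)


def lifter(cepstrum, cutoff):
    """High-pass lifter: zero quefrencies below `cutoff` samples (and their
    mirror images) to suppress the smooth spectral envelope while keeping
    periodic structure such as repeated impacts."""
    out = cepstrum.copy()
    out[:cutoff] = 0.0
    if cutoff > 1:
        out[-(cutoff - 1):] = 0.0  # the real cepstrum is symmetric: c[n] = c[N - n]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.normal(size=4096)
    # Synthetic signal: noise plus a copy repeating every 100 samples (circularly)
    x = s + 0.5 * np.roll(s, 100)
    c = lifter(real_cepstrum(x), cutoff=20)
    print("dominant quefrency (samples):", int(np.argmax(c[: len(c) // 2])))  # 100
```

Multiplying the quefrency of the peak by the sampling interval converts it to a repetition period. That period can then be compared with the characteristic defect frequencies of the bearing.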
HMM is a probabilistic model that excels at modeling processes with unobservable states, such as chemical processes or equipment health status, which makes it a good fit for FD. However, the training process is usually computationally intensive.

Conclusion and Future Research Direction
A summary of the advantages and disadvantages of the different techniques is presented in Table 2. Most of the data used were process data acquired with sensors, and training was done through designed experiments in either a laboratory or an industrial setting. The authors who proposed using data from QMSs did not validate their proposal with data from real applications. The reasons were the difficulty of obtaining real data and the challenge of mining Quality Information System (QIS) data such as corrective action reports or warranty information, which are mostly in paper form. Also, most of the case studies were limited to FD in a single machine; diagnosis of a whole factory or of a manufacturing line consisting of multiple machines was not considered. BN and HMM techniques are both excellent at modeling fault diagnosis with hierarchical levels of multiple causes and effects, and BN requires less computational power than HMM. ANN produced very accurate results, and several approaches were proposed to reduce its long training time. ANN was also used in conjunction with signal processing techniques for applications with analog sensor data. Unlike the other models, ANN is a black box, and it is not easy to see what is occurring inside the model. ANN is also prone to overfitting. Although SVM is not often used in fault diagnosis compared with the other machine learning methods surveyed, it produced high fault classification rates with less computation time than ANN. Future work will explore using real QIS data to further improve the diagnosis process. It will also extend single-stage diagnosis to multiple stages covering the entire manufacturing factory.

Table 1. Summary of BN Applications

Table 2. Pros and Cons of Each Technique