Image-Driven Decision Making with Application to Control Gas Burners

Abstract. Our aim is to propose a model-free approach to decision making that is based on the direct use of images. More precisely, the content of each image is used, without further processing, to cluster the images by the K-medoids method. Then, an expert attaches a decision to each cluster. When a new image is acquired, it is first classified to one of the clusters and the corresponding decision is made. The approach is conceptually rather simple, but its success in on-line applications depends on the way the learning and decision phases are organized. We illustrate the approach by the example of a decision-making system for industrial gas burners.


Introduction
Most decisions made by human beings and animals are based on images provided by the eyes. In most cases these decisions are made on an intuitive level, as when we are walking or grasping a cup of tea. In somewhat more complicated situations, e.g., when we see a red light at a crossroads, our decisions are made more consciously. Even more image interpretation work is necessary when a policeman is directing the traffic at a crossroads. The range of cases in which images are processed and interpreted for decision making is so broad that it is impossible to consider all of them in one paper. We concentrate on image-driven decision making in industrial applications. At least the following three classes of problems can be distinguished:
1. simple signals given by workers to each other (start, stop, go on, etc.) and safety monitoring by detecting presence in dangerous areas; we skip these topics, since they are relatively simple and well developed,
2. quality monitoring of produced items,
3. on-line control of continuously running industrial processes.
Quality monitoring of produced items is also relatively well developed (see [3] and the extensive bibliography cited therein, and [10], [11] for more recent contributions). However, there are still large areas of potential applications of machine vision techniques in quality control, both of item-by-item and of continuously running industrial processes. In particular, the approach proposed in this paper can also be used for these purposes. The distinguishing feature of machine vision techniques in quality monitoring is that the resulting decisions are in most cases binary: an item conforms to the requirements or not. The second feature that differentiates them from problems of on-line control with a camera in the loop is that in quality monitoring there is usually no automatic feedback for improving the production process. Instead, experts analyse past data, trying to find the reasons for poor quality.
In this paper we concentrate on item 3 of the above list. Advanced applications of control systems with a camera in the loop are described rather rarely (see [2], [12], [5], [6] for several more recent examples). Approaches to control systems with a camera in the loop can roughly be classified as follows. Model-based: a mathematical model of the system to be controlled is known and a camera provides information that is inaccessible to classic sensors (see [9]). Model-free: a mathematical model of the process is not available and the images provided by a camera are either intensively processed in order to extract problem-specific features useful in decision making [8], [14], or roughly clustered, using only a general (dis-)similarity measure between images, into system states that require the same (or a similar) decision (action). The latter approach is the main topic of this paper. We shall call it model-free, image-driven decision making (IDDM). As far as we know, completely model-free decision making that is based solely on images, but without their intensive processing, does not have a counterpart in the literature. For this reason we shall not provide comparisons of the IDDM with other methods, which require much more a priori information. The main advantages of the IDDM are the following: the approach is relatively fast, since the learning (clustering) phase can be separated from on-line decision making; the decision-making phase can be implemented as a low-cost microprocessor system; only a low level of a priori knowledge is necessary in order to design an IDDM system. The IDDM is based on K-medoids clustering, but other clustering methods might also be used. It is worth mentioning that K-medoids clustering in image processing has been used mainly for clustering the pixels of one image (see [1] and the bibliography cited therein), while the proposed version aims to cluster images as whole entities.
The paper is organized as follows. In the next section we provide a general description of the IDDM approach. Then, this approach is applied to design a decision-making system for controlling industrial gas burners using a camera. Finally, we discuss possible extensions of the IDDM approach. We remark that decision making for industrial gas burners using a camera is an area of active recent research [14], [8], [16], [13], but the approach proposed here is different. Namely, we do not use time-consuming image processing operations other than verifying to which cluster the acquired images belong.
2 Image-driven decision making: a general idea and algorithm
The proposed approach consists of the following two phases: an off-line clustering phase and on-line decision making. Below we describe these phases in more detail.

Off-line clustering phase
The first step of this phase is collecting a large number of images that are representative of the system states. We shall call these images the set of representative images (SRI). This step is crucial for the proper functioning of a decision system. The SRI must contain images that represent all important system states. The number of images illustrating each group of important states should be sufficiently large.
A strategy of clustering images and decisions. The second step is to cluster the set of images into an appropriate number of clusters (see below for details). The third step is to attach decisions (control actions) to each cluster. We discuss steps two and three together, since one can consider the following two strategies of attaching (linking) decisions to images.
Cluster and attach decisions. The SRI is first clustered into, say, $K > 1$ disjoint sets of similar states represented by images. Then, an expert attaches control actions (decisions) to each of them. We shall use this strategy in the present paper.
Attach decisions and cluster. This strategy is more laborious for the expert, since he/she has to attach a decision to each image in the SRI. Then, the pairs (image, decision) are clustered, using a (dis-)similarity function (metric) that takes into account both the similarity of images and the similarity of decisions. This requires that decisions can be ordered in some sense or that their closeness can be defined. After clustering the pairs (image, decision) one may find inconsistencies, since two images assigned to the same cluster may have different decisions attached. Thus, before using this approach it is necessary to unify decisions. This can be done by majority voting among the decision labels in each cluster. However, if in a given cluster we have, e.g., two decision labels in almost equal proportions, then it is expedient to consider splitting this cluster into two and then joining the parts with other clusters (sub-clusters) that have the same decision label. As one can notice, this strategy needs to be carefully elaborated in order to avoid decision clashes.
Distance functions for clustering images. Images can formally be clustered in the same way as other objects. The only specificity is in selecting a similarity measure between images, which should take into account both the correspondence of pixels in space and the different contents and ways of coding images. We require that a similarity function $\rho(A, A')$ is a metric (distance function) defined on the Cartesian product of an appropriate space of images $A, A', \ldots$ with itself, as illustrated by the examples listed below. It is assumed that all images in the SRI are of the same type (binary, gray-level or color) and of the same dimensions, $I \times J$ say. Concerning color images, we additionally assume that all of them use the same color coding scheme (e.g., RGB or HSI).
Color RGB images. Color images, coded as RGB (red, green, blue) channels, are represented as $C = [c_{ij\gamma}]$, $C' = [c'_{ij\gamma}]$, $i = 1, 2, \ldots, I$, $j = 1, 2, \ldots, J$, where $\gamma$ indexes the three color channels. Again, $c_{ij\gamma}$ and $c'_{ij\gamma}$ are usually restricted to $[0, 1]$ or to $[0, 255]$ for each channel. Later on we shall use a distance function for RGB color images, denoted $\rho_C(C, C')$ and referred to as (1). Clearly, gray-level and binary counterparts of (1) can also be used.
Details of clustering images: K-medoids. Having selected a distance function between images, one can cluster the SRI using the well-known K-means algorithm. However, for our purposes the K-medoids method (see [4], [7] for recent implementations of this algorithm) seems to be better suited. The reason is that it returns, as its output, K images that are present in the SRI. This would be of special interest if the second strategy (attach decisions to images and cluster) were used. However, also when the first strategy is preferred (as in our case study), the K-medoids approach works better than K-means for the following reasons: 1. the K-medoids method is more robust to noise and outliers (as it has features similar to the one-dimensional median), 2. the distances between images are calculated only once (important for large images), 3. for a long sequence of images the K-medoids method can be faster than the K-means method.
Denote by $C^{(n)}$, $n = 1, 2, \ldots, N$, the images in the SRI and let $\mathcal{C}$ be the collection of them. An image $\hat{C} \in \mathcal{C}$ is called the medoid of this set (see, e.g., [15] and the bibliography cited therein) if it minimizes the following distance function:
$$\sum_{n} \rho_C(\hat{C}, C^{(n)}), \qquad (2)$$
where the sum in (2) is taken over all $C^{(n)} \in \mathcal{C}$ such that $\hat{C} \neq C^{(n)}$. We shall use this definition also for subsets of $\mathcal{C}$.
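The medoid definition can be illustrated with a minimal Python sketch. It assumes a plain Euclidean distance over all pixels and channels as a stand-in for $\rho_C$; this choice, as well as the names `rgb_distance`, `distance_matrix` and `medoid_index`, are illustrative assumptions and not the paper's definitions.

```python
import numpy as np

def rgb_distance(a, b):
    """Assumed stand-in for rho_C: Euclidean distance over all pixels and channels."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same dimensions")
    return float(np.sqrt(np.sum((a - b) ** 2)))

def distance_matrix(images):
    """N x N matrix of pairwise distances, computed once and reused afterwards."""
    n = len(images)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = rgb_distance(images[i], images[j])
    return d

def medoid_index(d, members=None):
    """Index of the image with the smallest sum of distances to the other members."""
    idx = np.arange(d.shape[0]) if members is None else np.asarray(members)
    # The diagonal of d is zero, so the image itself does not contribute to its sum.
    sums = d[np.ix_(idx, idx)].sum(axis=1)
    return int(idx[np.argmin(sums)])

# Toy data: three constant 2 x 2 RGB images with intensities in [0, 1].
images = [np.zeros((2, 2, 3)), np.full((2, 2, 3), 0.1), np.ones((2, 2, 3))]
D = distance_matrix(images)
print(medoid_index(D))
```

Because the medoid is always one of the input images, only the precomputed matrix is needed to find it; no averaged image has to be formed.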
For arbitrarily selected $C_k \in \mathcal{C}$, $k = 1, 2, \ldots, K$, define clusters $\mathcal{C}_k$, $k = 1, 2, \ldots, K$, by attaching to $\mathcal{C}_k$ all the images $C^{(n)}$, $n = 1, 2, \ldots, N$, that are closer to $C_k$ than to the other $C_l$, $l \neq k$, in the $\rho_C$ distance. Define the total distance as follows:
$$\sum_{k=1}^{K} \sum_{C^{(n)} \in \mathcal{C}_k} \rho_C(C_k, C^{(n)}). \qquad (3)$$
We are looking for $\hat{C}_k \in \mathcal{C}$ and $\hat{\mathcal{C}}_k$, $k = 1, 2, \ldots, K$, that minimize (3). Below we provide a skeletal version of the K-medoids algorithm that is adapted to split all images from $\mathcal{C}$ into K disjoint clusters. It can also serve as the definition of the K-medoids method.
Step 0 Calculate and store an $N \times N$ distance matrix $D = [d_{nm}]$, where $d_{nm} = \rho_C(C^{(n)}, C^{(m)})$. Select initial medoids $C_k$, $k = 1, 2, \ldots, K$, by drawing them at random from $\mathcal{C}$ and insert them into a collection, CCM say, of considered candidates for medoids.
Step 1 Form clusters $\mathcal{C}_k$, $k = 1, 2, \ldots, K$, by attaching to $\mathcal{C}_k$ all the images from $\mathcal{C}$ that are closer to $C_k$ than to the other temporary medoids $C_l$, $l \neq k$.
Step 2 Calculate the total distance $L$ between the images and the medoids of their temporary clusters as follows:
$$L = \sum_{k=1}^{K} \sum_{C^{(n)} \in \mathcal{C}_k} \rho_C(C_k, C^{(n)}), \qquad (4)$$
where $\rho_C(C_k, C^{(n)})$ is read out from the appropriate element of matrix $D$.
Step 3 Select at random an image, say $C^{(*)}$, from $\mathcal{C}$ that is different from those in CCM. $C^{(*)}$ is a candidate for a new medoid. Using the distances from matrix $D$, find the medoid, $C_{(j)}$ say, that is closest to $C^{(*)}$ among all current medoids $C_1, C_2, \ldots, C_K$.
Step 4 Calculate the total distance with $C_{(j)}$ replaced by $C^{(*)}$, in the same way as in (4).

If the algorithm stops at Step 4b, then we consider its output, $\hat{C}_k \in \mathcal{C}$ and $\hat{\mathcal{C}}_k$, $k = 1, 2, \ldots, K$, either as the optimal solution or as its approximation. A faster implementation of K-medoids clustering of images, more in the spirit of the K-means algorithm, can be elaborated using the general guidelines provided in [7]. In [4] one can find a survey of approaches to K-medoids clustering that are based on evolutionary algorithms.
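The random-swap search of Steps 0 to 4 can be sketched in Python as follows. The sketch works on a precomputed distance matrix. Two points are assumptions made only for illustration: a swap is kept when it lowers the total distance, and the search stops once every image has been considered as a candidate. The function names `total_distance` and `k_medoids` are likewise hypothetical.

```python
import numpy as np

def total_distance(d, medoids):
    """Total distance: each image contributes its distance to the nearest medoid."""
    return float(d[:, medoids].min(axis=1).sum())

def k_medoids(d, k, seed=0):
    """Random-swap K-medoids on an N x N distance matrix d (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = d.shape[0]
    # Step 0: draw the initial medoids at random and record them as considered (CCM).
    medoids = [int(m) for m in rng.choice(n, size=k, replace=False)]
    considered = set(medoids)
    # Steps 1-2: clusters are implicit (nearest medoid); compute the total distance.
    best = total_distance(d, medoids)
    while len(considered) < n:          # assumed stopping rule
        # Step 3: pick a not yet considered image as the candidate medoid.
        remaining = [i for i in range(n) if i not in considered]
        candidate = int(rng.choice(remaining))
        considered.add(candidate)
        j = int(np.argmin(d[candidate, medoids]))   # closest current medoid
        # Step 4: total distance with that medoid replaced by the candidate.
        trial = list(medoids)
        trial[j] = candidate
        cost = total_distance(d, trial)
        if cost < best:                 # assumed acceptance rule
            medoids, best = trial, cost
    labels = np.argmin(d[:, medoids], axis=1)
    return medoids, labels, best

# Toy data: six points on a line forming two well-separated groups.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
D = np.abs(x[:, None] - x[None, :])
medoids, labels, cost = k_medoids(D, k=2)
print(medoids, labels, cost)
```

Since each image is tried at most once, the result is in general only an approximation of the minimizer of the total distance; repeating the run with different seeds and keeping the best result is one simple remedy.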

On-line decision making
The medoids and clusters, $\hat{C}_k \in \mathcal{C}$ and $\hat{\mathcal{C}}_k$, $k = 1, 2, \ldots, K$, form a basis for on-line decision making when a new image, $C_{cur}$ say, is captured by a camera. As mentioned earlier, at this stage an expert has to attach decisions, $d_k$, $k = 1, 2, \ldots, K$, say, to each cluster. Thus, we have the sequence $(\hat{C}_k, d_k)$, $k = 1, 2, \ldots, K$, at our disposal.
Algorithm 2 - On-line decision making algorithm.
Step 3 Decision: read out the corresponding decision $d_{k_{cur}}$ and go to the step Image acquisition.
An outline of this algorithm is presented in Fig. 1. Recognition (Step 2) of this algorithm can be formalized in a number of ways. At least three of them are worth mentioning.
R1) Features can be extracted from the medoids $\hat{C}_k$, $k = 1, 2, \ldots, K$, and compared with the same features extracted from $C_{cur}$.
R2) Among the medoids $\hat{C}_k$, $k = 1, 2, \ldots, K$, find the one that is closest to $C_{cur}$ in the $\rho_C$ metric.
R3) Calculate representative images, $\bar{C}_k$ say, for the clusters $\hat{\mathcal{C}}_k$, $k = 1, 2, \ldots, K$, other than the $\hat{C}_k$'s, and find the $\bar{C}_k$ that is closest to $C_{cur}$.
In the case study presented in the next section we have selected the arithmetic means of the images contained in the $\hat{\mathcal{C}}_k$'s as the representative images $\bar{C}_k$, $k = 1, 2, \ldots, K$. Notice that this differs from using K-means, since at the first stage we use the more robust K-medoids method for defining clusters and only afterwards are the images averaged. For the R3) approach (applicable also to R2)), one can estimate the probabilities $\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_K$ of the events: $C_{cur}$ is from the k-th cluster. A natural way of doing this, compatible with $\rho_C$, is to base them on the distances $\rho_C(C_{cur}, \bar{C}_k)$.
Remark 1 By construction, the largest $\hat{p}_k$ corresponds to that $\hat{k}$ for which $\rho_C(C_{cur}, \bar{C}_k)$ is the smallest. However, the values of the $\hat{p}_k$'s still have diagnostic properties. Namely, if $\hat{p}_{\hat{k}}$ is not much larger than some other $\hat{p}_k$, then we may consider the classification as uncertain. In the next section the following rule is applied: if $0.8\,\hat{p}_{\hat{k}} < \hat{p}_k$ for at least one $k \neq \hat{k}$, then the classification is uncertain and, instead of taking decision $d_{\hat{k}}$, one may consider other actions.
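The recognition step in variant R3) and the uncertainty rule of Remark 1 can be sketched in Python as below. Two ingredients are assumptions introduced only for illustration: the Euclidean distance stands in for $\rho_C$, and the scores are normalized inverse distances, which is one possible choice that makes the closest representative the most probable one. The names `cluster_means` and `classify` are hypothetical.

```python
import numpy as np

def cluster_means(images, labels, k):
    """R3): arithmetic mean image of each cluster (clusters assumed non-empty)."""
    images = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    return [images[labels == j].mean(axis=0) for j in range(k)]

def classify(current, representatives, threshold=0.8):
    """Return (index of closest representative, scores, uncertainty flag)."""
    cur = np.asarray(current, dtype=float)
    # Assumed stand-in for rho_C: Euclidean distance over all pixels and channels.
    dist = np.array([
        np.sqrt(np.sum((cur - np.asarray(r, dtype=float)) ** 2))
        for r in representatives
    ])
    # Assumed scoring: normalized inverse distances (not a formula from the paper).
    weights = 1.0 / (dist + 1e-12)
    probs = weights / weights.sum()
    k_hat = int(np.argmax(probs))
    # Rule of Remark 1: uncertain if threshold * p[k_hat] < p[k] for some other k.
    others = np.delete(probs, k_hat)
    uncertain = bool(np.any(threshold * probs[k_hat] < others))
    return k_hat, probs, uncertain

# Toy data: three constant 2 x 2 RGB representatives and two new images.
reps = [np.zeros((2, 2, 3)), np.full((2, 2, 3), 0.5), np.ones((2, 2, 3))]
print(classify(np.full((2, 2, 3), 0.05), reps))   # close to the first representative
print(classify(np.full((2, 2, 3), 0.26), reps))   # between the first and the second
```

The on-line cost per image is K distance evaluations followed by a table lookup, which is what makes a low-cost implementation of the decision phase plausible.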

Image-driven control of a gas burner -a case study
In this section we apply the methodology of image-driven control (Sec. 2) to support the decision on the rate of air supply to a gas burner. We refer the reader to [14], [8] for the problem statement and specific features of industrial gas burner control using a camera. For an alternative approach see [13]. Images of flames of an industrial gas burner ($N = 100$) are clustered into $K = 5$ groups using the $\rho_C$ distance between images. The clustering was done according to the general rules of K-medoids clustering (see Algorithm 1), implemented in Wolfram's Mathematica ver. 11. Then, representative images for each class were calculated by averaging, according to R3). The resulting images are shown in Fig. 2. In order to test the resulting cluster-based classifier, a testing sequence containing 54 images (other than those in the learning sequence) was passed to its input. According to the rule described in Remark 1, the nine images displayed in Table 1 (right panel) are selected as those that are potentially incorrectly classified. This means that about 83% of the test images (45 of 54) were classified without doubt. A more careful analysis of the probabilities corresponding to the potentially misclassified images (see Table 1, right panel) reveals that six of them have been assigned to the class labelled 2, while the possible misclassifications were to classes 1 or 3. As we shall see, these potential misclassifications do not lead to essential problems. Thus, we can say that we have a sufficiently reliable and fast tool for the direct recognition of images. Now, it remains to attach decisions to the clusters and then one can apply Algorithm 2 on-line. The analysis of Fig. 2 indicates that a proper mode of operating the burner is somewhere between Class 1 and Class 2, since Class 1 flames are blue, but the burner sometimes roars loudly. Flames from Class 2 indicate a slightly too low air supply.
In order to ensure that the decisions are not changed too frequently, we assume that Class 2 is the proper one. Images from Classes 3-5 indicate a too low or much too low air supply. Thus, as decisions attached to each class, one may consider the following:
$d_1$: slightly reduce the air supply rate,
$d_2$: keep the air supply rate at this level,
$d_3$: slightly increase the air supply rate,
$d_4$: largely increase the air supply rate,
$d_5$: highly increase the air supply rate.
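The attachment of decisions to classes amounts to a lookup table, as in the following Python sketch. The fallback used for an uncertain classification (repeat the previous decision) is only an assumed example of the "other actions" mentioned in Remark 1, and the names `DECISIONS` and `decide` are hypothetical.

```python
# Qualitative decisions attached to Classes 1-5, as listed above.
DECISIONS = {
    1: "slightly reduce the air supply rate",
    2: "keep the air supply rate at this level",
    3: "slightly increase the air supply rate",
    4: "largely increase the air supply rate",
    5: "highly increase the air supply rate",
}

def decide(class_label, uncertain=False, previous=None):
    """Read out the decision for a class.

    Assumed fallback: when the classification is flagged as uncertain and a
    previous decision exists, that decision is repeated instead of switching.
    """
    if uncertain and previous is not None:
        return previous
    return DECISIONS[class_label]

last = decide(2)
print(last)
print(decide(3, uncertain=True, previous=last))
```

Holding the previous decision in doubtful cases is one way to keep the decisions from changing too frequently; other policies are equally possible.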
The above decisions are of a qualitative nature. Providing quantitative decisions would require much broader experimental data than we had at our disposal. Notice that the need to change decisions stems from fluctuations of the methane content in natural gas. A specific feature of gas burners of moderate size that made it possible to apply the proposed approach is their almost immediate response to changes in the air supply rate.

Concluding remarks
A direct approach to image-driven decision making has been proposed. It consists of a clustering phase, which can be time-consuming, and a decision-making phase, which is sufficiently fast for on-line applications, since no image processing is necessary. This approach has a wide range of possible applications. One of them, namely, air supply rate control of an industrial gas burner, is reported in the paper. The approach is flexible and may have many variants that differ in the choice of the clustering method and the similarity function, as well as in the way of forming images representative of classes. Although widely applicable, the IDDM is not universal. Its applicability is limited to cases in which relatively large portions of images change. It can be extended to certain classes of dynamic systems, in which one has to take into account that a memory of older states is present in the system, but this extension is outside the scope of this paper.