Knowledge Fusion of Manufacturing Operations Data Using Representation Learning

Due to the increasing flexibility required in manufacturing systems, adapting monitoring and control to a changing context, such as the reconfiguration of devices, becomes more important. Analogous to the use of structured information on the Web, digital twin models of manufacturing data can be seen as knowledge graphs that constantly need to be aligned with the physical environment. With a growing number of smart devices participating in production processes, handling these alignments manually is no longer feasible. Yet, the growing availability of data coming from operations (e.g. process events) and contextual sources (e.g. equipment configurations) enables machine learning to synchronize data models with physical reality. Common knowledge graph learning approaches, however, are not designed to deal with both static and time-dependent data. To overcome this, we introduce a representation learning model that shows promising results for synchronizing the semantics of existing manufacturing knowledge graphs with operational data.


Introduction
The ubiquitous availability of data empowers manufacturing companies to embrace advanced data analytics technologies that allow them to monitor, predict, and optimize manufacturing operations. Still, ensuring semantic interoperability within hardware-software integrated cyber-physical systems (CPS) and management applications requires extensive manual data modeling effort, thus introducing and maintaining these technologies is challenging for manufacturers [7]. For example, today, deploying a new device for machine condition monitoring at a shop floor means manual effort to model this device and all of its signals throughout several software applications (e.g. SCADA, MES). Otherwise, physical reality is not correctly reflected in existing models and there is no semantic interoperability between applications. Recently, descriptive data models have been revitalized as part of a digital representation of physical systems, the so-called digital twin, which allows systems to discover, inherit, evaluate and share information across different sub-systems [2]. From a data modeling perspective, structured information of digital twins can be represented as a knowledge graph (KG), where relations and entities follow well-defined vocabularies and semantics. Knowledge graphs are commonly understood as publicly accessible Linked Data resources; prominent examples are Wikidata (wikidata.org) and WordNet (wordnet.princeton.edu). Similarly, Manufacturing Execution Systems (MES) and engineering platforms that are built upon sizable relational databases can be seen as domain-specific knowledge graphs when lifted to a semantic schema [5]. Such a manufacturing knowledge graph should be able to automatically acquire updated information based on different operational data sources (e.g. SCADA, PLCs, etc.), even if these data sources are not aware of their semantics.
Continuing the machine monitoring example: by observing data coming from the newly added device (e.g. events), the KG should automatically recognize the type of device, its location, or its capabilities and therefore allow other applications to adapt to this updated context. Machine learning in KGs has emerged recently with the goal of enabling automated integration of new facts into KGs without manual modeling effort [9]. When multiple data sources are used to extract information, the problem further extends to so-called knowledge fusion [3]. The same problems apply to models in manufacturing systems that need to be in sync with physical reality reflected by multiple operational data sources [4]. In this paper, we present an approach to support the fusion of information coming from operational data sources with manufacturing KGs by learning latent representations of entities. The goal is to offer automated recommendations on how to integrate unknown entities into the existing structure of the KG and thus keep the digital twin in sync without manual modeling effort. Ultimately, this is beneficial to monitoring and management applications that rely on an immediately aligned digital representation of the manufacturing system.

Motivation Scenario
In this section we present an example scenario that motivates the application of machine learning (knowledge fusion) to manufacturing KGs in conjunction with operational data sources.
Consider an automated production line at a discrete manufacturing facility, consisting of multiple production units that can be configured to produce several variants of a product. The manufacturing KG (e.g. provided by an MES) of this production line gives information about the device topology and the processes executed by each production unit, whereas a SCADA system observes sequences of events during operation. As shown in Figure 1, at the bottom, sequences of events are continuously generated and aligned to entities in the manufacturing KG. Entities and their relations are denoted as triples (head-entity, relation, tail-entity). The schema (classes and relations) of the KG is shown on top of the entities using a simplified class diagram notation. For example, the triple (Event 1, occurs at, Conveyor) in the KG states that entity Event 1 occurs at entity Conveyor. Additionally, the conveyor entity is modeled as a device that is involved in the board assembly process. Now assume a new device is deployed to the production line to monitor temperature measurements of the conveyor. As production resumes, events of this new device are continuously observed, but they lack semantic alignment to the existing KG. Figure 2 shows a new sequence of events, where unknown entities in the triples are denoted with question marks. Here, the class of the unaligned event Event 2 and its source (device) are unknown, i.e. (Event 2, is-a, ?) and (Event 2, occurs at, ?), respectively. However, the distribution of events in the sequence data should give an indication of which device is most likely responsible (in this case the conveyor). Since other conveyor events are assumed to co-occur in a similar fashion as the new monitoring events, this information can be exploited to re-engineer semantics.
Presuming one could obtain a vector representation of all involved entities (events, devices, etc.), it would be possible to calculate a similarity between Event 1 and Event 2 that allows inferring that both are related to the conveyor entity in the KG. The representation learning approach described in the following is motivated by learning latent entity embeddings that reflect such similarity.
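To illustrate this inference, the similarity between learned entity embeddings can be computed with cosine similarity. The vectors below are hypothetical toy values chosen only to show the intended outcome; this is a minimal sketch, not part of the paper's implementation:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two entity embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 3-d embeddings: Event 1 and Event 2 point in a similar
# direction (both relate to the conveyor); an unrelated event does not.
event1 = np.array([0.9, 0.1, 0.0])
event2 = np.array([0.8, 0.2, 0.1])
other  = np.array([0.0, 0.1, 0.9])

assert cosine_similarity(event1, event2) > cosine_similarity(event1, other)
```

If Event 2 is most similar to events already aligned to the conveyor, the missing triple (Event 2, occurs at, Conveyor) can be recommended for integration.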

Problem Statement
In this section, we formally define the problem of learning joint representations of entities in KGs and operations data of manufacturing systems. A knowledge graph, denoted as KG, is a directed graph with labeled edges, consisting of a set of entities E, a set of relations R, and a set of triples (h, r, t) with head entity h, relation r, and tail entity t. Knowledge Graph Embeddings Given KG, the problem of learning knowledge graph embeddings is to encode all entities in E and relations in R in a continuous low-dimensional vector space, i.e. h, t ∈ R^d and relation r ∈ R^d. In order to learn useful representations, a meaningful distance measure has to be employed; e.g. in the original TransE model [1], h + r ≈ t. This means that translating entity h with relation r should end up close to its tail entity t in the latent d-dimensional space. It has been shown that these translation embeddings can be effectively learned by using a ranking loss with the intuition that h + r ≈ t should hold for true triples and be far apart for false/unknown ones. Formally, the learning objective is formulated as minimizing a margin-based ranking loss:

L_KG = Σ_{(h,r,t)} Σ_{(h',r,t') ∈ S'_{h,r,t}} [1 + dist(h + r, t) − dist(h' + r, t')]_+

where [·]_+ denotes the positive part, dist(·) is some distance function (e.g. Euclidean) and S'_{h,r,t} is a set of negative samples, i.e. artificially constructed false triples obtained by replacing h or t with a random entity. This loss is minimized when the translation of correct triples is closer than that of unknown ones by a constant margin, here 1.
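The margin-based ranking loss can be sketched for a single triple and one corrupted negative sample as follows. The embeddings are toy values for illustration, not learned parameters:

```python
import numpy as np

def transe_margin_loss(h, r, t, h_neg, t_neg, margin=1.0):
    # Margin-based ranking loss for one true triple (h, r, t) and one
    # corrupted triple (h_neg, r, t_neg). dist is the Euclidean distance;
    # the loss is zero once the true triple scores better than the
    # corrupted one by at least `margin`.
    dist_pos = np.linalg.norm(h + r - t)
    dist_neg = np.linalg.norm(h_neg + r - t_neg)
    return max(0.0, margin + dist_pos - dist_neg)

# Toy 2-d embeddings where h + r ≈ t holds exactly for the true triple.
h = np.array([0.0, 0.0])
r = np.array([1.0, 0.0])
t = np.array([1.0, 0.0])
t_neg = np.array([-2.0, 3.0])  # corrupted tail (random entity)

loss = transe_margin_loss(h, r, t, h, t_neg)  # → 0.0: margin satisfied
```

During training, this quantity would be summed over all triples and their negative samples and minimized by gradient descent on the embedding vectors.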

Sequential Data Embeddings
Given a sequence data set D of entity occurrences x_i, the problem of learning sequential embeddings is similar to that of knowledge graphs, i.e. encode all entities in the same low-dimensional vector space, x_i ∈ R^d, where semantically similar entities should end up close to each other in this latent space. Learning these embeddings follows the distributional semantics hypothesis, which states that similar entities occur in similar contexts. This has been one of the key ideas in the field of Natural Language Processing (NLP), since such embeddings tend to exhibit natural relations between words (e.g. capture synonymous meanings) [6]. Distributed representations are obtained by assuming that similarity between entities in the data can be modeled with a distribution, formally P(x_i | W_i), i.e. the occurrence of entity x_i depends on and can be predicted from its surrounding window of events W_i. Figure 3 displays how Event 3 can be modeled from its surrounding events in a sliding time window of length m through the event sequences. It is assumed that events having similar causes and effects share similar semantics.
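The sliding window over an event sequence can be sketched as follows; the event names are hypothetical placeholders (a minimal sketch of window extraction, assuming a symmetric window of m events on each side, clipped at the sequence boundaries):

```python
def context_windows(sequence, m):
    # Yield (target, window) pairs, where the window contains the m
    # events before and the m events after each target event.
    for i, target in enumerate(sequence):
        window = sequence[max(0, i - m):i] + sequence[i + 1:i + 1 + m]
        yield target, window

events = ["E1", "E2", "E3", "E4", "E5"]
pairs = dict(context_windows(events, m=1))
# e.g. pairs["E3"] == ["E2", "E4"]
```

Each (target, window) pair then becomes one training example for the prediction task described next.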
Mathematically, the probability of predicting target entity x_i from its surrounding entities can be expressed by a categorical distribution, e.g. the Softmax function:

P(x_i | W_i) = softmax(S(x_i, W_i))

where x_i is the vector representation of entity x_i and S(·) is some similarity function between an entity and its surrounding window entities represented as matrix W_i. The objective function in terms of loss is given by the negative log likelihood:

L_Seq = − Σ_i log P(x_i | W_i)

Joint Embeddings As the goal of this approach is to jointly model entities in the knowledge graph as well as in the sequential data, we propose a joint learning model that is trained by simply adding both loss terms:

L_Joint = L_KG + L_Seq

Minimizing the joint loss L_Joint should result in solid embeddings of both the entities in the knowledge graph and those in the sequence data set. In practice, joint loss minimization is approximated using a state-of-the-art stochastic gradient descent optimizer. The key idea here is that entity embeddings are shared across both tasks, and therefore the outcome should reflect the co-occurrence of sequential data as well as the structure of the knowledge graph. The architecture of the joint embedding approach is shown in Figure 4, where the |E|-by-d matrix of entity embeddings is shared between the (h, r, t) triple loss and the sequence loss.
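A minimal numerical sketch of the sequence loss and the joint loss follows. The similarity S is assumed here to be the dot product between the target embedding and the mean of the window embeddings, and L_KG is replaced by a dummy constant; both choices are illustrative assumptions, not the paper's exact model:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a score vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sequence_nll(E, target_idx, window_idxs):
    # Negative log likelihood of predicting the target entity from the
    # mean of its window embeddings (dot-product similarity S).
    context = E[window_idxs].mean(axis=0)
    probs = softmax(E @ context)  # scores against all |E| entities
    return -np.log(probs[target_idx])

rng = np.random.default_rng(0)
E = rng.normal(size=(6, 4))       # shared |E|-by-d entity embedding matrix

loss_seq = sequence_nll(E, target_idx=2, window_idxs=[1, 3])
loss_kg = 1.0                     # dummy placeholder for the KG loss term
loss_joint = loss_kg + loss_seq   # L_Joint = L_KG + L_Seq
```

Because the matrix E appears in both loss terms, a gradient step on L_Joint updates the same entity vectors from both the graph structure and the event co-occurrences.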

Prototype Evaluation
We evaluated this approach on a real-world manufacturing KG data set coming from an automated assembly line. The event sequences are taken from a SCADA-level Alarms & Events database, whereas the initial KG was extracted from several spreadsheet files and CAD models. The final KG comprised about 3,700 triples about processes, equipment, and events, whereas the sequential data consisted of 57,000 event occurrences. A prototype of the representation learning was implemented using the TensorFlow library. For performance evaluation, the usual criteria are (cf. [1]):

- Mean Rank: the average predicted rank of the head or tail entity that would have been the correct one (1 indicating a perfect rank)
- Hits Top-10: the fraction of predicted ranks that were in the top 10

We compare two models, KG (knowledge graph embeddings only) and KG + Seq (joint embeddings). In Figure 5, the performance of KG and KG + Seq is visualized during model training on a hold-out (unseen) test data set of incomplete triples, e.g. (Conveyor, involved in, ?). It can be seen that the joint model performs better in terms of a lower mean rank and a higher hits top-10 percentage.

Related Work
We divide related work into two categories, restricting the discussion to applications and techniques closest to the one presented in this work.
Model Learning in Manufacturing Machine learning has been used to discover influencing factors of manufacturing processes [14]. Other works on adapting to changing context have studied monitoring processing times in flexible production systems [11,10], as well as higher-level architecture proposals for context extraction and self-adaptation of production systems [12]. However, the authors do not specify a concrete methodology for extracting context knowledge and aligning it with existing models.
Learning of Knowledge Graph Embeddings Existing learning methods for KGs such as [1] and [9] have been extended to include many-to-many relationships [8] and to incorporate textual information to improve entity representation learning.
Recently, word co-occurrences as sequential data were used in KG completion tasks [13]. In contrast to our approach, these works are focused on large-scale knowledge graphs containing noisy information.