Diagnosis of Complex Active Systems with Uncertain Temporal Observations

Complex active systems have been proposed as a formalism for modeling real dynamic systems that are organized in a hierarchy of behavioral abstractions. As such, they constitute a conceptual evolution of active systems, a class of discrete-event systems introduced into the literature two decades ago. A complex active system is a hierarchy of active systems, each one characterized by its own behavior expressed by the interaction of several communicating automata. The interaction between active systems within the hierarchy is based on special events, which are generated when specific behavioral patterns occur. Recently, the task of diagnosis of complex active systems has been studied, with an efficient diagnosis technique being proposed. However, the observation of the system is assumed to be linear and certain, which turns out to be an over-assumption in real, large, and distributed systems. This paper extends diagnosis of complex active systems to cope with uncertain temporal observations. An uncertain temporal observation is a DAG where nodes are marked by candidate labels (logical uncertainty), whereas arcs denote partial temporal ordering between nodes (temporal uncertainty). By means of indexing techniques, despite the uncertainty of temporal observations, the intrinsic efficiency of the diagnosis task is retained in both time and space.


Introduction
Often, dynamic systems can be modeled as discrete-event systems [4]. Seminal works on diagnosis of discrete-event systems (DES's) maximize offline preprocessing in order to generate an efficient online diagnoser [21,20]. However, this requires generating the global DES model, which is bound to be impractical for large and distributed systems.
Other approaches, like diagnosis of active systems (AS's) [1,13,15], avoid generating the global model of the system by reconstructing online only the behavior which is consistent with the observation. Still, in the worst case, the number of behavior states is exponential in the number of components. This is why efficient techniques need to be designed in order to mitigate the explosion of the reconstructed behavior.
This paper deals with diagnosis of a class of DES's called complex active systems (CAS's), based on a class of observations called uncertain temporal observations.
In the literature, the term complex system is used to encompass a research approach to problems in a variety of disciplines [7,2,10,8,22,6,19]. Generally speaking, a complex system is a group or organization which is made up of many interacting components. In a complex system, interactions between components lead to an emergent behavior, which is unpredictable from a knowledge of the behavior of the individual components only. Inspired by complex systems in nature and society, complexity has been injected into the modeling and diagnosis of active systems [13], a special class of discrete-event systems [4], which are modeled as networks of interacting components, where the behavior of each component is described by a communicating automaton [3]. To this end, the notion of context-sensitive diagnosis was first introduced in [14] and then extended in [18], for active systems that are organized within abstraction hierarchies, so that candidate diagnoses can be generated at different abstraction levels. Active systems have been equipped with behavior stratification [16,17], where different networks of components are accommodated within a hierarchy. This resembles emergent behavior that arises from the vertical interaction of a network with superior levels based on pattern events. Pattern events occur when a network performs strings of component transitions matching patterns which are specific to the application domain. Complex patterns are also considered in research on cognitive systems [24,5,23].
Recently, diagnosis of complex active systems has been addressed, with an efficient diagnosis technique being proposed [11]. However, the observation of the system is assumed to be linear and certain, which turns out to be an over-assumption in real, large, and distributed systems. This paper extends diagnosis of complex active systems to cope with uncertainty in temporal observations. An uncertain temporal observation is a DAG where nodes are marked by candidate labels, whereas arcs denote partial temporal ordering, as proposed in [12] for diagnosis of (plain) active systems. By virtue of specific indexing techniques, the efficiency of the diagnosis task introduced in [11] is retained in both time and space when uncertain temporal observations come into play.

Active Systems
An active system A is a network of components, each one being defined by a topological model and a behavioral model. The topological model embodies a set of input terminals and a set of output terminals. The behavioral model is a communicating automaton where each state transition is triggered by an event ready at one input terminal. When triggered, the transition may generate events at some output terminals. An active system (AS) A can be either quiescent or reacting. If quiescent, no event occurs and, consequently, no transition is performed. A becomes reacting when an external event occurs, which can be consumed by a component in A. When reacting, the occurrence of a component transition moves A to a new state, with each state being a pair (S, E), where S = (s_1, ..., s_n) is the array of the states of components in A, whereas E = (e_1, ..., e_m) is the array of events within links in A. We assume that, sooner or later, A becomes quiescent anew.
The sequence of component transitions moving A from the initial (quiescent) state to the final (quiescent) state is the trajectory of A. Given the initial state a_0 of A, the graph embodying all possible trajectories of A, rooted in a_0, is the behavior space of A, written Bsp(A). A trajectory h = [t_1(c_1), t_2(c_2), ..., t_q(c_q)] in Bsp(A), from initial state a_0 to final state a_q, can be represented as:

$$a_0 \xrightarrow{t_1(c_1)} a_1 \xrightarrow{t_2(c_2)} \cdots \xrightarrow{t_q(c_q)} a_q$$

A complex active system A is a hierarchy of interacting active systems A_1, ..., A_k. In order to make AS's interact with one another, four actions are required for each AS A:
1. Definition of a set of input terminals, each one being connected with an input terminal of a component in A;
2. Definition of a set of output terminals;
3. Specification of a set of patterns, with each pattern being a pair (p(ω), r), where p is a pattern event, ω an output terminal of A, and r a regular expression whose alphabet is a set of component transitions in A.

Given a pattern (p(ω), r), pattern event p is generated at output terminal ω of A when a subsequence of the trajectory of A matches regular expression r. Since there is a link from output terminal ω of A to an input terminal of A′, which is in its turn connected with an input terminal of a component c′ of A′, the occurrence of p is bound to trigger a transition of c′. This way, the behavior of A is bound to influence (although not completely determine) the behavior of A′.

Like an active system, a complex active system A can be either quiescent or reacting. A is quiescent when all AS's in A are quiescent and all generated pattern events (if any) have been consumed. When reacting, the occurrence of an AS transition moves A to a new state. Each state of A is a triple (A, E, P), where:
- A = (a_1, ..., a_n) is the array of the states of AS's in A, namely A_1, ..., A_n;
- E = (e_1, ..., e_m) is the array of pattern events within links between AS's in A;
- P = (p_1, ..., p_k) is the array of states of pattern-event recognizers.

The sequence of AS transitions moving A from the initial (quiescent) state to the final (quiescent) state is the trajectory of A. Given the initial state α_0 of A, the graph embodying all possible trajectories of A, rooted in α_0, is the behavior space of A, written Bsp(A). A trajectory h = [t_1(A_1), t_2(A_2), ..., t_{q−1}(A_{q−1}), t_q(A_q)] in Bsp(A), from initial state α_0 to final state α_q, can be represented as:

$$\alpha_0 \xrightarrow{t_1(A_1)} \alpha_1 \xrightarrow{t_2(A_2)} \cdots \xrightarrow{t_{q-1}(A_{q-1})} \alpha_{q-1} \xrightarrow{t_q(A_q)} \alpha_q$$

Example 2 Displayed in Fig. 3 is a power transmission line. Each side of the line is protected from short circuits by two breakers, namely b and r on the left, and b′ and r′ on the right. Both b and b′ (the primary breakers) are connected to a voltage sensor. If a short circuit (caused, for instance, by lightning) strikes the line, then each sensor will detect the lowering of the voltage and command the associated breaker to open. If both breakers open, then the line will be isolated, thereby causing the short circuit to vanish. If so, the two breakers are commanded to close in order to restore the line. In doing so, it also informs the monitor on the right-hand side to perform the same action on recovery breaker r′. For safety reasons, once opened, recovery breakers cannot be closed again, thereby leaving the line isolated.
The protected line can be modeled as the CAS outlined in Fig. 4, called L, which is composed of four AS's, namely: P (the protection hardware on the left, including sensor s and breaker b), P′ (the protection hardware on the right, including sensor s′ and breaker b′), M (the monitoring apparatus, including monitors m and m′, and recovery breakers r and r′), and L (including line l). Arrows within AS's denote links between components. For instance, P includes a link from s to b, meaning that an event generated by s can be consumed by b.
For the sake of simplicity, we assume that, when an event is already present in a link, no transition generating a new event on the same link can be triggered. Links between m and m′ allow monitors to communicate with one another. Instead, arrows between AS's denote links aimed at conveying pattern events. For instance, the link from P to M makes pattern events (occurring in P) available to m in M.
Models monitor and line are displayed in Fig. 5. As such, monitor involves input terminals E and I, and output terminals O and R. Terminal E is entered by the link exiting the protection hardware (either P or P′), conveying the pattern events occurring in the latter. Pattern events have the following meaning. di: the protection hardware disconnects the side of the line; co: the protection hardware connects the side of the line; nd: the protection hardware fails to disconnect the side of the line; nc: the protection hardware fails to connect the side of the line; nr, nr′: the left/right side of the line cannot be reconnected; ni, ni′: the left/right side of the line cannot be isolated; ps, ps′: the short circuit persists on the left/right side of the line.
Patterns for P, P′, and L are listed in Table 1. For the patterns in Table 1, the alphabet of r is defined as follows. For P and P′, the alphabet of r is the whole set of transitions of the involved components (breaker and sensor). For M, the alphabet equals the set of transitions involved in r only.

Uncertain Temporal Observations
During its trajectory, a CAS A, embodying AS's A_1, ..., A_k, generates a sequence of observable labels, called the trace of the trajectory. Observable labels are generated for observable transitions only. In this respect, the trace is the projection of the trajectory on the labels associated with observable component transitions. Still, what is perceived by the observer, and given in input to the diagnosis engine, is a relaxation of the trace called an uncertain temporal observation. An uncertain temporal observation is an array O = (O_1, ..., O_k), where each O_i, i ∈ [1 .. k], is a DAG relaxing the AS trace T_i, with T_i being the subsequence of the trace T involving the labels of T which are associated with transitions of components in A_i.

Table 1. Specification of patterns by regular expressions

The mode in which a trace is relaxed into an uncertain temporal observation is not under the control of the observer; therefore, the original trace generated by the CAS is, generally speaking, unknown to the observer. Within a DAG O_A representing the uncertain temporal observation of an AS A, the notion of precedence '≺' between two nodes is used. This is defined as the smallest relation satisfying the following two conditions (where n, n′, and n″ denote nodes, while n → n′ denotes an arc from n to n′): (1) if n → n′ is an arc then n ≺ n′; (2) if n ≺ n′ and n′ ≺ n″ then n ≺ n″.
The extension of a node n in O_A, denoted ‖n‖, is the set of labels associated with n. A candidate trace T_c of O_A, having set N_A of nodes, is a sequence of labels so defined:

$$T_c = [\, \ell \mid \ell \in \|n\|,\ n \in N_A \,]$$

where nodes n are chosen based on the partial order defined by arcs, while ε labels are removed. The extension of O_A is the set of candidate traces of O_A, written ‖O_A‖.
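The notion of candidate trace can be illustrated by a brute-force enumeration. The following is only a minimal sketch under stated assumptions: the function name `candidate_traces`, the node names, and the labels `a`, `b`, `c` are hypothetical, the DAG is given as a dictionary of label sets plus a set of arcs, and the null label ε is encoded as `None`. Each linearization of the nodes that respects the arcs is visited, one label is picked per node, and ε labels are dropped.

```python
def candidate_traces(labels, arcs):
    """Enumerate the candidate traces of an uncertain temporal observation.

    labels: dict mapping each node to its set of candidate labels
            (None stands for the null label epsilon).
    arcs:   set of pairs (n, n2), each meaning that n precedes n2.
    """
    nodes = list(labels)
    # Direct predecessors are enough: a node is eligible only once all of
    # them have been consumed, which transitively covers every ancestor.
    preds = {n: {a for a, b in arcs if b == n} for n in nodes}
    traces = set()

    def visit(done, trace):
        if len(done) == len(nodes):
            traces.add(tuple(trace))
            return
        for n in nodes:
            if n not in done and preds[n] <= done:
                for label in labels[n]:
                    step = [] if label is None else [label]
                    visit(done | {n}, trace + step)

    visit(frozenset(), [])
    return traces


# Hypothetical observation: n1 precedes n3, n2 is temporally unconstrained
# and logically uncertain (either label b or no label at all).
obs_labels = {"n1": {"a"}, "n2": {"b", None}, "n3": {"c"}}
obs_arcs = {("n1", "n3")}
extension = candidate_traces(obs_labels, obs_arcs)
```

The enumeration is exponential and is meant for intuition only; it is not how a diagnosis engine would handle the observation.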
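Likewise, the notion of extension can be defined for an uncertain temporal observation O = (O_1, ..., O_k) as follows:

$$\|O\| = \{\, (T_1, \ldots, T_k) \mid \forall i \in [1\,..\,k],\ T_i \in \|O_i\| \,\}$$

It is possible to prove that, for each T_i in array T^*, i ∈ [1 .. k], T_i ∈ ‖O_i‖.

A prefix of O_i, with N_i being the set of nodes of O_i, is a subset of N_i defined inductively as follows:
1. The empty set ∅ is a prefix of O_i;
2. If p is a prefix of O_i and n′ ∈ N_i − p where ∀n ∈ N_i such that n ≺ n′ we have n ∈ p, then p ∪ {n′} is a prefix of O_i.

The index of a prefix p, denoted I(p), is the subset of p defined as follows:

$$I(p) = \{\, n \mid n \in p,\ \forall n' \in N_i \ (n \prec n' \Rightarrow n' \notin p) \,\}$$

Given an index I of p, the set of nodes in p is denoted by I^{−1}. An index I is final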
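The inductive definition of prefix lends itself to a small illustration. The sketch below rests on assumptions: the function names and the sample DAG are hypothetical, and the index of a prefix is taken to be the set of its nodes that have no successor inside the prefix, so that the prefix can be recovered from the index by adding all ancestors. Determinization into an index space is not covered here.

```python
def prefixes(nodes, arcs):
    """All prefixes of a DAG, built by the two inductive rules."""
    preds = {n: {a for a, b in arcs if b == n} for n in nodes}
    found, frontier = {frozenset()}, [frozenset()]   # rule 1: empty prefix
    while frontier:
        p = frontier.pop()
        for n in set(nodes) - p:
            if preds[n] <= p:                        # rule 2: all predecessors in p
                q = p | {n}
                if q not in found:
                    found.add(q)
                    frontier.append(q)
    return found


def index_of(prefix, arcs):
    """Nodes of the prefix with no successor inside the prefix.

    Checking direct arcs suffices because a prefix is closed under
    predecessors."""
    return frozenset(n for n in prefix
                     if not any(a == n and b in prefix for a, b in arcs))


def prefix_of(index, arcs):
    """Inverse mapping: the index plus all of its ancestors."""
    prefix, stack = set(index), list(index)
    while stack:
        n = stack.pop()
        for a, b in arcs:
            if b == n and a not in prefix:
                prefix.add(a)
                stack.append(a)
    return frozenset(prefix)


# Hypothetical DAG: n1 precedes n3, n2 is unconstrained.
dag_nodes = ["n1", "n2", "n3"]
dag_arcs = {("n1", "n3")}
all_prefixes = prefixes(dag_nodes, dag_arcs)
```

Under these assumptions, an index is a compact identifier of a prefix: in the sample DAG the prefix {n1, n3} is identified by the index {n3} alone.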

Problem Formulation
Once a real system is modeled as a CAS A composed of AS's A_1, ..., A_k, it can be diagnosed based on the uncertain temporal observation O = (O_1, ..., O_k). In this paper we focus on a posteriori diagnosis. That is, we assume that O is relevant to a complete trajectory of A, which moves A from the known initial (quiescent) state to an unknown final (quiescent) state. In order to match O against the behavior of A it is essential to know which are the observable transitions of components and their associated observable labels. This is specified by a viewer of A, namely V = (V_1, ..., V_k), which is the array of the local viewers of the AS's, with each local viewer V_i, i ∈ [1 .. k], being a set of pairs (t, ℓ), where t is an observable transition of a component in A_i and ℓ an observable label.
The projection of a trajectory h of A on viewer V = (V_1, ..., V_k) is the array of AS traces defined as follows:

$$h_{[V]} = (T_1, \ldots, T_k), \qquad T_i = [\, \ell \mid t \in h,\ (t, \ell) \in V_i \,], \quad i \in [1\,..\,k]$$

Similarly to a local viewer which specifies observable transitions, faulty transitions are specified by a local ruler, a set of pairs (t, f), where t is a component faulty transition and f a fault. The array of all local rulers R_i, one for each A_i in A, gives rise to the ruler of A, namely R = (R_1, ..., R_k).
For each candidate trajectory h of A, there is a candidate diagnosis of A, denoted h_{[R]}, which is the set of faults associated with the faulty transitions within the trajectory:

$$h_{[R]} = \{\, f \mid t \in h,\ (t, f) \in R_i,\ i \in [1\,..\,k] \,\}$$

A diagnosis problem for A is a quadruple:

$$\wp(A) = (\alpha_0, V, O, R)$$

where α_0 is the initial state of A, V a viewer of A, O the uncertain temporal observation of A, and R a ruler of A.
The solution of ℘(A), namely ∆(℘(A)), is the set of candidate diagnoses δ associated with the trajectories of A that are consistent with O, where ‖O‖ denotes the extension of O (its set of arrays of candidate traces):

$$\Delta(\wp(A)) = \{\, \delta \mid \delta = h_{[R]},\ h \in Bsp(A),\ h_{[V]} \in \|O\| \,\} \qquad (8)$$

However, the diagnosis engine is not expected to generate the solution of a diagnosis problem based on eqn. (8). In fact, eqn. (8) relies on Bsp(A), the behavior space of A, whose generation is, generally speaking, practically infeasible. Still, the set of candidate diagnoses generated by the diagnosis engine shall equal ∆(℘(A)). In other words, the diagnosis technique shall be not only efficient but also sound and complete.
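The declarative definition of the solution can be made concrete on a toy instance. The sketch below is a naive reading of that definition, not the diagnosis technique of this paper: it assumes that the trajectories of the behavior space are available as an explicit list, which is exactly what the engine is meant to avoid. All transition names, labels, and faults (`s1`, `b1`, `b3`, `m1`, `awk`, `opn`, `trg`, `nob`) are hypothetical.

```python
def project(trajectory, viewer):
    """Array of AS traces: one tuple of observable labels per local viewer."""
    return tuple(tuple(local[t] for t in trajectory if t in local)
                 for local in viewer)


def diagnosis(trajectory, ruler):
    """Set of faults associated with the faulty transitions of a trajectory."""
    return frozenset(local[t] for local in ruler
                     for t in trajectory if t in local)


def solve(trajectories, viewer, ruler, extension):
    """Candidate diagnoses of the trajectories whose projection is in the
    extension of the observation."""
    return {diagnosis(h, ruler) for h in trajectories
            if project(h, viewer) in extension}


# Hypothetical CAS with two AS's: local viewers and rulers are dictionaries.
toy_viewer = [{"s1": "awk", "b1": "opn"}, {"m1": "trg"}]
toy_ruler = [{"b3": "nob"}, {}]
toy_trajectories = [("s1", "b1"), ("s1", "b3", "m1"), ("s1", "b3")]
# Extension of the uncertain observation: a set of arrays of candidate traces.
toy_extension = {(("awk", "opn"), ()), (("awk",), ("trg",))}
toy_solution = solve(toy_trajectories, toy_viewer, toy_ruler, toy_extension)
```

Here the third trajectory is discarded because its projection is not a candidate, so the solution contains the empty diagnosis and the diagnosis with the single hypothetical fault `nob`.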
Example 6 With reference to Example 2, we define a diagnosis problem for L as:

$$\wp(L) = (\lambda_0, V, O, R)$$

where:
- In initial state λ_0, breakers are closed, sensors are idle, and monitors are in state watch (see component models in Fig. 1 and Fig. 5);

Preprocessing
For efficiency reasons, it is convenient to perform some preprocessing on the CAS specification before the diagnosis engine starts operating. The extent of such offline preprocessing varies and depends on the performance requirements of the application domain. In particular, in order to detect pattern events, we need to maintain the recognition states of patterns. Since patterns are described by regular expressions, specific automata-based recognizers are to be generated as follows:
1. For each pattern (p(ω), r), a pattern automaton P equivalent to r is generated, with final states marked by p(ω);
2. The set P of pattern automata is partitioned based on AS and the alphabet of r;
3. For each part P = {P_1, ..., P_h} in P, four actions are performed:
(3a) A nondeterministic automaton N is created by generating its initial state n_0 and one empty transition from n_0 to each initial state of P_i, i ∈ [1 .. h];
(3b) In each P_i, i ∈ [1 .. h], an empty transition from each non-initial state to n_0 is inserted (this allows for pattern-matching of overlapping strings of transitions);
(3c) N is determinized into P, where each final state d is marked by the pattern event that is associated with the states in d that are final in the corresponding pattern automaton (in fact, each state d of the deterministic automaton is identified by a subset of the states of the equivalent nondeterministic automaton; besides, we assume that only one pattern event at a time can be generated);
(3d) P is minimized into the pattern space of part P.

The tabular representation of the resulting pattern space P_P is outlined on the right-hand side. Listed in the first column are the states (0 is the initial state), with final states being shaded. For each pair state-transition, the next state is specified. Listed in the last column are the pattern events associated with final states. Pattern space P_{P′} is generated in the same way.
Since regular expressions of pattern events for M are defined on different alphabets, six additional pattern spaces are to be generated: P_nr, P_ni, P_ps, P_{nr′}, P_{ni′}, and P_{ps′}.
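Steps (3a) to (3c) of the preprocessing can be sketched as a subset construction. What follows is a minimal illustration under several assumptions, not the authors' implementation: pattern automata are given directly as nondeterministic automata rather than compiled from regular expressions, minimization (3d) is omitted, the recognizer falls back to the initial state when no move is defined, and all state and transition names are hypothetical.

```python
from collections import deque


def build_pattern_space(automata):
    """automata: list of (initial, finals, moves, event), where moves maps
    (state, symbol) to a set of states; state names must be globally unique."""
    n0 = "n0"
    eps = {n0: set()}                       # empty transitions
    delta, event_of = {}, {}
    for initial, finals, moves, event in automata:
        eps[n0].add(initial)                # (3a) n0 -> each initial state
        for (src, sym), dsts in moves.items():
            delta.setdefault((src, sym), set()).update(dsts)
            for state in dsts | {src}:
                if state != initial:        # (3b) non-initial state -> n0
                    eps.setdefault(state, set()).add(n0)
        for final in finals:
            event_of[final] = event

    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            for nxt in eps.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return frozenset(seen)

    alphabet = {sym for (_, sym) in delta}
    start = closure({n0})
    table, marks = {}, {}
    seen, queue = {start}, deque([start])
    while queue:                            # (3c) subset construction
        d = queue.popleft()
        events = {event_of[s] for s in d if s in event_of}
        if events:
            marks[d] = min(events)          # assumes one event at a time
        for sym in alphabet:
            target = set()
            for s in d:
                target |= delta.get((s, sym), set())
            if target:
                nxt = closure(target)
                table[(d, sym)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return start, table, marks


def recognize(space, transitions):
    """Pattern events generated while reading a string of transitions."""
    start, table, marks = space
    state, out = start, []
    for t in transitions:
        state = table.get((state, t), start)   # assumption: restart on no move
        if state in marks:
            out.append(marks[state])
    return out


# Hypothetical patterns: di = s1 b1 and nd = s1 b3.
toy_automata = [
    ("p0", {"p2"}, {("p0", "s1"): {"p1"}, ("p1", "b1"): {"p2"}}, "di"),
    ("q0", {"q2"}, {("q0", "s1"): {"q1"}, ("q1", "b3"): {"q2"}}, "nd"),
]
toy_space = build_pattern_space(toy_automata)
```

The empty transitions back to n_0 are what let a new match start while another is still in progress, which is the purpose stated for step (3b).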

Problem Solving
Behavior reconstruction in diagnosis of CAS's avoids materializing the behavior of the CAS, that is, the automaton whose language equals the set of CAS trajectories. Instead, reconstruction is confined to each single AS based on the local observation and the interface constraints on pattern-event occurrences coming from neighboring inferior AS's within the hierarchy of the CAS. The essential point is that such pattern events come with diagnosis information from inferior AS's, which is eventually combined with the diagnosis information of the superior AS, thereby allowing for the sound and complete solution of the diagnosis problem.
Intuitively, the flow of reconstruction in the hierarchy of the CAS is bottom-up. For an AS A with children A_1, ..., A_k, the behavior of A, namely Bhv(A), is reconstructed based on the interfaces of the children, namely Int(A_1), ..., Int(A_k), and the local observation of A, namely O_A. The interface is derived from the behavior. Thus, for any AS A, both Bhv(A) and Int(A) are to be generated (with the exception of the root, for which no interface is generated).
As such, the notions of behavior and interface depend on each other. However, such a circularity does not hold for leaf nodes of the CAS (e.g. P and P′ in Fig. 4): given a leaf node A, the behavior Bhv(A) is reconstructed based on O_A only, as no interface constraints exist for A. On the other hand, the behavior of the root node (e.g. L in Fig. 4) needs to be submitted to further decoration-based processing in order to distill the set of candidate diagnoses.
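The bottom-up flow amounts to a post-order visit of the hierarchy. The sketch below only computes the order of reconstruction and the kind of graph expected at each node; it assumes a hierarchy in which L is the root, M its only child, and P and P′ the children of M, which is how the reference example reads but is not stated here as a data structure. The function name and the string tags are hypothetical.

```python
def reconstruction_plan(children, root):
    """Post-order visit: children are processed before their parent."""
    plan = []

    def visit(node):
        kids = children.get(node, [])
        for kid in kids:
            visit(kid)
        # Leaves rely on the local observation only; other nodes are also
        # constrained by the interfaces of their children.
        graphs = ["constrained behavior" if kids else "unconstrained behavior"]
        # The root is decorated; every other node exports an interface.
        graphs.append("decorated behavior" if node == root else "interface")
        plan.append((node, graphs))

    visit(root)
    return plan


# Assumed hierarchy of the reference example.
hierarchy = {"L": ["M"], "M": ["P", "P'"]}
plan = reconstruction_plan(hierarchy, "L")
```

With this hierarchy, the leaves P and P′ come first, then M, and the root L last.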
In short, four sorts of graphs are required in reconstruction: unconstrained behavior (for leaf nodes), interface (for non-root nodes), constrained behavior (for non-leaf nodes), and decorated behavior (for root node).

Example 8 With reference to CAS L in Fig. 4 and the diagnosis problem defined in Example 6, namely ℘(L) = (λ_0, V, O, R), we first need to generate the unconstrained behavior of AS's P and P′ based on local observations O_P and O_{P′}, respectively. Displayed in Fig. 8 are the index space, the unconstrained behavior, and the interface relevant to P (left) and P′ (right). Consider the generation of Bhv(P′). As detailed in the right-hand side of Table 2, each state is identified by five fields: the state of sensor s′, the state of breaker b′, the event (if any) ready at terminal I(b′), the state of pattern space P_{P′}, and the state of the index space Idx(O_{P′}).

In what follows, a diagnosis δ is a set of faults. We make use of the join operator between two sets of diagnoses, namely ∆_1 and ∆_2, defined as follows:

$$\Delta_1 \bowtie \Delta_2 = \{\, \delta \mid \delta = \delta_1 \cup \delta_2,\ \delta_1 \in \Delta_1,\ \delta_2 \in \Delta_2 \,\}$$
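A join between two sets of diagnoses can be written in a couple of lines. This is a sketch assuming that the join is the pairwise union of the diagnoses of the two sets, with each diagnosis represented as a `frozenset` of fault names; the fault names used below are hypothetical.

```python
def join(delta1, delta2):
    """Pairwise union of the diagnoses in two diagnosis sets."""
    return {d1 | d2 for d1 in delta1 for d2 in delta2}


empty = frozenset()
# Joining with the singleton of the empty diagnosis leaves a set unchanged.
unchanged = join({empty}, {frozenset({"f1"})})
combined = join({empty, frozenset({"f1"})}, {frozenset({"f2"})})
```

Under this reading, the singleton containing the empty diagnosis acts as the neutral element, while joining with an empty set of diagnoses yields an empty set.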
Example 9 Shown on the right of each behavior in Fig. 8 are interfaces Int(P) and Int(P′), derived from the corresponding behavior as follows.
1. The identifier of a component transition t(c) marking an arc of the behavior and associated with a pattern event is replaced by the singleton {∅} if t(c) is normal, or by the singleton {{f}} if t(c) is faulty, with f being the associated fault.
2. Interpreting as ε-transitions those transitions which are not associated with pattern events, the obtained nondeterministic automaton (NFA) is determinized so that each state of the resulting deterministic automaton (DFA) contains the ε-closure in all its structure (rather than the subset of NFA states only, as in the classical determinization algorithm [9]).
3. Within each state d of the DFA, each NFA state n is marked by the diagnosis set generated by all paths starting at the root state in d and ending at n, while identifiers of component transitions are eventually removed.
4. Let p be the pattern event marking a transition t exiting a state d in the DFA, ∆_p the diagnosis set associated with p in step 1, and ∆ the diagnosis set associated with the NFA state in d from which t is derived in the determinization process. ∆_p is replaced by ∆ ⋈ ∆_p.
Example 10 Displayed in Fig. 9 is the constrained behavior Bhv(M), which is generated based on index space Idx(O_M). Each state of Bhv(M) includes two sorts of additional information: the pattern events ready (if any) at input terminals of M, and the pair (i, i′) of interface states relevant to interfaces Int(P) and Int(P′), respectively. A final state needs the additional condition that both states in (i, i′) are final in the respective interface. When reconstructing a transition triggered by a pattern event, the latter is required to mark a transition exiting the corresponding state in the interface, otherwise the transition cannot be reconstructed. Compared with Example 9, the derivation of interface Int(M) shown in Fig. 9 exhibits two peculiarities: step 2 creates a DFA state resulting from two NFA transitions, exiting states 6 and 8 respectively, both marked by pair (b_1(r), nr(O)), and step 3 shall account for the diagnosis sets associated with pattern events.
Once generated, the behavior shall be decorated by sets of diagnoses associated with states, in a way similar to step 3 in marking NFA states within interface states.
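Decoration can be pictured as a fixpoint propagation of diagnosis sets from the root state. The sketch below is one possible reading under explicit assumptions: each arc of the behavior carries a diagnosis set (the singleton of the empty diagnosis for a normal transition, the singleton of a one-fault diagnosis for a faulty one, or the set coming with a pattern event), the root state starts with the singleton of the empty diagnosis, and propagation repeats until nothing changes. The graph and fault names are hypothetical.

```python
def decorate(root, arcs):
    """arcs: list of (source, target, diagnosis set attached to the arc).

    Returns a dict mapping each state to its set of diagnoses, where a
    diagnosis is a frozenset of faults."""
    deco = {root: {frozenset()}}
    changed = True
    while changed:                       # iterate until a fixpoint is reached
        changed = False
        for src, dst, arc_diags in arcs:
            # Join the diagnoses of the source with those of the arc.
            new = {d | a for d in deco.get(src, set()) for a in arc_diags}
            target = deco.setdefault(dst, set())
            if not new <= target:
                target |= new
                changed = True
    return deco


NORMAL = {frozenset()}
toy_arcs = [
    (0, 1, NORMAL),
    (1, 2, {frozenset({"f1"})}),
    (2, 1, NORMAL),                      # a cycle back to state 1
    (1, 3, {frozenset({"f2"})}),
]
toy_decoration = decorate(0, toy_arcs)
```

Termination follows from the finite number of faults: each state can only accumulate finitely many diagnoses, even in the presence of cycles.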

Example 11 Outlined on the right of Fig. 9 is the decorated behavior Bhv*(L). Starting from the singleton {∅} marking state 0, the candidate set associated with state 1 is ∆(1)

Conclusion
As shown in [11], despite their complexity, CAS's can be diagnosed more efficiently than monolithic DES's, whose diagnosis is affected by exponential complexity (in the number of components), either offline when generating the diagnoser [21,20], or online when reconstructing the system behavior [1,13]. Specifically, in [11], complexity (in time and space) is shown to be linear in the number of components within the CAS. The contribution of this paper is to extend diagnosis of CAS's introduced in [11] by means of uncertain temporal observations. Unlike a (certain) temporal observation, which consists of a sequence of totally ordered observable labels, within an uncertain observation, observable labels are both uncertain and partially ordered. Despite this dissimilarity, an uncertain temporal observation can be thought of as a set of candidate temporal observations. The notion of index space allows the diagnosis engine to account for all candidate temporal observations by means of a scalar value, the observation index, which is a surrogate of an integer indexing a temporal observation within states of the reconstructed behavior. In other words, when uncertain temporal observations are considered, the integer is replaced by a scalar value identifying a state of the index space. Consequently, neither space nor time complexity is expected to deteriorate when uncertain temporal observations come into play in diagnosis of CAS's.
Despite being introduced based on a simple reference example, diagnosis of CAS's is a general technique which can be applied to any real system that can be conveniently modeled as a CAS, in terms of interconnected AS's and relevant pattern events.
Diagnosis of CAS's is still in its infancy. Further research is expected in several directions. Offline preprocessing can be performed in order to accelerate the online diagnosis engine by generating in a suitable form the behavior space of each AS embedded in the CAS. Also, monitoring-based diagnosis can be envisioned, where the diagnosis engine does not operate a posteriori but, rather, it reacts to each fragment of available observation by providing the diagnosis information relevant to such a fragment only.