Active Learning of Extended Finite State Machines

Once they have high-level models of the behavior of software components, engineers can construct better software in less time. A key problem in practice, however, is the construction of models for existing software components for which little or no documentation is available. In this talk, I will present an overview of recent work by my group, done in close collaboration with the Universities of Dortmund and Uppsala, in which we use machine learning to infer state diagram models of embedded controllers and network protocols fully automatically through observation and test, that is, through black-box reverse engineering.
Starting from the well-known L* algorithm of Angluin [6], our aim is to develop algorithms for active learning of richer classes of (extended) finite state machines. Abstraction is the key when learning behavioral models of realistic systems. Hence, in practical applications, researchers manually define abstractions which, depending on the history, map a large set of concrete events to a small set of abstract events that can be handled by automata learning tools. Our work, which builds on earlier results from concurrency theory and the theory of abstract interpretation, shows how such abstractions can be constructed fully automatically for a restricted class of extended finite state machines in which one can test for equality of data parameters, but no operations on data are allowed [2,1]. Our approach uses counterexample-guided abstraction refinement (CEGAR): whenever the current abstraction is too coarse and induces nondeterministic behavior, the abstraction is refined automatically. In the talk, I will compare our approach with the related work of Howar et al. [8,9] on register automata.
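To make the idea of a history-dependent abstraction concrete, the following is a minimal sketch, not taken from Tomte or LearnLib. It assumes a hypothetical protocol with `Register` and `Login` actions that each carry one integer parameter; the names `Mapper` and `abstract_trace` are illustrative. The mapper remembers the first registered value and afterwards reports only whether a parameter equals it, so infinitely many concrete events collapse into a handful of abstract ones using equality tests alone.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Mapper:
    """Hypothetical history-dependent abstraction using only equality tests."""
    stored: Optional[int] = None  # value remembered from the history so far

    def abstract(self, action: str, value: int) -> Tuple[str, str]:
        # The first Register stores its parameter; no operation is applied to it.
        if action == "Register" and self.stored is None:
            self.stored = value
            return (action, "fresh")
        # Every later event is classified by equality with the stored value.
        tag = "same" if value == self.stored else "other"
        return (action, tag)


def abstract_trace(trace: List[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Map a concrete trace to the abstract trace seen by the learner."""
    mapper = Mapper()
    return [mapper.abstract(action, value) for action, value in trace]


if __name__ == "__main__":
    print(abstract_trace([("Register", 17), ("Login", 17), ("Login", 42)]))
```

Two concrete traces that differ only in the chosen data values yield the same abstract trace, which is what lets a finite-alphabet learner handle them. In a CEGAR setting, an abstraction like this would be refined (for example, by remembering an additional value) when it makes the system under learning appear nondeterministic.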
Using the LearnLib [11,10] tool from Dortmund in combination with Tomte [1], a prototype implementation of our CEGAR algorithm, we have succeeded in learning models of several realistic software components, such as the SIP protocol [3,1], the new biometric passport [5], banking cards, and printer controllers.
Once we have learned a model of a software component, we may use model checking technology to analyze this model and model-based testing to automatically infer test suites. This allows us to check, for instance, whether new faults have been introduced in a modified version of the component (regression testing), whether an alternative implementation by some other vendor agrees with a reference implementation, or whether some communication protocol is secure. Using a well-known industrial case study from the verification literature, the bounded retransmission protocol [7], we show how active learning can be used to establish the correctness of a protocol implementation I relative to a given reference implementation R. Using active learning, we learn a model M_R of reference implementation R, which serves as input for a model-based testing tool that checks conformance of implementation I to M_R. In addition, we explore an alternative approach in which we learn a model M_I of implementation I, which is compared to model M_R using an equivalence checker. Our work uses a unique combination of software tools for model construction (Uppaal), active learning (LearnLib, Tomte), model-based testing (JTorX, TorXakis) and verification (CADP, MRMC). We show how these tools can be used for learning these models, analyzing the obtained results, and improving the learning performance [4].
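The comparison of two learned models can be illustrated with a small sketch. It is an assumption-laden toy, not the procedure implemented by CADP or any other tool named above: both models are taken to be deterministic, input-complete Mealy machines, and the function `find_difference` together with the two example machines is hypothetical. A breadth-first search over the product of the machines returns a shortest input sequence on which their outputs differ, or `None` if they agree everywhere.

```python
from collections import deque

# A machine is (initial_state, transitions), where
# transitions[(state, input)] = (output, next_state).

def find_difference(m_r, m_i, inputs):
    """Return a shortest distinguishing input word, or None if equivalent."""
    (init_r, delta_r), (init_i, delta_i) = m_r, m_i
    start = (init_r, init_i)
    seen = {start}
    queue = deque([(start, [])])
    while queue:
        (s_r, s_i), word = queue.popleft()
        for a in inputs:
            out_r, next_r = delta_r[(s_r, a)]
            out_i, next_i = delta_i[(s_i, a)]
            if out_r != out_i:
                return word + [a]  # outputs disagree on this word
            pair = (next_r, next_i)
            if pair not in seen:
                seen.add(pair)
                queue.append((pair, word + [a]))
    return None  # every reachable state pair agrees on all inputs


INPUTS = ["send", "ack"]

# Hypothetical reference model M_R of a simple sender.
M_R = ("idle", {
    ("idle", "send"): ("frame", "wait"),
    ("idle", "ack"): ("ignore", "idle"),
    ("wait", "send"): ("busy", "wait"),
    ("wait", "ack"): ("ok", "idle"),
})

# Hypothetical implementation model M_I that fails to return to idle.
M_I = ("idle", {
    ("idle", "send"): ("frame", "wait"),
    ("idle", "ack"): ("ignore", "idle"),
    ("wait", "send"): ("busy", "wait"),
    ("wait", "ack"): ("ok", "wait"),
})

if __name__ == "__main__":
    print(find_difference(M_R, M_R, INPUTS))
    print(find_difference(M_R, M_I, INPUTS))
```

A word returned by such a check can be replayed on the actual implementation I to decide whether it reveals a genuine fault or merely an imprecision in one of the learned models.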

Fig. 1. Use of automata learning to establish conformance of implementations