Automatic Transition System Model Identi(cid:12)cations for Network Applications from Packet Traces

. A wide range of network management tasks such as balancing bandwidth usage, (cid:12)rewalling, anomaly detection and diﬀerentiating traﬃc pricing, depend on accurate traﬃc classi(cid:12)cation. Due to the diversity and variability of network applications, port-based and statistical signature detection approaches become ineﬃcient and hence, behavioral classi(cid:12)cation approaches have been considered recently. However, so far, there is no automated general method to obtain the behavioral models of applications. In this research, we propose an automatic procedure to infer a transition system model from generated traﬃc of an application. Our approach is based on passive automata learning theory and evidence driven state merging technique using the rules of the network domain. We consider the behavior of well-known network protocols to generate the model which includes unobserved behaviors and excludes invalid ones as much as possible. To this aim, we present a new equivalence relation regarding the given protocol behaviors to induce proper state merging conditions. This idea has led the time complexity order of the algorithm to be linear rather than exponential. Finally, we apply the model of some real applications to evaluate the precision and execution time of our approach.


Introduction
The importance of traffic classification for network administration tasks such as ensuring the security and quality of service of applications in computer networks has long been acknowledged. The growing number of network applications and protocols has limited the efficiency of classical methods. In the past, packets were easily classified by their transport layer ports. As the use of random or nonstandard ports is dramatically increasing, payload inspection [1,2] and statistical methods [3,4] are proposed. However, drawbacks of these techniques such as insufficiency in encrypted traffic and their high computation cost lead to the emergence of behavioral classifiers. The merit of the behavioral classification is to use the behavioral pattern of an application instead of the content of packets or flow statistics. This point makes these classifiers useful for encrypted traffic or unknown protocols. But, so far, no automated method to obtain the behavioral model of applications is provided which currently requires human inspection.
There are a few studies for automating inference of the behavioral patterns which are application specific and cannot be widely used, for instance [5] has been presented for P2P-TV traffic.
To overcome the challenges of traffic classification and behavioral pattern detection approaches, we aim at providing an automatic approach to derive formal behavioral models, i.e., transition systems, for applications in the domain of the network. Our focus is mainly on programs at the application layer of the TCP/IP model [6]. We reduce our problem to the automata learning problem [7] which aims at inferring an automaton which accepts the set of given words. If the input traffic is considered as the given words of the language, then the desired model will be the identified automaton. However, the classical approaches in the literature of automata learning are not efficient to derive the most general model such that not only it subsumes valid unobserved traces as much as possible, but also disallows invalid traces. This is not achievable unless concepts of the domain are utilized to tailor the basic algorithm.
Intuitively, we assume that the behavior of an application can be identified in terms of how it executes well-known network protocols (below the application layer), abstracting the state variables of the application. Therefore, given the formal specification of well-known network protocols and execution traces of a program, we automatically generate a transition system. Hence, we customize the automata learning algorithm of [8] using rules in the network context to derive the most general model. Noting to the fact that each trace of a program is an interleaving of network protocol execution traces, called flows, the inferred model must preserve the behavior of each network protocol. In other words, the model of various applications differ in how they interleave the traces of well-known network protocols. Therefore, we take advantage of a behavioral pre-order relation in the theory of transition systems to conduct the process of model generation such that invalid traces are prohibited. Due to our abstraction (of application variables), the states of the inferred model which identify the same state with the same number of the flows for each network protocol, can be aggregated together using the counter abstraction technique [9] to include not observed behaviors.
To illustrate the applicability of our approach, we have implemented our algorithm in a tool and applied it on two version control system applications and two remote desktop sharing programs. Our results indicate that the techniques that are used to generalize the model, are sufficiently conservative such that no invalid trace is included and unobserved behaviors are covered with a high precision. Furthermore, the worst case time complexity order of our algorithm is linear rather than exponential in contrast to the related automata learning techniques.
We first explain the network concepts utilized in our proposed method in Section 2.1. We provide an overview of automata learning in Section 2.2. We define the main concepts related to transition systems in Section 2.3. The counter abstraction technique is introduced in Section 2.4 which is used to find the equivalent states.

Network Background
Each packet transferred across a network is composed of two parts: the header and the content. The header includes the control information needed by the corresponded protocol and is appended to the beginning of the content. Protocols defined over the Internet follow the TCP/IP layered architecture [6]. This model consists of four layers: Application, Transport, Internet, and Link. The layered architecture means that the packet content of each layer is the built packet of its upper layer.
To send a message over the network, at first, the Application layer receives the user message from the software which is running (e.g., email client, web browser, instant messaging software, etc.), and passes it to the lower layer. The Transport layer segments data from the upper levels, then establishes a connection between the packet's point of origin and where it has to be received, and ensures that the packets are reassembled in the correct order [10]. The Network layer is responsible for the packet's addressing and routing. Finally, the Link layer manages the formats of packets based on the mediums being used in transmitting the packets. For each layer, a number of different protocols is standardized. Protocols are divided into connection-less and connection-oriented categories. Connectionoriented are those that need to establish a connection before data transmission. Thus there are handshake (initialization) and finalization phases in these protocols. These phases are not required in connection-less protocols. They just send a request packet for each desired data. A sequence of packets which have the same value for the parameters source IP, source port, destination IP, destination port, and the protocol name is called flow. An execution of an application gives rise to initiating a number of flows. These flows are the connections which are established between the initiator system and the other end systems.

Automata Learning
There are equivalent keywords in the literature to automata learning such as grammar inference or regular inference, language or automata identification. The goal of automata learning is to find a (non-unique) smallest automaton which is consistent with the set of given examples [11]. Gold has proved that this problem when the alphabet is finite, the two input sets of positive and negative samples are given, and the number of the states of the output automaton is determined, is a NP-complete problem [7]. If all of the words with the size equal and less than n are given, then it is possible to solve the problem in the polynomial time.
The algorithms of this problem can be divided into the two categories: active and passive.
The active techniques are based on Angluin L * algorithm which solves the problem in polynomial time by asking some membership or equivalence queries [12]. It is assumed that there is an oracle than answer the required queries. Passive techniques tend to build tree-like automata, called prefix tree automata, from input examples and then by merging their states according to some heuristics evidence, achieve the smallest deterministic finite automata (this technique is called Evidence Driven State Merging (EDSM) [8]). In this category, there is no oracle and the algorithm should find the solution only from the positive and negative words of the language.
Since we aim to infer the behavioral model only from the input traces when there is no oracle system, our solution in this paper is based on the passive techniques. The most important challenge of these techniques is how to find the candidate states which should be merged to include unobserved traces. KTail algorithm merges states which have K common future (i.e., states that accept the same set of strings of length K) [13]. Several research has been conducted to decrease the O(n 2 ) search space of the states which should be merged. Red-Blue is one of them and becomes a popular framework which limits the number of pairs of states by determining sufficient conditions on their colors.

Transition Systems
In this section we define the concepts used in the proposed methodology which are related to transition systems. These definitions have been adapted from [14].

Definition 1 (Transition System).
A transition system is a tuple TS = (S, Act, →, s 0 , ↓) where S is a set of states, Act is a set of actions, → ⊆ S × Act × S is a transition relation, s 0 is the initial state, and ↓ ⊆ S is a set of final states. We use s α − → t to denote (s, α, t) ∈→. TS is called action-deterministic if for all s ∈ S, there are not (s, α, t) ∈→ and (s, α, v) ∈→, where α ∈ Act and t, v ∈ S, such that t ̸ = v.
From this definition, the transition system of Fig. 1a is action-deterministic. A finite execution fragment η = s 0 α 0 s 1 α 1 . . . α n s n+1 of TS is an alternating sequence of states and actions starting with the initial state and ending with a final state such that (s i , α i , s i+1 ) ∈→ where 0 ≤ i ≤ n. A finite sequence of actions ϱ = α 0 α 1 . . . α n of TS is an execution trace if ∃s 0 , . . . , s n+1 ∈ S such that η = s 0 α 0 s 1 α 1 . . . α n s n+1 is an execution fragment. For instance, s 0 x s 1 a s 2 x s 3 a s 4 y s 5 b s 6 y s 7 and s 0 c s 8 c s 9 x s 10 d s 11 y s 12 are the execution fragments of the transition system in Fig. 1a. By eliminating states from these sequences, the execution traces are generated (x a x a y b y and c c x d y respectively).
There is an abstraction operator which has the responsibility to hide some actions of a transition system to make them internal and thus unobservable to external entities. We formally define the abstraction operator in the following definition: The left transition system in the Fig. 1b is the result of applying an abstraction on the transition system what has been abstracting {x, y, c, d} of Fig. 1a.
To compare the behavior of transition systems, several behavioral pre-order and equivalence relations have been proposed ranging from strict to liberal ones. The Simulation relation is a finest pre-order relation which requires a transition system to precisely mimic transitions of another one [15]. In the case of existing internal actions in the system, the Weak Simulation relation is defined to relax the conditions only for the observable actions. Let τ − → * be reflexive and transitive closure of τ -transitions:

Definition 3 (Weak Simulation Relation).
A binary relation R on the set of states S is a weak simulation relation if for any s 1 , s ′ 1 , and t 1 ∈ S and α ∈ Act, s 1 R t 1 implies:

For the given transition systems TS
Hence, a minimal weak simulation relation R is not necessarily unique. An example of weak simulation relations between two transition systems has been illustrated in Fig. 1b.

Counter Abstraction
The Counter Abstraction is a technique to abstract states of a system. The idea is to represent each state as a vector of counters one per each value instead of a vector of state variables. For instance, consider there are three integer variables x, y and z. The states {x = 1, y = 2, z = 1}, {x = 2, y = 1, z = 1} and {x = 1, y = 1, z = 2} are equivalent because they have the identical counter abstracted state {count(1) : 2, count(2) : 1}. This technique has been used in various applications. In symmetry reduction which is a technique to avoid state space explosion problem, the counter abstraction has the role of finding identical clusters of states space so as to reduce the symmetry states and decrease the cost of model checking [16]. This concept is also used in [9] in order to abstract a parameterized system of an unbounded size into a finite-state system to be verifiable.

Methodology
In this section, our proposed methodology for learning the network behavioral model of an application is discussed. To derive this model, it is assumed that the only application-dependent input is the set of network traffic captured during different executions of the application. We develop our solution based on the automata learning techniques and provide an evidence driven state merging algorithm based on rules in the network context to derive the most general model which subsumes unobserved traces as much as possible. This algorithm is based on the specification of well-known network protocols.

Problem Statement
The captured traffic is the sequence of packets sent or received as the result of the execution of an application during a specified time. Each packet contains data and headers of layers as the result of encapsulation. We only consider information of the upper layer instead of the whole headers and data (for instance, we only take into account information of the application layer of HTTP packets while they subsume information of the TCP layer).
To facilitate the processing of each packet content and close up the concept to the automata theory, a function is exploited which corresponds each packet to its equivalent action-like abstract representation. This function is defined as P M apper : Packets → Act where Packets is the set of possible packets captured through the pre-processing step. For example, a received TCP packet in the handshake phase is mapped to TCPInitI which is a member of the model actions set. In Section 4 we explain how the action set is defined in terms of the packet information. Therefore, by applying the PMapper function to each captured packet, we obtain a trace of actions. Hence, one input of our problem is N executions of an application which are transformed by the PMapper into the N action traces (packet trace) denoted by PT , ranged over by π. Let π i indicate the i th action of the trace π, and len(π) show the length of the trace. We remark that the length of each input execution is arbitrary, and in potentially independent of the length of other traces.
Besides the packet traces, another important input of our problem is the specifications of K network protocols. We assume that the specifications are provided in the form of action-deterministic transition systems The goal of our problem is to derive a model in the form of transition system, where Traces(M ) is the set of the execution traces of M . In fact, each of the input traces is an execution trace of the desired transition system.

Projection Relation
Initially, a tree-like automaton which consists of all action traces is generated. Intuitively, each application needs to establish a number of connections with other systems in order to perform each of its functionality. Each connection follows a protocol specification. For instance, an execution of the Map application of Windows 8 contains four flows where two are for the DNS protocol, one for the TCP and one for the TLS protocols. Hence, each state of the initial transition system can be considered as a vector of states, each of which identifies a state of the corresponding protocol. Note that the size of the vector is equal to the number of flows. To generalize the initial transition system to cover more behavior, some states are selected to merge together. Hence, the new model accepts extra not observed valid behavior. Merged states are those called project equivalent. Two states are project equivalent if their vectors (of flow states) are identical with respect to the counter abstraction technique. For the sake of efficiency, the resulting transition system is determined.
Before describing the method, we mention some definitions and theorems. As we explained, each packet from the application execution belongs to a flow. We assume the total number of the flows of the all input traces is denoted by F . Furthermore, the auxiliary function Flow : S × Act × S → N at, defined over the initial transition system, maps each packet, specified by the transition with the corresponding action of the packet, to its flow number such that Flow (s, α, t) ≤ F , where s, t ∈ S and α ∈ Act. From the flow definition each flow has a protocol attribute. Let function Protocol identify the protocol name of a flow, denoted by

Definition 4 (Projection Relation). Let TS
be transition systems such that TS 1 ⪯ w TS 2 witnessed by a minimal weak simulation relation R. Two states s 1 and s 2 of S 1 have projection relation under TS 2 if ∃ t ∈ S 2 : s 1 R t ∧ s 2 R t. Then, we say that s 1 and s 2 are the same projection of t under the transition system TS 2 , denoted by s 1 ∼ ⇂TS 2 s 2 .
To define states that are project equivalent, the following lemma identifies the conditions under which the project relation can act as an equivalence relation, and consequently can partition states. If a transition system has a tree-like structure, any of its two states can be connected by a unique simple path.

Lemma 1. Let TS
be transition systems such that TS 1 is a tree-like transition system and TS 2 is an action-deterministic transition system without any τ -transition (i.e., τ ̸ ∈ Act 2 ). If TS 1 is weakly simulated by TS 2 , witnessed by a minimal weak simulation R, then each state of S 1 relates to only one state of S 2 under R: be transition systems such that TS 1 is a tree-like transition system and TS 2 is an action-deterministic transition system without any τ -transition (i.e., τ ̸ ∈ Act 2 ). The projection relation under the transition system TS 2 over the states of TS 1 is an equivalence relation.
See Appendix A.1 for the proof of Lemma 1 and Appendix A.2 for the proof of Theorem 1. As a consequence of Theorem 1, the states of a transition system can be partitioned into equivalance classes by a projection relation. The equivalence class for projection relation is defined in the following definition.

Definition 5 (Projection Relation Partitioning). Let TS
be transition systems such that TS 1 is a tree-like transition system and TS 2 is an action-deterministic transition system without any τtransition (i.e., τ ̸ ∈ Act 2 ). States of TS 1 are partitioned under the projection relation under TS 2 into the equivalence classes each of which is identified by the unique state t ∈ S 2 such that: where R is a minimal weak simulation relation.   Running Example. Consider the specifications of two sample protocols in Fig.  2a, we assume that two input traces x a x a y b y and a a x b y are given. In the first trace, there are three flows, two are of the protocol P 2 and one of the protocol P 1 . They are given such that the first x and the last y belongs to a flow and the second x and the first y are related together. In the second trace, there are two flows each of which is instantiated from each protocol. It is assumed that the flows are enumerated by the order of their first packets. We use this example in the rest of this section.

Step 1: Building the Initial Transition System
From Section 2, there is an execution fragment for each execution trace. We generate for each input trace π, its corresponding execution fragment η π = s 0 π 1 s π 1 π 2 . . . π len(π) s π len(π) . Note that the initial state in the fragments of the all traces are intentionally identical. In the first step, the tree-like transition system M 0 is built from aggregating the execution fragments of the input traces. Therefore, the initial transition system M 0 = (S, Act, →, s 0 , ↓) is obtained as follows: 2b is the result of performing this step. This initial transition system does not cover new execution traces of the application which are not given in the input. Therefore, some operations are needed to generalize the initial transition system and cover more execution traces. Next steps (step 2 and 3) are the efforts to reach this goal.

Step 2: Generalizing by Counter Abstraction
The generalization method in this step is addressed in two sub-steps.

Finding Equivalent States
For instance, the abstracted transition system of Fig. 2b under flow f 1 (contains  a a b and has been illustrated in the left side of Fig. 1b. We remark that if the abstraction operator is defined under protocol actions instead of a flow, then the resulting abstracted transition system may not preserve the protocol behavior due to interleaving of flows. For instance, the abstraction of the transition system in Fig. 2b under the protocol P 2 contains the sequence of x τ x τ y τ y at its left branch which does not have any weak simulation relation with P 2 . Let count(s, t) denote the number of flows like f i that the state t of the protocol P j weakly simulates s in the abstraction of M 0 under f i : We remark that each state s is uniquely simulated by a state t as the result of our projection relation.
The two states s 1 and s 2 can be aggregated together under the counter abstraction technique if and only if ∀ j ≤ K, t ∈ S j : count(s 1 , t) = count(s 2 , t).
The results of applying the counter abstraction on the states of the initial transition system of the running example are presented in Table 1. To obtain each row, at first, the projection relation under abstraction of each flow is computed. After that, the number of flows in each state of protocols is counted. First row shows that all the flows are related to the initial states of the protocols. Because of the transition (s 0 , x, s 1 ) of flow f 1 , the state of s 1 is simulated by the state v 2 of the protocol P 2 , and hence, the counter of flows in the state v 1  (s, t1) c(s, t2) c(s, t3) c(s, v1) c(s, v2) state c(s, t1) c(s, t2) c(s, t3) c(s, v1) c(s, v2) s0   Fig. 2c (without the tick transition a on the state s 3 ) is the final result of performing this step. Remark 1. After this step, the resulting transition system is action-deterministic. We have proved this fact in the appendix.

Step 3: Generalizing by Completing Transitions
The next generalization idea is completing the transitions set according to the transition systems of the network protocols. We add self-loops of each protocol state t ∈ P i , for some i ≤ K, to the state [s] if count(s, t) > 0. Adding such transitions does not affect the equivalent classes of M 1 . Then, after applying this step, the resulting generalized transition system is M g = (S ′ , Act, → g , s ′ 0 , ↓ ′ ) such that: After applying this step, the tick transition a on state s 3 is added to the Fig.  2c. The time complexity of the algorithm is linear in the size of the input. See Appendix A.4 for the psuedocode of the algorithm and a discussion of the time complexity.
To evaluate the proposed method, we have implemented our algorithm in Java and applied it to some applications. Two categories of applications,version control system and remote desktop sharing, are selected for testing our methodology. For the first category, two applications TortoiseSVN client of SVN 1 and Source Tree Client of GIT 2 are selected. The traffic of the update command of these applications are gathered as their captured packet traces. Also, we have selected two remote desktop sharing applications, namely TeamViewer 3 and JoinMe 4 , for which their traffic is encrypted. Hence, they cannot be easily identified by signature based approaches on the content of packets. Each one has run for 100 times and their network traces are captured via the Wireshark 5 tool. Packets of the application layer protocols (used by these programs), namely TCP, SSL, SSLv2, TLSv1, TLSv1.2, HTTP and UDP have been considered and the others are filtered. The more protocols are considered, the more precision will be achieved. Some preprocessing operations have been performed to eliminate the repetitive and truncated packets. Also, we have reassembled segments of fragmented packets. The mapper function which is responsible for translating the packets to their corresponding actions is defined such that it assigns the concatenation of the packet protocol name, the control phase and the direction to each packets. We divide the operation of each protocol into a set of phases to abstractly consider its progress. The control phases are assumed to be Init, Data, and Fin for connection-oriented protocols and Init and Data for connection-less ones. Intuitively, Init indicates to the establishment of the connection, Data to the transmission of data, and Fin to the termination of the connection. The direction is a binary tag which can be either I or O to indicate that the packet is sent or received, respectively. The amount of detail about packets embedded in their corresponded actions, shows how much the final generated model is sensitive to packet variations. By this mapper function, different manners of each control phase (initialization/ transferring data/ finalization) are considered to be the same.
We assume that the specifications of protocols are given in the form of transition systems and defined according to the mapper function abstraction level. By applying the mapper function on the packet traces, 100 action traces have been obtained for each application. These traces are divided into the train and test sets. The train traces are the input of our proposed method to infer the behavioral model which should accept the test traces. The overall scheme of an obtained model is shown in [?].We use the cross validation technique for 100 times to calculate the average value of precision with a reasonable confidence interval. Table 2 shows the final result of our experiments. Regarding to impossibility of measuring the real value of false positive rate (because it is not possible to gather all negative traces), researchers tend to consider the traces of the other applications which have the same functionality. Thus, we use traces of applications in the same category crossly to calculate the false positive rates. The major point is that by applying our proposed generalization steps, the false positive rate does not grow. This means that our conservative approach prevents over-generalization from occurring. Each generalization step improves the completeness of the model. Note that since the update command of SVN and GIT generates a short packet trace, their captured traffic are similar and misclassified. As a future work, we plan to map packets to parametric actions in order to enhance the precision of the classifier. Adding (self-loop) transitions has increased our precision by 31 percent in the worst case. In the next step, we aim to relax unnecessary interleaving which stems from the concurrent development of applications or parallel network connections. Such a step which is our future work increases our precision to 100 or 91 percent. Now, we have applied the step manually, by examining the counter examples of the previous step. Those traces which can be covered by the generated model via modifying the orders of packets, is counted as the successful result for this step. We plan to automate this idea so as to automatically induce strict orders among transitions and relax the unnecessary ones in our future work. Our approach fails to recognize 9 percent of test traces (the last row of the Table 2) which are mainly those that include new unpredictable subsequences based on the train set.

Comparison with other packet classification methods.
To clarify the applicability of our methods, it should be compared with other packet classification techniques which we have described in Section 1. Portbased detection method does not have the ability of detecting most of the current applications because that they tend to use random or non-standard ports. Due to growing usage of encrypted traffic payload inspection methods become useless and it can not be used in our dataset. Furthermore, the proposed behavioral classification methods are application specific (e.x. for P2P applications) and they are not enough general to apply to our selected applications. Finally statistical classification methods are the only related work which we can compare our work with. To this aim, Netmate 6 is used to obtain the feature vectors of flows of captured traffic. Then, using Weka tool-set 7 , the average precision of classification and false positive metrics among three algorithms SVM, Native Bayes and C4.5 were measured. The final result of these metrics are reported in Table 3.

Related Work
Two research areas are related to our problem. In the following we explore related work in each area. Automata Learning. Some research has been conducted to extend the expressiveness of inferred models. The KTail algorithm is extended in [17] with the aim to generate models from methods invocation traces. This approach is conducted in four steps. At first, the traces with the identical sequence of methods (those differ in the values of parameters) are merged together. Next, constraints on parameters are obtained via Daikon invariant detector [18]. At the third step, a prefix tree automaton is built. Finally, the states are merged according to a criterion which can be equivalence of method and parameters, weak subsumption or strong subsumption for their next k actions. In [19], the authors extends the Angluin L * algorithm to infer relationships between input and output parameters in the form of the Mealy machines. In [20], the automate learning problem is extended to infer deterministic timed automata. Some studies address the application of automata learning problem. Among them, [21] is the most related work to ours which elaborates on inferring mealy machine models of communication protocols. The authors indicate that the parameters in the message format of protocols such as sequence number, configuration parameter and session id, result in infinite-states model. To minimize the state space, the abstract representation of protocol states are derived automatically in terms of operations that a requester and responder may perform. Hence, they have a similar assumption to ours which is the existence of protocol specifications. Their algorithm is based on query evaluation (active automata learning), while, we have extended the passive automata learning. Also, there are other applications of automata learning in different areas, especially in software specification mining [22,23] which are not directly related to our work and we do not elaborate on.
Reverse Engineering of Protocol Specification. In this part we enumerate the works that focus on inferring protocol specifications from traffic. These works are related to ours because of their restriction on inferring a model by observing the behavior of the application in a black-box style. In [24], a probabilistic method was investigated to obtain a finite state machine of a protocol. It was assumed that the format of protocol messages is not determined. At the first step, messages are segmented into l-length bytes and clustered with the aim of recognizing their control parts. Next, the most frequent patterns are selected as message units by statistical analysis. Then, the main messages of the protocols are defined by computing the centers of the clusters. Finally, the finite state machine is constructed whose states are the main messages and probabilistic transitions are the frequencies of each pairs of messages.
In ReverX algorithm [25], a prefix tree automaton is built from traces and then the states which are the destination of identical transitions, are merged. Therefore, transitions with the same source and destination are created. They claim that if these transitions are merged the parameters of message headers are induced. Actually, despite their work is similar to us in using passive automata learning, we differ in the conditions for state merging. If the states are just similar in their 1-future action, they merge them, while we have investigated a domain specific condition based on well-known protocol.

Conclusion
The classical methods which identify the traffic based on packet header information or statistical metrics, are not effective anymore. Classification approaches based on the behavioral patterns of applications are of a new trend to this problem. No general and automated method to derive behavioral models has been provided. We proposed a method to reach this goal based on the automata identification problem and evidence driven state merging technique combined by transition system theories, namely behavioral pre-order relations and the counter abstraction. Intuitively, we assumed that the behavior of an application can be identified in terms of how it executes well-known network protocols, abstracting the state variables of the application. Hence, we have introduced our merging conditions to identify the equivalent states based on the specification of a set of well-known network protocols such as TCP, TLS, SSL, etc. To this aim, we have provided the projection relation to identify the states with the same number of the flows for each network protocol which can be counted together using the counter abstraction technique.
We have presented two extra steps to complete the inferred model to cover unobserved behaviors of the software. At first, the model is completed by including the self-loop behaviors of the network protocols. After that, the possible valid interleaving of the packets based on the repetition of their orders is predicted. The model is extended to subsume such predicted orders. We also implemented and evaluated our procedure which does not require human inspection. The experiments show very encouraging results that the generalization steps significantly increase the accuracy from 0% to 91% in the worst case. The future work is to mechanize the last step which induces the essential orders with the aim of relaxing the unnecessary ones. We plan to extend our case study and compare the result of our method with the real traffic classification tools.