On Learning Mobility Patterns in Cellular Networks

Abstract. This paper considers the use of clustering techniques to learn the mobility patterns existing in a cellular network. These patterns are materialized in a database of prototype trajectories obtained after having observed multiple trajectories of mobile users. Both K-means and Self-Organizing Maps (SOM) techniques are assessed. Different applicability areas in the context of Self-Organizing Networks (SON) for 5G are discussed and, in particular, a methodology is proposed for predicting the trajectory of a mobile user.


Introduction
The new generation of mobile and wireless systems, known as 5th Generation (5G), intends to provide solutions to the continuously increasing demand for mobile broadband services associated with the massive penetration of wireless equipment, while at the same time supporting new use cases associated with customers of new market segments and vertical industries (e.g., e-health, automotive, energy). As a result, the vision of the future 5G Radio Access Network (RAN) corresponds to a highly heterogeneous network with unprecedented requirements in terms of capacity, latency or data rates, as identified in different fora [1] [2]. To cope with this heterogeneity and complexity, the RAN planning and optimization processes can benefit to a large extent from exploiting cognitive capabilities that embrace knowledge and intelligence.
In this direction, legacy systems already started the automation of the planning and optimization processes through Self-Organizing Network (SON) functionalities [3]. In 5G, considering also the advent of big data technologies [4], it is envisioned that SON can be further evolved towards a more proactive approach able to exploit the huge amount of data available to a Mobile Network Operator (MNO) and to incorporate additional dimensions coming from the characterization of end-user experience and end-user behavior [5]. Then, SON can be enhanced through Artificial Intelligence (AI)-based tools, able to smartly process input data from the environment and come up with knowledge that can be formalized in terms of models and/or structured metrics that represent the network behavior. This will allow gaining in-depth and detailed knowledge about the whole 5G ecosystem, understanding hidden patterns, data structures and relationships, and using them for more efficient network management [6]. AI-based SON involves three main stages [6]: (i) the acquisition and pre-processing of input data, exploiting the wide variety of available data sources; (ii) the knowledge discovery, which smartly processes the input data to come up with exploitable knowledge models that represent the network/user behavior; and (iii) the knowledge exploitation, which applies the obtained models to drive the decision-making of the SON functions. This paper focuses on the knowledge discovery stage and, in particular, on automatically learning the mobility patterns of the mobile users, trying to identify whether the traffic across the cells in a scenario follows specific patterns that can be characterized in terms of prototype trajectories followed by many users.
Several works in the literature have addressed the analysis of trajectories in different contexts such as hurricane trajectories, animal movements, public transportation, etc. Various tools have been considered, such as Self-Organizing Maps (SOM) together with visual analysis [7], density-based clustering [8][9] or Principal Component Analysis [10]. In wireless networks, [11] proposed a trajectory prediction strategy to deal with routing in mesh sensor networks. It is based on clustering similar trajectories followed by wireless nodes and using them for making predictions for other nodes. However, the concept of trajectory in [11] is defined by the set of nodes that a mobile node would associate with to send or receive data along a path, but not by the geographical locations. Instead, in our work we intend to derive a deeper knowledge about trajectories based on analyzing the geographical coordinates. In turn, [12] and [13] address the problem of classifying the trajectory followed by a mobile terminal based on a set of reference trajectories in order to optimize the handover process in LTE. However, while [12] and [13] use a simple method for building the set of reference trajectories, based on monitoring certain users with a given probability and adding their trajectories to the set, in our approach we propose the use of clustering techniques, which are more powerful for identifying the most representative trajectories.
In this context, the approach proposed in this paper considers the use of clustering techniques, namely K-means and SOM, to learn the mobility patterns existing in a cellular network. These patterns are materialized in a database of prototype trajectories obtained after having observed multiple trajectories of mobile users. Different applicability areas for these patterns in the context of 5G-SON are discussed and, in particular, a methodology is proposed for predicting the trajectory of a mobile user.
The rest of the paper is organized as follows. Section 2 describes the proposed methodology based on clustering tools for learning mobility patterns. Section 3 discusses the applicability areas and describes the approach for identifying the trajectory of a mobile user. The proposed approach is evaluated in Section 4, while Section 5 summarizes the concluding remarks.

Mobility pattern knowledge discovery
Current cellular networks such as 4G already allow the User Equipments (UEs) to provide geolocation information, including both geographical coordinates and altitude, as part of the radio measurement reporting processes [14]. Location information can be obtained from UEs in connected mode, which periodically transmit measurement reports to the network. Furthermore, thanks to the Minimization of Drive Tests (MDT) feature [15], UEs in idle mode can log measurements and transmit them later on, when the UE enters connected mode. These capabilities enable MNOs to collect large amounts of data that include valuable knowledge about the spatio-temporal traffic distribution across the cells. This paper proposes a methodology to analyze these data and identify the existing mobility patterns of the UEs.
The approach for learning mobility patterns is illustrated in Fig. 1. It operates on a long-term basis, after having observed a large number of connected and idle mode UEs in a certain geographical area over different time periods, and analyzes the location information collected from these UEs to identify the existence of prototype trajectories. As shown in Fig. 1, the first step is the pre-processing, which analyzes consecutive reports for each UE and extracts the geolocation information in order to build a trajectory for this UE. A trajectory is defined here as the concatenation of $N$ coordinates at consecutive time instants $t_1,\dots,t_N$. Then, assuming for simplicity two-dimensional (2D) coordinates $(x,y)$, the trajectory for the $j$-th UE is given by the vector of dimension $B=2N$ denoted as $r_j=[x_j(t_1),y_j(t_1),\dots,x_j(t_N),y_j(t_N)]$. The result of the pre-processing task will be a total of $J$ trajectories $r_j$, $j=1,\dots,J$.
The second step is the clustering, which processes the set of $J$ trajectories by grouping them in $K$ clusters in such a way that trajectories of the same cluster are similar to each other and different from the trajectories of the rest of the clusters. Two alternative clustering techniques are considered in this work:
• K-means: This strategy belongs to the family of partitioning methods. It groups the $J$ input trajectories in $K$ clusters by trying to maximize the similarity between trajectories of the same cluster and to minimize the similarity between trajectories of different clusters, using the Euclidean distance as the metric of similarity (the lower the distance, the higher the similarity). The process can be summarized as follows (see [16] for further details):
(a) The algorithm starts by randomly selecting $K$ out of the $J$ input trajectories. Each of these $K$ trajectories represents an initial cluster. For each cluster $k$, the algorithm computes the centroid $s_k$. At this initial stage, where each cluster contains only one trajectory, the centroid $s_k$ equals the trajectory selected for the $k$-th cluster.
(b) Each of the remaining $J-K$ trajectories is assigned to the cluster to which it is the most similar, based on the Euclidean distance $|r_j - s_k|$ between the trajectory and the centroid of each cluster. Once all the $J$ trajectories have been clustered, the centroids $s_k$ are recomputed. In particular, the $i$-th component of $s_k$ is the average of the $i$-th components of all the trajectories belonging to the $k$-th cluster.
(c) Using the new values of the centroids $s_k$, each of the $J$ trajectories $r_j$ is reassigned to the cluster with the lowest distance $|r_j - s_k|$. The centroids are recomputed and this step is repeated until convergence (i.e., until the obtained clusters do not change between two consecutive iterations).
(d) At the end of the process, each cluster $k=1,\dots,K$ will contain a number of input trajectories $N_k$, and its centroid $s_k$ will be the so-called prototype trajectory that is taken as representative of all the trajectories belonging to this cluster.
• Self-Organizing Map (SOM): This clustering strategy relies on a neural network model with a total of $K$ neurons, where each neuron is characterized by a $B$-dimensional weight vector $s_k$. The process can be summarized as follows (see [17] for details):
(a) The weight vectors $s_k$ are initialized. This can be done randomly or through the linear initialization method described in [17].
(b) An iterative unsupervised learning process is used to update the weight vectors $s_k$ of the different neurons according to Kohonen's algorithm [17], based on the input trajectories $r_j$. In essence, at iteration $t$ the algorithm identifies, for each trajectory $r_j$, the winning neuron as the one with the lowest Euclidean distance $|r_j - s_k|$. Then, the algorithm updates the weight vector of this winning neuron $k$ as $s_k(t+1)=s_k(t)+\alpha(t)\,(r_j-s_k(t))$, where $\alpha(t)$ is a scalar-valued adaptation gain that decreases with successive iterations. A similar update is performed for the weight vectors of the rest of the neurons $k' \neq k$, but in this case the adaptation gain $\alpha(t)$ is multiplied by a neighborhood function that decreases with the distance between neurons $k'$ and $k$. The process is repeated for a certain number of iterations.
(c) At the end of the process, all the input trajectories that have neuron $k$ as winning neuron form the $k$-th cluster. The number of trajectories in the $k$-th cluster is $N_k$, and the prototype trajectory of this cluster is the weight vector $s_k$.
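The two clustering procedures described above can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation evaluated in the paper: all function names and default parameters are hypothetical, the K-means loop assigns all J trajectories in every pass (the seeds trivially fall in their own clusters), and the SOM uses choices that the text leaves to [17], namely a one-dimensional chain of neurons, a Gaussian neighborhood function, linearly decaying gain and neighborhood width, and initialization from randomly chosen input trajectories.

```python
import math
import random

def flatten(points):
    """Turn [(x1, y1), ..., (xN, yN)] into the vector [x1, y1, ..., xN, yN]."""
    return [c for p in points for c in p]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(r, prototypes):
    """Index k of the prototype s_k closest to trajectory r."""
    return min(range(len(prototypes)), key=lambda k: sq_dist(r, prototypes[k]))

def kmeans(trajs, K, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = [list(t) for t in rng.sample(trajs, K)]      # step (a)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(r, centroids) for r in trajs]  # steps (b) and (c)
        if new_labels == labels:                             # clusters unchanged
            break
        labels = new_labels
        for k in range(K):
            members = [r for r, l in zip(trajs, labels) if l == k]
            if members:  # assumption: an emptied cluster keeps its old centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels                                 # step (d)

def som(trajs, K, iters=50, gain0=0.5, sigma0=1.0, seed=0):
    rng = random.Random(seed)
    weights = [list(t) for t in rng.sample(trajs, K)]        # step (a), assumed init
    for t in range(iters):
        gain = gain0 * (1 - t / iters)                       # decreasing adaptation gain
        sigma = max(sigma0 * (1 - t / iters), 1e-3)          # shrinking neighborhood
        for r in trajs:
            win = nearest(r, weights)                        # winning neuron
            for k in range(K):
                # Assumed Gaussian neighborhood on a 1-D chain; equals 1 for k == win.
                h = math.exp(-((k - win) ** 2) / (2 * sigma ** 2))
                weights[k] = [w + gain * h * (ri - w)
                              for w, ri in zip(weights[k], r)]
    labels = [nearest(r, weights) for r in trajs]            # step (c)
    return weights, labels
```

Both functions return the K prototype trajectories together with the cluster index of every input trajectory, so that the per-cluster counts N_k can be derived from the labels.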
As shown in Fig. 1, the prototype trajectories obtained as a result of the clustering will be stored in the database. In addition, two statistical indicators are stored for each cluster to assess how representative this cluster is:
• Percentage of hits ($A_k=N_k/J$): It is the percentage of input trajectories that belong to cluster $k$. The prototype trajectories of clusters with a high value of $A_k$ will be more frequent and more representative of the scenario.
• Average squared Euclidean distance of the trajectories in the $k$-th cluster ($E_k$): It is a metric that captures the degree of similarity between the trajectories of the cluster and its prototype trajectory $s_k$. A high value of $E_k$ reflects a higher dispersion in the cluster, meaning that the prototype trajectory is less representative of the clustered trajectories. It is defined as:
$$E_k = \frac{1}{N_k} \sum_{j \in \text{cluster } k} |r_j - s_k|^2$$
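As a minimal sketch of how the two indicators could be computed from a clustering result, the following Python function takes the input trajectories, the prototype trajectories and the cluster index of each trajectory. The function name and argument layout are hypothetical, and E_k is taken, as the text describes it, to be the mean of the squared Euclidean distances between the trajectories of cluster k and its prototype. A_k is returned as a fraction between 0 and 1; multiply by 100 for a percentage.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def cluster_indicators(trajs, prototypes, labels):
    """Return one (A_k, E_k) pair per cluster.

    labels[j] is the index of the cluster that trajectory trajs[j] belongs to.
    """
    J = len(trajs)
    indicators = []
    for k, s_k in enumerate(prototypes):
        members = [r for r, l in zip(trajs, labels) if l == k]
        n_k = len(members)
        a_k = n_k / J                                   # fraction of hits
        # Assumption: E_k is left undefined (None) for an empty cluster.
        e_k = sum(sq_dist(r, s_k) for r in members) / n_k if n_k else None
        indicators.append((a_k, e_k))
    return indicators
```

A prototype with a large A_k and a small E_k would then be one that is both frequent and tightly followed by the trajectories in its cluster.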

Exploitation of mobility patterns
It is envisaged that the identification of prototype trajectories as explained in the previous section can be applied to different 5G-SON functions.
For example, prototype trajectories can be used in the context of self-planning to decide appropriate cell locations and antenna settings. For instance, if there is a well identified representative trajectory, a sector of a cell site can be pointed in the direction of this trajectory. Typically, this can be the case of a cell site providing coverage over a main street. Although one could argue that a radio engineer could easily identify such a situation and take such a common-sense decision, the interest of the proposed use case lies in the fact that SON involves automation. That is, self-planning and self-configuration mean the capability of the system to automatically identify the trajectories and propose adequate values for the parameters of a new cell.
Similarly, the learnt mobility patterns can also be applied in the self-optimization of several functions such as handover, load balancing or admission control. For example, by identifying the trajectory of a UE or group of UEs in relation to a known prototype trajectory, it is possible to anticipate the cell that the UEs are heading to and configure these functions so as to avoid call droppings and overload situations. In the following, we focus on proposing a methodology to predict the future positions of a certain UE based on analyzing the actual locations reported by the UE in relation to the learnt prototype trajectories.

Mobility prediction
The proposed approach is illustrated in Fig. 2 and is executed on an individual UE basis. The criterion to decide which specific UEs are analyzed is out of the scope of this paper and will depend on the specific self-optimization function under consideration. For example, the optimization of load balancing may predict the trajectory of UEs that demand a high bit rate in order to anticipate the arrival of these UEs at a cell and take the appropriate actions to ensure that there are sufficient resources for them in the cell. Similarly, it is also possible to predict the trajectory of high priority UEs to ensure that they will not experience problems in handovers, etc.
The process of Fig. 2 starts from the measurement reports provided by the UE whose trajectory is being predicted. First, a pre-processing stage is carried out to extract the geolocation information and build the trajectory $u$ that is currently being observed for this UE. The trajectory $u$ is a vector of dimension $C=2M$ composed of the concatenation of $M$ pairs of coordinates followed by the UE at consecutive time instants, $u=[x(t_1),y(t_1),\dots,x(t_M),y(t_M)]$. Without loss of generality, let us consider that the dimension of $u$ does not exceed the number of elements of the prototype trajectories $s_k$ (i.e., $C \leq B$). This reflects that, in case the UE is following a prototype trajectory, the current location of the UE is somewhere within the prototype trajectory.
The mobility prediction process of Fig. 2 intends to determine the likelihood that the UE is following one of the learnt prototype trajectories. This is done by assessing the similarity between the trajectory $u$ followed by the UE and the prototype trajectories $s_k$ according to the Euclidean distance. Given that $C \leq B$, all the possible portions of $C$ consecutive elements of the vectors $s_k$ ($k=1,\dots,K$) need to be considered when assessing this similarity. The $\alpha$-th portion of $s_k$ is then defined as the vector $[s_k(1+\alpha),\dots,s_k(C+\alpha)]$ with $\alpha=0,\dots,B-C$, where $s_k(i)$ denotes the $i$-th component of $s_k$.
Then, the squared Euclidean distance between the $\alpha$-th portion of $s_k$ and trajectory $u$ is computed as:
$$\sum_{i=1}^{C} \left(s_k(i+\alpha) - u(i)\right)^2$$
where $u(i)$ denotes the $i$-th component of $u$. Then, the similarity between $u$ and $s_k$ is computed as the minimum of this distance over the possible portions of the prototype trajectory $s_k$, that is:
$$m_k = \min_{\alpha \in \{0,\dots,B-C\}} \sum_{i=1}^{C} \left(s_k(i+\alpha) - u(i)\right)^2$$
A low value of $m_k$ indicates that the trajectory $u$ is very similar to some portion of vector $s_k$. Then, the likelihood $L_k$ that the UE is following the prototype trajectory $s_k$ is defined here as:
A high value of $L_k$ reflects that the UE is following a trajectory very similar to a portion of $s_k$. Therefore, $s_k$ provides information about the positions that the UE is likely to follow in the future.
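The matching step described above can be sketched in Python as follows. This is an illustrative sketch under assumptions rather than the paper's implementation: the function names are hypothetical, the distance minimized is the squared Euclidean distance, and the likelihood computation is deliberately left out. The `step` parameter is also an addition of this sketch: `step=1` scans every offset α = 0, ..., B-C as in the text, whereas `step=2` restricts the scan to even offsets so that x components are only compared with x components.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def best_portion(u, s_k, step=1):
    """Return (m_k, alpha): the lowest distance between u and any portion
    of C consecutive elements of s_k, and the offset where it is found."""
    C, B = len(u), len(s_k)
    if C > B:
        raise ValueError("observed trajectory is longer than the prototype")
    return min((sq_dist(u, s_k[a:a + C]), a) for a in range(0, B - C + 1, step))

def predict(u, prototypes, step=1):
    """Pick the prototype with the lowest m_k and return its index, m_k and
    the part of the prototype lying beyond the matched portion, which is the
    candidate for the future positions of the UE."""
    matches = [best_portion(u, s_k, step) for s_k in prototypes]
    k = min(range(len(prototypes)), key=lambda i: matches[i][0])
    m_k, alpha = matches[k]
    return k, m_k, prototypes[k][alpha + len(u):]
```

For instance, with a prototype `[0, 0, 1, 0, 2, 0, 3, 0]` and an observed trajectory `[1, 0, 2, 0]`, the best portion starts at offset 2 with distance 0, and the remaining elements `[3, 0]` are the predicted next position.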

Results
This section provides some results to illustrate the performance of the proposed approach. The considered scenario is shown in Fig. 3 and represents an urban area at the intersection between two main streets. The mobility of multiple UEs has been considered, including a wide variety of situations, as shown in Fig. 3a. For example, some UEs move straight along a street, while others move straight and then turn right, turn left or move back.
For each kind of trajectory, 100 realizations have been generated by considering UE trajectories that are not perfectly straight but include lateral movements, simulating e.g. cars changing lanes on the road. It is assumed that the distance between two consecutive positions of the trajectory is a random value (simulating that the user speed may be variable). Moreover, 100 realizations of users that move a short distance and stop at a particular position (represented by black arrows in Fig. 3a) have also been generated. Finally, a group of 100 static users (represented by black dots in Fig. 3a) has been placed randomly in each of the four corners of the scenario.
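To make the kind of synthetic input described above more concrete, the following Python sketch generates realizations of a nominally straight trajectory along the x axis with random spacing between consecutive positions and random lateral offsets. It is only a rough illustration: the function name, the uniform distributions and all numerical defaults are assumptions of this sketch, since the text does not specify the distributions or parameter values used in the evaluation.

```python
import random

def straight_trajectory(n_points, rng, mean_step=10.0, step_jitter=0.5, lateral=2.0):
    """Return n_points (x, y) positions along a roughly straight path.

    The spacing between consecutive positions is random (variable speed) and
    each position has a random lateral offset (e.g. a lane change).
    All distributions and defaults are hypothetical.
    """
    x, points = 0.0, []
    for _ in range(n_points):
        y = rng.uniform(-lateral, lateral)          # lateral movement
        points.append((x, y))
        # Random distance to the next position, always strictly positive.
        x += mean_step * rng.uniform(1 - step_jitter, 1 + step_jitter)
    return points

# 100 realizations of the same kind of trajectory, as in the described setup.
rng = random.Random(1)
realizations = [straight_trajectory(20, rng) for _ in range(100)]
```

Turning, stopping and static users could be generated along the same lines by changing the heading, truncating the movement or fixing the position.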

Conclusions
The prototype trajectories learnt for a wireless network present applicability in different areas, such as self-planning and self-optimization. In this respect, the paper has proposed a strategy for predicting the mobility of specific users based on the obtained prototype trajectories. Results reflect that both K-means and SOM techniques are able to properly identify the different trajectories existing in the considered scenario.

Fig. 2. Exploitation of learnt patterns for predicting the trajectory of a UE