Optimal Link Deployment for Minimizing Average Path Length in Chain Networks

This study considers chain-topology networks, which have certain inherent limitations, and presents an optimization model that augments the network by adding a single new link, with the objective of minimizing Average Path Length (APL). We built a mathematical model for APL and formulated the problem as an Integer Programming problem. We then solved the problem experimentally by brute force, trying all possible topologies, and found the optimal solutions that minimize APL for certain network sizes up to 1000 nodes. Next, we derived an analytical solution by applying linear regression to the experimental results. We show that the proposed optimization model decreases APL on a chain-topology network at a gradually increasing rate, from 24.81% to an asymptotic value of 41.4% as the network grows. Additionally, we found that the normalized length of the optimal solutions decreases logarithmically from 100% to 58.6048% as the network size gets larger.


Introduction
Many industries today prefer a chain topology for networks whose nodes are connected to each other consecutively along long and narrow deployment areas such as railways [1], highways [2] and underground mines [3,4], as well as for some special types of wireless sensor and mesh networks [3][4][5][6][7] and backbones of telecommunication systems [8]. Similarly, it is common practice to connect (Wi-Fi) routers in a daisy-chain topology to provide internet access on each floor of towers or high buildings.
In this paper, we examine APL for chain networks and propose an optimization model based on deploying an additional link in the network, with the objective of minimizing APL. We derive analytical formulations for APL before and after the optimization process, and obtain numerical results that agree precisely with the analytical results.
Suppose we have a chain-topology network with n nodes, containing bidirectional links between consecutive nodes. This network can be represented as a path graph, P_n, with undirected edges, as depicted in Figure 1.
Fig. 1: A path graph representing a chain-topology network with n nodes
Suppose we aim to augment this network by adding a new performance-enhancing link between a certain pair of nodes, as illustrated in Figure 2. Note that the augmentation (i.e. adding a new link) is intentionally confined to a single new link in order to keep the optimization cost minimal, and that the implementation cost of such a link can be assumed fixed regardless of the distance between the connected nodes, which is true especially for leased lines obtained from ISPs. We are now ready to state our optimization problem.
Main Problem: Which nodes should be connected to minimize the average path length (APL) of the network?
Not only does the proposed optimization model minimize APL, but it also improves the robustness of chain networks by providing alternative routes, and reduces the cost of packet transmissions.

Chain Networks
Given the side effects of unbalanced energy consumption at nodes in chain networks used in underground mines or on trains, the studies in [1,3,4,6] proposed different protocols or node deployment strategies that aim to balance energy consumption across nodes in order to increase network lifetime.
Agbinya [2] discussed a specific application of chain networks on highways and addressed certain characteristics of the network such as interference level, coverage area and path loss. Zhou et al. [5], in a recent study, considered Chain-typed Wireless Sensor Networks (CWSN) deployed in coal mines and proposed a source-aware redundant packet forwarding scheme for emergency information delivery in CWSN.
Leu and Huang [7] proposed a mathematical model that calculates the maximum throughput of a Wireless Mesh Network in chain-topology, dealing with signal interference, hidden nodes and STDMA time slots among nodes.
Flammini et al. [8] considered the construction of wireless ATM layouts for a chain of base stations, and showed that the problem studied was NP-complete for special instances, and provided optimal solutions for certain cases.

Average Path Length (APL)
Several researchers have derived analytical formulations of APL for different types of networks. For instance, Kleinrock and Silvester [21] considered random graphs; Fronczak et al. [18] and Guo et al. [14] studied a large class of uncorrelated random networks with hidden variables; Zhang et al. [17] examined Apollonian networks; Peng [16] dealt with the Sierpinski pentagon; Gulyás et al. [13] focused on networks with given size and density; Chen et al. [11] investigated the Barabási-Albert scale-free model; Zhi-guang et al. [10] discussed belt-type networks; and Gao et al. [22] analysed the Sierpinski gasket in a recent article.
In the field of logic design, Butler et al. [20] studied the APL of binary decision diagrams by deriving the APL for various functions, and showed that the APL for benchmark functions is typically much smaller than for random functions.
Mao and Zhang [19] considered the problem of computing APL for large scale-free networks, and presented a dynamic programming model to solve the load-balancing problem for coarse-grained parallelization. Yen et al. [12] presented an efficient method for updating the closeness centrality of each vertex and the APL of a network whose edges change dynamically, as in the case of social networks. In a recent study, Reppas et al. [15] introduced rewiring rules to tune the APL of a network while keeping the degree and clustering coefficient distributions unchanged.
To the best of our knowledge, ours is the first study to propose an optimization model that aims to minimize APL for chain networks by optimal deployment of an incremental link.

Pure Path
The average path length (APL) of a network is an important parameter indicating the efficiency of information transmission on the network. It is calculated by finding the shortest path between every pair of nodes, adding up the path lengths, and then dividing by the total number of pairs.
To find the mathematical expression for the APL of a chain network, let P_n be a path graph with n vertices indexed in sequence from 1 to n, i.e. v_1, v_2, ..., v_n, as depicted in Figure 1. The shortest path between a pair of nodes on a path graph is the unique subpath between them, and the length of such a subpath equals the number of edges on it. Thus, since the vertices are indexed in order, the length of the subpath (PL) between vertices v_j and v_k on P_n can be stated rigorously as follows.
PL(v_j, v_k) = |k − j|
Then, Equation 1 gives the sum of path lengths for all (unordered) pairs. After rewriting Equation 1 and dividing by the number of all pairs, which is n(n−1)/2, we find the APL for the path graph P_n as given in Equation 2.
According to Equation 2, the APL of a chain-topology network grows linearly with the length of the chain (i.e. the number of nodes), that is, O(n), and is almost equal to one third of the network diameter.
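The closed form can be worked out directly from the definition of APL. The following is a sketch of that derivation, taking the length of the subpath between v_j and v_k to be |k − j| for vertices indexed from 1 to n. The symbol S_n for the sum over all unordered pairs and the index d = k − j are introduced here only for illustration.

```latex
\[
\begin{aligned}
S_n &= \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} (k - j)
     = \sum_{d=1}^{n-1} d\,(n - d)
     = \frac{n(n-1)(n+1)}{6}, \\[4pt]
\mathrm{APL}(P_n) &= \frac{S_n}{n(n-1)/2} = \frac{n+1}{3}.
\end{aligned}
\]
```

For example, n = 4 gives S_4 = 10 over 6 pairs, so the APL is 5/3. Since the diameter of P_n is n − 1, the ratio of APL to diameter is (n+1)/(3(n−1)), which tends to one third as n grows.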

Path with an Additional Edge
Let P'_n be the graph obtained by adding a new edge (v_x, v_y) to the path graph P_n, as depicted in Figure 2. Rigorously, P'_n = P_n ∪ (v_x, v_y). To build a general mathematical expression for the APL of P'_n, we first studied small networks (e.g. around 10 nodes), calculated APL manually, and produced a rough formula. We then extended the work to larger networks, repeatedly checking the accuracy of the formula and revising it when needed, until it consistently gave correct values for all networks investigated. This process yielded Equation 3. We also verified its correctness via experiments, as described in the following sections.
where h = y − x, t = n − h + 1, and
Thus, we obtained analytical expressions for the APL before and after attaching the additional link to a path graph, as in Equations 2 and 3, which allows us to formulate our problem as an Integer Programming (IP) problem as follows:
minimize APL(P'_n)
subject to n, x, y are integers
x < y
where APL(P'_n) is given in Equation 3.
It is known that IP is NP-hard [23], which implies that no polynomial-time algorithm is known for IP problems in general. Yet, in the following sections, we first solve certain instances of the problem above experimentally, and then construct a general analytical solution for any network size n by means of linear regression.

Numerical Solutions by Experiment
To find optimal solutions for certain cases of the problem introduced, we prepared the experimental setup shown as pseudo-code in Figure 3. In the experiment, we increased the network size from 3 nodes to 1000 nodes and, as a brute-force approach, varied the attachment points (i.e. vertices) of the additional link over all possible cases. At each network size, we first defined the network topology by entering the adjacency list of the network for every possible deployment of the additional link, varying x and y, which represent the locations of vertices v_x and v_y. Then, for each topology, we found the shortest paths for all pairs by implementing Dijkstra's well-known shortest path algorithm [24,23]. Since we used Dijkstra's algorithm to determine the shortest path for one pair of nodes at a time, we ran it iteratively for all pairs in the graph (i.e. topology). After calculating the lengths (i.e. hop counts) of the shortest paths for all pairs, we took their average and thus found the APL. Finally, we identified the minimum APL among all APLs obtained for the varying locations of v_x and v_y. This experimental process was repeated for network sizes of 3, 5, 10, 20, 50, 100, 200, 500 and 1000 nodes.
Table 1 contains some of the numerical results acquired in the experiments, including the optimal solutions that minimize APL as well as the results for ring topologies (i.e. the cases in which the first and the last nodes of the path are connected to each other by the additional link). The first column of the table gives the network size in number of nodes, while the second and third columns contain the APL for the pure path (P_n) and the ring topology (i.e. P_n ∪ (v_1, v_n)), respectively. The fourth column gives the minimum APL, which is obtained when the additional link is placed optimally (i.e. P_n ∪ (v_{x_opt}, v_{y_opt})). The last column shows the optimal values of (v_x, v_y) that minimize APL.
Verification of the mathematical model: One might doubt the accuracy of the mathematical model presented in Section 4, i.e. Equation 3. To verify it, we first computed APL values using Equation 3 for all possible values of the variables up to a network size of 1000 nodes, and then searched for the instances giving the minimum APL for each network size. Afterwards, we compared the minimum APL values computed with Equation 3 against the APL values obtained from the experimental calculations for the network sizes listed in Table 1. We observed that the mathematical model and the experimental calculations give precisely the same APL values, which shows the consistency between the two approaches.
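The brute-force procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the experiments: because every link has unit weight, it uses breadth-first search in place of Dijkstra's algorithm (both give the same hop counts on unweighted graphs), and the function names `build_chain_with_link`, `total_path_length`, `apl` and `best_link` are hypothetical. Ties between equally good links are broken by taking the first pair found in lexicographic order.

```python
from collections import deque
from itertools import combinations


def build_chain_with_link(n, x, y):
    """Adjacency sets for the path v_1..v_n plus one extra edge (v_x, v_y)."""
    adj = {v: set() for v in range(1, n + 1)}
    for v in range(1, n):
        adj[v].add(v + 1)
        adj[v + 1].add(v)
    adj[x].add(y)
    adj[y].add(x)
    return adj


def total_path_length(adj):
    """Sum of shortest-path hop counts over all unordered pairs of nodes."""
    total = 0
    for src in adj:
        # BFS from src gives hop counts to every other node.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total // 2  # every unordered pair was counted twice


def apl(adj):
    """Average path length: total hop count divided by the number of pairs."""
    n = len(adj)
    return total_path_length(adj) / (n * (n - 1) / 2)


def best_link(n):
    """Try every pair x < y and return (x, y, minimum APL)."""
    best = None
    for x, y in combinations(range(1, n + 1), 2):
        total = total_path_length(build_chain_with_link(n, x, y))
        if best is None or total < best[0]:
            best = (total, x, y)
    total, x, y = best
    return x, y, total / (n * (n - 1) / 2)
```

For instance, `best_link(5)` returns `(1, 5, 1.5)`, i.e. the ring topology. The search examines O(n^2) candidate links and runs n breadth-first searches for each, so this sketch is only practical for small n.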

Analytical Solution by Linear Regression
Table 1 contains numerical results of optimal solutions for certain network sizes. However, to make a comprehensive analysis that includes the asymptotic behaviour of the optimal solutions and other variables, we need to establish analytical relations between these variables. For this purpose, we applied linear regression, based on the least squares technique, to the numerical results at hand, and found the following relations.
x_opt = Round(0.207174 n − 0.0251311)    (5)
where Round(z) is a function that returns the nearest integer to z.
In fact, Equations 5 & 6 give precise answers to the main problem asked at the beginning of this paper. Equation 4, on the other hand, yields the exact APL when an optimal solution is applied.
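Equation 5 is straightforward to evaluate in code. The sketch below is illustrative only: `round_half_up` is a hypothetical helper used because Python's built-in `round` rounds exact halves to the nearest even integer, and `y_opt_mirrored` is not Equation 6. It is an assumption that the optimal link is placed symmetrically about the centre of the chain, which matches the optimal pairs quoted in the text for n = 3, 5 and 100.

```python
import math


def round_half_up(z):
    """Nearest integer to z, with exact halves rounded upwards."""
    return math.floor(z + 0.5)


def x_opt(n):
    """Left attachment point from the regression fit in Equation 5."""
    return round_half_up(0.207174 * n - 0.0251311)


def y_opt_mirrored(n):
    """ASSUMPTION: right attachment point as the mirror image of x_opt."""
    return n + 1 - x_opt(n)


for n in (3, 5, 100):
    print(n, x_opt(n), y_opt_mirrored(n))
```

For n = 100 this gives x_opt = 21 and a mirrored point of 80.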

Average Path Length (APL)
Figure 5 depicts the APL for both P_n and P'_n when the optimal values of (v_x, v_y) are applied, as the network size varies from 3 to 1000 nodes. As seen in the figure, APL increases linearly in both cases as the network grows. However, P_n has a higher slope than P'_n, which means that adding the extra edge reduces the APL of the network. The figure also includes the model fit (i.e. the regression line) obtained by linear regression. The goodness of fit can be evaluated visually in Figure 5, as the fitted line and the numerical data coincide.

Improvement
Figure 6 exhibits the improvement, i.e. the rate of decrease in APL, when an additional edge is placed in the network at the optimal position. As the figure shows, the improvement rate starts at around 24.81% when n = 3 and grows slowly at first, followed by a period of moderate growth, and then returns to slow growth, asymptotically approaching 41.4%, which is consistent with the analytical analysis below.
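As a small worked example of the improvement rate, the sketch below assumes it is defined as the relative decrease in APL with respect to the pure path. The function name `improvement` is hypothetical, and the two APL values are computed by hand for n = 5 (pure path versus the ring obtained by linking v_1 and v_5).

```python
def improvement(apl_before, apl_after):
    """Relative decrease in APL, as a fraction of the original APL."""
    if apl_before <= 0:
        raise ValueError("apl_before must be positive")
    return (apl_before - apl_after) / apl_before


# n = 5: the pure path has APL (5 + 1) / 3 = 2.0, and the ring obtained by
# linking v_1 and v_5 has APL 1.5.
print(f"{improvement(2.0, 1.5):.2%}")  # prints 25.00%
```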

Optimal Solutions
Equations 5 & 6 are the optimal solutions in analytic form, while the values in the fifth column of Table 1 are numerical solutions for certain network sizes. Thanks to Equations 5 & 6, one can readily determine the optimal values of (v_x, v_y) for any network size. It is interesting to observe that the optimal solutions for n = 3 and n = 5 are the two end points of the path (i.e. (v_x, v_y) equals (v_1, v_3) and (v_1, v_5), respectively). As the network size grows, the optimal values of v_x and v_y slide gradually towards the center of the network. This observation motivated us to investigate the normalized length between the two end points (i.e. v_x and v_y) of the optimal solutions in the next part.
Another observation is that there is only one (i.e. unique) optimal solution when the network size n is odd, whereas several optimal solutions may emerge when n is even, as can be observed in the fifth column of Table 1. We found that the alternative optimal solutions for the same network yield isomorphic graphs when they are applied.

Normalized Length of Optimal Solutions (NLOS)
NLOS represents the normalized distance between the two end points (i.e. v_{x_opt} and v_{y_opt}) of the optimal link deployments that minimize APL. The normalization is performed with respect to network size. Figure 6 includes NLOS as the network size grows logarithmically. It can be deduced from the figure that NLOS decreases logarithmically from 100% to around 58.6%, which is consistent with the analytical analysis below. This means that the optimal solutions lie at around the two end points of the chain when the network is small, whereas the attachment points of the optimal solutions move away from these end points as the network grows.

Conclusion
Chain-topology networks perform poorly in certain performance metrics such as throughput, robustness and energy efficiency in data transmission [1][2][3][4][5][6][7]. This is mostly due to the fact that the average path length (APL) in a chain topology is extremely high, almost one third of the network size, as we showed.
In this study, we aimed to compensate for this deficiency by presenting an optimization model that considers incremental link deployment, with the objective of minimizing APL on a chain network. For this purpose, we first derived a mathematical expression for the objective and formulated the problem as an Integer Programming (IP) problem. Then, we prepared an experimental setup to determine the APLs of all possible topologies generated by placing an additional link at varying locations on a chain-topology network. Thus, we found optimal solutions that minimize APL for specific network sizes up to 1000 nodes, and also verified the accuracy of our mathematical model. In the experiments, for each network size, we ran Dijkstra's shortest path algorithm for all pairs and averaged the path lengths, in terms of hop count, to calculate the corresponding APL values. Afterwards, we derived an analytical solution by applying linear regression to the experimental data, which allowed us to see the asymptotic behaviour of the solutions.
Our analyses showed that the proposed optimization model reduces APL on chain-topology networks by between 24.81% and 41.4%, with the reduction gradually increasing as the network grows. Moreover, we found that the normalized length of the additional link for the optimal solution asymptotically approaches 58.6% of the network size.
Beyond the contribution of an optimally placed additional link to minimizing APL, further research is required to improve other performance characteristics of chain-topology networks, such as load balancing.

Fig. 2: Adding a new enhancement link (i.e. edge) connecting v_x and v_y

Fig. 3: Experimental setup for determining APL
for n ← 3 to 1000 do                      // Varying network size
    Enter adjacency list of the graph     // Defining network topology for all cases
    call APL(Graph)
    Find the minimum APL among all calculated values at each step of n

function APL(G)                           // Determines APL for G
    c ← 0                                 // Counter
    APL ← 0
    for all pairs (vi, vj) in G do
        SP ← Dijkstra(vi, vj)             // Runs Dijkstra; returns the shortest path (SP) between vi and vj
        APL ← (APL * c + Length(SP)) / (c + 1)
        c ← c + 1
    return APL

Figure 4 shows the experimental results as a 3-dimensional color map for a network size of 100 nodes. As can be seen in the figure, APL attains its minimum value (i.e. dark blue) at around x = 21 and y = 80, or equivalently, vice versa. Note that the red area running from the bottom-left corner to the top-right corner represents the path topology, whereas the points at the top-left and bottom-right corners produce the ring topology.

Fig. 5: Average Path Length (APL) for different topologies as network size grows

Table 1: Experimental Results Obtained by Brute Force Computation