Simplifying Calculation of Graph Similarity Through Matrices

A method is proposed to simplify the calculation involved in measuring graph similarity. It avoids many redundant operations so that the initial tickets matrix can be obtained quickly. In this proposal, an element of the initial tickets matrix is set to 1 the first time the corresponding element of the paths matrix becomes positive. The proposed method calculates the values of the initial tickets matrix from the positive values of the paths matrix in a forward and backward way. An example is provided to illustrate that the method is feasible and effective.


Introduction
Complex objects with structural properties can be naturally modeled as graphs in chemistry, bioinformatics, data mining, social networks, image processing and other fields [1][2][3][4][5][6]. For example, scientific citations can be represented as a graph with the publications as vertices and the citation links as edges. As the amount of graph data grows, how to characterize the difference between graphs becomes an important problem [7]. A graph similarity method that solves this problem efficiently can be applied to classify graph data.
To measure graph similarity, we usually calculate the number of common paths of the graphs, as in the random walk graph kernel (abbreviated as RWGK) [8][9], the shortest path graph kernel (abbreviated as SPGK) [10], and the function of the common paths or tickets [4]. The function of the common paths or tickets, in the spirit of neighborhood counting and all common subsequences in measuring sequence similarity [11][12][13], measures graph similarity by calculating all common paths or tickets matrices between two graphs [14][15][16]. Compared with RWGK and SPGK, this method considers more information about the vertices and edges of the graph. It performs well in accuracy and running time, it is not restrictive, and it can be applied to larger graph datasets. It also avoids the tottering phenomenon [17].

In the process of calculating the tickets matrices, the initial tickets matrix is derived from the all paths matrix, which is calculated by the Floyd-Warshall Algorithm [18][19][20]. With the method in [4], once an element of the paths matrix is positive, the element in the corresponding position of the initial tickets matrix is always 1, regardless of the subsequent matrix multiplications and additions. However, that method still continues to calculate the element of the paths matrix, which generates many redundant calculations.
To address this problem, we propose a simplified method to calculate the initial tickets matrix. The first time an element of the paths matrix of a graph becomes positive, this method sets the element in the corresponding position of the initial tickets matrix to 1 instead of continuing the calculation as in [4], and it sets the values of the initial tickets matrix from the positive values of the paths matrix in a forward and backward way. This method avoids many redundant calculations and can be applied to large-scale complex graph data. An example is provided to illustrate its feasibility and efficiency.
This paper is organized as follows: Section 2 describes the related concepts and notations. Section 3 proposes a simplified calculation method to obtain the initial tickets matrix. Section 4 presents the analysis of the proposed method. Section 5 concludes the paper with a summary and an outlook on future work.

Preliminaries
A graph is usually represented as $G = (V, E)$, where $V = \{v_1, \ldots, v_n\}$ is the vertex set and $E \subseteq V \times V$ is the edge set. If $(v_i, v_j) \in E$, then there exists a path between $v_i$ and $v_j$. If two paths have equal length and the same vertices and edges, we call them common paths. The more common paths two graphs share, the more similar the two graphs are [4]. Two graphs are perfectly similar when they share all paths, and perfectly dissimilar when they share no paths. A graph can be denoted by its adjacency matrix $A$, defined as

$$a(i, j) = \begin{cases} 1, & \text{if } (v_i, v_j) \in E \\ 0, & \text{otherwise} \end{cases}$$

Here $a(i, j) = 1$ denotes that there exists a distinct 1-long path from $v_i$ to $v_j$, and $\sum_{ij} a(i, j)$ denotes the number of all distinct 1-long paths in the graph.
For three matrices $A$, $B$, $C$ with $n$ rows and $n$ columns, if $C = A \times B$, then $c(i, j)$, the element of $C$ in the $i$-th row and $j$-th column, can be calculated as

$$c(i, j) = \sum_{k=1}^{n} a(i, k)\, b(k, j)$$

Accordingly, $a_2(i, j)$ counts the 2-long paths from $i$ to $j$ passing through a vertex $k$. It is calculated by multiplying $A_1$ by $A_1$: $a_2(i, j) = \sum_k a_1(i, k)\, a_1(k, j)$. In other words, a 2-long path from $i$ to $j$ consists of a 1-long path from $i$ to some $k$ and a 1-long path from that $k$ to $j$. Such a 2-long path exists if and only if both $a(i, k)$ and $a(k, j)$ are positive, i.e. $a(i, k) > 0$ and $a(k, j) > 0$. The elements of the matrix $A_2$ ($A_2 = A_1 \times A_1$) are the numbers of 2-long paths in the graph, so the number of all possible 2-long paths is $\sum_{ij} a_2(i, j)$.

In general, $a_k(i, j)$ denotes the number of distinct $k$-long paths from $v_i$ to $v_j$, and $\sum_{ij} a_k(i, j)$ denotes the total number of $k$-long paths in the graph, where $1 \le k \le n$. For example, $A_n = A_1 \times A_{n-1}$, and the elements of $A_n$ are the numbers of possible $n$-long paths in the graph. If $C = A + B$, then $c(i, j) = a(i, j) + b(i, j)$, so the number of distinct paths from $i$ to $j$ can be calculated as follows:

$$P_n = \sum_{k=1}^{n} A_k, \qquad p_n(i, j) = \sum_{k=1}^{n} a_k(i, j) \tag{2}$$

Here $p_n(i, j) \ge 0$ denotes the number of distinct paths from $i$ to $j$ that are at most $n$-long, and $\sum_{ij} p_n(i, j)$ denotes the total number of paths in the graph.
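The path-counting construction above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `mat_mul`, `mat_add` and `all_paths_matrix` are hypothetical, and the 5-vertex graph `ADJ` is an invented example, not the graph of Fig. 1.

```python
def mat_mul(a, b):
    """Return the product of two n x n matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(a, b):
    """Return the element-wise sum of two n x n matrices."""
    n = len(a)
    return [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]


def all_paths_matrix(adj):
    """Sum the path-count matrices A_1 + A_2 + ... + A_n (Eq. (2))."""
    n = len(adj)
    power = [row[:] for row in adj]   # current A_k, starting with A_1
    total = [row[:] for row in adj]   # running sum of A_1 .. A_k
    for _ in range(2, n + 1):
        power = mat_mul(adj, power)   # A_k = A_1 x A_(k-1)
        total = mat_add(total, power)
    return total


# Hypothetical directed acyclic graph (0-based vertices) with edges
# 0->1, 1->4, 2->0, 2->3, 3->1, 3->4.
ADJ = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]

P_N = all_paths_matrix(ADJ)
```

Each of the $n - 1$ matrix products costs $O(n^3)$ operations, which is where the $O(n^4)$ cost of this direct calculation comes from.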
Measuring paths is usually restrictive, so we measure tickets instead of paths. A ticket is a contracted path obtained by deleting any number of nodes from a path, that is, it is obtained from the graph by deleting and contracting isolated edges [4][21]. If the length of a path between $v_i$ and $v_j$ is more than 1, then there exists a 1-long ticket $v_i v_j$. The 1-long tickets matrix can therefore be calculated from the all paths matrix as follows:

$$t_1(i, j) = \begin{cases} 1, & \text{if } p_n(i, j) > 0 \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

By Eq. (3), $T_1$ is determined by $P_n$. We call the matrix $T_1$ the initial tickets matrix. The method of calculating all common tickets is similar to the method of calculating all common paths. The more common tickets two graphs share, the more similar the graphs are [4].
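As a small illustration of how the initial tickets matrix follows from the all paths matrix, the sketch below assumes that Eq. (3) maps every positive path count to 1 and every zero to 0, as the surrounding text indicates. The function name `initial_tickets` and the matrix `P_N` are hypothetical.

```python
def initial_tickets(paths):
    """Map each positive path count to 1 and each zero to 0."""
    return [[1 if count > 0 else 0 for count in row] for row in paths]


# Hypothetical all paths matrix of a 5-vertex directed acyclic graph.
P_N = [
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [1, 2, 0, 1, 3],
    [0, 1, 0, 0, 2],
    [0, 0, 0, 0, 0],
]

T_1 = initial_tickets(P_N)
```

Only the positions of the positive entries of `P_N` matter here, not their magnitudes, which is the observation the simplified method builds on.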
In this paper, the graph is directed and acyclic, and the length of every path we consider is finite. If a graph contains a cycle, the length of a path through the cycle is unbounded, so such a graph cannot be used to measure all common paths or tickets.
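The acyclicity assumption can be checked before any paths are counted. The sketch below uses Kahn's topological-sort procedure, which is a standard technique and not part of the method described in this paper; the function name `is_acyclic` and both example matrices are hypothetical.

```python
from collections import deque


def is_acyclic(adj):
    """Return True if the directed graph given by adjacency matrix adj has no cycle."""
    n = len(adj)
    # indegree[j] counts the edges that point to vertex j.
    indegree = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    ready = deque(v for v in range(n) if indegree[v] == 0)
    removed = 0
    while ready:
        v = ready.popleft()
        removed += 1
        for w in range(n):
            if adj[v][w]:
                indegree[w] -= 1
                if indegree[w] == 0:
                    ready.append(w)
    # Every vertex can be removed exactly when the graph is acyclic.
    return removed == n


# Hypothetical examples: a 5-vertex acyclic graph and a 2-vertex cycle.
DAG = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
CYCLE = [
    [0, 1],
    [1, 0],
]
```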

A Simplified Calculation Method to Calculate Tickets Matrix
This section proposes a simplified calculation method for measuring graph similarity, with the aim of obtaining the initial tickets matrix quickly. While the initial tickets matrix is being calculated, the first time an element of the paths matrix becomes positive, this method sets the element in the corresponding position of the initial tickets matrix to 1 and does not calculate it again. This avoids redundant matrix multiplications and additions.

To obtain the initial tickets matrix based on Eq. (2) and the Floyd-Warshall Algorithm in [19], we need to calculate the all paths matrix $P_n$. When $P_n$ is known, we can obtain the initial tickets matrix $T_1$ by using Eq. (3). We note that an element of the initial tickets matrix is always 1 when the element in the corresponding position of the paths matrix is positive. Once an element of the paths matrix is positive, it stays positive regardless of the subsequent matrix multiplications and additions at this position. We can therefore conclude that if an element of the paths matrix becomes positive for the first time, the corresponding element of the initial tickets matrix must be equal to 1, and the corresponding position of the paths matrix does not need to be calculated again. The aim of the method in this paper is to remove these redundant calculations once an element of the paths matrix is positive.

An adjacency matrix of the graph $G$ is shown in Fig. 1, where $a(i, j) = 1$ denotes that there is a distinct 1-long path from $v_i$ to $v_j$, and $\sum_{ij} a(i, j)$ denotes the number of all distinct 1-long paths in graph $G$.

Fig. 1. The directed and acyclic graph $G$ and its adjacency matrix $A_1$.

Simplified Calculation When 1-long Paths Are Few
When the number of 1-long paths in the adjacency matrix of the graph is well below $n^2$, where $n^2$ is the number of elements of the adjacency matrix, we calculate the initial tickets matrix $T_1$ from the adjacency matrix $A_1$ as follows: $t_1(i, j) = a_1(i, j)$, where $1 \le i \le n$, $1 \le j \le n$. If $a_1(i, j) = 1$, there exists a 1-long path between $v_i$ and $v_j$. In other words, there exists a ticket $t_1(i, j) = 1$, and we do not calculate $t_1(i, j)$ again, which removes the redundant calculations in the $i$-th row and $j$-th column of $T_1$. The elements of $T_1$ that are still 0 are calculated in a backward step from $i$ and a forward step from $j$. Eq. (4) and Eq. (5) are applied only to elements that are still 0.

$$t_1(k, j) = \begin{cases} 1, & \text{if } a_1(k, i) = 1 \text{ and } a_1(k, j) = 0 \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

If $a_1(k, i) = 1$, where $1 \le k \le n$, $k \ne i$ and $k \ne j$, then $v_k v_i$ is a path in the graph, and $i$ is the backward vertex pointed to by $k$. Together with $a_1(i, j) = 1$, this implies that there exists a 1-long ticket between $v_k$ and $v_j$, so $t_1(k, j) = 1$.

$$t_1(i, k) = \begin{cases} 1, & \text{if } a_1(j, k) = 1 \text{ and } a_1(i, k) = 0 \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

If $a_1(j, k) = 1$, where $1 \le k \le n$, $k \ne i$ and $k \ne j$, then $v_j v_k$ is a path in the graph, and $j$ is the forward vertex pointing to $k$. Together with $a_1(i, j) = 1$, this implies that there exists a 1-long ticket between $v_i$ and $v_k$, so $t_1(i, k) = 1$.

By using Eq. (4) and Eq. (5), we can calculate the elements of the initial tickets matrix $T_1$ in accordance with the all paths matrix $P_n$. Because the size of the set $\{1 \le i \le n,\ 1 \le j \le n \mid a_1(i, j) = 1\}$ is small, the calculation of $t_1(i, j)$ is not complicated, which means that this method reduces the redundant matrix multiplications and additions in comparison with the Floyd-Warshall Algorithm.
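A single application of the backward and forward steps to one known 1-long path can be sketched as follows. This is only an illustration under assumptions: the function name `propagate_once` and the graph `ADJ` are hypothetical, and the test "is still 0" is made against the tickets matrix being built rather than against $A_1$, so that an element that has already been set is not written twice.

```python
def propagate_once(adj, tickets, i, j):
    """Apply the backward and forward step for a known 1-long path (i, j).

    Updates tickets in place and returns the list of newly set positions.
    """
    n = len(adj)
    new_tickets = []
    for k in range(n):
        if k == i or k == j:
            continue
        # Backward step: k -> i and i -> j give a ticket from k to j.
        if adj[k][i] == 1 and tickets[k][j] == 0:
            tickets[k][j] = 1
            new_tickets.append((k, j))
        # Forward step: i -> j and j -> k give a ticket from i to k.
        if adj[j][k] == 1 and tickets[i][k] == 0:
            tickets[i][k] = 1
            new_tickets.append((i, k))
    return new_tickets


# Hypothetical directed acyclic graph (0-based vertices).
ADJ = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]

tickets = [row[:] for row in ADJ]          # t_1(i, j) = a_1(i, j)
added = propagate_once(ADJ, tickets, 0, 1)  # use the 1-long path 0 -> 1
```

In this example the edge from vertex 2 to vertex 0 yields a new ticket from 2 to 1, and the edge from vertex 1 to vertex 4 yields a new ticket from 0 to 4.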

Simplified Calculation When 1-long Paths Are Many
When the number of 1-long paths in the adjacency matrix of the graph is close to $n^2$, where $n^2$ is the number of elements of the adjacency matrix, we again calculate the initial tickets matrix $T_1$ from the adjacency matrix $A_1$ as follows: $t_1(i, j) = a_1(i, j)$, where $1 \le i \le n$, $1 \le j \le n$. When an element of $T_1$ is 1, we do not calculate it again, which removes the redundant calculations at that position. If $a_1(i, j) = 0$, there is no 1-long path between $v_i$ and $v_j$, and we calculate the value $t_1(i, j)$ of the initial tickets matrix $T_1$ in a forward step from $i$ and a backward step from $j$.

If $a_1(i, k) = 1$, where $1 \le k \le n$, $k \ne i$ and $k \ne j$, then $v_i v_k$ is a path in the graph, and $i$ is the forward vertex pointing to $k$. The question of whether there exists a path between $v_i$ and $v_j$ becomes the question of whether there exists an $n$-long path between $v_k$ and $v_j$, where $n \ge 1$, so $t_1(i, j)$ is recursively equivalent to $a_1(k, j)$.

If $a_1(k, j) = 1$, where $1 \le k \le n$, $k \ne i$ and $k \ne j$, then $v_k v_j$ is a path in the graph, and $j$ is the backward vertex pointed to by $k$. The question of whether there exists a path between $v_i$ and $v_j$ becomes the question of whether there exists an $n$-long path between $v_i$ and $v_k$, where $n \ge 1$, so $t_1(i, j)$ is recursively equivalent to $a_1(i, k)$.

By using Eq. (6) and Eq. (7), we can calculate the elements of the initial tickets matrix $T_1$ in accordance with the all paths matrix $P_n$. Because the size of the set $\{1 \le i \le n,\ 1 \le j \le n \mid a_1(i, j) = 1\}$ is large, the size of the set $\{1 \le i \le n,\ 1 \le j \le n \mid a_1(i, j) = 0\}$ is small, so the calculation of $t_1(i, j)$ is again not complicated. In other words, this method also reduces the redundant matrix multiplications and additions compared with the Floyd-Warshall Algorithm.
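One possible reading of the recursive reduction in the forward direction is sketched below: a zero entry $(i, j)$ becomes a ticket if some vertex $k$ with $a_1(i, k) = 1$ either points directly to $j$ or can itself be reduced in the same way. The function name `ticket_exists` and the graph `ADJ` are hypothetical, and the sketch assumes an acyclic graph so that the recursion terminates.

```python
from functools import lru_cache


def ticket_exists(adj, i, j):
    """Return 1 if some path leads from i to j in an acyclic graph, else 0."""
    n = len(adj)

    @lru_cache(maxsize=None)
    def reach(src):
        # Base case: a 1-long path from src to j.
        if adj[src][j] == 1:
            return True
        # Recursive case: step forward to a successor k and ask again.
        return any(adj[src][k] == 1 and reach(k)
                   for k in range(n) if k != src and k != j)

    return 1 if reach(i) else 0


# Hypothetical directed acyclic graph (0-based vertices).
ADJ = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]

t_24 = ticket_exists(ADJ, 2, 4)  # 2 -> 0 -> 1 -> 4 exists
t_03 = ticket_exists(ADJ, 0, 3)  # vertex 3 cannot be reached from 0
```

The cache keeps each intermediate vertex from being expanded more than once per query.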
In the spirit of measuring sequence similarity by all common subsequences, a graph similarity measure has been proposed that calculates all common tickets. As with sequences, calculating all common tickets gives a more precise measure than calculating all common paths. To calculate all common tickets when measuring graph similarity, obtaining the initial tickets matrix quickly is a key problem. The method described above, which avoids many redundant matrix multiplications and additions while calculating the initial tickets matrix, addresses this problem effectively.

Method Analyses
This section gives two algorithms that simplify the calculation of the initial tickets matrix, and illustrates their feasibility with an example.

The Algorithm Analyses
Calculating the all paths matrix $P_n$ directly has complexity $O(n^4)$, but an algorithm based on the Floyd-Warshall algorithm has been designed with complexity $O(n^3)$ [19].

Floyd-Warshall Algorithm: Let $A_1 = a(i, j)$ be an $n \times n$ adjacency matrix of a graph.

    1. P ← A_1
    2. for k = 1 to n do
    3.     for i = 1 to n do
    4.         for j = 1 to n do
    5.             p(i, j) ← p(i, j) + p(i, k) · p(k, j)
    6.         end for
    7.     end for
    8. end for
    9. return P

If $A_1$ is the adjacency matrix of a graph $G$, then $P$ gives the number of all paths in graph $G$. If $A_1$ is the common adjacency matrix of two or more graphs, then $P$ gives the number of all common paths. The returned matrix $P$ is used to calculate the initial tickets matrix.
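The path-counting variant of the Floyd-Warshall Algorithm can be sketched as below. The update rule `p[i][j] += p[i][k] * p[k][j]` is an assumption about the elided line 5 of the listing, chosen because it counts the paths that pass through vertex `k`; it is valid for acyclic graphs, where `p[k][k]` stays 0. The function name `count_all_paths` and the graph `ADJ` are hypothetical.

```python
def count_all_paths(adj):
    """Count all paths between every vertex pair of a directed acyclic graph."""
    n = len(adj)
    p = [row[:] for row in adj]              # P <- A_1
    for k in range(n):                        # allow k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                # Paths i -> ... -> k -> ... -> j are added to p[i][j].
                p[i][j] += p[i][k] * p[k][j]
    return p


# Hypothetical directed acyclic graph (0-based vertices).
ADJ = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]

P = count_all_paths(ADJ)
```

The three nested loops give the $O(n^3)$ cost, and every position is updated in every iteration even after it has become positive, which is the redundancy the simplified method targets.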
As presented in Eq. (4) and Eq. (5), when the number of 1-long paths is well below $n^2$, the simplified calculation of the initial tickets matrix proceeds as follows.

Algorithm 1: The simplified calculation of the initial tickets matrix $T_1$. Let $A_1 = a_1(i, j)$ be an $n \times n$ adjacency matrix of a graph.

    1.  T_1 ← A_1
    2.  for i = 1 to n do
    3.      for j = 1 to n do
    …
    11. t_1(i, j) = q ← Q // Take out an element from queue Q
    12. for (k = 1, k ≠ i and k ≠ j) to n do
    13.     if t_1(j, k) = 1 and t_1(i, k) ∉ Q // There is a path from j to k in the matrix A and t_1(i, k) is not in queue Q
    …
    16.     if t_1(k, i) = 1 and t_1(k, j) ∉ Q // There is a path from k to i in the matrix A and t_1(k, j) is not in queue Q
    …
    19. end for
    …
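A queue-based sketch in the spirit of Algorithm 1 is given below. Several lines of the listing are not available, so the bodies of the two `if` branches (set the new ticket to 1 and append it to the queue) are assumptions, as are the function name `initial_tickets_sparse` and the graph `ADJ`.

```python
from collections import deque


def initial_tickets_sparse(adj):
    """Build the 0/1 initial tickets matrix by propagating from known tickets."""
    n = len(adj)
    tickets = [row[:] for row in adj]                 # T_1 <- A_1
    # The queue initially holds every 1-long path (i, j).
    queue = deque((i, j) for i in range(n) for j in range(n) if adj[i][j] == 1)
    while queue:
        i, j = queue.popleft()                        # take an element out of Q
        for k in range(n):
            if k == i or k == j:
                continue
            # Forward: ticket i -> j and ticket j -> k give ticket i -> k.
            if tickets[j][k] == 1 and tickets[i][k] == 0:
                tickets[i][k] = 1
                queue.append((i, k))
            # Backward: ticket k -> i and ticket i -> j give ticket k -> j.
            if tickets[k][i] == 1 and tickets[k][j] == 0:
                tickets[k][j] = 1
                queue.append((k, j))
    return tickets


# Hypothetical directed acyclic graph (0-based vertices) with 6 edges.
ADJ = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]

T_1 = initial_tickets_sparse(ADJ)
```

Each position is written at most once and enters the queue at most once, so no position is recomputed after it has become 1.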
As shown in Eq. (6) and Eq. (7), when the number of 1-long paths is close to $n^2$, the simplified calculation of the initial tickets matrix proceeds as follows.

Algorithm 2: The simplified calculation of the initial tickets matrix $T_1$. Let $A_1 = a_1(i, j)$ be an $n \times n$ adjacency matrix of a graph.

    1.  T_1 ← A_1
    2.  for i = 1 to n do
    3.      for j = 1 to n do
    …
    12. t_1(i, j) = q ← Q // Take out an element from queue Q
    13. m = j; // Assign the subscript j to m
    14. h = i; // Assign the subscript i to h
    15. for (k = 1, k ≠ i and k ≠ j) to n do
    16.     if t_1(k, j) = 1 // Find out k pointing to j in the matrix A
    17.         j = k; // j points to the forward vertex k
    18.         if t_1(i, j) = 1 // It shows that there is an n-long path from i to j, where n ≥ 1, so there is a ticket from i to m, in other words, q = 1
    …
    23.     if t_1(i, k) = 1 // Find out k pointed by i in the matrix A
    24.         i = k; // i points to the backward vertex k
    25.         if t_1(i, j) = 1 // It shows that there is an n-long path from i to j, where n ≥ 1, so there is a ticket from h to j, in other words, q = 1
    …
    30. end for
    …
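For the dense case, the sketch below queues only the off-diagonal zero entries and resolves each one by walking forward from `i` until it reaches a vertex that already holds a ticket to `j`. This is an interpretation under assumptions, not a transcription of Algorithm 2, whose listing is incomplete: the names `initial_tickets_dense` and `_reaches`, the explicit `seen` set (added so the walk also terminates on graphs that contain cycles) and both example matrices are hypothetical.

```python
from collections import deque


def _reaches(adj, tickets, src, dst):
    """Walk forward from src; succeed at any vertex with a known ticket to dst."""
    seen = {src}
    stack = [src]
    while stack:
        v = stack.pop()
        for k in range(len(adj)):
            if adj[v][k] != 1 or k in seen or k == dst:
                continue
            if tickets[k][dst] == 1:      # known ticket k -> dst closes the path
                return True
            seen.add(k)
            stack.append(k)
    return False


def initial_tickets_dense(adj):
    """Build the 0/1 initial tickets matrix by resolving only the zero entries."""
    n = len(adj)
    tickets = [row[:] for row in adj]     # T_1 <- A_1
    # Diagonal entries are never queued: cycles v_i ... v_i are not considered.
    queue = deque((i, j) for i in range(n) for j in range(n)
                  if i != j and adj[i][j] == 0)
    while queue:
        i, j = queue.popleft()
        if _reaches(adj, tickets, i, j):
            tickets[i][j] = 1
    return tickets


# Hypothetical dense graph: every off-diagonal entry is 1 except (0, 4) and (2, 4).
DENSE = [[0 if i == j or (i, j) in {(0, 4), (2, 4)} else 1 for j in range(5)]
         for i in range(5)]
T_DENSE = initial_tickets_dense(DENSE)

# Hypothetical graph with a single edge 0 -> 1, where no new ticket can appear.
SINGLE_EDGE = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
T_SINGLE = initial_tickets_dense(SINGLE_EDGE)
```

The work is proportional to the number of zero entries, which is why this variant suits adjacency matrices that are almost full.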
The returned matrix $T_1$ is the initial tickets matrix, in which every element is 1 or 0. The matrix $T_1$ is used to calculate the number of all common tickets in two or more graphs when measuring graph similarity. The queue $Q$ is used to store $t_1(i, j)$. As can be seen from line 10 in Algorithm 1 and line 11 in Algorithm 2, the shorter the queue, the faster the two algorithms run. Algorithm 1 is therefore suited to adjacency matrices in which the number of 1-long paths is well below $n^2$, and Algorithm 2 is suited to adjacency matrices in which the number of 1-long paths is close to $n^2$. Compared with line 5 in the Floyd-Warshall Algorithm, Algorithm 1 and Algorithm 2 avoid the redundant calculations in the loops of line 12 and line 15, respectively. The two algorithms are easy to implement and can handle large graphs effectively.

Example
For the directed and acyclic graph $G$ shown in Fig. 1, we calculate the initial tickets matrix of $G$ by using Eq. (2) and Eq. (3). Based on $A_k = A_1 \times A_{k-1}$, we can obtain the 2-long paths matrix $A_2$ and the 3-long paths matrix $A_3$ as shown in Fig. 2.

Fig. 2. The 2-long paths matrix $A_2$ and the 3-long paths matrix $A_3$ (the surviving $5 \times 5$ matrix has a single nonzero entry, 2, in row 3, column 5).

By using Eq. (2) and Eq. (3), we can obtain the all paths matrix $P_n$ and the initial tickets matrix $T_1$ as shown in Fig. 3, with computational complexity $O(n^4)$. We also calculate the initial tickets matrix $T_1$ by using the Floyd-Warshall Algorithm. By using line 2 to line 8 in the Floyd-Warshall Algorithm, we can calculate the number of all paths $p(i, j)$, where $1 \le i \le n$, $1 \le j \le n$. After calculating the all paths matrix $P$, we obtain the initial tickets matrix $T_1$ by using Eq. (3), with computational complexity $O(n^3)$.
For the adjacency matrix $A_1$ shown in Fig. 1, the number of 1-long paths is 6, which is well below 25 ($n = 5$, $n^2 = 25$), so we apply Algorithm 1 to calculate the initial tickets matrix $T_1$ as follows:

Step 1: In accordance with line 2 to line 9 in Algorithm 1, we set the initial tickets matrix $T_1 = A_1$ as shown in Fig. 4(a) and the queue $Q$: {$t_1$ …

Step 2: For $q \in Q$, when $q = t_1(v_1, v_2) = 1$, there exist $t_1(v_2, v_5) = 1$ and $t_1(v_3, v_1) = 1$, which shows that $v_1, v_2, v_5$ and $v_3, v_1, v_5$ are 2-long paths. In other words, $v_1 v_5$ and $v_3 v_5$ are 1-long tickets. By using line 12 to line 19 in Algorithm 1, we can calculate the new tickets $t_1(v_1, v_5) = 1$ and $t_1(v_3, v_5) = 1$ as shown in Fig. 4(b).
Similarly, we can calculate the final $T_1$ as presented in Fig. 4(c), where the boxed values are the new tickets. When $t_1(v_i, v_j) = 1$, where $1 \le i \le n$, $1 \le j \le n$, Algorithm 1 does not need to calculate $t_1(v_i, v_j)$ repeatedly. This reduces the redundant matrix multiplications and additions compared with the Floyd-Warshall Algorithm.

For a directed and acyclic graph $G'$ and its adjacency matrix $A'_1$ as shown in Fig. 5, the number of 1-long paths is 18, which is close to 25 ($n = 5$, $n^2 = 25$), so we apply Algorithm 2 to calculate the initial tickets matrix $T'_1$ as follows:

Fig. 5. The directed and acyclic graph $G'$ and its adjacency matrix $A'_1$.
Step 1: In accordance with line 2 to line 10 in Algorithm 2, we set the initial tickets matrix $T'_1 = A'_1$ as shown in Fig. 6(a) and the queue $Q$: $\{t'_1(v_1, v_5) = 0,\ t'_1(v_3, v_5) = 0\}$. In this paper, a path does not contain a cycle. In other words, we do not consider the cycle $v_i, \ldots, v_k, \ldots, v_i$, where $1 \le i \le n$, $1 \le k \le n$. We set $t'_1(v_i, v_i)$ to 0 in the initial tickets matrix $T'_1$. So, in line 7 of Algorithm 2, we do not need to put $t'_1(v_i, v_i)$ in the queue $Q$.

Step 2: For $q \in Q$, when $q = t'_1(v_1, v_5) = 0$, there exist $t'_1(v_1, v_2) = 1$ and $t'_1(v_2, v_5) = 1$, which shows that there is a 2-long path from $v_1$ to $v_5$. In other words, $v_1 v_5$ is a 1-long ticket. By using line 15 to line 30 in Algorithm 2, we can calculate the new ticket $t'_1(v_1, v_5) = 1$ as shown in Fig. 6(b). Similarly, we can calculate the final $T'_1$ as shown in Fig. 6(c), where the boxed values are the new tickets. When $t'_1(v_i, v_j) = 1$, where $1 \le i \le n$, $1 \le j \le n$, Algorithm 2 does not need to calculate $t'_1(v_i, v_j)$ repeatedly. This reduces the redundant matrix multiplications and additions compared with the Floyd-Warshall Algorithm.

In the best case, the computational complexity of Algorithm 1 and Algorithm 2 is approximately $O(n^2)$. By contrast, the computational complexity of using Eq. (2) and Eq. (3) is $O(n^4)$, and that of the Floyd-Warshall Algorithm is $O(n^3)$. Table 1 lists the complexity of the algorithms for calculating the initial tickets matrix. Algorithm 1 and Algorithm 2 are therefore more efficient.

Conclusions
This paper proposes a simplified calculation method for use in measuring graph similarity. The method sets an element of the initial tickets matrix to 1 the first time the corresponding element of the paths matrix becomes positive, which removes many redundant matrix multiplications and additions compared with the Floyd-Warshall Algorithm. Depending on the number of 1-long paths in the adjacency matrix, this paper presents two algorithms to calculate the initial tickets matrix. Both algorithms calculate the elements of the initial tickets matrix from the positive values of the paths matrix in a forward and backward way. The algorithm analysis and the given example indicate that these algorithms are feasible and effective. For a complex graph with a large number of 1-long paths in the adjacency matrix, Algorithm 2 is more applicable, because the stored queue is short.

In the near future, we will compare the method in this paper with the graph kernel, the random walk graph kernel, and the kernel function of all common paths or tickets in terms of classification accuracy and running time through experiments. Because the method is easy to calculate, we will also apply it to undirected graphs and to complex graph datasets.

Fig. 3. All paths matrix $P_n$ and the initial tickets matrix $T_1$.

Fig. 4. (a) $T_1$ assigned from the adjacency matrix $A_1$; (b) $T_1$ calculated by Algorithm 1 when $q = t_1(v_1, v_2)$; (c) the final $T_1$ obtained by Algorithm 1. The boxed values are the new tickets.
Let $l$ denote the length of the queue $Q$; in the best case, $l$ is a constant. In Algorithm 1, the computational complexity is $O(n^2)$ in the loops of line 2 and line 3 and $O(lk)$ in the loops of line 10 and line 12, where $1 \le k \le n$, $k \ne i$ and $k \ne j$. Similarly, in Algorithm 2, the computational complexity is $O(n^2)$ in the loops of line 2 and line 3 and $O(lk)$ in the loops of line 11 and line 15, where $1 \le k \le n$, $k \ne i$ and $k \ne j$. So the computational complexity of Algorithm 1 and Algorithm 2 is $O(n^2 + lk)$, which is less than $O(n^3)$.

Table 1. The computational complexity of the algorithms for calculating the initial tickets matrix.