RTGEN: A Relative Temporal Graph GENerator

Graph management systems are emerging as an efficient solution to store and query graph-oriented data. To assess the performance of such systems and compare them, practitioners often design benchmarks that rely on large-scale graphs. However, such graphs either do not fit the scale requirements or are not publicly available. This has motivated a number of graph generators that produce synthetic graphs whose characteristics mimic those of real-world graphs (degree distribution, community structure, diameter, etc.). Many applications, however, deal with temporal graphs whose topology is in constant change. Although generating static graphs has been extensively studied in the literature, generating temporal graphs has received much less attention. In this work, we propose RTGEN, a relative temporal graph generator that produces temporal graphs by controlling the evolution of their degree distribution. In particular, we propose to generate a new graph with a desired degree distribution out of an existing one while minimizing the effort needed to transform the source graph into the target one. Our relative graph generation method relies on optimal transport. We further extend our method to also control the community structure of the generated graphs, which is prevalent in a number of applications. Our generation model extends the concepts proposed in the Chung-Lu model with temporal and community-aware support. We validate our generation procedure through experiments that confirm the consistency of the generated graphs with the ground-truth parameters.


Introduction
Graphs are the most natural model to describe real-world interactions and are currently used in a myriad of application domains such as citation [1], transportation [?], and sensor networks [2], to cite just a few. These graphs are managed by graph management systems whose performance is usually evaluated through graph-centered benchmarks that address different performance metrics such as ingestion throughput, space usage and query execution time. In this context, practitioners refer to real-world and synthetic graphs to use in the benchmarks. Indeed, available graph generation techniques fill the gap between real and synthetically generated graphs by trying to mimic the characteristics of real graphs, such as controlling the degree distribution [3,4,5,6,7,8]. Besides, a number of existing graph generators are community-aware in the sense that they group vertices that are more densely connected to each other than to the rest of the graph in separate or overlapping sub-graphs called communities [9,10,11].
Real graphs, however, are dynamic [12]: their topology is subject to continuous changes. In this context, a new emphasis is being placed on supporting time as a first-class citizen in graph management systems [13,14,15,16]. Most of these systems rely on real-world temporal graphs to evaluate their proposed methods. Real-world graphs, however, often do not fit the scale requirements. Therefore, practitioners must rely on a temporal graph generator that is able to produce large-scale graphs whose evolution correlates with that of real-world temporal graphs. To tackle this challenge, we propose RTGEN: a relative temporal graph generator that produces large-scale temporal graphs by controlling a number of key features that characterize the evolution of real-world graphs. That is, our generation procedure controls the evolution of the degree distribution by extending a very common generation technique [17], referred to as the Chung-Lu model, with temporal and community-aware support.
We model a temporal graph as a sequence of snapshots {G_0, . . ., G_T} where G_t is the graph snapshot at timestamp t, characterized by a degree distribution that is generated by sampling user-defined temporal parameters. Having this, our relative graph generation procedure consists of transforming G_{t-1} into G_t by applying a stream of atomic graph operations with respect to the desired degree distributions at time instants t-1 and t. Based on the fact that a strong correlation exists between successive snapshots [18,19,20], we propose to minimize the number of graph operations that have to be applied in order to transform a graph snapshot into its successor. The main idea consists of minimizing the distance between the degree distributions of successive snapshots. We achieve this goal by relying on an optimal transport solver which provides a transportation plan capable of transforming a "mass" from a source distribution to a target distribution with a minimum of work. In order to apply the obtained transportation plan, we propose a straightforward generalization of the well-known Chung-Lu model, also known as the CL model, that was first discussed in [21] and formalized in [17,22]. We choose to extend this model for reasons of simplicity and scalability. We also extend the CL model to partition the graph into ground-truth communities that coexist with the aforementioned time-dependent degree distribution. Our contributions are validated through experimental results showing the evolution of the degree distribution and community structure with respect to ground-truth input parameters.
The rest of the paper is organized as follows: Section 2 provides an overview of the generation procedure. Section 3 introduces the baseline generation procedure of the CL model. Section 4 describes the proposed community-aware extension of the CL model. Section 5 presents a detailed description of the proposed generation procedure. Section 6 provides an experimental evaluation of the synthetically generated temporal graphs. Section 7 describes the related work. Section 8 concludes the work.

Overview
In this section, we describe the overall generation procedure. Given the characteristics of a series of graph snapshots, our relative generation procedure produces the series of graph snapshots {G_1, . . ., G_T} whose characteristics approximate the given ones. These graph snapshots are relatively computed by applying a number of graph updates on each snapshot in order to produce its successor. To clarify, we apply a number of graph updates on a graph snapshot G_{t-1} to produce another graph snapshot G_t whose characteristics approximate the parameters assigned to the t-th graph snapshot.
Formally, we define a graph snapshot G_t valid at a time instant t as the tuple {V_t, E_t, D_t, C_t} where V_t is the set of vertices, E_t is the set of edges, D_t is a degree distribution and C_t is the density community matrix. For instance, we consider D_t of the form {(d_1, n_1), . . ., (d_k, n_k)}, a discrete distribution over ℕ where d_i refers to the degree of a node and n_i refers to the total number of vertices in the graph whose number of incident edges is equal to d_i. A density community matrix C_t defines the community structure of the generated graph; each element c_{ij} of it is equal to the density of edges between the source community i and the target community j.
Given the number of vertices in each graph snapshot G_t, t ∈ {1, . . ., T}, a stochastic community matrix M and a sequence of degree distributions {D_1, . . ., D_T}, we generate a sequence of graph snapshots {G_1, . . ., G_T} such that each snapshot G_t is relatively generated by transforming G_{t-1}. This transformation is based on morphing D_{t-1} = {(d_1, n_1), . . ., (d_k, n_k)} into D_t while preserving the community structure represented by the stochastic community matrix M, such that M_{t-1} = M_t = M. Note that each element m_{ij} of M is equal to the probability of edge creation between the source and target communities i and j. Figure 1 illustrates the relative graph generation procedure. Each graph snapshot G_t is relatively generated by transforming its ancestor G_{t-1}. This transformation is based on computing a transportation matrix F that minimizes the cost of morphing D_{t-1} into D_t. The computation of the transportation matrix reduces to an optimal transport problem. Based on the computed transportation matrix, each vertex belonging to the graph G_{t-1} is assigned a linkage or breakage probability that indicates the probability of adding or removing an edge. This phase is followed by creating or removing edges to or from the graph G_{t-1} to produce the graph G_t. These graph updates follow the linkage or breakage probabilities assigned to each of the vertices. Finally, the graph G_t is computed by applying the generated updates on G_{t-1}. Note that the generation procedure depicted in this figure shows a simplified scenario where the number of vertices does not change. However, if that number changes, a phase consisting of the addition or deletion of vertices should precede the computation of the transportation matrix to assure the following constraint: Σ_i n_i^{(t-1)} = Σ_j n_j^{(t)}. This constraint implies that the sums of weights of the distributions D_{t-1} and D_t should be equal.

Graph generation with given expected degree distribution
In this section, we describe the generation procedure of random static graphs with a given degree distribution. Random graphs were introduced by Erdős and Rényi [23]. The popularity of this model, also known as the G(n, p) model, stems from its simple generation procedure that consists of generating a number of vertices and connecting them by edges after picking each endpoint with a fixed probability p. However, this model produces graphs whose degree distribution follows a binomial distribution with a mean degree equal to p(n − 1) where n is the total number of vertices. Hence, it fails to mimic real-world graphs, which usually follow a power-law degree distribution. To tackle this limitation, the edge configuration model [24] consists of generating a random graph whose degree distribution matches, approximately, a given degree distribution. That is, each vertex is assigned a number of stubs equal to its desired degree, drawn independently from the given degree distribution. Having this, pairs of stubs are linked randomly, forming edges between their endpoints. Although this technique approximately matches any given degree distribution, a relaxed version known as the Chung-Lu model was introduced in [21]. This model consists of generating a random graph that approximately matches a given degree distribution relying on a simple generation procedure that can be considered a variant of the G(n, p) model. For simplicity, we will refer to this model as the CL model in the following description.
Consider the degree distribution D as the input parameter to the CL model and the undirected, unweighted and unlabeled graph G = {V, E, D_G} as the output, where D_G denotes the degree distribution of G, and V and E denote the sets of vertices and edges, respectively. Having this, the CL model produces a graph G such that D_G is an approximation of D. The main idea is to pick each endpoint of an edge with a certain probability such that, at the end of the generation procedure, the total number of edges incident to each vertex is close to its assigned degree. Hence, the starting phase consists of assigning each vertex v ∈ V a degree d_v and a linkage probability p_v ∝ d_v. Considering that S is the sum of the degrees extracted from D, we define the CL linkage probability p_v in the following equation:

p_v = d_v / S    (1)

Subsequently, a linkage phase consists of picking |E| = S/2 pairs of vertices to connect such that, for a sufficiently large |E|, the random variable denoting the degree of vertex v is Poisson distributed with a mean equal to d_v. Iterating the linkage phase |E| times, where an edge is equally likely to be chosen in both directions for undirected graphs, the insertion probability of an edge connecting vertices u and v is p_{uv} = 2 d_u d_v / S². The edge insertion probability can be rewritten in the more convenient form:

p_{uv} = 2 p_u p_v

For optimization's sake, we gather all vertices sharing the same degree in a pool P_d = {v | v ∈ V ∧ d_v = d} that we use as a subsidiary generation component. Each vertex in a pool is equally likely to be chosen, assuring that the aforementioned linkage probability p_v is not affected for a sufficiently large number of vertices. After the degree assignment phase, vertices are distributed throughout the pools, each pool having the following linkage probability:

p_{P_d} = d · |P_d| / S

Now, instead of picking vertices directly, a pool is first picked according to this probability, and a vertex is then drawn uniformly from the pool. It should be highlighted that self-loops or multi-edges can be created since each endpoint of an edge is picked independently. The number of these edges, however, is independent of the number of vertices and thus can be neglected for large-scale graphs.
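As an illustration, the plain CL linkage phase described above (without the pool optimization) can be sketched in a few lines; the function name and data layout below are ours, not part of RTGEN:

```python
import random

def chung_lu(degrees, seed=0):
    """Sketch of the CL linkage phase: run |E| = S/2 trials, drawing each
    endpoint v independently with probability p_v = d_v / S."""
    rng = random.Random(seed)
    verts = list(degrees)                  # degrees: vertex -> desired degree
    weights = [degrees[v] for v in verts]
    S = sum(weights)                       # sum of desired degrees
    edges = []
    for _ in range(S // 2):                # one trial per expected edge
        u, v = rng.choices(verts, weights=weights, k=2)
        edges.append((u, v))               # self-loops/multi-edges may occur
    return edges
```

For a sufficiently large number of trials, the realized degree of each vertex concentrates around its assigned degree; the pool variant would replace the weighted draw by a pool draw followed by a uniform draw within the pool.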

Community-aware graph generation with given expected degree distribution
Although the CL model produces graphs with respect to a given degree distribution, it is not aware of the community structure existing in most real-world graphs. Hence, we propose a community-aware extension of the CL model based on the stochastic block model (SBM). Since a community is not quantitatively well defined, many definitions were provided in the literature. Intuitively, one can consider a community as a subgraph whose vertices are more densely connected to each other than they are to the rest of the graph. Let us consider the set of communities C = {c_i} and suppose that each vertex belongs to exactly one community and that edges are differentiated into within and between edges:
• Given a community c, an edge (u, v) is called a within edge if the source vertex u ∈ c and the target vertex v ∈ c.
• Given two communities c_i and c_j, an edge (u, v) is called a between edge if the source vertex u ∈ c_i and the target vertex v ∈ c_j, or vice versa.
To ensure that vertices belonging to a community are more densely connected to each other than they are to the rest of the graph, the within and between edge creation probabilities p^{wth}_{c_i} and p^{btw}_{c_i c_j} of a community c_i must satisfy the condition p^{wth}_{c_i} > p^{btw}_{c_i c_j}, ∀ c_j ∈ C, c_j ≠ c_i.

Stochastic block model
In this section, we formulate the SBM model [9] (also known as the planted partition model), which is commonly used for the generation of random graphs with a given community structure. This generation procedure only considers controlling the community structure of the graph and overlooks the resulting degree distribution. The input of the generation procedure is a stochastic community matrix M, each element m_{ij} of which defines the probability of edge creation between the source community c_i and the target community c_j.
The output is a graph G = {V, E, C} where C is the obtained density community matrix, each element c_{ij} of which defines the relative edge density between the source community c_i and the target community c_j. The generation procedure starts with the distribution of vertices among the planted communities such that each vertex belongs to a single community. Then, the linkage probability between a vertex belonging to community c_i and another vertex belonging to community c_j is equal to m_{ij}. However, the community density matrix C extracted from the resulting graph G is an approximation of M. That is, each element c_{ij} is binomially distributed with a mean equal to m_{ij}, and Poisson distributed with the same mean for a sufficiently large number of edges.
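A minimal sketch of this procedure (helper names are ours) plants the vertices and then flips one biased coin per vertex pair:

```python
import random

def sbm(community_sizes, M, seed=0):
    """Plant community_sizes[i] vertices in community i, then create each
    edge (u, v), with u in community i and v in community j, with
    probability M[i][j]."""
    rng = random.Random(seed)
    members = []                           # list of (vertex id, community)
    for i, size in enumerate(community_sizes):
        offset = len(members)
        members += [(offset + k, i) for k in range(size)]
    edges = []
    for a, (u, cu) in enumerate(members):
        for v, cv in members[a + 1:]:      # each unordered pair once
            if rng.random() < M[cu][cv]:
                edges.append((u, v))
    return edges
```

With probabilities 1 inside the blocks and 0 between them, the output is a disjoint union of cliques, which makes the planted structure easy to verify.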

Stochastic block model with given degree distribution
In this section, we propose a static graph generation procedure that controls both the community structure and the degree distribution. Given a degree distribution D and a stochastic community matrix M, our proposed model generates a graph G whose degree distribution D_G is an approximation of D and whose density community matrix C is an approximation of M. In the following, we provide a description of our generation mechanism that extends the stochastic block model depicted in Section 4.1.
Since the generated graph G is undirected, the matrix M is symmetric, i.e., m_{ij} = m_{ji}. We assign each community c a within edge creation probability p^{wth}_c, a between edge creation probability p^{btw}_c and a probability of edge creation p_c derived from M. We then define the linkage probability p_v of choosing a vertex v belonging to community c as follows:

p_v = p_c · d_v / S_c    (3)

where S_c is the sum of the degrees of the vertices belonging to community c and p_c is the probability of choosing c. The linkage probability of a vertex is thus the product of the probability p_c of choosing the community to which the vertex belongs and the probability d_v / S_c of choosing the vertex v within that community. Hence, Equation (3) assures the approximation of the community matrix. However, the vertex linkage probability should also match Equation (1) to assure the approximation of the degree distribution. Therefore, we impose the condition p_c = S_c / S, which reduces Equation (3) to Equation (1): substituting it yields p_v = (S_c / S) · (d_v / S_c) = d_v / S, the original CL linkage probability (Equation 1), which assures the control of the degree distribution, while Equation (3) assures the control of the community structure. Having this, the duality of the linkage probability given in Equations (1) and (3) ensures that both requirements are satisfied by our generation procedure.
For performance amelioration, we consider the selection of pools rather than vertices such that a pool is local to one community. That is, vertices having the same degree and belonging to the same community c are grouped in a pool P_{c,d} = {v | v ∈ c ∧ d_v = d} such that the probability of selecting a pool for edge insertion is:

p_{P_{c,d}} = p_c · d · |P_{c,d}| / S_c

Hierarchical community structure
The specification of the stochastic matrix is not straightforward and imposes an exhaustive number of user-defined parameters. Hence, we define an auto-generative procedure that fills the matrix with no exogenous effort. Considering a static graph, we construct a stochastic matrix that reflects a hierarchical community structure with only two given parameters. In a hierarchical community matrix, communities recursively embed subsequent communities in a self-similar fashion such that the community structure is represented by a hierarchical tree where each node represents a community. Each non-leaf node is expanded into b other nodes until reaching a desired tree height h (Figure 2). The ending recursion results in k = b^h leaf nodes referencing the finest-scale communities, whose linkage probability m_{ij} depends on the distance between c_i and c_j. The distance between two communities, dist(c_i, c_j), is equal to the number of hops traversed in order to reach the least common ancestor of these communities. In order to satisfy the condition stating that the within edge linkage probability must be higher than the between linkage probability (p^{wth}_c > p^{btw}_c), we define m_{ij} as a decreasing function of dist(c_i, c_j) parameterized by λ, a tunable parameter whose calibration steers the difference between within and between edge densities. The effect of varying λ is further highlighted in Section 6.
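The exact formula for m_{ij} did not survive in this version of the text, so the sketch below uses an assumed decay m_{ij} = base / (1 + λ · dist(c_i, c_j)), which satisfies the within-greater-than-between condition for any λ > 0, merely to show how such a matrix can be filled from the tree:

```python
def hierarchical_matrix(b, h, lam, base=0.5):
    """Fill a k x k stochastic community matrix (k = b**h leaf communities)
    from a hierarchical tree with branching factor b and height h.
    dist(i, j) is the number of hops to the least common ancestor.
    NOTE: the decay base / (1 + lam * dist) is an assumed form,
    not the paper's formula."""
    k = b ** h

    def dist(i, j):
        d = 0
        while i != j:          # climb one tree level per step
            i //= b
            j //= b
            d += 1
        return d

    return [[base / (1 + lam * dist(i, j)) for j in range(k)] for i in range(k)]
```

With λ = 0 the matrix degenerates to a uniform structure, and growing λ widens the gap between within and between densities, which is consistent with the modularity trend reported in Section 6.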

Relative graph generation
In order to control the evolution of the degree distribution of the generated temporal graphs, we propose in this section an extension of the CL model based on optimal transport, which computes the minimal distance between the degree distributions of each pair of successive graph snapshots.

Earth mover's distance
The Earth mover's distance (EMD) can be defined as a measure of distance over a domain Ω between two distributions of the form {(x_1, w_1), . . ., (x_m, w_m)} where x_i ∈ Ω and w_i is the density of x_i. Having this, the problem reduces to the computation of an optimal flow (transportation matrix) F = [f_{ij}] between two distributions P = {(p_1, u_1), . . ., (p_m, u_m)} and Q = {(q_1, w_1), . . ., (q_n, w_n)}, such that f_{ij} is the mass transported between p_i and q_j which minimizes the overall cost:

WORK(P, Q, F) = Σ_{i=1}^{m} Σ_{j=1}^{n} f_{ij} c_{ij}

where c_{ij} = dist(p_i, q_j) is a measure of distance between p_i and q_j. The following constraints must hold for the optimal flow F:

f_{ij} ≥ 0,   Σ_{j=1}^{n} f_{ij} = u_i,   Σ_{i=1}^{m} f_{ij} = w_j

Once the optimal flow F is found, the EMD between P and Q is computed as follows:

EMD(P, Q) = (Σ_i Σ_j f_{ij} c_{ij}) / (Σ_i Σ_j f_{ij})

The EMD is fundamental in our generation procedure since it is used to compute the distance between two degree distributions, as described in the following section.
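For the one-dimensional degree domain used here, with ground cost c_{ij} = |p_i − q_j|, the monotone coupling obtained by sweeping both sorted distributions is an optimal flow, so the EMD can be sketched without a general LP solver (RTGEN itself delegates to the solver of [25]); the helper below is our own illustration:

```python
def emd_1d(src, dst):
    """Optimal transport between two 1-D discrete distributions with
    ground cost |x - y|; src, dst are lists of (value, weight) pairs
    with equal total weight.  Returns (flow, emd) where flow maps
    (src_value, dst_value) -> transported mass."""
    a, b = sorted(src), sorted(dst)
    i = j = 0
    ui, wj = a[0][1], b[0][1]          # remaining mass at current bins
    flow, work, total = {}, 0.0, 0.0
    while i < len(a) and j < len(b):
        m = min(ui, wj)                # mass moved in this step
        key = (a[i][0], b[j][0])
        flow[key] = flow.get(key, 0) + m
        work += m * abs(a[i][0] - b[j][0])
        total += m
        ui -= m
        wj -= m
        if ui == 0:
            i += 1
            if i < len(a):
                ui = a[i][1]
        if wj == 0:
            j += 1
            if j < len(b):
                wj = b[j][1]
    return flow, work / total
```

For instance, moving {(degree 1, 2 vertices), (degree 2, 3 vertices)} onto {(degree 2, 3), (degree 3, 2)} requires 4 units of work over 5 units of mass, i.e., an EMD of 0.8.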

Baseline relative graph generation
In this section, we provide the baseline procedure for transforming a graph G with degree distribution D into G′ with degree distribution D′, which we refer to as the baseline relative graph generation. Note that we use this technique for generating temporal graphs such that G and G′ correspond to successive graph snapshots. For generalization purposes, however, we remove the notion of time in this section. This transformation is enabled by a set of atomic graph operations including the addition and deletion of a vertex or an edge. Following the assumption that temporal graphs evolve gradually, the number of graph operations between successive snapshots should be minimized, which is assured in our model by applying an optimal transport method.
Consider the input graph G = {V, E, D} and the desired degree distribution D′; the generated output is the graph G′ = {V′, E′, D_{G′}} such that D_{G′} is an approximation of D′. We define the distance between two degree distributions D and D′ as the earth mover's distance EMD(D, D′).
Consider Δn = |V′| − |V| as the total number of vertices to be added to or removed from the graph, depending on whether Δn is a positive or negative number, respectively. A newly added vertex is assigned a degree equal to 0, and deleting a vertex consists of removing the vertex along with its incident edges. This transformation phase assures that G and G′ share the same number of vertices and hence enables the transformation of G into G′. In order to morph D into D′, a transportation matrix F is computed, where each row corresponds to a degree d in the set of degrees of the source distribution D and each column corresponds to a degree d′ in the set of degrees of the target distribution D′. Each cell gives the portion of vertices having a degree d for which links are to be inserted or removed in order to be assigned a total number of edges equal to the degree d′. That is, a vertex v with a degree d_v = d will be assigned a degree variation Δd_v = d′ − d, resulting in total numbers of edge insertions and deletions denoted Δ+ and Δ−, respectively. We assign each vertex v a linkage probability p+_v or a breakage probability p−_v defined as extensions of the CL linkage probability (1):

p+_v = Δd_v / Δ+  if Δd_v > 0    (4)
p−_v = |Δd_v| / Δ−  if Δd_v < 0    (5)

We collect vertices sharing the same degree variation Δd = d′ − d into a linkage pool if Δd > 0 and into a breakage pool if Δd < 0.
Consider P_{d→d′} = {v | v ∈ V ∧ Δd_v = d′ − d} to be the pool containing vertices having a degree d that should be transformed into d′. We compute the probabilities p+_{P_{d→d′}} and p−_{P_{d→d′}} of picking a linkage or breakage pool as follows:

p±_{P_{d→d′}} = |d′ − d| · |P_{d→d′}| / Δ±

However, breaking an edge might be impossible in situations where the degree variation of a source vertex is negative and the sum of the negative degree variations of its neighbors is smaller in magnitude. For the sake of illustration, we present in Figure 3 a graph in which the number of edges to remove from a vertex is higher than the total number of edges to remove from its neighboring vertices. That is, the transformation of this graph implies removing 2 edges from vertex v_1 since Δd_{v_1} = −2. However, the number of edges that can be removed from the neighboring vertices of v_1 is equal to 1, since Δd_{v_2} = −1, Δd_{v_3} = 0 and Δd_{v_4} = 1. To overcome this, we repeat the morphing procedure until EMD(D, D′) reaches a desired threshold. Our simulations have shown that the value of EMD(D, D′) converges rapidly towards the minimum threshold after a tolerable number of iterations. This statement is further highlighted in Section 6.
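To make the pipeline concrete, the following sketch (our own helper names, assuming an integral transportation matrix) turns a transport plan into per-vertex degree variations and then into the linkage and breakage probabilities of Equations (4) and (5):

```python
def degree_variations(plan, vertices_by_degree):
    """Assign each source vertex a degree variation d' - d from an integral
    transport plan {(d, d2): mass}; vertices_by_degree maps a degree to the
    source vertices currently holding it."""
    pools = {d: list(vs) for d, vs in vertices_by_degree.items()}
    delta = {}
    for (d, d2), mass in plan.items():
        for _ in range(int(mass)):         # one vertex per unit of mass
            delta[pools[d].pop()] = d2 - d
    return delta

def linkage_breakage(delta):
    """p+_v and p-_v: each vertex's share of the total number of edge
    insertions (positive variations) or deletions (negative variations)."""
    plus = sum(x for x in delta.values() if x > 0)
    minus = -sum(x for x in delta.values() if x < 0)
    p_plus = {v: x / plus for v, x in delta.items() if x > 0} if plus else {}
    p_minus = {v: -x / minus for v, x in delta.items() if x < 0} if minus else {}
    return p_plus, p_minus
```

Sampling endpoint pairs from p+ (or p−) then yields the stream of edge insertions (or deletions) that is applied to G to produce G′.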

Relative community-aware graph generation
A more complex version of the previously described relative graph generation consists of preserving the graph community structure in the transformation procedure. That is, the input of our community-aware relative graph generator is the graph G = {V, E, D, C}, the desired degree distribution D′ and the stochastic block matrix M. The output consists of a graph G′ = {V′, E′, D_{G′}, C_{G′}} where D_{G′} is an approximation of D′ and C_{G′} is an approximation of M. Recall that the generation procedure depicted in Section 4.2 produces a graph with a given expected degree distribution and stochastic community matrix based on the proposed linkage probability duality presented in Equations (1) and (3). Indeed, relative community-aware graph generation is based on an extension of the aforementioned duality that takes into consideration the degree variation of a vertex instead of its degree. That is, the following linkage and breakage probabilities present a straightforward extension of Equations (4) and (5):

p+_v = p_c · Δd_v / Δ+_c  if Δd_v > 0,   p−_v = p_c · |Δd_v| / Δ−_c  if Δd_v < 0

where Δ+_c and Δ−_c are the total numbers of edge insertions and deletions in community c, respectively. From the transportation matrix defined in Section 5.2, we find n_{Δd}, the portion of vertices with degree variation Δd = d′ − d. However, finding the portion n_{Δd,c} of vertices in community c should satisfy three conditions detailed below. Each condition k results in a system of linear equations of the form A_k x = b_k, where x is the vector composed of the unknowns n_{Δd,c} over all degree variations and the |C| communities, |C| being the total number of communities.
Condition 1: For each community c ∈ C, the equalities Δ+_c = p_c · Δ+ and Δ−_c = p_c · Δ− must hold, where Δ+ and Δ− are the total numbers of edge insertions and deletions over all communities of C, respectively. Incorporating n_{Δd,c} in this condition translates into an equality involving the source and target degree distributions D and D′. By solving the concatenated system of equations obtained from the three conditions, A_{1,2,3} x = b_{1,2,3}, we find the vector x and hence the values of n_{Δd,c}. Pools are created on a local basis in each community such that vertices with the same degree variation Δd = d′ − d and belonging to the same community c are collected in a single pool P_{c,d→d′}. We compute the probabilities p+_{P_{c,d→d′}} and p−_{P_{c,d→d′}} of picking a linkage or breakage pool analogously to the baseline case. Then, an edge addition or deletion update between the chosen vertices is appended to the list of update logs, using the functions addEdge and removeEdge depending on whether the vertices were chosen from the linkage or the breakage pools. However, breaking an edge might be impossible in some situations, as shown in Figure 3. In such a case, no graph update is appended to the list of logs. Finally, the EMD distance is computed between the obtained degree distribution D_{G′} and the desired one D′. If this distance is higher than the given threshold and the maximum number of repetitions has not yet been reached, the same algorithm is repeated on the newly computed graph snapshot G′. The computation stops when the distance is lower than or equal to the threshold or when the repetition limit is reached.

Accuracy of the generation procedure
In order to measure how far the characteristics of the generated graphs are from the ground-truth parameters, we define two distance metrics, δ_D and δ_C.
The first metric δ_D measures the inaccuracy of approximating the degree distributions of the generated graphs with respect to the given sequence of degree distributions. That is, it measures the root mean square of the EMD distances between each degree distribution D_t in the given sequence {D_1, . . ., D_T} and its corresponding degree distribution D̂_t in the sequence {D̂_1, . . ., D̂_T} extracted from the generated graphs. Having this, δ_D is computed as follows:

δ_D = sqrt( (1/T) Σ_{t=1}^{T} EMD(D_t, D̂_t)² )

The second metric δ_C measures the inaccuracy of approximating the community density matrix of the generated graphs with respect to the given stochastic matrix. That is, it measures the root mean square of the difference between the Frobenius norms of the given stochastic matrix M and the stochastic matrix M̂_t extracted from every generated graph snapshot. Having this, δ_C is computed as follows:

δ_C = sqrt( (1/T) Σ_{t=1}^{T} (Fro(M) − Fro(M̂_t))² )

where Fro(M) is the Frobenius norm of the stochastic community matrix M. We recall that the Frobenius norm of a matrix A of dimensions (m, n) is defined as follows:

Fro(A) = sqrt( Σ_{i=1}^{m} Σ_{j=1}^{n} a_{ij}² )
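Under the definitions above, the two metrics reduce to a few lines; the helper names below are ours:

```python
import math

def frobenius(A):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in A for x in row))

def delta_D(emds):
    """Root mean square of the per-snapshot EMD(D_t, D^_t) values."""
    return math.sqrt(sum(e * e for e in emds) / len(emds))

def delta_C(M, extracted):
    """RMS difference between Fro(M) and Fro(M^_t) over the T snapshots."""
    f = frobenius(M)
    return math.sqrt(sum((f - frobenius(Mt)) ** 2 for Mt in extracted)
                     / len(extracted))
```

Both metrics are zero when the generated sequence matches the ground-truth parameters exactly and grow with the approximation error.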

Experimental evaluation
We conducted a number of experiments to validate the efficiency of our generator RTGEN. We also provide an insight into how changing the input parameters can steer the characteristics of the generated temporal graphs. Note that the source code of RTGEN is publicly available¹.
Besides the source code, we also provide instructions describing how to use the tool to generate temporal graphs. For instance, users can pass RTGEN the input parameters describing the desired sequence of degree distributions or the stochastic community matrix, as well as the format of the generated output files, using a terminal command. RTGEN proposes two output types: snapshot-based and event-based. The snapshot-based type consists of a sequence of graph snapshots, each represented in a separate file. The event-based type, in contrast, consists of generating the sequence of graph updates (events) that were applied between successive snapshots to transform one snapshot into the next one.

Experimental setup
The experiments were conducted on a single machine equipped with an Intel(R) Core(TM) i5-8350U CPU @ 1.70 GHz (1.90 GHz), 16 GB of memory and a 500 GB SSD. We used Go 1.17.5 and Python 3.8.0. Besides, we referred to the optimal transport solver proposed in [25]. The graphs shown in this section are visualized using the Gephi tool [26], which offers network visualization facilities and community detection algorithms [27].

Preliminaries
In the following experiments, we refer to two common types of degree distributions, Gaussian and Zipfian. We consider a special case where the value of a parameter x ∈ ℕ in iteration i depends on its value in the previous iteration i − 1, such that x_i = x_{i−1} + Δx. This is applied to the parameters of the degree distributions (e.g., the mean μ and standard deviation σ of the Gaussian, and the exponent s and maximum degree d_max of the Zipfian) as well as to n, the total number of vertices. That is, Δn denotes the number of vertices to be added to or removed from the graph in the relative generation process. Note that RTGEN also generates the first snapshot, which implies that the parameters of the degree distribution of the first snapshot should be given.
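The recurrence x_i = x_{i−1} + Δx can be sketched as a small helper producing the per-snapshot parameter sets (the function name and dict layout are ours):

```python
def evolve(initial, deltas, T):
    """Generate T per-snapshot parameter dicts following x_i = x_{i-1} + dx.
    initial: {name: x_0}; deltas: {name: dx} (missing names stay constant)."""
    seq, cur = [], dict(initial)
    for _ in range(T):
        seq.append(dict(cur))                             # snapshot i's parameters
        cur = {k: cur[k] + deltas.get(k, 0) for k in cur}  # apply the increments
    return seq
```

Each produced dict parameterizes the degree distribution of one snapshot in the generated sequence.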

Controlling the evolution of the degree distribution
In this experiment, we show the evolution of the degree distribution of a sequence of graph snapshots generated with the relative generation procedure, given a set of input parameters. We consider Gaussian and Zipfian degree distributions with different parameters and plot the obtained degree distributions in Figures 4, 5 and 6. Figure 4 shows the evolution of the degree distribution of a generated sequence of 10 graph snapshots given the following parameters: {n_0 = 10, μ_0 = 30, σ_0 = 2, Δn = 10, Δμ = 5, Δσ = 0.1}. By setting Δμ to 5, we increase the average degree by 5 between each pair of snapshots. This can model a growth-only graph where the average degree tends to increase regularly as time elapses. However, some real-world graphs are not growth-only in the sense that they are subject to edge deletions. This is indeed the case for human-proximity or transportation graphs where an important number of short-term connections is only valid during peak hours. To model this characteristic, RTGEN also supports edge deletions. The evolution of the degree distribution with edge deletions is presented in Figure 5. Let the following parameters define the evolution of the degree distribution for t ∈ [0, 4]: {n_0 = 1, μ_0 = 60, σ_0 = 4, Δn = 0, Δμ = 5, Δσ = 0}, whereas the following parameters define its evolution for t ∈ [5, 9]: {n_0 = 10, μ_0 = 80, σ_0 = 2, Δn = 0, Δμ = −5, Δσ = 0}. Indeed, setting Δμ to −5 indicates that the average degree decreases by 5 between each pair of successive graph snapshots. Since real-world temporal graphs usually exhibit a power-law degree distribution, we also generated graphs with an evolutionary Zipfian degree distribution composed of 10 graph snapshots, as shown in Figure 6. For this generated temporal graph, we set the following parameters: {n_0 = 50, s_0 = 2.5, d_min,0 = 10, d_max,0 = 10, Δn = 50, Δs = 0, Δd_min = 0, Δd_max = 5}. By setting the parameter Δd_max to 5, we consider that the maximum degree of the nodes increases by 5 between each pair of successive snapshots, whereas the value of Δn indicates that 50 new nodes join the graph between successive snapshots. These parameters reflect the growth of a large number of real-world temporal graphs where new nodes join the graph and new connections are created as time elapses.

Controlling the community structure of the generated graphs
In this experiment, we show the community structure generated with different parameters of the stochastic community matrix and the effect of varying the parameter λ of the hierarchical tree described in Section 4.3, where p^{wth}_c and p^{btw}_c denote the within and between linkage probabilities of a community c. Furthermore, Figure 8 presents the modularity as a function of the parameter λ, which we vary from 0 to 32. Modularity is a measure that quantifies the goodness of a community structure. Its formula compares, for all the communities, the fraction of edges that fall within a given community with the expected fraction if edges were distributed at random. It is clear from the results that the modularity increases as λ increases. This is justified by the fact that λ is proportional to the difference between the within and between edge linkage probabilities, p^{wth}_c − p^{btw}_c.

Generating graphs with deletions between snapshots
As mentioned in Section 5, the relative graph generation procedure may incur a number of edge deletions. This can be problematic when the number of edges to delete for a given vertex is higher than the total number of edges that can be deleted from its neighboring vertices. We solve this problem by repeating the generation process until reaching an acceptable error threshold, defined as the EMD between the obtained and desired degree distributions. Figure 12 shows the variation of the number of iterations and the execution time of the generation process as a function of the EMD error threshold. The obtained results show that our generation procedure converges rapidly to a tolerable threshold: a threshold equal to 0.001 can be reached with only 7 iterations. By comparing the execution times of 1 iteration and 7 iterations, we notice that the difference is lower than the execution time of a single iteration. Indeed, the execution time of the repeated generations is lower than that of the first iteration, since the majority of modifications are applied in the first iteration and only the remaining vertices, whose linkage probability does not satisfy the sum of the linkage probabilities of their neighboring vertices, are considered in the next iterations. Note that these results are obtained from the generation of two successive snapshots with the following input parameters of a Gaussian degree distribution: {n₀ = 500, μ₀ = 60, σ₀ = 2, Δn = 0, Δμ = −30, Δσ = 0}.
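The repeat-until-threshold loop can be sketched as follows. In one dimension the EMD coincides with the Wasserstein-1 distance, so scipy's implementation applies; refine_step below is a hypothetical stand-in for one generation pass, not RTGEN's actual routine:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def emd(p, q, degrees):
    """Earth Mover's Distance between two degree distributions defined
    over the same support of degree values."""
    return wasserstein_distance(degrees, degrees, u_weights=p, v_weights=q)

def generate_until(threshold, desired, degrees, refine_step, max_rep=20):
    """Repeat a refinement pass until the EMD between the obtained and
    desired degree distributions drops below the threshold (or max_rep
    passes have run). refine_step stands in for one generation pass."""
    obtained, reps = refine_step(None), 1
    while emd(obtained, desired, degrees) > threshold and reps < max_rep:
        obtained, reps = refine_step(obtained), reps + 1
    return obtained, reps
```

Because later passes only touch the vertices left unsatisfied by the first pass, each extra iteration is cheaper than the first, consistent with the timing observations above.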

Accuracy of the generation procedure
We quantify how accurately the generated graphs match the given parameters by computing the degree-distribution and community-structure distance metrics defined in Section 5.4. We generated a sequence of k = 5 snapshots with the following parameters of a Gaussian degree distribution: {n₀ ∈ {10, 100, 500, 1k}, μ₀ = 30, σ₀ = 2, Δn = 0, Δμ = 10, Δσ = 0}. Besides, we controlled the community structure by fixing the parameters of the hierarchical tree: h = 2, b = 2, with the remaining tree parameters set to 4 and 0. Figures 9, 10 and 11 plot the execution time and the two distance metrics as a function of the total number of edges created by applying the Gaussian distribution whose parameters are given above. The execution time clearly increases with the number of generated edges. The distance metrics, however, decrease, implying that RTGEN approximates the given sequence of degree distributions and community structures more accurately as the total number of edges grows.

Related work
Synthetic graphs are important for developing benchmarks that assess the performance of graph-oriented data platforms when real graphs are not publicly available or are expensive to obtain. This has been the incentive to design models and generators, which are very useful for evaluating the efficiency of graph management techniques such as storage, query evaluation, indexing, and partitioning.
Extensive work has been devoted to the generation of static graphs. For instance, a special emphasis has been placed on controlling the degree distribution of the generated graphs. In this context, many graph generators were designed, such as RTG [3], RMAT [4] and its generalisation Kronecker [5], which produce only power-law distributions. Since real-world graphs are not limited to power-law distributions, BTER [6], its extension Darwini [7], and GMark [8] produce graphs with any user-defined distribution.
Another graph generation model producing a given degree distribution is the Chung-Lu (CL) model, which forms the basis of the RTGEN tool. This model can be regarded as a successor of the Erdős-Rényi model [23], designed for the generation of random graphs, and a variant of the edge configuration model of Newman et al. [24]. It has been extensively discussed and reused [17,28,29,30]. We choose to extend this model for its simplicity and scalability.
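The CL model is simple enough to sketch in a few lines: each pair of vertices is linked independently with probability proportional to the product of their target weights, so expected degrees approach the supplied weights. This is an illustrative baseline implementation, not RTGEN's extended, community-aware version:

```python
import random

def chung_lu(weights, seed=0):
    """Minimal Chung-Lu generator: vertices u and v are linked
    independently with probability w_u * w_v / sum(w) (capped at 1),
    so the expected degree of u approaches w_u."""
    rng = random.Random(seed)
    total = float(sum(weights))
    n, edges = len(weights), []
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < min(1.0, weights[u] * weights[v] / total):
                edges.append((u, v))
    return edges

# expected degrees roughly equal to the supplied weights (all 5 here)
edges = chung_lu([5] * 200, seed=42)
avg_deg = 2 * len(edges) / 200
```

Sampling degrees from a target distribution (Gaussian, Zipfian, etc.) and using them as weights is what lets a CL-style generator approximate an arbitrary degree distribution.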
Besides, a number of existing graph generators are community-aware in the sense that they group vertices that are more densely connected to each other than to the rest of the graph into separate or overlapping subgraphs called communities [9,10,11]. Although these generators preserve a given community structure, they fail to produce a graph that respects a given degree distribution. In this paper, we overcome this limitation by allowing the generation of not only a given community structure but also a given degree distribution.
Despite the extensive work on the generation of non-temporal graphs, the generation of temporal graphs has received much less attention. For instance, DANCer [31] is capable of generating temporal, community-aware property graphs. It separates operations performed on communities (macro operations) from operations performed on vertices and edges (micro operations). ComAwareNetGrowth [32] is a community-aware graph generator capable of creating growth-only graphs. APA (Attribute-Aware Preferential Attachment) [33] is a graph generator capable of creating growth-only property graphs based on a non-conventional triangle closing: instead of closing a triangle based on a uniform probability given as an input parameter, the proposed model closes a triangle based on the similarity between the candidate edge's endpoints. While GMark [8] generates static graphs, EGG (Evolving Graph Generator) [34] proposes an extension including evolving properties attached to each vertex. EGG, however, disregards topological changes to the network and narrows the temporal evolution of the graph to property updates. DSNG-M (dynamic social network generator based on modularity) [35] is a graph generator capable of generating temporal graphs by flipping the direction of edges of a given graph in order to satisfy a randomly chosen modularity value assigned to a single graph snapshot. Some of the aforementioned graph generators produce temporal graphs with properties on nodes or edges, which we do not address in this paper. None of them, however, allows controlling the evolution of the degree distribution given ground-truth parameters that describe this evolution. This challenge led to the elaboration of the RTGEN tool, which allows the approximation of any given sequence of degree distributions describing the evolution of the graph. We firmly believe that the degree distribution is a key feature that characterizes graphs; hence, it should not be disregarded in graph generation tools.

Conclusion
In this paper, we addressed the generation of temporal graphs, which represents a critical challenge in the design of benchmarks for evaluating temporal graph management systems. We proposed RTGEN, a temporal graph generator that produces a sequence of graph snapshots whose community structure and degree-distribution evolution result from approximating user-defined parameters. The generation procedure consists of relatively generating a graph snapshot from a previous one by applying a number of atomic graph operations. Our generation technique relies on an optimal transport solver to approximate a user-defined sequence of degree distributions while minimizing the number of operations needed to transform one snapshot into its successor. We conducted a number of experiments that validated the efficiency and accuracy of our generation procedure. In the future, we plan to add a dynamic community structure to RTGEN. Indeed, the communities found in real-world graphs are subject to splits, merges, shrinks, and expansions, which should also be modelled in synthetic graphs.

Figure 2: Hierarchical community tree with height h and branching factor b.

Figure 3: Graph representing the case of a non-possible edge breakage.

Condition 2: This condition states that, for every degree d, the sum of all portions T_{d,d′} of vertices whose degree changes from d to d′, over all d′ ∈ D, must equal the portion μ_d of vertices in G having degree d, resulting in the following equality: Σ_{d′∈D} T_{d,d′} = μ_d.
Condition 3: This condition states that the portion ν_{d′} of vertices with degree d′ in the desired distribution must equal the sum of all portions T_{d,d′} over all d ∈ D: Σ_{d∈D} T_{d,d′} = ν_{d′}.
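These two marginal conditions are exactly the row and column constraints of an optimal transport problem. A toy sketch with a generic linear-programming solver (not RTGEN's actual solver; μ, ν and T follow the transport-matrix notation, and the cost |d − d′| counts edge changes per moved vertex):

```python
import numpy as np
from scipy.optimize import linprog

# Source and target degree distributions over degrees 0..2 (toy example).
mu = np.array([0.5, 0.5, 0.0])   # portions of vertices per degree in G
nu = np.array([0.0, 0.5, 0.5])   # desired portions in G'

n = len(mu)
# cost of moving a vertex from degree d to d': |d - d'| edge changes
cost = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).ravel()

# Condition 2: rows of T sum to mu; Condition 3: columns of T sum to nu.
A_eq, b_eq = [], []
for d in range(n):                   # row (source-marginal) constraints
    row = np.zeros(n * n); row[d * n:(d + 1) * n] = 1
    A_eq.append(row); b_eq.append(mu[d])
for dp in range(n):                  # column (target-marginal) constraints
    col = np.zeros(n * n); col[dp::n] = 1
    A_eq.append(col); b_eq.append(nu[dp])

res = linprog(cost, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
T = res.x.reshape(n, n)   # optimal transport plan
```

Here half the vertices must gain one degree in total mass terms, so any optimal plan has cost 1.0; the plan T directly tells the generator what portion of vertices to move between each pair of degrees.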

Figure 4: Gaussian degree distribution of a growth-only graph.

Figure 5: Gaussian degree distribution of a graph with edge deletions.

Figure 6: Zipfian degree distribution of a growth-only graph.

Figure 7: A visualization of the generated graphs with a hierarchical community structure with parameters h = 2, b = 4 and a varying tree parameter.

Figure 8: Modularity value as a function of the tree parameter, ranging from 0 to 32.

Figure 9: Execution time as a function of the number of edges.

Figure 10: Degree-distribution distance metric as a function of the number of edges.

Figure 11: Community-structure distance metric as a function of the number of edges.

Figure 12: Variation of the number of iterations and the execution time as a function of the EMD threshold.
Algorithm CRGG depicts the relative community-aware graph generation procedure. Its inputs are the graph snapshot G, the desired degree distribution ν, the density community matrix M_C, the threshold on the EMD between the obtained and desired distributions, the maximum number of repetitions max_rep, and the current number of repetitions cur_rep; its output is a new graph snapshot G′. Note that cur_rep equals 0 in the first iteration. The transportation matrix T is computed by the function getTransportMatrix from the degree distributions μ and ν. The function getVector builds A and b based on Conditions 1, 2 and 3 and solves the system of equations Ax = b to find the vector x. The total numbers of edges to add (n+) and delete (n−) are then computed from the transportation matrix T. The function getCDFComs computes the cumulative distribution function CDF_C from the density community matrix M_C. Then, the vectors CDF+ and CDF−, representing the cumulative distribution functions of the linkage and breakage pools, and a list of logs (graph updates) L are initialized. The function getCDFPools computes CDF+ and CDF− from the linkage and breakage probabilities. The process of adding and removing edges is repeated n+ and n− times, respectively. In each iteration, communities c and c′ are picked based on CDF_C, and vertices u and v are picked using CDF+[c] and CDF−[c].
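The community and vertex picks in the inner loop boil down to inverse-CDF sampling, which can be sketched as follows (an illustrative fragment, not the CRGG implementation; the toy weights are hypothetical):

```python
import bisect
import random

def build_cdf(weights):
    """Cumulative distribution over items with the given (unnormalised)
    nonnegative weights."""
    total = float(sum(weights))
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    return cdf

def pick(cdf, rng):
    """Inverse-CDF sampling: draw u ~ U(0,1) and return the first index
    whose cumulative mass exceeds u."""
    return bisect.bisect_left(cdf, rng.random())

rng = random.Random(7)
cdf_coms = build_cdf([1, 3])   # community 1 is three times more likely
counts = [0, 0]
for _ in range(10000):
    counts[pick(cdf_coms, rng)] += 1
```

The same primitive serves both levels of the algorithm: one CDF over communities (from the density community matrix) selects c and c′, and per-community CDFs over the linkage and breakage pools then select the endpoints u and v.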