Expected Values Estimated via Mean-Field Approximation are 1/N-Accurate

Mean-field approximation is a powerful tool to study large-scale stochastic systems such as data-centers -- one example being the famous power of two-choice paradigm. It is shown in the literature that under quite general conditions, the empirical measure of a system of N interacting objects converges at rate O(1√N) to a deterministic dynamical system, called its mean-field approximation. In this paper, we revisit the accuracy of mean-field approximation by focusing on expected values. We show that, under almost the same general conditions, the expectation of any performance functional converges at rate O(1/N) to its mean-field approximation. Our result applies for finite and infinite-dimensional mean-field models. We also develop a new perturbation theory argument that shows that the result holds for the stationary regime if the dynamical system is asymptotically exponentially stable. We provide numerical experiments that demonstrate that this rate of convergence is tight and that illustrate the necessity of our conditions. As an example, we apply our result to the classical two-choice model. By combining our theory with numerical experiments, we claim that, as the load rho goes to 1, the average queue length of a two-choice system with N servers is log2 1/(1--ρ) + 1/(2N(1-ρ) +O(1/N2).


INTRODUCTION
Mean-field approximation is a powerful tool for studying systems composed of a large number of interacting objects. The idea of mean-field approximation is to replace a complex stochastic system by a simpler deterministic dynamical system. This dynamical system is constructed by considering that each object interacts with an average of the other objects (the mean-field). When each object has a finite or countable state-space, this dynamical system is usually a non-linear ordinary differential equation (ODE)ẋ = f (x ), where x i (t ) denotes the fraction of objects in state i at time t and f is the drift of the system.
Mean-field approximation is widely used to study the performance of computer-based systems: queueing networks [1], wireless networks [5], dissemination algorithms [6], caching [12], SSDs [24],. . . An important area of application is the analysis of resource allocation strategies in server farms or data centers: such a 16:2 • Nicolas Gast system is composed of a large number of servers that interact because of scheduling or routing strategies [10,17,19,20,22,25]. A typical example is the power of two-choices: Mean-field approximation has been used in [20,25] to show that routing a task to the least loaded of two randomly sampled servers significantly reduces the response time compared to a purely random allocation.
Mean-field approximation is known to be asymptotically exact for many systems. For these systems, the fraction of objects in a given state i at time t, X (N ) i (t ), converges to a deterministic quantity x i (t ), as the number of objects N goes to infinity. The rate of convergence of X (N ) i to x i has been studied by several papers, e.g. [2,15,26], that show that the expected distance between X (N ) and x is of the order of 1/ √ N : This result is a like a central-limit-theorem for mean-field systems: X (N ) (t ) is equal to x (t ) plus 1/ √ N times a Gaussian noise [15]. It was originally proven for finite time-horizon and recently extended to stationary distributions in [26].
Yet, we believe that Equation (1) does not fully explain the accuracy of mean-field approximation. As an example, we provide in Table 1 results obtained by simulation on how the mean-field approximation is accurate for the power of two-choices model 1 of [20,25]. We report the average queue length in steady-state as a function of the number of servers N for ρ = 0.9. We observe that the error made by the mean-field approximation decreases as 1/N , much faster than the 1/ This discrepancy comes from the fact that the error term m N −m ∞ is the distance between the mean-field value m ∞ and an expected function of X (N ) . It is not the expectation of a distance as in Equation (1). We therefore adopt a new point of view in this paper. Instead of studying the distance between X (N ) (t ) and x (t ) as in Equation (1), we study the distance between the expectation of a function of X (N ) (t ) and its mean-field approximation. As a norm is a convex function, we expect the former to be smaller than the latter. We will show in fact that there is an order of magnitude of difference: under mild conditions, for any function h that is twice differentiable, this distance is of order 1/N : As for Equation (1), we show that Equation (2) holds for the transient regime and can be extended to the stationary regime under the same conditions as [26]. As a byproduct, Equation (1) can be recovered from Equation (2) by using h(.) = ∥. − x ∥ 2 . This result shows that an average value estimated via mean-field approximation is 1/N -accurate. In a queuing network such as the two-choice model, the average queue length can be expressed as E[h(X (N ) )]. Equation (2) shows that the average queue length converges at rate O (1/N ) to its mean-field approximation. This is what is observed in Table 1, where E[h(X (N ) )] ≈ h(x ) + 4/N .
Contributions. We prove our result for a generalization of the classical population model of Kurtz in which each object can have a countable state-space.
The classical methods to obtain Equation (1) for transient [2,15] or stationary regime [3] rely on martingale concentration arguments. Our approach is different and uses an idea inspired by Stein's method and [14,21,26] to relate Equation (2) and the convergence of the generators of the Markov chain.
From a theoretical perspective, the main contribution of our paper is to provide a unified framework in which Equation (2) is true: our results hold for the transient and stationary regime and for finite or infinite-dimensional models. Using Stein's method, we show that the O (1/N ) rate holds if the derivative with respect to its initial condition of t 0 Φ s xds exists and is Lipschitz-continuous (where t → Φ t x denote the unique solution of the ODĖ x = f (x ) that starts in x at time 0). We show that, for the transient regime, this holds as soon as the drift f has a derivative that is Lipschitz-continuous. For the stationary regime, it holds when, in addition, the ODE has a unique attractor that is exponentially stable. In the transient case, [14] obtains a O (1/N ) like Equation (2) for a simplified version of our model where a transition affects at most one object. For the stationary regime, a (1) is obtained in [26]. Our proof is inspired by the methodology of [26] but we obtain a stronger result by working with a generic function h. Note that infinite-dimensional models arise naturally when one considers queuing systems with unbounded queues. They are not considered in [14,26].
Another contribution of our paper compared to [14,26] is to characterize what happens when the derivative of the drift is not Lipschitz-continuous. We show that if the drift is differentiable and its derivative is α-Hölder continuous, then the convergence rate occurs at rate O (1/ √ N 1+α ). We provide a numerical example in Section 4 that shows that in general this exponent is tight and that having a Lipschitz-continuous drift is not sufficient to obtain Equation (2). This contrasts with the convergence rate of Equation (1) that is known to be O (1/ √ N ) as soon as the drift is Lipschitz-continuous (or one-sided Lipschitz-continuous) for the transient case. Hence, only Lipschitz-continuity (or one-sided Lipschitz-continuity as in [11,22]) is not sufficient to guarantee a 1/N -accuracy of the mean-field estimates.
From a performance analysis perspective, one of the main insights provided by our result is that for many mean-field models and performance functions h, we have This provides a refinement of the mean-field approximation: the performance of a system of size N can be precisely estimated from two quantities: its mean-field approximation h(x ) and the constant C corresponding to the error of the mean-field approximation. The constant C can be estimated analytically or by simulation for a small system size and then extrapolated to larger system sizes. As an example, we study in detail the convergence rate of the classical power-of-two-choice model of [20,25]. This model shows that routing an incoming job to the shortest of two randomly sampled queues greatly reduces the average queue length : as N goes to infinity, for a system with load ρ ≈ 1, the queue length equals Θ(log 1/(1 − ρ)) compared to 1/(1 − ρ) in the case of purely random allocation. We show that for any fixed ρ, the average queue length in a N -server two-choice system is at distance d ρ /N of its mean-field approximation. We provide numerical evidence that this d ρ grows as 1/(1 − ρ) as ρ goes to one. As a result, we conjecture that the average queue length in a N -server two-choice system is where Θ(1/(1 − ρ)) is equivalent to 1/(2 − 2ρ) as ρ goes to 1. The above equation is constructed by assuming first that N goes to infinity and then that ρ goes to one. This result should be contrasted with the heavy-traffic limit of [9], who study the case where the load depends on N and goes to one as N goes to infinity: ρ N = 1 − 1/η N with lim N →∞ η N = ∞. The authors show that the mean-field approximation always under-estimates the average queue length. They also establish a heavy-traffic regime when η N grows sub-linearly. Our result suggests that for this scaling, the difference between the average queue length of the system of size N and its mean-field limit is of order 1/(N (1 − 1ρ N )) = η N /N → 0. A comparison of both results is not straightforward as the considered metric is different. Yet, our result confirms that when η N grows sub-linearly, there is little difference between the system of finite N and the mean-field.
Reproducibility. The code to reproduce the paper -including simulations, figures and text -is available at https://github.com/ngast/meanFieldAccuracy.
Roadmap. The paper is organized as follows. We introduce the model in Section 2. We provide the main theoretical results in Section 3: the transient regime and its extension to stationary regime when the ODE has a unique exponentially stable attractor. We illustrate the necessity of the regularity assumptions in Section 4. We apply these results to the two-choice model in Section 5. Section 6 contains the proof of the most technical lemmas. Finally, we conclude in Section 7.

MEAN-FIELD MODEL 2.1 Population Processes
We consider population processes described by the classical model of density-dependent population process of [15,16]. We recall the definition here. A population process is a sequence of continuous-time Markov chains (CTMC) (X (N ) ). For each N , X (N ) evolves on a subset of some Banach space 2 (E, ∥.∥). We assume that there exists a set of vectors L ∈ E and a set of functions β ℓ : E → R + such that X (N ) jumps from x to x + ℓ/N at rate N β ℓ (x ) for each ℓ ∈ L.
We will refer to the chain X (N ) as the system of size N , although N does not a priori correspond to the system's size. For such a system, we define the drift f as This definition means that f (x )dt is the expected variation of a chain X (N ) that would start in state x during a small time interval dt. It does not depend on N .
Mean-field interacting system. A particularly interesting subclass of population processes is the case of meanfield interacting systems. Such a system is composed of N objects. Each object lives in a finite or countable state-space S. The state of the object n at time t is denoted by S (N ) n (t ). Let X (N ) (t ) be the empirical measure at time t: for i ∈ S, X (N ) i (t ) is the fraction of objects in state i at time t: We say that this system is a mean-field interacting system if (X (N ) ) is a population process. The process X (N ) lives in P (S), the set of probability measures on S. Note that if all objects are exchangeable, we have The two-choice model is an example of mean-field interacting system (see Section 5).
Imperfect population processes. In some cases, it is useful to consider models in which the set of transitions L or the transition functions β ℓ vary with the system size. We refer to this case as the imperfect population processes and we add a superscript N to all quantities that depend on N . The drift of an imperfect model depends on N and will be denoted by

Notations
For a function h : E → F , we will denote by Dh the derivative of h (if it exists). For α ∈ (0, 1], a function h is α-Hölder continuous with constant L if for all x, y ∈ E: Lipschitz-continuity corresponds to the case α = 1: a function is Lipschitz-continuous if it is 1-Hölder continuous. The set C 1+α (E, F ) denotes the set of functions h : E → F that are differentiable and such that the first derivative of h is α-Hölder continuous. The set C 1 (E, F ) denotes the set of functions h : E → F that are differentiable and such that the first derivative of h is continuous and bounded.
In the theorems, we assume that f is Lipschitz-continuous. This implies that, for any initial condition the ODĖ x = f (x ) has a unique solution. For t ≥ 0, we denote by Φ t x the value at time t of the solution that starts in x at time 0. It satisfies We denote by L (N ) the generator of the system of size N and by Λ the generator of the ODE. They associate to each function h that is differentiable, two functions L (N ) h and Λh that are defined as By definition of the drift: (Λh)(x ) = ℓ ∈ L β ℓ (x )Dh(x ) · ℓ.

CONVERGENCE RESULTS
The classical approach to show the convergence of a stochastic system to its mean-field approximation is to obtain a bound on the expected distance between X (N ) (t ) and its deterministic limit Φ t x. Our approach is different in the sense that we obtain a bound of the distance between the expectation of X (N ) (t ) and Φ t x. We show that the latter is in general much smaller than the former:

Transient Regime
Our first result concerns the transient regime. We show that in this case, the Lipschitz-continuity of the derivative of the drift suffices to show that an average value estimated by the mean-field approximation -h(Φ t x ) -is at distance O (1/N ) from the true value -E[h(X (N ) (t ))]. One application of this result is when one wants to compute the marginal law of an object of a mean-field interaction model. This result combined with Equation (4) implies that the marginal law of one object converges at rate O (1/N ) to its mean-field approximation: where x (t ) denotes the value at time t of the unique solution of the ordinary differential equation starting in x ≈ X (N ) (0). This result complements functional central limit theorems -such as the results of [16] -that show that the sample paths of X (N ) are at distance O (1/ √ N ) of x. Note that the conditions to apply Theorem 3.1 are easy to verify and can be done by a syntactic analysis of the model. In Section 3.4, we show that this result can be modified in the case of drifts whose derivatives are α-Hölder continuous.
Assume that the derivative of the drift exists and that both the drift and its derivative are Lipschitz-continuous. Let h : E → R be a differentiable function such that both h and its derivative are Lipschitz-continuous with constant L.
Proof. The proof can be decomposed in three steps. The first step is to reduce Equation (5) to a convergence of the generators. For that, we use Lemma 6.1 that shows The second step is to remark that if the derivative of a function д is Lipschitz-continuous with constant L, then L (N ) д converges to Λд at rate N −1 . Indeed: The third step is provided by Lemma 6.2 which shows that when the derivative of the drift is Lipschitzcontinuous, the derivative of the function x → Φ s x exists and is also Lipschitz-continuous with constant A s . Hence, the derivative of x → h(Φ s x ) is Lipschitz-continuous with some constant LA s . We can then plug Equation (7) The statements and the proofs of the two lemmas are postponed to Section 6.1. □

Stationary Regime
We now focus on the stationary distribution of X (N ) and how it concentrates on the stationary regime of the ODE. It is shown in [2] that if the ODE has a unique attractor (i.e. a fixed point to which all trajectories converge), then the stationary measure of the system of size N concentrates on this fixed point x * as N goes to infinity. In this section, we show that the rate at which it concentrates is O (1/N ) under the condition that the ODE has a unique attractor that is locally exponentially stable (as in [26]).
We say that a point x * is a locally exponentially stable attractor if there exists a bounded neighborhood V of x * , and two constants a, b > 0 such that In practice, condition (A1) is easy to verify, as it depends mainly on whether the ODE is locally stable. This can be done by a numerical procedure that linearizes the drift f around its fixed point and computes the eigenvalues of its Jacobian matrix. The most difficult condition to verify is (A2): proving that the fixed point of an ODE is a global attractor is difficult because it depends not only on the form of the ODE but also on precise values of its parameters [2,7]. Note (A2) is necessary to apply the classical concentration results of [2] which means that our conditions are not more difficult to verify than the ones of [2].
We assume that for N large enough, the stationary measure of the system of size N exists. Moreover, in order to deal with possibly non-compact state-space E, we assume that, as N goes to infinity, the stationary measure concentrates on a bounded set B of E at rate O (1/N 2 ), i.e.
(A3) There exists a bounded set B and a constant B > 0 such that P[X (N ) B] ≤ B/N 2 (in steady-state).
If the system evolves in a compact space, assumption (A3) is trivially satisfied by taking B = E. In the case of unbounded state-space E, we know from [2] that the stationary measure converges on the Dirac measure in x * as N goes to infinity. The condition (A3) is a bound on how far X (N ) can deviate from x * . For the two-choice model, we show it by bounding X (N ) by a system of N independent M/M/1 queues and then by using a large-deviation type results (see the end of Section 6.4).
Theorem 3.2. In addition to the assumptions of Theorem 3.1, assume that • the ODE has a unique attractor x * that is locally exponentially stable (assumptions (A1) and (A2)) • either E is bounded or the stationary measure concentrates on a bounded set B at rate O (1/N 2 ) (Assumption (A3)).
Then for any h ∈ C 2 (E, R) bounded, the constant C (t ) in Theorem 3.1 is uniformly bounded (in t). In particular, there exists a constant C < ∞, such that: where E (N ) [.] denotes the expectation with respect to the stationary distribution of the system of size N .
Proof. Without loss of generality, we assume that h(x ) = h(x * ) outside a bounded set. Indeed, if this is not the case, we may replace h by a function h ′ that is equal to h on B and equal to h(x * ) outside a bounded set. In which case, by Assumption (A3) and because h is bounded, The first step of the proof is then to construct Equation (10). This equation is the analog of (6) as t → ∞. Following Stein's method and [26], we define the function Gh as The previous integral is well defined because x * is exponentially stable. By differentiating in s = 0 the quantity ) dt with respect to s, it can be shown that Gh satisfies the Poisson equation: As L (N ) is the generator associated to the stationary measure, for any function h ′ such that , and combining this with (9), we get: The previous equation is essentially a generalization of [26,Equation (4) and Equation (7)]. We then decompose Equation (10) in two terms: As shown by Equation (7)

Extension to Imperfect Models
The above results apply for exact mean-field models. For some mean-field models, the set of transitions L N or the rates β ℓ depend on the system size. In this section, we show that the above results can be extended to the case of imperfect models. Note that this theorem does not require that the convergence of the set of transitions of the rate functions but only that f N , the drift of the system of size N , converges uniformly to a function f .
(ii) If the other assumptions of Theorem 3.2 are satisfied, then Theorem 3.2 holds, replacing Equation (8) by Proof. One key ingredient of the previous Theorems is Equation (7), that shows that L (N ) converges to Λ at rate O (1/N ). In an imperfect population process, the generator of the system of size N is Let Λ (N ) be the generator of the ODE associated to f N : It follows that Similarly to Equation (7), the first term is of order

Hölder Continuous Derivative
In this section, we study what happens when the derivative of the drift is not Lipschitz-continuous. When the drift f is Lipschitz-continuous with constant L, it is shown in [15] that there exists a constant C such that for all t: In the previous section, we have shown that when the derivative of the drift exists and is Lipschitz-continuous, the convergence is in O (1/N ). The next theorem, whose proof is postponed to Section 6.3, characterizes what happens in between, i.e. when the derivative of the drift exists and is α-Hölder continuous with α ∈ (0, 1).
Assume that the drift is Lipschitz-continuous with constant L and that its derivative exists and is α-Hölder continuous. Let h : E → R be a differentiable function such that h and its derivative Dh are α-Hölder continuous with constant L. Then, there exists C < ∞ such that for t > 0: A key difference between this result and Theorem 3.1 lies in the methodology of proof. On the one hand, the proof of Theorem 3.4 involves Gronwall's lemma which leads to a constant e H t that grows exponentially with time which therefore cannot be easily adapted to the stationary regime. On the other hand, the proof of In fact, the proofs of Theorem 3.1 and 3.2 can be adapted to the case of α-Hölder continuous derivative. The only difference is that, for a function д that has a α-Hölder continuous derivative, we have (L (N ) − Λ)д = O (1/N α ) instead of Equation (7). This means that the α-Hölder continuity of t 0 DΦ s ds implies a convergence rate of O (1/N α ) (for t < ∞ and t = +∞). As a consequence, if the ODE has a locally exponentially stable attractor x * , if the derivative of the drift is α-Hölder continuous and if h equals 0 outside a bounded set, then in stationary regime we have More details are given in Sections 6.1 and 6.2.
To the best of our knowledge, Equation (11) is the first result that guarantees a rate of convergence of the steady-state distribution of a mean-field model in the case where the derivative is not Lipschitz-continuous ( [26] considers models that evolve in a compact space and that have a twice-differentiable drift which implies that the derivative of the drift is Lipschitz-continuous).

NECESSITY OF THE CONDITIONS
In this section, we examine the necessity of the regularity conditions of the drift. We provide an example that shows that when the drift is Lipschitz-continuous (and not differentiable), the rate of convergence of |E[X (N ) ] −x | is not faster than 1/ √ N . When the derivative of the drift is α-Hölder, we show that the convergence is not faster . This contrasts with the case of E[∥X (N ) − x ∥] that is known to converge at rate Θ(1/ √ N ) in all cases.

Case-Study: a Birth-Death Process
We use a small example for which we are able to numerically solve the Kolmogorov equations associated to the N -dimensional Markov chain. This avoids estimation errors due to simulations.
It should be clear that this process is a population process with drift f . For α = 0, f is Lipschitz-continuous and not differentiable. For α = 1, f is differentiable and its derivative is Lipschitz-continuous. For α ∈ (0, 1), the drift is differentiable and its derivative is α-Hölder continuous. In the remainder of this section, we study how α affects the convergence rate of E[X (N ) ] − x and of E[∥X (N ) − x ∥]. An important observation is that the Lipschitz-continuity of the drift is not sufficient to obtain an O (1/N ) convergence. This example shows that mean-field approximation is more accurate when the drift is not only Lipschitz-continuous but also has a derivative that is Lipschitz-continuous.

Influence of the Hölder Exponent
In Figure 3, we study in more detail the impact of the exponent α on the rate of convergence in transient and stationary regime. In both cases, we observe that the exponent (1+α )/2 given by Theorem 3.4 cannot be improved. We used a numerical procedure to compute the values of E[|X (N ) (t ) − x (t )|] and of |E[X (N ) (t )] − x (t )| for α ∈ [0, 1]. The value for t = 1 is computed by integrating the Kolmogorov equation while the steady-state value is easily obtained because the process is a birth-death process. In all tested cases, we observed a convergence in cN −e . In Figure 3

APPLICATION : TWO-CHOICE MODEL
The two-choice paradigm is a well studied model since [20,25]. In this section, we show that this model satisfies the assumptions of Theorems 3.1 and 3.2. We use these theorems to show that the marginal law and the average queue length in steady-state converge at rate O (1/N ) to their mean-field approximation. We also evaluate the constant of the O (1/N ) numerically and provide numerical evidence that it evolves as 1/(2 − 2ρ) as the load of the system ρ goes to 1. By using this, we develop an improved approximation that is O (1/N 2 )-accurate.

Two-choice Model
We consider the supermarket model of [20] with N identical servers. Each server maintains a separate queue. Jobs arrive in the system according to a Poisson process with intensity N ρ and the processing time of a job is exponentially distributed with mean 1. We denote by S (N ) n (t ) the queue size of the server n at time t. For each incoming job, two queues (say i and j) are selected uniformly at random and the job is allocated to the smallest queue among those two. Note that for all N , (S (N ) 1 (t ) . . . S (N ) N (t )) is a CTMC and has a unique stationary distribution when ρ < 1.
Let X The first type of events correspond to a departure from a queue with i jobs. The second term corresponds to an arrival at a queue with i − 1 jobs : An arrival is allocated to a queue with i − 1 jobs if both queues have at least i − 1 jobs and not both queues have i or more jobs.
The form of these transitions indicates that X (N ) is a population process, as defined in Section 2. The corresponding mean-field approximation of this system is given by the following infinite system of ODEs: It is shown in [20] that this ODE converges exponentially fast to its unique fixed point π = (π 0 , π 1 , . . . ), where This guarantees that the stationary distribution of X (N ) concentrates on π as N goes to infinity. In the following, we analyze the rate at which this concentration occurs.

Rate of Convergence
The only difficulty for applying the results of Section 3 is that the queue sizes are unbounded and that therefore, X (N ) lives in an infinite-dimensional space. In the following, we equip the space of infinite sequences with a weighted ℓ 1 -norm with a carefully chosen sequence w. We denote the norm of an infinite sequence x by ∥x ∥ w : Let E be the set of sequences x such that ∥x ∥ w < ∞. The process X (N ) lives in a subset of E for each N : Indeed, the sequence (X (N ) i ) i is non-negative, non-increasing, and as there is a finite number of jobs in each server, there is only a finite number of indices i for which X (N ) i > 0. Such a sequence satisfies ∥X (N ) ∥ w < ∞ and is therefore in E.
The next result, whose proof is given in Section 6.4, shows that there exists a sequence of weights w such that the model satisfies the assumptions of Theorem 3.2.
Theorem 5.1. There exists an increasing sequence of weights (w i ) i ≥0 and a constant B such that the two-choice model satisfies the assumptions of Theorem 3.2 and such that in steady-state: Theorem 3.2 implies that if h is a bounded function, then the quantity E (N ) [h(X (N ) )] converges to h(π ) at rate O (1/N ). This is the case for the marginal law of one server by using the function h(x ) = x i which is bounded by 1. This shows: This result sharpens the result of [18] in which a convergence rate of O (log N /N ) was obtained. Equation (12) implies that for any h that grows at most linearly, E (N ) [h(X (N ) )] converges to h(π ) at rate O (1/N ).

Corollary 1.
Let h : E → R be a twice-differentiable function. Assume that there exists constants a, b such that for all x ∈ E: |h(x )| ≤ a + b ∥x ∥ w for the sequence w defined in Theorem 5.1. Then the stationary measure of the N -server two-choice model satisfies

16:14 • Nicolas Gast
For a vector X ∈ E, the quantity N i ≥1 X i is the total number of jobs in the N servers of the model. As the weight sequence w is increasing, the function h(x ) = i x i satisfies h(x ) = i x i ≤ ∥x ∥ w /w 1 . As a result, the expected number of jobs satisfies Our result can also be used to bound the mean-square deviation by using This shows that for the two-choice model: This last result improves the bound of [27], in which a convergence rate of O ((log N ) 3 (log log N ) 6 /N ) is obtained. The factor (log N ) 3 (log log N ) 6 of [27] comes from the fact that the author studies a truncated finite-dimensional model in order to study the original model. This factor is then a bound on the error between the finite and infinite models. One advantage of our approach is that we treat directly the infinite-dimensional system. To work in the infinite-dimensional case, the main technical difficulty for applying our approach is to show that P[∥X (N ) ∥ ≥ B] = O (1/N 2 ). In our proof of Theorem 5.1, we prove it by using a coupling to bound the process by a system of N independent M/M/1 queues. We then show in Lemma 6.5 that the sum of N independent variables have a third moment and therefore concentrate on a bounded set at rate O (1/N 2 ).

Numerical Evaluation
In this section, we evaluate numerically the O (1/N ) of (13). We provide numerical experiments that show that when ρ is close to 1, the difference between the average queue length of the N -server model and the mean-field approximation is approximately ρ 2 / (2N (1 − ρ)).

Methodology.
In order to evaluate the term O (1/N ), we use a stochastic simulator 3 of the two-choice model. Our simulator follows strictly the stochastic model that we described in Section 5.1.
To sample from the stationary distribution, we use a starting period of 10 4 N events starting from an empty system. We then draw one sample of a performance indicator by computing an average over the next 10 6 events. This gives us one sample. To guarantee confidence intervals of order 1/N , the reported results are averages over K (N ) = max(500, N 2 /100) samples. The error bars that are shown on the figures correspond to the standard confidence intervals for the mean (i.e. ±2σ / K (N ), where σ 2 is the empirical variance of our K (N ) samples). The time to simulate K (N ) samples grows as N 2 : from 10 seconds for N = 100 to more than 1 hour for N = 1000.

Marginal law of one server.
We first study the convergence of the marginal distribution of one object to π . For the system of size N , we denote by p N i the probability that a server has i or more jobs. As the servers are symmetric, p N i equals By Theorem 5.1, the difference between p N i and π i is of order O (1/N ). In Figure 4, we display the quantity N (p N i − π i ) as a function of N . We report the values for ρ = 0.9 and i ∈ {2, 3, 4, 5}. The quantity N (p N i − π i ) converges rapidly to a constant value.  We also performed simulations for other values of ρ ∈ [0.5, 0.99] for which we observe similar behavior. This suggests that for each ρ ∈ [0, 1) and i ≥ 1 there exists a constant c i, ρ such that 5.3.3 Average queue length. We now focus our attention on the average queue length (in steady-state). We denote by m N (ρ) the average queue length in the system of size N and by m ∞ (ρ) = i ≥1 π i its mean-field approximation. By We verify this numerically in Figure 5, where we display N (m N (ρ) −m ∞ (ρ)) as a function of N for various values of ρ. We observe that in each case, this value stabilizes quickly. This stabilization is slower as ρ approaches 1. Equation (15) shows that it is possible to obtain a more precise approximation by considering the mean-field approximation plus the d (ρ)/N term. In Table 2, we study the accuracy of this improved approximation in the case ρ = 0.9. This table should be compared with Table 1 (shown in the introduction) that shows that the mean-field approximation is 1/N -accurate. The improved approximation is indeed much closer to the value obtained by  Table 2. Average queue length in the two-choice model (for ρ = 0.9). The values for a finite number of servers N are obtained by simulation. The error is the distance between m N and the corrected mean-field approximation m ∞ (ρ) + d (ρ)/N . It is proportional to 1/N 2 .
simulation: We observe that the error of the improved approximation, Table 2 is the next term in the asymptotic expansion of m N (ρ), after the first term in d (ρ)/N .
Evolution of d (ρ). As observed in Figure 5, the constant d (ρ) increases rapidly as ρ approaches 1. To evaluate how d (ρ) evolves with ρ, we plot in Figure 6(a) an estimate of the quantity d (ρ) as a function of ρ. Our estimate of d (ρ) is the experimental value for N = 1000, 1000(m 1000 (ρ) − m ∞ (ρ)). We also plot the function ρ 2 /(2 − 2ρ). We observe that this function fits very well the values for ρ ∈ [0.5; 0.99]. To confirm this, we also plot d (ρ)(1 − ρ)/ρ 2 as a function of ρ in Figure 6(b). This figure suggests that it converges to 1/2 as ρ goes to 1.
The average queue length of the mean-field limit has been shown in [20] to be of order log(1/(1 − ρ)) as ρ tends to 1: more precisely, lim ρ→1 m ∞ (ρ)/ log(1/(1 − ρ)) = 1/ log 2. This is an order of magnitude smaller than when jobs are allocated at random in which case the average queue length is 1/(1 − ρ). Our numerical result suggests that for a finite system size N , a term in O (1/ (N (1 − ρ))) remains.
We conjecture that the average queue length of a N -server two-choice system satisfies with a hidden constant of 1/2 in the term Θ ρ→1 (1/(1 − ρ)). This term is equal to d (ρ) and seems to satisfy

PROOFS Notations
For a function f : E → F and д : F → G, we denote by D f (x ) the derivative of the function f evaluated at x. D f (x ) is a linear map that associates to each u the quantity D f (x ) · u. We denote by D ( f (д(x ))) the derivative of the function f • д with respect to x evaluated at x and by D f (д(x )) the derivative of f evaluated at д(x ). It follows that D( f (д(x ))) = D f (д(x )) · Dд(x ).
We define the following operators Ψ (N ) t and Φ t that operate on h : E → R:

Proof of Theorem 3.1 (Transient Regime)
The proofs of the following lemmas follow similar ideas as the proof of [14,Theorem 1]. Yet, we provide a complete proof here as our result generalizes [14, Theorem 1] for two reasons. First, the set of transitions that we consider is more general (a transition in the model of [14] can only affect one object). Second, our results apply for Banach space and not just for finite dimensional sets.
Proof. By definition of the operators Ψ (N ) and Φ introduced in (16) and (17), the left-hand side of Equation (18) can be written as We are therefore interested in the convergence of Ψ (N ) As in [14, Theorem 1], we use the following trick : The function s → Ψ (N ) t −s Φ s is equal to Φ t for s = t and Ψ (N ) t for s = 0. Hence: For any function h ′ , one has This shows that for any h and any x: which is Equation (18). □ Remark: Although Theorem 3.1 is stated only in the Lipschitz-continuous case, the proof in this section holds for α-Hölder continuous. This does not require more work and will be useful for the proof of Equation (11). Lemma 6.2. Assume that D f is bounded and is α−Hölder continuous with constant L. Then, there exists a constant β such that, for any t, Φ t is α-Hölder continuous with constant Le β t .
Proof. Let x, e ∈ E. By [4, Theorem 3.7.1 of Part II], DΦ t (x ) and DΦ t (x + e) exist and are differentiable with respect to time. Moreover, d By using the Hölder-continuity of D f and the fact that ∥DΦ t ∥ ≤ e t ∥Df ∥ , the second term of Equation (20) is bounded by The quantity ∥F t ∥ can then be bounded by using Gronwall's Lemma: The result follows by setting β = (1 + α )∥D f ∥. □

Proof of Theorem 3.2 and of Equation (11) (Stationary Regime)
Remark: As for Lemma 6.2, we directly prove the result in the case of α-Hölder continuous derivative. Theorem 3.2 can be obtained by considering the special case α = 1. Lemma 6.3. Let Φ t be the flow associated to a drift f : E → E that is differentiable with a bounded derivative. Let B ⊂ E be a bounded set. Assume that f is α-Hölder continuous and that the flow has an exponentially stable attractor x * .
(i) There exists b, δ > 0 such that for any t: (ii) There exists a constant c such that, the derivative of x → Φ t x exists and is α-Hölder continuous on B with constant ce −δ t . (iii) The function Gh, defined by is differentiable on B and its derivative is α-Hölder continuous.
Proof. We will first prove (i), then prove that (i) implies (ii) and then prove (iii). The most technical part of our proof is to show that (i) implies (ii). Its proof is inspired by the proof of [8,Lemma C.1] that shows that the exponential decay of the first derivative of the flow implies the exponential decay of the second derivative of a flow.
Proof of (i) -To ease the notation, we assume without loss of generality that x * = 0. By assumption, 0 is exponentially stable, which means that there exists β > 0 such that ∥Φ t (x )∥ ≤ e −β t ∥x ∥. Let where the functions д and h satisfy lim x →0 ∥д(x )∥/∥x ∥ = 0 and lim x →0 ∥h(x )∥ = 0. Let δ = β/2. By assumption, there exists a neighborhood N of 0 such that ∥д(x )∥ ≤ δ ∥x ∥ and ∥h(x )∥ ≤ δ for x ∈ N . As the flow is stable, there exists a neighborhood N ′ of 0 such that if x ∈ N ′ , then Φ t x ∈ N for all t ≥ 0.
The flow satisfies As the flow is exponentially stable, it is also linearly stable: ∥e At ∥ ≤ e −β t . Hence, for all x, x + y ∈ N ′ : This shows that Φ t is Lipschitz-continuous on N ′ with a constant less than ae −δ t with a = sup y ∈N ′ ∥y∥. By Lemma 6.2, DΦ t is continuous, which implies that ∥DΦ t ∥ ≤ ae −δ t . The set B is a bounded subset of E and 0 is exponentially stable. Hence, there exists T such that for all x ∈ B, Proof of (i)⇒(ii) -As in the proof of Lemma 6.2, let F t = DΦ t (x + e) − DΦ t (x ). Following, Equation (20), F t satisfies the differential equation with initial condition F 0 = D(Φ 0 (x + e)) − DΦ 0 (x ) = 0. The homogeneous part of the differential equation isȦ = D f (Φ t (x + e))A. It is a linear differential equation on L(E, E), the set of linear map from E to E. Let A t,s : L(E, E) → L(E, E) be the solution operator 4 of this homogeneous differential equation.
Let F : R + → L(E, E) be the function defined by Let us show that F is indeed the function F defined at the beginning of the proof by showing that it is a solution of the differential equation (21). By definition of F , we have F 0 = 0. Moreover, for t > 0, we have The solution operator A satisfies A t,s 1 = DΦ t −s x. Hence, by using (i) that ∥DΦ t ∥ ≤ ae −δ t , we have ∥A t,s ∥ ≤ ae −δ (t −s ) . This shows that where L is the Hölder constant of D f . By using again (i) (∥DΦ t ∥ ≤ ae −δ t ), we have Plugging this bound into Equation (22) shows This shows that ∥F t ∥ ≤ ce −δ t ∥e ∥ α with c := La 2+α /(δα ).
Proof of (iii) -Lemma 6.2 shows that for all t, the derivative of Φ t exists and is α-Hölder continuous. By (i), the derivative of Gh exists and is equal to By (ii), D(Φ t h) is α-Hölder continuous with constant Lce −δ t . This shows that the function D(Gh) is α-Hölder continuous with constant Proof. Let W be a bounded set outside containing the set V of Assumption (A2) and outside which h(x ) is equal to h(x * ). For x ∈ E, let S (x ) be the first time at which the solution of the ODE that starts in x enters Wthe function S (x ) exists because V ⊂ W and V attracts all trajectories. We have As W is bounded and Gh is continuous, there exists v < ∞ such that sup x ∈W ∥Ghx ∥ ≤ v < ∞. Moreover, by definition of S (x ), we have Φ S (x ) x ∈ W which implies that for all x, ∥Ghx ∥ ≤ v. This shows that This shows that which is of order O (1/N ) by Assumption (A3). □

Proof of Theorem 3.4 (Hölder-continuity)
To simplify the notation, we assume that X (N ) (0) = x. By [16], X (N ) is equal to As D f is α-Hölder continuous, for any x, y ∈ E we have As a result, for a random variable Y , we have Applying this to Equation (23) shows that

Proof of Theorem 5.1 (Two-choice)
To prove that the two-choice model satisfies the assumptions of Theorem 3.2, the only difficulty is to show that the fixed point is asymptotically stable (assumptions A1 and A2) and to show that the stationary measure concentrates on a bounded sub-set of E (assumptions A3). The existence of a sequence of w such that the two-choice model is asymptotically stable for the weighted ℓ 1 -norm ∥x ∥ = i w i x i has been shown in [20,Theorem 3.6]. To prove our concentration result, we also need that this sequence grows slower than ρ −i /3 . We construct such a sequence below. According to [20, Proof of Theorem 3.6], the quantity V (x ) = ∥x − π ∥ w satisfies d dt V (Φ t x ) ≤ −δV (x ) if the sequence w i is positive and satisfies where π i = ρ 2 i −1 is the fixed point of the ODE. Let a = ρ −1/4 and let us show that there exists a sequence (w i ) and a constant δ > 0 that satisfy (24) and such that w i = O (a i ).
As i * is finite, there exists δ * such that for δ ≤ δ * , the sequence w i is increasing and satisfies (24) for i < i * . Let δ = min(δ * , (a − 1)(b − a)/(ab)). For i > i * , we have this shows that w i satisfies Equation (24) for δ = min(δ * , (b − a)(a − 1)/(ab)) > 0. According to [23,Theorem 4] (see also [13,Theorem 4.1]), there exists a coupling between X (N ) and a variableX (N ) picked according to the stationary distribution of N independent M/M/1 queues such that for all j, As the sequence w is increasing, w i − w i−1 ≥ 0. This implies that (for this coupling): In stationary regime, the quantityX (N ) can be written asX (N ) i = (1/N ) N n=1 1Z n ≥i where the variablesZ n are i.i.d. and are geometrically distributed with parameter ρ. The sequence w defined above satisfies w i ≤ ca i . This implies that i j=1 w i ≤ c ′ a i , where c ′ = ca/(a − 1). Therefore, in the stationary regime, we have where c ′ < ∞ is the constant defined above. To conclude, we use the following concentration lemma that we apply with ξ n = aZ n . Note that quantity E[∥ξ n ∥ 3 ] = E[a 3Z n ] is finite because a < ρ −1/3 andZ n is geometrically distributed with parameter ρ.
By taking B ′ = E[ξ 1 ] + 1, we have where we use Markov's inequality to show that P ∥S N ∥ ≥ 1 = P ∥S N ∥ 3 ≥ 1 3 ≤ B/N 2 . □ In this paper, we revisited the rate of convergence of mean-field models. Most of the results in the literature show that the expected distance between the empirical measure X (N ) and its mean-field approximation x converges to 0 at rate O (1/ √ N ). Our approach is different : we study the distance between the expectation of X (N ) and x and we show that this distance decreases as O (1/N ). The latter quantity is natural when one wants to compute expected performance indicators such as the average queue length or the marginal law of one object. We have shown that this convergence rate of O (1/N ) holds for finite and infinite-dimensional models. It holds for the transient regime and also for the stationary regime when the limiting ODE has a locally exponentially stable attractor.
We believe that the potential of applicability of our method is large. Most of the discrete-space mean-field models found in the literature satisfy our assumptions. For these models, our results provide new insights on how accurate mean-field approximation is. They also suggest that in many cases, the mean-field approximation can be refined by a term C/N . An easy way to evaluate this constant C is to compute the gap between the mean-field approximation and a simulation for a small system size (N ≈ 10) and then to extrapolate to larger values of N . For future work, we are currently working on numerical and analytical methods to directly compute the constant C without simulation. This will provide new analytical and numerical tools to build approximations that depend on the system size N . We expect these new approximations to be more accurate than mean-field, especially when the system size is small (for example when N is a few tens).

ACKNOWLEDGMENT
The author would like to thank Romain Joly and Jaap Eldering for fruitful discussions on the link between asymptotic stability and the dependence on initial conditions. The author is grateful to Bruno Gaujal, Debankur Mukherjee and Sem Borst whose insightful comments substantially improved this paper. This work is partially supported by the EU project QUANTICOL, 600708.