Performance Analysis Methods for List-Based Caches With Non-Uniform Access

List-based caches can offer lower miss rates than single-list caches, but their analysis is challenging due to state space explosion. In this setting, we propose novel methods to analyze performance for a general class of list-based caches with tree structure, non-uniform access to items and lists, and random or first-in first-out replacement policies. Even though the underlying Markov process is shown to admit a product-form solution, this is difficult to exploit for large caches. Thus, we develop novel approximations for cache performance metrics, in particular by means of a singular perturbation method and a refined mean field approximation. We compare the accuracy of these approaches to simulations, finding that our new methods rapidly converge to the equilibrium distribution as the number of items and the cache capacity grow in a fixed ratio. We find that they are much more accurate than fixed point methods similar to prior work, with average errors typically below 1.5% even for very small caches. Our models are also generalized to account for synchronous requests, fetch latency, and item sizes, extending the applicability of approximations for list-based caches.


I. INTRODUCTION
REPLACEMENT policies provide a strategy to select items to replace in a cache and are an important performance factor influencing the design of web systems [1], content delivery networks [2], edge networks [3], and peer-to-peer traffic [4], among others. The focus of this paper is on the theoretical analysis of randomized policies operating within a single cache, a problem that has attracted much attention over the years [5]-[13]. We consider list-based caches, which can deliver lower miss rates than single-list caches [14], [15], but due to state-space explosion issues are still not fully understood from a theoretical standpoint. We focus in particular on a general class of list-based caches that admit a tree structure for the lists, rather than a linear one as in earlier work on these models [15], and where access to the cache is non-uniform, in the sense that items are promoted within the tree based on probabilities that depend on the item, its current list, and the stream requesting it. Splitting the cache into a tree structure can be used to differentiate the way a cache protects items from eviction. For example, items of one stream may be stored in a different list than items of another stream to avoid cache pollution and optimize hit ratios. Our reference model consists of a cache partitioned into $h$ lists, each having a capacity of $m_l \ge 1$ items, $l = 1, \dots, h$. Let the capacity vector be $m = (m_1, \dots, m_h)$, so that the cache has total capacity $m = \sum_{l=1}^{h} m_l$. The total number of items is $n$ and it is assumed to exceed the cache capacity ($n > m$), so that a replacement policy needs to decide which items to evict to make room for a new item. The challenge in designing effective replacement policies is that item popularity is not known beforehand; otherwise one could minimize miss rates by keeping the most frequently accessed items in the cache [16]. The focus is therefore on policies that self-organize item storage so as to optimize miss rates.
This paper stems from the recent development in [15] of a framework to analyze linear list-based caches with the random replacement (RR), first-in first-out (FIFO), or CLIMB policies, under the independent reference model (IRM) [17], [18]. In the list-based setting, these policies generalize into RR(m), FIFO(m), CLIMB(m) [15], [19] and can deliver lower miss rates than single-list caches, even when the latter are equipped with the LRU policy. In the context of non-uniform access probabilities, we further evolve these policies into two policies called RR-C(m) and FIFO-C(m), which may be seen as list-based extensions of policies for single-list caches where probabilities are used to capture access cost [16].
Our main contributions to the performance analysis of these caches are as follows. After showing that known product-form results generalize to the class of models we consider, we develop novel exact and approximate performance analysis methods to investigate RR-C(m) and FIFO-C(m). These include a highly-accurate singular perturbation method, adapted from geometric optics and queueing theory [20], and a refined mean-field approximation [21], which can analyze the asymptotic behavior of the cache in both transient and equilibrium regimes.
The paper also delivers a number of extensions to the reference models, as follows. Often, caching systems are accessed by applications with a finite pool size, in which the arrival time of the next request from a stream depends on the fetch latency of the previous request. Stemming from the observation that these cache dynamics behave qualitatively as a closed system, we combine our results with algorithms from closed queueing network theory, developing an approximate solution technique for models with synchronous requests. Our approximation is fairly general and is shown to accurately account for queueing due to miss and fetch latency.
Moreover, we show that our approach can be generalized to impose hard bounds on costs for the stored items. For example, we illustrate that it is possible to analyze our reference model in the case where one accounts for item sizes and limited size for individual lists, or globally for the cache as a whole.
The paper is organized as follows. Section I-A overviews related work. The reference model is introduced in Section II and its equilibrium analyzed in Section III. An exact analysis is developed in Section IV. Section V approximates the normalizing constant. Iterative analysis and bounds are introduced in Section VI. Numerical results are reported in Section VII. An approximation for models with synchronous requests and fetch latencies is presented in Section VIII, followed by a cost analysis method in Section IX and the conclusion. Proofs are given in the Appendix. Additional material is included within an online supplement. The present work delivers an extension of a preliminary paper that analyzes non-uniform access probabilities in a less general family of list-based caches with linear structure [22].

A. Related Work
Recently, most of the research literature has focused on time-to-live (TTL) caches, which can be analyzed either by exact methods [23] or by the characteristic time approximation [24]. The latter approach has also been shown to be valid for randomized policies [25]. However, there is still a limited body of work on using this method to study list-based caches [19], and therefore our paper makes a contribution to the performance analysis of this particular family of stochastic systems. This paper is related to works that consider the independent reference model (IRM), which include [5]-[7], [18], [26], and references therein. Although real systems deviate from the IRM assumptions, IRM theoretical results provide useful insights, such as the characterization of the miss ratio of LRU [9], and formulas for the approximate analysis of cache networks [27]. The work in [28] further adopts an IRM model to help designers compare trade-offs in hybrid memory systems, first extending the linear model in [15] to a parallel structure. Recently, some frameworks that simplify the mathematical analysis of caches have been investigated [14], [15]. This paper generalizes some of the ideas in [15], which provides a mathematical framework to study list-based caches. [28] offers another example of generalization of [15], but to list-based caches with flat and layered list topologies. Compared to the above works, our results apply to general tree-structured caches and to non-uniform access.
In the literature, [16] is among the first works to analyze the least-recently used (LRU) and CLIMB policies when each item has an access probability. The authors in [29] study optimal policies for a single list with non-uniform access. The authors use a randomization approach to model costs as probabilities, so that items are promoted to a deeper list as a result of a cache hit. This definition is rather flexible; for example, it can be extended to take into account item sizes through a notion of density [16]. However, models with non-uniform access are generally harder to deal with, since the access probability need not be monotonic through the list hierarchy. Other probabilistic algorithms are surveyed in [1].

II. PRELIMINARIES
A summary of the main notation used in the paper is given in Table I. We initially take the standard assumption that items have equal sizes; an analysis of the case where items have variable sizes is given in Section IX. In our cache model, h lists are arranged into a fairly general tree topology in which every subtree is disjoint. Special cases of tree topologies, such as single or linear structures, are allowed within this definition. Some possible instances of the list-based caches under consideration are given in Figure 1. We take the convention that a virtual list with index l = 0, called the root list, contains all the items outside the cache. In this way, cache misses are simply modeled as standard accesses to list 0.
The set of cached items changes over time. The time required to replace an item is assumed to be negligible compared to the inter-request time, and is thus treated as instantaneous. We assume that, for each list $l = 1, \dots, h$, there exists one and only one parent list $p(l)$ from which cached items can be promoted into list $l$, simultaneously demoting an item from $l$ into $p(l)$. A list $l$ can be the parent of an arbitrary number of children lists, and there is no requirement for the number of children to be constant. A list $j$ without children is called a leaf list. To require that the lists are connected and rooted in list 0, we ask that recursive application of the function $p(l)$ eventually returns the index of the root list 0, i.e., $p(p(\cdots p(l) \cdots)) = 0$.
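For concreteness, the rootedness requirement can be checked mechanically. The sketch below assumes a dictionary representation of the parent function $p(l)$; the representation and function names are ours, not part of the reference model:

```python
def is_rooted(parent):
    """Check that repeated application of the parent function p(l)
    reaches the root list 0 from every list l = 1, ..., h.
    `parent` maps each list index l >= 1 to its parent p(l)."""
    for l in parent:
        seen = set()
        while l != 0:
            if l in seen or l not in parent:
                return False          # cycle or dangling list index
            seen.add(l)
            l = parent[l]             # climb one level towards list 0
    return True

# A linear structure 0 <- 1 <- 2 <- 3 and a tree rooted in list 0
assert is_rooted({1: 0, 2: 1, 3: 2})
assert is_rooted({1: 0, 2: 0, 3: 1, 4: 1})
assert not is_rooted({1: 2, 2: 1})    # cycle: never reaches list 0
```

Both the single and linear structures mentioned above pass this check, since they are special cases of the tree topology.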
The cache serves $u$ independent streams, each issuing requests for item $k$ in list $l$ according to an exogenous Poisson arrival process with rate $\lambda_{vk}(l) \ge 0$, $v = 1, \dots, u$, $k = 1, \dots, n$, $l = 0, \dots, h$. The rates $\lambda_{vk}(0)$ for items in the virtual list 0 represent the arrival rates of requests for items that are currently out of the cache and which therefore produce cache misses.

A. Non-Uniform Access
Let the access probability $c_{vk}(l, j)$ be the probability that item $k$, currently residing in list $l$, is moved by the replacement policy to list $j$ after a hit from a request issued by stream $v$. The matrix $C_{vk} = [c_{vk}(l, j)]$, $l, j = 0, \dots, h$, may be seen as a discrete-time Markov chain regulating probabilistically the access to the lists. In particular, the tree structure of the cache requires that $c_{vk}(l, j) > 0$ whenever $l = p(j)$, and $c_{vk}(l, j) = 0$ otherwise. According to these definitions, note that $c_{vk}(l, l) = 1 - \sum_{j=1, j \ne l}^{h} c_{vk}(l, j)$ is interpreted as the probability that the item is served and remains in its current list $l$, in the same position.
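These constraints on $C_{vk}$ are easy to validate programmatically. The following sketch (function and variable names are ours) checks that a candidate matrix is row-stochastic and that off-diagonal mass is placed only along parent-to-child edges of the tree:

```python
def valid_access_matrix(C, parent, tol=1e-9):
    """C[l][j], l, j = 0..h: each row must sum to one, and an
    off-diagonal entry c(l, j) may be positive only if l = p(j)."""
    h = len(C) - 1
    for l in range(h + 1):
        if abs(sum(C[l]) - 1.0) > tol:
            return False               # each row is a distribution
        for j in range(h + 1):
            if l != j and C[l][j] > tol and parent.get(j) != l:
                return False           # move does not follow a tree edge
    return True

# Linear cache with h = 2 lists: 0 -> 1 -> 2
parent = {1: 0, 2: 1}
C = [[0.4, 0.6, 0.0],   # miss: item cached into list 1 w.p. 0.6
     [0.0, 0.7, 0.3],   # hit in list 1: promoted to list 2 w.p. 0.3
     [0.0, 0.0, 1.0]]   # hit in list 2: item stays in place
assert valid_access_matrix(C, parent)
bad = [[0.4, 0.0, 0.6],  # illegal jump from list 0 to list 2
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
assert not valid_access_matrix(bad, parent)
```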
As mentioned in the introduction, and for consistency with earlier work that interprets probabilities as access costs [16], the variants of RR(m) and FIFO(m) that allow for non-uniform access are called RR-C(m) and FIFO-C(m). Based on the above definitions, we define RR-C(m) and FIFO-C(m) as shown in Algorithms 1 and 2. CLIMB-C(m) follows from the above definitions as the extreme case of FIFO-C(m) and RR-C(m) when $m = (1, \dots, 1)$ [8].

B. Motivating Example
We use a small example to illustrate the benefits of tree structures and non-uniform access. We consider $n = 10$ items accessed by $u = 2$ streams. Stream 1 can only request the first five items, at rate $\lambda_{1k}(j) = 0.9$, $\forall j$, $k = 1, \dots, n/2$. Stream 2 can instead access the other items, at rate $\lambda_{2k}(j) = 1.0$, $\forall j$, $k = n/2 + 1, \dots, n$. We compare several caches with identical total capacity, but with different internal structures and access probabilities. In particular, we show the effect of reserving space for Stream 2 using non-uniform access. All caches use random replacement or CLIMB. The resulting miss rates, total and per-stream, are reported in Table II. The results indicate that, while all methods are relatively close to each other in terms of miss rate, the combination of non-uniform access probabilities and tree structure delivers the best performance. The tree with uniform access instead performs best in terms of fairness, balancing the miss rates of the two streams.
Overall, this small-scale example illustrates the principle that the extra flexibility that comes with non-uniform access and non-linear cache structures can be beneficial both to tune cache performance and to differentiate performance across streams (i.e., users). Such flexibility does not exist in models with uniform access, and justifies our interest in this class of models. However, the large range of possible structures and non-uniform access probabilities motivates the search for efficient performance analysis methods, to guide the selection of the best cache structure in reference scenarios.
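Such scenarios can also be explored with a small Monte Carlo simulator. The sketch below implements our reading of the RR-C(m) dynamics, where promotion along a tree edge swaps the promoted item with a uniformly random victim from the target list; the request rates are taken to be list-independent, as in this example, and all names are ours:

```python
import random

def simulate_rr_c(n, caps, lam, C, T=200_000, seed=1):
    """Estimate per-item miss ratios of RR-C(m) by simulation.
    caps: {list: capacity} for lists 1..h; list 0 is the virtual list.
    lam[k]: request rate of item k (assumed independent of its list).
    C[(l, j)]: probability that a request served in list l moves the
    item to child list j; the remaining mass keeps the item in place."""
    rng = random.Random(seed)
    items = list(range(n))
    lists, pos = {}, 0
    for l, cap in caps.items():        # fill the lists in index order
        lists[l] = items[pos:pos + cap]
        pos += cap
    lists[0] = items[pos:]             # remaining items are uncached
    where = {k: l for l, members in lists.items() for k in members}
    misses, reqs = [0] * n, [0] * n
    for _ in range(T):
        k = rng.choices(items, weights=lam)[0]
        reqs[k] += 1
        l = where[k]
        if l == 0:
            misses[k] += 1             # request to an uncached item
        r, acc = rng.random(), 0.0
        for (src, j), p in C.items():
            if src != l:
                continue
            acc += p
            if r < acc:                # promote k to j, demote a random
                victim = rng.choice(lists[j])  # item of list j (RR)
                lists[j][lists[j].index(victim)] = k
                lists[l][lists[l].index(k)] = victim
                where[k], where[victim] = j, l
                break
    return [m / q if q else 0.0 for m, q in zip(misses, reqs)]

# Single stream, Zipf-like popularity, linear cache 0 -> 1 -> 2
lam = [1 / (k + 1) ** 1.4 for k in range(10)]
miss = simulate_rr_c(10, {1: 2, 2: 2}, lam, {(0, 1): 1.0, (1, 2): 0.3})
assert all(0.0 <= x <= 1.0 for x in miss)
assert miss[0] < miss[-1]   # popular items should miss less often
```

Simulators of this kind serve as the ground truth against which the analytical approximations of the following sections can be validated.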

III. EQUILIBRIUM ANALYSIS
A. Markov Process
Define the cache state vector $s = [s(i, j)]$, such that $s(i, j) \in \{1, \dots, n\}$ is the index of the item in position $i$ of list $j$, and let $S$ be the space of cache states that are reachable from the initial state. We take the convention that $m_0 = n - m$ represents the number of items outside the cache, i.e., the capacity of list 0, and that $s(i, 0)$ indicates the item with the $i$th largest index among those outside the cache.
Let $\pi(s)$ be the equilibrium probability of the continuous-time Markov chain (CTMC) modeling the chosen replacement policy. From now on, we restrict our attention to recurrent states and assume the CTMC to be irreducible. We also require that for each list $l$ there exist at least $m_l$ items that can reach it under requests from one or more streams.
The following product-form result for π(s) generalizes similar expression in earlier work [15] by showing the relationship between the equilibrium distribution of the cache and the access probabilities.
Theorem 1: RR-C(m) and FIFO-C(m) admit a product-form equilibrium distribution (1) for all recurrent states $s \in S$, where $E(m)$ is a normalizing constant and the access factors satisfy $\gamma_{i,0} = 1$ and are defined for all $i = 1, \dots, n$ and $j = 1, \dots, h$. The proof is given in Appendix A. The result generalizes prior work on list-based caches with a linear structure to tree structures and non-uniform access probabilities.
We refer to $\gamma_{ij}$ as the access factor of item $i$ for list $j$. In prior work [15], this term specializes to $\gamma_{ij} = r_i^j$ in discrete time, $r_i$ being the popularity of item $i$. The significance of the extension is that we allow for list- and item-dependent behaviors, which are required to account for non-uniform access probabilities. Items can behave differently inside the cache, and be requested at varying rates depending on their current list, e.g., due to different fetch latencies as we discuss in Section VIII, or produce different impacts on different streams, as we have shown in Table II. Compared to models such as those in [15], monotonicity properties also no longer apply, in the sense that reaching a deeper position in the list does not necessarily translate into a lower miss rate for the item, as this depends on the particular tree structure and also on the access patterns of the other items. As such, intuition on these models needs to be supported by quantitative methods to avoid pitfalls, for example in assigning list capacities. This motivates the development of specialized performance analysis techniques in the following sections.

B. Performance Measures
We now give some standard definitions for the performance measures of interest, consistent with those used in prior work. Let $\pi_{kl}(m)$ be the (marginal) probability that item $k$ is in list $l$, and let $\pi_{k0}(m)$ denote the miss ratio for item $k$. By definition we have $\sum_{l=0}^{h} \pi_{kl}(m) = 1$ and, since at all times list $l$ contains exactly $m_l$ items, also $\sum_{k=1}^{n} \pi_{kl}(m) = m_l$. Further, using the product-form result (1), we can write the marginal probabilities as ratios of normalizing constants, as shown in (2). From the definitions, the miss rate $M_{vk}(m)$ for stream $v$ and item $k$ may be written as the product of the arrival rate $\lambda_{vk}(0)$, the miss ratio $\pi_{k0}(m)$, and the probability that an item outside the cache is cached after a cache miss. The miss rate of item $k$ is then $M_k(m) = \sum_v M_{vk}(m)$ and the cache miss rate is $M(m) = \sum_k M_k(m)$.

IV. EXACT ANALYSIS
In this section, we derive recurrence relations to calculate the performance measures. The algorithms we derive are exact and require polynomial time and space in $n$ and $m$ for constant $h$. Thus, they ease the problem that the state space $S$ grows exponentially in size with $n$ and $m$; however, their cost still makes them applicable only to small models. This is addressed with the approximate results in the next sections.

A. Recurrence Relation for the Normalizing Constant
We begin by conditioning on the list containing item $k$, which allows the normalizing constant to be expressed recursively as in (3), for an arbitrary $k = 1, \dots, n$. Here, the factor $m_j$ accounts for the permutations of the positions of item $k$ within list $j$. This is a slight generalization of a similar expression obtained in [15] for uniform access models. The recurrence relation requires the boundary conditions $E(0) = 1$, and $E(m) = 0$ if either $\sum_i m_i > n$ or $m_j < 0$ for any $j$.
Using (3), $E(m)$ can be computed in $O(n \prod_{i=1}^{h} (m_i + 1))$ time and space. Standard dynamic scaling methods may be used to alleviate floating-point range exceptions in large models [30]. Additional recurrence relations related to (3) are developed in Appendix E in the supplemental material.
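To illustrate, the recurrence can be implemented by dynamic programming over the items. The sketch below assumes (3) takes the convolution form $E_k(m) = E_{k-1}(m) + \sum_j m_j \gamma_{kj} E_{k-1}(m - 1_j)$, obtained by conditioning on whether item $k$ is outside the cache or occupies one of the $m_j$ positions of list $j$; this is our reading of the relation, and all names are ours:

```python
from itertools import product
from math import factorial, isclose

def normalizing_constant(gamma, caps):
    """Compute E(m) by folding in one item at a time, starting from
    E_0(0) = 1 and E_0(v) = 0 for any other capacity vector v.
    gamma[k][j]: access factor of item k for list j (columns 0..h-1
    standing for lists 1..h); caps: the capacity vector (m_1,...,m_h)."""
    h = len(caps)
    E = {v: 1.0 if sum(v) == 0 else 0.0
         for v in product(*(range(c + 1) for c in caps))}
    for g in gamma:                       # fold in the next item
        newE = {}
        for v in E:
            total = E[v]                  # the item stays outside
            for j in range(h):
                if v[j] > 0:              # the item fills one of the
                    w = v[:j] + (v[j] - 1,) + v[j + 1:]
                    total += v[j] * g[j] * E[w]   # v[j] slots of list j
            newE[v] = total
        E = newE
    return E[caps]

# Sanity check: with identical access factors gamma_j per list, the
# constant reduces to n!/(n-m)! * prod_j gamma_j^{m_j}, since every
# ordered placement of m items carries the same product-form weight.
n, caps, g = 6, (2, 1), (0.5, 2.0)
E = normalizing_constant([g] * n, caps)
exact = factorial(n) / factorial(n - sum(caps)) * g[0] ** 2 * g[1]
assert isclose(E, exact)
```

The loop structure makes the cost visible: $n$ passes over the $\prod_i (m_i + 1)$ capacity vectors, matching the stated complexity up to a factor $h$.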

B. Exact Analysis of Marginal Probabilities
Let $\pi_k(m) = \sum_{l=1}^{h} \pi_{kl}(m)$ be the probability that item $k$ is cached. We begin by deriving the following relation for the marginal state probabilities. Theorem 2: The probability of finding item $k$ in list $l$ is given by (4), for all items $k = 1, \dots, n$ and lists $l = 1, \dots, h$, where the term $\xi_l(m)$ is defined in (5). The proof is given in Appendix B. It is interesting to observe that (4) is similar to the arrival theorem for queueing networks [31]. It may be seen as relating the equilibrium distribution of the cache with the conditional distribution of states seen by item $k$ while residing in list $l$. A similar argument holds also for the arrival theorem, which is sometimes seen as relating the equilibrium distribution of a closed product-form queueing network with the conditional distribution seen by a job residing at a given station [32]. We exploit this connection to develop a recurrence relation for the marginal probabilities that is similar in structure to the mean-value analysis algorithm [31].
Since $\sum_{k=1}^{n} \pi_{kl}(m) = m_l$, using (4) and solving for $\xi_l(m)$ we find (6). Finally, plugging (6) into (4), and using that $\pi_k(m) = \sum_{l=1}^{h} \pi_{kl}(m)$, we get the exact recurrence relation (7), valid for all $i = 1, \dots, n$ and $l = 1, \dots, h$. The recursion has termination conditions $\pi_{il}(m) = 0$ if either $m = 0$ or $\min_j m_j < 0$. Solving (7) requires time and space polynomial in $n$ and $m$ for fixed $h$. Item miss rates may then be easily computed by the expressions in Section III-B.
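For very small models, the marginal probabilities can also be obtained by brute-force enumeration of the product form, which is useful to validate implementations of the recursions above. The sketch assumes the product form weighs each cached item by its access factor for the list it occupies, as in the linear case of [15]; since all orderings within a list carry the same weight, it suffices to enumerate the sets of items per list (names are ours):

```python
from itertools import combinations
from math import isclose

def marginals_bruteforce(gamma, caps):
    """Exact pi[k][l] by enumerating the set of items held by each
    list; each assignment has weight prod over cached items k of
    gamma[k][list of k]. Exponential cost: for validation only."""
    n, h = len(gamma), len(caps)
    Z = 0.0
    pi = [[0.0] * h for _ in range(n)]
    def rec(free, j, weight, assign):
        nonlocal Z
        if j == h:
            Z += weight
            for l, members in enumerate(assign):
                for k in members:
                    pi[k][l] += weight
            return
        for members in combinations(sorted(free), caps[j]):
            w = weight
            for k in members:
                w *= gamma[k][j]
            rec(free - set(members), j + 1, w, assign + [members])
    rec(set(range(n)), 0, 1.0, [])
    return [[p / Z for p in row] for row in pi]

gamma = [(2.0, 4.0), (1.0, 1.0), (1.0, 1.0), (0.5, 0.25)]
pi = marginals_bruteforce(gamma, (1, 1))
for l in range(2):   # each list holds exactly m_l items on average
    assert isclose(sum(pi[k][l] for k in range(4)), 1.0)
assert sum(pi[0]) > sum(pi[3])   # larger access factors: cached more
```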

V. NORMALIZING CONSTANT APPROXIMATION
Exact recurrence relations are too expensive to apply to models of medium and large caches, because as shown in the last section their cost grows exponentially in the number of lists h. To address this issue, we study the asymptotic limit for E(m) to approximate performance measures. The main contribution is to develop, under mild assumptions, the leading term in the asymptotic expansion for E(m) when n → ∞ and m → ∞, with n/m ∼ O (1). To this aim, we use the singular perturbation method [20], which can determine the asymptotic solution of multivariate recurrence relations by means of tractable partial differential equations (PDEs).

A. Overview
Before giving the formal derivation of the asymptotics, we sketch the solution paradigm of the singular perturbation method proposed in [20], applied here in a similar fashion to our product-form solution, which presents structural similarities with the one characterizing the queueing network models considered in the original paper. We point the interested reader to the tutorial in [33] for more details.
The singular perturbation method may be used to obtain approximations to multivariate recurrence relations by studying their behavior under perturbations of their parameters. In our context, without loss of generality, we focus on relation (3) with $k = n$ and define $H(m)$ so that the recurrence may be rewritten in terms of it. We then scale the recurrence parameters as $x = m\varepsilon$ and $y = n\varepsilon$, for some small perturbation $\varepsilon$, and rewrite the recurrence as (9) for a scaled function $h(x, y)$, in which $x$ corresponds to the list capacities and $y$ to the number of items, and in which the access factors are modeled by the functions $g_j(y)$. To match the coefficients of the initial expression, we have assumed that we can replace $\gamma_{nj}$ with a smooth function $g_j(y)$ defined such that $g_j(k\varepsilon) = \gamma_{kj}$ and $g_j(0) = g_j(n\varepsilon)$, $\forall j$.
1) Analytical Form: Our goal is to approximate asymptotically the function $h(x, y)$. As an example, consider the case where $\gamma_{kj} = g_j(k) = \gamma_j$, $\forall k, j$, for arbitrary constants $\gamma_j$, and denote the normalizing constant for this model by $H_0(m, n)$. Since all the items have identical access factors for each list, it is possible to derive $H_0(m, n)$ in closed form. Using the Stirling approximation $n! \sim \sqrt{2\pi n}\,(n/e)^n$ and scaling the variables in $H_0(m, n)$, we obtain an asymptotically exact form for the normalizing constant, given by (10). As (10) is one particular example of asymptotic solution, the general case needs to include an exponential dependence on $1/\varepsilon$ to match it. As such, we look for solutions matching the quite general form (11), for some functions $\varphi(x, y)$, $B(x, y)$, and $D(\varepsilon)$ to be determined. Since (10) is an exact asymptotic expansion, the assumed form (11) is guaranteed to converge to the exact solution in the limit of a sequence of models where $n$ and $m$ grow large in a constant ratio and the $\gamma_{kj}$ within each list are asymptotically identical to a constant $\gamma_j$. Thus, we expect the errors to be larger for finite models with small caches and where the access factors $\gamma_{kj}$ differ significantly. In spite of this, we show later in the numerical experiments that the approximation is empirically very accurate also in small caches and when the access factors are very different, due to Zipf-like item popularity.
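The Stirling approximation used in this step is already accurate for moderate $n$; its relative error decays like $1/(12n)$, as a quick numeric check confirms:

```python
from math import e, factorial, pi, sqrt

def stirling(n):
    """Leading-order Stirling approximation n! ~ sqrt(2*pi*n)*(n/e)^n."""
    return sqrt(2 * pi * n) * (n / e) ** n

for n in (10, 50, 100):
    rel = abs(stirling(n) - factorial(n)) / factorial(n)
    assert rel < 1 / (11 * n)   # relative error behaves like 1/(12 n)
```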
2) Multivariate Recurrence Relations: The structure of (11) is particularly suitable for approximating linear multivariate recurrence relations such as (9), since the exponential term allows for simplifications. Plugging (11) into (9) yields relation (12). We now expand both sides in powers of $\varepsilon$, obtaining in particular the $O(1)$ and $O(\varepsilon)$ terms. A typical approach of the method is to treat the terms at each order separately, equating both sides of the equation individually at each order. This relies on the observation that, as $\varepsilon \to 0$, the $O(\varepsilon)$ term vanishes. With this approach, at the lowest order, one first applies a Taylor expansion to the terms of (12), where $\varphi_j$ (resp. $\varphi_y$) denotes the partial derivative with respect to $x_j$ (resp. $y$). Plugging back into (12) in the limit $\varepsilon \to 0$, after simplification we obtain a PDE of first degree in the partial derivatives, which can be solved explicitly, as shown in Appendix C.
A similar derivation for the $O(\varepsilon)$ terms produces a more complex equation in the second-order partial derivatives, which may also be solved explicitly. The derivation relies on boundary conditions, which we choose to be the analytical solution (10) for a cache with uniform item popularity. Once such explicit solutions are obtained, (11) is scaled back to approximate the original normalizing constant $E(m)$.

B. Main Result
The formal derivation to obtain $\varphi(x, y)$, $B(x, y)$, and $D(\varepsilon)$ from (9) is given in Appendix C, and leads to the expression (13) for $\varphi(x, y)$, which involves terms of the form $x_j \log \xi_j$, where the terms $\xi_j$, $j = 1, \dots, h$, satisfy (14). We show later in Section VI-A that the terms $\xi_l$ may be seen as the limits of the $\xi_l(m)$ defined in (6). The remaining coefficients are $D(\varepsilon) = \varepsilon^{h/2}$ and $B(x, y)$, which is expressed in terms of the matrix $J = [J_{jl}]$, $j, l = 1, \dots, h$, with $\delta_{jl} = 1$ if $j = l$ and $\delta_{jl} = 0$ otherwise.

C. Approximating a Finite System
Having determined the asymptotics of $h(x, y)$, we can now use the Euler-Maclaurin formula [34] to rewrite (11) in terms of the variables $m$ and $n$, applying it to a function $f(x)$ defined on $[0, n]$. Scaling the variables in (14) so that the integral ranges over $[0, n]$, the error of the Euler-Maclaurin formula is $O(\varepsilon)$, and thus the resulting finite formula is asymptotically exact. Since we have chosen $g_j(0) \equiv g_j(n\varepsilon)$, this is given by (17), for $j = 1, \dots, h$. Under uniform access, [15] obtains an expression similar to (17) as the steady-state limit of mean field ordinary differential equations. Our derivation shows that a similar result holds under non-uniform access.
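To leading order, the Euler-Maclaurin formula replaces a sum over the integer grid by an integral plus a boundary correction, $\sum_{k=0}^{n} f(k) \approx \int_0^n f(x)\,dx + \tfrac{1}{2}(f(0) + f(n))$, with a residual driven by the derivatives of $f$ at the endpoints. A generic numeric illustration follows (the specific $f$ used in our derivation is in Appendix C; the names here are ours):

```python
from math import exp

def grid_sum(f, n):
    """Left-hand side: the sum of f over the integers 0, 1, ..., n."""
    return sum(f(k) for k in range(n + 1))

def euler_maclaurin(f, n, steps=100_000):
    """Leading-order right-hand side: the integral of f over [0, n]
    (trapezoid quadrature) plus the boundary term (f(0) + f(n)) / 2."""
    h = n / steps
    integral = h * (sum(f(i * h) for i in range(1, steps))
                    + (f(0) + f(n)) / 2)
    return integral + (f(0) + f(n)) / 2

f = lambda x: exp(-x / 10)
# The residual is of the order of (f'(n) - f'(0)) / 12 ~ 0.007 here
assert abs(grid_sum(f, 20) - euler_maclaurin(f, 20)) < 0.01
```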
Using Euler-Maclaurin we can also rewrite the remaining coefficients. The above expressions provide, by (11), the asymptotic expansion of $H(m)$, which implies under the stated assumptions the expansion (18) of the original normalizing constant $E(m)$, where the $\xi_j$ terms are obtained from (17).

D. Performance Measures
For large $m$, we can expand $E_k(m - 1_l)$ in a neighborhood of $m$ to obtain from (18) the leading terms (19), for all $k = 1, \dots, n$ and $l = 0, \dots, h$. Additional terms in the expansions may also be used to generate higher-order approximations, which involve the derivatives of $\sqrt{\det C}$; we do not pursue these further here. In what follows, we call singular perturbation approximation (SPA) the method that computes the normalizing constants appearing in performance measures such as (2), and their derivatives, by means of (18). The approach that numerically solves (17) for the $\xi_l$ terms and then calculates performance measures by (19) is instead developed in the next section and referred to as the fixed-point iteration (FPI).

VI. FURTHER APPROXIMATIONS
In this section, we introduce three new approximations. We first derive an efficient fixed-point iteration to solve (17). We then explain how to use the recent refined mean field approximation to refine this fixed-point approximation. We also develop simple bounds to quickly assess performance measures in settings where speed can be more important than high accuracy, as in computational optimization.

A. Fixed-Point Iteration (FPI)
We now introduce a fixed-point method to obtain the $\xi_l$ variables. Let us first consider a regular perturbation expansion [33] of (7), which for large $m$ and $n$, with $m/n \sim O(1)$, takes the form of a power series in $\varepsilon$ with leading term $\hat\pi^{(0)}_{il}$. Plugging this expansion into (7), we get at the lowest order a non-linear system of $nh$ equations in $nh$ unknowns, for $i = 1, \dots, n$, $l = 1, \dots, h$. It is possible to verify that the system is solved by setting $\hat\pi^{(0)}_{il}$ as in (21), where $\xi^{(0)}_l$ denotes the leading-order value of $\xi_l$. Applying the regular perturbation expansion to the relation $\sum_{k=1}^{n} \pi_{kl}(m) = m_l$, and plugging (21) into the resulting expression, we see that as $\varepsilon \to 0$ the $\xi^{(0)}_l$ terms become solutions of (17). Since $\xi^{(0)}_l$ is the value of $\xi_l$ corresponding to the leading terms of the expansion of $\pi_{il}(m)$, this supports our earlier statement that the variables $\xi_l$ appearing in the expansion of $E(m)$ are the limits of $\xi_l(m)$ as $\varepsilon \to 0$. A similar conclusion follows by calculating the leading term of (5) as in Section V-D.
As mentioned before, the FPI method approximates the solution of (17) by the iteration (22), and then plugs the obtained $\xi_l$ values into (19) to obtain performance measures. The iteration can be initialized with a guess of the marginal probabilities, e.g., $\pi^{(0)}_{il} = 1/(h + 1)$. We stop it when the maximum relative change of the probabilities $\pi_{il}$ across iterations does not exceed a user-specified tolerance (e.g., $\tau = 10^{-6}$).
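A minimal implementation of the iteration, under our reading of (6), (17), and (22) as $\hat\pi_{il} = \gamma_{il}\xi_l / (1 + \sum_j \gamma_{ij}\xi_j)$ with each $\xi_l$ adjusted so that list $l$ is exactly full, can be sketched as follows (the damping and all names are ours):

```python
def fpi(gamma, caps, tol=1e-10, max_iter=100_000):
    """Fixed-point iteration for xi_l and the marginal probabilities.
    gamma[i][l]: access factor of item i for list l = 1..h (0-indexed);
    caps: (m_1, ..., m_h). Returns (xi, pi) at the fixed point."""
    n, h = len(gamma), len(caps)
    xi = [1.0] * h
    for _ in range(max_iter):
        denom = [1 + sum(g[j] * xi[j] for j in range(h)) for g in gamma]
        # xi_l enforcing the capacity constraint sum_i pi_il = m_l
        target = [caps[l] / sum(gamma[i][l] / denom[i] for i in range(n))
                  for l in range(h)]
        new_xi = [0.5 * x + 0.5 * t for x, t in zip(xi, target)]  # damped
        done = max(abs(a - b) for a, b in zip(xi, new_xi)) < tol
        xi = new_xi
        if done:
            break
    denom = [1 + sum(g[j] * xi[j] for j in range(h)) for g in gamma]
    pi = [[gamma[i][l] * xi[l] / denom[i] for l in range(h)]
          for i in range(n)]
    return xi, pi

# Zipf-like popularity, two lists, gamma_il = r_i^l as in [15]
r = [1 / (i + 1) ** 1.4 for i in range(20)]
xi, pi = fpi([(x, x * x) for x in r], (3, 2))
for l, cap in enumerate((3, 2)):   # capacity constraints at fixed point
    assert abs(sum(pi[i][l] for i in range(20)) - cap) < 1e-4
assert all(sum(row) < 1.0 for row in pi)   # every item: pi_i0 > 0
```

The damping factor is a safeguard against oscillation; undamped updates behave similarly in most instances we tried.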
Over the thousands of models considered in the experiments, none raised convergence issues. Where needed, more general nonlinear equation solvers with convergence guarantees may be used to find values of $\xi_l$ and $\pi_{il}$ that satisfy the equations.

B. Derivation of the Refined Mean-Field Approximation (RMF)
Let us denote by $\hat\pi^{FPI}$ the exact solution of (17). For uniform access models, it is possible to show that $\hat\pi^{FPI}$ is also a solution of the ordinary differential equations (ODEs) obtained by the mean field approximation of the Markov process underlying RR(m) [15]. This prompts us to investigate the applicability of mean-field methods also to RR-C(m).
Mean-field approximations are asymptotic solutions that enable approximate transient and steady-state analysis with accuracy guarantees. Since transient analysis is an important concern in caching systems, we propose in this section a highly-accurate refined mean field (RMF) approximation for RR-C(m), grounded in the general framework developed in [21]. Note that, although RR-C(m) and FIFO-C(m) have the same stationary distribution, their transient behaviors are quite different. We do not believe that our analysis of the transient behavior of RR-C(m) can be easily adapted to FIFO-C(m).
In RMF, the system scale is controlled by a positive parameter $N$. The original system is approximated by a scaled one where each item has $N$ identical copies and the cache capacity is increased to $mN$. We refer to the copies of item $i$ as the class-$i$ items. As in the original model, class-$i$ items are treated with the usual replacement rules of RR-C(m).
Let $\pi^{(N)}_{il}$ be the marginal probability for a class-$i$ item to be in list $l$ in the scaled model. Then it is possible to show that there exists a vector $V = [V_{il}]$ such that the expansion (23) holds for all items $i = 1, \dots, n$ and lists $l = 1, \dots, h$. The RMF approximation consists in adding the correction $V_{il}$ to the mean-field estimate of $\pi^{(N)}_{il}$. A precise derivation of how to compute $V$ is provided in Appendix J in the online supplement and is briefly summarized here. Let us denote by $X^{(N)}_{il}(t)$ the fraction of class-$i$ items that are in list $l$ at time $t$ (for the $N$-scaled model). With this notation, a class-$i$ item from list $l$ is exchanged with a class-$j$ item from list $l'$ at rate $N \sum_{v=1}^{u} \lambda_{iv}(l)\, c_{iv}(l, l')\, X_{il} X_{jl'} / m_{l'}$. In this expression, the factor $N \sum_{v=1}^{u} \lambda_{iv}(l)\, c_{iv}(l, l')\, X_{il}$ is the rate at which a class-$i$ item is promoted to list $l'$, while $X_{jl'} / m_{l'}$ is the probability that the item demoted in exchange is of class $j$. Such a transition changes the vector $X^{(N)}(t) = (X^{(N)}_{il}(t))_{i \le n, l \le h}$ by adding $1/N$ to its $(i, l')$ and $(j, l)$ coordinates, and by removing $1/N$ from its $(i, l)$ and $(j, l')$ coordinates. This implies that the family of scaled models defines what is called a density-dependent population process [35] and allows us to use the theory in [21] to define the expansion (23).
The method can also be used to study the transient regime [36], as it can be shown that the transient probability satisfies the expansion (24), where $x^{MF}$ is the solution of the mean-field ODE given in the supplementary material (see in particular (11) therein) and $V(t) = [V_{il}(t)]$ is a time-varying vector. The quantities $x^{MF}$ and $V(t)$ satisfy ODEs that can easily be computed numerically.

1) Example:
We illustrate the accuracy of the RMF approximation in the transient regime by using (24) for the original model with $N = 1$. To compute numerically the values $x^{MF}$ and $V$, we rely on a fast software implementation. The original model consists of $n = 50$ items and $h = 5$ lists, each having capacity $m_l = 3$. There is a single stream that issues requests at Poisson rate $\lambda_{il} = r_i$, where $r_i$ is the probability of accessing item $i$, which is Zipf-like distributed with parameter $\alpha = 1.4$. Figure 3 illustrates the accuracy of the RMF approximation in comparison with estimates of $\pi_{il}(t)$ obtained by averaging 100 cache simulations. In the latter, we use the ratio of the miss count to the simulation time at $t$, capping the very first samples at the theoretical maximum, which in this example is 1.0. The example is typical of RMF transient accuracy and shows high precision.

C. Bound Analysis
Here we develop some simple analytical bounds on ξ l (m) and π i0 . Note that, by (3) and by the definition of miss rate, bounds on ξ l (m) allow us to bound the miss rate M (m), whereas bounds on the probabilities π i0 can be used to bound the per-stream miss rates through the definition of M vk (m).

1) Bounds on ξ l (m):
It is possible to verify that γ − l (n − m + 1) ≤ ξ l (m) ≤ γ + l (n − m + 1), where γ + l = max k γ kl and γ − l = min k γ kl . These bounds follow from (6) by bounding γ kl with γ + l or γ − l and noting that Σ k (1 − π k (m − 1 l )) = n − m + 1. For an important class of models, we then state a tighter lower bound. Lower bounds on ξ l (m) are the most important in practice, as they lead to pessimistic estimates of the miss rates. The bound assumptions are satisfied by a rather broad class of models including, but not limited to, models with uniform access.
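Assuming the bounds take the form γ − l (n − m + 1) ≤ ξ l (m) ≤ γ + l (n − m + 1), as suggested by the derivation from (6), they can be evaluated directly from the access factors (a sketch; the function and array names are ours, and γ is a hypothetical matrix of access factors):

```python
import numpy as np

def xi_bounds(gamma, n, m_total):
    """Elementary bounds on xi_l(m) per list.

    gamma[k, l] : access factor of item k for list l (hypothetical values)
    n           : total number of items
    m_total     : total cache capacity m
    """
    g_plus = gamma.max(axis=0)   # gamma^+_l = max_k gamma_kl
    g_minus = gamma.min(axis=0)  # gamma^-_l = min_k gamma_kl
    lower = g_minus * (n - m_total + 1)
    upper = g_plus * (n - m_total + 1)
    return lower, upper
```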
Theorem 3: Consider a cache where for each pair of items i and k > i, we have γ il ≥ γ kl , ∀l. Then a tighter lower bound on ξ l (m) holds in terms of γ̄ l = Σ j γ jl /n, the average access factor for list l. The proof is given in Appendix-I and shows that the argument generalizes to any list where γ kl ≥ γ il implies π k (m − 1 l ) ≥ π i (m − 1 l ). If this property holds for all lists, we say that the model is monotone.
2) Bounds on π k0 (m): We now develop upper bounds on the miss probabilities π k0 (m). Let ξ −k j (m) be defined as ξ j (m), but for a model without item k. We are now ready to give an upper bound on the miss rates.
Theorem 4: The item miss ratios satisfy the stated bound for all items k = 1, . . . , n and lists l = 1, . . . , h. The proof is given in Appendix-F. Note that the bound is asymptotically exact for fixed h, as it converges to (19) when n and m increase in a fixed ratio. A closed-form expression for this upper bound can be readily obtained by replacing ξ l (m) with the lower bound in either (26) or (27). Although (27) is not guaranteed to be a bound in non-monotone models, combining it with Theorem 4 still provides a useful non-iterative heuristic approximation of π k0 , as illustrated in the next example.
3) Example: Figure 4 illustrates Theorem 4 for a monotone and a non-monotone cache. Estimates are computed by replacing ξ j (m) in the bound of Theorem 4 with (27). In monotone caches, which include caches with uniform access as a special case, rank is inversely proportional to item miss rate [15]. However, this clearly does not hold in non-monotone caches, as shown in the figure. As visible from the comparison against the exact value, the heuristic provides rather accurate estimates of the item miss ratio.

A. Simulation-Based Validation
1) Non-Uniform Access in Linear Caches:
Since exact numerical solution is applicable only to small models, we have performed a simulation study of RR-C(m) and FIFO-C(m) to assess the accuracy of the proposed approximations.
The request rate is λ = 1 and item popularity is Zipf-like with parameter α, with the item ranks assigned uniformly at random within each stream. Each simulation collects 10^9 samples. From the results, we observe, as expected, that RR-C(m) and FIFO-C(m) have statistically indistinguishable performance; thus we discuss only RR-C(m). Let π 0k (m) be the estimated miss ratio for item k. The error metric is the mean absolute percentage error π mape between the approximate and the estimated miss ratios. In small models with n = 10 items, SPA incurs the lowest mean absolute percentage error, around 0.4%, while RMF is around 1.2% and FPI around 10.2%. For such instances, we have observed the maximum error of RMF and FPI to reach 5% and 35.1%, respectively, whereas for SPA it is just 0.6%. The largest errors of FPI occur in the estimation of miss probabilities for high-popularity items. Conversely, with n = 1000 items we observe a mean absolute percentage error of around 0.6% for all three methods. We believe that, even for n = 1000, SPA and RMF should be significantly more accurate than FPI; this is not visible here because, for n = 1000, the three approximations likely provide estimates of π 0k (m) that are more accurate than a simulation with 10^9 samples.
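The error metric used here can be computed as follows (a minimal sketch of a mean absolute percentage error; in the experiments the reference values come from simulation):

```python
import numpy as np

def mape(est, ref):
    """Mean absolute percentage error (in %) between estimated and
    reference item miss ratios."""
    est = np.asarray(est, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return 100.0 * np.mean(np.abs(est - ref) / ref)
```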
Sensitivity analysis results for these experiments may be found in [22] and illustrate that the error grows slightly with the ratio n/m and with α, but remains very small in magnitude, with a mean around 1% at worst for SPA. Accuracy is slightly better for smaller values of h. The results are practically insensitive to c. Overall, the results indicate that FPI often suffices, but SPA and RMF provide more robust estimates for models far from the asymptotic regime.
2) Non-Uniform Access in Tree Caches: The previous section highlights the resilience of the approximations in capturing non-uniform access. We now focus on approximating caches with a tree structure as in Figure 5. To compare accuracy with simpler cases, we also explore the case where the cache in the figure is restricted to a single list (h = 1) or the first two lists (h = 2). The list capacities are set to m j = m/h, j = 1, . . . , h. We vary these parameters sequentially: n ∈ {10, 50, 500}, n/m ∈ {2, 4, 10}, α ∈ {0.6, 1.0, 1.4}, u ∈ {1, 4}, h ∈ {1, 2, 5}. The access probabilities between lists are now randomized uniformly and normalized to form a valid discrete-time Markov chain. For each combination of the other parameters, we consider 30 random instances of the access probabilities. The arrival rate for stream v is set to λ vkj = vr k , where r k follows a Zipf-like distribution with parameter α. The simulations use 10^8 samples.
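The randomized access probabilities can be generated as in the following sketch (the matrix encoding over the h + 1 lists, including list 0 for items outside the cache, is our assumption):

```python
import numpy as np

def random_stochastic(h, rng):
    """Draw uniform entries and normalize each row to sum to one, yielding
    a valid discrete-time Markov chain over the h + 1 lists (list 0 = out)."""
    P = rng.uniform(size=(h + 1, h + 1))
    return P / P.sum(axis=1, keepdims=True)
```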
Approximation results are given in Figure 6. We compare FPI, which for this system gives the same solution as the classical mean-field approximation, the refined mean field (RMF), and SPA. The results indicate that RMF and SPA improve over FPI thanks to the additional information they carry in the equations. Overall, the average error is 4.04% for FPI, 1.28% for RMF, and 0.43% for SPA.
We have also analyzed the mean absolute percentage error on the miss rates, defined analogously to π mape but in terms of M (m), the estimated miss rate. For this metric, we observe over all instances a 1.75% error for FPI, 0.25% for RMF, and 0.10% for SPA. This is consistent with the intuition that this metric captures average performance and can therefore be characterized well by first-order methods such as FPI. The additional terms in RMF and SPA instead capture higher-order interactions between the items, which become more visible when accounting for the entire miss ratio distribution, as in π mape . Since caches typically have a large number of objects, we conclude that both RMF and SPA provide an accurate way to analyze list-based caches. The main differences are that SPA has a compact expression and does not require solving ordinary differential equations (ODEs), thus avoiding issues such as ODE stiffness or implementation-specific ODE overheads. Conversely, RMF can also be used in the transient regime, which is not accessible to SPA, since the latter is derived under equilibrium assumptions.

B. Youtube Trace
Lastly, we use a trace-driven simulation to illustrate the applicability of the proposed methods to real workloads. We simulate the 2008 YouTube trace collected in [37], also used to validate the list-based cache models in [15], [28]. The trace consists of 611,968 requests, spanning n = 303,332 items and originating from u = 16,337 client IPs. Each client IP is mapped to a single stream v.
For each simulation, we compute the item miss rates M k , k = 1, . . . , n, and compare them with the estimates returned by FPI. In the stochastic model, each stream v is modeled as a Poisson process with rate λ v (k) = n vk /T , where n vk is the number of times item k is requested by stream v and T is the time span of the trace.
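The per-stream rate estimation from the trace is straightforward (a sketch; the trace encoding as (client, item) pairs is our assumption):

```python
from collections import Counter

def stream_rates(trace, T):
    """Estimate per-stream Poisson rates from a request trace.

    trace : iterable of (client, item) request pairs
    T     : time span of the trace
    Returns {(v, k): n_vk / T}, i.e., lambda_v(k) for each observed pair.
    """
    counts = Counter(trace)  # n_vk = number of (v, k) requests
    return {vk: c / T for vk, c in counts.items()}
```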
Despite the large number of items, FPI produces an estimate of all item miss rates in just 8.2 seconds on a laptop (Intel Core i5). Even though the YouTube trace is temporally correlated, thus departing from the IRM and Poisson assumptions, and about half of the items are accessed just once each, the mean absolute percentage error of FPI on the M k values is only 1.74%, with a maximum error of just 4.68%. Both values are compatible with the uncertainty of the simulation, in which the trace is simulated only once. Overall, the experiments indicate that our model can reliably reproduce simulation results in the presence of non-uniform access probabilities and multiple request streams.

VIII. SYNCHRONOUS STREAMS
We further generalize the reference model to include synchronous streams, i.e., arrival processes where the stream awaits the reply to the previous request before issuing a new one, as in pooled cache access. We assume the following reference model. Stream v can have up to s v outstanding requests. The inter-request time is the sum of an exponential think time with mean σ v , representing an arbitrary delay between successive requests, and a miss (or fetch) latency, also exponential, with mean θ v0 (or θ v1 for fetch). The cache state is updated at arrival time; the latency (due either to a miss or to a fetch, but not both) is incurred right after.
We wish to determine the cache arrival rates, which at steady state match the sum of hit and miss rates. We leverage FPI and SPA to develop an iterative approximation. Let Λ v = Σ k Λ vk be the unknown total arrival rate from synchronous stream v. By Little's law, for each stream v = 1, . . . , u, the number of outstanding requests s v can be expressed in terms of M vk and H vk = Λ vk − M vk , the miss rate and hit rate for item-k requests from stream v. Each term on the right-hand side of this expression corresponds to the average number of requests either accumulating think time or incurring miss (or fetch) latency. Denoting by H v = Σ n k=1 H vk the total hit rate, we may rewrite the last equation in a form that can be solved by seeking a fixed point of the iteration for t ≥ 0, using a decomposition approximation and an initial guess Λ (0) v .
1) Extensions: It is not difficult to generalize the proposed decomposition in two ways. Models processing both synchronous and asynchronous streams only require including additional (asynchronous) Poisson streams in the isolated cache model. We also support models where requests, after visiting the cache, travel through queueing stations to better characterize the miss (or fetch) latency. This may be due to network transfer or to queueing at the worker threads that handle the requests.
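Since the displayed iteration is not reproduced here, the following sketch assumes Little's law in the form s v = Λ v (σ v + m v θ v0 + (1 − m v )θ v1 ), with m v the per-stream miss ratio returned by the isolated cache solver (abstracted as a callable); the exact iteration in the paper may differ in detail:

```python
def sync_rates(s, sigma, th_miss, th_hit, miss_ratio, tol=1e-6, max_iter=1000):
    """Fixed-point iteration for the total arrival rates of synchronous streams.

    s, sigma, th_miss, th_hit : per-stream parameters (equal-length lists)
    miss_ratio(lam) -> per-stream miss ratios for arrival rates lam
        (stands in for solving the isolated cache with FPI or SPA)
    """
    # initial guess: all requests miss
    lam = [sv / (sg + tm) for sv, sg, tm in zip(s, sigma, th_miss)]
    for _ in range(max_iter):
        mr = miss_ratio(lam)
        # Little's law: s_v = Lambda_v * (sigma_v + m_v*th0 + (1-m_v)*th1)
        new = [sv / (sg + m * tm + (1.0 - m) * th)
               for sv, sg, tm, th, m in zip(s, sigma, th_miss, th_hit, mr)]
        if max(abs(a - b) for a, b in zip(new, lam)) < tol:
            return new
        lam = new
    return lam
```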
Such hybrid caching-queueing models may be seen as multi-chain closed queueing networks with state-dependent routing probabilities, determined by cache hits and misses. A cache can be treated as a topological element with no sojourn time, where, given a guess of the arrival rates, it is possible to use FPI and SPA to obtain miss and hit ratios. Miss and hit ratios can then be treated as routing probabilities. This leads naturally to an iterative scheme: the routing matrix is considered fixed at each iteration, and the queueing model is solved with approximate MVA [38] to obtain the arrival rates at the cache nodes. This yields updated miss and hit ratios, which produce the routing matrix for the next iteration. This iteration rapidly estimates the performance metrics. 2

Fig. 7. Item miss ratios for models with synchronous requests.

Prediction errors for FPI and SPA obtained for FIFO-C(m) and RR-C(m) against simulation are shown in Figure 7. The overall mean percentage error on Λ v is 1.64% for FPI and 0.25% for SPA. The approximation error is generally consistent with the trends observed before in the baseline (asynchronous) models. We have verified that the trends as a function of n/m and h are also consistent with previous experiments. We also note that think times do not significantly affect precision.
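The alternation between the queueing solver and the cache solver can be sketched as follows; for brevity we use exact single-class MVA in place of the approximate MVA of [38], and a callable stands in for the cache solver, so this is an illustration of the scheme rather than the paper's implementation:

```python
def mva(visits, service, N):
    """Exact MVA for a single-class closed network of queueing stations."""
    Q = [0.0] * len(visits)
    X = 0.0
    for n in range(1, N + 1):
        # residence times via the arrival theorem
        R = [v * s * (1.0 + q) for v, s, q in zip(visits, service, Q)]
        X = n / sum(R)           # system throughput with n jobs
        Q = [X * r for r in R]   # mean queue lengths
    return X, Q

def hybrid(miss_ratio_fn, service, N, iters=50):
    """Alternate between the queueing model and the cache solver.

    miss_ratio_fn(X) stands in for solving the isolated cache (FPI/SPA)
    at arrival rate X; the resulting miss ratio is used as the visit
    ratio of the back-end station on the next iteration.
    """
    miss = 0.5  # initial guess for the miss ratio
    X = 0.0
    for _ in range(iters):
        X, _ = mva([1.0, miss], service, N)
        miss = miss_ratio_fn(X)
    return X, miss
```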
2) Hybrid Caching-Queueing Models: We now consider the examples in Figure 8(a)-(b), with the parameters reported in Appendix-L1 of the supplemental material. Scheduling at the queues is first-come first-served in Figure 8(a) and processor sharing in Figure 8(b). Each parameterization is evaluated with 216 independent models, each taking 10^7 samples, for a total of 648 simulations. Other experiments are given in Appendix-K.
a) Parallel topology: We consider the model in Figure 8(a). The two streams can each issue s 1 = s 2 = s synchronous requests. We use the iterative method with tolerance 10^-3 on the maximum absolute percentage error of the arrival rates Λ iv at all nodes. The simulation uses 10^7 samples. Results are shown in Figure 9(a). Both FPI and SPA are again accurate, with errors below 4% for FPI and 2% for SPA.
b) Serial topology: We now switch our attention to the model in Figure 8(b). Here, hits and misses determine the mean service time at the queue. Results are obtained as in the last example and shown in Figure 9(b).

Fig. 9. FPI and SPA item miss ratios on the models in Figure 8.

Trends are similar, with errors marginally lower than under the parallel topology.

IX. ACCESS AND STORAGE COSTS
In this section, we show that our results can be used also for cost analysis, which is a topic of broad interest in caching theory [1], [16], [29], [39], [40]. This problem is particularly relevant in the context of non-uniform access, in which a preference may be expressed to store some items in particular lists, based on their cost.
In the proposed models, one may naturally consider two types of costs: access costs and storage costs. Access costs relate to item movements between lists under non-uniform access patterns, while storage costs relate to holding particular items in the cache. Access costs are easily computed in our framework. Recall that π ij is the steady-state probability that item i is in list j and that λ vi (p(j))c vi (p(j), j) is the rate at which stream v triggers a move of item i from list p(j) to list j. This implies that A ij = Σ u v=1 π ij λ vi (j)c vi (p(j), j) is the average rate at which access cost is accumulated by item i entering list j from its parent list p(j). The quantity can be readily computed with FPI, RMF, or SPA by first estimating the probabilities π ij as shown before. Other ways to account probabilistically for access costs that can be used with our models exist in the literature [16].
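Following the expression for A ij literally (the array encoding and the handling of the root list are our assumptions), the access cost rates can be computed as:

```python
def access_cost_rates(pi, lam, c, parent):
    """A[i][j] = pi_ij * sum_v lam_vi(j) * c_vi(p(j), j), per the text.

    pi[i][j]      : steady-state probability that item i is in list j
    lam[v][i][j]  : request rate of stream v for item i relative to list j
    c[v][i][l][j] : probability of promotion from list l to list j
    parent[j]     : parent list p(j) in the tree (None for the root)
    """
    n, h = len(pi), len(pi[0])
    u = len(lam)
    A = [[0.0] * h for _ in range(n)]
    for i in range(n):
        for j in range(h):
            p = parent[j]
            if p is None:
                continue  # the root list has no parent to enter from
            A[i][j] = pi[i][j] * sum(
                lam[v][i][j] * c[v][i][p][j] for v in range(u))
    return A
```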
To determine storage costs, we first assume that each item i incurs a storage cost σ i while residing in the cache. For example, σ i may simply be the size of item i, and the given cost constraint may capture limited storage space. We wish to evaluate the performance of RR-C(m) when list j cannot exceed a cost cap k j for its stored items. Formally, we require that the cost cap is respected for each list, i.e., Σ mj i=1 σ s(i,j) ≤ k j , for j = 1, . . . , h. We denote by k = (k 1 , . . . , k h ) the vector of cost caps.
We extend RR-C(m) to account for cost constraints as follows. Initially, we assume the cache to be in a feasible state that respects all the cost caps. Upon receiving a request for item i in list j that would promote it to list j′, while simultaneously demoting item i′, the policy checks whether the replacement would violate the cost cap of either list j or j′. If this is the case, item i is served, but the cache state is left unchanged. This rule applies also to list 0, i.e., a cache miss serves the object but does not cache it afterwards.

A. Exact Analysis
Let O ⊂ S be the set of reachable states for the CTMC underlying RR-C(m). Due to reversibility [41], a restriction of the state space does not break the product-form solution, which remains as in (1) but with a new normalizing constant E(m, k). It is routine to show that the new normalizing constant satisfies properties similar to before; e.g., (3) is now replaced by an analogous recursion for arbitrary i = 1, . . . , n, with the usual termination conditions and the requirement that E(m, k) = 0 if there exists a k j < 0. The marginal probabilities may be obtained from (32), yielding an expression similar to (2). Similar expressions also hold for miss rates and other performance measures. It is also possible to obtain an expression for the average cost K j of the items in list j at steady state. We have verified that the above analysis applies similarly under a global cost constraint for the entire cache, in which case k is replaced by a scalar specifying the global cost cap. The above results hold exactly only for reversible CTMCs; hence they should be treated as approximations in the case of the FIFO-C(m) policy.

B. Approximate Analysis
As before, the exact relations are efficient only for small caches. Approximations for larger models are quite difficult to develop under cost constraints, since the recursion on k − σ k 1 j introduces dependencies between the object sizes and the structure of the recursion graph. We may alternatively leverage the product-form solution and use Monte Carlo summation to estimate E(m, k). This approach converges rapidly to accurate estimates since the samples are independent [42].
For a cache with total capacity m, we sample a set of k-permutations of the n items, with k = m. The v-th sample maps to a random vector S v = (s(v, 1, 1), . . . , s(v, m h , h)), where s(v, i, j) indicates the item in position i of list j. Define q(S v ) = Π h j=1 Π mj i=1 γ s(v,i,j)j , and let the sampling distribution be uniform, i.e., p(S v ) = (n − m)!/n!. After V samples, we compute the estimator (1/V ) Σ V v=1 q(S v )I{S v respects the cost caps}/p(S v ), where I{·} is the indicator function. This is an unbiased estimator of the normalizing constant E(m, k). Confidence intervals may easily be obtained using the central limit theorem [42].
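The sampling scheme can be sketched as follows (function and argument names are ours; lists are indexed from 0 for convenience, and the estimator form follows the importance-sampling construction described above):

```python
import random

def mc_normconst(gamma, sizes, m, k, V, rng=random):
    """Monte Carlo estimate of E(m, k) (sketch).

    gamma[i][j] : access factor of item i in list j (j = 0, ..., h-1 here)
    sizes[i]    : storage cost sigma_i of item i
    m           : list capacities; sum(m) is the total cache capacity
    k           : per-list cost caps
    V           : number of samples
    Uniform sampling over m-permutations: p(S) = (n - m)! / n!.
    """
    n = len(gamma)
    mtot = sum(m)
    # 1 / p(S) = n! / (n - mtot)! = n * (n-1) * ... * (n - mtot + 1)
    inv_p = 1.0
    for t in range(mtot):
        inv_p *= (n - t)
    total = 0.0
    for _ in range(V):
        perm = rng.sample(range(n), mtot)  # random m-permutation of items
        q, feasible, pos = 1.0, True, 0
        for j, (mj, kj) in enumerate(zip(m, k)):
            items = perm[pos:pos + mj]  # items placed in list j
            pos += mj
            if sum(sizes[i] for i in items) > kj:
                feasible = False  # indicator I{.} is zero
                break
            for i in items:
                q *= gamma[i][j]
        total += q * inv_p if feasible else 0.0
    return total / V
```

In the toy case below (n = 2 items, one list of capacity 1, no binding cap), the exact normalizing constant is γ 10 + γ 20 .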

1) Example:
We illustrate the Monte Carlo summation approach using the tree cache with non-uniform access given in Section II-B (Model 7). The items accessed by Stream 1 are assumed to have size σ 1 = · · · = σ n/2 = 1, with n = 10. The remaining items are accessed only by Stream 2 and have sizes σ n/2+1 = · · · = σ n = 2. The cache has capacity vector m = (2, 1, 1, 2). The cost cap vector is k = (2, 1, 2, 4), which requires in particular that Stream 1 cannot place items in list 2, due to their sizes. The exact value of the normalizing constant is computed by (32) to be E(m, k) = 7.7963, significantly different from the normalizing constant of the corresponding model without cost caps, which is E(m) = 238.7982. Figure 10 illustrates the rapid convergence of the Monte Carlo estimator E[S V ] to the true value of the normalizing constant. The curve in Figure 10(a) is obtained by averaging the value of E[S V ] over 100 independent repetitions, each with the value of V shown on the horizontal axis. The associated standard deviation is shown in Figure 10(b), indicating that with as few as V = 10^4 samples the approximation error of a single estimate of E[S V ] is typically below 15%. The result also cross-validates the correctness of (32), which we have also verified using direct computation from the definition.

X. CONCLUSION
We have generalized models for list-based caches to include non-uniform access. After solving the Markov process for the RR-C(m) and FIFO-C(m) policies, we have developed exact and approximate formulas to efficiently analyze performance measures such as miss rates. We have then applied singular perturbation methods and a refined mean-field approximation to determine the asymptotic behavior of the cache in transient and steady-state regimes. Simulation results indicate that our approximations typically incur less than 1.5% error on large models. We have further proposed to use this result to approximate models with synchronous requests, fetch latencies, and item costs or sizes, obtaining similar levels of accuracy.

A. Proof of Product-Form Solution
Compared to the proof techniques used in prior work [15], [28], which are based on the global balance equations of the underlying Markov process, our proof gives for RR-C(m) an argument based on Kolmogorov's criterion [41].

1) RR-C(m) Policy:
Under RR-C(m), consider a state s t ∈ S, and let s t+1 ∈ S be obtained from s t after completing a request that promotes item k from list j to list j′, simultaneously demoting item i from list j′ to list j. The transition rates for RR-C(m) are then q(s t , s t+1 ) = Σ u v=1 λ vk (j)c vk (j, j′)/m j′ and q(s t+1 , s t ) = Σ u v=1 λ vi (j)c vi (j, j′)/m j′ . The same holds also for cache misses, with items that reside outside the cache being in list j = 0. With the definitions in Theorem 1, we have π(s t )q(s t , s t+1 ) = αγ kj γ ij′ Σ u v=1 λ vk (j)c vk (j, j′) (33) where α = (1/m j′ ) Π l≠j,j′ Π m l k=1 γ s(k,l)l . Similarly, π(s t+1 )q(s t+1 , s t ) = αγ ij γ kj′ Σ u v=1 λ vi (j)c vi (j, j′) (34). If both left-hand sides of (33) and (34) are strictly positive, then they must be equal, since γ ij′ /γ ij and γ kj′ /γ kj match the values of the two summations by the definitions in Theorem 1. For the same reasons, if any of the factors is zero, then both left-hand sides of (33) and (34) are zero; e.g., if the summation in (33) is zero, then γ kj′ in (34) is zero, as its definition includes this sum.
Note that the last passages are a consequence of the fact that the parent relationship is stream-independent, as otherwise there could be two streams v and v′ such that item k, after a request from stream v′, reaches list j′ from a list l ≠ j, j′ without visiting list j. In this case c v′k (l, j′)λ v′k (l) would not be present within the summation in (33), whereas it would need to be accounted for within γ kj′ , as this describes a position for k that is also reachable via promotion from l.
Applying the last result recursively, it follows that the product form (1) satisfies Kolmogorov's criterion [41] for any feasible sequence of states s 0 → s 1 → · · · → s S : π(s 0 ) Π S t=1 q(s t−1 , s t ) = π(s S ) Π S t=1 q(s t , s t−1 ). This is a necessary and sufficient condition for the Markov process to be reversible and for (1) to be its equilibrium distribution [41].
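On small instances, this reversibility argument can be cross-checked numerically: the telescoped condition above reduces to detailed balance between neighboring states. A minimal sketch (the function name and the matrix encoding are ours):

```python
def detailed_balance_holds(pi, Q, tol=1e-9):
    """Check pi[s] * q(s, t) == pi[t] * q(t, s) for all state pairs.

    pi : candidate equilibrium distribution, pi[s] for each state s
    Q  : CTMC generator, Q[s][t] is the rate from state s to state t
    Together with irreducibility, this certifies reversibility, and is
    equivalent to Kolmogorov's criterion over every cycle.
    """
    n = len(pi)
    return all(abs(pi[s] * Q[s][t] - pi[t] * Q[t][s]) <= tol
               for s in range(n) for t in range(n) if s != t)
```

For example, a two-state birth-death chain passes the check, whereas a three-state unidirectional cycle (which satisfies global but not detailed balance) fails it.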
2) FIFO-C(m) Policy: In the case of FIFO-C(m), the process is not reversible. However, it is sufficient to plug (1) into the global balance equations to verify the statement. The full derivation is given in Appendix-D in the online supplement.

B. Proof of Theorem 2
It is possible to verify with a little algebra that a simple recursive relationship holds. Applying this relationship to (3), using (5), and expanding (12) in powers of ε, we obtain at the lowest order 1 = e −φ y + Σ h j=1 g j (y)e −φ y −φ j , where φ j (resp. φ y ) denotes the partial derivative of φ with respect to x j (resp. y). Multiplying both sides by e φ y and simplifying yields 1 − e φ y + Σ h j=1 g j (y)e −φ j = 0. Except for the sign in front of φ y , this is the eikonal equation studied in [20]. The method of characteristics for non-linear PDEs [43] then yields dy = −e φ y dt (37a), dx j = −g j (y)e −φ j dt (37b), dφ = φ y dy + Σ j φ j dx j (37c), where all variables are functions of (t, ξ), with t the integration variable along the characteristic curves and ξ = (ξ 1 , . . . , ξ h ) parameterizing the family of curves. To solve this system, we follow a strategy similar to the one used in [20] to analyze normalizing constants in queueing networks; our strategy differs mainly in the boundary condition. We first solve (37e) as φ j = log ξ j , where we have set the integration constant to log ξ j , j = 1, . . . , h. Multiplying (37d) by (37a), after dividing both by dt, and integrating, we find, under the initial conditions φ = 0, x = 0, and y = 0 at t = 0, that φ y = log(1 + Σ j g j (y)ξ j ), where the intermediate integration constants need to be set to unity for consistency with the eikonal equation. Plugging the last result into (37a) and integrating, we obtain t, after which (37b) yields (14). Using the above results, (37c) integrates to (13).
3) Boundary Condition: At the boundary, we choose to match the arbitrary function b(ξ 1 , . . . , ξ h ) to the value of H(m) when γ kj = g j (k) = γ j , ∀k, j, for arbitrary γ j . This is simply given by (10). For small x and y, we note that (14) gives x j = ξ j γ j y(1 + Σ h l=1 γ l ξ l ) −1 , implying ξ j = x j (y − x) −1 γ j −1 upon summing over j. Substituting the expression of ξ j into (13), it is possible to verify that φ 0 is matched. We then impose a further matching requirement, from which we compute det J and determine with a little algebra the required value of b(ξ 1 , . . . , ξ h ), which yields (15). Lastly, to match (10), we need to set D(ε) = ε h/2 .