Density Estimators of the Cumulative Reward Up to a Hitting Time to a Rarely Visited Set of a Regenerative System

For a regenerative process, we propose various estimators of the density function of the cumulative reward up to hitting a rarely visited set of states. The approaches exploit existing weak-convergence results for the hitting-time distribution, and we apply simulation (often with previously developed importance samplers for estimating the mean) to estimate parameters of the limiting distribution. We also combine these ideas with kernel methods. Numerical results from simulation experiments show the effectiveness of the estimators.


Introduction
A regenerative stochastic process on a state space S possesses an increasing sequence of time points at which the process "probabilistically restarts" [9].An example is the queue-length process (including any customer in service) of a stable GI/GI/1 queue, with the starts of busy periods being regeneration times [9, Example 1.2.2].Suppose that there is a set A ⊂ S of rarely visited states (e.g., large queue lengths in a stable queue), and for a given reward function on the state space 1 S , consider the cumulative reward R up to first hitting A .Our goal is to estimate the density function f of R.
One reason for estimating f is that an analyst may glean important features of the distribution of R from a graph of its density, arguably more easily than from a plot of its cumulative distribution function (CDF) F. Another motivation arises in quantile estimation [4].A central limit theorem for the estimator of the q-quantile ξ = F −1 (q) for 0 < q < 1 has an asymptotic variance that includes f (ξ ), so constructing a confidence interval for ξ often entails estimating f (ξ ).
When sampling independent and identically distributed (i.i.d.) copies of R from F via crude simulation (CS), we can estimate the density f through kernel methods [17], a class of nonparametric techniques for function estimation.But since A is rarely visited, generating an observation of R from F can be computationally costly as it entails simulating the process for a typically large number of transitions.Rareevent simulation techniques, such as importance sampling (IS), have exploited regenerative structure to obtain efficient estimators of the mean E[R] (e.g., [5] and Chapter VI of [1]).Rather than replicating i.i.d.copies of R from F, the IS methods instead sample cycles from a different distribution than the original, unbiasing results by multiplying by a correction factor, the likelihood ratio.Compared to their CS counterparts, the IS estimators of E[R] can have much smaller variance, with also substantially less computation cost.But ordinary kernel methods are then no longer applicable to estimate f because i.i.d.replicates of R from F are not available.
We now consider estimating f by employing ideas from [6], which take advantage of known weak-convergence results based on regenerative properties in an asymptotic regime in which visiting the set A becomes rarer.These limit theorems [10] generalizes a classical result of Rényi [10, p. 3] establishing that for a geometric sum (i.e., the sum of a geometrically distributed number of i.i.d.nonnegative random variables with finite mean), the ratio of the sum and its mean converges weakly to an exponential as the geometric's parameter p shrinks to 0. Then simulation is applied to estimate the limiting distribution's parameters, where those related to rare events are dealt with via existing IS methods for E [R].Applying this approach, [6] develops so-called exponential and convolution estimators for the CDF F of R, along with its q-quantile ξ = F −1 (q) and the conditional tail expec- , the latter two being frequently employed risk measures in finance, where ξ (resp., γ) is known as the value-at-risk (resp., conditional valueat-risk); see, e.g., [8].Our current paper extends the methods of [6] to handle the density f , and also further combines the approaches with IS kernel techniques [11].
The rest of the paper proceeds as follows.Section 2 reviews (ordinary) kernel density estimation.We then describe the assumed regenerative structure in Section 3, and Section 4 explains the asymptotic regime for the weak-convergence results upon which our density estimators are based.Sections 5 and 6 extend the exponential and convolution estimators, respectively, of [6] to instead estimate densities.We then combine the convolution estimator with kernels in Section 7. Section 8 presents numerical results, and we give concluding remarks in Section 9.

Review of kernel density estimation
Consider a stochastic process X = [X(t) : t ≥ 0] evolving on a state space S ⊂ ℜ d .Let T = inf{t : X(t) ∈ A } be the hitting time (or first passage time) to a set A ⊂ S , and let R = T 0 r(X(t)) dt be the cumulative reward up to T for a reward function r : S → ℜ + , where ℜ + is the set of nonnegative real numbers.Let F be the CDF of R; i.e., F(x) = P(R ≤ x), x ∈ ℜ.Under the assumption that F has a density f (with respect to Lebesgue measure), the goal is to estimate f .We first review estimating f via kernel methods.Suppose that R 1 , R 2 , . . ., R n are n i.i.d.observations from CDF F. We can estimate F through the empirical CDF F n defined by where I (•) is the indicator function, which takes value 1 (resp., 0) when its argument is true (resp., false).Thus, F n assigns a mass of size 1/n to each observed R i , and F n (x) is the fraction of the n data points that are at most x.As the derivative of F, the density for a small but fixed δ > 0 known alternatively as the bandwidth, window width, or smoothing parameter.
We can obtain other estimators of f by replacing k U [−1,1) in (3) with any density or more generally a kernel k : ℜ → ℜ, which is a function with ℜ k(u) du = 1; e.g., see [17,Section 2.4].This then leads to the kernel density estimator f n,k,δ (with kernel k) defined by where k δ (u) = (1/δ )k(u/δ ) is the scaled kernel, and δ > 0 is the bandwidth.When k is a density, the mean (resp., variance) of k δ equals that of k multiplied by δ (resp., δ 2 ), and the kernel estimator ( 4) is a mixture of n scaled densities k δ , each translated by an R i and with equal mixture weight 1/n.In practice, δ = δ n is usually chosen as a function of the sample size n such that δ n → 0 and nδ n → ∞ as n → ∞.Kernel density estimators are generally biased, with the bias shrinking to 0 as δ n → 0. (Chapter 5 of [17] also discusses adaptive estimators of f with bandwidth depending on x or the distances between the data points, but for simplicity, we do not consider these variants.) The kernel is often taken as a density function that is symmetric (about the origin).Although these properties are not required generally [17, Section 3.6.1],we will simplify our discussion by assuming that k is a density (i.e., k(u) ≥ 0 for all u ∈ ℜ and ℜ k(u) du = 1) but not necessarily symmetric.In addition to k U [−1,1) (u), other common choices for a symmetric kernel k include the Gaussian kernel k N (u) = φ (u) = (2π) −1/2 e −u 2 /2 (with CDF Φ), and the Epanechnikov kernel [17, p. 42].
Using a symmetric kernel k in (4) can lead to f n,k,δ assigning positive mass to sets of negative values (i.e., 0 −∞ f n,k,δ (x) dx > 0) when the bandwidth δ is sufficiently large.This may be undesirable when the true density f of R has f (x) = 0 for all x < 0, as is the case under our assumption that the reward function r(•) ≥ 0. One possible way to address this is to truncate f n,k,δ so that f n,k,δ (x) = 0 for x < 0, but then f n,k,δ may not be a density as it may integrate to less than 1; [17], Section 2.10, discusses other issues also arising from truncation.[2] suggests instead choosing what we call a positive-support kernel, i.e., a density function and the exponential kernel k E (u) = e −u I (u ≥ 0).When r(•) ≥ 0, positive-support kernels always result in ∞ 0 f n,k,δ (x) dx = 1, so f n,k,δ is a density.But the lack of symmetry of k can lead to a less statistically efficient estimator of f .For a positivesupport kernel k and sample size n, [2] derives the bandwidth δ = δ n (also depending on k and f ) that asymptotically (as n → ∞) minimizes the mean integrated squared error (MISE) E[ ( f n,k,δ (x) − f (x)) 2 dx] of the kernel density estimator.The optimal δ n shrinks at a rate of order n −1/3 as n → ∞, leading to the MISE optimally decreasing at a rate of order n −2/3 , worse than the order n −4/5 optimal MISE rate for symmetric kernels with optimal bandwidth of order n −1/5 ; e.g., see Section 3.3.2 of [17].

Regenerative process
A drawback of the kernel estimators of Section 2 is that generating i.i.d.copies of R from F can be computationally expensive when A is rarely visited.To obtain more efficient methods for estimating f , we will now impose additional structure on the process X = [X(t) : t ≥ 0], which will enable us to adapt approaches from [6].Specifically, we first approximate the CDF F of R using existing weak-convergence results [10] in an asymptotic regime where visits to the set A become rarer (in the sense of (7) below).We then apply simulation to estimate the limiting distribution's unknown parameters, some of which relate to rare events, and we will exploit previously developed IS methods designed to efficiently estimate E[R] to handle such parameters.
To accomplish this, we now assume that X is a (non-delayed) regenerative process [9].Thus, X has a sequence of regeneration times 0 = Γ 0 < Γ 1 < Γ 2 < • • • , so the process "probabilistically restarts" at each Γ i .An example of a regenerative process is a positive-recurrent continuous-time Markov chain (CTMC) on a discrete state space S with a fixed starting state x 0 ∈ S , and successive entry times to x 0 form a regeneration sequence [9, Example 2.1 on p. 15].
For each i ≥ 1, define is a sequence of i.i.d.pairs of cycle lengths and reward processes during cycles.For i ≥ 1, let T i = inf{t ≥ 0 : X(Γ i−1 + t) ∈ A } be the time elapsing after Γ i−1 until the next hit to A .For x, y ∈ ℜ, let x ∧y = min(x, y) and x ∨y = max(x, y).
) dt be the reward accrued during the ith cycle up to hitting A or the end of the cycle, whichever occurs first.The regenerative property ensures that We can give a stochastically equivalent representation for the cumulative reward R up to time T in terms of independent quantities.To do this, let W 1 ,W 2 , . . .be i.i.d., each with CDF G W (x) = P(Y ≤ x | τ < T ), so W i is distributed as the reward in a cycle that does not hit A .Independent of the W i , further define M as a geometric random variable with parameter p = P(T < τ) (and support starting from 0), so P(M = l) = (1 − p) l p for l = 0, 1, 2, . ... Independent of M and the W i , additionally let V be a random variable with CDF H defined by V is distributed as the reward in a cycle up to hitting A given that T < τ.For G as the CDF of the geometric sum S = ∑ M i=1 W i , the regenerative property of X implies where = denotes equality in distribution, and ∼ means "distributed as".Thus, the independence of S and V ensures that F = G H, where is the convolution operator; i.e., G H(x) = G(x − y) dH(y).
If p > 0, then the expectation µ of the cumulative reward R up to T satisfies where ζ and p = E[I (T < τ)] are means of "cycle-based random variables" (i.e., measurable with respect to the sigma-field of X up to τ); e.g., see [7] and [5].

Asymptotic regime
We will exploit existing generalizations [10] of a classical result of Rényi [10, p. 3] establishing that the ratio of a geometric sum and its mean converges weakly to an exponential as the geometric's parameter p shrinks to 0. For a theoretical framework to accommodate this, we parameterize the problem by introducing a rarity parameter ε > 0 and examine the behavior of R ≡ R ε and S ≡ S ε as ε → 0, where we assume that p ≡ p ε → 0 as ε → 0, (7) which is what we meant before by saying that the set A is "rarely visited".We next describe two stochastic models satisfying (7).
Example 1 For a stable GI/GI/1 queue, let X(t) be the total number of customers in the system at time t ≥ 0, where the first customer arrives at time t = 0 to an empty system.Thus, the state space is S = {0, 1, 2, . ..}, and the process X is regenerative, regenerations occurring when a customer arrives to an empty system [9, pp.[16][17].
. .} be the set of states in which there are at least b ε ≡ 1/ε customers in the system, so for ε > 0 small, T represents the first time when there is a large number b ε of customers in the system.Theorem 1 of [15] shows that (7) then holds.Example 2 A highly reliable Markovian system (HRMS) comprises a fixed number of components, each with exponentially distributed failure times, and a finite number of repairpersons, who fix each failed component in an exponentially distributed time.Each state in the (discrete) state space S keeps track of the set of currently failed components, as well as any other necessary information, e.g., about queueing of failed components.At time t = 0, all components are operational, and the resulting process X on S is a positive-recurrent CTMC, making it regenerative.
The system fails when certain combinations of components are down, and let A denote the set of those states, so T is the first time to system failure.The rarity of system failures results from having failure rates as positive powers of ε and repair rates as constants.Then [16] provides conditions guaranteeing (7).
In Example 1, rarity arises by letting the set A = A ε recede from state 0 as ε shrinks, but the transition dynamics remain unchanged.In contrast, the set A in Example 2 does not change as ε shrinks, but rarity instead comes from the failure rates becoming smaller with fixed repair rates.As we are now actually considering a family of models indexed by ε, we sometimes (but not always) add a subscript ε in our notation (e.g., S = S ε , W i = W ε,i , or F = F ε ) to emphasize the dependence on ε.
To try to apply Rényi's theorem for geometric sums in our regenerative setting, consider 5).Rényi's result covers the case when the summands' distribution remains fixed as p ε shrinks, but our Examples 1 and 2 violate this assumption.[10] provides generalizations to address this when (7) holds.For example, the conditions in Theorem 3.2.5 of [10] ensure that the ratio of S ε to its mean η ε = E ε [S ε ] converges weakly to an exponential: for all y ∈ ℜ, where x + = max(x, 0), x ∈ ℜ.If the cumulative reward R ε = S ε +V ε in (5) further has only a "negligible" contribution from V ε (e.g., ), then we also have where µ ε = E ε [R ε ]; e.g., see Theorems 3.2.5 and 3.4.1 of [10] for specific assumptions.[12] give an example (Section 8 provides a simpler model) where V ε is not negligible relative to R ε , so (8) holds more generally than (9).[6] develops simulation methods that exploit (8) or ( 9), and we next extend those approaches to construct density estimators.

Exponential estimator of the density f
The weak convergence in (9) suggests that for fixed but small ε > 0, which we call the exponential CDF approximation.This suggests approximating the density f of F by [6] uses (10) to develop simulation estimators of the CDF F and associated risk measures, and we now adapt this to estimate the density f .Dropping the subscript ε to simplify notation, we next construct a simulation estimator µ n of µ in (11).To do this, we apply measure-specific importance sampling (MSIS; [7]), which independently estimates the numerator and denominator in (6) using crude simulation and importance sampling, respectively.Let n be our computation budget of the total number of cycles to simulate, and for a fixed user-specified parameter β ∈ (0, 1), we simulate n CS ≡ β n (resp., n IS ≡ n − n CS ) cycles using CS (resp., IS), where • denotes the floor (round-down) function.[7] selects β to minimize the asymptotic variance (in the corresponding central limit theorem as n → ∞ for fixed ε > 0) of the resulting overall estimator of µ.To estimate the numerator ζ = E[D] in (6), we generate D i , i = 1, 2, . . ., n CS , as n CS i.i.d.copies of D sampled using CS.Generating each D i entails simulating a cycle until either it ends or A is hit, whichever occurs first.A CS estimator of ζ is then Independently of the simulation runs employed to construct ζ n in (12), we use IS to estimate the denominator p = P(T < τ) in (6) as follows.Applying a change of measure leads to where P (resp., E ) denotes the probability measure (resp., expectation) under IS, with P absolutely continuous with respect to P , and L = dP/dP is the resulting likelihood ratio.The representation (13) motivates the following approach to estimate p.Let (I (T i < τ i ), D i , L i ), i = 1, 2, . . ., n IS , be i.i.d.copies of the cycle-based random vector (I (T < τ), D, L) generated via IS.Then an IS estimator of p is which is unbiased.Choosing P appropriately, which is problem-specific, can lead to p n having much smaller variance than its CS counterpart, but a poorly selected P can lead to larger (or even infinite) variance; see [1], Section V.1 and Chapter VI, and [14] for further details on IS and particular approaches for various settings, including those in Examples 1 and 2. We combine the estimators ζ n from ( 12) and p n from ( 14) to obtain the MSIS estimator of µ in (6) as Putting µ n into (11) results in the following.Proposition 1 For a reward function r : S → ℜ + , the exponential density estimator f exp,n of f based on (11) satisfies where µ n is from (15).

Convolution estimator of the density f
With the rarity parameter ε reintroduced, the exponential approximation in (11) essentially assumes that V ε makes a negligible contribution to R ε in (5).But for stochastic models in which this is not the case (e.g., see Section 8 and [12]), [6] gives other estimators for the CDF and risk measures of R ε based on a convolution arising from (5) to more explicitly account for the contribution of V ε to R ε .The resulting so-called convolution estimators can apply more generally than the exponential estimators in Section 5, with the convolution estimators often having less bias, especially when ε is not so small, which is useful in practice because actual systems have a fixed ε > 0. We next extend this convolution approach to construct a density estimator.Dropping again the subscript ε to simplify notation, we have that (5) implies that the density assuming that G has density g (with respect to Lebesgue measure); e.g., see [3], eq.(20.37).
To develop an estimator of the density f based on (17), we will use (8) to approximate the density g of S by an exponential with mean η.To construct an estimator of H, [6] again applies MSIS with total sample size n and a user-specified allocation parameter β ∈ (0, 1), as in Section 5. Recall that H is also the conditional CDF of D given that T < τ and p = P(T < τ), so we have through a change of measure, where as in Section 5, E is the expectation operator under IS, and L is the corresponding likelihood ratio.We previously obtained in (14) an IS estimator p n of the denominator p of (18).From the same IS data used in (14), we then construct an IS estimator H n of H as Section 7 will consider instead replacing ( 19) with an IS kernel estimator of H.We next develop an estimator of g in ( 17) by extending ideas from [6].In ( 8), Wald's identity [13,Theorem 3.3.2] where (Y i , I (τ i < T i )), i = 1, 2, . . ., n CS , are i.i.d.copies of the cycle-based random vector (Y, I (τ < T )) generated under CS.This then results in as an MSIS estimator of η.Recalling (8), we obtain a parametric estimator of the exponential approximation The convolution estimator of the density f (x) in ( 17) is then and the following works out an expression for f ,n (x).
Proposition 2 For a reward function r : S → ℜ + , the convolution estimator f ,n in (22) of the density f of R satisfies where p n is from ( 14) and η n is from (20).

Convolution-kernel estimator of the density f
The IS estimator H n in (19) of the CDF H of V in (5) has a point mass at each D i , i = 1, 2, . . ., n IS .Assuming that H has a density h, we now devise an IS kernel [11] estimator of h, which we then convolve with the exponential density estimator g exp,n in (21) of the geometric sum S to obtain a convolution-kernel estimator of the density f of R D = S +V .As h(y) = lim δ →0 [H(y + δ ) − H(y − δ )]/(2δ ), this suggests using a bandwidth δ > 0 to estimate h by the finite difference Then just as the ordinary kernel estimator (4) generalizes (3), replacing Recalling the convolution for f in (17), we next consider a convolution-kernel density estimator of f , with g exp,n from (21).Proposition 3 below will derive an expression for f • ,n,k,δ .To do this, let ψ k (θ ) = e θ u k(u) du, θ ∈ ℜ, be the moment generating function (MGF) of a kernel k.Also, for z ∈ ℜ, define the lower incomplete (or partial) moment-generating function (LIMGF) [18] of k as While the MGF ψ k (θ ) for a particular k may be infinite for some θ , the LIMGF always satisfies du ≤ e θ z since k is a density.We will later give simple expressions for ψ ↓ k (θ , z) for particular choices of densities k.Proposition 3 Consider a reward function r : S → ℜ + , a kernel function k that is a density, and any bandwidth δ > 0. The convolution-kernel estimator f for LIMGF ψ ↓ k (•, •) in (26), where p n is from ( 14) and η n is from (20).
Evaluating f • ,n,k,δ (x) in (28) requires computing ψ ↓ k (θ , z), which we need to do for only θ ≥ 0 because r(•) ≥ 0 implies δ / η n ≥ 0 in (28).We now give the LIMFG ψ ↓ k (θ , z) for some particular choices of kernel k.The uniform Similar to the discussion in the last paragraph of Section 2, choosing a symmetric kernel k can lead to h • n,k,δ in (24) assigning positive mass to sets of negative values (i.e., 0 −∞ h • n,k,δ (y) dy > 0) when the bandwidth δ is sufficiently large.Thus, (25) then shows that 0 x=−∞ f • ,n,k,δ (x) dx > 0 is possible.This may be undesirable when the true density f of R has f (x) = 0 for all x < 0, as is the case under our assumption that the reward function r(•) ≥ 0. Selecting instead a positive-support kernel avoids these issues, but can lead to a statistically less-efficient estimator of f ; see the last paragraph of Section 2.

Numerical results
We next provide numerical results comparing the previous sections' estimators of the density f of the cumulative reward R up to a hitting time T to a set A for a simple tractable model.A variation of a 3-state CTMC of [7], our model is a semi-Markov process (SMP; Section 4.8 of [13]) X = [X(t) : t ≥ 0] on state space S = {0, 1, 2} and A = {2}.The SMP starts in state 0, and returns to state 0 are regenerations of X. Figure 1 shows the transition probabilities of the embedded discrete-time Markov chain (DTMC) Z = [Z j : j = 0, 1, 2, . ..] of X. From state 0, the embedded DTMC moves to state 1 with probability p ∈ (0, 1), and returns to state 0 (regeneration) with probability 1 − p. From state 1, the DTMC goes to state 2 with probability 1, so P(T < τ) = p.(To make the embedded DTMC regenerative, we also specify the probability of going from state 2 to state 0 to be 1.)Let E (λ ) denote an exponential distribution with rate λ > 0, and U (a, b) be a uniform distribution on an interval (a, b) with a < b.Given Z, the holding times in each state Z j visited are independent, where successive holding times for visits to state 0 (resp., 1) are E (λ 0 ) for some λ 0 > 0 (resp., U (a, b) for some 0 ≤ a < b < ∞).(While the holding times in state 2 do not matter for the process up to T , we define its distribution as E (λ 2 ) for some λ 2 > 0.) 0 Figure 1: The edge labels are the transition probabilities of the embedded DTMC Z of an SMP X, with the holding-time distribution to each visit to a state below that state, where w 0 , w 1 , and λ 2 are constants.
Define the reward function r(•) ≡ 1, so the cumulative reward R is the hitting time T to state 2. The SMP X has only two possible types of paths for T ∧ τ, corresponding to paths of Z with 0 → 0 and 0 → 1 → 2. Let M be the number of cycles completed before the first visit to A , so M is geometric with parameter p and support {0, 1, 2, . ..}; i.e., P(M = m) = (1 − p) m p for m ≥ 0. Let A 1 , A 2 , . . .be the successive holding times in state 0, where each A i ∼ E (λ 0 ).Also, let B ∼ To incorporate the asymptotic regime of (7) into the model, we make visiting A before regenerating rare by having p = ε for small ε > 0. We further set the exponential holding time in state 0 to have rate λ 0 = ε w 0 for a constant w 0 ≥ 0, and the holding time in state 1 is uniform on (a, b) = (0, ε −w 1 ) for w 1 ≥ 0. Varying (w 0 , w 1 ) allows investigating the viability of the various estimators.The mean of S (resp., and E[V ] are of respective orders ε −w 1 and ε − max(w 0 ,w 1 ) as ε → 0. Hence, if w 0 + 1 > w 1 (resp., w 0 + 1 < w 1 ), S and S typically make the dominant (resp., a negligible) contribution to R = S +V = S +B compared to B and V .If w 0 +1 = w 1 , then S, S , V , and B each typically contribute comparably to R. Thus, as we will see in Figure 2, the weak convergence in (9) does not hold when w 0 + 1 ≤ w 1 , but ( 8) is still valid.The exponential estimator of Section 5 is then only appropriate for w 0 + 1 > w 1 , but the convolution estimators of Sections 6 and 7 are applicable for all w 0 and w 1 as ε → 0.
For w 0 = 1 and w 1 ∈ {1, 2}, Figure 2 plots the exact density f in (31), the exponential estimator ("Exp.est.") f exp,n in ( 16), the convolution estimator ("Conv") f ,n in (23), and the convolution-kernel estimator ("Conv:k") f • ,n,k,δ in (28), for k as the Gaussian kernel k N , the U [0, 2) kernel k U [0,2) , or the exponential kernel k E (Section 2).All of the estimators apply MSIS (Section 5), where for IS, we change p to p = 0.8 for Z (as in [7]) and use the original CDFs (given Z) for the holding times.All estimators are based on simulating a total of n = n CS + n IS = 10 5 independent cycles, where we determined the MSIS allocation parameter β (used in n CS = β n ) to approximately minimize the variance of the MSIS estimator µ n in (15) of µ, the variance estimated from a small pilot run.While the end of Section 2 mentions asymptotically optimal bandwidths δ = δ n of order n −1/5 (resp., n −1/3 ) for symmetric (resp., positive-support) kernels to minimize MISE, such asymptotics may take effect in practice for only enormous sample sizes, especially when kernels are combined with IS.Instead, our kernel estimators used a bandwidth δ = 500/ √ n IS , chosen based on ad-hoc experiments to avoid oversmoothing.The top panels of Figure 2 have w 1 = 1 < w 0 + 1, so the exponential and convolution (both without and with kernels) estimators are all valid: the estimators get closer to the exact density as ε shrinks from 0.3 (left panel) to 10 −2 (right panel).For the larger ε, the true density has a mode at a positive x, which the exponential estimator does not capture, but the convolution estimators do, although placing the mode somewhat to the right of the true mode.For the smaller ε, the true mode's location moves left, making it virtually indistinguishable from the origin, which is the mode of the exponential estimator.Figure 2's bottom panels have w 1 = 2 = w 0 + 1, invalidating the exponential estimator, but the convolution estimators (without and with kernels) are all still valid as ε → 0. The convolution estimators get closer to f as ε shrinks from 0.3 (left panel) to 10 −2 (right panel), but the exponential estimator does not.
Zooming in on the left panels of Figure 2 reveals that the Conv:k N estimators have been truncated at the origin as they assign positive mass to negative values (see the last paragraph of Section 7).In contrast, the estimators with positive-support kernels k U [0,2) and k E do not require truncation. in line with the theory (last paragraph of Section 2) for ordinary (non-IS) kernel estimators.The right panel shows that for k N , varying the bandwidth δ can lead to different errors, where δ = 500 results in some difference (with some improvement for some small x) compared to the non-kernel convolution estimator.

Concluding remarks
We provided density estimators of the cumulative reward R = S + V until hitting a rare set A of a regenerative process X.The estimators exploit known weakconvergence results that show that S or R divided by its mean converges to an exponential, which [6] uses to construct exponential and convolution estimators of the CDF of R and related risk measures.We extended these approaches to estimate the density and further incorporated kernels.We introduced a simple model for which numerical results show that our density estimators can do well, and also  demonstrate when the exponential estimator can lead to poor results.( [12] considers another more complicated model that could also be used.)Further work is needed to determine how to choose the bandwidth δ for the convolution-kernel estimator.
F B = U (a, b) be the holding time in the first visit to state 1, so R = S + V with S = ∑ M i=1 A i and V = A M+1 + B in (5).To work out the CDF F of R, rewrite R as R = S + B, where S = ∑ M+1 i=1 A i ∼ E (pλ 0 ) [10, p. 7].As a consequence, F is the convolution of E (pλ 0 ) and F B .The density f B of F B = U (a, b) is given by f B (y) = I (y ∈ (a, b))/(b − a), so the density f of R is f (x) = I (x ≥ 0) x 0 pλ 0 e −pλ 0 (x−y) f B (y) dy = I (x ≥ 0) b − a e −pλ 0 x e pλ 0 (b∧x) − e pλ 0 (a∧x) .