A Bounding Technique for Probabilistic PERT

Bounding time distributions has been an effective way of improving the original PERT method, and analytical enhancement of PERT is often an effective time bounding approach. However, what is still missing is a combination of time distributions whose parameters can be obtained empirically and an effective bounding technique for them. We aim to address this gap and suggest the Cornish-Fisher expansion (CFE) to compute time bounds in formal models such as classical PERT. We argue that CFE allows approximate time bounds to be evaluated analytically, without resorting to simulation. This bounding approach is useful when the distribution functions of task durations are complex, because analytical derivation of the project completion time distribution is then tedious. Our example shows the use of CFE for uniform time distributions and a comparison with the time bounds of classical PERT.


Introduction
The Project Evaluation and Review Technique (PERT) [1] (classical PERT) is known today as a method of project time evaluation. The principal idea is to perform a probabilistic analysis of the project completion time. Unfortunately, classical PERT has attracted a lot of criticism because of its model assumptions, e.g. [2]. One of the most unfortunate assumptions is the use of the central limit theorem (CLT) to approximate path durations with a Gaussian distribution, regardless of the distribution function (DF) of individual activity times. The CLT appears to be an easy solution to the complex problem of project time evaluation in a stochastic activity network (SAN). The alternative is aggregation of random time variables along the paths. The CLT introduces ambiguity about the initial time distributions of tasks in classical PERT: the tasks initially had beta DFs, but going back from the resulting normal distribution of the whole project time, we could just as well assume that their marginal distributions were normal. Hence there is little use in constructing the initial beta distributions of tasks. There are a number of equivalent conditions that data must satisfy for the CLT to apply in theory [3]. Goman [4] analyzed the applicability of the CLT to classical PERT problems and confirmed the need to verify the CLT conditions for the given data in order to obtain consistent and reasonable results from classical PERT analysis.
There have been many attempts at using times with Gaussian DFs in PERT analysis [5]. The normal DF was chosen because it simplifies modelling and calculations, especially in the multivariate case. Computational difficulties drew attention to normal distributions many decades ago, because manipulating arbitrary distribution functions (DFs) was hard with slow computers and without readily available software tools. Today, computation time is not an issue for a PERT SAN of any realistic size, and aggregation of known DFs can be done quickly using modelling. Moreover, the choice of the normal distribution requires further justification because of its negative range and infinite tails of time values. However, this is not only about assumptions. Basic principles of management are even more important, e.g. what kind of outcome should the analysis deliver to a decision maker (DM) in project management? The DM cannot expect an exact estimate of the project time because of unreliable initial time estimates, unknown real distribution types and other uncertainties in the project model. These distributions are only artefacts of the model and follow from its assumptions. In fact, the only thing a prospective time analysis can achieve is a reduction of uncertainty, i.e. a more or less accurate estimate of time. The resulting time distribution is obtained from a complex aggregation of the time distributions of all project activities and cannot be verified until some time in the future. Therefore, mathematically precise derivations of the resulting distribution are not vital. Instead, a good guess of the form of the distribution seems more appropriate.
In reality, the DM does not need a good approximation of the time DF, but a good estimate of time bounds (intervals) for activity start and end times. A good example is the critical path method (CPM), which is still considered useful and is referred to in textbooks. Although deterministic, it provides the information that a project manager needs, and a schedule can be built on it. The same is true for other project events such as milestones. Because the form of the resulting distribution can be diverse (skewed, non-continuous, discrete), bounds that reflect the concentration of the probability density function (PDF) can be of more value to the DM. A bounding technique that returns quantiles of the DF as bounds with a certain reliability level can help the DM to manage the project under uncertainty. In particular, it can be simple and useful in pre-project estimation and at the very beginning of a project, when the lack of empirical data related to the current project is an issue. Additionally, a good bounding method should be simpler than deriving a precise DF or applying a modelling technique.
Analytical bounding techniques have a long history [6][7][8][9][10][11]. There are bounds for the PDF and for the expected value of the project time distribution. Classical PERT also produces a lower bound on the expected value of the project completion time. Moreover, classical PERT uses beta distributions to express uncertain time estimates. There is no practically proven method today that can easily obtain, in operational practice, time estimates with normal, exponential or beta time distributions. On the contrary, the parameters of simpler distributions such as the uniform and triangular ones can be obtained from experts and the estimates can be verified (e.g. [12]). Analytical bounding is still appealing for its completeness. One of the goals of the deterministic CPM is to determine the time bounds of the earliest and latest start and end times for each activity in a SAN. As the DF assumptions are predefined and the SAN is given, it should be possible to compute (or recompute) n activity times like those of CPM with computational complexity O(n), without modelling, for the stochastic PERT problem.
Consideration of uniform or triangular distributions instead of the beta distribution is new in this context, since they have hardly been considered for PERT problems in the last 30 years. These DFs do not have many of the useful mathematical properties that simplify aggregation in PERT analysis. However, it is easier to obtain their parameters in practice [12,13]. Johnson [13] showed that the simpler and intuitively obvious triangular distribution can be very close to the beta DF and proposed a procedure for estimating its parameters.
In order to set up a basis for improving classical PERT, we address one piece of the aggregation task in this paper: the aggregation of a number of serial activity durations for known independent and identically distributed (iid) activity times. We suggest analytical bounds for random time aggregation in order to solve this problem in general for different possible activity time distributions. This technique can be applied to consecutive tasks or to paths for their comparison (paths need to be enumerated first). We believe that it is possible to extend this technique of time analysis to a full project graph, but this requires consideration of conditional DF behavior when parallel activities or paths are collapsed. We leave this task for the future.
Our bounding approach applies CFE [14] using the known moment generating function (MGF) of activity DFs. This promises an approximation of time quantiles of the true convolved DF of the last activity in the sequence. In order to verify the quality of our bounds, we consider the case of uniformly distributed activity times. Fortunately, there is a closed-form PDF expression for the sum of random variables in this case, so we can resolve any time quantile with the PDF. However, we are searching for a better bounding technique in general. This particular case is also convenient for verifying the quality of our bounds because this DF type expresses the largest uncertainty. Thus, we measure the quality of our CFE bounds (quantiles of the sum) as an absolute error with respect to the known analytical solution for the quantiles, based on the known PDF of the sum of uniform distributions (UD). The CFE technique should work with any DF types, including discrete, mixed or non-identical DFs. For the convolution of other DFs, the bounding error can in general be verified with simulation techniques.
Using the normal distribution and the CLT is still popular, and the original PERT employs it. In fact, the kernel of the original PERT is nothing other than the aggregation of Gaussian random time variables along one critical path (CP). Therefore, we consider it useful to compare CFE bounds with the quantiles of a normal DF, as used in classical PERT analysis, for another promising distribution, namely the uniform distribution. Nevertheless, we repeat that serial sub-paths may not be long enough for their duration to be approximated with the CLT.
The paper is organized as follows. In Section 2 we define the problem and give the theoretical background required for understanding CFE and deriving the bounds. In Section 3 we provide expressions for determining the DF parameters needed to apply CFE, especially for a sum of iid uniformly distributed time variables. Bounds are computed and compared to classical PERT in an example problem in Section 4. The conclusion summarizes our findings.

Theoretical Background

Problem Definition
Let $A = (A_1, A_2, \ldots, A_n)$ be a vector of random time durations of $n$ project activities $t_1, t_2, \ldots, t_n$ (activities on arcs) with defined precedence relations $t_i \prec t_j$, $i = 1, \ldots, n-1$, $j = i+1, \ldots, n$, from the start node (event) $e_0$ to the finish node $e_n$. The activities are performed sequentially. This sequence of activities is a subset of all activities of the project and constitutes one possible path or a part of it. We consider the case of UD activity durations $A_i \sim U(a_i, b_i)$ in this paper. The duration of their execution is the completion time of the last activity $t_n$ before the end event $e_n$. Naturally, this time is a new random variable $Z = \sum_{i=1}^{n} A_i$. We are interested in the concentration of the PDF of $Z$. We introduce the reliability threshold $\alpha \in [0.5, 1]$ for the derivation of bounds. The lower bound (LB) and upper bound (UB) of the project time are the quantiles of $Z$ of orders $P_{LB} = (1-\alpha)/2$ and $P_{UB} = 1 - P_{LB} = (1+\alpha)/2$, so that the real time falls below the LB, and likewise above the UB, with probability $(1-\alpha)/2$. Thus, the probability that $Z$ falls into the interval $[LB, UB]$ is $\alpha$. These bounds are related to one of the most important project management objectives, viz. the LB (UB) is a bound on the earliest (latest) start or end time of any activity. The LB and UB of the start time of a specific activity $k$ are taken for the random time variable $\sum_{i=1}^{k-1} A_i$, at the moment just before activity $k$ begins. The LB and UB of the end time of a specific activity $k$ are taken for the random time variable $\sum_{i=1}^{k} A_i$, at the moment when activity $k$ has ended.
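As a small illustration of how the reliability threshold maps to quantile orders, the following sketch assumes that the probability mass outside the interval is split equally between the two tails. The function name `bound_levels` is hypothetical and introduced only for this example.

```python
def bound_levels(alpha):
    """Return the quantile orders (p_lb, p_ub) for a reliability threshold alpha.

    Assumption: the probability 1 - alpha left outside [LB, UB] is split
    equally between the lower and the upper tail.
    """
    if not 0.5 <= alpha <= 1.0:
        raise ValueError("alpha is expected to lie in [0.5, 1]")
    p_lb = (1.0 - alpha) / 2.0   # probability of finishing before the LB
    p_ub = 1.0 - p_lb            # CDF level at the UB
    return p_lb, p_ub


if __name__ == "__main__":
    for alpha in (0.6, 0.8, 0.9):
        print(alpha, bound_levels(alpha))
```

Under this assumption, a threshold of 0.8 corresponds to the pair of quantile orders 0.1 and 0.9, which are the extreme orders used in the illustrative example.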

Moment-generating Function
For all $w \in \mathbb{R}$, the MGF of a random variable $X$ is defined as [15]:

$$M_X(w) = E\left[e^{wX}\right].$$

Let $Y = bX + a$ be a new random variable for a random variable $X$ and $a, b \in \mathbb{R}$; then the MGF of $Y$ is related to the MGF of $X$ by $M_Y(w) = e^{aw} M_X(bw)$. For our case of an $n$-dimensional random vector $A$ with independent components, the MGF of the sum is the product of the component MGFs, and it yields the $k$-th moment $\mu'_k = E(Z^k)$ of the convolution of the $n$ random variables as the value of its $k$-th derivative at the point $w = 0$.

Cornish-Fisher Expansion
The CFE is a tool for approximating the quantiles of a random variable using only its first few cumulants. According to Stuart and Ord [14], the cumulants of order $r$ of a random variable $X$ are the values $\kappa_r$ such that, for all $t$,

$$\exp\left(\sum_{r=1}^{\infty} \kappa_r \frac{t^r}{r!}\right) = \sum_{r=0}^{\infty} \mu'_r \frac{t^r}{r!} = M_X(t).$$

Cumulants of $X$ are thus connected to the moments of $X$. The cumulant-generating function is $K(t) = \ln M_X(t)$. As with the MGF, a specific cumulant $\kappa_r$ is obtained from the $r$-th derivative of $K(t)$ at zero: $\kappa_r = K^{(r)}(0)$. Using cumulants gives an advantage in the context of our problem: the cumulants of a sum of $n$ independent random variables are the sums of their respective cumulants. For example, for independent $X$ and $Y$, $K_{X+Y}(t) = K_X(t) + K_Y(t)$, and this holds for the cumulants of all orders $r$: $\kappa_r(X + Y) = \kappa_r(X) + \kappa_r(Y)$.
Cumulants $\kappa_r$ of a random variable $X$ can be expressed in terms of its mean value $\mu$ and its central moments $\mu_r = E[(X - \mu)^r]$. Alternatively, the same cumulants $\kappa_r$ can be expressed in terms of raw (noncentral) moments $\mu'_r = E[X^r]$ only. Both cases are summarized in Table 1 [14].
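The CFE approximates the quantile $q$ of a target DF by taking into account the higher moments (skewness and kurtosis) of that DF to adjust for its non-normality. Thus, for a normalized random variable $X$ with $\mu = 0$ and $\sigma = 1$, Cornish and Fisher [16] derived an expansion that approximates the $q$-quantile $\Phi_X^{-1}(q)$ using the first five cumulants of $X$ and the quantile function of the Gaussian distribution $\Phi_Z^{-1}(q)$, $Z \sim N(0, 1)$. There are several versions of the expansion that use different numbers of cumulants. The formula that uses five cumulants is as follows (all terms are required) [15]:

$$\Phi_X^{-1}(q) \approx x_q + \frac{x_q^2 - 1}{6}\kappa_3 + \frac{x_q^3 - 3x_q}{24}\kappa_4 - \frac{2x_q^3 - 5x_q}{36}\kappa_3^2 + \frac{x_q^4 - 6x_q^2 + 3}{120}\kappa_5 - \frac{x_q^4 - 5x_q^2 + 2}{24}\kappa_3\kappa_4 + \frac{12x_q^4 - 53x_q^2 + 17}{324}\kappa_3^3, \qquad (3)$$

where $x_q = \Phi_Z^{-1}(q)$.

Table 1. Computation of cumulants $\kappa_r$ from moments of random variable $X$.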
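For reference, the textbook relations between the first five cumulants and the moments, which a table of this kind summarizes, can be written as below. This is a sketch of the standard identities in the notation $\mu_r$ (central) and $\mu'_r$ (raw), not a reproduction of the original table layout.

```latex
% Cumulants from the mean \mu and the central moments \mu_r = E[(X-\mu)^r]
\begin{align*}
\kappa_1 &= \mu, & \kappa_2 &= \mu_2, & \kappa_3 &= \mu_3,\\
\kappa_4 &= \mu_4 - 3\mu_2^2, & \kappa_5 &= \mu_5 - 10\,\mu_3\mu_2 .
\end{align*}

% Cumulants from the raw moments \mu'_r = E[X^r]
\begin{align*}
\kappa_1 &= \mu'_1,\\
\kappa_2 &= \mu'_2 - {\mu'_1}^2,\\
\kappa_3 &= \mu'_3 - 3\mu'_2\mu'_1 + 2{\mu'_1}^3,\\
\kappa_4 &= \mu'_4 - 4\mu'_3\mu'_1 - 3{\mu'_2}^2 + 12\mu'_2{\mu'_1}^2 - 6{\mu'_1}^4,\\
\kappa_5 &= \mu'_5 - 5\mu'_4\mu'_1 - 10\mu'_3\mu'_2 + 20\mu'_3{\mu'_1}^2
            + 30{\mu'_2}^2\mu'_1 - 60\mu'_2{\mu'_1}^3 + 24{\mu'_1}^5 .
\end{align*}
```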

CFE Application to Bounds of n Activity Distributions
Using the concepts of CFE, MGF and the basics of probability theory, we derive simple analytical approximate bounds for random activity time durations. We can use CFE to approximate quantiles of $Z = \sum_{i=1}^{n} A_i$, where the $A_i$ have means $\mu_i$ and standard deviations $\sigma_i$. First, we normalize $A_i$ as required by CFE: $\tilde{A}_i = (A_i - \mu_i)/\sigma_i$. Now $\tilde{A}_i$ has zero mean and unit variance (it is normalized, not Gaussian), and the central moments of $\tilde{A}_i$ are obtained from the central moments of $A_i$ with the expression $\tilde{\mu}_r = \mu_r/\sigma_i^r$ [15]. Because the $A_i$, and consequently the $\tilde{A}_i$, are independent random variables, we compute the individual cumulants of the $A_i$ and add the respective cumulants, as explained in Section 2, to get the cumulants of $Z = \sum_{i=1}^{n} A_i$. Since the CFE expression is stated for a variable with unit variance, the summed cumulants enter it in normalized form, $\kappa_r(Z)/\sigma_Z^r$, with $\mu_Z = \sum_{i=1}^{n} \mu_i$ and $\sigma_Z^2 = \sum_{i=1}^{n} \sigma_i^2$. Then, we resolve the approximation $z^*$ of the required quantile $q$ by applying CFE expression (3). The approximated value of the original $q$-quantile of $Z$ is obtained through de-normalization:

$$z_q \approx \mu_Z + \sigma_Z z^*.$$

In order to generalize the approach to moment derivation for other distributions in prospective work, we consider the ways of deriving raw and central moments in more detail below. The central moments of $A_i$ are required for CFE application. However, we can express the central moments $\mu_r = E[(X - \mu)^r]$ through the noncentral moments $\mu'_k = E(X^k)$. Denoting the $k$-th power of the mean value by $\mu^k$, the required formulas for the central moments are shown in Table 2.
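The normalization, cumulant summation and de-normalization steps described above can be sketched for independent uniform durations as follows. This is a minimal sketch, not the authors' implementation. It assumes that expression (3) has the commonly tabulated five-cumulant Cornish-Fisher coefficients, that the cumulants of the sum are normalized by powers of the standard deviation of the sum, and it uses the known cumulants of $U(a, b)$ (odd cumulants beyond the mean vanish, $\kappa_2 = (b-a)^2/12$, $\kappa_4 = -(b-a)^4/120$). The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def cornish_fisher(x, g3, g4, g5):
    """Five-cumulant Cornish-Fisher adjustment of a standard normal quantile x.

    g3, g4, g5 are the cumulants of a variable with zero mean and unit variance.
    """
    return (x
            + (x**2 - 1) * g3 / 6
            + (x**3 - 3 * x) * g4 / 24
            - (2 * x**3 - 5 * x) * g3**2 / 36
            + (x**4 - 6 * x**2 + 3) * g5 / 120
            - (x**4 - 5 * x**2 + 2) * g3 * g4 / 24
            + (12 * x**4 - 53 * x**2 + 17) * g3**3 / 324)


def cfe_quantile_sum_uniform(intervals, q):
    """Approximate the q-quantile of a sum of independent U(a_i, b_i) durations."""
    # Cumulants of independent variables add up term by term.
    mean = sum((a + b) / 2 for a, b in intervals)
    k2 = sum((b - a) ** 2 / 12 for a, b in intervals)
    k3 = 0.0                                   # uniform DFs are symmetric
    k4 = sum(-(b - a) ** 4 / 120 for a, b in intervals)
    k5 = 0.0
    sigma = sqrt(k2)
    # Normalize the cumulants of the sum, apply CFE, then de-normalize.
    x = NormalDist().inv_cdf(q)
    w = cornish_fisher(x, k3 / sigma**3, k4 / sigma**4, k5 / sigma**5)
    return mean + sigma * w


if __name__ == "__main__":
    tasks = [(1.0, 3.0), (1.0, 3.0)]           # two illustrative U(1, 3) tasks
    for q in (0.1, 0.5, 0.9):
        print(q, round(cfe_quantile_sum_uniform(tasks, q), 4))
```

Because only sums over activities are needed, the cost grows linearly with the number of activities in the sequence.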
In turn, raw moments can be obtained from the known MGF by taking the limit $t \to 0$ of its derivatives. However, this is hard for UD. Alternatively, raw moments can be determined for a uniform distribution via straightforward integration of $\mu'_n = \int x^n f(x)\,dx$ with the PDF written in terms of the Heaviside step function, $f(x) = \frac{H(x-a) - H(x-b)}{b-a}$ [17]:

$$\mu'_n = \int_{-\infty}^{\infty} x^n \, \frac{H(x-a) - H(x-b)}{b-a}\,dx = \frac{b^{n+1} - a^{n+1}}{(n+1)(b-a)},$$

which makes the expressions for raw moments easier to obtain (see Table 2). The respective central moments can be determined in the same way for UD via integration of $\mu_n = \int (x - \mu)^n f(x)\,dx$ (see $\mu = \mu'_1$ in Table 2) with the same PDF [17]. This approach gives the required central moments for the UD case:

$$\mu_2 = \frac{(b-a)^2}{12}, \qquad \mu_3 = 0, \qquad \mu_4 = \frac{(b-a)^4}{80}, \qquad \mu_5 = 0.$$

Table 2. Computation of raw and central moments of random variable $X$.
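The route from raw to central moments can be checked numerically with a short sketch. It assumes the closed-form raw moments of $U(a, b)$ and the binomial identity $\mu_r = \sum_{k=0}^{r} \binom{r}{k} \mu'_k (-\mu)^{r-k}$. The function names are hypothetical and serve only this illustration.

```python
from math import comb


def uniform_raw_moment(n, a, b):
    """Raw moment E[X^n] of X ~ U(a, b), from direct integration of x^n / (b - a)."""
    return (b ** (n + 1) - a ** (n + 1)) / ((n + 1) * (b - a))


def central_from_raw(raw):
    """Convert raw moments raw[k] = E[X^k] (with raw[0] = 1) to central moments.

    Uses the binomial expansion of E[(X - mu)^r] with mu = raw[1].
    """
    mu = raw[1]
    return [
        sum(comb(r, k) * raw[k] * (-mu) ** (r - k) for k in range(r + 1))
        for r in range(len(raw))
    ]


if __name__ == "__main__":
    a, b = 1.0, 3.0                                  # illustrative U(1, 3) task
    raw = [uniform_raw_moment(k, a, b) for k in range(6)]
    central = central_from_raw(raw)
    print("raw moments:    ", raw)
    print("central moments:", central)
```

For a uniform DF the odd central moments come out as zero up to rounding, and the even ones agree with $(b-a)^2/12$ and $(b-a)^4/80$.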

Illustrative Example
We consider a sequential set of independent tasks with identical uniform DFs and obtain approximations of the end time bounds as quantiles for probabilities from q = 0.1 to q = 0.9 with step 0.1 by means of CFE with five cumulants (3).
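Results of the classical PERT analysis are also computed. We evaluate the maximum error of the CFE and PERT quantile approximations against the available expression for the PDF $f_Z(z)$ of the sum of $n$ UDs. In the original formula for $f_Z(z)$, the lower bound of each DF must be strictly equal to zero, $B_i \sim U(0, c_i)$ [18]:

$$f_Z(z) = \frac{1}{(n-1)!\,\prod_{i=1}^{n} c_i} \sum_{\varepsilon \in \{0,1\}^n} (-1)^{\varepsilon_1 + \cdots + \varepsilon_n} \left(z - \sum_{i=1}^{n} \varepsilon_i c_i\right)_+^{n-1}, \qquad (6)$$

where $(x)_+ = \max(x, 0)$, the sum runs over all binary vectors $\varepsilon$, and for $n = 1$ the power $(x)_+^0$ is read as the indicator of $x > 0$. We obtain $B_i$ from $A_i \sim U(a_i, b_i)$ as $B_i = A_i - a_i$ with $c_i = b_i - a_i$. This is a shift of the DFs by the constants $a_i$. It is possible to process the random parts $B_i$ with the known PDF and then apply a correction by adding $\sum_{i=1}^{n} a_i$. By definition, a value $z^*$ such that the cumulative distribution function (CDF) satisfies $F(z^*) = q$ is a quantile of order $q$, $q \in (0, 1)$. Quantiles for probability $q$ can be evaluated by solving this equality with the known PDF $f(z)$ or the respective CDF $F(z)$: $q = F(z^*) = \int_l^u f(z)\,dz$, where in our case $l = 0$ is the lower bound and $u = z^*$ is the sought upper bound for the sum of the $B_i$ ($q$ is specified).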
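The exact quantile computation described above can be sketched as follows. The sketch assumes the inclusion-exclusion form of the CDF of a sum of independent $U(0, c_i)$ variables (the integral of the corresponding PDF) and solves $F(z^*) = q$ by bisection. It is an illustration, not the authors' code, and all names are hypothetical. The number of terms grows as $2^n$, which is acceptable for the short task sequences considered here.

```python
from itertools import product
from math import factorial, prod


def cdf_sum_uniform(z, widths):
    """Exact CDF at z of the sum of independent U(0, c_i) variables, c_i in widths."""
    n = len(widths)
    total = 0.0
    for eps in product((0, 1), repeat=n):
        shift = sum(e * c for e, c in zip(eps, widths))
        if z > shift:                       # positive part (z - shift)_+
            total += (-1) ** sum(eps) * (z - shift) ** n
    return total / (factorial(n) * prod(widths))


def quantile_sum_uniform(intervals, q, tol=1e-10):
    """q-quantile of the sum of independent U(a_i, b_i) durations via bisection."""
    offset = sum(a for a, _ in intervals)    # shift by the constants a_i
    widths = [b - a for a, b in intervals]   # c_i = b_i - a_i
    lo, hi = 0.0, sum(widths)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf_sum_uniform(mid, widths) < q:
            lo = mid
        else:
            hi = mid
    return offset + (lo + hi) / 2


if __name__ == "__main__":
    tasks = [(1.0, 3.0), (1.0, 2.0), (1.0, 4.0)]   # illustrative unequal tasks
    for q in (0.1, 0.5, 0.9):
        print(q, round(quantile_sum_uniform(tasks, q), 4))
```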
We consider two subproblems. One is the aggregation of two, five and ten identical DFs. The other is the addition of a random time variable $B \sim U(1, 3)$ to a sum of several unequal time DFs, $\sum_{j=1}^{i-1} A_j$ with $A_j \sim U(1, c_j)$, $c_j = 2, 4, 6$, for $i = 2, 3, 4$. We consider very long task sequences or paths without branches to be very unlikely in real projects. The resulting quantiles, used as bounds for UD, are shown in Table 3. We compute the CFE approximation $z^*_{CFE}$ of the quantiles $z^*$ for the given probabilities $q = 0.1, \ldots, 0.9$. For comparison, the results of classical PERT (PERT $z^*$) are given as well. Absolute errors of the $z^*$ approximation can be seen by comparing the quantiles obtained with the known PDF (6) in the "Real $z^*$" rows with the respective values $z^*_{CFE}$ or PERT $z^*$. We also verify the quality of the approximations by substituting the approximated quantile $z^*$ into the integral of the known PDF (6) and observing how much the probability $F_Z(z^*_{CFE})$ differs from the initial $q$ value. The absolute errors (abs. error $q$) between $F_Z(z^*_{CFE})$ and $q$ are also given in Table 3. The same verification is performed for the classical PERT approximations $F_Z(\text{PERT } z^*)$, and the absolute error is given.
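The verification step described above, i.e. substituting an approximated quantile into the exact CDF and comparing the result with $q$, can be sketched for the simplest iid case. The sketch assumes $n$ iid $U(0, 1)$ durations, whose sum has the Irwin-Hall CDF, and uses a CFE that reduces to a single fourth-cumulant correction because the odd cumulants of a symmetric DF vanish. The CLT value stands in for the classical PERT quantile. Names are hypothetical, and the numbers it prints are not those of Table 3.

```python
from math import comb, factorial, floor, sqrt
from statistics import NormalDist


def irwin_hall_cdf(z, n):
    """Exact CDF of the sum of n independent U(0, 1) variables."""
    if z <= 0:
        return 0.0
    if z >= n:
        return 1.0
    s = sum((-1) ** k * comb(n, k) * (z - k) ** n for k in range(floor(z) + 1))
    return s / factorial(n)


def approx_quantiles(q, n):
    """Return (z_cfe, z_clt) for the sum of n iid U(0, 1) variables."""
    x = NormalDist().inv_cdf(q)
    mean, sigma = n / 2, sqrt(n / 12)
    g4 = -1.2 / n                 # normalized 4th cumulant of the sum
    z_clt = mean + sigma * x      # Gaussian (CLT) quantile
    z_cfe = mean + sigma * (x + (x**3 - 3 * x) * g4 / 24)
    return z_cfe, z_clt


def probability_errors(q, n):
    """Absolute errors |F_Z(z) - q| for the CFE and the CLT quantiles."""
    z_cfe, z_clt = approx_quantiles(q, n)
    return abs(irwin_hall_cdf(z_cfe, n) - q), abs(irwin_hall_cdf(z_clt, n) - q)


if __name__ == "__main__":
    for n in (2, 5, 10):
        for q in (0.1, 0.9):
            e_cfe, e_clt = probability_errors(q, n)
            print(f"n={n:2d} q={q}: CFE error {e_cfe:.5f}, CLT error {e_clt:.5f}")
```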
Classical PERT, which uses the CLT approximation, is clearly worse than the CFE approximation unless many iid DFs are considered (which brings the setting closer to the CLT applicability assumptions). PERT's probability estimate for q-quantiles is always worse than that of CFE for unequal DFs (and mostly for equal ones) at the most important quantiles (q = 0.1, 0.2 for the LB and q = 0.8, 0.9 for the UB), and the difference is large, i.e. from 1.5 to 10 times and even more. The difference between the approximated time value z* and the real one may not seem so dramatic; however, depending on the scale of the time units, the absolute difference can be substantial for the DM. Additionally, CFE underestimates the LB and overestimates the UB, i.e. it produces bounds that enclose the real ones, whereas classical PERT does the opposite, i.e. it overestimates the LB and underestimates the UB, which is undesirable.

Conclusion
In this study, we attempted to improve only one part of PERT analysis, namely obtaining time bounds on a sequential set of tasks. In the general case, upper and lower bounds can be determined using bounding techniques. These can be imprecise in the case of the CLT approximation in classical PERT, or if the bounding methods do not take into account specific properties of the distributions. If the time distributions are known, we can determine upper and lower bounds using CFE.
Using CFE, we derived bounds for the sum of iid uniform DFs of activity times and verified the absolute error against the exact analytical solution of the quantile problem. This DF type is not used frequently; however, it has the advantage that its parameters are easy to estimate in practical applications. We also compared our result with the results of classical PERT analysis. In fact, the original PERT employs the CLT approximation and thus uses quantiles of the Gaussian distribution. Our experiments show that the CFE approximation outperforms classical PERT in evaluating the time of sequential tasks with known DFs. This is particularly important for bounds with high confidence (e.g. α ≥ 0.8). It seems that applying CFE instead of the CLT approximation can improve the evaluation of the CP in classical PERT.
To the best of our knowledge, our bounding technique for UD based on CFE is the first of its kind in the area of PERT analysis. CPM-like time bounds for the case of several consecutive activities with uniform DFs enable the evaluation of the duration of sub-critical paths in a SAN. Moreover, CFE is a universal tool that can be applied to tasks with other DF types. The computational complexity is O(n) for a known SAN structure, without simulation modelling. This is the first step in the development of an improved bounding technique for a project SAN. Based on the current results for serial activities, an extension of the bounding technique should be developed for a SAN with converging subpaths. We need to examine the conceptual meaning of activities converging into an event before the next common activity that follows them. Mathematically, this means considering a maximum operator for two or more random time variables and choosing the most promising method of DF evaluation for this operator. Another prospective task is the evaluation of the bounding approach with triangular time distributions, which are also promising for operational parameter estimation and have been used for project time analysis (e.g. [12]).