Measuring Evidential Weight in Digital Forensic Investigations: a Role for Bayesian Networks in Digital Forensic Triage

Abstract: A method for obtaining a quantitative measure of the relative weight of each individual item of evidence in a digital forensic investigation by means of a Bayesian network is described. The resulting evidential weights can then be used to determine a near-optimal cost-effective triage scheme for the investigation in question.


Introduction and Background
Until recently, an inability to reliably quantify the relative plausibility of alternative hypotheses purporting to explain the existence of the totality of the recovered digital evidence in a criminal investigation has hindered the development of digital forensics into a mature scientific and engineering discipline from the qualitative craft that originated in the mid-1980s [1]. Such a rigorous science- and engineering-oriented approach should not only provide numerical results but should also quantify the confidence limits, sensitivities and uncertainties associated with these results. However, beyond the works cited in the present contribution, there appears to be a dearth of research literature devoted to developing such a rigorous approach to digital forensic investigations.
Posterior probabilities, likelihood ratios (LRs) and odds, generated using technical approaches such as Bayesian networks (BNs), are capable of providing digital forensic investigators, law enforcement officers and legal personnel with a quantitative scale or metric against which to assess the plausibility of an investigatory hypothesis, which may in turn be linked to the likelihood of a successful prosecution, or indeed the merit of a not-guilty plea. This approach is sometimes referred to as digital meta-forensics; some examples can be found in [2,3].
A second and closely related issue involves reliably quantifying the relative weight of each of the individual items of digital evidence recovered during a criminal investigation. This is particularly important from the perspective of digital forensic triage, i.e. the prioritisation strategy for searching for digital evidence, in the context of the ever-increasing volumes of data and varieties of device that are routinely seized for examination [4]. The economics of digital forensics, also known as digital forensonomics [5], provides for the possibility of a quantitative basis upon which to prioritise the search for digital evidence during a criminal investigation, by making use of such well-known concepts from the field of Economics as Return-on-Investment (RoI) or, essentially equivalently, Cost-Benefit Ratio (CBR).
In this approach, a list of all the expected items of digital evidence for the hypothesis being investigated is drawn up. For each item of digital evidence, two attributes are required: (i) its cost, which is in principle relatively straightforward to quantify as it is usually measured in terms of the resources required to locate, recover and analyse that item of digital evidence, typically investigator hours plus any specialist equipment hire-time needed; (ii) its relative weight, which measures the contribution that the presence of that item of digital evidence makes towards supporting the hypothesis, and which until now has usually been based on the informal opinions or consensus of experienced digital forensic investigators [6].
The principal contributions of this short paper are: (i) to demonstrate that a quantitative measure of the relative weight of each item of digital evidence in a particular investigation can be obtained in a straightforward manner from the Bayesian network (BN) representing the hypothesis underpinning that investigation; and (ii) to demonstrate that these evidential weights can be employed to create a near-optimal cost-effective evidence search list for the triage phase of the digital forensic investigation process.

Methodology
Bayesian networks (BNs) were first proposed by Judea Pearl [7], based upon the concept of conditional probability originated by Thomas Bayes in the eighteenth century [8]. Formally speaking, a BN is a directed acyclic graph (DAG) representation of the conditional dependency relationships between entities such as events, observations or outcomes. Visually, a BN typically resembles an inverted tree. In the context of digital forensic investigations, the root node of the BN represents the overall hypothesis underpinning the investigation in question, the child nodes of the root node represent the sub-hypotheses which contribute to the overall hypothesis, and the leaf nodes represent the items of digital evidence that are associated with each of the sub-hypotheses. After populating the interior nodes with conditional probabilities (likelihoods) and assigning prior probabilities to the root node, the BN can then propagate these probabilities using the rules of Bayesian inference to produce a posterior probability for the root hypothesis. However, it is the architecture of the BN, together with the definition of each sub-hypothesis and its associated evidential traces, which defines the hypothesis characterising the specific investigation. The first application of a BN to a specific digital forensic investigation appears to be that reported in [2]. Figure 1 illustrates an example of a BN applied to a particular digital forensic investigation.
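As a hedged illustration of the inference step just described, the following minimal Python sketch computes the posterior probability of a root hypothesis by brute-force enumeration. The network shape (a root H, a single sub-hypothesis S and two evidence leaves E1 and E2) and every probability value are hypothetical assumptions chosen for readability; they are not taken from the BitTorrent BN of [2], and tools such as MSBNx use more efficient propagation algorithms than enumeration.

```python
from itertools import product

# Hypothetical toy network H -> S -> {E1, E2}.
# All numbers below are illustrative assumptions, not case data.
P_H = {True: 0.5, False: 0.5}            # prior on the root hypothesis H
P_S_GIVEN_H = {True: 0.9, False: 0.1}    # P(S = True | H)
P_E_GIVEN_S = {                          # P(E = True | S) for each leaf
    "E1": {True: 0.8, False: 0.2},
    "E2": {True: 0.7, False: 0.3},
}


def joint(h, s, evidence):
    """Joint probability of H = h, S = s and the observed evidence states."""
    p = P_H[h]
    p *= P_S_GIVEN_H[h] if s else 1.0 - P_S_GIVEN_H[h]
    for name, observed in evidence.items():
        p_true = P_E_GIVEN_S[name][s]
        p *= p_true if observed else 1.0 - p_true
    return p


def posterior(evidence):
    """P(H = True | evidence), summing out the sub-hypothesis S."""
    numerator = sum(joint(True, s, evidence) for s in (True, False))
    denominator = sum(
        joint(h, s, evidence) for h, s in product((True, False), repeat=2)
    )
    return numerator / denominator


if __name__ == "__main__":
    print(posterior({"E1": True, "E2": True}))
```

Leaves that are omitted from the `evidence` dictionary are treated as unobserved, so `posterior({})` simply returns the prior on H.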
The posterior probability output by the BN when all of the expected items of digital evidence are present is compared with the posterior probability output by the BN when item i of the digital evidence is absent (but all the other expected evidential items are present); the difference between, and the ratio of, these two quantities both provide a direct measure of the relative weight of item i of the digital evidence in the particular context of the hypothesis of the investigation represented by the BN. Thus the relative weight of evidential item i can be written as:

$$(\text{relative-weight})_i \propto \text{posterior-probability} - (\text{posterior-probability})_i \tag{1}$$

or, in normalised form, as:

$$(\text{relative-weight})_i \propto 1 - \frac{(\text{posterior-probability})_i}{\text{posterior-probability}} \tag{2}$$

or alternatively as:

$$(\text{relative-weight})_i \propto \frac{\text{posterior-probability}}{(\text{posterior-probability})_i} \tag{3}$$

where $(\text{posterior-probability})_i$ signifies the posterior probability output by the BN when item i of the digital evidence is absent. From a ranking perspective, any one of equations (1), (2) or (3) could be used, since in each case the relative weight of evidential item i increases monotonically with the difference between the posterior probabilities. For the remainder of this work we will continue to employ equation (1).
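The leave-one-out comparison behind equations (1) to (3) can be sketched as follows. This is a minimal illustration under stated assumptions: `relative_weights` and `toy_posterior` are hypothetical names, the absence of an item is modelled by setting its observed state to False, and the stand-in posterior is a simple two-hypothesis model with made-up likelihoods rather than the BitTorrent BN. Any callable that maps evidence states to a posterior probability could be supplied instead.

```python
# Hypothetical likelihoods: (P(E present | H true), P(E present | H false)).
LIKELIHOODS = {"E1": (0.9, 0.1), "E2": (0.6, 0.4), "E3": (0.8, 0.3)}
PRIOR_H = 0.5  # assumed prior on the hypothesis


def toy_posterior(state):
    """Stand-in for a BN run: P(H | evidence) with independent leaves."""
    p_h, p_not_h = PRIOR_H, 1.0 - PRIOR_H
    for name, present in state.items():
        l_h, l_not_h = LIKELIHOODS[name]
        p_h *= l_h if present else 1.0 - l_h
        p_not_h *= l_not_h if present else 1.0 - l_not_h
    return p_h / (p_h + p_not_h)


def relative_weights(posterior, items):
    """Run the model once with all items present, then once per absent item."""
    full = posterior({name: True for name in items})
    weights = {}
    for i in items:
        # Item i absent, every other expected item present.
        without_i = posterior({name: (name != i) for name in items})
        weights[i] = {
            "difference": full - without_i,        # equation (1)
            "normalised": 1.0 - without_i / full,  # equation (2)
            "ratio": full / without_i,             # equation (3)
        }
    return weights


if __name__ == "__main__":
    for name, w in relative_weights(toy_posterior, list(LIKELIHOODS)).items():
        print(name, w)
```

Because the all-present posterior is the same for every item, each of the three measures decreases as the item-absent posterior rises, so all three produce the same ranking of items.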
For a BN involving $n_e$ items of digital evidence it is necessary to perform $(n_e + 1)$ executions of the BN. Once all of the relative evidential weights have been obtained in this manner using any one of equations (1), (2) or (3), the RoI and CBR for item i of the expected digital evidence in the hypothesis are given by the following two equations, respectively [5]:

$$(\text{RoI})_i \propto \frac{(\text{relative-weight})_i}{(\text{examiner-hours})_i \times (\text{hourly-cost}) + (\text{equipment-cost})_i} \tag{4}$$

$$(\text{CBR})_i \propto \frac{(\text{examiner-hours})_i \times (\text{hourly-cost}) + (\text{equipment-cost})_i}{(\text{relative-weight})_i} \tag{5}$$
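Equations (4) and (5) translate directly into code. The sketch below is illustrative only: the `EvidenceItem` class, the function names and the example figures are hypothetical, the constant of proportionality is taken to be 1, and a zero relative weight is mapped to an infinite CBR as one possible convention.

```python
import math
from dataclasses import dataclass


@dataclass
class EvidenceItem:
    name: str
    relative_weight: float  # e.g. from equation (1)
    examiner_hours: float
    equipment_cost: float


def recovery_cost(item, hourly_cost):
    """Denominator of equation (4): examiner time plus equipment."""
    return item.examiner_hours * hourly_cost + item.equipment_cost


def roi(item, hourly_cost):
    """Equation (4), with the proportionality constant assumed to be 1."""
    return item.relative_weight / recovery_cost(item, hourly_cost)


def cbr(item, hourly_cost):
    """Equation (5); infinite when the item carries no weight (assumed convention)."""
    if item.relative_weight == 0:
        return math.inf
    return recovery_cost(item, hourly_cost) / item.relative_weight


if __name__ == "__main__":
    # Hypothetical item and hourly rate, for illustration only.
    item = EvidenceItem("E_example", relative_weight=0.2,
                        examiner_hours=4.0, equipment_cost=0.0)
    print(roi(item, hourly_cost=50.0), cbr(item, hourly_cost=50.0))
```

With this convention RoI and CBR are reciprocals of one another, which is why ranking by decreasing RoI and ranking by increasing CBR give the same order.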

Results and Discussion
As an illustrative application of the proposed approach we have taken the real-world criminal case of the illegal uploading of copyright protected material via the peer-to-peer BitTorrent network [2,10]. The freely available BN simulator MSBNx [11] from Microsoft Research was used to perform all the required calculations initially; these results were subsequently verified independently using the free version of AgenaRisk [12]. A previous sensitivity analysis performed on the BitTorrent BN [9] demonstrated that the posterior probabilities, and hence the relative evidential weights derived from them, are stable to within ±0.5%.
The ranked evidential weights of the 18 items of digital evidence shown in Figure 1 are listed in Table 1, together with their estimated relative costs [6] and their associated RoIs and CBRs as given by equations (4) and (5) respectively. The relative evidential recovery costs for the BN are taken from [6] and were estimated by experienced digital forensic investigators from the Hong Kong Customs & Excise Department IPR Protection group, taking into account the typical forensic examiner time required together with any specialist equipment utilisation needed. In the present approach it has been assumed that the typical cost of locating, recovering and analysing each individual item of digital evidence is fixed, although it can be envisaged that under certain circumstances an evidentiary cost could be variable, for example, if its recovery required the invocation of a mutual legal assistance treaty (MLAT) with law enforcement officers in another jurisdiction.
The relative evidential weights in Table 1 can be used to create an evidence search list, with the evidential items ordered first by decreasing relative weight and, within that, either by decreasing RoI or, equivalently, by increasing CBR. This search list can be used to guide the course of the triage phase of the digital forensic investigation in a near-optimal cost-effective manner by ensuring that evidential 'quick wins' (or 'low-hanging fruit') are processed early on in the investigation whilst evidence of low relative weight which is costly to obtain is relegated until later on, when it may become clearer whether or not the support of this evidence will be crucial to the overall support for the investigative hypothesis.
The advantages of a procedure such as this are that if an item of evidence of high relative weight is not recovered, this fact will be detected early on during the investigation and could result in the investigation being de-prioritised or even abandoned at an early stage, before valuable resources (of time, effort, equipment, etc.) have been expended unnecessarily. In addition, it may be possible to terminate the investigation without the need to search for an item of evidence of low relative weight with a high recovery cost (e.g. the requirement to use a scanning electron microscope to detect whether or not a solid-state memory latch or gate is charged), as a direct consequence of the Law of Diminishing Returns.
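The ordering rule described above (decreasing relative weight first, then decreasing RoI among items of equal weight) can be sketched as a two-key sort. The `Candidate` class, the function name and the four items with their weights and costs are hypothetical placeholders, not the values of Table 1.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    relative_weight: float
    cost: float  # total recovery cost, in arbitrary relative units

    @property
    def roi(self):
        return self.relative_weight / self.cost


def build_search_list(candidates):
    """Order by decreasing relative weight, breaking ties by decreasing RoI."""
    return sorted(candidates, key=lambda c: (-c.relative_weight, -c.roi))


if __name__ == "__main__":
    # Hypothetical items for illustration only.
    items = [
        Candidate("A", 0.30, 2.0),
        Candidate("B", 0.30, 1.0),
        Candidate("C", 0.05, 0.5),
        Candidate("D", 0.10, 4.0),
    ]
    for c in build_search_list(items):
        print(c.name, c.relative_weight, round(c.roi, 3))
```

In this toy list A and B carry the same weight, so the cheaper item B is searched for first; C and D follow in order of weight regardless of their cost.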
In the BitTorrent example illustrated above, if evidential item E18 could not be recovered, the outcome for the investigation would probably be serious and might well lead to its immediate de-prioritisation or even abandonment, whereas the absence of evidential items E5 or E7 would make very little difference to the overall support for the digital forensic investigation hypothesis.
A further possible refinement of the scheme outlined above can be introduced by considering the role of any potentially exculpatory (i.e. exonerating) items of evidence in the investigatory context. Such evidence might be, for example, that CCTV footage reliably places the suspect far from the presumed scene of the digital crime at the material time. The existence of any such evidence would by definition place the investigatory hypothesis in jeopardy. Therefore, if any such potential evidence could be identified in advance, then a search for this potentially exculpatory evidence could be undertaken either before or in parallel with the search for evidential items in the triage schedule. However, since by definition the BN for the investigatory hypothesis would not contain any exculpatory evidential items, it cannot be used directly to obtain the relative weights of any such items of exculpatory evidence. Hence it is not possible to formulate a cost-effective search strategy for these items on the basis of the BN itself.

Summary and Conclusions
A method to obtain numerically the relative weight for each item of digital evidence from the associated BN has been outlined and illustrated by applying it to the commonly occurring criminal case of piracy of copyright protected material using the BitTorrent P2P network. By considering the corresponding RoIs or CBRs, a near-optimal cost-effective digital forensic triage search strategy for this exemplar case can be constructed, which eliminates unnecessary utilisation of scarce resources (of time, effort, equipment, etc.) in today's overstretched, under-resourced, digital forensic investigation laboratories.