Identifying Evidence for Cloud Forensic Analysis

Cloud computing provides benefits such as increased flexibility, scalability and cost savings to enterprises. However, it introduces several challenges to digital forensic investigations. Current forensic analysis frameworks and tools are largely intended for off-line investigations and it is assumed that the logs are under investigator control. In cloud computing, however, evidence can be distributed across several machines, most of which would be outside the control of the investigator. Other challenges include the dependence of forensically-valuable data on the cloud deployment model, large volumes of data, proprietary data formats, multiple isolated virtual machine instances running on a single physical machine and inadequate tools for conducting cloud forensic investigations. This research demonstrates that evidence from multiple sources can be used to reconstruct cloud attack scenarios. The sources include: (i) intrusion detection system and application software logs; (ii) cloud service API calls; and (iii) system calls from virtual machines. A forensic analysis framework for cloud computing environments is presented that considers logged data related to activities in the application layer as well as lower layers. A Prolog-based forensic analysis tool is used to automate the correlation of evidence from clients and the cloud service provider in order to reconstruct attack scenarios in a forensic investigation.


Introduction
Digital forensics involves the identification, collection, examination and analysis of data while preserving its integrity and maintaining strict chain of custody during post-incident investigations [9]. Network forensics is a component of digital forensics that primarily focuses on the analysis of network traffic and other data from intrusion detection systems and logs [14]. Cloud forensics is an emerging branch of network forensics, which involves post-incident analysis of systems with distributed processing, multi-tenancy, virtualization and mobility of computations. Ruan et al. [16] identify several challenges associated with cloud forensics. These include the dependence of forensically-valuable data on the cloud deployment model and methods, large volumes of data, proprietary data formats, large numbers of diverse, simultaneously-executing virtual machine instances, lack of monitoring and alerts by hypervisors that run virtual machines, and limited techniques and tools designed specifically for cloud forensic investigations.
The National Institute of Standards and Technology (NIST) [7] has published a cloud computing standards roadmap that emphasizes cloud governance, security and risk assessment. A key recommendation in the roadmap and by members of the digital forensics research community [14,16] is the implementation of forensics-enabled clouds. However, most approaches focus on evidence gathering from infrastructure-as-a-service cloud model deployments. No formal approach currently exists for reconstructing attack scenarios based on evidence collected in virtualized cloud environments. This research demonstrates that evidence from multiple sources can be used to reconstruct cloud attack scenarios. The sources include: (i) intrusion detection system and application software logs; (ii) cloud service API calls; and (iii) system calls from virtual machines. A Prolog-based forensic analysis tool is used to automate the correlation of evidence from the three sources in order to reconstruct attack scenarios in cloud forensic investigations.

Background and Related Work
Cloud computing has three principal service deployments: (i) software-as-a-service (SaaS); (ii) platform-as-a-service (PaaS); and (iii) infrastructure-as-a-service (IaaS) [12]. A software-as-a-service model enables consumers to use service provider applications running on a cloud infrastructure. A platform-as-a-service model allows consumers to deploy their own applications or acquired applications using programming languages, libraries, services and tools supported by the service provider. An infrastructure-as-a-service model provides consumers with the ability to provision processing, storage, networks and other fundamental computing resources, including operating systems and applications.
Cloud forensics is a subset of network forensics that uses techniques tailored to cloud computing environments [16]. For example, data acquisition is different in the software-as-a-service and infrastructure-as-a-service models because an investigator has to depend entirely on the cloud service provider in the case of a software-as-a-service model whereas an investigator can acquire virtual machine images from a customer in an infrastructure-as-a-service model.
Several techniques have been proposed to collect evidence from cloud environments, including remote data acquisition, management plane acquisition, live forensics and snapshot analysis [15]. Dykstra and Sherman [3] have retrieved volatile and non-volatile data from the Amazon EC2 cloud active user instance platform using traditional forensic tools such as EnCase and FTK. However, these tools do not validate the integrity of the collected data. Dykstra and Sherman [4] subsequently developed the FROST toolkit, which can be integrated within OpenStack to collect logs from the operating system that runs the virtual machines; this technique assumes that the cloud provider is trustworthy. Zawoad et al. [19] have designed a complete, trustworthy and forensics-enabled cloud.
Hay and Nance [5] have conducted live digital forensic analyses on clouds with virtual introspection, a process that enables the hypervisor or any other virtual machine to observe the state of a chosen virtual machine. They also developed a suite of virtual introspection tools for Xen (VIX tools). At this time, live forensic tools have not been incorporated as a commercial service by cloud providers.
Snapshot technology enables cloud customers to freeze virtual machines in specific states [2]. A frozen snapshot image may be restored by loading it to a target virtual machine, following which information about the running state of the virtual machine can be obtained. Several hypervisors, including Xen, VMware ESX and Hyper-V, support snapshot features.
In order to reduce the time and effort involved in forensic investigations, researchers have proposed the use of rules to automate evidence correlation and attack reconstruction [10,18]. Liu et al. [10] have integrated a Prolog rule-based tool with a vulnerability database and an anti-forensic database to ascertain the admissibility of evidence and explain missing evidence due to the use of anti-forensic tools. However, these rule-based forensic analysis frameworks have been developed for networks, not for cloud environments.

Attack Reconstruction
Liu et al. [10,11] have described an application of the MulVAL logic-based network security analyzer [13] that uses rules representing generic attack techniques to ascertain the causality between different items of evidence collected from a compromised network to reconstruct the attack steps. The rules, which are based on expert knowledge, are used as hypotheses by an investigator to link chains of evidence that are written in the form of Prolog predicates in order to create attack steps. Attack scenarios are reconstructed in the form of acyclic graphs as defined below [11].

Definition 1 (Logical Evidence Graph (LEG)):
A logical evidence graph $LEG = (N_f, N_r, N_c, E, L, G)$ is a six-tuple where $N_f$, $N_r$ and $N_c$ are three disjoint sets of nodes in the graph (called fact, rule and consequence fact nodes, respectively), $E \subseteq ((N_f \cup N_c) \times N_r) \cup (N_r \times N_c)$ is the evidence, $L$ is a mapping from nodes to labels and $G \subseteq N_c$ is a set of observed attack events.
Every rule node has one or more fact nodes or consequence fact nodes from prior attack steps as its parents and a consequence fact node as its only child. Node labels consist of instantiations of rules or sets of predicates specified as follows: Valid instantiations of these predicates after an attack update valid instantiations of the three predicates listed in item 1 above.
Figure 1 shows an example logical evidence graph; Table 1 describes the nodes in Figure 1. In Figure 1, fact, rule and consequence fact nodes are represented as boxes, ellipses and diamonds, respectively. Consequence fact nodes (Nodes 1 and 3) codify the attack status obtained from event logs and other forensic tools that record the postconditions of attack steps. Fact nodes (Nodes 5, 6, 7 and 8) include network topology (Nodes 5 and 6), computer configuration (Node 7) and software vulnerabilities obtained by analyzing evidence captured by forensic tools (Node 8). Rule nodes (Nodes 2 and 4) represent rules that change the attack status using attack steps. These rules, which are based on expert knowledge, are used to link chains of evidence as consequences of attack steps. Linking a chain of evidence using a rule creates an investigator's hypothesis of an attack step given the evidence.
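The structural constraints in Definition 1 can be checked mechanically. The following is a minimal Python sketch, not the authors' tool: the class name, field names and the `violations` helper are illustrative assumptions. The node numbers follow the description of Figure 1 in the text, but the edge set is assumed for illustration because the figure itself is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalEvidenceGraph:
    """Minimal container for the six-tuple (N_f, N_r, N_c, E, L, G)."""
    fact_nodes: set = field(default_factory=set)          # N_f
    rule_nodes: set = field(default_factory=set)          # N_r
    consequence_nodes: set = field(default_factory=set)   # N_c
    edges: set = field(default_factory=set)               # E, as (parent, child) pairs
    labels: dict = field(default_factory=dict)            # L
    observed: set = field(default_factory=set)            # G

    def violations(self):
        """Return a list of constraint violations (empty if the graph is well formed)."""
        problems = []
        if (self.fact_nodes & self.rule_nodes
                or self.fact_nodes & self.consequence_nodes
                or self.rule_nodes & self.consequence_nodes):
            problems.append("node sets are not disjoint")
        if not self.observed <= self.consequence_nodes:
            problems.append("G is not a subset of N_c")
        premises = self.fact_nodes | self.consequence_nodes
        for parent, child in self.edges:
            into_rule = parent in premises and child in self.rule_nodes
            out_of_rule = parent in self.rule_nodes and child in self.consequence_nodes
            if not (into_rule or out_of_rule):
                problems.append(f"edge {parent}->{child} has invalid endpoints")
        for rule in self.rule_nodes:
            parents = [p for p, c in self.edges if c == rule]
            children = [c for p, c in self.edges if p == rule]
            if not parents:
                problems.append(f"rule node {rule} has no parents")
            if len(children) != 1:
                problems.append(f"rule node {rule} must have exactly one child")
        return problems


# Node roles as described for Figure 1; the edges below are ASSUMED for illustration.
example_graph = LogicalEvidenceGraph(
    fact_nodes={5, 6, 7, 8},
    rule_nodes={2, 4},
    consequence_nodes={1, 3},
    edges={(5, 4), (6, 4), (4, 3), (3, 2), (7, 2), (8, 2), (2, 1)},
    observed={1},
)
print(example_graph.violations())
```

Under these assumptions the example graph satisfies every constraint, whereas an edge that links a fact node directly to a consequence fact node would be reported as invalid.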

Reconstructing Attack Scenarios
This section demonstrates how three experimental attacks launched on a private cloud are reconstructed using evidence from the cloud.

Experimental Setup
OpenStack was used to create a private cloud. OpenStack is a collection of Python-based software projects that manage access to pooled storage and computing and network resources that reside in one or more machines corresponding to a cloud. The collection has six core projects: (i) Neutron (networking); (ii) Nova (computing); (iii) Glance (image management); (iv) Swift (object storage); (v) Cinder (block storage); and (vi) Keystone (authentication and authorization). OpenStack can be used to deploy software-as-a-service, platform-as-a-service and infrastructure-as-a-service cloud models; however, it is mostly deployed as an infrastructure-as-a-service cloud.
DevStack is a series of extensible scripts that can invoke an OpenStack environment quickly. DevStack was used to deploy a private infrastructure-as-a-service cloud with a version of Juno on an Ubuntu computer that was accessed from IP address 172.16.168.100. An authenticated user can manage OpenStack services by entering the IP address 172.16.168.100 on a browser to access the cloud control dashboard Horizon as shown in Figure 2.
Two virtual machine instances were deployed in the private cloud, a web server named WebServer with IP address 172.16.168.226 and a file server named FileServer with IP address 172.16.168.229. The instances were managed by an authenticated user named admin. WebServer was an Apache server with a MySQL database that enabled SQL queries to be issued via web applications. Also, SSH was set up on FileServer to enable authenticated users to access it remotely. The Kali ethical hacking Linux distribution tool was set up in the same network at IP address 172.16.168.173 in order to launch attacks.

Experimental Attacks
A SQL injection attack, distributed denial-of-service (DDoS) attack and denial-of-service (DoS) attack were launched at the two virtual machines in the infrastructure-as-a-service cloud. The SQL injection attack exploited an unsanitized user input (CWE-89 vulnerability) to the web server. The DDoS attack involved a TCP connection flood that used nping in Kali to prevent legitimate requests from reaching the file server. The SQL injection and DDoS attacks could target any network (including a cloud) that has the associated vulnerabilities. However, only privileged users in the infrastructure-as-a-service cloud can resize and delete a virtual machine by launching the DoS attack that exploits vulnerability CVE-2015-3241 in OpenStack Nova versions 2015.1 through 2015.1.1 and 2014.2.3 and earlier. The process of resizing and deleting an instance in this way is called instance migration. The migration process does not terminate when an instance is deleted by exploiting CVE-2015-3241, so an authenticated user could bypass the user quota enforcement mechanism to deplete all the available disk space by repeatedly performing instance migration.
Figure 3 shows the resizing of the file server from ds512M to ds1G where the availability zone of the instances is Nova. Instances were resized and deleted until Nova was so depleted that it could not accept any new instances.

Collecting Evidence for Reconstruction
In order to obtain evidence for forensic analysis, WebServer and the SQL database in WebServer were configured to log accesses and query history. Also, Snort was installed on the virtual machines in WebServer and FileServer while Wireshark was deployed in the Ubuntu host machine to monitor network traffic. Snort was configured to capture the SQL injection attack, which generated alerts based on the pre-set rules while Wireshark was configured to capture packets associated with the DDoS and DoS attacks.
Figure 4 lists example Snort alerts and MySQL query logs for the SQL injection attack. Note that the attack was launched using or '1'='1' to bypass the SQL query syntax check.
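To see why an input such as or '1'='1' succeeds against unsanitized queries, the following minimal Python sketch contrasts string concatenation with a parameterized query. It uses the standard-library sqlite3 module as a stand-in for the MySQL database in the experiment; the table, column names and credentials are hypothetical and do not come from the experimental setup.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "s3cret"), ("bob", "hunter2")],
    )
    return conn


def lookup_unsafe(conn, name, password):
    # Vulnerable: user input is concatenated straight into the SQL text.
    query = (
        "SELECT name FROM users WHERE name = '" + name +
        "' AND password = '" + password + "'"
    )
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name, password):
    # Parameterized: the driver treats the input strictly as data.
    query = "SELECT name FROM users WHERE name = ? AND password = ?"
    return [row[0] for row in conn.execute(query, (name, password))]


conn = make_demo_db()
payload = "x' OR '1'='1"                    # injected through the password field
unsafe_rows = lookup_unsafe(conn, "alice", payload)
safe_rows = lookup_safe(conn, "alice", payload)
print(sorted(unsafe_rows), safe_rows)
```

In the concatenated query the trailing OR '1'='1' makes the WHERE clause true for every row, so all users are returned; the parameterized query compares the whole payload against the stored password and returns nothing.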
Figure 5 shows a snapshot of the packets captured by Wireshark. Kali Linux at IP address 172.16.168.173 sent numerous SYN packets to FileServer at IP address 172.16.168.229 and FileServer sent numerous SYN-ACK packets back to Kali Linux.
A Prolog-based forensic tool [10,11] was used to automate the process of correlating items of evidence to reconstruct the SQL injection and DDoS attacks. This was accomplished by coding the evidence and the cloud configuration as Prolog predicates to create the input file shown in Figure 6. At runtime, the input file instantiated the rules to create the attack paths shown in Figure 7.
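Before packet evidence can be coded as predicates, the capture has to be reduced to a statement such as "one source sent a burst of SYN packets to FileServer". The following Python sketch shows one way this could be done. The packet records are synthetic dictionaries, not Wireshark output, and the function name and threshold are arbitrary assumptions.

```python
from collections import Counter


def syn_flood_suspects(packets, threshold):
    """Return {(src, dst): count} for pairs whose bare-SYN count reaches the threshold."""
    syn_counts = Counter(
        (p["src"], p["dst"])
        for p in packets
        if p["flags"] == {"SYN"}          # SYN set, ACK clear
    )
    return {pair: n for pair, n in syn_counts.items() if n >= threshold}


# Synthetic records for illustration only; addresses follow the experimental setup.
packets = (
    [{"src": "172.16.168.173", "dst": "172.16.168.229", "flags": {"SYN"}}] * 50
    + [{"src": "172.16.168.229", "dst": "172.16.168.173", "flags": {"SYN", "ACK"}}] * 50
    + [{"src": "172.16.168.100", "dst": "172.16.168.229", "flags": {"SYN"}}]
)
suspects = syn_flood_suspects(packets, threshold=20)
print(suspects)
```

A fixed threshold is the simplest possible criterion; a real investigation would likely also consider time windows and whether the handshakes were ever completed.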
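Table 2 describes the notation used in Figure 7, which shows two attack paths. The attack path on the left [7, 8] → 6 → [5, 9, 10] → 4 → [3, 11] → 2 → 1 corresponds to the SQL injection attack on the web server that exploited the CWE-89 vulnerability to steal user data. The attack path on the right [8, 16] → 15 → [14, 17, 18] → 13 → 12 corresponds to the DDoS attack on FileServer. However, Snort and Wireshark failed to capture the DoS attack on FileServer that exploited the CVE-2015-3241 vulnerability in the OpenStack Nova service. Fortunately, the OpenStack Nova API logs, which record information about user operations on running instances, provided evidence related to the DoS attack on FileServer.
Figure 8 shows a snapshot of the Nova API logs pertaining to the instance migration caused by the DoS attack. The commands in bold font show that instance bd1dac18-1ce2-44b5-93ee-967fec640ff3 representing the FileServer virtual machine was resized.

//Initial attack status and final attack status
attackerLocated(internet).
attackGoal(serviceDown(fileServer, user)).
attackGoal(execCode(database, user)).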
To combine the attack status and cloud system configuration, the related Nova API calls were manually aggregated and encoded as Prolog evidence predicates. This yielded the input file shown in Figure 9.
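The manual aggregation step could in principle be scripted. The following Python sketch is only an illustration of the idea: the record layout is a simplified, hypothetical stand-in for parsed log entries (it is not the Nova API log format), and the predicate name resizeObserved is invented here and is not one of the predicates used by the Prolog-based tool.

```python
from collections import Counter


def encode_resize_evidence(records, instance_names):
    """Aggregate resize actions per (instance, user) and render them as Prolog-style facts."""
    counts = Counter(
        (r["instance_id"], r["user"]) for r in records if r["action"] == "resize"
    )
    facts = []
    for (instance_id, user), n in sorted(counts.items()):
        host = instance_names.get(instance_id, "unknownHost")
        # resizeObserved/3 is a HYPOTHETICAL predicate used only for this sketch.
        facts.append(f"resizeObserved({host}, {user}, {n}).")
    return facts


FILE_SERVER_ID = "bd1dac18-1ce2-44b5-93ee-967fec640ff3"   # instance ID quoted in the text

# Hypothetical, already-parsed records; real log entries would need a parser first.
records = [
    {"instance_id": FILE_SERVER_ID, "user": "admin", "action": "resize"},
    {"instance_id": FILE_SERVER_ID, "user": "admin", "action": "resize"},
    {"instance_id": FILE_SERVER_ID, "user": "admin", "action": "resize"},
    {"instance_id": FILE_SERVER_ID, "user": "admin", "action": "delete"},
]
evidence = encode_resize_evidence(records, {FILE_SERVER_ID: "fileServer"})
print(evidence)
```

Mapping instance identifiers to the host names used in the predicates keeps the generated facts consistent with the rest of the input file.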
Running the Prolog-based forensic analysis tool on this input file produced the logical evidence graph shown in Figure 1, but with different node notation (shown in Table 4). The logical evidence graph shows an attack path that exploited the vulnerability CVE-2015-3241 and used the control dashboard Horizon to launch a DoS attack on the cloud.
Figure 7, which represents the SQL injection and DDoS attacks, and Figure 1, which represents the DoS attack, cannot be grouped together because the attacks originated from different locations. In addition, the DoS attack was on the Nova service instead of on a virtual machine, although it was launched from a virtual machine.

Using System Calls for Evidence Analysis
Because system calls enable user-level processes to request kernel-level services such as storage operations, memory and network access, and process management, they are often used for intrusion detection and forensic analysis [6]. When evidence cannot be obtained from forensic tools or system services to help recognize a known attack, system calls can be used to ascertain system behavior. Because it would be extremely rare to have an attack path in which every attack step is a zero-day attack [17], system calls can help reconstruct the missing attack steps when other evidence is not available.
Five popular mechanisms are available to trace the system calls in a cloud-based virtual machine: (i) ptrace command that sets up system call interception and modification by modifying a software application; (ii) strace command that logs system calls and signals; (iii) auditing facilities within the kernel; (iv) system call table modification and the use of system call data writing wrappers to log the corresponding system calls; and (v) system call interception within a hypervisor [1]. Because OpenStack supports several hypervisors, including Xen, QEMU, KVM, LXC, Hyper-V and UML, no generic solution for intercepting system calls within a hypervisor exists. Hence, the strace command and system call table modification with system call data writing wrappers may be used to log relevant system calls.
An example attack launched from Kali Linux is used to demonstrate how system call sequences are used in attack reconstruction. In this attack, SSH was used to log into FileServer by supplying stolen credentials from a legitimate user named coco. In order to simulate the stealthy attack without triggering intrusion detection system alerts, the attacker was assumed to use shoulder surfing to obtain the (username, password) credentials. Figure 10 shows the SSH log from /var/log/auth.log in FileServer. The log entry shows that coco logged into FileServer from 172.16.168.173, which actually belonged to the attacker, indicating that the attacker stole the credentials belonging to coco.
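An SSH authentication log of this kind can be screened for logins from unexpected addresses. The following Python sketch assumes the usual sshd "Accepted ... for ... from ... port ..." message form; the sample lines (timestamps, process IDs, ports) and the known-address table are invented for illustration and are not the entries shown in Figure 10.

```python
import re

ACCEPTED = re.compile(
    r"Accepted (?P<method>\S+) for (?P<user>\S+) "
    r"from (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) port (?P<port>\d+)"
)


def unexpected_logins(lines, known_hosts):
    """Return (user, ip) for accepted logins from addresses not listed for that user."""
    found = []
    for line in lines:
        m = ACCEPTED.search(line)
        if m and m["ip"] not in known_hosts.get(m["user"], set()):
            found.append((m["user"], m["ip"]))
    return found


# Illustrative log lines; only the user name and attacker address come from the text.
sample_log = [
    "Mar  1 09:02:11 fileserver sshd[1804]: Accepted password for coco from 172.16.168.50 port 40112 ssh2",
    "Mar  1 10:15:02 fileserver sshd[2211]: Accepted password for coco from 172.16.168.173 port 51812 ssh2",
    "Mar  1 10:15:09 fileserver sshd[2230]: Failed password for root from 172.16.168.173 port 51820 ssh2",
]
known_hosts = {"coco": {"172.16.168.50"}}   # hypothetical usual workstation of coco
suspicious = unexpected_logins(sample_log, known_hosts)
print(suspicious)
```

Such a check only flags candidates; as in the scenario above, a successful login with stolen credentials looks legitimate unless the source address is known to be unusual.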
A process typically issues many system calls; however, only some of the calls are important for ascertaining process behavior. The important system calls [17] are listed in the second column of Table 5.
Figure 11 shows the important system calls captured from the attack. The read and write calls (in bold font) indicate that the attacker opened and modified a file named test.txt. In a read or write call, the first argument is the file descriptor from which the process reads or to which it writes data, the second is the buffer contents and the third is the number of bytes to be read or written. The value after the = sign is the return value of the call; a value of 1 or greater indicates that the system call was executed successfully.
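The argument layout described above lends itself to simple automated screening. The following Python sketch parses strace-style read and write lines and totals the bytes transferred per file descriptor, counting only calls whose return value is 1 or greater, as in the text. The sample lines are illustrative and are not the traces in Figure 11.

```python
import re

SYSCALL = re.compile(
    r"^(?P<name>read|write)\((?P<fd>\d+), (?P<rest>.*)\)\s+= (?P<ret>-?\d+)"
)


def summarize_io(lines):
    """Total bytes successfully read and written, keyed by (call name, file descriptor)."""
    totals = {}
    for line in lines:
        m = SYSCALL.match(line.strip())
        if not m:
            continue
        ret = int(m["ret"])
        if ret < 1:                      # the text treats 1 or more as success
            continue
        key = (m["name"], int(m["fd"]))
        totals[key] = totals.get(key, 0) + ret
    return totals


# Illustrative strace-style lines (raw strings keep the escaped newline as printed).
trace = [
    r'open("test.txt", O_RDWR) = 3',
    r'read(3, "hello\n", 4096) = 6',
    r'write(3, "hello world\n", 12) = 12',
    r'read(3, "", 4096) = 0',
    r'write(5, "x", 1) = -1 EBADF (Bad file descriptor)',
]
io_totals = summarize_io(trace)
print(io_totals)
```

Relating a file descriptor back to a file name requires the matching open call, which this sketch deliberately ignores.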
The program behavior and the opening and modifying of a legitimate user's file were expressed in the form of the Prolog predicate: canAccessFile(fileServer, user, modify, ). This predicate states that the attacker as a legitimate user can modify the file located at , which represents the home directory of the legitimate user.
Using the evidence obtained from the log in Figure 10, which shows that the attacker with stolen credentials (expressed by the predicates: (i) attackGoal(principalCompromised(user)); (ii) inCompetent(user); and (iii) attackerLocated(internet)) logged into FileServer using SSH (expressed by the predicate attackGoal(logInService(fileserver, tcp, 22))), and the fact that user coco with an account on FileServer had the privileges to modify files (expressed by the predicate localFileProtection(fileServer, user, modify, )), the input file shown in Figure 12 was created for the Prolog-based tool.

//Initial attack status
attackerLocated(internet).
//Attacker was able to log into FileServer using stolen credentials
attackGoal(logInService(fileserver, tcp, 22)).
attackGoal(principalCompromised(user)).
//Incompetent user
inCompetent(user).
//Attack status obtained by analyzing system call sequence
attackGoal(canAccessFile(fileServer, user, modify, )).
//User could log into FileServer using the SSH protocol
networkServiceInfo(fileServer, sshd, tcp, 22, ).
//User who has the account on FileServer has file modification privileges
localFileProtection(fileServer, user, modify, ).
Figure 13 shows the reconstructed attack paths and Table 6 shows the associated node notation. The attack path [3,4,7] → 2 → 1 has three pre-conditions, which are represented by Nodes 3, 4 and 7. Node 3 expresses the fact that files in FileServer can be modified by FileServer users. Node 4 is obtained from the fact that FileServer can be accessed using SSH via TCP on port 22. Node 7 is obtained from the SSH authentication log in Figure 10, which indicates that the user's credentials were stolen by the attacker. Note that, without the evidence obtained from the system call sequence (Node 1), the attack path [3,4,7] → 2 → 1 would not have been established.
The two rule nodes (Node 5 and Node 2) in Figure 13 do not have rule descriptions because of the obvious correlation between Node 6 and Node 4 (if the network provides the SSH service for logging into FileServer via TCP on port 22, then any user or attacker with stolen credentials could log into FileServer); and Nodes 3, 4 and 7 collectively and Node 1 (if a user has privileges to modify a file in FileServer, then the attacker who has stolen a user's credentials could modify the file).
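The way facts instantiate rules to yield an attack path can be illustrated with a small forward-chaining sketch in Python. This is not the Prolog-based tool and not its rule base: the two rules are simplified encodings of the correlations described above, the rule names are invented, and the path argument that is elided in the source predicates is omitted. Variables start with an uppercase letter, following Prolog convention.

```python
def unify(pattern, fact, bindings):
    """Match one pattern tuple against one ground fact; return extended bindings or None."""
    if len(pattern) != len(fact):
        return None
    result = dict(bindings)
    for term, value in zip(pattern, fact):
        if term[:1].isupper():                      # variable
            if result.setdefault(term, value) != value:
                return None
        elif term != value:                         # constant mismatch
            return None
    return result


def match_all(premises, facts, bindings):
    """Yield every binding that satisfies all premises against the fact set."""
    if not premises:
        yield bindings
        return
    for fact in facts:
        extended = unify(premises[0], fact, bindings)
        if extended is not None:
            yield from match_all(premises[1:], facts, extended)


def forward_chain(facts, rules):
    """Apply rules until no new facts appear; record the rule and premises behind each one."""
    known = set(facts)
    derivations = {}
    changed = True
    while changed:
        changed = False
        for name, head, premises in rules:
            for b in list(match_all(premises, known, {})):
                derived = tuple(b.get(t, t) for t in head)
                if derived not in known:
                    known.add(derived)
                    used = [tuple(b.get(t, t) for t in p) for p in premises]
                    derivations[derived] = (name, used)
                    changed = True
    return known, derivations


facts = {
    ("networkServiceInfo", "fileServer", "sshd", "tcp", "22"),
    ("principalCompromised", "user"),
    ("localFileProtection", "fileServer", "user", "modify"),
}
rules = [  # hypothetical, simplified rules mirroring the two correlations in the text
    ("remote_login", ("logInService", "Host", "Prot", "Port"),
     [("networkServiceInfo", "Host", "Prog", "Prot", "Port")]),
    ("file_modification", ("canAccessFile", "Host", "User", "Access"),
     [("localFileProtection", "Host", "User", "Access"),
      ("logInService", "Host", "Prot", "Port"),
      ("principalCompromised", "User")]),
]
known, derivations = forward_chain(facts, rules)
goal = ("canAccessFile", "fileServer", "user", "modify")
print(goal in known, derivations[goal][0])
```

The recorded derivations play the role of rule nodes: each derived fact points back to the rule and the premises that produced it, which is the information needed to draw a path such as [3,4,7] → 2 → 1.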

Conclusions
Cloud computing increases the efficiency and flexibility of enterprise operations. However, clouds present significant challenges to digital forensics. One challenge is the lack of customer control over the physical locations of data. Other challenges include the dependence of forensically-relevant data on the cloud deployment model, large volumes of data, proprietary data formats, multiple isolated virtual machine instances running on a single physical machine, and inadequate tools for conducting cloud forensic investigations.
This research has demonstrated that evidence from multiple sources can be used to reconstruct cloud attack scenarios. The sources include intrusion detection system and application software logs, cloud service API calls and system calls from virtual machines. To acquire evidence from the sources, a forensics-enabled cloud should support: (i) logging and retrieval of intrusion detection system and software service data; (ii) secure storage and retrieval of OpenStack service API call logs, firewall logs and snapshots of running instances; and (iii) storage and retrieval of system calls, especially when the first two sources are unavailable. The Prolog-based forensic analysis presented in this chapter demonstrates the effectiveness and utility of automating the correlation of evidence from multiple sources to reconstruct attack scenarios in digital forensic investigations.
Future research will implement extensions to the forensics-enabled cloud to preserve data integrity, reduce data volume and manage the diversity of digital forensic data stored in the cloud.
This chapter is not subject to copyright in the United States. Commercial products are identified in order to adequately specify certain procedures. In no case does such an identification imply a recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the identified products are necessarily the best available for the purpose.

Figure 4. Example Snort alerts and MySQL query logs.

Figure 6. Prolog predicates for the SQL injection and DDoS attacks.

Figure 7. Attack path reconstruction for the SQL injection and DDoS attacks.

Figure 11. Traces of read and write system calls.

Figure 12. Input file for modifying a file with stolen credentials.

Figure 13. Attack path reconstruction using evidence obtained from system calls.

Table 1. Descriptions of the nodes in Figure 1.

Table 2. Descriptions of the nodes in Figure 7.

Table 3. Virtual machine instances, names and IP addresses.

Table 4. Descriptions of nodes in the DoS attack.

Table 5. Important system calls.

Table 6. Descriptions of the nodes in Figure 13.