Hierarchical Multicore-Scheduling for Virtualization of Dependent Real-Time Systems

Hypervisor-based virtualization is a promising technology to concurrently run various embedded real-time applications on a single multicore hardware platform. It provides spatial as well as temporal separation of the different applications allocated to one hardware platform. In this paper, we propose a concept for hierarchical scheduling of dependent real-time software on multicore systems using hypervisor-based virtualization. For this purpose, we decompose offline schedules of singlecore systems based on their release times, deadlines, and precedence constraints. The resulting schedule fragments are allocated to time partitions such that task deadlines as well as precedence constraints are met while the local scheduling order of tasks is preserved. This concept enables, e.g., the consolidation of various dependent singlecore applications on a multicore platform using full virtualization. Finally, we demonstrate the functionality of our concept with an automotive use case from the literature.


Introduction
Nowadays, there is a rising interest in multicore technology for embedded real-time systems. Using multicore hardware promises not only more computational power but also reduced system size, weight, and power consumption. However, many embedded applications require sequential interaction between different components. Increased system performance is therefore achieved not by parallelizing dedicated software but rather by running various applications concurrently on one multicore platform [12]. Virtualization provides the means to separate these applications. Multicore architectures and virtualization are therefore known as symbiotic technologies [9].

Hypervisor-based Virtualization
In this paper we focus on type-1 hypervisor-based virtualization, i.e., an additional software layer, the hypervisor, is placed between the hardware and the operating system (OS) or application software. As type-1 hypervisors run bare-metal, they must provide, e.g., device drivers either on their own (monolithic) or by means of a special guest system (console-guest). Hypervisors provide virtual machines (VMs) that represent duplicates of the real hardware. These VMs allow various systems to run spatially and temporally separated on a single hardware platform. The literature distinguishes full and para-virtualization [9]. While guest systems running under full virtualization are not aware of the hypervisor, para-virtualized systems are adapted to run in VMs. Consequently, para-virtualization allows information exchange between guest system and hypervisor, whereas full virtualization does not.

Problem Statement
Temporal isolation is an important property of hypervisor-based virtualization for embedded real-time systems. Current real-time hypervisors ensure temporal isolation of VMs by some cyclic scheduling at hypervisor level (cf. Sect. 2.2). These approaches provide each VM a guaranteed share of processing time during a predefined period, but dependencies between VMs remain an open issue.
Dependencies between tasks hosted by the same VM must be resolved by its local scheduler, but dependencies between VMs must be resolved by the hypervisor scheduler. Using para-virtualization, local schedulers could notify the hypervisor when tasks are finished. This may enable server-based solutions to schedule VMs with precedence constraints. In contrast, full virtualization implies that local and hypervisor schedulers cannot actively exchange information. Hence, a-priori knowledge of local schedules and VM dependencies is required to derive an appropriate global schedule.

Contribution
In this paper, we focus on hierarchical real-time scheduling of dependent VMs to enable full virtualization of singlecore systems deployed to multicore hardware. Here, dependencies are given by precedence constraints. The challenge is to share the execution time of p > 1 cores among m > p VMs such that deadlines as well as precedence constraints are met. Each VM encapsulates a periodic real-time system driven by its separate local singlecore schedule. Time sharing shall be realized by a fixed cyclic schedule that guarantees that task deadlines and precedence constraints are met. We consider task sets with acyclic dependencies, because cyclic task dependencies imply non-deterministic behavior. Nevertheless, the resulting VM dependency graph may contain cycles. To meet the deadlines as well as the precedence constraints of the overall system, the hypervisor scheduler has to preempt the execution of VMs. For this purpose, we first decompose the local schedules and then allocate time partitions of various lengths to the resulting parts of the VM schedules. The result of our approach is an offline multicore schedule for VMs that not only provides sufficient execution time for each VM but also considers precedence constraints.

Related Work
In this paper, we address hierarchical scheduling of periodic tasks with precedence constraints on a multicore platform. We therefore divide related work into approaches related to multicore scheduling and hierarchical scheduling.

Multicore Scheduling
Multicore scheduling approaches are classified as partitioned or global [1]. Davis and Burns [5] state that i) most published research addresses independent tasks and ii) the main advantage of partitioned multicore scheduling is the reuse of results from singlecore scheduling theory once tasks have been allocated to cores. Considering periodic task sets with precedence constraints, partitioned scheduling allows applying, e.g., the adapted Earliest Deadline First (EDF*) presented by Chetto et al. [2] or the Deadline Monotonic (DM) based scheduling proposed by Forget et al. [6]. Both approaches adapt deadlines to resolve dependencies between tasks allocated to a single core and thus enable deadline-based scheduling as for independent task sets. However, dependencies between tasks allocated to different cores are not considered. For global multicore scheduling, several scheduling policies from singlecore scheduling were adapted. For independent tasks, global EDF schedules the p tasks with the earliest absolute deadlines at each point in time, where p is the number of cores. Lee [11] extended global EDF to Earliest Deadline Zero Laxity (EDZL), which was proven to dominate global EDF [1]. Cho et al. [3] presented Largest Local Remaining Execution time First (LLREF). It is an optimal offline real-time scheduling approach for independent periodic tasks with implicit deadlines (D = T) and performs non-work-conserving scheduling, i.e., cores can be idle even when tasks are ready. Rönngren and Shirazi [15] proposed static scheduling of periodic tasks with precedence constraints for multiprocessor systems connected by a time division multiple access (TDMA) bus network. They adapt task deadlines, similar to [2], [6], and apply a heuristic that schedules tasks w.r.t. earliest start time, laxity, etc. In contrast to these approaches, our work aims at global offline scheduling that does not adapt local schedules, i.e., task parameters as well as the local execution order remain untouched.

Hierarchical Scheduling
Most approaches for hierarchical scheduling in virtualization focus on independent sub-systems, while our work allows dependencies between those systems. In [7], Grösbrink and Almeida present hierarchical scheduling for hypervisor-based real-time virtualization of mixed-criticality systems. They address independent periodic VMs and apply partitioned hierarchical scheduling, i.e., VMs are allocated as periodic servers to cores and each core schedules its servers according to Rate Monotonic (RM). Masmano et al. [13] present the monolithic hypervisor XtratuM, which provides para-virtualization. It schedules VMs, called partitions, globally by a static cyclic schedule and locally by a preemptive fixed priority-based policy [4]. Xi et al. [16] present the console-guest hypervisor RT-Xen. It enables scheduling VMs as periodic or deferrable servers using EDF or DM priority schemes. Masrur et al. [14] proposed priority-based scheduling plus simple EDF (PSEDF) to apply the XEN hypervisor to mixed-criticality systems in the automotive domain. In contrast to our work, however, none of these approaches allows precedence constraints between VMs.

System Model
This paper focuses on hierarchical scheduling of periodic dependent real-time systems on a multicore platform. Usually, periodic embedded real-time systems get input from sensors and compute output to control actuators, but the resources to get input and set output via direct I/O access or network interfaces are limited. To take this into account, we consider a periodic task model that allows asynchronous release of tasks: Each task τ_i ∈ Γ is characterized by its worst case execution time (WCET) C_i, period T_i, constrained deadline D_i ≤ T_i, and offset O_i. By means of constrained deadlines and offsets, we are able to cover systems where the multicore platform is connected to a time-triggered network. We denote the j-th instance of task τ_i by τ_ij and its absolute deadline by d_ij. Task dependencies are given by precedence constraints τ_i ≺ τ_j meaning that τ_i must finish before τ_j can start execution. This corresponds to implicit communication between tasks, i.e., tasks require input just when they start and provide output when they finish. To keep software behavior deterministic, we assume acyclic task graphs. Consequently, task dependencies can be described by directed acyclic graphs (DAG). We define the sets of source and sink nodes as source = {τ_i ∈ Γ | there is no τ_j ∈ Γ with τ_j ≺ τ_i} and sink = {τ_i ∈ Γ | there is no τ_j ∈ Γ with τ_i ≺ τ_j}. While tasks τ_i ∈ source make progress as soon as they are scheduled, tasks with predecessors (τ ∈ Γ \ source) can only progress when the required input has been delivered. Using hypervisor-based virtualization, the task set Γ is mapped to a set of virtual machines (VMs), where VM υ_k is given by a task set γ_k ⊂ Γ and a scheduling σ_k. In general, σ_k can be an online or offline scheduling. In this paper, however, we assume offline singlecore schedulers running within the VMs, i.e., σ_k represents a fixed order in which the tasks τ_i ∈ γ_k are scheduled. We denote the worst case start time of a task instance τ_ij scheduled by σ_k with σ_k^s(τ_ij) and its worst case finishing time with σ_k^f(τ_ij).
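As an illustration of the source and sink sets defined above, they can be derived from a dependency DAG by checking which tasks lack predecessors or successors. The following is a minimal sketch, not code from the paper; the task names and edges are illustrative.

```python
# Sketch (illustrative, not from the paper): deriving the source and sink
# sets of a task dependency DAG given as a set of precedence edges.
from collections import defaultdict

def source_sink(tasks, edges):
    """edges: set of (pred, succ) pairs encoding pred < succ (precedence)."""
    preds = defaultdict(set)
    succs = defaultdict(set)
    for a, b in edges:
        preds[b].add(a)
        succs[a].add(b)
    # source nodes have no predecessors, sink nodes have no successors
    source = {t for t in tasks if not preds[t]}
    sink = {t for t in tasks if not succs[t]}
    return source, sink

tasks = {"t1", "t2", "t3", "t4"}
edges = {("t1", "t2"), ("t1", "t3"), ("t2", "t4"), ("t3", "t4")}
src, snk = source_sink(tasks, edges)
# src == {"t1"}, snk == {"t4"}
```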
The hypervisor scheduler is a fixed cyclic schedule, i.e., VMs are scheduled by means of time partitions to preserve temporal isolation. Each time partition represents a time interval I_h = [a_h, a_h + l_h[ defined by its start time a_h and its length (duration) l_h. A VM υ_k mapped to a time partition I_h will be scheduled at time a_h for l_h time units. During this time, VM υ_k can progress according to its schedule σ_k. The hypervisor schedule finally provides for each core a set of time partitions where each partition I_h is associated with a dedicated VM υ_k. We denote this association by I_h^{υ_k}.
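A minimal sketch of this partition model, assuming partitions are given as (start a_h, length l_h) pairs; the per-core overlap check is our own illustrative addition, not part of the paper's method.

```python
# Sketch: time partitions I_h = [a_h, a_h + l_h[ as (start, length) pairs,
# with a check that partitions assigned to one core do not overlap.
def overlaps(p, q):
    (a1, l1), (a2, l2) = p, q
    # half-open intervals [a, a + l[ overlap iff each starts before the other ends
    return a1 < a2 + l2 and a2 < a1 + l1

def core_schedule_valid(partitions):
    """partitions: list of (a_h, l_h) pairs allocated to a single core."""
    parts = sorted(partitions)
    return all(not overlaps(parts[i], parts[i + 1])
               for i in range(len(parts) - 1))

assert core_schedule_valid([(0, 3), (3, 2), (10, 5)])   # back-to-back is fine
assert not core_schedule_valid([(0, 4), (3, 2)])        # [0,4[ and [3,5[ clash
```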

Hierarchical Scheduling with Precedence Constraints
Hierarchical scheduling comprises the scheduling of schedules and thus introduces different levels of scheduling. We consider hierarchical scheduling for hypervisor-based virtualization, which implies two levels: global scheduling of VMs by the hypervisor and local scheduling of tasks within each VM. We restrict local schedulers to offline singlecore schedules, i.e., the execution order of tasks is fixed within each VM. This restriction simplifies handling the a-priori knowledge of local schedules that we require to cover full virtualization.
The main idea of our approach is to combine knowledge of the local schedulers' task execution order with a-priori knowledge of the tasks' WCETs and dependencies to compute worst case time partitions (WCTPs) for VMs. That is, we calculate the worst case VM execution time required to guarantee that a dedicated task τ has finished (cf. Sect. 4.2). In Sect. 4.3, we schedule these time partitions, which represent activation slots of the corresponding VMs, on a multicore system. In case of success, assigning execution time to VMs according to the resulting schedule ensures that task dependencies as well as task deadlines are met.

Necessary Condition for Schedulability
To the best of our knowledge, the literature provides no schedulability test that is necessary as well as sufficient for periodic tasks with precedence constraints on multicore systems. For multicore scheduling, there are also no known approaches that convert precedence constraints into real-time constraints, as proposed by Chetto et al. [2] for singlecore scheduling. This makes transferring results of multicore scheduling theory from independent to dependent task sets challenging. However, some results from the multicore scheduling theory of independent tasks can be transferred to task sets with precedence constraints, at least as necessary conditions. For instance, a trivial fact from scheduling theory is that a task set Γ with a computation demand higher than the computation supply provided by hardware with p cores is not schedulable. Consequently, for multicore hardware with p identical cores, the utilization of a feasible task set Γ cannot be higher than the available number of cores, i.e.

U(Γ) = Σ_{τ_i ∈ Γ} C_i / T_i ≤ p.   (5)
Although Eq. 5 is just a necessary condition, it allows us to exclude at least some non-feasible task sets.
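The necessary condition of Eq. 5 can be checked in a few lines. This is an illustrative sketch only: passing the test does not imply schedulability, since the condition is not sufficient.

```python
# Sketch of the necessary condition of Eq. 5: the total utilization of the
# task set must not exceed the number p of identical cores.
def utilization_ok(tasks, p):
    """tasks: list of (WCET C_i, period T_i) pairs; p: number of cores."""
    return sum(c / t for c, t in tasks) <= p

# Two cores, three tasks: U = 1/2 + 2/4 + 4/8 = 1.5 <= 2
assert utilization_ok([(1, 2), (2, 4), (4, 8)], p=2)
# U = 3 * (3/4) = 2.25 > 2, so this set is certainly not feasible on 2 cores
assert not utilization_ok([(3, 4), (3, 4), (3, 4)], p=2)
```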

Decomposition of Local Schedules
Here, we consider local schedules that result from offline singlecore scheduling. Precedence constraints of tasks that are mapped to the same VM are resolved by the corresponding local scheduler. Hence, two challenges remain to be solved by the hypervisor during VM scheduling: it has to schedule VMs such that (i) deadlines of tasks running within VMs are met and (ii) dependencies between tasks hosted by different VMs are taken into account. For this purpose, we decompose the local schedules of VMs based on

1. deadlines of tasks that are sink nodes of dependency graphs (τ ∈ sink),
2. release times of tasks that are source nodes of dependency graphs (τ ∈ source),
3. dependencies between tasks that are hosted by different VMs.

A first step towards enabling the hypervisor to keep task deadlines is to split local schedules at the worst case finishing times of sink nodes τ ∈ sink. This eases the handling of different periods within task set Γ. Since the execution order of tasks is static within a local schedule σ, fulfilling an absolute deadline d requires running each local schedule until all task instances with absolute deadline d are finished. Therefore, we split a local schedule σ at the worst case finishing time of the sink node that is scheduled by σ last amongst all other sink nodes of equal absolute deadline. Note that each resulting fragment of a local schedule is associated with the earliest absolute deadline d of all its tasks. The hypervisor must also consider the release times of task instances because VMs with offline schedules cannot progress as long as the currently scheduled task is not ready. To avoid that the hypervisor schedules VMs that cannot progress because of unreleased tasks, we apply another decomposition step to the local schedules based on release times.
We split a local schedule σ at the beginning of the source node that is scheduled by σ first amongst all other source nodes of equal release time. Our last decomposition step is based on precedence constraints of tasks hosted by different VMs. As tasks with precedence constraints are only released when all their predecessors have finished execution, we split local schedules based on inter-VM dependencies as follows: if a task τ allocated to VM υ_k has predecessors hosted by another VM υ_l, l ≠ k, we split schedule σ_k at the beginning of τ.
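The deadline- and release-based split rules above can be sketched as follows. The function names and all numeric data are illustrative, not taken from the paper's formalization; σ^f and σ^s are represented as plain dictionaries of worst case finish and start times.

```python
# Sketch (illustrative data): split points of a local schedule sigma.
# Rule (i): one split per distinct absolute deadline d, at the latest worst
# case finish among sink nodes sharing d.
def deadline_splits(sinks, sigma_f, abs_deadline):
    splits = {}
    for t in sinks:
        d = abs_deadline[t]
        splits[d] = max(splits.get(d, 0), sigma_f[t])
    return sorted(splits.values())

# Rule (ii): one split per distinct release time r, at the earliest worst
# case start among source nodes sharing r.
def release_splits(sources, sigma_s, release):
    splits = {}
    for t in sources:
        r = release[t]
        splits[r] = min(splits.get(r, float("inf")), sigma_s[t])
    return sorted(splits.values())

sinks = ["t6", "t9"]
assert deadline_splits(sinks, {"t6": 1200, "t9": 2800},
                       {"t6": 1500, "t9": 3000}) == [1200, 2800]
sources = ["t2"]
assert release_splits(sources, {"t2": 1600}, {"t2": 1500}) == [1600]
```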
The result of the described decomposition is a totally ordered set Φ_k of scheduling fragments ϕ_h for each VM υ_k. The order within Φ_k is such that composing all scheduling fragments ϕ_h ∈ Φ_k w.r.t. this order results in the original local singlecore schedule σ_k. Finally, we compute the worst case time partitions (WCTPs) based on these scheduling fragments and the WCETs. For each local schedule fragment ϕ_h, we sum up the WCETs of the task instances covered by this fragment and define a time partition I_h of this length. As this time partition is associated with the VM υ_k that hosts these task instances, we denote it by I_h^{υ_k} with length

l_h = Σ_{τ_ij ∈ ϕ_h} C_i.   (9)
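Computing the WCTP lengths of Eq. 9 reduces to summing WCETs per fragment. This is a sketch under the assumption that fragments are given as lists of (task, instance) pairs; all names and numbers are illustrative.

```python
# Sketch: worst case time partition lengths l_h per scheduling fragment.
# wcet maps a task name to its WCET C_i; instances of the same task share it.
def wctp_lengths(fragments, wcet):
    """Return l_h = sum of WCETs of the task instances in each fragment."""
    return [sum(wcet[task] for task, _inst in frag) for frag in fragments]

wcet = {"t1": 2, "t2": 3, "t3": 1}
fragments = [
    [("t1", 1), ("t2", 1)],   # phi_1 covers tau_{1,1} and tau_{2,1}
    [("t3", 1)],              # phi_2 covers tau_{3,1}
]
assert wctp_lengths(fragments, wcet) == [5, 1]
```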

Multicore Scheduling of Time Partitions
Our approach to hierarchical multicore scheduling is based on the time partitions I_h introduced in Sect. 3. While the length of a time partition is set according to the WCTP resulting from the decomposition of local schedules (cf. Eq. 9), the start time a_h of each time partition as well as its core must be determined by the hypervisor scheduler. Thus, the challenge addressed by our multicore scheduling approach is to allocate time partitions I_h^{υ_k} to cores C_j and to set their start times a_h^{υ_k} such that all precedence constraints are met and tasks finish before their deadlines even in the worst case.
We have to make scheduling decisions each time a scheduling fragment is released or finished. As we decomposed local schedules based on precedence constraints, finishing one scheduling fragment usually implies that one or more other scheduling fragments were released during its execution. Therefore, we also make scheduling decisions when the worst case finishing time of a task τ_i ∈ γ_l with a successor task τ_j hosted by another VM has passed. However, we only need to consider the worst case finishing time of the task τ_i ∈ γ_l that is scheduled by υ_l last amongst all other predecessors of τ_j. Keeping the order of local schedules guarantees that all other predecessors hosted by VM υ_l are then finished, too.
As multicore decisions are not only based on deadlines but must consider dependencies as well, we define two sets of scheduling fragments that are updated at each scheduling decision: R covers those scheduling fragments ϕ_h^{υ_k} that are ready, i.e., the predecessors required to execute ϕ_h^{υ_k} are finished and ϕ_h^{υ_k} is due according to the local schedule σ_k. In fact, R is similar to a ready queue known from common task scheduling. Analogously, N covers those scheduling fragments ϕ_h^{υ_k} that are next to become ready w.r.t. the order of their local schedule. Both R and N contain at most one scheduling fragment ϕ_h^{υ_k} per VM υ_k. Scheduling decisions are based on the following rules with decreasing priority:

1. Schedule the fragment ϕ_h ∈ R with the earliest deadline (EDF). Note: here, we use the deadlines associated to scheduling fragments during the first decomposition step (cf. Sect. 4.2).
2. Schedule the fragment that has the most successor fragments ϕ ∈ N.

While the first scheduling rule aims at keeping deadlines, the second rule addresses dependencies between different VMs.
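Under the assumption that rule 2 acts as a tie-breaker among fragments with equal deadlines, one fragment selection step might look as follows. The fragment identifiers, deadlines, and successor counts are illustrative, not taken from the paper.

```python
# Sketch (illustrative, interprets rule 2 as a tie-breaker for rule 1):
# among the ready fragments in R, pick the one with the earliest associated
# deadline; on equal deadlines, prefer the fragment with the most
# successor fragments currently in N.
def pick_fragment(R, deadline, successors_in_N):
    """R: set of ready fragments; deadline/successors_in_N: dicts over R."""
    return min(R, key=lambda f: (deadline[f], -successors_in_N[f]))

R = {"phi_v1_1", "phi_v2_1", "phi_v3_1"}
deadline = {"phi_v1_1": 1500, "phi_v2_1": 1500, "phi_v3_1": 3000}
successors_in_N = {"phi_v1_1": 2, "phi_v2_1": 1, "phi_v3_1": 0}
# phi_v1_1 and phi_v2_1 tie on the deadline; phi_v1_1 has more successors
assert pick_fragment(R, deadline, successors_in_N) == "phi_v1_1"
```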

Application Example
We use an application example to demonstrate how the approach presented in Sect. 4 works. Based on the problem definition given in Sect. 1.2, we apply our approach to a minimal system that consists of p = 2 cores and m = 3 VMs. The task set Γ deployed to the VMs is taken from Kandasamy et al. [8]. It covers three applications from the automotive domain: adaptive cruise control (ACC), traction control (TC), and electric power steering (EPS). Figure 1 shows the corresponding directed acyclic task dependency graphs. In Table 1, we provide the original task parameters of these applications as given in [8]. In addition, we adapted the WCETs of the tasks by some reduction. This represents a scenario where singlecore applications are consolidated on multicore hardware with increased computational power relative to the original singlecore hardware.

Table 1. Task parameters (original WCET C^O_i, adapted WCET C_i, period T_i, and relative deadline D_i) of the example applications shown in Fig. 1, cf. [8].

Task set Γ is deployed to VMs according to an approach presented by Klobedanz et al. ([10], "Algorithm 1: Initial Mapping"). This deployment originally addresses singlecore ECU networks and thus fits the indicated scenario of consolidating singlecore systems on a multicore platform. Figure 2 shows the resulting local offline schedules based on the original WCETs. These local schedules define the execution order of tasks within the VMs.

Decomposition of Local Schedules
Now, we use schedule σ_2 to demonstrate the decomposition of local schedules. VM υ_2 hosts tasks of two example applications: EPS and TC. Tasks of EPS have deadline D = 1500, while the deadline of the TC tasks is D = 3000. Our first decomposition step, splitting based on deadlines, therefore splits schedule σ_2 after the finishing of τ_{6,1} and associates the first fragment with absolute deadline d = 1500 and the second fragment with d = 3000.
The next decomposition step, splitting based on release times, is driven by the second release of the EPS system at host time t = 1500. According to the description in Sect. 4.2, we split σ_2 before the beginning of τ_{2,2}. Thus, the second fragment resulting from the first step is split again. Note that both fragments resulting from this step remain associated with absolute deadline d = 3000.
The last decomposition step, splitting based on precedence constraints, requires considering dependencies to other VMs. In particular, we split σ_2 at the beginning of tasks that require input from other VMs. In the case of σ_2, this results in splits at the beginning of τ_{4,1}, τ_{20,1}, and τ_{22,1}.
Applying these decomposition steps to the other local schedules of our application scenario results in the scheduling fragments depicted in Fig. 3. Rectangles clustering tasks correspond to the results of our decomposition steps: outmost rectangles result from deadline-based decomposition, middle rectangles from splitting based on release times, and innermost rectangles result from splitting based on dependencies.

Multicore Scheduling by Time Partitions
Having decomposed the local schedules into fragments, we can now allocate time partitions to dedicated cores of a multicore platform. In this example, we consider the m = 3 VMs given by the example applications introduced in this section and p = 2 cores. Table 3 shows, for each point in time at which the hypervisor can make scheduling decisions, why the scheduling point occurs, what the current host time of the hypervisor system is, which scheduling fragments are within the sets R and N, and which scheduling fragments are scheduled next on cores C_1 and C_2. For instance, applying the rules defined in Sect. 4.3, the hypervisor scheduler makes its first decision based on the deadlines of the scheduling fragments. That is, ϕ_1^{υ_1} and ϕ_1^{υ_2} get higher priority than ϕ_1^{υ_3}. Another interesting circumstance for making a scheduling decision occurs at line 7, where the "reason for scheduling" is ϕ_2^{υ_1}. This is the first time R does not contain scheduling fragments of all VMs, because ϕ_3^{υ_1} ∈ N requires input from ϕ_2^{υ_3}, which in the worst case has not finished yet. Therefore, ϕ_3^{υ_1} ∈ N is not passed to R and thus is not considered by the hypervisor. Finally, the mapping of scheduling fragments to cores is used to define the time partitions I_h resulting, e.g., for core C_1 in