Embedded systems market by hardware (MPU, MCU, application-specific IC / application-specific standard product, DSP, FPGA, and memory), software (middleware and operating system), application, and geography - global forecast to 2023, 2017. ,
Power reduction techniques for microprocessor systems, ACM Comput. Surv, vol.37, issue.3, pp.195-237, 2005. ,
Industry trends: Chip makers turn to multicore processors, Computer, vol.38, pp.11-13, 2005. ,
Time-critical computing on a single-chip massively parallel processor, Proceedings of the Conference on Design, Automation & Test in Europe, DATE '14, vol.97, pp.1-97, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01090449
On-chip interconnection architecture of the tile processor, IEEE Micro, pp.15-31, 2007. ,
Knights Landing: Second-generation Intel Xeon Phi product, IEEE Micro, pp.34-46, 2016. ,
The worst-case execution-time problem: Overview of methods and survey of tools, ACM Trans. Embed. Comput. Syst, vol.7, issue.3, pp.1-36, 2008. ,
The challenge of time-predictability in modern many-core architectures, 14th International Workshop on Worst-Case Execution Time Analysis, vol.39, pp.63-72, 2014. ,
Contention in multicore hardware shared resources: Understanding of the state of the art, 14th International Workshop on Worst-Case Execution Time Analysis, OpenAccess Series in Informatics (OASIcs), pp.31-42, 2014. ,
An empirical characterization of stream programs and its implications for language and compiler design, Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, PACT '10, pp.365-376, 2010. ,
Parallel FFT algorithms on network-on-chips, Fifth International Conference on Information Technology: New Generations (ITNG 2008), pp.1087-1093, 2008. ,
Misconceptions about real-time computing: A serious problem for next-generation systems, Computer, vol.21, pp.10-19, 1988. ,
, Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and Applications, 2011.
Loop bound analysis based on a combination of program slicing, abstract interpretation, and invariant analysis, 2007. ,
Analysing switch-case tables by partial evaluation, 7th Intl. Workshop on Worst-Case Execution Time (WCET) Analysis, 2007. ,
Static determination of dynamic properties of programs, Proceedings of the Second International Symposium on Programming, pp.106-130, 1976. ,
Performance analysis of embedded software using implicit path enumeration, Proceedings of the 32nd Annual ACM/IEEE Design Automation Conference, pp.456-461, 1995. ,
, AbsInt Angewandte Informatik GmbH, aiT WCET analyzers
Bound-T time and stack analyser ,
OTAWA: An open toolbox for adaptive WCET analysis, 8th IFIP WG 10.2 International Workshop on Software Technologies for Embedded and Ubiquitous Systems (SEUS), pp.35-46, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-01055378
The heptane static worst-case execution time estimation tool, 17th International Workshop on Worst-Case Execution Time Analysis, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01590444
Rapitime white paper-worst-case execution time analysis, 2008. ,
Static scheduling algorithms for allocating directed task graphs to multiprocessors, ACM Computing Surveys, vol.31, issue.4, pp.406-471, 1999. ,
A survey of hard real-time scheduling for multiprocessor systems, ACM Computing Surveys, vol.43, issue.4, p.44, 2011. ,
Analysis of cache-related preemption delay in fixed-priority preemptive scheduling, IEEE Trans. Comput, vol.47, pp.700-713, 1998. ,
Tightening the bounds on feasible preemptions, ACM Trans. Embed. Comput. Syst, vol.10, pp.1-34, 2011. ,
Limited preemptive scheduling for real-time systems. A survey, IEEE Trans. on Indus. Infor, vol.9, pp.3-15, 2013. ,
Is semi-partitioned scheduling practical?, Proceedings of the 2011 23rd Euromicro Conference on Real-Time Systems, pp.125-135, 2011. ,
Semi-distributed load balancing for massively parallel multicomputer systems, IEEE Trans. Softw. Eng, vol.17, pp.987-1004, 1991. ,
Moore's law: Past, present, and future, IEEE Spectr, vol.6, pp.52-59, 1997. ,
Intel multi-core processors: Making the move to quad-core and beyond, tech. rep, 2006. ,
Cache coherence protocols: Evaluation using a multiprocessor simulation model, ACM Trans. Comput. Syst, vol.4, pp.273-298, 1986. ,
The case for a single-chip multiprocessor, SIGOPS Oper. Syst. Rev, vol.5, pp.2-11, 1996. ,
Parallel Programming in OpenMP, 2000. ,
Multithreaded Programming With PThreads, 1998. ,
A generic and compositional framework for multicore response time analysis, Proceedings of the 23rd International Conference on Real Time and Networks Systems, pp.129-138, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01231700
Bus access optimization for predictable implementation of real-time applications on multiprocessor systems-on-chip, RTSS, 2007. ,
T-CREST: Time-predictable multi-core architecture for embedded systems, J. Syst. Archit, pp.449-471, 2015. ,
The case for the precision timed (PRET) machine, Proceedings of the 44th Annual Design Automation Conference, pp.264-265, 2007. ,
A case for intelligent RAM, IEEE Micro, vol.17, issue.2, pp.34-44, 1997. ,
The effect of context switches on cache performance, Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp.75-84, 1991. ,
An Overview of Cache Optimization Techniques and Cache-Aware Numerical Algorithms, pp.213-232, 2003. ,
Computer Architecture, Fifth Edition: A Quantitative Approach, 2011. ,
A survey on replacement strategies in cache memory for embedded systems, Distributed Computing, VLSI, Electrical Circuits and Robotics, pp.12-17, 2016. ,
A study of replacement algorithms for a virtual-storage computer, IBM Syst. J, pp.78-101, 1966. ,
Amortized efficiency of list update and paging rules, Commun. ACM, vol.28, issue.2, pp.202-208, 1985. ,
Caches in WCET analysis, 2008. ,
Timing anomalies in dynamically scheduled microprocessors, Proceedings of the 20th IEEE Real-Time Systems Symposium, pp.12-21, 1999. ,
An overview of approaches towards the timing analysability of parallel architecture, Bringing Theory to Practice: Predictability and Performance in Embedded Systems, pp.32-41, 2011. ,
Timing analysis of concurrent programs running on shared cache multi-cores, Real-time Systems, vol.48, issue.6, pp.638-680, 2012. ,
Using bypass to tighten WCET estimates for multi-core processors with shared instruction caches, Proceedings of the 30th IEEE Real-Time Systems Symposium, RTSS, pp.68-77, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00380298
WCET analysis for multi-core processors with shared L2 instruction caches, Proceedings of the 2008 IEEE Real-Time and Embedded Technology and Applications Symposium, pp.80-89, 2008. ,
Shared data caches conflicts reduction for WCET computation in multi-core architectures, 18th International Conference on Real-Time and Network Systems, pp.80-89, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00531214
A unified WCET analysis framework for multicore platforms, ACM Trans. Embed. Comput. Syst, vol.13, pp.1-29, 2014. ,
WCET analysis for multi-core processors with shared buses and event-driven bus arbitration, Proceedings of the 23rd International Conference on Real Time and Networks Systems, pp.193-202, 2015. ,
Parallelism analysis: Precise WCET values for complex multi-core systems, Sci. Comput. Program, pp.175-193, 2017. ,
Making shared caches more predictable on multicore platforms, Proceedings of the 2013 25th Euromicro Conference on Real-Time Systems, pp.157-167, 2013. ,
Real-time cache management framework for multi-core architectures, Proceedings of the 2013 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS), pp.45-54, 2013. ,
PRET DRAM controller: Bank privatization for predictability and temporal isolation, Proceedings of the Seventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, pp.99-108, 2011. ,
PALLOC: DRAM bank-aware memory allocator for performance isolation on multicore platforms, 20th IEEE Real-Time and Embedded Technology and Applications Symposium, pp.155-166, 2014. ,
Memory servers for multicore systems, 2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), pp.97-108, 2016. ,
Memory bandwidth management for efficient performance isolation in multi-core platforms, IEEE Trans. Computers, pp.562-576, 2016. ,
The multi-resource server for predictable execution on multi-core platforms, 20th IEEE Real-Time and Embedded Technology and Applications Symposium, pp.1-12, 2014. ,
Modeling shared cache and bus in multi-cores for timing analysis, Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems, SCOPES '10, vol.6, pp.1-6, 2010. ,
Static analysis of multi-core TDMA resource arbitration delays, Real-Time Syst, vol.50, issue.2, pp.185-229, 2014. ,
A predictable execution model for COTS-based embedded systems, Proceedings of the 2011 17th IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS '11, pp.269-279, 2011. ,
WCET analysis of a parallel 3D multigrid solver executed on the MERASA multi-core, 10th International Workshop on Worst-Case Execution Time Analysis, WCET 2010, pp.90-100, 2010. ,
Automatic WCET analysis of real-time parallel applications, 13th Workshop on Worst-Case Execution Time Analysis (WCET 2013), pp.11-20, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-01239727
Minimizing the cost of synchronisations in the WCET of real-time parallel programs, Proceedings of the 17th International Workshop on Software and Compilers for Embedded Systems, pp.98-107, 2014. ,
Integrated worst-case execution time estimation of multicore applications, 13th International Workshop on Worst-Case Execution Time Analysis, vol.30, pp.21-31, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00909330
Optimal scheduling strategies in a multiprocessor system, IEEE Trans. Comput, vol.21, pp.137-146, 1972. ,
Multiprocessor scheduling of processes with release times, deadlines, precedence, and exclusion relations, IEEE Trans. Softw. Eng, vol.19, pp.139-154, 1993. ,
Practical multiprocessor scheduling algorithms for efficient parallel processing, IEEE Trans. Comput, vol.33, issue.11, pp.1023-1029, 1984. ,
Scheduling precedence graphs in systems with interprocessor communication times, SIAM J. Comput, vol.18, pp.244-257, 1989. ,
Combined task and message scheduling in distributed real-time systems, IEEE Trans. Parallel Distrib. Syst, vol.10, pp.1179-1191, 1999. ,
Static mapping of real-time applications onto massively parallel processor arrays, Proceedings of the 2014 14th International Conference on Application of Concurrency to System Design, ACSD '14, pp.112-121, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01095130
Many-core scheduling of data parallel applications using SMT solvers, pp.615-622, 2014. ,
Reducing the contention experienced by real-time core-to-I/O flows over a Tilera-like network on chip, 28th Euromicro Conference on Real-Time Systems, ECRTS 2016, pp.86-96, 2016. ,
URL : https://hal.archives-ouvertes.fr/cea-01838135
Shared cache aware task mapping for WCRT minimization, 8th Asia and South Pacific Design Automation Conference, ASP-DAC, pp.735-740, 2013. ,
On the design and implementation of a cache-aware multicore real-time scheduler, 21st Euromicro Conference on Real-Time Systems, pp.194-204, 2009. ,
Tightening contention delays while scheduling parallel applications on multi-core architectures, ACM Trans. Embed. Comput. Syst, vol.16, p.20, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01590508
Quantifying WCET reduction of parallel applications by introducing slack time to limit resource contention, Proceedings of the 25th International Conference on Real-Time Networks and Systems, pp.188-197, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01590532
Memory-centric scheduling for multicore hard real-time systems, Real-Time Systems, vol.48, issue.6, pp.681-715, 2012. ,
Memory-processor co-scheduling in fixed priority systems, Proceedings of the 23rd International Conference on Real Time and Networks Systems, pp.87-96, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01249107
Memory efficient global scheduling of real-time tasks, 21st IEEE Real-Time and Embedded Technology and Applications Symposium, pp.285-296, 2015. ,
Contention-free execution of automotive applications on a clustered many-core platform, 28th Euromicro Conference on Real-Time Systems, ECRTS, pp.14-24, 2016. ,
Mapping mixed-criticality applications on multi-core architectures, Proceedings of the Conference on Design, Automation & Test in Europe, pp.1-6, 2014. ,
Cache-aware scheduling and analysis for multicores, Proceedings of the Seventh ACM International Conference on Embedded Software, EMSOFT '09, pp.245-254, 2009. ,
Task assignment with cache partitioning and locking for WCET minimization on MPSoC, Proceedings of the 2010 39th International Conference on Parallel Processing, pp.573-582, 2010. ,
Cache-conscious offline real-time task scheduling for multi-core processors, 29th Euromicro Conference on Real-Time Systems (ECRTS 2017), 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01590421
Benchmarking and comparison of the task graph scheduling algorithms, Journal of Parallel and Distributed Computing, vol.59, pp.381-422, 1999. ,
The variability of application execution times on a multi-core platform, 16th International Workshop on Worst-Case Execution Time Analysis (WCET 2016), pp.1-11, 2016. ,
Gurobi optimizer reference manual, 2015. ,
Response time analysis of synchronous data flow programs on a many-core processor, Proceedings of the 24th International Conference on Real-Time Networks and Systems, RTNS '16, pp.67-76, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01406145
A generic and compositional framework for multicore response time analysis, International Conference on Real Time and Networks Systems, RTNS '15, pp.129-138, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01231700
Improving the worst-case execution time accuracy by inter-task instruction cache analysis, IEEE Second International Symposium on Industrial Embedded Systems, SIES, pp.25-32, 2007. ,
Off-line mapping of multi-rate dependent task sets to many-core platforms, Real-Time Systems, vol.51, issue.5, pp.526-565, 2015. ,
Mapping hard real-time applications on many-core processors, Proceedings of the 24th International Conference on Real-Time Networks and Systems, RTNS '16, pp.235-244, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01692702
Temporal isolation of hard real-time applications on many-core processors, 2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), pp.37-47, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01585055
Integrated scratchpad memory optimization and task scheduling for MPSoC architectures, International Conference on Compilers, Architecture and Synthesis for Embedded Systems, CASES '06, pp.401-410, 2006. ,
Optimizing preemption-overhead accounting in multiprocessor real-time systems, Proceedings of the 22nd International Conference on Real-Time Networks and Systems, RTNS '14, vol.235, p.243, 2014. ,
Scheduling with preemption delays: Anomalies and issues, Proceedings of the 23rd International Conference on Real Time and Networks Systems, RTNS '15, pp.109-118, 2015. ,
BUNDLE: real-time multi-threaded scheduling to reduce cache contention, IEEE Real-Time Systems Symposium, RTSS, pp.279-290, 2016. ,
Memory hierarchies, pipelines, and buses for future architectures in time-critical embedded systems, IEEE Trans. on CAD of Integrated Circuits and Systems, vol.28, issue.7, pp.966-978, 2009. ,
Low-complexity algorithms for static cache locking in multitasking hard real-time systems, Proceedings of the 23rd IEEE Real-Time Systems Symposium, pp.114-123, 2002. ,
Dynamic instruction cache locking in hard real-time systems, International Conference on Real-Time Networks and Systems (RTNS), pp.1-10, 2006. ,
Scheduling of parallel applications on many-core architectures with caches: bridging the gap between WCET analysis and schedulability analysis. 9th Junior Workshop on Real Time computing, in conjunction with RTNS, 2015. ,
3.2 Summary of the characteristics of StreamIt benchmarks in our case studies ,
3.3 The size of code and communicated data for each benchmark (average µ and standard deviation σ) ,
29th Euromicro Conference on Real-Time Systems (ECRTS), 2017. ,
WCETs (average µ / standard deviation σ) without cache reuse and weighted average WCET reduction, Tasks ,
Comparison of CILP and CLS (schedule length and run time of schedule generation) ,
Cost of estimating cache reuse ,
Notations used in the ILP formulation in the adapted stage ,
The size of code and communicated data for each benchmark (average µ and standard deviation σ) ,
4.4 Performance comparison between ACILP and the double fixed-points algorithm proposed in [94] ,
List of Figures
1.1 The influence of scheduling strategies on the WCET of tasks ,
1.2 Task graph of a parallel version of an 8-input Fast Fourier Transform (FFT) application [11] ,
Typical parameters of real-time tasks ,
The variation of execution times of a task depending on the input data or different behavior of environment ,
An example of multi-core architecture ,
The location of cache ,
An example of memory hierarchy ,
Fast Fourier Transform (FFT) application [11] ,
An example of the swapping of tasks' allocation ,
Gain of CILP as compared to NCILP ($gain = \frac{sl_{NCILP} - sl_{CILP}}{sl_{NCILP}} \times 100$) on a 16-core system ,
Gain of CLS as compared to NCLS ($gain = \frac{sl_{NCLS} - sl_{CLS}}{sl_{NCLS}} \times 100$) on a 16-core system ,
The reuse pattern found in the Lattice benchmark ,
3.10 Impact of the number of cores on schedule length (CLS method) ,
Comparison of schedule lengths for CLS using different task weight functions when tasks are sorted in the list according to their top levels ,
Comparison of schedule lengths for CLS using different task weight functions when tasks are sorted in the list according to their bottom levels ,
, SMEM interleaved address mapping
, SMEM memory request flow
Structure of our proposed time-driven scheduler ,
Illustrative example of the delay to the start time of a task caused by the execution of the sched function ,
Illustrative example of the effect of data misalignment ,
Two stages in producing static time-driven cache-conscious schedules to be implemented on a Kalray MPPA-256 compute cluster ,
The difference in the execution of a task in a basic cache-conscious schedule and an adapted cache-conscious schedule ,
4.10 The illustrative example of assigning the trigger time of tasks ,
The mapping and the schedule of all tasks on two cores of the application whose DAG was depicted in Figure 4.9 ,
The schedule graph constructed based on the scheduling information in the adapted cache-conscious schedule as depicted in Figure 4.11 ,
The fraction of the overall overhead by each practical issue to the length of schedule graphs ,