The PARSEC benchmark suite: Characterization and architectural implications, PACT. ACM, pp.72-81, 2008.
Rodinia: A benchmark suite for heterogeneous computing, IEEE Workload Characterization Symposium, vol.0, pp.44-54, 2009.
MIMD Synchronization on SIMT Architectures, 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016.
Tarantula: a vector extension to the alpha architecture, 29th Annual International Symposium on Computer Architecture. IEEE, pp.281-292, 2002.
Out-of-order vector architectures, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, pp.160-170, 1997.
Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware, ACM Transactions on Architecture and Code Optimization (TACO), vol.6, p.7, 2009.
High-performance Cholesky factorization for GPU-only execution, Proceedings of the General Purpose GPUs, pp.42-52, 2017.
Out-of-order execution may not be cost-effective on processors featuring simultaneous multithreading, Proceedings. Fifth International Symposium On. IEEE, pp.64-67, 1999.
URL : https://hal.archives-ouvertes.fr/inria-00073298
Intel 64 and IA-32 architectures optimization reference manual, 2017.
DITVA: Dynamic Inter-Thread Vectorization Architecture, J. Parallel and Distrib. Comput., 2017.
URL : https://hal.archives-ouvertes.fr/hal-01655904
Whole-function vectorization, CGO. IEEE, pp.141-150, 2011.
Using intra-core loop-task accelerators to improve the productivity and performance of task-based parallel programs, Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, pp.759-773, 2017.
Exploring the design space of SPMD divergence management on data-parallel architectures, 47th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE, pp.101-113, 2014.
McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures, 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pp.469-480, 2009.
NVIDIA Tesla: A Unified Graphics and Computing Architecture, IEEE Micro, vol.28, pp.39-55, 2008.
Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors, Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp.337-348, 2010.
Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation, SIGPLAN Not., vol.40, pp.190-200, 2005.
Discerning the Dominant Out-of-Order Performance Advantage: Is It Speculation or Dynamism?, Proc. of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '13), pp.241-252, 2013.
Execution Drafting: Energy Efficiency Through Computation Deduplication, Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-47), pp.432-444, 2014.
Thread scheduling and memory coalescing for dynamic vectorization of SPMD workloads, Parallel Comput., vol.40, pp.548-558, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01087054
A Survey of CPU-GPU Heterogeneous Computing Techniques, ACM Comput. Surv., vol.47, 2015.
The GPU Computing Era, IEEE Micro, vol.30, pp.56-69, 2010.
Speculative dynamic vectorization, 29th Annual International Symposium on Computer Architecture. IEEE, pp.271-280, 2002.
Control-flow independence reuse via dynamic vectorization, 19th IEEE International Parallel and Distributed Processing Symposium, p.10, 2005.
ispc: A SPMD compiler for high-performance CPU programming, Innovative Parallel Computing (InPar), 2012.
SYRANT: SYmmetric Resource Allocation on Not-taken and Taken Paths, ACM Transactions on Architecture and Code Optimization (TACO) - HiPEAC Papers, vol.8, p.4, 2012.
Efficient Out-of-Order Execution of Guarded ISAs, ACM Transactions on Architecture and Code Optimization, vol.11, pp.1-21, 2014.
Two-Stage, Pipelined Register Renaming, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.19, pp.1926-1931, 2011.
A New Case for the TAGE Branch Predictor, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-44), pp.117-127, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00639193
Efficiently scaling out-of-order cores for simultaneous multithreading, ACM SIGARCH Computer Architecture News, vol.44, pp.431-443, 2016.
The ARM Scalable Vector Extension, IEEE Micro, vol.37, issue.2, pp.26-39, 2017.
Dynamic Vectorization: A Mechanism for Exploiting Far-Flung ILP in Ordinary Programs, 26th International Symposium on Computer Architecture, pp.16-27, 1999.
Register renaming and scheduling for dynamic execution of predicated code, International Symposium on High-Performance Computer Architecture (HPCA), pp.15-25, 2001.
The Performance Potential for Single Application Heterogeneous Systems, 8th Workshop on Duplicating, Deconstructing, and Debunking, 2009.