Performance analysis of idle programs, ACM SIGPLAN Notices, vol.45, issue.10, pp.739-753, 2010. ,
DOI : 10.1145/1932682.1869519
The landscape of parallel computing research: A view from Berkeley, 2006. ,
Achieving predictable performance through better memory controller placement in many-core CMPs, ACM SIGARCH Computer Architecture News, vol.37, issue.3, pp.451-461, 2009. ,
DOI : 10.1145/1555815.1555810
Mitigating Amdahl's Law through EPI Throttling, Computer Architecture, 2005. ISCA '05. Proceedings. 32nd International Symposium on, pp.298-309, 2005. ,
DOI : 10.1145/1080695.1069995
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.117.6394
The kill rule for multicore, Design Automation Conference, pp.750-753, 2007. ,
SimpleScalar: an infrastructure for computer system modeling, Computer, vol.35, issue.2, pp.59-67, 2002. ,
DOI : 10.1109/2.982917
Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the Spring Joint Computer Conference, AFIPS '67 (Spring), pp.483-485, 1967. ,
Energy-performance tradeoffs in processor architecture and circuit design, ACM SIGARCH Computer Architecture News, vol.38, issue.3, pp.26-36, 2010. ,
DOI : 10.1145/1816038.1815967
Redefining the Role of the CPU in the Era of CPU-GPU Integration, IEEE Micro, vol.32, issue.6, pp.324-340, 2012. ,
DOI : 10.1109/MM.2012.57
Intel unveils 72-core x86 knights landing cpu for exascale supercomputing, 2013. ,
A mechanistic performance model for superscalar in-order processors, 2012 IEEE International Symposium on Performance Analysis of Systems & Software, pp.14-24, 2012. ,
DOI : 10.1109/ISPASS.2012.6189202
Mechanistic Analytical Modeling of Superscalar In-Order Processor Performance, ACM Transactions on Architecture and Code Optimization, vol.11, issue.4, pp.50:1-50:26, 2015. ,
DOI : 10.1145/2678277
Brook for gpus: Stream computing on graphics hardware, ACM SIGGRAPH 2004 Papers, SIGGRAPH '04, pp.777-786, 2004. ,
Cilk: An efficient multithreaded runtime system, 1995. ,
Throughput-effective on-chip networks for manycore accelerators, Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO '43, pp.421-432, 2010. ,
The PARSEC benchmark suite, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, PACT '08, pp.72-81, 2008. ,
DOI : 10.1145/1454115.1454128
Fidelity and scaling of the PARSEC benchmark inputs, IEEE International Symposium on Workload Characterization (IISWC'10), pp.1-10, 2010. ,
DOI : 10.1109/IISWC.2010.5649519
Design challenges of technology scaling, Micro, IEEE, vol.19, issue.4, pp.23-29, 1999. ,
Thousand core chips, Proceedings of the 44th annual conference on Design automation, DAC '07, pp.746-749, 2007. ,
DOI : 10.1145/1278480.1278667
CellSs: a Programming Model for the Cell BE Architecture, ACM/IEEE SC 2006 Conference (SC'06), pp.5-5, 2006. ,
DOI : 10.1109/SC.2006.17
Bronis de Supinski, and Martin Schulz. A regression-based approach to scalability prediction, Proceedings of the 22nd Annual International Conference on Supercomputing, ICS '08, pp.368-377, 2008. ,
The impact of performance asymmetry in emerging multicore architectures, Computer Architecture, 2005. ISCA '05. Proceedings. 32nd International Symposium on, pp.506-517, 2005. ,
A benchmark suite for high performance Java, Concurrency: Practice and Experience, pp.375-388, 2000. ,
A communication characterisation of Splash-2 and Parsec, 2009 IEEE International Symposium on Workload Characterization (IISWC), pp.86-97, 2009. ,
DOI : 10.1109/IISWC.2009.5306792
Reducing memory latency via non-blocking and prefetching caches, ACM SIGPLAN Notices, vol.27, issue.9, pp.51-61, 1992. ,
DOI : 10.1145/143371.143486
Rodinia: A benchmark suite for heterogeneous computing, Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC), IISWC '09, pp.44-54, 2009. ,
Fast simulation of computer architectures, 1995. ,
Yoga: A hybrid dynamic vliw/ooo processor, 2014. ,
Shade: A fast instruction-set simulator for execution profiling, Proceedings of the 1994 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '94, pp.128-137, 1994. ,
Toward a software infrastructure for the cyclops-64 cellular architecture, In the 20th International Symposium on High Performance Computing Systems and Applications (HPCS2006), 2006. ,
Computer architecture in the many-core era, Computer Design ICCD 2006. International Conference on, pp.1-1, 2006. ,
Design of ion-implanted MOSFET's with very small physical dimensions, IEEE Journal of Solid-State Circuits, vol.9, issue.5, pp.256-268, 1974. ,
Microarchitectural Design Space Exploration Using an Architecture-Centric Approach, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007), pp.262-271, 2007. ,
DOI : 10.1109/MICRO.2007.12
OpenMP: an industry standard API for shared-memory programming, IEEE Computational Science and Engineering, vol.5, issue.1, pp.46-55, 1998. ,
DOI : 10.1109/99.660313
The new linux perf tools, Slides from Linux Kongress, 2010. ,
Rapid identification of architectural bottlenecks via precise event counting, ACM SIGARCH Computer Architecture News, vol.39, issue.3, pp.353-364, 2011. ,
DOI : 10.1145/2024723.2000107
An introduction to analysis and optimization with amd codeanalyst performance analyzer, 2008. ,
Control flow modeling in statistical simulation for accurate and efficient processor design studies, Computer Architecture Proceedings. 31st Annual International Symposium on, pp.350-361, 2004. ,
Dark silicon and the end of multicore scaling, Proceedings of the 38th Annual International Symposium on Computer Architecture, pp.365-376, 2011. ,
Speedup stacks: Identifying scaling bottlenecks in multi-threaded applications, 2012 IEEE International Symposium on Performance Analysis of Systems & Software, pp.145-155, 2012. ,
DOI : 10.1109/ISPASS.2012.6189221
Modeling critical sections in amdahl's law and its implications for multicore design, Conference Proceedings Annual International Symposium on Computer Architecture, pp.362-370, 2010. ,
A mechanistic performance model for superscalar out-of-order processors, ACM Transactions on Computer Systems, vol.27, issue.2 ,
DOI : 10.1145/1534909.1534910
Dinero iv: trace-driven uniprocessor cache simulator ,
Perfmon: Linux performance monitoring for ia-64 Downloadable software with documentation, 2003. ,
Intel avx: New frontiers in performance improvements and energy efficiency, 2008. ,
Sequoia: Programming the Memory Hierarchy, ACM/IEEE SC 2006 Conference (SC'06), p.83, 2006. ,
DOI : 10.1109/SC.2006.55
VLIW machines: Multiprocessors we can actually program, CompCon, pp.299-305, 1984. ,
A study of single-chip processor/cache organizations for large numbers of transistors, Computer Architecture Proceedings the 21st Annual International Symposium on, pp.338-347, 1994. ,
Comparative evaluation of latency reducing and tolerating techniques, Proceedings of the 18th Annual International Symposium on Computer Architecture, ISCA '91, pp.254-263, 1991. ,
Evaluating scalability of multi-threaded applications on a many-core platform, 2012. ,
Using MPI: portable parallel programming with the message-passing interface, 1999. ,
Mibench: A free, commercially representative embedded benchmark suite, Workload Characterization, pp.3-14, 2001. ,
Towards extremely fast context switching in a block-multithreaded processor, Proceedings of EUROMICRO 96. 22nd Euromicro Conference. Beyond 2000: Hardware and Software Design Strategies, pp.592-599, 1996. ,
DOI : 10.1109/EURMIC.1996.546486
Reevaluating Amdahl's law, Communications of the ACM, vol.31, issue.5, pp.532-533, 1988. ,
DOI : 10.1145/42411.42415
Adapteva: More flops, less watts, 2011. ,
Scaling, power, and the future of cmos, Electron Devices Meeting, 2005. IEDM Technical Digest. IEEE International, pp.7-15, 2005. ,
Exploring the design space of future cmps, Parallel Architectures and Compilation Techniques Proceedings. 2001 International Conference on, pp.199-210, 2001. ,
When slower is faster: On heterogeneous multicores for reliable systems, Proceedings of USENIX ATC, 2013. ,
A 48-core IA-32 message-passing processor with DVFS in 45nm CMOS, Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp.108-109, 2010. ,
Microarchitecture-independent workload characterization, IEEE Micro, vol.27, issue.3, pp.63-72, 2007. ,
DOI : 10.1109/mm.2007.56
An integrated GPU power and performance model, ACM SIGARCH Computer Architecture News, vol.38, issue.3, pp.280-289, 2010. ,
DOI : 10.1145/1816038.1815998
Haswell: The fourth-generation intel core processor, IEEE Micro, issue.2, pp.6-20, 2014. ,
Amdahl's law in the multicore era, Computer, vol.41, issue.7, pp.33-38, 2008. ,
Computer Architecture: A Quantitative Approach, 2003. ,
An Approach to Performance Prediction for Parallel Applications, Proceedings of the 11th International Euro-Par Conference on Parallel Processing, Euro-Par'05, pp.196-205, 2005. ,
DOI : 10.1007/11549468_24
Core fusion, ACM SIGARCH Computer Architecture News, vol.35, issue.2, pp.186-197, 2007. ,
DOI : 10.1145/1273440.1250686
Efficiently exploring architectural design spaces via predictive modeling, SIGPLAN Not, vol.41, issue.11, pp.195-206, 2006. ,
big.LITTLE system architecture from ARM: saving power through heterogeneous multiprocessing and task context migration, DAC, pp.1143-1146, 2012. ,
The openmp implementation of nas parallel benchmarks and its performance, 1999. ,
Amdahl's law for predicting the future of multicores considered harmful, ACM SIGARCH Computer Architecture News, vol.40, issue.2, pp.1-9, 2012. ,
DOI : 10.1145/2234336.2234338
Super-scalar Processor Design, pp.89-25892, 1989. ,
Intel Xeon Phi Coprocessor High Performance Programming, 2013. ,
Construction and Use of Linear Regression Models for Processor Performance Analysis, The Twelfth International Symposium on High-Performance Computer Architecture, 2006., pp.99-108, 2006. ,
DOI : 10.1109/HPCA.2006.1598116
A Predictive Performance Model for Superscalar Processors, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06), pp.161-170, 2006. ,
DOI : 10.1109/MICRO.2006.6
NVIDIA CUDA software and GPU parallel computing architecture, Proceedings of the 6th international symposium on Memory management, ISMM '07, pp.103-104, 2007. ,
DOI : 10.1145/1296907.1296909
URL : http://hdl.handle.net/2099.2/571
256 many-core processor, 2013. ,
Lonestar: A suite of parallel irregular programs, 2009 IEEE International Symposium on Performance Analysis of Systems and Software, pp.65-76, 2009. ,
DOI : 10.1109/ISPASS.2009.4919639
Keshav Pingali, and Calin Casçaval. How much parallelism is there in irregular applications? SIGPLAN Not, pp.3-14, 2009. ,
Single-isa heterogeneous multi-core architectures: the potential for processor power reduction, Microarchitecture, 2003. MICRO-36. Proceedings. 36th Annual IEEE/ACM International Symposium on, pp.81-92, 2003. ,
Optimistic parallelism requires abstractions, ACM SIGPLAN Notices, vol.42, issue.6, pp.211-222, 2007. ,
DOI : 10.1145/1273442.1250759
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.187.8687
A first-order superscalar processor model, SIGARCH Comput. Archit. News, vol.32, issue.2, p.338, 2004. ,
Composable Lightweight Processors, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007), pp.381-394, 2007. ,
DOI : 10.1109/MICRO.2007.41
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.101.8778
Morphcore: An energy-efficient microarchitecture for high performance ilp and high throughput tlp, Microarchitecture (MICRO), 2012 45th Annual IEEE/ACM International Symposium on, pp.305-316, 2012. ,
Power7: IBM's Next-Generation Server Processor, IEEE Micro, vol.30, issue.2, pp.7-15, 2010. ,
DOI : 10.1109/MM.2010.38
Core architecture optimization for heterogeneous chip multiprocessors, Proceedings of the 15th international conference on Parallel architectures and compilation techniques, PACT '06, pp.23-32, 2006. ,
DOI : 10.1145/1152154.1152162
Abstract execution: A technique for efficiently tracing programs, Software: Practice and Experience, pp.1241-1258, 1990. ,
Accurate and efficient regression modeling for microarchitectural performance and power prediction, ACM SIGPLAN Notices, vol.41, pp.185-194, 2006. ,
Methods of inference and learning for performance modeling of parallel applications, Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '07, pp.249-258, 2007. ,
Pin: Building customized program analysis tools with dynamic instrumentation, Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '05, pp.190-200, 2005. ,
Remote core locking: Migrating critical-section execution to improve the performance of multithreaded applications, Proceedings of the 2012 USENIX Conference on Annual Technical Conference, USENIX ATC'12, pp.6-6, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00991709
Oprofile manual, 2004. ,
When prefetching works, when it doesn't, and why, ACM Trans. Archit. Code Optim, vol.9, issue.1, pp.2:1-2:29, 2012. ,
DOI : 10.1145/2133382.2133384
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.298.2743
Cooperative prefetching: Compiler and hardware support for effective instruction prefetching in modern processors, Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, MICRO 31, pp.182-194, 1998. ,
NVIDIA Tesla: A Unified Graphics and Computing Architecture, IEEE Micro, vol.28, issue.2, pp.39-55, 2008. ,
DOI : 10.1109/MM.2008.31
Introduction to intel advanced vector extensions, 2011. ,
Composite Cores: Pushing Heterogeneity Into a Core, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, pp.317-328, 2012. ,
DOI : 10.1109/MICRO.2012.37
MediaBench: a tool for evaluating and synthesizing multimedia and communications systems, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, pp.330-335, 1997. ,
Limits of control flow on parallelism, Proceedings of the 19th Annual International Symposium on Computer Architecture, ISCA '92, pp.46-57, 1992. ,
On Performance Analysis of a Multithreaded Application Parallelized by Different Programming Models Using Intel VTune, Proceedings of the 11th International Conference on Parallel Computing Technologies, PaCT'11, pp.317-331, 2011. ,
DOI : 10.1007/978-3-642-03869-3_62
Papi: A portable interface to hardware performance counters, Proceedings of the Department of Defense HPCMP Users Group Conference, pp.7-10, 1999. ,
Parallel performance measurement of heterogeneous parallel systems with gpus, Parallel Processing (ICPP), 2011 International Conference on, pp.176-185, 2011. ,
Simics: A full system simulation platform, Computer, vol.35, issue.2, pp.50-58, 2002. ,
Combining branch predictors, 1993. ,
Implications of Merging Phases on Scalability of Multi-core Architectures, 2011 International Conference on Parallel Processing, pp.622-631, 2011. ,
DOI : 10.1109/ICPP.2011.74
Cramming more components onto integrated circuits, Proceedings of the IEEE, pp.82-85, 1998. ,
Exploring instruction-fetch bandwidth requirement in wide-issue superscalar processors Pthreads programming, Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp.2-10, 1998. ,
Impact of serial scaling of multi-threaded programs in many-core era, 5th Workshop on Applications for Multi-Core Architectures, SBAC-PAD, 2014. ,
Plast: parallel local alignment search tool for database comparison, BMC Bioinformatics, vol.10, p.329, 2009. ,
From single core to multi-core, Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design, ICCAD '06, pp.67-72, 2006. ,
DOI : 10.1145/1233501.1233516
Intel threading building blocks, Journal of Computing Sciences in Colleges, vol.23, issue.4, pp.298-298, 2008. ,
Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite, ACM SIGARCH Computer Architecture News, vol.35, issue.2, pp.412-423, 2007. ,
DOI : 10.1145/1273440.1250713
Complexity-effective superscalar processors, ACM SIGARCH Computer Architecture News, vol.25, issue.2, pp.206-218, 1997. ,
DOI : 10.1145/384286.264201
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.527.5571
New microarchitecture challenges in the coming generations of cmos process technologies, 45th Annual IEEE/ACM International Symposium on Microarchitecture, 1999. ,
Characterization of the eembc benchmark suite, 2007. ,
EOLE: Paving the Way for an Effective Implementation of Value Prediction, Research Report RR-8402 (published at the International Symposium on Computer Architecture (ISCA) 2014). BeBoP: A cost effective predictor infrastructure for superscalar value prediction, High Performance Computer Architecture (HPCA), 2015 IEEE 21st International Symposium on, pp.13-25, 2013. ,
Instrumentation Tools, Fast Simulation of Computer Architectures, pp.47-86, 1995. ,
DOI : 10.1007/978-1-4615-2361-1_3
MMX technology extension to the Intel architecture, IEEE Micro, vol.16, issue.4, pp.42-50, 1996. ,
DOI : 10.1109/40.526924
VTune performance analyzer essentials, 2005. ,
Complete computer system simulation: The SimOS approach, IEEE Parallel & Distributed Technology: Systems & Applications, vol.3, issue.4, pp.34-43, 1995. ,
Tiptop: Hardware Performance Counters for the Masses, 2012 41st International Conference on Parallel Processing Workshops, 2011. ,
DOI : 10.1109/ICPPW.2012.58
URL : https://hal.archives-ouvertes.fr/hal-00639173
Studying microarchitectural structures with object code reordering, Proceedings of the Workshop on Binary Instrumentation and Applications, WBIA '09, pp.7-16, 2009. ,
DOI : 10.1145/1791194.1791196
Increasing processor performance by implementing deeper pipelines, Computer Architecture Proceedings. 29th Annual International Symposium on, pp.25-34, 2002. ,
ATOM, ACM SIGPLAN Notices, vol.39, issue.4, pp.528-539, 2004. ,
DOI : 10.1145/989393.989446
Defying amdahl's law -dal ,
A new case for the TAGE branch predictor, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, pp.117-127, 2011. ,
DOI : 10.1145/2155620.2155635
URL : https://hal.archives-ouvertes.fr/hal-00639193
A 0.18 μm CMOS IA32 microprocessor with a 4 GHz integer execution unit, Solid-State Circuits Conference Digest of Technical Papers. ISSCC. 2001 IEEE International, pp.324-325, 2001. ,
Thread migration to improve synchronization performance, Workshop on Operating System Interference in High Performance Applications, p.35, 2006. ,
Hardware/Software Helper Thread Prefetching on Heterogeneous Many Cores, 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing, pp.214-221, 2014. ,
DOI : 10.1109/SBAC-PAD.2014.39
URL : https://hal.archives-ouvertes.fr/hal-01087752
A study of branch prediction strategies, 25 years of the international symposia on Computer architecture (selected papers), ISCA '98, pp.135-148, 1981. ,
DOI : 10.1145/285930.285980
Accelerating critical section execution with asymmetric multi-core architectures, ACM SIGPLAN Notices, vol.44, issue.3, pp.253-264, 2009. ,
DOI : 10.1145/1508284.1508274
Importance of single-core performance in the multicore era, Proceedings of the Thirty-fifth Australasian Computer Science Conference, pp.107-114, 2012. ,
The basics of performance-monitoring hardware, IEEE Micro, vol.22, issue.4, pp.64-71, 2002. ,
A 40nm 16-core 128-thread cmt sparc soc processor, Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp.98-99, 2010. ,
The free lunch is over: A fundamental turn toward concurrency in software, Dr. Dobb's Journal, vol.30, issue.3, 2005. ,
Simultaneous multithreading: Maximizing on-chip parallelism, Computer Architecture, 1995. Proceedings, 22nd Annual International Symposium on, pp.392-403, 1995. ,
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, pp.207-216, 2010. ,
DOI : 10.1109/ICPPW.2010.38
Spec benchmark suite: designed for today's advanced systems, SPEC Newsletter, vol.1, issue.11, 1989. ,
SP@CE - An SP-Based Programming Model for Consumer Electronics Streaming Applications, Languages and Compilers for Parallel Computing, pp.33-48, 2007. ,
DOI : 10.1007/978-3-540-72521-3_4
Conservation cores: Reducing the energy of mature computations, SIGARCH Comput. Archit. News, vol.38, issue.1, pp.205-218, 2010. ,
Limits of instruction-level parallelism, Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS IV, pp.176-188, 1991. ,
Guided region prefetching: A cooperative hardware/software approach, Proceedings of the 30th Annual International Symposium on Computer Architecture, ISCA '03, pp.388-398, 2003. ,
Extending amdahl's law for energy-efficient computing in the many-core era, Computer, issue.12, pp.4124-4155, 2008. ,
Hitting the memory wall: Implications of the obvious, SIGARCH Comput. Archit. News, vol.23, issue.1, pp.20-24, 1995. ,
Intel clears up post tejas confusion, online at: http://www.crn.com/news/channel-programs/18842588/intel-clears-up-post-tejas-confusion.htm, 2004. ,
The SPLASH-2 programs: Characterization and methodological considerations, ACM SIGARCH Computer Architecture News, vol.23, pp.24-36, 1995. ,
Sparc64 X: Fujitsu's New-Generation 16-Core Processor for Unix Servers, IEEE Micro, vol.33, issue.6, pp.16-24, 2013. ,
DOI : 10.1109/MM.2013.126
The effect of communication and synchronization on Amdahl's law in multicore systems, Parallel Computing, vol.40, issue.1, pp.1-16, 2014. ,
DOI : 10.1016/j.parco.2013.11.001
Two-level adaptive training branch prediction, Proceedings of the 24th Annual International Symposium on Microarchitecture, pp.51-61, 1991. ,
Performance modeling of memory latency hiding techniques ,
Amdahl's law assumes a fixed amount of work, whereas Gustafson's law assumes that the amount of work scales with the number of cores ,
Error obtained when predicting performance for {I = 32, P ? 240}, for different applications, p.12 ,