E. Altman, M. Arnold, S. Fink, and N. Mitchell, Performance analysis of idle programs, ACM SIGPLAN Notices, vol.45, issue.10, pp.739-753, 2010.
DOI : 10.1145/1932682.1869519

K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands et al., The landscape of parallel computing research: A view from Berkeley, 2006.

D. Abts, N. D. Enright Jerger, J. Kim, D. Gibson, and M. H. Lipasti, Achieving predictable performance through better memory controller placement in many-core CMPs, ACM SIGARCH Computer Architecture News, vol.37, issue.3, pp.451-461, 2009.
DOI : 10.1145/1555815.1555810

M. Annavaram, E. Grochowski, and J. Shen, Mitigating Amdahl's Law through EPI Throttling, Computer Architecture, 2005. ISCA '05. Proceedings. 32nd International Symposium on, pp.298-309, 2005.
DOI : 10.1145/1080695.1069995
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.117.6394

A. Agarwal and M. Levy, The kill rule for multicore, Design Automation Conference, pp.750-753, 2007.

T. Austin, E. Larson, and D. Ernst, SimpleScalar: an infrastructure for computer system modeling, Computer, vol.35, issue.2, pp.59-67, 2002.
DOI : 10.1109/2.982917

G. M. Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the Spring Joint Computer Conference, AFIPS '67 (Spring), pp.483-485, 1967.

O. Azizi, A. Mahesri, B. C. Lee, S. J. Patel, and M. Horowitz, Energy-performance tradeoffs in processor architecture and circuit design, ACM SIGARCH Computer Architecture News, vol.38, issue.3, pp.26-36, 2010.
DOI : 10.1145/1816038.1815967

M. Arora, S. Nath, S. Mazumdar, S. B. Baden, and D. M. Tullsen, Redefining the Role of the CPU in the Era of CPU-GPU Integration, IEEE Micro, vol.32, issue.6, pp.324-340, 2012.
DOI : 10.1109/MM.2012.57

S. Anthony, Intel unveils 72-core x86 Knights Landing CPU for exascale supercomputing, 2013.

M. Breughe, S. Eyerman, and L. Eeckhout, A mechanistic performance model for superscalar in-order processors, 2012 IEEE International Symposium on Performance Analysis of Systems & Software, pp.14-24, 2012.
DOI : 10.1109/ISPASS.2012.6189202

M. B. Breughe, S. Eyerman, and L. Eeckhout, Mechanistic Analytical Modeling of Superscalar In-Order Processor Performance, ACM Transactions on Architecture and Code Optimization, vol.11, issue.4, pp.50:1-50:26, 2015.
DOI : 10.1145/2678277

I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian et al., Brook for GPUs: Stream computing on graphics hardware, ACM SIGGRAPH 2004 Papers, SIGGRAPH '04, pp.777-786, 2004.

R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall et al., Cilk: An efficient multithreaded runtime system, 1995.

A. Bakhoda, J. Kim, and T. M. Aamodt, Throughput-effective on-chip networks for manycore accelerators, Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO '43, pp.421-432, 2010.

C. Bienia, S. Kumar, J. P. Singh, and K. Li, The PARSEC benchmark suite, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, PACT '08, pp.72-81, 2008.
DOI : 10.1145/1454115.1454128

C. Bienia and K. Li, Fidelity and scaling of the PARSEC benchmark inputs, IEEE International Symposium on Workload Characterization (IISWC'10), pp.1-10, 2010.
DOI : 10.1109/IISWC.2010.5649519

S. Borkar, Design challenges of technology scaling, IEEE Micro, vol.19, issue.4, pp.23-29, 1999.

S. Borkar, Thousand core chips, Proceedings of the 44th annual conference on Design automation, DAC '07, pp.746-749, 2007.
DOI : 10.1145/1278480.1278667

P. Bellens, J. M. Perez, R. M. Badia, and J. Labarta, CellSs: a Programming Model for the Cell BE Architecture, ACM/IEEE SC 2006 Conference (SC'06), pp.5-5, 2006.
DOI : 10.1109/SC.2006.17

B. J. Barnes, B. Rountree, D. K. Lowenthal, J. Reeves, B. de Supinski, and M. Schulz, A regression-based approach to scalability prediction, Proceedings of the 22nd Annual International Conference on Supercomputing, ICS '08, pp.368-377, 2008.

S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai, The impact of performance asymmetry in emerging multicore architectures, Computer Architecture, 2005. ISCA '05. Proceedings. 32nd International Symposium on, pp.506-517, 2005.

J. M. Bull, L. A. Smith, M. D. Westhead, D. S. Henty, and R. A. Davey, A benchmark suite for high performance Java, Concurrency: Practice and Experience, pp.375-388, 2000.

N. Barrow-Williams, C. Fensch, and S. Moore, A communication characterisation of Splash-2 and Parsec, 2009 IEEE International Symposium on Workload Characterization (IISWC), pp.86-97, 2009.
DOI : 10.1109/IISWC.2009.5306792

T. Chen and J. Baer, Reducing memory latency via non-blocking and prefetching caches, ACM SIGPLAN Notices, vol.27, issue.9, pp.51-61, 1992.
DOI : 10.1145/143371.143486

S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer et al., Rodinia: A benchmark suite for heterogeneous computing, Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC), IISWC '09, pp.44-54, 2009.

T. M. Conte and C. E. Gimarc, Fast simulation of computer architectures, 1995.

C. Villavieja, J. A. Joao, R. Miftakhutdinov, and Y. N. Patt, Yoga: A hybrid dynamic VLIW/OoO processor, 2014.

B. Cmelik and D. Keppel, Shade: A fast instruction-set simulator for execution profiling, Proceedings of the 1994 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '94, pp.128-137, 1994.

J. Cuvillo, W. Zhu, Z. Hu, and G. R. Gao, Toward a software infrastructure for the cyclops-64 cellular architecture, In the 20th International Symposium on High Performance Computing Systems and Applications (HPCS2006), 2006.

B. Dally, Computer architecture in the many-core era, Computer Design ICCD 2006. International Conference on, pp.1-1, 2006.

R. H. Dennard, F. H. Gaensslen, H. Yu, V. L. Rideout, E. Bassous et al., Design of ion-implanted MOSFET's with very small physical dimensions, IEEE Journal of Solid-State Circuits, vol.9, issue.5, pp.256-268, 1974.

C. Dubach, T. M. Jones, and M. F. O'Boyle, Microarchitectural Design Space Exploration Using an Architecture-Centric Approach, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007), pp.262-271, 2007.
DOI : 10.1109/MICRO.2007.12

L. Dagum and R. Menon, OpenMP: an industry standard API for shared-memory programming, IEEE Computational Science and Engineering, vol.5, issue.1, pp.46-55, 1998.
DOI : 10.1109/99.660313

A. Carvalho de Melo, The new Linux perf tools, Slides from Linux Kongress, 2010.

J. Demme and S. Sethumadhavan, Rapid identification of architectural bottlenecks via precise event counting, ACM SIGARCH Computer Architecture News, vol.39, issue.3, pp.353-364, 2011.
DOI : 10.1145/2024723.2000107

P. J. Drongowski, AMD CodeAnalyst Team, Boston Design Center, An introduction to analysis and optimization with AMD CodeAnalyst performance analyzer, 2008.

L. Eeckhout, R. H. Bell Jr., B. Stougie, K. De Bosschere, and L. K. John, Control flow modeling in statistical simulation for accurate and efficient processor design studies, Computer Architecture Proceedings. 31st Annual International Symposium on, pp.350-361, 2004.

H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam, and D. Burger, Dark silicon and the end of multicore scaling, Proceedings of the 38th Annual International Symposium on Computer Architecture, pp.365-376, 2011.

S. Eyerman, K. D. Bois, and L. Eeckhout, Speedup stacks: Identifying scaling bottlenecks in multi-threaded applications, 2012 IEEE International Symposium on Performance Analysis of Systems & Software, pp.145-155, 2012.
DOI : 10.1109/ISPASS.2012.6189221

S. Eyerman and L. Eeckhout, Modeling critical sections in Amdahl's law and its implications for multicore design, Conference Proceedings Annual International Symposium on Computer Architecture, pp.362-370, 2010.

S. Eyerman, L. Eeckhout, T. Karkhanis, and J. E. Smith, A mechanistic performance model for superscalar out-of-order processors, ACM Transactions on Computer Systems, vol.27, issue.2
DOI : 10.1145/1534909.1534910

J. Edler and M. D. Hill, Dinero iv: trace-driven uniprocessor cache simulator

S. Eranian, Perfmon: Linux performance monitoring for ia-64 Downloadable software with documentation, 2003.

N. Firasta, M. Buxton, P. Jinbo, K. Nasri, and S. Kuo, Intel AVX: New frontiers in performance improvements and energy efficiency, 2008.

K. Fatahalian, D. R. Horn, T. J. Knight, L. Leem, M. Houston et al., Sequoia: Programming the Memory Hierarchy, ACM/IEEE SC 2006 Conference (SC'06), p.83, 2006.
DOI : 10.1109/SC.2006.55

J. A. Fisher and J. J. O'Donnell, VLIW machines: Multiprocessors we can actually program, CompCon, pp.299-305, 1984.

M. Farrens, G. Tyson, and A. R. Pleszkun, A study of single-chip processor/cache organizations for large numbers of transistors, Computer Architecture Proceedings the 21st Annual International Symposium on, pp.338-347, 1994.

A. Gupta, J. Hennessy, K. Gharachorloo, T. Mowry, and W. Weber, Comparative evaluation of latency reducing and tolerating techniques, Proceedings of the 18th Annual International Symposium on Computer Architecture, ISCA '91, pp.254-263, 1991.

V. Gupta, H. Kim, and K. Schwan, Evaluating scalability of multi-threaded applications on a many-core platform, 2012.

W. Gropp, E. Lusk, and A. Skjellum, Using MPI: portable parallel programming with the message-passing interface, 1999.

M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge et al., MiBench: A free, commercially representative embedded benchmark suite, Workload Characterization, pp.3-14, 2001.

W. Grunewald and T. Ungerer, Towards extremely fast context switching in a block-multithreaded processor, Proceedings of EUROMICRO 96. 22nd Euromicro Conference. Beyond 2000: Hardware and Software Design Strategies, pp.592-599, 1996.
DOI : 10.1109/EURMIC.1996.546486

J. L. Gustafson, Reevaluating Amdahl's law, Communications of the ACM, vol.31, issue.5, pp.532-533, 1988.
DOI : 10.1145/42411.42415

L. Gwennap, Adapteva: More flops, less watts, 2011.

M. Horowitz, E. Alon, D. Patil, S. Naffziger, R. Kumar et al., Scaling, power, and the future of cmos, Electron Devices Meeting, 2005. IEDM Technical Digest. IEEE International, pp.7-15, 2005.

J. Huh, D. Burger, and S. W. Keckler, Exploring the design space of future cmps, Parallel Architectures and Compilation Techniques Proceedings. 2001 International Conference on, pp.199-210, 2001.

T. Hruby, H. Bos, and A. S. Tanenbaum, When slower is faster: On heterogeneous multicores for reliable systems, Proceedings of USENIX ATC, 2013.

J. Howard, S. Dighe, Y. Hoskote, S. Vangal, D. Finan et al., A 48-core IA-32 message-passing processor with DVFS in 45nm CMOS, Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp.108-109, 2010.

K. Hoste and L. Eeckhout, Microarchitecture-independent workload characterization, IEEE Micro, vol.27, issue.3, pp.63-72, 2007.
DOI : 10.1109/mm.2007.56

S. Hong and H. Kim, An integrated GPU power and performance model, ACM SIGARCH Computer Architecture News, vol.38, issue.3, pp.280-289, 2010.
DOI : 10.1145/1816038.1815998

P. Hammarlund, R. Kumar, R. B. Osborne, R. Rajwar, R. Singhal et al., Haswell: The fourth-generation Intel Core processor, IEEE Micro, issue.2, pp.6-20, 2014.

M. D. Hill and M. R. Marty, Amdahl's law in the multicore era, Computer, vol.41, issue.7, pp.33-38, 2008.

J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 2003.

E. Ipek, B. R. de Supinski, M. Schulz, and S. A. McKee, An Approach to Performance Prediction for Parallel Applications, Proceedings of the 11th International Euro-Par Conference on Parallel Processing, Euro-Par'05, pp.196-205, 2005.
DOI : 10.1007/11549468_24

E. Ipek, M. Kirman, N. Kirman, and J. F. Martinez, Core fusion, ACM SIGARCH Computer Architecture News, vol.35, issue.2, pp.186-197, 2007.
DOI : 10.1145/1273440.1250686

E. Ipek, S. A. McKee, R. Caruana, B. R. de Supinski, and M. Schulz, Efficiently exploring architectural design spaces via predictive modeling, ACM SIGPLAN Notices, vol.41, issue.11, pp.195-206, 2006.

B. Jeff, big.LITTLE system architecture from ARM: saving power through heterogeneous multiprocessing and task context migration, DAC, pp.1143-1146, 2012.

H. Jin, M. Frumkin, and J. Yan, The OpenMP implementation of NAS parallel benchmarks and its performance, 1999.

B. Juurlink and C. Meenderinck, Amdahl's law for predicting the future of multicores considered harmful, ACM SIGARCH Computer Architecture News, vol.40, issue.2, pp.1-9, 2012.
DOI : 10.1145/2234336.2234338

W. Johnson, Super-scalar Processor Design, pp.89-25892, 1989.

J. Jeffers and J. Reinders, Intel Xeon Phi Coprocessor High Performance Programming, 2013.

P. J. Joseph, K. Vaswani, and M. J. Thazhuthaveetil, Construction and Use of Linear Regression Models for Processor Performance Analysis, The Twelfth International Symposium on High-Performance Computer Architecture, 2006., pp.99-108, 2006.
DOI : 10.1109/HPCA.2006.1598116

P. J. Joseph, K. Vaswani, and M. J. Thazhuthaveetil, A Predictive Performance Model for Superscalar Processors, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06), pp.161-170, 2006.
DOI : 10.1109/MICRO.2006.6

D. Kirk, NVIDIA CUDA software and GPU parallel computing architecture, Proceedings of the 6th international symposium on Memory management, ISMM '07, pp.103-104, 2007.
DOI : 10.1145/1296907.1296909
URL : http://hdl.handle.net/2099.2/571

Kalray, 256 many-core processor, 2013.

M. Kulkarni, M. Burtscher, C. Casçaval, and K. Pingali, Lonestar: A suite of parallel irregular programs, 2009 IEEE International Symposium on Performance Analysis of Systems and Software, pp.65-76, 2009.
DOI : 10.1109/ISPASS.2009.4919639

M. Kulkarni, M. Burtscher, R. Inkulu, K. Pingali, and C. Casçaval, How much parallelism is there in irregular applications?, ACM SIGPLAN Notices, pp.3-14, 2009.

R. Kumar, K. I. Farkas, N. P. Jouppi, P. Ranganathan, and D. M. Tullsen, Single-ISA heterogeneous multi-core architectures: the potential for processor power reduction, Microarchitecture, 2003. MICRO-36. Proceedings. 36th Annual IEEE/ACM International Symposium on, pp.81-92, 2003.

M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala et al., Optimistic parallelism requires abstractions, ACM SIGPLAN Notices, vol.42, issue.6, pp.211-222, 2007.
DOI : 10.1145/1273442.1250759
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.187.8687

T. S. Karkhanis and J. E. Smith, A first-order superscalar processor model, SIGARCH Comput. Archit. News, vol.32, issue.2, p.338, 2004.

C. Kim, S. Sethumadhavan, M. S. Govindan, N. Ranganathan, D. Gulati et al., Composable Lightweight Processors, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007), pp.381-394, 2007.
DOI : 10.1109/MICRO.2007.41
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.101.8778

Khubaib, M. A. Suleman, M. Hashemi, C. Wilkerson, and Y. N. Patt, MorphCore: An energy-efficient microarchitecture for high performance ILP and high throughput TLP, Microarchitecture (MICRO), 2012 45th Annual IEEE/ACM International Symposium on, pp.305-316, 2012.

R. Kalla, B. Sinharoy, W. J. Starke, and M. Floyd, Power7: IBM's Next-Generation Server Processor, IEEE Micro, vol.30, issue.2, pp.7-15, 2010.
DOI : 10.1109/MM.2010.38

R. Kumar, D. M. Tullsen, and N. P. Jouppi, Core architecture optimization for heterogeneous chip multiprocessors, Proceedings of the 15th international conference on Parallel architectures and compilation techniques, PACT '06, pp.23-32, 2006.
DOI : 10.1145/1152154.1152162

J. R. Larus, Abstract execution: A technique for efficiently tracing programs, Software: Practice and Experience, pp.1241-1258, 1990.

B. C. Lee and D. M. Brooks, Accurate and efficient regression modeling for microarchitectural performance and power prediction, ACM SIGPLAN Notices, vol.41, pp.185-194, 2006.

B. C. Lee, D. M. Brooks, B. R. de Supinski, M. Schulz, K. Singh et al., Methods of inference and learning for performance modeling of parallel applications, Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '07, pp.249-258, 2007.

C. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser et al., Pin: Building customized program analysis tools with dynamic instrumentation, Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '05, pp.190-200, 2005.

J. Lozi, F. David, G. Thomas, J. Lawall, and G. Muller, Remote core locking: Migrating critical-section execution to improve the performance of multithreaded applications, Proceedings of the 2012 USENIX Conference on Annual Technical Conference, USENIX ATC'12, pp.6-6, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00991709

J. Levon, Oprofile manual, 2004.

J. Lee, H. Kim, and R. Vuduc, When prefetching works, when it doesn't, and why, ACM Trans. Archit. Code Optim, vol.9, issue.1, pp.2:1-2:29, 2012.
DOI : 10.1145/2133382.2133384
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.298.2743

C. Luk and T. C. Mowry, Cooperative prefetching: Compiler and hardware support for effective instruction prefetching in modern processors, Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, MICRO 31, pp.182-194, 1998.

E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, NVIDIA Tesla: A Unified Graphics and Computing Architecture, IEEE Micro, vol.28, issue.2, pp.39-55, 2008.
DOI : 10.1109/MM.2008.31

C. Lomont, Introduction to Intel Advanced Vector Extensions, 2011.

A. Lukefahr, S. Padmanabha, R. Das, F. M. Sleiman, R. Dreslinski et al., Composite Cores: Pushing Heterogeneity Into a Core, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, pp.317-328, 2012.
DOI : 10.1109/MICRO.2012.37

C. Lee, M. Potkonjak, and W. H. Mangione-Smith, MediaBench: a tool for evaluating and synthesizing multimedia and communications systems, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, pp.330-335, 1997.

M. S. Lam and R. P. Wilson, Limits of control flow on parallelism, Proceedings of the 19th Annual International Symposium on Computer Architecture, ISCA '92, pp.46-57, 1992.

A. Marowka, On Performance Analysis of a Multithreaded Application Parallelized by Different Programming Models Using Intel VTune, Proceedings of the 11th International Conference on Parallel Computing Technologies , PaCT'11, pp.317-331, 2011.
DOI : 10.1007/978-3-642-03869-3_62

P. J. Mucci, S. Browne, C. Deane, and G. Ho, PAPI: A portable interface to hardware performance counters, Proceedings of the Department of Defense HPCMP Users Group Conference, pp.7-10, 1999.

A. D. Malony, S. Biersdorff, S. Shende, H. Jagode, S. Tomov et al., Parallel performance measurement of heterogeneous parallel systems with GPUs, Parallel Processing (ICPP), 2011 International Conference on, pp.176-185, 2011.

P. S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg et al., Simics: A full system simulation platform, Computer, vol.35, issue.2, pp.50-58, 2002.

S. Mcfarling, Combining branch predictors, 1993.

M. M. Manivannan, B. Juurlink, and P. Stenstrom, Implications of Merging Phases on Scalability of Multi-core Architectures, 2011 International Conference on Parallel Processing, pp.622-631, 2011.
DOI : 10.1109/ICPP.2011.74

G. Moore, Cramming more components onto integrated circuits, Proceedings of the IEEE, pp.82-85, 1998.

P. Michaud, A. Seznec, and S. Jourdan, Exploring instruction-fetch bandwidth requirement in wide-issue superscalar processors, Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp.2-10, 1998.

S. N. Natarajan, B. N. Swamy, and A. Seznec, Impact of serial scaling of multi-threaded programs in many-core era, 5th Workshop on Applications for Multi-Core Architectures, SBAC-PAD, 2014.

V. H. Nguyen and D. Lavenier, PLAST: parallel local alignment search tool for database comparison, BMC Bioinformatics, vol.10, p.329, 2009.

J. Parkhurst, J. Darringer, and B. Grundmann, From single core to multi-core, Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design, ICCAD '06, pp.67-72, 2006.
DOI : 10.1145/1233501.1233516

C. Pheatt, Intel threading building blocks, Journal of Computing Sciences in Colleges, vol.23, issue.4, pp.298-298, 2008.

A. Phansalkar, A. Joshi, and L. K. John, Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite, ACM SIGARCH Computer Architecture News, vol.35, issue.2, pp.412-423, 2007.
DOI : 10.1145/1273440.1250713

S. Palacharla, N. P. Jouppi, and J. E. Smith, Complexity-effective superscalar processors, ACM SIGARCH Computer Architecture News, vol.25, issue.2, pp.206-218, 1997.
DOI : 10.1145/384286.264201
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.527.5571

F. J. Pollack, New microarchitecture challenges in the coming generations of CMOS process technologies, 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999.

J. Poovey, Characterization of the eembc benchmark suite, 2007.

A. Perais and A. Seznec, EOLE: Paving the Way for an Effective Implementation of Value Prediction, Research Report RR-8402, 2013. Published at the International Symposium on Computer Architecture (ISCA), 2014.

A. Perais and A. Seznec, BeBoP: A cost effective predictor infrastructure for superscalar value prediction, High Performance Computer Architecture (HPCA), 2015 IEEE 21st International Symposium on, pp.13-25, 2015.

J. Pierce, M. D. Smith, and T. Mudge, Instrumentation Tools, Fast Simulation of Computer Architectures, pp.47-86, 1995.
DOI : 10.1007/978-1-4615-2361-1_3

A. Peleg and U. Weiser, MMX technology extension to the Intel architecture, IEEE Micro, vol.16, issue.4, pp.42-50, 1996.
DOI : 10.1109/40.526924

J. Reinders, VTune performance analyzer essentials, 2005.

M. Rosenblum, S. A. Herrod, E. Witchel, and A. Gupta, Complete computer system simulation: The SimOS approach, IEEE Parallel & Distributed Technology: Systems & Applications, vol.3, issue.4, pp.34-43, 1995.

E. Rohou, Tiptop: Hardware Performance Counters for the Masses, 2012 41st International Conference on Parallel Processing Workshops, 2011.
DOI : 10.1109/ICPPW.2012.58
URL : https://hal.archives-ouvertes.fr/hal-00639173

S. M. F. Rahman, Z. Wang, and D. A. Jiménez, Studying microarchitectural structures with object code reordering, Proceedings of the Workshop on Binary Instrumentation and Applications, WBIA '09, pp.7-16, 2009.
DOI : 10.1145/1791194.1791196

E. Sprangle and D. Carmean, Increasing processor performance by implementing deeper pipelines, Computer Architecture Proceedings. 29th Annual International Symposium on, pp.25-34, 2002.

A. Srivastava and A. Eustace, ATOM, ACM SIGPLAN Notices, vol.39, issue.4, pp.528-539, 2004.
DOI : 10.1145/989393.989446

A. Seznec, Defying Amdahl's Law - DAL

A. Seznec, A new case for the TAGE branch predictor, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, pp.117-127, 2011.
DOI : 10.1145/2155620.2155635
URL : https://hal.archives-ouvertes.fr/hal-00639193

D. Sager, G. Hinton, M. Upton, T. Chappell, T. D. Fletcher et al., A 0.18 μm CMOS IA32 microprocessor with a 4 GHz integer execution unit, Solid-State Circuits Conference Digest of Technical Papers. ISSCC. 2001 IEEE International, pp.324-325, 2001.

S. Sridharan, B. Keck, R. Murphy, S. Chandra, and P. Kogge, Thread migration to improve synchronization performance, Workshop on Operating System Interference in High Performance Applications, p.35, 2006.

B. N. Swamy, A. Ketterlin, and A. Seznec, Hardware/Software Helper Thread Prefetching on Heterogeneous Many Cores, 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing, pp.214-221, 2014.
DOI : 10.1109/SBAC-PAD.2014.39
URL : https://hal.archives-ouvertes.fr/hal-01087752

J. E. Smith, A study of branch prediction strategies, 25 years of the international symposia on Computer architecture (selected papers) , ISCA '98, pp.135-148, 1981.
DOI : 10.1145/285930.285980

M. A. Suleman, O. Mutlu, M. K. Qureshi, and Y. N. Patt, Accelerating critical section execution with asymmetric multi-core architectures, ACM SIGPLAN Notices, vol.44, issue.3, pp.253-264, 2009.
DOI : 10.1145/1508284.1508274

T. Sato, H. Mori, R. Yano, and T. Hayashida, Importance of single-core performance in the multicore era, Proceedings of the Thirty-fifth Australasian Computer Science Conference, pp.107-114, 2012.

B. Sprunt, The basics of performance-monitoring hardware, IEEE Micro, vol.22, issue.4, pp.64-71, 2002.

A. S. Leon and A. Strong, A 40nm 16-core 128-thread CMT SPARC SoC processor, Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp.98-99, 2010.

H. Sutter, The free lunch is over: A fundamental turn toward concurrency in software, Dr. Dobb's Journal, vol.30, issue.3, 2005.

D. M. Tullsen, S. J. Eggers, and H. M. Levy, Simultaneous multithreading: Maximizing on-chip parallelism, Computer Architecture, 1995. Proceedings., 22nd Annual International Symposium on, pp.392-403, 1995.

J. Treibig, G. Hager, and G. Wellein, LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, pp.207-216, 2010.
DOI : 10.1109/ICPPW.2010.38

J. Uniejewski, SPEC benchmark suite: designed for today's advanced systems, SPEC Newsletter, vol.1, issue.11, 1989.

A. L. Varbanescu, M. Nijhuis, A. González-escribano, H. Sips, H. Bos et al., SP@CE - An SP-Based Programming Model for Consumer Electronics Streaming Applications, Languages and Compilers for Parallel Computing, pp.33-48, 2007.
DOI : 10.1007/978-3-540-72521-3_4

G. Venkatesh, J. Sampson, N. Goulding, S. Garcia, V. Bryksin et al., Conservation cores: Reducing the energy of mature computations, SIGARCH Comput. Archit. News, vol.38, issue.1, pp.205-218, 2010.

D. W. Wall, Limits of instruction-level parallelism, Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS IV, pp.176-188, 1991.

Z. Wang, D. Burger, K. S. McKinley, S. K. Reinhardt, and C. C. Weems, Guided region prefetching: A cooperative hardware/software approach, Proceedings of the 30th Annual International Symposium on Computer Architecture, ISCA '03, pp.388-398, 2003.

D. H. Woo and H. S. Lee, Extending Amdahl's law for energy-efficient computing in the many-core era, Computer, vol.41, issue.12, pp.24-31, 2008.

W. A. Wulf and S. A. McKee, Hitting the memory wall: Implications of the obvious, SIGARCH Comput. Archit. News, vol.23, issue.1, pp.20-24, 1995.

A. Wolfe, Intel clears up post-Tejas confusion, online at: http://www.crn.com/news/channel-programs/18842588/intel-clears-up-post-tejas-confusion.htm, 2004.

S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta, The SPLASH-2 programs: Characterization and methodological considerations, ACM SIGARCH Computer Architecture News, vol.23, pp.24-36, 1995.

T. Yoshida, T. Maruyama, Y. Akizuki, R. Kan, N. Kiyota et al., Sparc64 X: Fujitsu's New-Generation 16-Core Processor for Unix Servers, IEEE Micro, vol.33, issue.6, pp.16-24, 2013.
DOI : 10.1109/MM.2013.126

L. Yavits, A. Morad, and R. Ginosar, The effect of communication and synchronization on Amdahl's law in multicore systems, Parallel Computing, vol.40, issue.1, pp.1-16, 2014.
DOI : 10.1016/j.parco.2013.11.001

T. Yeh and Y. N. Patt, Two-level adaptive training branch prediction, Proceedings of the 24th Annual International Symposium on Microarchitecture, pp.51-61, 1991.

H. Zhou and T. Conte, Performance modeling of memory latency hiding techniques

Amdahl's law assumes a fixed amount of work, whereas Gustafson's law assumes that the amount of work scales with the number of cores

Box plot showing the values of the error obtained when predicting performance for {I = 32, P ? 240}, for different applications