D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter et al., The NAS parallel benchmarks---summary and preliminary results, Proceedings of the 1991 ACM/IEEE conference on Supercomputing, Supercomputing '91, pp.158-165, 1991.
DOI : 10.1145/125826.125925

S. Blagodurov, S. Zhuravlev, and A. Fedorova, Contention-Aware Scheduling on Multicore Systems, ACM Transactions on Computer Systems, vol.28, issue.4, pp.1-8, 2010.
DOI : 10.1145/1880018.1880019

B. D. Bui, M. Caccamo, L. Sha, and J. Martinez, Impact of cache partitioning on multitasking real time embedded systems, 4th IEEE Int. Conf. on Embedded and Real-Time Computing Systems and Applications, pp.101-110, 2008.

D. Dauwe, E. Jonardi, R. Friese, S. Pasricha, A. A. Maciejewski et al., A Methodology for Co-Location Aware Application Performance Modeling in Multicore Computing, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop
DOI : 10.1109/IPDPSW.2015.38

J. Dongarra, Report on the Sunway TaihuLight system, www.netlib.org, 2016.

T. Dwyer, A. Fedorova, S. Blagodurov, M. Roth, F. Gaud et al., A practical method for estimating performance degradation on multicore processors, and its application to HPC workloads, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2012.
DOI : 10.1109/SC.2012.11

A. Gainaru, G. Aupy, A. Benoit, F. Cappello, Y. Robert et al., Scheduling the I/O of HPC Applications Under Congestion, 2015 IEEE International Parallel and Distributed Processing Symposium, pp.1013-1022, 2015.
DOI : 10.1109/IPDPS.2015.116
URL : https://hal.archives-ouvertes.fr/hal-00983789

M. R. Garey and D. S. Johnson, Computers and Intractability, a Guide to the Theory of NP-Completeness, 1979.

N. Guan, M. Stigge, W. Yi, and G. Yu, Cache-aware scheduling and analysis for multicores, Proceedings of the seventh ACM international conference on Embedded software, EMSOFT '09, pp.245-254, 2009.
DOI : 10.1145/1629335.1629369
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.154.9358

A. Hartstein, V. Srinivasan, T. Puzak, and P. Emma, On the nature of cache miss behavior: Is it √2?, The Journal of Instruction-Level Parallelism, vol.10, pp.1-22, 2008.

L. He, H. Zhu, and S. A. Jarvis, Developing Graph-Based Co-Scheduling Algorithms on Multicore Computers, IEEE Transactions on Parallel and Distributed Systems, vol.27, issue.6, pp.1617-1632, 2016.
DOI : 10.1109/TPDS.2015.2468223
URL : http://wrap.warwick.ac.uk/71239/1/WRAP_Jarvis_tpds-co-scheduling-submit.pdf

Intel, Intel 64 and IA-32 architectures software developer's manual, 2014.

Y. Jiang, X. Shen, J. Chen, and R. Tripathi, Analysis and approximation of optimal coscheduling on chip multiprocessors, Proc. 17th Int. Conf. Parallel Architectures Compilation Techniques, ser. PACT '08, pp.220-229, 2008.

A. Krishna, A. Samih, and Y. Solihin, Data sharing in multi-threaded applications and its impact on chip design, 2012 IEEE International Symposium on Performance Analysis of Systems & Software, pp.125-134, 2012.
DOI : 10.1109/ISPASS.2012.6189219

E. Kultursay, M. Kandemir, A. Sivasubramaniam, and O. Mutlu, Evaluating STT-RAM as an energy-efficient main memory alternative, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp.256-267, 2013.
DOI : 10.1109/ISPASS.2013.6557176
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.600.2661

M. A. Laurenzano, M. M. Tikir, L. Carrington, and A. Snavely, PEBIL: Efficient static binary instrumentation for Linux, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), pp.175-183, 2010.
DOI : 10.1109/ISPASS.2010.5452024
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.170.2621

J. Leverich and C. Kozyrakis, Reconciling high server utilization and sub-millisecond quality-of-service, Proceedings of the Ninth European Conference on Computer Systems, EuroSys '14, p.4, 2014.
DOI : 10.1145/2592798.2592821

D. Lo, L. Cheng, R. Govindaraju, P. Ranganathan, and C. Kozyrakis, Improving Resource Efficiency at Scale with Heracles, ACM Transactions on Computer Systems, vol.34, issue.2, p.6, 2016.
DOI : 10.1145/2882783

D. Molka, D. Hackenberg, R. Schone, and W. E. Nagel, Cache Coherence Protocol and Memory Performance of the Intel Haswell-EP Architecture, 2015 44th International Conference on Parallel Processing, pp.739-748, 2015.
DOI : 10.1109/ICPP.2015.83

S. P. Muralidhara, L. Subramanian, O. Mutlu, M. Kandemir, and T. Moscibroda, Reducing memory interference in multicore systems via application-aware memory channel partitioning, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, pp.374-385, 2011.
DOI : 10.1145/2155620.2155664

A. J. Pena and P. Balaji, Toward the efficient use of multiple explicitly managed memory subsystems, 2014 IEEE International Conference on Cluster Computing (CLUSTER), pp.123-131, 2014.
DOI : 10.1109/CLUSTER.2014.6968756

M. K. Qureshi and Y. N. Patt, Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06), pp.423-432, 2006.
DOI : 10.1109/MICRO.2006.49

B. M. Rogers, A. Krishna, G. B. Bell, K. Vu, X. Jiang et al., Scaling the bandwidth wall, ACM SIGARCH Computer Architecture News, vol.37, issue.3, pp.371-382, 2009.
DOI : 10.1145/1555815.1555801

C. Sewell, K. Heitmann, H. Finkel, G. Zagaris, S. T. Parete-Koon et al., Large-scale compute-intensive analysis via a combined in-situ and co-scheduling workflow approach, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, p.50, 2015.
DOI : 10.1145/2807591.2807663

K. Tian, Y. Jiang, and X. Shen, A study on optimally co-scheduling jobs of different lengths on chip multiprocessors, Proceedings of the 6th ACM conference on Computing frontiers, CF '09, pp.41-50, 2009.
DOI : 10.1145/1531743.1531752

Y. Zhang, M. A. Laurenzano, J. Mars, and L. Tang, SMiTe: Precise QoS Prediction on Real-System SMT Processors to Improve Utilization in Warehouse Scale Computers, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp.406-418, 2014.
DOI : 10.1109/MICRO.2014.53

H. Zhu, L. He, B. Gao, K. Li, J. Sun et al., Modelling and Developing Co-scheduling Strategies on Multicore Processors, 2015 44th International Conference on Parallel Processing, pp.220-229, 2015.
DOI : 10.1109/ICPP.2015.31

S. Zhuravlev, S. Blagodurov, and A. Fedorova, Addressing shared resource contention in multicore processors via scheduling, ACM SIGPLAN Notices, vol.45, issue.3, pp.129-142, 2010.
DOI : 10.1145/1735971.1736036
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.369.7458