D. Lenoski, J. Laudon, T. Joe, D. Nakahira, L. Stevens et al., The DASH prototype: Logic overhead and performance, IEEE Transactions on Parallel and Distributed Systems, vol.4, issue.1, pp.41-61, 1993.
DOI : 10.1109/71.205652

T. Mu, J. Tao, M. Schulz, and S. A. McKee, Interactive locality optimization on NUMA architectures, Proceedings of the 2003 ACM symposium on Software visualization, SoftVis '03, p.133, 2003.
DOI : 10.1145/774833.774853

J. Marathe and F. Mueller, Hardware profile-guided automatic page placement for ccNUMA systems, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP '06, pp.90-99, 2006.
DOI : 10.1145/1122971.1122987

J. Antony, P. P. Janes, and A. P. Rendell, Exploring Thread and Memory Placement on NUMA Architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport, pp.338-352, 2006.

J. D. McCalpin, STREAM: Sustainable memory bandwidth in high performance computers, 1995.

H. Jin, M. Frumkin, and J. Yan, The OpenMP Implementation of NAS Parallel Benchmarks and Its Performance, NAS System Division, NASA Ames Research Center, 1999. Available: https://www.nas.nasa.gov

C. P. Ribeiro, M. Castro, L. G. Fernandes, A. Carissimi, and J. Méhaut, Memory Affinity for Hierarchical Shared Memory Multiprocessors, 2009 21st International Symposium on Computer Architecture and High Performance Computing, 2009.
DOI : 10.1109/SBAC-PAD.2009.16
URL : https://hal.archives-ouvertes.fr/hal-00788914

Z. Smith, Bandwidth: a memory bandwidth benchmark for x86/x86_64/ARM based Linux and ARM Windows Mobile/CE.

The BenchIT Project, Performance Measurement for Scientific Applications, 2010.

D. Molka, D. Hackenberg, R. Schöne, and M. S. Müller, Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System, 2009 18th International Conference on Parallel Architectures and Compilation Techniques, pp.261-270, 2009.
DOI : 10.1109/PACT.2009.22

D. H. Bailey, E. Barszcz, L. Dagum, and H. D. Simon, NAS parallel benchmark results, IEEE Concurrency, vol.1, issue.1, pp.43-51, 1993.

H. Jin, M. Frumkin, and J. Yan, The OpenMP Implementation of NAS Parallel Benchmarks and its Performance, 1999. Available: www.nas.nasa.gov/News/Techreports

A NUMA API for Linux.

E. Kayi, T. Kornkven, S. El-Ghazawi et al., Performance evaluation of clusters with ccNUMA nodes: a case study, High Performance Computing and Communications, 10th IEEE International Conference on, pp.320-327.

S. R. Alam, R. F. Barrett, J. A. Kuehn, P. C. Roth, and J. S. Vetter, Characterization of Scientific Workloads on Systems with Multi-Core Processors, 2006 IEEE International Symposium on Workload Characterization, pp.225-236, 2006.
DOI : 10.1109/IISWC.2006.302747

H. Pourreza and P. Graham, On the programming impact of multi-core, multi-processor nodes in MPI clusters, High Performance Computing Systems and Applications, Annual International Symposium on, 2007.

A. M. Deflumere and S. R. Alam, Exploring multi-core limitations through comparison of contemporary systems, in TAPIA '09: The Fifth Richard Tapia Celebration of Diversity in Computing Conference, pp.75-80, 2009.

Intel, Server and Embedded Processor Technology, 2009. Available: http://www.intel.com