X. Zhang and X. Qin, Performance prediction and evaluation of parallel processing on a NUMA multiprocessor, IEEE Transactions on Software Engineering, vol.17, issue.10, pp.1059-1068, 1991.
DOI : 10.1109/32.99193

R. P. LaRowe Jr., C. S. Ellis, and M. A. Holliday, Evaluation of NUMA memory management through modeling and measurements, IEEE Transactions on Parallel and Distributed Systems, vol.3, issue.6, pp.686-701, 1992.
DOI : 10.1109/71.180624

T. B. Brecht, On the importance of parallel application placement in NUMA multiprocessors, in Proc. SEDMS IV, Symposium on Experiences with Distributed and Multiprocessor Systems, USENIX Association, pp.1-18, 1993.

M. A. Holliday and M. Stumm, Performance evaluation of hierarchical ring-based shared memory multiprocessors, IEEE Transactions on Computers, vol.43, issue.1, pp.52-67, 1994.
DOI : 10.1109/12.250609

U. Drepper, What every programmer should know about memory, 2007.

A. Kleen, A NUMA API for Linux, Novell Inc, 2005.

C. P. Ribeiro, J. Méhaut, A. Carissimi, and L. G. Fernandes, Memory Affinity for Hierarchical Shared Memory Multiprocessors, 21st International Symposium on Computer Architecture and High Performance Computing, pp.59-66, 2009.
DOI : 10.1109/sbac-pad.2009.16
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.596.4598

C. Lameter, Local and remote memory: Memory in a Linux/NUMA system, 2006.

F. Broquedis, N. Furmento, B. Goglin, P. Wacrenier, and R. Namyst, ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures, International Journal of Parallel Programming, vol.62, issue.5-6, 2010.
DOI : 10.1007/s10766-010-0136-3
URL : https://hal.archives-ouvertes.fr/inria-00496295

R. Yang, J. Antony, A. Rendell, D. Robson, and P. Strazdins, Profiling Directed NUMA Optimization on Linux Systems: A Case Study of the Gaussian Computational Chemistry Code, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.1046-1057, 2011.
DOI : 10.1109/IPDPS.2011.100

C. McCurdy and J. Vetter, Memphis: Finding and fixing NUMA-related performance problems on multi-core platforms, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), pp.87-96, 2010.
DOI : 10.1109/ISPASS.2010.5452060

E. Cruz, C. Pousa, M. Alves, A. Carissimi, P. Navaux et al., Using Memory Access Traces to Map Threads and Data on Hierarchical Multi-core Platforms, IEEE International Parallel & Distributed Processing Symposium, pp.551-558, 2011.

M. Diener, F. Madruga, E. Rodrigues, M. Alves, J. Schneider et al., Evaluating Thread Placement Based on Memory Access Patterns for Multi-core Processors, 2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC), pp.491-496, 2010.
DOI : 10.1109/HPCC.2010.114
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.259.2173

C. Osiakwan and S. Akl, The maximum weight perfect matching problem for complete weighted graphs is in PC, Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing, pp.880-887, 1990.

M. Castro, L. G. Fernandes, C. P. Ribeiro, J. Méhaut, and M. S. de Aguiar, NUMA-ICTM: A parallel version of ICTM exploiting memory placement strategies for NUMA machines, 2009 IEEE International Symposium on Parallel & Distributed Processing, pp.1-8, 2009.
DOI : 10.1109/IPDPS.2009.5161155
URL : https://hal.archives-ouvertes.fr/hal-00788917

E. Cruz, M. Alves, A. Carissimi, P. Navaux, C. Pousa et al., Memory-aware Thread and Data Mapping for Hierarchical Multi-core Platforms, International Journal of Networking and Computing, pp.97-116, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00953051

M. Tudor, Y. Teo, and S. See, Understanding Off-Chip Memory Contention of Parallel Programs in Multicore Systems, 2011 International Conference on Parallel Processing, pp.602-611, 2011.
DOI : 10.1109/ICPP.2011.59

E. R. Rodrigues, F. L. Madruga, P. O. Navaux, and J. Panetta, Multi-core aware process mapping and its impact on communication overhead of parallel applications, 2009 IEEE Symposium on Computers and Communications, pp.811-817, 2009.
DOI : 10.1109/ISCC.2009.5202271

J. Hursey, J. M. Squyres, and T. Dontje, Locality-Aware Parallel Process Mapping for Multi-core HPC Systems, 2011 IEEE International Conference on Cluster Computing, pp.527-531, 2011.
DOI : 10.1109/CLUSTER.2011.59

P. J. Drongowski, Instruction-Based Sampling: A New Performance Analysis Technique for AMD Family 10h Processors, 2007.