J. Antony, P. P. Janes, and A. P. Rendell, Exploring Thread and Memory Placement on NUMA Architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport, Proceedings of the International Conference on High Performance C omputing (HiPC), 2006.
DOI : 10.1007/11945918_35

E. Ayguade, M. Gonzalez, X. Martorell, and G. Jost, Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications, 18th International Parallel and Distributed Processing Symposium (IPDPS), 2004.

S. Benkner and T. Brandes, Efficient parallel programming on scalable shared memory systems with High Performance Fortran, Concurrency: Practice and Experience, pp.789-803, 2002.
DOI : 10.1002/cpe.649

T. Brecht, On the Importance of Parallel Application Placement in NUMA Multiprocessors, Proceedings of the Fourth Symposium on Experiences with Distributed and Multiprocessor Systems (SEDMS IV), 1993.

F. Broquedis, J. Clet-ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010.
DOI : 10.1109/PDP.2010.67

URL : https://hal.archives-ouvertes.fr/inria-00429889

W. Carlson, J. Draper, D. Culler, K. Yelick, E. Brooks et al., Introduction to UPC and Language Specification, 1999.

B. M. Chapman, F. Bregier, A. Patil, and A. Prabhakar, Achieving performance under OpenMP on cc- NUMA and software distributed shared memory systems, Concurrency: Practice and Experience, pp.713-739, 2002.

B. M. Chapman, L. Huang, H. Jin, G. Jost, and B. R. De-supinski, Extending openmp worksharing directives for multithreading, EuroPar'06 Parallel Processing, 2006.

R. Dolbeau, S. Bihan, and F. Bodin, HMPP: A hybrid multi-core parallel programming environment, 2007.

A. Duran, J. M. Perez, E. Ayguade, R. Badia, and J. Labarta, Extending the openmp tasking model to allow dependant tasks, IWOMP Proceedings, 2008.

M. Frigo, C. E. Leiserson, and K. H. Randall, The Implementation of the Cilk-5 Multithreaded Language, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 1998.

B. Goglin and N. Furmento, Enabling high-performance memory migration for multithreaded applications on LINUX, 2009 IEEE International Symposium on Parallel & Distributed Processing, 2009.
DOI : 10.1109/IPDPS.2009.5161101

URL : https://hal.archives-ouvertes.fr/inria-00358172

H. Löf and S. Holmgren, affinity-on-next-touch: increasing the performance of an industrial PDE solver on a cc-NUMA system, 19th ACM International Conference on Supercomputing, pp.387-392

J. D. Mccalpin, Memory bandwidth and machine balance in current high performance computers, IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter pp, pp.19-25, 1995.

D. S. Nikolopoulos, T. S. Papatheodorou, C. D. Polychronopoulos, J. Labarta, and E. Ayguadé, User-level dynamic page migration for multiprogrammed shared-memory multiprocessors, Proceedings 2000 International Conference on Parallel Processing, pp.95-103, 2000.
DOI : 10.1109/ICPP.2000.876083

D. S. Nikolopoulos, C. D. Polychronopoulos, T. S. Papatheodorou, J. Labarta, and E. Ayguadé, Scheduler-Activated Dynamic Page Migration for Multiprogrammed DSM Multiprocessors, Journal of Parallel and Distributed Computing, vol.62, issue.6, pp.1069-1103, 2002.
DOI : 10.1006/jpdc.2001.1817

M. Nordén, H. Löf, J. Rantakokko, and S. Holmgren, Geographical Locality and Dynamic Data Migration for OpenMP Implementations of Adaptive PDE Solvers, Second International Workshop on OpenMP, 2006.
DOI : 10.1007/978-3-540-68555-5_31

F. Song, S. Moore, and J. Dongarra, Feedback-directed thread scheduling with memory considerations, Proceedings of the 16th international symposium on High performance distributed computing , HPDC '07, 2007.
DOI : 10.1145/1272366.1272380

M. Steckermeier and F. Bellosa, Using Locality Information in Userlevel Scheduling, 1995.

C. Terboven, D. Mey, D. Schmidl, H. Jin, and T. Reichstein, Data and thread affinity in openmp programs, Proceedings of the 2008 workshop on Memory access on future processors a solved problem?, MAW '08, pp.377-384, 2008.
DOI : 10.1145/1366219.1366222

S. Thibault, R. Namyst, and P. A. Wacrenier, Building Portable Thread Schedulers for Hierarchical Multiprocessors: The BubbleSched Framework, Euro-Par, 2007.
DOI : 10.1007/978-3-540-74466-5_6

URL : https://hal.archives-ouvertes.fr/inria-00154506

R. Yang, J. Antony, P. P. Janes, and A. P. Rendell, Memory and Thread Placement Effects as a Function of Cache Usage: A Study of the Gaussian Chemistry Code on the SunFire X4600 M2, 2008 International Symposium on Parallel Architectures, Algorithms, and Networks (i-span 2008), pp.31-36, 2008.
DOI : 10.1109/I-SPAN.2008.13