J. Dongarra, The International Exascale Software Project: a Call To Cooperative Action By the Global High-Performance Community, International Journal of High Performance Computing Applications, vol.23, issue.4, pp.309-322, 2009.
DOI : 10.1177/1094342009347714

C. Ma, Y. M. Teo, V. March, N. Xiong, I. R. Pop et al., An approach for matching communication patterns in parallel applications, 2009 IEEE International Symposium on Parallel & Distributed Processing, 2009.
DOI : 10.1109/IPDPS.2009.5161035

B. W. Kernighan and S. Lin, An Efficient Heuristic Procedure for Partitioning Graphs, Bell System Technical Journal, vol.49, issue.2, pp.291-307, 1970.
DOI : 10.1002/j.1538-7305.1970.tb01770.x

T. Hoefler and M. Snir, Generic topology mapping strategies for large-scale parallel architectures, Proceedings of the international conference on Supercomputing, ICS '11, pp.75-84, 2011.
DOI : 10.1145/1995896.1995909

F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010.
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889

B. Brandfass, T. Alrutz, and T. Gerhold, Rank reordering for MPI communication optimization, Computers & Fluids, vol.80, pp.372-380, 2013.
DOI : 10.1016/j.compfluid.2012.01.019

G. Mercier and E. Jeannot, Improving MPI Applications Performance on Multicore Clusters with Rank Reordering, EuroMPI, pp.39-49, 2011.
DOI : 10.1007/978-3-642-24449-0_7
URL : https://hal.archives-ouvertes.fr/hal-00643151

T. Ogasawara, NUMA-aware memory manager with dominant-thread-based copying GC, Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA '09, New York, pp.377-390, 2009.

Q. Yin and T. Roscoe, VF2x: Fast, Efficient Virtual Network Mapping for Real Testbed Workloads, Testbeds and Research Infrastructure. Development of Networks and Communities, Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pp.271-286, 2012.
DOI : 10.1007/978-3-642-12963-6_3

T. Hatazaki, Rank reordering strategy for MPI topology creation functions, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.1-8, 1998.
DOI : 10.1007/BFb0056575

A. L. Rosenberg, Issues in the study of graph embeddings, WG'80, pp.150-176, 1981.
DOI : 10.1007/3-540-10291-4_12

S. Bokhari, On the Mapping Problem, IEEE Transactions on Computers, vol.30, issue.3, pp.207-214, 1981.
DOI : 10.1109/TC.1981.1675756

S. Lee and J. K. Aggarwal, A mapping strategy for parallel processing, IEEE Transactions on Computers, vol.36, issue.4, pp.433-442, 1987.

S. W. Bollinger and S. F. Midkiff, Heuristic technique for processor and link assignment in multicomputers, IEEE Transactions on Computers, vol.40, issue.3, pp.325-333, 1991.
DOI : 10.1109/12.76410

C. Sudheer and A. Srinivasan, Optimization of the hop-byte metric for effective topology aware mapping, 2012 19th International Conference on High Performance Computing, pp.1-9, 2012.
DOI : 10.1109/HiPC.2012.6507513

H. D. Simon and S. Teng, How Good is Recursive Bisection?, SIAM Journal on Scientific Computing, vol.18, issue.5, pp.1436-1445, 1997.
DOI : 10.1137/S1064827593255135

G. Karypis and V. Kumar, METIS - unstructured graph partitioning and sparse matrix ordering system, version 2.0, tech. rep., 1995.

K. Schloegel, G. Karypis, and V. Kumar, Parallel multilevel algorithms for multiconstraint graph partitioning (distinguished paper), Proceedings from the 6th International Euro-Par Conference on Parallel Processing, pp.296-310, 2000.

B. Hendrickson and R. Leland, The Chaco user's guide: Version 2.0, Tech. Rep. SAND94-2692, 1994.

K. Devine, E. Boman, R. Heaphy, B. Hendrickson, and C. Vaughan, Zoltan data management services for parallel dynamic applications, Computing in Science & Engineering, vol.4, issue.2, pp.90-97, 2002.
DOI : 10.1109/5992.988653

F. Pellegrini, Static mapping by dual recursive bipartitioning of process architecture graphs, Proceedings of IEEE Scalable High Performance Computing Conference, pp.486-493, 1994.
DOI : 10.1109/SHPCC.1994.296682

C. Walshaw and M. Cross, JOSTLE - Multilevel Graph Partitioning Software: An Overview, Mesh Partitioning Techniques and Domain Decomposition Techniques (F. Magoules, ed.), pp.27-58, 2007.
DOI : 10.4203/csets.17.2

N. R. Adiga, M. A. Blumrich, D. Chen, P. Coteus, A. Gara et al., Blue Gene/L torus interconnection network, IBM Journal of Research and Development, vol.49, issue.2.3, pp.265-276, 2005.
DOI : 10.1147/rd.492.0265

D. Chen, N. A. Eisley, P. Heidelberger, R. M. Senger, Y. Sugawara et al., The IBM Blue Gene/Q interconnection network and message unit, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.26:1-26:10, 2011.
DOI : 10.1145/2063384.2063419

R. Alverson, D. Roweth, and L. Kaplan, The Gemini System Interconnect, 2010 18th IEEE Symposium on High Performance Interconnects, pp.83-87, 2010.
DOI : 10.1109/HOTI.2010.23

Y. Ajima, S. Sumimoto, and T. Shimizu, Tofu: A 6D Mesh/Torus Interconnect for Exascale Computers, Computer, vol.42, issue.11, pp.36-40, 2009.
DOI : 10.1109/MC.2009.370

C. E. Leiserson, Fat-trees: Universal networks for hardware-efficient supercomputing, IEEE Transactions on Computers, vol.34, issue.10, pp.892-901, 1985.
DOI : 10.1109/TC.1985.6312192

F. Petrini and M. Vanneschi, K-ary n-trees: High performance networks for massively parallel architectures, tech. rep., 1995.

S. R. Ohring, M. Ibel, S. K. Das et al., On generalized fat trees, in IPPS '95: Proceedings of the 9th International Symposium on Parallel Processing, p.37, 1995.

J. Kim, W. Dally, S. Scott, and D. Abts, Cost-Efficient Dragonfly Topology for Large-Scale Systems, IEEE Micro, vol.29, issue.1, pp.33-40, 2009.
DOI : 10.1109/MM.2009.5

B. Arimilli, R. Arimilli, V. Chung, S. Clark, W. Denzel et al., The PERCS High-Performance Interconnect, 2010 18th IEEE Symposium on High Performance Interconnects, 2010.
DOI : 10.1109/HOTI.2010.16

G. Faanes, A. Bataineh, D. Roweth, T. Court, E. Froese et al., Cray Cascade: a scalable HPC system based on a Dragonfly network, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '12, pp.103:1-103:9, 2012.

P. Balaji, R. Gupta, A. Vishnu, and P. H. Beckman, Mapping communication layouts to network hardware characteristics on massive-scale Blue Gene systems, Computer Science - R&D, vol.26, issue.3-4, p.247, 2011.

B. E. Smith and B. Bode, Performance Effects of Node Mappings on the IBM BlueGene/L Machine, Lecture Notes in Computer Science, vol.3648, pp.1005-1013, 2005.
DOI : 10.1007/11549468_110

H. Yu, I. Chung, and J. E. Moreira, Blue Gene system software: Topology mapping for Blue Gene/L supercomputer, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, SC '06, p.116, 2006.
DOI : 10.1145/1188455.1188576

H. Subramoni, S. Potluri, K. Kandalla, B. Barth, J. Vienne et al., Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, p.12, 2012.
DOI : 10.1109/SC.2012.47

M. J. Rashti, J. Green, P. Balaji, A. Afsahi, and W. Gropp, Multi-core and Network Aware MPI Topology Functions, Lecture Notes in Computer Science, vol.6960, pp.50-60, 2011.
DOI : 10.1007/978-3-642-24449-0_8

J. L. Träff, Implementing the MPI process topology mechanism, Supercomputing '02: Proceedings of the 2002 ACM/IEEE conference on Supercomputing, pp.1-14, 2002.

S. Ito, K. Goto, and K. Ono, Automatically optimized core mapping to subdomains of domain decomposition method on multicore parallel environments, Computers & Fluids, vol.80, pp.88-93, 2013.
DOI : 10.1016/j.compfluid.2012.04.024

S. von Alfthan, I. Honkonen, and M. Palmroth, Topology Aware Process Mapping, Lecture Notes in Computer Science, vol.7782, pp.297-308, 2013.
DOI : 10.1007/978-3-642-36803-5_21

E. Jeannot and G. Mercier, Near-optimal placement of MPI processes on hierarchical NUMA architectures, Euro-Par 2010 - Parallel Processing, pp.199-210, 2010.

E. Jeannot, G. Mercier, and F. Tessier, Process placement in multicore clusters: Algorithmic issues and practical techniques, IEEE Transactions on Parallel and Distributed Systems, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00921605

T. Hoefler, R. Rabenseifner, H. Ritzdorf, B. R. de Supinski, R. Thakur et al., The scalable process topology interface of MPI 2.2, Concurrency and Computation: Practice and Experience, pp.293-310, 2010.
DOI : 10.1002/cpe.1643

E. Cuthill and J. McKee, Reducing the bandwidth of sparse symmetric matrices, Proceedings of the 1969 24th national conference, ACM '69, pp.157-172, 1969.

H. Chen, W. Chen, J. Huang, B. Robert, and H. Kuhn, MPIPP, Proceedings of the 20th annual international conference on Supercomputing, ICS '06, pp.353-360, 2006.
DOI : 10.1145/1183401.1183451

E. Rodrigues, F. Madruga, P. Navaux, and J. Panetta, Multicore aware process mapping and its impact on communication overhead of parallel applications, Proceedings of the IEEE Symp. on Comp. and Comm, pp.811-817, 2009.

G. Mercier and J. Clet-Ortega, Towards an Efficient Process Placement Policy for MPI Applications in Multicore Environments, EuroPVM/MPI, vol.5759 of Lecture Notes in Computer Science, pp.104-115, 2009.
DOI : 10.1007/978-3-642-03770-2_17
URL : https://hal.archives-ouvertes.fr/inria-00392581

J. Dümmler, T. Rauber, and G. Rünger, Mapping Algorithms for Multiprocessor Tasks on Multi-Core Clusters, 2008 37th International Conference on Parallel Processing, pp.141-148, 2008.
DOI : 10.1109/ICPP.2008.42

H. M. Aktulga, C. Yang, E. G. Ng, P. Maris, and J. P. Vary, Topology-Aware Mappings for Large-Scale Eigenvalue Problems, Euro-Par 2012 Parallel Processing - 18th International Conference, vol.7484 of Lecture Notes in Computer Science, pp.830-842, 2012.
DOI : 10.1007/978-3-642-32820-6_82

E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra et al., Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Proceedings of the 11th European PVM/MPI Users' Group Meeting, pp.97-104, 2004.
DOI : 10.1007/978-3-540-30218-6_19

J. Hursey, J. M. Squyres, and T. Dontje, Locality-Aware Parallel Process Mapping for Multi-core HPC Systems, 2011 IEEE International Conference on Cluster Computing, pp.527-531, 2011.
DOI : 10.1109/CLUSTER.2011.59

G. B. Justin, L. Whitt, and M. Fahey, Cray MPT: MPI on the Cray XT, 2011.

D. Solt, A profile based approach for topology aware MPI rank placement, 2007.

E. Duesterwald, R. W. Wisniewski, P. F. Sweeney, G. Cascaval, and S. E. Smith, Method and system for optimizing communication in MPI programs for an execution environment, 2008.

V. Venkatesan, R. Anand, E. Gabriel, and J. Subhlok, Optimized process placement for collective I/O operations, Proceedings of the 20th European MPI Users' Group Meeting, EuroMPI '13, 2013.
DOI : 10.1145/2488551.2488567

L. Kale and S. Krishnan, CHARM++: A portable concurrent object oriented system based on C++, Proceedings of Object-Oriented Programming, Systems, Languages and Applications (OOPSLA) 93, pp.91-108, 1993.

A. Bhatelé, L. V. Kalé, and S. Kumar, Dynamic topology aware load balancing algorithms for molecular dynamics applications, Proceedings of the 23rd international conference on Supercomputing, ICS '09, pp.110-116, 2009.
DOI : 10.1145/1542275.1542295

A. Bhatelé and L. V. Kalé, Benefits of topology aware mapping for mesh interconnects, Parallel Processing Letters, pp.549-566, 2008.

L. L. Pilla, C. P. Ribeiro, D. Cordeiro, C. Mei, A. Bhatele et al., A Hierarchical Approach for Load Balancing on Parallel Multi-core Systems, 2012 41st International Conference on Parallel Processing, pp.118-127, 2012.
DOI : 10.1109/ICPP.2012.9
URL : https://hal.archives-ouvertes.fr/hal-00788012

UPC Consortium, UPC language specifications, v1.2, 2005.
DOI : 10.2172/862127