H. Torsten and M. Guillaume, Chapter: An Overview of Process Mapping Techniques and Algorithms in High-Performance Computing, High Performance Computing on Complex Environments. Edited by Emmanuel JEANNOT and Julius ŽILINSKAS, pp.65-84, 2014.

G. Yiannis, Topology-Aware Job Mapping, International Journal of High Performance Computing Applications, vol.32, pp.14-27, 2018.

G. Brice, Hardware topology management in MPI applications through hierarchical communicators, Parallel Computing, vol.76, pp.70-90, 2018.

J. Emmanuel, M. Guillaume, and T. François, Process Placement in Multicore Clusters : Algorithmic Issues and Practical Techniques, vol.25, pp.993-1002, 2014.

B. Darius, M. Guillaume, and G. William, Implementation and Evaluation of Shared-Memory Communication and Synchronization Operations in MPICH2 using the Nemesis Communication Subsystem, Parallel Computing, vol.33, pp.634-644, 2007.

A. Olivier, High Performance Computing on Heterogeneous Clusters with the Madeleine II Communication Library, Cluster Computing, vol.5, pp.43-54, 2002.

International peer-reviewed conferences

P. V. Bangalore, Exposition, Clarification, and Expansion of MPI Semantic Terms and Conventions: Is a nonblocking function permitted to block?, Proceedings of the 26th European MPI Users' Group Meeting, 2019.

B. George, Online Dynamic Monitoring of MPI Communications, Euro-Par 2017 : Parallel Processing -23rd International Conference on Parallel and Distributed Computing, pp.49-62, 2017.

G. Yiannis, Topology-aware resource management for HPC applications, Proceedings of the 18th International Conference on Distributed Computing and Networking, p.17, 2017.

J. Emmanuel, F. Mansouri, and M. Guillaume, A hierarchical model to manage hardware topology in MPI applications, Proceedings of the 24th European MPI Users' Group Meeting, vol.9, pp.1-9, 2017.

J. Emmanuel, Communication and Topology-aware Load Balancing in Charm++ with TreeMatch, IEEE Cluster, pp.1-8, 2013.

J. Emmanuel and M. Guillaume, Improving MPI Applications Performance on Multicore Clusters with Rank Reordering, pp.39-49, 2011.

B. François, Hwloc : a Generic Framework for Managing Hardware Affinities in HPC Applications, Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP2010), 2010.

E. Jeannot and M. Guillaume, Edited by Pasqua D'AMBRA, Mario Rosario GUARRACINO and Domenico TALIA. Vol. 6272, Euro-Par 2010 Parallel Processing, pp.199-210, 2010.

B. Darius, Cache-Efficient, Intranode Large-Message MPI Communication with MPICH2-Nemesis, Proceedings of the 38th International Conference on Parallel Processing, 2009.

M. Guillaume and C. Jérôme, Towards an Efficient Process Placement Policy for MPI Applications in Multicore Environments, EuroPVM/MPI. Vol. 5759, pp.104-115, 2009.

M. Guillaume, An Efficient Support for High-Performance Networks in MPICH2, Proceedings of 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS'09), 2009.

B. Darius, M. Guillaume, and G. William, Data Transfer in a SMP System: Study and Application to MPI, Proc. 35th International Conference on Parallel Processing, 2006.

B. Darius, M. Guillaume, and G. William, Design and Evaluation of Nemesis: a Scalable, Low-Latency, Message-Passing Communication Subsystem, Proc. 6th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2006.

B. Darius, M. Guillaume, and G. William, Implementation and Shared-Memory Evaluation of MPICH2 over the Nemesis Communication Subsystem, Recent Advances in Parallel Virtual Machine and Message Passing Interface: Proceedings of the 13th European PVM/MPI Users Group Meeting (Euro PVM/MPI 2006), Bonn, 2006.

A. Olivier and M. Guillaume, MPICH/MADIII: a Cluster of Clusters-Enabled MPI Implementation, Proc. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, pp.26-35, 2003.

A. Olivier, G. Mercier, and N. Raymond, MPICH/Madeleine: a True Multi-Protocol MPI for High-Performance Networks, Proc. 15th International Parallel and Distributed Processing Symposium (IPDPS), p.51, 2001.

A. Olivier, A Portable and Efficient Communication Library for High-Performance Cluster Computing, IEEE International Conference on Cluster Computing, pp.78-87, 2000.

International peer-reviewed workshops

G. Yiannis, G. Mercier, and V. Adèle, Large-Scale Experiment for Topology-Aware Resource Management, Euro-Par 2017: Parallel Processing Workshops - Euro-Par 2017 International Workshops, pp.179-186, 2017.

J. Emmanuel, G. Mercier, and T. François, Topology and Affinity Aware Hierarchical and Distributed Load-Balancing in Charm++, First International Workshop on Communication Optimizations in HPC, COMHPC@SC 2016, pp.63-72, 2016.

A. Olivier, High-Performance Multi-Rail Support with the NewMadeleine Communication Library, HCW 2007 : the Sixteenth International Heterogeneity in Computing Workshop, held in conjunction with IPDPS, p.9, 2007.

French peer-reviewed conferences

T. François and M. Guillaume, TreeMatch: Un algorithme de placement de processus sur architectures multicoeurs, Compass 2013, 2013.

Miscellaneous

D. Balkanski and G. Mercier, Performance Evaluation of MPICH-Madeleine against the Multi-Protocol MPI Implementations for Homogeneous and Heterogeneous SMP Clusters, Proc. 5th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2005.

Research reports

G. Brice, Hardware topology management in MPI applications through hierarchical communicators, 2018.

J. Emmanuel, M. Guillaume, and T. François, Process Placement in Multicore Clusters : Algorithmic Issues and Practical Techniques, 2013.

B. Darius, M. Guillaume, and G. William, Data Transfer in a SMP System : Study and Application to MPI, 2005.

B. Darius, M. Guillaume, and G. William, Design and Evaluation of Nemesis : a Scalable, Low-Latency, Message-Passing Communication Subsystem, 2005.

A. Olivier, M. Guillaume, and N. Raymond, MPICH-Madeleine : a True Multi-Protocol MPI for High-Performance Networks, 2000.

A. Olivier, A Portable and Efficient Communication Library for High-Performance Cluster Computing, 2000.

Thesis

M. Guillaume, High-Performance Portable Communication in Hierarchical, Heterogeneous and Dynamical Environments. In French only, 2004.

A. H. Abdel-Gawad, M. Thottethodi, and A. Bhatele, RAHTM: Routing Algorithm Aware Hierarchical Task Mapping, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.325-335, 2014.

T. Agarwal, Topology-aware task mapping for reducing communication contention on large parallel machines, pp.25-29, 2006.

A. Hasan-Metin, Topology-Aware Mappings for Large-Scale Eigenvalue Problems, Euro-Par 2012 Parallel Processing - 18th International Conference. Vol. 7484, pp.830-842, 2012.

A. Carl, Scalable Node Allocation for Improved Performance in Regular and Anisotropic 3D Torus Supercomputers, EuroMPI 2011, pp.61-70, 2011.

A. Sebastian-von, H. Ilja, and P. Minna, Edited by Pekka MANNINEN and Per ÖSTER, Applied Parallel and Scientific Computing, 2013.

B. W. Kernighan and S. Lin, An efficient heuristic procedure for partitioning graphs, Bell System Technical Journal, vol.49, issue.2, pp.291-307, 1970.

B. Takanobu, I. Yoshifumi, and Y. Tsutomu, A network-topology independent task allocation strategy for parallel computers, Proceedings Supercomputing '90, pp.878-887, 1990.

D. H. Bailey, NAS Parallel Benchmark Results. Tech. rep. RNR-94-006, 1994.

B. Pavan, Mapping communication layouts to network hardware characteristics on massive-scale blue gene systems, Computer Science -R&D, vol.26, pp.247-256, 2011.

P. Bar, Running Parallel Applications with Topology-Aware Grid Middleware, Fifth International Conference on e-Science, pp.292-299, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00684522

F. Berman and S. Lawrence, On Mapping Parallel Algorithms into Parallel Architectures, vol.5, pp.90018-90027, 1987.

D. E. Bernholdt, A survey of MPI usage in the US exascale computing project, Concurrency and Computation: Practice and Experience, e4851

B. Gyan, Optimizing task layout on the Blue Gene/L supercomputer, IBM Journal of Research and Development, vol.49, pp.489-500, 2005.

B. Abhinav, E. J. Bohm, and V. K. Laxmikant, Topology aware task mapping techniques : an api and case study, pp.301-302, 2009.

B. Abhinav, E. J. Bohm, and V. K. Laxmikant, Optimizing communication for Charm++ applications by reducing network contention, Concurrency and Computation : Practice and Experience, vol.23, pp.211-222, 2011.

A. Bhatele and V. K. Laxmikant, Benefits of Topology Aware Mapping for Mesh Interconnects, Parallel Processing Letters, vol.18, pp.549-566, 2008.

B. Abhinav, SC Conference on High Performance Computing Networking, Storage and Analysis, SC '12, p.97, 2012.

B. Abhinav, Optimizing the performance of parallel applications on a 5D torus via task mapping, 21st International Conference on High Performance Computing, HiPC, pp.1-10, 2014.

A. Bienz, W. D. Gropp, and L. N. Olson, TAPSpMV: Topology-Aware Parallel Sparse Matrix Vector Multiplication, 2016.

S. H. Bokhari, On the Mapping Problem, IEEE Transactions on Computers, vol.30, pp.207-214, 1981.

S. , W. Bollinger, and F. M. Scott, Heuristic Technique for Processor and Link Assignment in Multicomputers, IEEE Trans. Comput, vol.40, pp.325-333, 1991.

E. G. Boman, The Zoltan and Isorropia parallel toolkits for combinatorial scientific computing : Partitioning, ordering and coloring, Scientific Programming, vol.20, pp.129-150, 2012.

C. Bordage, C. Foyer, and G. Brice, Netloc, Revised Selected Papers. Vol. 10659. Lecture Notes in Computer Science, pp.157-166, 2017.

C. Bordage and J. Emmanuel, 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp.523-532, 2018.

B. Brandfass, T. Alrutz, and T. Gerhold, Rank Reordering for MPI Communication Optimization, Computers & Fluids 80 (July 2013), pp.372-380

R. Brightwell, T. Hudson, and K. Pedretti, SMARTMAP: Operating System Support for Efficient Data Sharing Among Processes on a Multi-Core Processor, 2008.

S. Browne, A Portable Programming Interface for Performance Evaluation on Modern Processors, The International Journal of High Performance Computing Applications, vol.14, pp.189-204, 2000.

C. Smith, B. McMillan, and I. Lumb, Topology Aware Scheduling in the LSF Distributed Resource Manager, Proceedings of the Cray User Group Meeting, 2001.

C. Nicolas, A batch scheduler with high level components, Cluster computing and Grid 2005 (CCGrid05), 2005.

C. Franck and E. Daniel, Proceedings of the 2000 ACM/IEEE conference on Supercomputing (CDROM), p.12, 2000.

Ü. V. Çatalyürek and C. Aykanat, PaToH (Partitioning Tool for Hypergraphs), Encyclopedia of Parallel Computing, pp.1479-1487, 2011.

C. Lei, Designing An Efficient Kernel-level and User-level Hybrid Approach for MPI Intra-node Communication on Multi-core Systems, Proceedings of the IEEE International Conference on Parallel Processing (ICPP-2008), 2008.

/. Champion and . Pro,

D. G. Chavarría-Miranda, J. Nieplocha, and V. Tipparaju, Topology-aware tile mapping for clusters of SMPs, Proceedings of the Third Conference on Computing Frontiers, pp.383-392, 2006.

C. Hu, MPIPP : an automatic profile-guided parallel process placement toolset for SMP clusters and multiclusters, Proceedings of the 20th Annual International Conference on Supercomputing, pp.1-59593, 2006.

C. Suresh and J. E. Richard, Predicting the Effect of Mapping on the Communication Performance of Large Multicomputers, Proceedings of the International Conference on Parallel Processing, ICPP '91, vol.II, pp.1-4, 1991.

C. Jérôme, Exploitation efficace des architectures parallèles de type grappes de NUMA à l'aide de modèles hybrides de programmation, 2012.

Y. Chris, Cross Memory Attach, 2010.

. Cnrs and . Ada, Executing a hybrid MPI/OpenMP job in batch under the Intel environment, 2016.

I. Compaq and M. , Virtual Interface Architecture Specification V 1.0, 1997.

Adaptive Computing, TORQUE Resource Manager

C. Camille, T. Hérault, C. «. Franck, and . Mpi, Euro-Par 2009 Parallel Processing, 15th International Euro-Par Conference, pp.466-477, 2009.

C. Cray and A. ,

Cray, Cray Performance Measurement and Analysis Tool, 2017.

E. H. M. Cruz, A Task Mapping Algorithm to Improve Communication and Load Balancing in Clusters of Multicore Systems, ACM Transactions on Parallel Computing, vol.5, p.24, 2019.

E. Cuthill and J. McKee, Reducing the bandwidth of sparse symmetric matrices, Proceedings of the 1969 24th national conference. ACM '69, pp.157-172, 1969.

D. Solt, A profile based approach for topology aware MPI rank placement, 2007.

D. D. Erik, A Threads-Only MPI Implementation for the Development of Parallel Programs, Proceedings of the 11th International Symposium on High Performance Computing Systems (HPCS'97), pp.153-163, 1997.

R. H. Dennard, Design of ion-implanted MOSFET's with very small physical dimensions, IEEE Solid-State Circuits Society Newsletter, vol.12, issue.1, pp.38-50, 1974.

D. Mehmet, Exploiting Geometric Partitioning in Task Mapping for Parallel Computers, 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp.27-36, 2014.

D. Mehmet, 2015 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2015, pp.197-206, 2015.

D. Mehmet, Hypergraph partitioning for multiple communication cost metrics : Model and methods, vol.77, pp.69-83, 2015.

K. Devine, Zoltan Data Management Services for Parallel Dynamic Applications, Computing in Science and Engineering, vol.4, pp.90-97, 2002.

M. Diener, Characterizing communication and page usage of parallel applications for thread and data mapping, Perform. Eval, vol.88, pp.18-36, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01146859

D. Jack, The International Exascale Software Project : A Call To Cooperative Action By the Global High-Performance Community, Int. J. High Perform. Comput. Appl, vol.23, pp.1094-3420, 2009.

D. Jörg, R. Thomas, and R. Gudula, Mapping Algorithms for Multiprocessor Tasks on Multi-Core Clusters, 2008 International Conference on Parallel Processing, pp.141-148, 2008.

G. E. Fagg and J. J. Dongarra, Heterogeneous MPI Application Interoperation and Process Management under PVMPI, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Proceedings of the 4th European PVM/MPI Users' Group Meeting. Vol. 1332. Lecture Notes in Computer Science, pp.91-98, 1997.

E. Duesterwald, R. W. Wisniewski, P. F. Sweeney, G. Cascaval, and S. E. Smith, Method and System for Optimizing Communication in MPI Programs for an Execution Environment, 2008.

C. M. Fiduccia and R. M. Mattheyses, A linear-time heuristic for improving network partitions, Proceedings of the 19th Design Automation Conference, DAC '82, pp.175-181, 1982.

F. Ian, J. Geisler, and T. Steven, MPI on the I-WAY : A Wide-Area, Multimethod Implementation of the Message Passing Interface

Interconnect topology-aware resource assignment

FUJITSU, Hardware Topology Aware MPI extensions on Fujitsu PRIMEHPC FX10, FX100, 2018.

G. Edgar, Distributed Computing in a Heterogeneous Computing Environment, Recent Advances in Parallel Virtual Machine and Message Passing Interface. Edited by Vassil ALEXANDROV and Jack DONGARRA, Lecture Notes in Computer Science, pp.180-188, 1998.

G. Edgar, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 11th European PVM/MPI Users' Group Meeting, pp.97-104, 2004.

J. J. Galvez and V. K. Laxmikant, Automatic topology mapping of diverse large-scale parallel applications, Proceedings of the International Conference on Supercomputing, vol.17, pp.1-17, 2017.

F. García, A. Calderón, and J. Carretero, MiMPI: A Multithread-Safe Implementation of MPI, Recent Advances in PVM and MPI. 6th PVM/MPI European User's Group Meeting. Vol. 1697. Lecture Notes in Computer Science, pp.207-214, 1999.

W. L. George, J. G. Hagedorn, and J. E. Devaney, IMPI: Making MPI Interoperable, Journal of Research of the National Institute of Standards and Technology, vol.105, pp.343-348, 2000.

G. Balazs, R. Riesen, and I. Yutaka, Making the Case for Portable MPI Process Pinning, Poster presented at the 25th European MPI Users' Group Meeting, 2018.

G. Brice, Towards generic Communication Mechanisms and better Affinity Management in Clusters of Hierarchical Nodes. Habilitation thesis (Habilitation à diriger des recherches)

G. Brice, J. Hursey, and J. M. Squyres, Netloc: Towards a Comprehensive View of the HPC System Topology, 43rd International Conference on Parallel Processing Workshops, pp.216-225, 2014.

G. Brice and M. Stéphanie, Dodging Non-Uniform I/O Access in Hierarchical Collective Operations for Multicore Clusters, CASS 2011: The 1st Workshop on Communication Architecture for Scalable Systems, held in conjunction with IPDPS, 2011.

B. Goglin and S. Moreaud, KNEM: a Generic and Scalable Kernel-Assisted Intra-node MPI Communication Framework, Journal of Parallel and Distributed Computing (JPDC) 73.2 (Feb. 2013), pp.176-188

R. L. Graham, M. Galen, . Shipman, and A. Support-for-multi-core, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users' Group Meeting, pp.130-140, 2008.

G. William, Learning from the Success of MPI, High Performance Computing -HiPC, pp.81-94, 2001.

G. William, MPICH2: A New Start for MPI Implementations, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 9th European PVM/MPI Users' Group Meeting, p.7, 2002.

D. G. William, Using Node Information to Implement MPI Cartesian Topologies, Proceedings of the 25th European MPI Users' Group Meeting, vol.18, 2018.

R. Gupta, S. Sathish, and . Vadhiyar, Application-oriented adaptive MPI_Bcast for grids, 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), Proceedings, pp.25-29, 2006.

S. K. Gutierrez, Accommodating Thread-Level Heterogeneity in Coupled Parallel Applications, pp.469-478, 2017.

E. H. M. Cruz, M. Diener, and P. O. A. Navaux, State-of-the-Art Sharing-Aware Mapping Methods, in Thread and Data Mapping for Multicore Systems: Improving Communication and Memory Accesses, Springer International Publishing, pp.35-48, 2018.

T. Hatazaki, Rank Reordering Strategy for MPI Topology Creation Functions, Recent Advances in Parallel Virtual Machine and Message Passing Interface. Edited by V. ALEXANDROV and J. DONGARRA. Vol. 1497. Lecture Notes in Computer Science, pp.188-195, 1998.

Y. He, Using OpenMP at NERSC, Invited talk at OpenMPCon 2015 (see slide 20), Sept. 2015

H. Poxon, X. Mpi-for-cray, . Systems, . Olcf-workshop, and . Fév, , 2013.

H. Bruce and L. Robert, The Chaco User's Guide: Version 2.0. Tech. rep. SAND94-2692, 1994.

. Hlrn and . Placeme, , 2011.

H. Torsten and S. Marc, Generic Topology Mapping Strategies for Large-Scale Parallel Architectures, pp.75-84, 2011.

H. Torsten and T. Jesper-larsson, 23rd IEEE International Symposium on Parallel and Distributed Processing, pp.1-8, 2009.

H. Torsten, The scalable process topology interface of MPI 2.2, Concurrency and Computation : Practice and Experience, vol.23, pp.293-310, 2011.

H. Torsten, MPI : a new hybrid approach to parallel programming with MPI plus shared memory, Computing 95, vol.12, pp.1121-1136, 2013.

D. J. Holmes, Leveraging Runtime Infrastructure to Increase Scalability of Applications at Exascale, Proceedings of the 23rd European MPI Users' Group Meeting, pp.121-129, 2016.

C. Huang, O. S. Lawlor, and L. V. Kalé, Adaptive MPI, Languages and Compilers for Parallel Computing, 16th International Workshop, LCPC 2003, Revised Papers. Vol. 2958. Lecture Notes in Computer Science, pp.306-322, 2003.

J. Hursey and J. M. Squyres, Advancing application process affinity experimentation: Open MPI's LAMA-based affinity interface, 20th European MPI Users' Group Meeting, EuroMPI '13, pp.163-168, 2013.

J. Hursey, J. M. Squyres, and T. Dontje, Locality-Aware Parallel Process Mapping for Multi-core HPC Systems, 2011 IEEE International Conference on Cluster Computing (CLUSTER), pp.527-531, 2011.

P. Husbands and J. C. Hoe, MPI-StarT: delivering network performance to numerical applications, Proceedings of the 1998 ACM/IEEE conference on Supercomputing (CDROM), pp.1-15, 1998.

I. Toshiyuki, An Architecture of Stampi: MPI Library on a Cluster of Parallel Computers, Proceedings of EuroPVM/MPI2000. Vol. 1908. Lecture Notes in Computer Science, pp.200-207, 2000.

Intel, Using MPI Tuner for Intel(R) MPI Library on Linux* OS, 2016.

Intel, Intel(R) MPI Library for Linux* OS, 2019.

I. Yutaka, (Note: this article is available only in Japanese), Cluster Computing Research Center, 2003.

I. Satoshi, G. Kazuya, and O. Kenji, Automatically Optimized Core Mapping to Subdomains of Domain Decomposition Method on Multicore Parallel Environments, Computers & Fluids (Apr. 2012)

J. Emmanuel and S. Richard, Improving MPI Application Communication Time with an Introspection Monitoring Library, p.23, 2019.

J. Hyun-wook, Support for High-Performance MPI Intra-Node Communication on Linux Cluster, Proceedings of the IEEE International Conference on Parallel Processing (ICPP-2005), 2005.

J. Hyun-wook, Lightweight Kernel-Level Primitives for High-Performance MPI Intra-Node Communication over Multi-Core Systems, Proceedings of the IEEE International Conference on Cluster Computing (Cluster'07, 2007.

A. Kako, Approximation Algorithms for the Weighted Independent Set Problem, LNCS, vol.3787, pp.341-350, 2005.

N. Karonis, B. Toonen, and I. Foster, MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface, Journal of Parallel and Distributed Computing, vol.63, issue.5, May, pp.551-563, 2003.

N. T. Karonis, Exploiting Hierarchy in Parallel Computer Networks to Optimize Collective Operation Performance, Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), pp.377-384, 2000.

K. George and K. Vipin, METIS - Unstructured Graph Partitioning and Sparse Matrix Ordering System, Version 2.0. Tech. rep., 1995.

K. Thilo, MPI's Collective Communication Operations for Clustered Wide Area Systems, Proceedings of the 1999 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP'99), pp.131-140, 1999.

A. Kleen, A NUMA API for Linux, Novell

K. Thorsten, A Heterogeneous Computing Environment to Solve the 768-bit RSA Challenge, Cluster Computing, 2010.

K. Thorsten, Edited by Tal RABIN. Vol. 6223. Lecture Notes in Computer Science, CRYPTO 2010, pp.333-350, 2010.

M. K. , Jahresber. Deutsch. Math. -Verein, vol.300, 1955.

K. Bill, Blue Waters and Resource Management - Now and in the Future, Presentation at MoabCon (see slide 14), 2013.

P. Lai, S. Sur, and D. K. Panda, Designing truly one-sided MPI-2 RMA intra-node communication on multi-core systems, Proceedings of the International Supercomputing Conference (ISC'10), 2010.

Institut Pierre-Simon Laplace (IPSL), Intégration de la parallélisation mixte MPI-OpenMP dans les configurations de l'IPSL, 2016.

A. Lastovetsky and R. Reddy, HMPI: Towards a Message-Passing Library for Heterogeneous Networks of Computers, 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), p.102, 2003.

A. Lastovetsky and R. Reddy, HeteroMPI: Towards a message-passing library for heterogeneous networks of computers, Journal of Parallel and Distributed Computing, vol.66, issue.2, 2006.

R. Latham, L. Bautista-Gomez, and P. Balaji, Portable Topology-Aware MPI-I/O, 23rd IEEE International Conference on Parallel and Distributed Systems, pp.710-719, 2017.

L. Gunho, Topology-aware Resource Allocation for Data-intensive Workloads, APSys '10, pp.1-6, 2010.

S. Lee and J. K. Aggarwal, A Mapping Strategy for Parallel Processing, IEEE Trans. Comput, vol.36, pp.433-442, 1987.

E. A. León, mpibind: a memory-centric affinity algorithm for hybrid applications, Proceedings of the International Symposium on Memory Systems, MEMSYS 2017, pp.262-264, 2017.

L. I. Dongyang, W. Yunlan, and Z. Wei, Topology-Aware Process Mapping on Clusters Featuring NUMA and Hierarchical Network, pp.74-81, 2013.

L. I. Kangkang, M. Maciej, and N. Jarek, Topology-aware Job Allocation in 3D Torus-based HPC Systems with Hard Job Priority Constraints, pp.515-524, 2017.

L. I. Kangkang, M. Maciej, and N. Jarek, Topology-Aware Scheduling on Blue Waters with Proactive Queue Scanning and Migration-Based Job Placement, Job Scheduling Strategies for Parallel Processing, 2017.

L. I. Shigang, H. Torsten, and S. Marc, NUMA-aware shared-memory collective communication for MPI, The 22nd International Symposium on High-Performance Parallel and Distributed Computing, HPDC'13, pp.85-96, 2013.

L. Chuang, Design and Evaluation of a Resource Selection Framework for Grid Applications, Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing. HPDC '02, p.63, 2002.

J. M. Álvarez-Llorente, J. C. Díaz-Martín, and J. A. Rico-Gallego, Formal modeling and performance evaluation of a run-time rank remapping technique in Broadcast, Allgather and Allreduce MPI collective operations, Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp.963-972, 2017.

L. Raúl and P. Christian, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, pp.187-194, 2007.

L. Giorgio, 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp.586-595, 2015.

L. Miao, High Performance Design and Implementation of Nemesis Communication Layer for Two-Sided and One-Sided MPI Semantics in MVAPICH2, 39th International Conference on Parallel Processing, pp.377-386, 2010.

L. Ewing and G. William, MPICH Working Note : The Second-Generation ADI for the MPICH Implementation of MPI, 1996.

M. A. Chao, An Approach for Matching Communication Patterns in Parallels Applications, Proceedings of 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS'09), 2009.

M. A. Teng, Recent Advances in the Message Passing Interface -17th European MPI Users' Group Meeting, pp.265-274, 2010.

M. A. Teng, 2011 IEEE International Conference on Cluster Computing (CLUSTER), pp.196-204, 2011.

M. A. Teng, HierKNEM : An Adaptive Framework for Kernel-Assisted and Topology-Aware Collective Communications on Many-core Clusters, pp.970-982, 2012.

M. A. Teng, Kernel-assisted and topology-aware MPI collective communications on multicore/many-core platforms, J. Parallel Distrib. Comput, vol.73, pp.1000-1010, 2013.

A. R. Mamidala, Efficient SMP-aware MPI-level broadcast over InfiniBand's hardware multicast, 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), Proceedings, pp.25-29, 2006.

K. B. Manwade and D. B. Kulkarni, ClustMap: A Topology-Aware MPI Process Placement Algorithm for Multi-core Clusters, Intelligent Computing and Information and Communication, pp.67-76, 2018.

M. Motohiko, Efficient MPI Collective Operations for Clusters in Long-and-Fast Networks, Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006.

A. J. McPherson, N. Vijay, and C. Marcelo, Static Approximation of MPI Communication Graphs for Optimized Process Placement, Languages and Compilers for Parallel Computing, 2015.

J. M. Mellor-Crummey and M. L. Scott, Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors, vol.9, pp.21-65, 1991.

M. George, Hierarchical Task Placement to Enable a Tapered Fat Tree Topology for Lower Power and Cost in HPC Networks, Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp.228-237, 2017.

S. H. Mirsadeghi and A. Afsahi, PTRAM: A Parallel Topology- and Routing-Aware Mapping Framework for Large-Scale HPC Systems, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May, pp.386-396, 2016.

S. H. Mirsadeghi and A. Afsahi, Topology-Aware Rank Reordering for MPI Collectives, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPS Workshops, pp.1759-1768, 2016.

G. E. Moore, Cramming more components onto integrated circuits, IEEE Solid-State Circuits Society Newsletter, vol.11, issue.3, pp.33-35, 1965.

M. Stéphanie, Optimizing MPI communication within large multicore nodes with kernel assistance, 24th IEEE International Symposium on Parallel and Distributed Processing, pp.1-7, 2010.

. Mpi-connect,

. Mpiconnect and . Url,

N. Raymond and M. Jean-François, PM2: Parallel Multithreaded Machine. A Computing Environment for Distributed Architectures, Parallel Computing: State-of-the-Art and Perspectives, Proceedings of the conference ParCo, 1995.

N. Javier, Reducing complexity in tree-like computer interconnection networks, Parallel Computing, vol.36, pp.71-85, 2010.

Methods to Check Process and Thread Affinity, 2019.

N. Bradford, B. Dick, J. Proulx, and F. , Pthreads Programming, pp.1-56592, 1996.

N. Christoph and R. Rolf, Topology aware Cartesian grid mapping with MPI. Poster at EuroMPI, 2018.

Oracle Grid Engine

J. M. Orduña, F. Silla, and D. José, On the development of a communicationaware task mapping technique, Journal of Systems Architecture, vol.50, pp.207-220, 2004.

P. Scott and P. Avneesh, VMI 2.0 : A Dynamically Reconfigurable Messaging Layer for Availability, Usability and Management, 2000.

J. A. Pascual, J. Navaridas, and J. Miguel-Alonso, Effects of Topology-Aware Allocation Policies on Scheduling Performance, Job Scheduling Strategies for Parallel Processing, 14th International Workshop, Revised Papers. Vol. 5798. Lecture Notes in Computer Science, pp.138-156, 2009.

. Pbsworks,

F. Pellegrini, Scotch and PT-Scotch Graph Partitioning Software: An Overview, Combinatorial Scientific Computing, pp.373-406, 2012.

P. Marc, C. Patrick, and J. Hervé, MPC-MPI, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users' Group Meeting, pp.94-103, 2009.

P. Andréa, A parallel SCRIP interpolation library for OASIS, 2018.

P. Simon, Enabling hierarchy-aware MPI collectives in dynamically changing topologies, Proceedings of the 24th European MPI Users' Group Meeting, vol.2, pp.1-2, 2017.

P. Simon, Revisiting locality-awareness in view of dynamically changing topologies, Parallel Computing, vol.77, pp.1-18, 2018.

Portable Linux Processor Affinity (PLPA)

P. Martin, S. Silke, and B. Thomas, Message Passing Interface Library for Inhomogeneous Coupled Clusters, Proceedings of ACM/IEEE International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003.

P. Howard, G. Igor, and B. Darius, A uGNI-Based MPICH2 Nemesis Network Module for the Cray XE, Recent Advances in the Message Passing Interface - 18th European MPI Users' Group Meeting, Springer, 2011.

Q. Peixin, Job Placement and Network Routing on Fat-Tree Systems, The 47th International Conference on Parallel Processing, vol.36, 2018.

J. Khalid, H. , and A. Lastovetsky, Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms, 42nd International Conference on Parallel Processing, pp.754-762, 2013.

R. Rolf, MPI-GLUE: Interoperable high-performance MPI combining different vendor's MPI worlds, Tech. rep., Apr. 1998.

R. Rolf, G. Hager, and J. Gabriele, Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes, Proceedings of the 17th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, pp.427-436, 2009.

R. Thomas, More Message Passing Performance with the Multithreaded MPICH Device, 1997.

R. Rajesh, L. Miron, and S. Marvin, Matchmaking : Distributed Resource Management for High Throughput Computing, HPDC'7, pp.28-31, 1998.

M. J. Rashti, Multi-core and Network Aware MPI Topology Functions, Recent Advances in the Message Passing Interface - 18th European MPI Users' Group Meeting, pp.50-60, 2011.

E. Rodrigues, Multicore aware process mapping and its impact on communication overhead of parallel applications, Proceedings of the IEEE Symposium on Computers and Communications, pp.811-817, 2009.

A. L. Rosenberg, Issues in the Study of Graph Embeddings, WG '80 : Proceedings of the International Workshop on Graph-theoretic Concepts in Computer Science, 1981.

S. Thibault, R. Namyst, and P.-A. Wacrenier, Building Portable Thread Schedulers for Hierarchical Multiprocessors : the BubbleSched Framework, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00154506

C. A. Santos, Policy-Based Resource Assignment in Utility Computing Environments, pp.100-111, 2004.

S. Kirk, K. George, and K. Vipin, Parallel Multilevel Algorithms for Multi-constraint Graph Partitioning (Distinguished Paper), Proceedings from the 6th International Euro-Par Conference on Parallel Processing, 2000.

C. Schulz and T. Jesper-larsson, Edited by Costas S. Iliopoulos, Leibniz International Proceedings in Informatics (LIPIcs), Dagstuhl, Germany : Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, vol.4, 2017.

S. Sangmin, A Lightweight Low-Level Threading and Tasking Framework, IEEE Trans. Parallel Distrib. Syst, vol.29, pp.512-526, 2018.

S. John, D. Sudip, and J. Morrison, High Performance Computing for Computational Science - VECPAR 2010, Edited by José M. Laginha M. Palma, 2011.

H. D. Simon and T. Shang-hua, How Good is Recursive Bisection ?, SIAM, vol.18, pp.1436-1445, Sept. 1997.

B. E. Smith and B. Brett, Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, pp.1005-1013, 2005.

S. Lorna and B. Mark, Development of mixed mode MPI / OpenMP applications, Scientific Programming, vol.9, pp.83-98, 2001.

O. Sonmez, H. H. Mohamed, and D. H. Epema, Communication-Aware Job Placement Policies for the KOALA Grid Scheduler, Proc. of the Second IEEE International Conference on e-Science and Grid Computing (e-Science'06), 2006.

S. Mohsen, A. Morteza, and Z. Ghobad, A Novel Process Mapping Strategy in Clustered Environments, 2012.

S. Mohsen, A. Morteza, and Z. Ghobad, Improving internode communications in multi-core clusters using a contention-free process mapping algorithm, The Journal of Supercomputing, vol.66, pp.488-513, 2013.

J. M. Squyres and L. Andrew, A Component Architecture for LAM/MPI, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI Users' Group Meeting, pp.379-387, 2003.

C. S. Steele, Placement of communicating processes on multiprocessors networks, 1985.

H. Subramoni, Design of a Scalable Infiniband Topology Service to Enable Network-Topology-Aware Placement of Processes, Proceedings of the 2012 ACM/IEEE conference on Supercomputing (CDROM), p.12, 2012.

C. D. Sudheer and A. Srinivasan, Optimization of the hop-byte metric for effective topology aware mapping, High Performance Computing (HiPC), 2012 19th International Conference on, pp.1-9, 2012.

H. Tang and Y. Tao, Optimizing threaded MPI execution on SMP clusters, Proceedings of the 15th international conference on Supercomputing, pp.381-392, 2001.

T. Kenjiro and A. A. Chien, A Heuristic Algorithm for Mapping Communicating Tasks on Heterogeneous Resources, 9th Heterogeneous Computing Workshop, pp.102-115, 2000.

T. François, Placement d'applications parallèles en fonction de l'affinité et de la topologie, 2015.

T. Jesper-larsson, Implementing the MPI process topology mechanism, CD-ROM, vol.40, 2002.

Recent Advances in Parallel Virtual Machine and Message Passing Interface, 9th European PVM/MPI Users' Group Meeting, pp.392-400, 2002.

T. Jesper-larsson, Eighth International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS'03), pp.56-65, 2003.

T. Jesper-larsson, Direct graph k-partitioning with a Kernighan-Lin like heuristic, Oper. Res. Lett, vol.34, issue.6, pp.621-629, 2006.

T. Jesper-larsson and R. Antoine, MPI Collectives and Datatypes for Hierarchical All-to-all Communication, 21st European MPI Users' Group Meeting, EuroMPI/ASIA '14, p.27, 2014.

T. François, A multithreaded communication engine for multicore architectures, 22nd IEEE International Symposium on Parallel and Distributed Processing, pp.1-7, 2008.

T. Dave, Integrating New Capabilities into NetPIPE, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI Users' Group Meeting, pp.37-44, 2003.

V. Geoffroy and B. David, Poster presented at the 25th European MPI Users' Group Meeting, 2018.

P. L. Vaughan, Migrating from PVM to MPI, part I : the Unify System, 5th Symposium on the Frontiers of Massively Parallel Computation, Edited by IEEE Computer Society Technical Committee on Computer Architecture, McLean, pp.488-495, 1995.

V. Vishwanath, Optimized process placement for collective I/O operations, 20th European MPI Users' Group Meeting, EuroMPI '13, pp.31-36, 2013.

J. T. Vogelstein, Fast Approximate Quadratic Programming for Graph Matching, PLoS One, vol.10, 2015.

V. Nagavijayalakshmi, Locality Conscious Processor Allocation and Scheduling for Mixed Parallel Applications, 2006.

W. Scott, Minimising Communication Costs on a SMP Cluster using Process Placement, 2005.

C. Walshaw and M. Cross, JOSTLE : Parallel Multilevel Graph-Partitioning Software - An Overview, Mesh Partitioning Techniques and Domain Decomposition Techniques, Edited by F. Magoules, 2007.

W. U. Jingjin, X. Xuanxing, and L. Zhiling, Hierarchical task mapping for parallel applications on supercomputers, The Journal of Supercomputing, vol.71, pp.1776-1802, 2015.

YAMPII

Y. Xu, Balancing job performance with system performance via locality-aware scheduling on torus-connected systems, Cluster'2014, pp.140-148, 2014.

A. B. Yoo, M. A. Jette, and M. Grondona, SLURM : Simple Linux Utility for Resource Management, Job Scheduling Strategies for Parallel Processing, pp.44-60, 2003.

H. Yu, I. Chung, and J. E. Moreira, Blue Gene System Software -Topology Mapping for Blue Gene/L Supercomputer, Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006.

Z. Ghobad, S. Mohsen, and A. Morteza, New Process Placement Algorithm in Multi-core Clusters Aimed to Reducing Network Interface Contention, Advances in Computer Science, Engineering & Applications, Edited by David C. Wyld, Jan Zizka and Dhinaharan Nagamalai, 2012.

Z. Jidong, C. Wenguang, and Z. Weimin, PHANTOM : predicting performance of parallel applications on large-scale parallel machines using a single node, Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010.

Z. Jidong, FACT : fast communication trace collection for parallel applications through program slicing, Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009.

Z. Jidong, Combining Static and Dynamic Analysis for Top-Down Communication Trace Compression, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.143-153, 2014.

J. Zhang, Euro-Par 2009 Parallel Processing, 15th International Euro-Par Conference, The Netherlands, Springer, 2009.

Z. Hao, Hierarchical Collectives in MPICH2, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users' Group Meeting, pp.325-326, 2009.