D. Buntinas, G. Mercier, and W. Gropp, Data Transfers between Processes in an SMP System: Performance Study and Application to MPI, 2006 International Conference on Parallel Processing (ICPP'06), pp.487-496, 2006.
DOI : 10.1109/ICPP.2006.31

R. Brightwell, T. Hudson, and K. Pedretti, SMARTMAP: Operating system support for efficient data sharing among processes on a multi-core processor, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, 2008.
DOI : 10.1109/SC.2008.5218881

L. Chai, P. Lai, H. Jin, and D. K. Panda, Designing an Efficient Kernel-Level and User-Level Hybrid Approach for MPI Intra-Node Communication on Multi-Core Systems, 2008 37th International Conference on Parallel Processing, 2008.
DOI : 10.1109/ICPP.2008.16

B. Goglin and S. Moreaud, KNEM: A generic and scalable kernel-assisted intra-node MPI communication framework, Journal of Parallel and Distributed Computing, vol.73, issue.2, pp.176-188, 2013.
DOI : 10.1016/j.jpdc.2012.09.016

URL : https://hal.archives-ouvertes.fr/hal-00731714

C. Yeoh, Cross Memory Attach, 2010.

D. Buntinas, B. Goglin, D. Goodell, G. Mercier, and S. Moreaud, Cache-Efficient, Intranode, Large-Message MPI Communication with MPICH2-Nemesis, 2009 International Conference on Parallel Processing, pp.462-469, 2009.
DOI : 10.1109/ICPP.2009.22

URL : https://hal.archives-ouvertes.fr/inria-00390064

T. Ma, G. Bosilca, A. Bouteiller, and J. J. Dongarra, Locality and Topology Aware Intra-node Communication among Multicore CPUs, Proceedings of the 17th European MPI Users Group Conference, ser. Lecture Notes in Computer Science, 2010.
DOI : 10.1007/978-3-642-15646-5_28

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

S. Moreaud, B. Goglin, D. Goodell, and R. Namyst, Optimizing MPI communication within large multicore nodes with kernel assistance, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010.
DOI : 10.1109/IPDPSW.2010.5470849

URL : https://hal.archives-ouvertes.fr/inria-00451471

B. Putigny, B. Goglin, and D. Barthou, A benchmark-based performance model for memory-bound HPC applications, 2014 International Conference on High Performance Computing & Simulation (HPCS), 2014.
DOI : 10.1109/HPCSim.2014.6903790

URL : https://hal.archives-ouvertes.fr/hal-00985598

M. S. Papamarcos and J. H. Patel, A low-overhead coherence solution for multiprocessors with private cache memories, ACM SIGARCH Computer Architecture News, vol.12, issue.3, pp.348-354, 1984.
DOI : 10.1145/773453.808204

D. Buntinas, G. Mercier, and W. Gropp, Implementation and Shared-Memory Evaluation of MPICH2 over the Nemesis Communication Subsystem, Recent Advances in Parallel Virtual Machine and Message Passing Interface: Proc. 13th European PVM/MPI Users Group Meeting, 2006.
DOI : 10.1007/11846802_19

URL : https://hal.archives-ouvertes.fr/hal-00344339

E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra et al., Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Proceedings, 11th European PVM/MPI Users' Group Meeting, pp.97-104, 2004.
DOI : 10.1007/978-3-540-30218-6_19

S. Pellegrini, T. Hoefler, and T. Fahringer, On the Effects of CPU Caches on MPI Point-to-Point Communications, 2012 IEEE International Conference on Cluster Computing, pp.495-503, 2012.
DOI : 10.1109/CLUSTER.2012.22

A. Pesterev, N. Zeldovich, and R. T. Morris, Locating cache performance bottlenecks using data profiling, Proceedings of the 5th European conference on Computer systems, EuroSys '10, pp.335-348, 2010.
DOI : 10.1145/1755913.1755947

A. Denis, A High Performance Superpipeline Protocol for InfiniBand, Proceedings of the 17th International Euro-Par Conference, pp.276-287, 2011.
DOI : 10.1007/978-3-642-23397-5_27

URL : https://hal.archives-ouvertes.fr/inria-00586015

M. Chaarawi, J. M. Squyres, E. Gabriel, and S. Feki, A Tool for Optimizing Runtime Parameters of Open MPI, Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.210-217, 2008.
DOI : 10.1007/BFb0056559

S. Pellegrini, J. Wang, T. Fahringer, and H. Moritsch, Optimizing MPI Runtime Parameter Settings by Using Machine Learning, EuroPVM/MPI, pp.196-206, 2009.
DOI : 10.1007/978-3-642-03770-2_26

S. R. Garea and T. Hoefler, Modeling Communication in Cache-Coherent SMP Systems - A Case-Study with Xeon Phi, Proceedings of the 22nd international symposium on High-performance parallel and distributed computing, pp.6-2013