S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci, A Portable Programming Interface for Performance Evaluation on Modern Processors, International Journal of High Performance Computing Applications, vol.14, issue.3, pp.189-204, 2000.
DOI : 10.1177/109434200001400303

D. Buntinas, G. Mercier, and W. Gropp, Data Transfers between Processes in an SMP System: Performance Study and Application to MPI, 2006 International Conference on Parallel Processing (ICPP 2006), pp.487-496, 2006.

D. Buntinas, G. Mercier, and W. Gropp, Design and evaluation of Nemesis, a scalable low-latency message-passing communication subsystem, Proceedings of the 6th IEEE International Symposium on Cluster Computing and the Grid (CCGRID '06), pp.521-530, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00344350

D. Buntinas, G. Mercier, and W. Gropp, Implementation and Shared-Memory Evaluation of MPICH2 over the Nemesis Communication Subsystem, Recent Advances in Parallel Virtual Machine and Message Passing Interface: Proc. 13th European PVM/MPI Users Group Meeting, 2006.
DOI : 10.1007/11846802_19
URL : https://hal.archives-ouvertes.fr/hal-00344339

L. Chai, P. Lai, H. Jin, and D. K. Panda, Designing an Efficient Kernel-Level and User-Level Hybrid Approach for MPI Intra-Node Communication on Multi-Core Systems, 2008 37th International Conference on Parallel Processing, 2008.
DOI : 10.1109/ICPP.2008.16

B. Goglin, High Throughput Intra-Node MPI Communication with Open-MX, 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing, 2009.
DOI : 10.1109/PDP.2009.20
URL : https://hal.archives-ouvertes.fr/inria-00331209

R. L. Graham, T. S. Woodall, and J. M. Squyres, Open MPI: A Flexible High Performance MPI, Proceedings, 6th Annual International Conference on Parallel Processing and Applied Mathematics, 2005.
DOI : 10.1007/11752578_29

A. Grover and C. Leech, Accelerating Network Receive Processing (Intel I/O Acceleration Technology), Proceedings of the Linux Symposium, pp.281-288, 2005.

H. Jin, S. Sur, L. Chai, and D. K. Panda, Lightweight kernel-level primitives for high-performance MPI intra-node communication over multi-core systems, 2007 IEEE International Conference on Cluster Computing, 2007.
DOI : 10.1109/CLUSTR.2007.4629263

M. Koop, W. Huang, K. Gopalakrishnan, and D. K. Panda, Performance Analysis and Evaluation of PCIe 2.0 and Quad-Data Rate InfiniBand, 2008 16th IEEE Symposium on High Performance Interconnects, 2008.
DOI : 10.1109/HOTI.2008.26

Myricom, Inc., Myrinet Express (MX): A High Performance, Low-Level, Message-Passing Interface for Myrinet, 2006.

K. Vaidyanathan, L. Chai, W. Huang, and D. K. Panda, Efficient asynchronous memory copy operations on multi-core systems and I/OAT, 2007 IEEE International Conference on Cluster Computing, 2007.
DOI : 10.1109/CLUSTR.2007.4629228