F. Cappello and D. Etiemble, MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks, ACM/IEEE SC 2000 Conference (SC'00), p.12, 2000.
DOI : 10.1109/SC.2000.10001

D. Buntinas, G. Mercier, and W. Gropp, Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem, Parallel Computing, vol.33, issue.9, pp.634-644, 2007.
DOI : 10.1016/j.parco.2007.06.003
URL : https://hal.archives-ouvertes.fr/hal-00344327

L. Chai, P. Lai, H. Jin, and D. K. Panda, Designing an Efficient Kernel-Level and User-Level Hybrid Approach for MPI Intra-Node Communication on Multi-Core Systems, 2008 37th International Conference on Parallel Processing, 2008.
DOI : 10.1109/ICPP.2008.16

D. Buntinas, B. Goglin, D. Goodell, G. Mercier, and S. Moreaud, Cache-Efficient, Intranode, Large-Message MPI Communication with MPICH2-Nemesis, 2009 International Conference on Parallel Processing, 2009.
DOI : 10.1109/ICPP.2009.22
URL : https://hal.archives-ouvertes.fr/inria-00390064

E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra et al., Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Proceedings, 11th European PVM/MPI Users' Group Meeting, pp.97-104, 2004.
DOI : 10.1007/978-3-540-30218-6_19

D. Buntinas, G. Mercier, and W. Gropp, Data Transfers between Processes in an SMP System: Performance Study and Application to MPI, 2006 International Conference on Parallel Processing (ICPP'06), pp.487-496, 2006.
DOI : 10.1109/ICPP.2006.31

B. Goglin, High Throughput Intra-Node MPI Communication with Open-MX, 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing, 2009.
DOI : 10.1109/PDP.2009.20
URL : https://hal.archives-ouvertes.fr/inria-00331209

R. Brightwell, T. Hudson, and K. Pedretti, SMARTMAP: Operating system support for efficient data sharing among processes on a multi-core processor, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, 2008.
DOI : 10.1109/SC.2008.5218881

M. Koop, W. Huang, K. Gopalakrishnan, and D. K. Panda, Performance Analysis and Evaluation of PCIe 2.0 and Quad-Data Rate InfiniBand, 2008 16th IEEE Symposium on High Performance Interconnects, 2008.
DOI : 10.1109/HOTI.2008.26

H. Jin, S. Sur, L. Chai, and D. K. Panda, Lightweight kernel-level primitives for high-performance MPI intra-node communication over multi-core systems, 2007 IEEE International Conference on Cluster Computing, 2007.
DOI : 10.1109/CLUSTR.2007.4629263

A. Grover and C. Leech, Accelerating Network Receive Processing (Intel I/O Acceleration Technology), Proceedings of the Linux Symposium, pp.281-288, 2005.

K. Vaidyanathan, L. Chai, W. Huang, and D. K. Panda, Efficient asynchronous memory copy operations on multi-core systems and I/OAT, 2007 IEEE International Conference on Cluster Computing, 2007.
DOI : 10.1109/CLUSTR.2007.4629228

D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter et al., The NAS Parallel Benchmarks, International Journal of High Performance Computing Applications, vol.5, issue.3, pp.63-73, 1991.
DOI : 10.1177/109434209100500306

T. Hoefler, A. Lumsdaine, and W. Rehm, Implementation and performance analysis of non-blocking collective operations for MPI, Proceedings of the 2007 ACM/IEEE conference on Supercomputing, SC '07, 2007.
DOI : 10.1145/1362622.1362692