Coraid, The Linux Storage People.

S. Sumimoto and K. Kumon, PM/Ethernet-kRMA: A High Performance Remote Memory Access Facility Using Multiple Gigabit Ethernet Cards, 3rd International Symposium on Cluster Computing and the Grid (CCGrid 2003), pp.326-334, 2003.

J. Chen, W. Watson III, R. Edwards, and W. Mao, Message Passing for Linux Clusters with Gigabit Ethernet Mesh Connections, IPDPS'05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 9, 2005.

S. Sumimoto, K. Ooe, K. Kumon, T. Boku, M. Sato et al., A scalable communication layer for multi-dimensional hyper crossbar network using multiple gigabit ethernet, Proceedings of the 20th annual international conference on Supercomputing, ICS '06, pp.107-115, 2006.
DOI : 10.1145/1183401.1183418

S. Karlsson, S. Passas, G. Kotsis, and A. Bilas, MultiEdge: An Edge-based Communication Subsystem for Scalable Commodity Servers, 2007 IEEE International Parallel and Distributed Processing Symposium, p.28, 2007.
DOI : 10.1109/IPDPS.2007.370218

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.7727

Myricom, Inc., Myrinet Express (MX): A High Performance, Low-Level, Message-Passing Interface for Myrinet, 2006.

E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra et al., Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Proceedings, 11th European PVM/MPI Users' Group Meeting, pp.97-104, 2004.
DOI : 10.1007/978-3-540-30218-6_19

D. Buntinas, G. Mercier, and W. Gropp, Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem, Parallel Computing, vol.33, issue.9, pp.634-644, 2007.
DOI : 10.1016/j.parco.2007.06.003

URL : https://hal.archives-ouvertes.fr/hal-00344327

B. Goglin, Design and Implementation of Open-MX: High-Performance Message Passing over generic Ethernet hardware, in: CAC 2008: Workshop on Communication Architecture for Clusters, held in conjunction with IPDPS, 2008.

B. Goglin, Improving message passing over Ethernet with I/OAT copy offload in Open-MX, 2008 IEEE International Conference on Cluster Computing, pp.223-231, 2008.
DOI : 10.1109/CLUSTR.2008.4663775

URL : https://hal.archives-ouvertes.fr/inria-00288757

T. E. Anderson, D. E. Culler, and D. A. Patterson, A case for NOW (Networks of Workstations), IEEE Micro, vol.15, issue.1, pp.54-64, 1995.
DOI : 10.1109/40.342018

T. Sterling, D. Savarese, D. J. Becker, J. E. Dorband, U. A. Ranawake et al., BEOWULF: A parallel workstation for scientific computation, Proceedings of the 24th International Conference on Parallel Processing, pp.11-14, 1995.

R. P. Martin, A. M. Vahdat, D. E. Culler, and T. E. Anderson, Effects of Communication Latency, Overhead, and Bandwidth in a Cluster Architecture, Proceedings of the 24th Annual International Symposium on Computer Architecture, pp.85-97, 1997.

D. D. Clark, V. Jacobson, J. Romkey, and H. Salwen, An analysis of TCP processing overhead, IEEE Communications Magazine, vol.27, issue.6, pp.23-29, 1989.
DOI : 10.1109/35.29545

A. Barak, I. Gilderman, and I. Metrik, Performance of the communication layers of TCP/IP with the Myrinet gigabit LAN, Computer Communications, vol.22, issue.11, p.22, 1999.
DOI : 10.1016/S0140-3664(99)00071-7

T. von Eicken, D. E. Culler, S. C. Goldstein, and K. E. Schauser, Active Messages: a Mechanism for Integrated Communication and Computation, Proceedings of the 19th Int'l Symp. on Computer Architecture, 1992.

S. Pakin, V. Karamcheti, and A. A. Chien, Fast messages: efficient, portable communication for workstation clusters and MPPs, IEEE Concurrency, vol.5, issue.2, pp.60-73, 1997.
DOI : 10.1109/4434.588295

Z. Yi and P. P. Waskiewicz, Enabling Linux Network Support of Hardware Multiqueue Devices, Proceedings of the Linux Symposium (OLS2007), pp.305-310, 2007.

L. Grossman, Large Receive Offload Implementation in Neterion 10GbE Ethernet Driver, Proceedings of the Linux Symposium (OLS2005), pp.195-200, 2005.

D. Cohen, T. Talpey, A. Kanevsky, U. Cummings, M. Krause et al., Remote Direct Memory Access over the Converged Enhanced Ethernet Fabric: Evaluating the Options, 2009 17th IEEE Symposium on High Performance Interconnects, pp.123-130, 2009.
DOI : 10.1109/HOTI.2009.23

P. Shivam, P. Wyckoff, and D. K. Panda, EMP: zero-copy OS-bypass NIC-driven gigabit ethernet message passing, Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM), Supercomputing '01, p.57, 2001.
DOI : 10.1145/582034.582091

M. J. Rashti and A. Afsahi, 10-Gigabit iWARP Ethernet: Comparative Performance Analysis with InfiniBand and Myrinet-10G, 2007 IEEE International Parallel and Distributed Processing Symposium, p.234, 2007.
DOI : 10.1109/IPDPS.2007.370480

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.302.353

D. Dalessandro, A. Devulapalli, and P. Wyckoff, Design and Implementation of the iWarp Protocol in Software, Proceedings of PDCS'05, pp.471-476, 2005.

P. Balaji, D. Buntinas, D. Goodell, W. Gropp, and R. Thakur, Toward Efficient Support for Multithreaded MPI Communication, Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.120-129, 2008.

B. Goglin, High Throughput Intra-Node MPI Communication with Open-MX, 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing, pp.173-180, 2009.
DOI : 10.1109/PDP.2009.20

URL : https://hal.archives-ouvertes.fr/inria-00331209

H. K. Chu, Zero-Copy TCP in Solaris, Proceedings of the USENIX Annual Technical Conference, pp.253-264, 1996.

S. Passas, K. Magoutis, and A. Bilas, Towards 100 Gbit/s Ethernet: Multicore-based Parallel Communication Protocol Design, Proceedings of the 23rd international conference on Supercomputing (ICS'09), pp.214-224, 2009.

G. Regnier, S. Makineni, I. Illikkal, R. Iyer, D. Minturn et al., TCP onloading for data center servers, Computer, vol.37, issue.11, pp.48-58, 2004.
DOI : 10.1109/MC.2004.223

A. Grover and C. Leech, Accelerating Network Receive Processing (Intel I/O Acceleration Technology), Proceedings of the Linux Symposium (OLS2005), pp.281-288, 2005.

K. Vaidyanathan and D. K. Panda, Benefits of I/O Acceleration Technology (I/OAT) in Clusters, 2007 IEEE International Symposium on Performance Analysis of Systems & Software, pp.220-229, 2007.
DOI : 10.1109/ISPASS.2007.363752

K. Vaidyanathan, W. Huang, L. Chai, and D. K. Panda, Designing Efficient Asynchronous Memory Operations Using Hardware Copy Engine: A Case Study with I/OAT, 2007 IEEE International Parallel and Distributed Processing Symposium, p.234, 2007.
DOI : 10.1109/IPDPS.2007.370479

K. Salah, To coalesce or not to coalesce, AEU - International Journal of Electronics and Communications, vol.61, issue.4, pp.215-225, 2007.
DOI : 10.1016/j.aeue.2006.04.007

D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter et al., The NAS Parallel Benchmarks, International Journal of High Performance Computing Applications, vol.5, issue.3, pp.63-73, 1991.
DOI : 10.1177/109434209100500306

B. Goglin, NIC-Assisted Cache-Efficient Receive Stack for Message Passing over Ethernet, Proceedings of the 15th International Euro-Par Conference, pp.1065-1077, 2009.

URL : https://hal.archives-ouvertes.fr/inria-00379168