J. Pjesivac-Grbovic, T. Angskun, G. Bosilca, G. Fagg, E. Gabriel et al., Performance Analysis of MPI Collective Operations, 19th IEEE International Parallel and Distributed Processing Symposium, 2005.
DOI : 10.1109/IPDPS.2005.335

R. Thakur, R. Rabenseifner, and W. Gropp, Optimization of Collective Communication Operations in MPICH, International Journal of High Performance Computing Applications, vol.19, issue.1, pp.49-66, 2005.
DOI : 10.1177/1094342005051521

R. Rabenseifner, Optimization of Collective Reduction Operations, Computational Science - ICCS 2004, Lecture Notes in Computer Science, vol.3036, pp.1-9, 2004.
DOI : 10.1007/978-3-540-24685-5_1

P. Liu, M. Kuo, and D. Wang, An Approximation Algorithm and Dynamic Programming for Reduction in Heterogeneous Environments, Algorithmica, vol.53, issue.3, pp.425-453, 2009.
DOI : 10.1007/s00453-007-9113-7

A. Legrand, L. Marchal, and Y. Robert, Optimizing the steady-state throughput of scatter and reduce operations on heterogeneous platforms, Journal of Parallel and Distributed Computing, vol.65, issue.12, pp.1497-1514, 2005.
DOI : 10.1016/j.jpdc.2005.05.021
URL : https://hal.archives-ouvertes.fr/hal-00789425

J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008.
DOI : 10.1145/1327452.1327492

M. Zaharia, A. Konwinski, A. Joseph, R. Katz, and I. Stoica, Improving MapReduce performance in heterogeneous environments, Proc. of the 8th USENIX conf. on Operating systems design and implementation, pp.29-42, 2008.

A. Bar-Noy, S. Kipnis, and B. Schieber, An Optimal Algorithm for Computing Census Functions in Message-Passing Systems, Parallel Processing Letters, vol.3, issue.1, pp.19-23, 1993.
DOI : 10.1142/S0129626493000046

J. Bruck and C. Ho, Efficient Global Combine Operations in Multi-Port Message-Passing Systems, Parallel Processing Letters, vol.3, issue.4, pp.335-346, 1993.
DOI : 10.1142/S012962649300037X

R. van de Geijn, On Global Combine Operations, Journal of Parallel and Distributed Computing, vol.22, issue.2, pp.324-328, 1994.
DOI : 10.1006/jpdc.1994.1091

Q. Ali, V. Pai, and S. Midkiff, Advanced collective communication in Aspen, International Conference for High Performance Computing, Networking, Storage and Analysis, SC '08, pp.83-93, 2008.

H. Ritzdorf and J. Träff, Collective operations in NEC's high-performance MPI libraries, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 2006.
DOI : 10.1109/IPDPS.2006.1639334

A. Bar-Noy, J. Bruck, C. Ho, S. Kipnis, and B. Schieber, Computing global combine operations in the multiport postal model, IEEE Transactions on Parallel and Distributed Systems, vol.6, issue.8, pp.896-900, 1995.
DOI : 10.1109/71.406965

R. Rabenseifner and J. Träff, More Efficient Reduction Algorithms for Non-Power-of-Two Number of Processors in Message-Passing Parallel Systems, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, 2004.

E. Chan, M. Heimlich, A. Purkayastha, and R. van de Geijn, Collective communication: theory, practice, and experience, Concurrency and Computation: Practice and Experience, vol.19, issue.13, pp.1749-1783, 2007.
DOI : 10.1002/cpe.1206
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.187.9821

P. Sanders, J. Speck, and J. Träff, Two-tree algorithms for full bandwidth broadcast, reduction and scan, Parallel Computing, vol.35, issue.12, pp.581-594, 2009.
DOI : 10.1016/j.parco.2009.09.001

T. Kielmann, R. Hofman, H. Bal, A. Plaat, and R. Bhoedjang, MPI's reduction operations in clustered wide area systems, 1999.

A. Agarwal, O. Chapelle, M. Dudík, and J. Langford, A Reliable Effective Terascale Linear Learning System, CoRR, abs/1110.4198, 2011.

L. Canon, E. Jeannot, R. Sakellariou, and W. Zheng, Comparative Evaluation of the Robustness of DAG Scheduling Heuristics, Proceedings of CoreGRID Integration Workshop, 2008.
DOI : 10.1007/978-0-387-09457-1_7
URL : https://hal.archives-ouvertes.fr/inria-00333903

A. Benoit, F. Dufossé, M. Gallet, Y. Robert, and B. Gaujal, Computing the throughput of probabilistic and replicated streaming applications, Proc. of SPAA, Symp. on Parallelism in Algorithms and Architectures, pp.166-175, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00555890

L. Canon and E. Jeannot, Evaluation and Optimization of the Robustness of DAG Schedules in Heterogeneous Environments, IEEE Transactions on Parallel and Distributed Systems, vol.21, issue.4, pp.532-546, 2010.
DOI : 10.1109/TPDS.2009.84
URL : https://hal.archives-ouvertes.fr/inria-00261376

G. Cordasco, R. De Chiara, and A. Rosenberg, On scheduling DAGs for volatile computing platforms: Area-maximizing schedules, Journal of Parallel and Distributed Computing, vol.72, issue.10, pp.1347-1360, 2012.
DOI : 10.1016/j.jpdc.2012.06.007

T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms, 2009.

L. Canon, Scheduling associative reductions with homogeneous costs when overlapping communications and computations, 20th Annual International Conference on High Performance Computing, 2013.
DOI : 10.1109/HiPC.2013.6799124
URL : https://hal.archives-ouvertes.fr/hal-00675964

U. Lublin and D. Feitelson, The workload on parallel supercomputers: modeling the characteristics of rigid jobs, Journal of Parallel and Distributed Computing, vol.63, issue.11, pp.1105-1122, 2003.
DOI : 10.1016/S0743-7315(03)00108-4

D. Feitelson, Workload modeling for computer systems performance evaluation, Book draft, version 0.38, 2013.