A. Agarwal, O. Chapelle, M. Dudík, and J. Langford, A Reliable Effective Terascale Linear Learning System, 2011.

Q. Ali, V. S. Pai, and S. P. Midkiff, Advanced collective communication in aspen, Proceedings of the 22nd annual international conference on Supercomputing , ICS '08, pp.83-93, 2008.
DOI : 10.1145/1375527.1375543

A. Bar-noy, J. Bruck, C. Ho, S. Kipnis, and B. Schieber, Computing global combine operations in the multiport postal model, IEEE Transactions on Parallel and Distributed Systems, vol.6, issue.8, pp.896-900, 1995.
DOI : 10.1109/71.406965

A. Bar-noy, S. Kipnis, and B. Schieber, AN OPTIMAL ALGORITHM FOR COMPUTING CENSUS FUNCTIONS IN MESSAGE-PASSING SYSTEMS, Parallel Processing Letters, vol.03, issue.01, pp.19-23, 1993.
DOI : 10.1142/S0129626493000046

A. Benoit, F. Dufossé, M. Gallet, Y. Robert, and B. Gaujal, Computing the throughput of probabilistic and replicated streaming applications, Proc. of SPAA, Symp. on Parallelism in Algorithms and Architectures, pp.166-175, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00555890

J. Bruck and C. Ho, EFFICIENT GLOBAL COMBINE OPERATIONS IN MULTI-PORT MESSAGE-PASSING SYSTEMS, Parallel Processing Letters, vol.03, issue.04, pp.335-346, 1993.
DOI : 10.1142/S012962649300037X

L. Canon and G. Antoniu, Scheduling associative reductions with homogeneous costs when overlapping communications and computations, 20th Annual International Conference on High Performance Computing, 2012.
DOI : 10.1109/HiPC.2013.6799124
URL : https://hal.archives-ouvertes.fr/hal-00675964

L. Canon and E. Jeannot, Evaluation and Optimization of the Robustness of DAG Schedules in Heterogeneous Environments, IEEE Transactions on Parallel and Distributed Systems, vol.21, issue.4, pp.532-546, 2010.
DOI : 10.1109/TPDS.2009.84
URL : https://hal.archives-ouvertes.fr/inria-00430920

L. Canon, E. Jeannot, R. Sakellariou, and W. Zheng, Comparative Evaluation Of The Robustness Of DAG Scheduling Heuristics, Proceedings of CoreGRID Integration Workshop, 2008.
DOI : 10.1007/978-0-387-09457-1_7
URL : https://hal.archives-ouvertes.fr/inria-00333903

E. W. Chan, M. F. Heimlich, A. Purkayastha, and R. A. Van-de-geijn, Collective communication: theory, practice, and experience, Concurrency and Computation: Practice and Experience, vol.49, issue.13, pp.1749-1783, 2007.
DOI : 10.1002/cpe.1206
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.187.9821

G. Cordasco, R. D. Chiara, and A. L. Rosenberg, On scheduling dag s for volatile computing platforms: Area-maximizing schedules, Journal of Parallel and Distributed Computing, vol.72, issue.10, pp.1347-1360, 2012.
DOI : 10.1016/j.jpdc.2012.06.007

T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2009.

J. Dean and S. Ghemawat, MapReduce, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008.
DOI : 10.1145/1327452.1327492

D. Feitelson, Workload modeling for computer systems performance evaluation . Book Draft, Version 0, 2013.

T. Kielmann, R. F. Hofman, H. E. Bal, A. Plaat, and R. A. Bhoedjang, MPI's reduction operations in clustered wide area systems, 1999.

A. Legrand, L. Marchal, and Y. Robert, Optimizing the steady-state throughput of scatter and reduce operations on heterogeneous platforms, Journal of Parallel and Distributed Computing, issue.12, pp.651497-1514, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00789425

P. Liu, M. Kuo, and D. Wang, An Approximation Algorithm and Dynamic Programming for Reduction in Heterogeneous Environments, Algorithmica, vol.33, issue.4, pp.425-453, 2009.
DOI : 10.1007/s00453-007-9113-7

U. Lublin and D. G. Feitelson, The workload on parallel supercomputers: modeling the characteristics of rigid jobs, Journal of Parallel and Distributed Computing, vol.63, issue.11, pp.1105-1122, 2003.
DOI : 10.1016/S0743-7315(03)00108-4

J. Pjesivac-grbovic, T. Angskun, G. Bosilca, G. Fagg, E. Gabriel et al., Performance Analysis of MPI Collective Operations, 19th IEEE International Parallel and Distributed Processing Symposium, 2005.
DOI : 10.1109/IPDPS.2005.335

R. Rabenseifner, Optimization of Collective Reduction Operations, Computational Science -ICCS 2004, pp.1-9, 2004.
DOI : 10.1007/978-3-540-24685-5_1

R. Rabenseifner and J. L. Träff, More Efficient Reduction Algorithms for Non-Power-of-Two Number of Processors in Message-Passing Parallel Systems, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, 2004.
DOI : 10.1007/978-3-540-30218-6_13

H. Ritzdorf and J. L. Träff, Collective operations in NEC's highperformance MPI libraries, IEEE International Parallel and Distributed Processing Symposium, IPDPS, 2006.

P. Sanders, J. Speck, and J. L. Träff, Two-tree algorithms for full bandwidth broadcast, reduction and scan, Parallel Computing, vol.35, issue.12, pp.581-594, 2009.
DOI : 10.1016/j.parco.2009.09.001

R. Thakur, R. Rabenseifner, and W. Gropp, Optimization of Collective Communication Operations in MPICH, International Journal of High Performance Computing Applications, vol.19, issue.1, pp.49-66, 2005.
DOI : 10.1177/1094342005051521

R. A. Van-de-geijn, On Global Combine Operations, Journal of Parallel and Distributed Computing, vol.22, issue.2, pp.324-328, 1994.
DOI : 10.1006/jpdc.1994.1091

T. White, Hadoop: The definitive guide, 2010.

M. Zaharia, A. Konwinski, A. Joseph, R. Katz, and I. Stoica, Improving mapreduce performance in heterogeneous environments, Proc. of the 8th USENIX conf. on Operating systems design and implementation, pp.29-42, 2008.

R. N°-8315 and R. Centre-grenoble-?-rhône-alpes, Inovallée 655 avenue de l'Europe Montbonnot 38334 Saint Ismier Cedex Publisher Inria Domaine de Voluceau -Rocquencourt BP 105 -78153 Le Chesnay Cedex inria, pp.249-6399