B. A. Shirazi, A. R. Hurson, and K. M. Kavi, Scheduling and load balancing in parallel and distributed systems, 1995.

M. Beynon, R. Ferreira, T. Kurc, A. Sussman, and J. Saltz, Datacutter: Middleware for filtering very large scientific datasets on archival storage systems, NASA conference publication. NASA, pp.119-134, 1998.

S. Hary and F. Ozguner, Precedence-constrained task allocation onto point-to-point networks for pipelined execution, IEEE Transactions on Parallel and Distributed Systems, vol.10, issue.8, pp.838-851, 1999.

Q. Wu and Y. Gu, Supporting Distributed Application Workflows in Heterogeneous Computing Environments, 2008 14th IEEE International Conference on Parallel and Distributed Systems, pp.3-10, 2008.
DOI : 10.1109/ICPADS.2008.40

Q. Wu, J. Gao, M. Zhu, N. Rao, J. Huang et al., Self-Adaptive Configuration of Visualization Pipeline Over Wide-Area Networks, IEEE Transactions on Computers, vol.57, issue.1, pp.55-68, 2008.
DOI : 10.1109/TC.2007.70777

D. Hochbaum, Approximation Algorithms for NP-hard Problems, 1997.

L. Epstein and R. van Stee, Online bin packing with resource augmentation, Discrete Optimization, vol.4, issue.3-4, pp.322-333, 2007.
DOI : 10.1016/j.disopt.2007.09.004

URL : http://doi.org/10.1016/j.disopt.2007.09.004

V. Bharadwaj, D. Ghose, V. Mani, and T. Robertazzi, Scheduling Divisible Loads in Parallel and Distributed Systems, 1996.

J. Sohn et al., Optimizing computing costs using divisible load analysis, IEEE Transactions on Parallel and Distributed Systems, vol.9, issue.3, pp.225-234, 1998.

M. Hamdi and C. Lee, Dynamic load balancing of data parallel applications on a distributed network, Proceedings of the 9th international conference on Supercomputing, ICS '95, pp.170-179, 1995.
DOI : 10.1145/224538.224557

D. Altilar and Y. Paker, An optimal scheduling algorithm for parallel video processing, Proceedings of the IEEE International Conference on Multimedia Computing and Systems, 1998.
DOI : 10.1109/MMCS.1998.693650

M. Drozdowski, Selected Problems of Scheduling Tasks in Multiprocessor Computer Systems, 1998.

J. Blazewicz, M. Drozdowski, and M. Markiewicz, Divisible task scheduling - Concept and verification, Parallel Computing, vol.25, issue.1, pp.87-98, 1999.
DOI : 10.1016/S0167-8191(98)00104-5

R. Wang, A. Krishnamurthy, R. Martin, T. Anderson, and D. Culler, Modeling communication pipeline latency, Measurement and Modeling of Computer Systems (SIGMETRICS'98), pp.22-32, 1998.

M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz et al., A view of cloud computing, Communications of the ACM, vol.53, issue.4, pp.50-58, 2010.
DOI : 10.1145/1721654.1721672

W. Shih, S. Tseng, and C. Yang, Performance Study of Parallel Programming on Cloud Computing Environments Using MapReduce, 2010 International Conference on Information Science and Applications, pp.1-8, 2010.
DOI : 10.1109/ICISA.2010.5480515

G. Iyer, B. Veeravalli, and S. Krishnamoorthy, On handling large-scale polynomial multiplications in compute cloud environments using divisible load paradigm, IEEE Transactions on Aerospace and Electronic Systems, vol.48, issue.1, pp.820-831, 2012.

G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela et al., Complexity and Approximation, 1999.
DOI : 10.1007/978-3-642-58412-1

URL : https://hal.archives-ouvertes.fr/hal-00906941

J. Dean and S. Ghemawat, MapReduce, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008.
DOI : 10.1145/1327452.1327492

M. Zaharia, A. Konwinski, A. Joseph, R. Katz, and I. Stoica, Improving mapreduce performance in heterogeneous environments, Proceedings of the 8th USENIX conference on Operating systems design and implementation. USENIX Association, pp.29-42, 2008.

J. Berlińska and M. Drozdowski, Scheduling divisible MapReduce computations, Journal of Parallel and Distributed Computing, vol.71, issue.3, pp.450-459, 2011.
DOI : 10.1016/j.jpdc.2010.12.004

T. White, Hadoop: The definitive guide, 2010.

S. Seo, E. Yoon, J. Kim, S. Jin, J. Kim et al., HAMA: An Efficient Matrix Computation with the MapReduce Framework, 2010 IEEE Second International Conference on Cloud Computing Technology and Science, pp.721-726, 2010.
DOI : 10.1109/CloudCom.2010.17

O. Beaumont, L. Marchal, and Y. Robert, Scheduling Divisible Loads with Return Messages on Heterogeneous Master-Worker Platforms, High Performance Computing - HiPC, pp.498-507, 2005.
DOI : 10.1007/11602569_51

URL : https://hal.archives-ouvertes.fr/inria-00407398

O. Beaumont, L. Marchal, V. Rehn, and Y. Robert, FIFO scheduling of divisible loads with return messages under the one-port model, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, p.14, 2006.
DOI : 10.1109/IPDPS.2006.1639390

URL : https://hal.archives-ouvertes.fr/inria-00407383

O. Beaumont and A. Rosenberg, Link-heterogeneity vs. node-heterogeneity in clusters, 2010 International Conference on High Performance Computing, pp.1-8, 2010.
DOI : 10.1109/HIPC.2010.5713172

URL : https://hal.archives-ouvertes.fr/inria-00540578

J. T. Hung and T. Robertazzi, Scheduling nonlinear computational loads, IEEE Transactions on Aerospace and Electronic Systems, vol.44, issue.3, pp.1169-1182, 2008.

S. Suresh, A. Khayat, H. Kim, and T. Robertazzi, An analytical solution for scheduling nonlinear divisible loads for a single level tree network with a collective communication model, IEEE Transactions on Systems, Man, and Cybernetics, 2008.

S. Suresh, H. Kim, C. Run, and T. Robertazzi, Scheduling nonlinear divisible loads in a single level tree network, The Journal of Supercomputing, pp.1-21, 2011.

S. Suresh et al., Scheduling second-order computational load in master-slave paradigm, IEEE Transactions on Aerospace and Electronic Systems, vol.48, issue.1, 2012.

B. He, W. Fang, Q. Luo, N. K. Govindaraju, and T. Wang, Mars, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, PACT '08, pp.260-269, 2008.
DOI : 10.1145/1454115.1454152

K. Fatahalian, T. Knight, M. Houston, M. Erez, D. Horn et al., Sequoia: Programming the Memory Hierarchy, ACM/IEEE SC 2006 Conference (SC'06), pp.4-4, 2006.
DOI : 10.1109/SC.2006.55

W. Frazer and A. McKellar, Samplesort: A Sampling Approach to Minimal Storage Tree Sorting, Journal of the ACM, vol.17, issue.3, pp.496-507, 1970.
DOI : 10.1145/321592.321600

H. Li and K. Sevcik, Parallel sorting by over partitioning, Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures, SPAA '94, pp.46-56, 1994.
DOI : 10.1145/181014.192329

G. Blelloch, C. Leiserson, B. Maggs, C. Plaxton, S. Smith et al., A comparison of sorting algorithms for the connection machine CM-2, Proceedings of the third annual ACM symposium on Parallel algorithms and architectures, SPAA '91, pp.3-16, 1991.
DOI : 10.1145/113379.113380

O. Beaumont, V. Boudet, F. Rastello, and Y. Robert, Partitioning a Square into Rectangles: NP-Completeness and Approximation Algorithms, Algorithmica, vol.34, issue.3, pp.217-239, 2002.
DOI : 10.1007/s00453-002-0962-9

URL : https://hal.archives-ouvertes.fr/hal-00807407

E. Solomonik and J. Demmel, Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms, Lecture Notes in Computer Science, vol.6853, pp.90-109, 2011.
DOI : 10.1007/978-3-642-23397-5_10

K. Goto and R. van de Geijn, Anatomy of high-performance matrix multiplication, ACM Transactions on Mathematical Software, vol.34, issue.3, p.12, 2008.

K. Matsumoto, N. Nakasato, T. Sakai, H. Yahagi, and S. G. Sedukhin, Multi-level Optimization of Matrix Multiplication for GPU-equipped Systems, Procedia Computer Science, vol.4, issue.0, pp.342-351, 2011.
DOI : 10.1016/j.procs.2011.04.036

L. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel et al., ScaLAPACK user's guide, SIAM, Philadelphia, 1997.