P. Godard, V. Loechner, and C. Bastoul, « Efficient Out-of-core and Out-of-place Rectangular Matrix Transposition and Rotation ». Under submission, IEEE Transactions on

P. Godard, V. Loechner, C. Bastoul, F. Soulier, and G. Muller, « A Flexible and Distributed Runtime System for High-Throughput Constrained Data Streams Generation », 20th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC) in conjunction with IPDPS, 2019.

P. G. , Échanges non bloquants de données ordonnées entre producteurs multiples et consommateur unique, Conférence d'informatique en Parallélisme, 2019.

Posters

P. Godard, V. Loechner, C. Bastoul, and F. Soulier, 14th International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES). Fiuggi, Italy, July 2018.

P. Godard, V. Loechner, C. Bastoul, and F. Soulier, Journée Poster de l'École Doctorale de Mathématiques, Sciences de l'Information et de l'Ingénieur (MSII), 2017.

M. Abadi, P. Barham, J. Chen et al., TensorFlow : A system for large-scale machine learning, 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), p.61, 2016.

U. A. Acar, G. E. Blelloch, and R. D. Blumofe, « The data locality of work stealing », Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures, p.62, 2000.

A. Adobe and . Print-engine, , p.15

A. Gabrielle, A. David, and F. Ian, The Cactus Worm : Experiments with dynamic resource discovery and allocation in a grid environment, The International Journal of High Performance Computing Applications, vol.15, p.99, 2001.

W. J. Allen and S. O. Miller, Ink limiting in ink jet printing systems, US Patent, vol.5, p.16, 1997.

G. M. Amdahl, « Validity of the single processor approach to achieving large scale computing capabilities », Proceedings of the, p.83, 1967.

S. F. Antao, A. Bataev, and A. C. Jacob, Offloading support for OpenMP in Clang and LLVM, Proceedings of the Third Workshop on LLVM Compiler Infrastructure in HPC, p.57, 2016.

A. Ryo, O. Shuichi, N. Takashi, and M. Satoshi, 2011 IEEE Ninth International Symposium on Parallel and Distributed Processing with Applications. IEEE, p.57, 2011.

The Apache Software Foundation, Apache Apex, p.61

The Apache Software Foundation, Apache Flink, p.61

The Apache Software Foundation, Apache Hadoop, p.60

The Apache Software Foundation, Apache HBase, p.60

The Apache Software Foundation, Apache Hive, p.60

The Apache Software Foundation, Apache Spark, p.60

The Apache Software Foundation, Apache Storm, p.61

C. Augonnet, S. Thibault, R. Namyst, and P.-A. Wacrenier, StarPU : a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation : Practice and Experience, vol.23, p.57, 2011.

C. Augonnet, O. Aumage, N. Furmento, R. Namyst, S. Thibault et al., Task Programming over Clusters of Machines Enhanced with Accelerators, Recent Advances in the Message Passing Interface. Edited by Jesper Larsson Träff, p.57, 2012.

A. Alok and S. Jeffrey, The Input/Output Complexity of Sorting and Related Problems, vol.31, p.32

A. Jan, P. Wah, and W. , Proceedings of 3rd IEEE International Conference on Image Processing, p.15, 1996.

B. Upendra, N. Purvi, and . Ramanuj, Enhanced max-min task scheduling algorithm in cloud computing, International Journal of Application or Innovation in Engineering and Management (IJAIEM), vol.2, issue.4, p.62, 2013.

B. Anton, A. Jemal, and B. Rajkumar, « Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing, Future generation computer systems, vol.28, p.62, 2012.

B. Guruduth, C. Tushar, and M. Bodhi, « An efficient multicast protocol for content-based publish-subscribe systems, Proceedings. 19th IEEE International Conference on Distributed Computing Systems (Cat. No. 99CB37003). IEEE, p.77, 1999.

B. Greg, « A method for implementing lock-free shared data structures, p.64, 1994.

B. Cedric, « Code generation in the polyhedral model is easier than you think, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, p.58, 2004.

D. P. Bovet and M. Cesati, Understanding the Linux Kernel : from I/O ports to process management, vol.40, p.30, 2005.

B. Michael, « The determination of metameric mismatch limits of industrial colorant sets, p.14, 1985.

G. T. Byrd and M. J. Flynn, « Producer-consumer communication in distributed shared memory multiprocessors », Proceedings of the IEEE, vol.87, p.64, 1999.

B. Roberto and H. Torsten, « Notified access : Extending remote memory access programming models for producer-consumer synchronization, IEEE International Parallel and Distributed Processing Symposium, p.65, 2015.

B. Timo, A. Michael, and J. Emanuel, Thrill : High-performance algorithmic distributed batch data processing with C++, 2016 IEEE International Conference on Big Data (Big Data), p.60, 2016.

B. Motti and K. Michael, « Performance evaluation of the RDMA over ethernet (RoCE) standard in enterprise data centers infrastructure, Proceedings of the 3rd Workshop on Data Center-Converged and Virtual Ethernet Switching. International Teletraffic Congress, p.65, 2011.

R. D. Blumofe and C. E. Leiserson, « Scheduling multithreaded computations by work stealing », Journal of the ACM (JACM), vol.46, p.62, 1999.

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A practical automatic polyhedral parallelizer and locality optimizer, ACM SIGPLAN Notices, vol.43, issue.6, ACM, p.59, 2008.

B. Dhruba, HDFS architecture guide, vol.53, p.60, 2008.

B. George, B. Aurelien, and D. Anthony, A generic distributed DAG engine for high performance computing, Parallel Computing, vol.38, p.63, 2012.

B. Muthu-manikandan, J. Ramanujam, and P. Sadayappan, Automatic C-to-CUDA code generation for affine programs, International Conference on Compiler Construction, p.59, 2010.

C. Ignacio, A. Srinivas, and K. Arvind, « Characterizing private clouds : A large-scale empirical analysis of enterprise clusters, Proceedings of the Seventh ACM Symposium on Cloud Computing. ACM, p.70, 2016.

I. Calciu, D. Dice, and Y. Lev, NUMA-aware reader-writer locks, ACM SIGPLAN Notices, vol.48, issue.8, ACM, p.64, 2013.

. Caldera and . Caldera, , p.41, 2018.

. Caldera and . Caldera, , p.143

C. Paris, K. Asterios, and E. Stephan, Apache flink : Stream and batch processing in a single engine, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol.36, issue.4, p.61, 2015.

C. Liqun, B. John, . Carter, and D. Donglai, « An adaptive cache coherence protocol optimized for producer-consumer sharing, IEEE 13th International Symposium on High Performance Computer Architecture. IEEE, p.64, 2007.

C. Chen, J. Chame, and M. Hall, CHiLL : A framework for composing high-level loop transformations, Tech. rep., Citeseer, p.59, 2008.

. Bibliographie,

D. Callahan, B. L. Chamberlain, and H. P. Zima, « The cascade high productivity language », Ninth International Workshop on High-Level Parallel Programming Models and Supportive Environments, p.58, 2004.

J. S. Chase, A. J. Gallatin, and K. G. Yocum, End system optimizations for high-speed TCP, IEEE Communications Magazine, vol.39, p.117, 2001.

P. Charles, C. Grothoff, and V. Saraswat, X10 : an object-oriented approach to non-uniform cluster computing, ACM SIGPLAN Notices, vol.40, issue.10, ACM, p.58, 2005.

C. Quan, G. Minyi, and D. Qianni, HAT : history-based auto-tuning MapReduce in heterogeneous environments, The Journal of Supercomputing, vol.64, p.63, 2013.

C. Chun, « Polyhedra scanning revisited, Conference on Programming Language Design and Implementation, p.58, 2012.

C. Bobby, M. Carl, P. T. Fu, C. , A. Joseph et al., Method for managing metamerism of color merchandise. US Patent 8,330,991, p.14, 2012.

C. Charisee, K. Gordon, R. John, L. Samuels, S. Nick et al., , p.59, 2012.

C. Mosharaf, Z. Matei, M. A. Justin, I. Michael, and S. Jordan-et-ion, « Managing data transfers in computer clusters with orchestra, ACM SIGCOMM Computer Communication Review, vol.41, p.99, 2011.

Commission Internationale de l'Éclairage, ISO/CIE 10527-1991 : Colorimetric Observers

C. Simon, The Bip Buffer -The Circular Buffer with a Twist, 2014.

R. J. Cloutier and D. E. Thomas, « The combination of scheduling, allocation, and mapping in a single algorithm », 27th ACM/IEEE design automation conference, p.89, 1990.

D. Benoit, F. Timur, and C. Mark, International Workshop on Passive and Active Network Measurement, p.99, 2005.

J. Dean and S. Ghemawat, MapReduce : Simplified Data Processing on Large Clusters, Communications of the ACM, vol.51, issue.1, p.60, 2008.

J. Dean and S. Ghemawat, System and method for efficient large-scale data processing, p.59, 2010.

D. James, L. Brian, S. Ponnuswamy, K. Sriram, and N. Jarek, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, p.62, 2009.

J. Du and J. Y. Leung, Complexity of scheduling parallel task systems, SIAM Journal on Discrete Mathematics, vol.2, issue.4, p.99, 1989.

U. Drepper, Futexes are tricky, Tech. rep., Citeseer, p.65, 2005.

J. O. Eklundh, A Fast Computer Method for Matrix Transposing, IEEE Transactions on Computers, vol.21, issue.7, p.32, July 1972.

E. Kobra and M. Naghibzadeh, « A min-min max-min selective algorihtm for grid task scheduling, Central Asia on Internet. IEEE, p.62, 2007.

O. M. Elzeki, M. Z. Reshad, and M. A. Elsoud, « Improved max-min algorithm in cloud computing, International Journal of Computer Applications, vol.50, p.62, 2012.

T. El-Ghazawi and L. Smith, UPC : unified parallel C, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, p.58, 2006.

P. Th. Eugster, P. A. Felber, R. Guerraoui, and A.-M. Kermarrec, The many faces of publish/subscribe, ACM Computing Surveys, vol.35, p.72, 2003.

D. G. Feitelson, L. Rudolph, U. Schwiegelshohn, K. C. Sevcik et al., « Theory and practice in parallel job scheduling », Workshop on Job Scheduling Strategies for Parallel Processing, p.61, 1997.

F. Keir and H. Tim, Concurrent programming without locks, vol.25, p.64, 2007.

I. Foster and N. T. Karonis, « A grid-enabled MPI : Message passing in heterogeneous distributed computing systems », SC'98 : Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, p.57, 1998.

F. Paul and L. Christian, Encyclopedia of parallel computing, p.58, 2011.

D. G. Feitelson and L. Rudolph, « Toward convergence in job schedulers for parallel supercomputers », Workshop on Job Scheduling Strategies for Parallel Processing, p.61, 1996.

D. G. Feitelson and L. Rudolph, Workshop on Job Scheduling Strategies for Parallel Processing, p.61, 1998.

F. Ian, R. Alain, and S. Volker, « A quality of service architecture that combines resource reservation and application adaptation, Eighth International Workshop on Quality of Service. IWQoS, p.117, 2000.

G. Edgar, E. Graham, G. Fagg, and . Bosilca, « Open MPI : Goals, concept, and design of a next generation MPI implementation, European Parallel Virtual Machine/Message Passing Interface Users' Group Meeting, p.57, 2004.

G. Thierry, V. F. Joao, N. Lima, . Maillard, R. Bruno et al., A runtime system for data-flow task programming on heterogeneous architectures, p.63, 2013.

G. Fred, K. Lars, and K. Bo, « Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion, ACM Trans. Math. Softw, vol.38, issue.3, p.32, 2012.

L. A. Grieco and S. Mascolo, « Performance evaluation and comparison of Westwood+, New Reno, and Vegas TCP congestion control », ACM SIGCOMM Computer Communication Review, vol.34, issue.2, p.116, 2004.

P. Godard, V. Loechner, C. Bastoul, F. Soulier, and G. Muller, « A Flexible and Distributed Runtime System for High-Throughput Constrained Data Streams Generation, 20th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC) in conjunction with IPDPS, vol.147, p.67

P. G. , Échanges non bloquants de données ordonnées entre producteurs multiples et consommateur unique, Conférence d'informatique en Parallélisme, vol.147, p.101

G. Inigo, J. Ferran, and R. Nou, Energy-aware scheduling in virtualized datacenters, 2010 IEEE International Conference on Cluster Computing. IEEE. 2010, p.62

G. Pedro, C. Ribeiro, F. Paulo, and . Heimdhal, A history-based policy engine for grids, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06). T. 1. IEEE, vol.8, p.63, 2006.

G. Anders, S. Håkan, and T. Philippas, « Cache-aware lock-free queues for multiple producers/consumers and weak memory consistency, International Conference On Principles Of Distributed Systems, p.64, 2010.

J. L. Gustafson, Reevaluating Amdahl's law, Communications of the ACM, vol.31, p.83, 1988.

E. Guy, A. Schalnat, J. Dilger, G. Bowler, C. Randers-pehrson et al., , p.15

H. Mark, « Optimizing parallel reduction in CUDA, Nvidia developer technology, vol.2, p.57, 2007.

H. Matthias, K. Odej, A. Keller, and S. Achim, Scheduling in HPC resource management systems : Queuing vs. planning, p.63, 2003.

M. Hammoud and M. F. Sakr, « Locality-aware reduce task scheduling for MapReduce », 2011 IEEE Third International Conference on Cloud Computing Technology and Science, p.62, 2011.

X. He, X. Sun, and G. von Laszewski, « QoS guided min-min heuristic for grid task scheduling », Journal of Computer Science and Technology, vol.18, p.62, 2003.

W. C. Hsieh and W. E. Weihl, « Scalable reader-writer locks for parallel systems », Proceedings Sixth International Parallel Processing Symposium, p.64, 1992.

J. K. Iliffe, « The use of the genie system in numerical calculation », International Tracts in Computer Science and Technology and Their Application, vol.2, p.29, 1961.

J. Alexandra, C. Philippe, D. Jean-françois, L. Vincent, J. Manuel et al., « Dynamic and Speculative Polyhedral Parallelization Using Compiler-Generated Skeletons, International Journal of Parallel Programming, vol.42, p.59, 2014.

R. Jimenez-peris, M. Patino-martinez, and S. Arevalo, « Multithreaded rendezvous : a design pattern for distributed rendezvous, p.64, 1999.

K. Rashid, B. Rajkishore, and S. Tatiana, Adaptive heterogeneous scheduling for integrated GPUs, 23rd International Conference on Parallel Architecture and Compilation Techniques (PACT), vol.99, p.62, 2014.

S. D. Kaushik, C. Huang, R. W. Johnson, P. Sadayappan, and J. R. Johnson, « Efficient Transposition Algorithms for Large Matrices, Proceedings of the 1993 ACM/IEEE Conference on Supercomputing. Supercomputing '93, p.32, 1993.

D. Kliazovich, P. Bouvry, and S. U. Khan, « DENS : data center energy-efficient network-aware scheduling », Cluster Computing, vol.16, p.62, 2013.

A. H. Karp and H. P. Flatt, « Measuring parallel processor performance », Communications of the ACM, vol.33, p.83, 1990.

H. Charles, . Koelbel, B. David, . Loveman, S. Robert et al., The high performance Fortran handbook, p.58, 1994.

K. Hermann, « Event-triggered versus time-triggered real-time systems, Operating Systems of the 90s and Beyond, p.91, 1991.

S. Krishnamoorthy, G. Baumgartner, D. Cociorva, C. Lam, and P. Sadayappan, « Efficient parallel out-of-core matrix transposition, Proceedings IEEE International Conference on Cluster Computing, p.32, 2003.

S. Krishnamoorthy, G. Baumgartner, D. Cociorva, C. Lam, and P. Sadayappan, « Efficient parallel out-of-core matrix transposition », International Journal on High Performance Computing and Networking, vol.2, issue.4, p.32, 2006.

K. Jacek and S. Rafal, Completely Fair Scheduler and its tuning, Tech. rep., p.62, 2009.

N. T. Karonis, B. Toonen, and I. Foster, MPICH-G2 : A grid-enabled implementation of the message passing interface, Journal of Parallel and Distributed Computing, vol.63, p.57, 2003.

K. LaCurts, S. Deng, A. Goyal, and H. Balakrishnan, Choreo : Network-aware task placement for cloud applications, Proceedings of the 2013 conference on Internet measurement conference, p.63, 2013.

L. Leslie, « A fast mutual exclusion algorithm, ACM Trans. Comput. Syst. 5, vol.1, p.64, 1987.

P. P. C. Lee, T. Bu, and G. Chandranmenon, « A lock-free, cache-efficient shared ring buffer for multi-core architectures », Proceedings of the 5th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, p.64, 2009.

N. M. Lê, A. Guatto, A. Cohen, and A. Pop, « Correct and efficient bounded FIFO queues », 25th International Symposium on Computer Architecture and High Performance Computing. IEEE, p.64, 2013.

T. M. Lehmann, C. Gönner, and K. Spitzer, Survey : Interpolation methods in medical image processing, IEEE Transactions on Medical Imaging, vol.18, p.15, 1999.

Linux man-pages, Linux Programmer's Manual - TCP protocol - tcp

L. Ying, H. Lei, and W. U. Mingchuan, PPOpenCL : a performance-portable OpenCL compiler with host and kernel thread code fusion, Proceedings of the 28th International Conference on Compiler Construction, p.57, 2019.

J. K. Lenstra, A. H. G. Rinnooy Kan, and P. Brucker, « Complexity of machine scheduling problems », Annals of Discrete Mathematics, vol.1, p.54, 1977.

S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, « Coded MapReduce », 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), p.60, 2015.

B. Landwehr, P. Marwedel, and R. Dömer, OSCAR : Optimum simultaneous scheduling, allocation and resource binding based on integer programming, p.89, 1994.

K. Lakshmanan, D. de Niz, and R. Rajkumar, « Coordinated task scheduling, allocation and synchronization on multiprocessors », 30th IEEE Real-Time Systems Symposium, p.89, 2009.

X. Li and M. T. Orchard, « New edge-directed interpolation », IEEE Transactions on Image Processing, vol.10, p.15, 2001.

L. Vincent, PolyLib : A library for manipulating parameterized polyhedra, p.58

J. C. Lin and S. Paul, « RMTP : A reliable multicast transport protocol », Proceedings of IEEE INFOCOM'96. Conference on Computer Communications, vol.3, IEEE, p.99, 1996.

L. John, S. Lui, and D. Yuqin, « The rate monotonic scheduling algorithm : Exact characterization and average case behavior, Proceedings. Real-Time Systems Symposium. IEEE, p.62, 1989.

M. Luo, D. K. Panda, K. Z. Ibrahim, and C. Iancu, « Congestion avoidance on manycore high performance computing systems », Proceedings of the 26th ACM international conference on Supercomputing, p.98, 2012.

L. Ronnier, Encyclopedia of color science and technology

J. Liu, J. Wu, and D. K. Panda, « High performance RDMA-based MPI implementation over InfiniBand », International Journal of Parallel Programming, vol.32, p.65, 2004.

K. Lakshman, Y. Raj, and F. Raphael, « Integrated CPU and network-I/O QoS management in an endsystem, Computer Communications, vol.21, p.117, 1998.

M. Rafael, J. Fadi, M. Kurdahi, and . Fernández, A framework for reconfigurable computing : task scheduling and context management, vol.9, p.63, 2001.

M. Keith, The colour science of dyes and pigments, p.14, 1986.

M. Xiangrui, B. Joseph, and Y. Burak, Machine learning in apache spark, The Journal of Machine Learning Research, vol.17, p.60, 2016.

M. Ján, Color gamut mapping. T. 10, p.16, 2008.

, Message-Passing Interface (MPI) Standard, Version 3, MPI FORUM, p.57

J. M. Mellor-Crummey and M. L. Scott, « Scalable reader-writer synchronization for shared-memory multiprocessors », ACM SIGPLAN Notices, vol.26, issue.7, ACM, p.64, 1991.

C. J. Newburn, S. Dmitriev, and R. Narayanaswamy, Offload compiler runtime for the Intel® Xeon Phi coprocessor, 2013 IEEE International Symposium on Parallel & Distributed Processing, p.57, 2013.

O. Victor, R. David, and H. , « Stochastic clustered-dot dithering, Journal of electronic imaging, vol.8, p.17, 1999.

O. Gerald and N. Yasunori, Moiré patterns, vol.208, p.17, 1963.

. Openmp-arb, The OpenMP API 5.0 specification for parallel programming, vol.56, p.35, 2018.

V. S. Pai, M. Aron, and G. Banga, Locality-aware request distribution in cluster-based network servers, ACM SIGPLAN Notices, vol.33, p.63, 1998.

P. Balaji, S. Aameek, L. Ling, and J. Bhushan, « Purlieus : locality-aware resource allocation for MapReduce in a cloud, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, p.62, 2011.

J. F. Patterson, R. D. Hill, S. L. Rohall et al., Rendezvous : An architecture for synchronous multi-user applications, Proceedings of the 1990 ACM conference on Computer-supported cooperative work, p.64, 1990.

P. Jonathan, O. Amy, H. Balakrishnan, D. Shah, F. Hans et al., A centralized zero-queue datacenter network, ACM SIGCOMM Computer Communication Review, vol.44, p.62, 2015.

P. Anthony, V. Robert, . Kenyon, E. Donald, and . Troxel, « Comparison of interpolating methods for image resampling, IEEE Transactions on medical imaging, vol.2, p.15, 1983.

P. Jorda, C. Claris, and C. David, Resource-aware adaptive scheduling for mapreduce clusters, Proceedings of the 12th International Middleware Conference. International Federation for Information Processing, p.89, 2011.

P. Ioan, Method for generating customized ink/media transforms, p.16, 2007.

P. Harsh, R. Manas, and P. Aniket, « Introduction to real-time processing in Apache Apex, Int. J. Res. Advent Technol, p.61, 2016.

Q. Yue, Color conversion with toner/ink limitations. US Patent 8,395,831, p.16, 2013.

M. J. Rashti and A. Afsahi, 10-Gigabit iWARP Ethernet : comparative performance analysis with InfiniBand and Myrinet-10G, p.65, 2007.

J. Ragan-Kelley, C. Barnes, A. Adams et al., Halide : a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, ACM SIGPLAN Notices, vol.48, issue.6, ACM, p.59, 2013.

R. Erik, A. Michael, G. Bruce, and S. Peter, Color transfer between images, vol.21, p.16, 2001.

R. Braden, T/TCP - TCP Extensions for Transactions Functional Specification, p.116

V. Paxson, M. Allman, J. Chu, and M. Sargent, Computing TCP's Retransmission Timer, p.114

M. Scharf and A. Ford, Multipath TCP (MPTCP) Application Interface Considerations, p.117

D. Borman, B. Braden, V. Jacobson, and R. Scheffenegger, TCP Extensions for High Performance, p.116

R. Krishnan, L. Yong, A. Ghanwani et al., Mechanisms for Optimizing Link Aggregation Group (LAG) and Equal-Cost Multipath (ECMP) Component Link Utilization in Networks, p.117

J. Nagle, Congestion Control in IP/TCP Internetworks, p.116

M. Ravishankar, J. Holewinski, and V. Grover, Forma : A DSL for image processing applications to target GPUs and multi-core CPUs, Proceedings of the 8th Workshop on General Purpose Processing using GPUs, p.59, 2015.

R. Rolf, G. Hager, and J. Gabriele, « Hybrid MPI/OpenMP parallel programming on clusters of multi-core SMP nodes, p.57, 2009.

R. Raman, M. Livny, and M. Solomon, Matchmaking : Distributed resource management for high throughput computing, 17th International Symposium on High Performance Distributed Computing (HPDC), vol.69, p.63, 1998.

A. Roy, I. Mihailovic, and W. Zwaenepoel, « X-Stream : Edge-centric graph processing using streaming partitions », Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, p.61, 2013.

J. Christopher, . Rossbach, Y. U. Yuan, C. Jon, J. Martin et al., « Dandelion : a compiler and runtime for heterogeneous systems, Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, p.58, 2013.

R. Krithi, J. A. Stankovic, and Z. Wei, « Distributed scheduling of tasks with deadlines and resource requirements, IEEE Transactions on Computers, vol.38, p.63, 1989.

L. Rizzo and V. Lorenzo, A Reliable Multicast data Distribution Protocol based on software FEC techniques, p.99, 1997.

S. Leffler et al., Home of the LibTIFF software, p.15

S. Marco, C. Giorgio, and . Buttazzo, « Efficient Aperiodic Service Under Earliest Deadline Scheduling, p.62, 1994.

F. Siqueira and C. Vinny, Delivering QoS in open distributed systems, Proceedings 7th IEEE Workshop on Future Trends of Distributed Computing Systems. IEEE, p.117, 1999.

S. Luis, C. Daniel, and C. Raphael, « Plb-hec : A profile-based load-balancing algorithm for heterogeneous cpu-gpu clusters », 2015 IEEE International Conference on Cluster Computing, p.99, 2015.

S. Günter, « Scheduling with limited machine availability, European Journal of Operational Research, vol.121, p.54, 2000.

S. Oren, C. Philip, N. Nasibeh, Q. Zhuo, M. Martin et al., A unified programming framework for accelerators on heterogeneous clusters, p.60, 2015.

W. R. Stevens, B. Fenner, and A. M. Rudoff, UNIX Network Programming : The Sockets Networking API, vol.1, p.113, 2004.

S. Bianca and G. Garth, « A large-scale study of failures in high-performance computing systems, IEEE transactions on Dependable and Secure Computing, vol.7, p.70, 2009.

J. E. Stone, D. Gohara, and G. Shi, OpenCL : A parallel programming standard for heterogeneous computing systems, Computing in Science & Engineering, vol.12, p.57, 2010.

J. Sanders and K. Edward, CUDA by example : an introduction to general-purpose GPU programming, p.57, 2010.

M. Andreas, S. Markus, and O. , Storage and retrieval for image and video databases III. T. 2420. International Society for Optics et Photonics, p.16, 1995.

S. Jinwoo and V. K. Prasanna, An efficient algorithm for out-of-core matrix transposition, IEEE Transactions on Computers, vol.51, p.32

S. Adam and T. Marek, « RDMA communication based on rotating buffers for efficient parallel fine-grain computations, International Conference on Parallel Processing and Applied Mathematics, p.65, 2003.

S. Michel, F. Christian, L. Sam, and D. Christophe, « Generating performance portable code using rewrite rules : from high-level functional expressions to high-performance OpenCL code, ACM SIGPLAN Notices, vol.50, p.57, 2015.

S. Tim, An introduction to the Partitioned Global Address Space (PGAS) programming model. Connexions, p.58, 2009.

S. Madhavapeddi and V. George, « Efficient fair queuing using deficit round-robin, IEEE/ACM Transactions on networking, vol.3, p.62, 1996.

T. Paulo, « Event-triggered real-time scheduling of stabilizing control tasks, IEEE Transactions on Automatic Control, vol.52, p.91, 2007.

T. Philippe, B. Thierry, and U. Michael, Handbook of medical imaging, processing and analysis, vol.1, p.15, 2000.

T. Ashish, J. Sen, S. Namit, and J. , Hive : a warehousing solution over a map-reduce framework, Proceedings of the VLDB Endowment, vol.2, p.60, 2009.

H. Topcuoglu, S. Hariri, and M. Wu, « Performance-effective and low-complexity task scheduling for heterogeneous computing », IEEE Transactions on Parallel and Distributed Systems, vol.13, p.99, 2002.

T. Massimo, Single-producer/single-consumer queues on shared cache multi-core systems, p.64, 2010.

A. Toptal and I. Sabuncuoglu, Distributed scheduling : a review of concepts and applications, International Journal of Production Research, vol.48, p.63, 2010.

R. Thonangi and J. Yang, « Permuting Data on Random-access Block Storage », Proceedings of the VLDB Endowment, vol.6, p.32

D. U. Jeffrey, NP-complete scheduling problems, Journal of Computer and System sciences, vol.10, issue.3, p.54, 1975.

J. D. Valois, Lock-free linked lists using compare-and-swap, PODC, vol.95, p.64, 1995.

V. K. Vavilapalli, A. C. Murthy, C. Douglas et al., Apache Hadoop YARN : Yet Another Resource Negotiator, Proceedings of the 4th Annual Symposium on Cloud Computing. SOCC '13, vol.5, p.63, 2013.

V. Shivaram, P. Aurojit, and O. Kay, Fast and adaptable stream processing at scale, Proceedings of the 26th Symposium on Operating Systems Principles, p.61, 2017.

V. Sven, C. Juan, A. Juega, and . Cohen, « Polyhedral parallel code generation for CUDA, ACM Transactions on Architecture and Code Optimization (TACO), vol.9, p.59, 2013.

V. Sven, F. De-komei, H. Joris, J. Michael, and T. Nobuki, An Integer Set Library for the Polyhedral Model, Mathematical Software (ICMS'10), vol.6327, p.58, 2010.

V. Mehul-nalin, Proceedings of 2011 International Conference on Computer Science and Network Technology. T. 1. IEEE, p.60, 2011.

D. C. Walden, Systems for Interprocess Communication in a Resource Sharing Computer Network, vol.15, p.64, 1972.

W. Lizhe, V. Gregor, J. Laszewski, . Dayal, and W. Fugang, « Towards energy aware scheduling for precedence constrained parallel tasks in a cluster with DVFS, Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, p.62, 2010.

M. C. Whitton, Memory design for raster graphics displays, IEEE Computer Graphics and Applications, vol.4, p.64, 1984.

S. Wienke, C. Terboven, J. C. Beyer et al., « A pattern-based comparison of OpenACC and OpenMP for accelerator computing », European Conference on Parallel Processing, p.57, 2014.

W. Chee-siang, I. T. , R. Deena, K. Et-fun, and W. , Towards achieving fairness in the Linux scheduler, vol.42, p.62, 2008.

T. Wagner and D. Towsley, Getting started with POSIX threads, Tech. rep., University of Massachusetts at Amherst, p.65, 1995.

J.-H. Yang and J. H. Anderson, « A fast, scalable mutual exclusion algorithm », Distributed Computing, vol.9, p.64, 1995.

T. Young, « II. The Bakerian Lecture. On the theory of light and colours », Philosophical Transactions of the Royal Society of London, vol.92, p.16, 1802.

M. Zaharia, D. Borthakur, and J. Sen Sarma, « Delay scheduling : a simple technique for achieving locality and fairness in cluster scheduling », Proceedings of the 5th European conference on Computer systems, p.62, 2010.

Z. Matei, C. Mosharaf, and D. Tathagata, « Resilient distributed datasets : A fault-tolerant abstraction for in-memory cluster computing, Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. USENIX Association, p.60, 2012.

Z. Matei, S. Reynold, . Xin, and W. Patrick, Apache spark : a unified engine for big data processing, Communications of the ACM, vol.59, p.60, 2016.

A. Z. , Restructuring and implementations of 2D matrix transpose algorithm using SSE4 vector instructions, Computer Science and Engineering (ICAR), p.32, 2015.

C. Zinner and W. Kubinger, « ROS-DMA : a DMA double buffering method for embedded image processing with resource optimized slicing », 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'06), p.64, 2006.

Execution time of the cp(1) program as a function of the matrix size and of the mass-storage configuration

Execution time of the different implementations on the HDD configuration, p.44

Execution time of the different implementations on the hybrid HDD configuration

Execution time of the different implementations on the SSD configuration, p.45

Overview and division of roles among the three components of our solution, with T for job (Travail), S for Scheduler, P for Producer, and C for Consumer, p.68

Representation of the possible topologies between producers and consumers, with T for job (Travail), S for Scheduler, P for Producer, and C for Consumer

Timeline of the messages exchanged to identify the producers and consumers with the scheduler, with S for Scheduler, C for Consumer, and P for Producer

Timeline of the messages exchanged from the reception of a job to its completion, including its division into successive tasks (Preload, Compute, Expel, and Cleanup), with U for user (Utilisateur), S for Scheduler, C for Consumer, and P for Producer

Internal organization of our solution, highlighting the low-latency path (left) that drives the high-throughput path (right), p.77

Directed graph showing the possible sequencing of tasks on a producer for a given job, with P for Preload, C for Compute, X for Expel, and F for Cleanup

Pseudo-code for assigning a new job to the producers of the cluster, p.90