W. A. Wulf and S. A. McKee, Hitting the memory wall : Implications of the obvious, Computer Architecture News, vol.23, issue.1, pp.20-24, 1995.

D. Geer, Chip makers turn to multicore processors, Computer, vol.38, issue.5, pp.11-13, 2005.
DOI : 10.1109/MC.2005.160

J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer et al., Introduction to the Cell multiprocessor, IBM Journal of Research and Development, vol.49, issue.4.5, pp.589-604, 2005.
DOI : 10.1147/rd.494.0589

S. Borkar, N. P. Jouppi, and P. Stenstrom, Microprocessors in the Era of Terascale Integration, 2007 Design, Automation & Test in Europe Conference & Exhibition, pp.237-242, 2007.
DOI : 10.1109/DATE.2007.364597

F. Cappello and D. Etiemble, MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks, ACM/IEEE SC 2000 Conference (SC'00), p.12, 2000.
DOI : 10.1109/SC.2000.10001

M. Frigo, C. E. Leiserson, and K. H. Randall, The implementation of the Cilk-5 multithreaded language, PLDI '98 : Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, pp.212-223, 1998.

J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger et al., A Survey of General-Purpose Computation on Graphics Hardware, Computer Graphics Forum, vol.7, issue.4, pp.80-113, 2007.
DOI : 10.1016/j.rti.2005.04.002

W. R. Mark, R. S. Glanville, K. Akeley, and M. J. Kilgard, Cg : a system for programming graphics hardware in a C-like language, SIGGRAPH '03 : ACM SIGGRAPH 2003 Papers, pp.896-907, 2003.

R. J. Rost, OpenGL(R) Shading Language, 2005.

A. Bleiweiss and A. Preetham, Ashli : Advanced Shading Language Interface, 2003.

M. McCool, S. Du Toit, T. Popa, B. Chan, and K. Moule, Shader algebra, SIGGRAPH '04 : ACM SIGGRAPH 2004 Papers, pp.787-795, 2004.

I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian et al., Brook for GPUs : stream computing on graphics hardware, SIGGRAPH '04 : ACM SIGGRAPH 2004 Papers, pp.777-786, 2004.

P. McCormick, J. Inman, J. Ahrens, J. Mohd-Yusof, G. Roth et al., Scout: a data-parallel programming language for graphics processors, Parallel Computing, vol.33, issue.10-11, p.648, 2007.
DOI : 10.1016/j.parco.2007.09.001

A. E. Lefohn, S. Sengupta, J. Kniss, R. Strzodka, and J. D. Owens, Glift, ACM Transactions on Graphics, vol.25, issue.1, pp.60-99, 2006.
DOI : 10.1145/1122501.1122505

S. Williams, J. Carter, L. Oliker, J. Shalf, and K. Yelick, Lattice Boltzmann simulation optimization on leading multicore platforms, 2008 IEEE International Symposium on Parallel and Distributed Processing, 2008.
DOI : 10.1109/IPDPS.2008.4536295
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.139.2646

C. H. Crawford, P. Henning, M. Kistler, and C. Wright, Accelerating computing with the Cell Broadband Engine processor, Proceedings of the 2008 conference on Computing frontiers, CF '08, pp.3-12, 2008.
DOI : 10.1145/1366230.1366234

D. Kunzman, G. Zheng, E. Bohm, and L. V. Kalé, Charm++, Offload API, and the Cell Processor, Proceedings of the Workshop on Programming Models for Ubiquitous Parallelism, 2006.

B. Bouzas, R. Cooper, J. Greene, M. Pepe, and M. J. Prelle, Multicore framework : An API for programming heterogeneous multicore processors, Proc. of First Workshop on Software Tools for Multi-Core Systems, 2006.

M. Morita, T. Machino, M. Guo, and G. Wang, Design and implementation of stream processing system and library for cell broadband engine processors, Proceeding (590) Parallel and Distributed Computing and Systems, 2007.

P. Bellens, J. M. Perez, R. M. Badia, and J. Labarta, CellSs: a Programming Model for the Cell BE Architecture, ACM/IEEE SC 2006 Conference (SC'06), p.86, 2006.
DOI : 10.1109/SC.2006.17

M. D. McCool, Data-parallel programming on the Cell BE and the GPU using the RapidMind development platform, 2006.

M. D. Linderman, J. D. Collins, H. Wang, and T. H. Meng, Merge : a programming model for heterogeneous multi-core systems, ASPLOS XIII, 2008.

M. Haldar, A. Nayak, A. Kanhere, P. Joisha, N. Shenoy et al., A library-based compiler to execute MATLAB programs on a heterogeneous platform

K. Fatahalian, T. J. Knight, M. Houston, M. Erez, D. R. Horn et al., Sequoia: Programming the Memory Hierarchy, ACM/IEEE SC 2006 Conference (SC'06), 2006.
DOI : 10.1109/SC.2006.55

B. Meister, R. Lethin, A. Leung, and E. Schweitz, R-stream : A parametric high level compiler

R. Dolbeau, S. Bihan, and F. Bodin, HMPP : A hybrid multi-core parallel programming environment

J. Stratton, S. Stone, and W. W. Hwu, MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs, 2008.
DOI : 10.1007/978-3-540-89740-8_2

S. Pakin, Receiver-initiated message passing over RDMA Networks, 2008 IEEE International Symposium on Parallel and Distributed Processing, 2008.
DOI : 10.1109/IPDPS.2008.4536262

M. Ohara, H. Inoue, Y. Sohda, H. Komatsu, and T. Nakatani, MPI microtask for programming the Cell Broadband Engine processor, IBM Systems Journal, vol.45, issue.1, 2006.

R. C. Whaley, A. Petitet, and J. J. Dongarra, Automated empirical optimizations of software and the ATLAS project, Parallel Computing, vol.27, issue.1-2, pp.3-35, 2001.

S. Moreaud and B. Goglin, Impact of NUMA Effects on High-Speed Networking with Multi-Opteron Machines, The 19th IASTED International Conference on Parallel and Distributed Computing and Systems, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00175747

S. Moreaud, Impact des architectures multiprocesseurs sur les communications dans les grappes de calcul : de l'exploration des effets numa au placement automatique, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00177495

S. Thibault, R. Namyst, and P. Wacrenier, Building Portable Thread Schedulers for Hierarchical Multiprocessors: The BubbleSched Framework, EuroPar, 2007.
DOI : 10.1007/978-3-540-74466-5_6
URL : https://hal.archives-ouvertes.fr/inria-00154506

O. Aumage, E. Brunet, N. Furmento, and R. Namyst, Newmadeleine : a fast communication scheduling engine for high performance networks, CAC 2007 : Workshop on Communication Architecture for Clusters, held in conjunction with IPDPS 2007, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00122723

J. Sancho and D. Kerbyson, Analysis of double buffering on two different multicore architectures: Quad-core Opteron and the Cell-BE, 2008 IEEE International Symposium on Parallel and Distributed Processing, 2008.
DOI : 10.1109/IPDPS.2008.4536316

R. D. Blumofe and C. E. Leiserson, Scheduling multithreaded computations by work stealing, J. ACM, vol.46, issue.5, pp.720-748, 1999.

V. Danjean, R. Namyst, and P. Wacrenier, An Efficient Multi-level Trace Toolkit for Multi-threaded Applications, EuroPar, 2005.
DOI : 10.1007/11549468_21
URL : https://hal.archives-ouvertes.fr/hal-00360309

J. Garrigues, Initiation à la méthode des éléments finis, 2002.

G. Allaire, Analyse numérique et optimisation, Éditions de l'École Polytechnique, 2005.

L. Agbezuge, Finite element solution of the Poisson equation with Dirichlet boundary conditions in a rectangular domain