J. A. Ang, R. F. Barrett, R. E. Benner, D. Burke, C. Chan et al., Abstract Machine Models and Proxy Architectures for Exascale Computing, 2014 Hardware-Software Co-Design for High Performance Computing, pp.25-32, 2014.
DOI : 10.1109/Co-HPC.2014.4

URL : http://cacs.usc.edu/education/cs596/ExcascaleMachineModelsArchitectures.pdf

Barcelona Supercomputing Center, Extrae: a Paraver trace-files generator

Barcelona Supercomputing Center, Paraver: a flexible performance analysis tool

Barcelona Supercomputing Center, The OmpSs Programming Model

M. Bauer, S. Treichler, E. Slaughter, and A. Aiken, Legion: Expressing locality and independence with logical regions, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2012.
DOI : 10.1109/SC.2012.71

URL : http://theory.stanford.edu/%7Eaiken/publications/papers/sc12.pdf

M. Besta and T. Hoefler, Slim Fly: A Cost Effective Low-Diameter Network Topology, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, pp.348-359, 2014.
DOI : 10.1109/SC.2014.34

URL : http://unixer.de/publications/img/sf_sc_2014.pdf

G. Bikshandi, J. Guo, D. Hoeflinger, G. Almasi, B. B. Fraguela et al., Programming for parallelism and locality with hierarchically tiled arrays, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP '06, pp.48-57, 2006.
DOI : 10.1145/1122971.1122981

URL : http://polaris.cs.uiuc.edu/~garzaran/doc/ppopp06.pdf

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A practical automatic polyhedral parallelizer and locality optimizer, Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '08, pp.101-113, 2008.
DOI : 10.1145/1379022.1375595

URL : http://www.cse.ohio-state.edu/~bondhugu/publications/uday-pldi08.pdf

P. J. Braam, The Lustre storage architecture, 2003.

N. Capit, G. Da Costa, Y. Georgiou, G. Huard, C. Martin et al., A batch scheduler with high level components, CCGrid 2005: IEEE International Symposium on Cluster Computing and the Grid, pp.776-783, 2005.
DOI : 10.1109/CCGRID.2005.1558641

URL : https://hal.archives-ouvertes.fr/hal-00005106

P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, PVFS: A parallel file system for Linux clusters, Proceedings of the 4th Annual Linux Showcase and Conference, pp.317-327, 2000.

J. C. Carver, Software Engineering for Computational Science and Engineering, Computing in Science & Engineering, vol.14, issue.2, pp.8-11, 2012.
DOI : 10.1109/MCSE.2012.31

B. L. Chamberlain, D. Callahan, and H. P. Zima, Parallel Programmability and the Chapel Language, The International Journal of High Performance Computing Applications, vol.21, issue.3, pp.291-312, 2007.
DOI : 10.1177/1094342007078442

URL : http://www.cs.utexas.edu/%7Elin/cs380p/chapel07.pdf

B. L. Chamberlain, S. Choi, E. C. Lewis, C. Lin, L. Snyder et al., ZPL: a machine independent programming language for parallel computers, IEEE Transactions on Software Engineering, vol.26, issue.3, pp.197-211, 2000.
DOI : 10.1109/32.842947

URL : http://www.cs.utexas.edu/users/lin/papers/tse99.pdf

M. Dorier, G. Antoniu, R. Ross, D. Kimpe, and S. Ibrahim, CALCioM: Mitigating I/O interference in HPC systems through cross-application coordination, Proceedings of the International Parallel and Distributed Processing Symposium, 2014.
DOI : 10.1109/ipdps.2014.27

URL : http://hal.inria.fr/docs/00/91/60/91/PDF/CALCioM.pdf

A. Dubey, A. Almgren, J. Bell, M. Berzins, S. Brandt et al., A survey of high level frameworks in block-structured adaptive mesh refinement packages, Journal of Parallel and Distributed Computing, vol.74, issue.12, pp.3217-3227, 2014.
DOI : 10.1016/j.jpdc.2014.07.001

A. Dubey, S. Brandt, R. Brower, M. Giles, P. Hovland et al., Software abstractions and methodologies for HPC simulation codes on future architectures, Journal of Open Research Software, vol.2, issue.1, 2014.

H. C. Edwards, C. R. Trott, and D. Sunderland, Kokkos: Enabling manycore performance portability through polymorphic memory access patterns, Journal of Parallel and Distributed Computing, vol.74, issue.12, pp.3202-3216, 2014.

M. Frigo, C. E. Leiserson et al., Cache-oblivious algorithms, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039), pp.285-297, 1999.

T. Fuchs and K. Fuerlinger, Expressing and exploiting multidimensional locality in DASH, Proceedings of the SPPEXA Symposium 2016, 2016.
DOI : 10.1007/978-3-319-40528-5_15

K. Fuerlinger, C. Glass, J. Gracia, A. Knüpfer, J. Tao et al., DASH: Data Structures and Algorithms with Support for Hierarchical Locality, Euro-Par Workshops, 2014.
DOI : 10.1007/978-3-319-14313-2_46

M. Garland, M. Kudlur, and Y. Zheng, Designing a unified programming model for heterogeneous machines, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, 2012.
DOI : 10.1109/SC.2012.48

URL : http://mgarland.org/files/papers/phalanx-sc12-preprint.pdf

M. Geimer, F. Wolf, B. J. Wylie, E. Ábrahám, D. Becker et al., The Scalasca performance toolset architecture, Concurrency and Computation: Practice and Experience, pp.702-719, 2010.
DOI : 10.1002/cpe.1556

URL : http://www.fz-juelich.de/jsc/hgfgroup/show_attach.php?pubid=31

B. Goglin, Managing the topology of heterogeneous cluster nodes with hardware locality (hwloc), 2014 International Conference on High Performance Computing & Simulation (HPCS), 2014.
DOI : 10.1109/HPCSim.2014.6903671

URL : https://hal.archives-ouvertes.fr/hal-00985096

B. Goglin, J. Hursey, and J. M. Squyres, Netloc: Towards a Comprehensive View of the HPC System Topology, 2014 43rd International Conference on Parallel Processing Workshops, 2014.
DOI : 10.1109/ICPPW.2014.38

URL : https://hal.archives-ouvertes.fr/hal-01010599

T. Grosser and T. Hoefler, Polly-ACC: Transparent compilation to heterogeneous hardware, Proceedings of the 2016 International Conference on Supercomputing, ICS '16, 2016.

T. Gysi, T. Grosser, and T. Hoefler, MODESTO: Data-centric analytic optimization of complex stencil programs on heterogeneous architectures, Proceedings of the 29th ACM on International Conference on Supercomputing, ICS '15, pp.177-186, 2015.

R. L. Henderson, Job scheduling under the Portable Batch System, Job scheduling strategies for parallel processing, pp.279-294, 1995.
DOI : 10.1007/3-540-60153-8_34

L. Hochstein and V. R. Basili, The ASC-Alliance Projects: A Case Study of Large-Scale Parallel Scientific Code Development, Computer, vol.41, issue.3, pp.50-58, 2008.
DOI : 10.1109/MC.2008.101

T. Hoefler, J. Dinan, D. Buntinas, P. Balaji, B. Barrett et al., MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory, Computing, vol.95, issue.12, pp.1121-1136, 2013.
DOI : 10.1007/s00607-013-0324-2

R. Hornung and J. Keasler, The RAJA portability layer: Overview and status, 2014.
DOI : 10.2172/1169830

Hwloc, Portable Hardware Locality

Intel Open Source, Hetero Streams Library, https://01.org/hetero-streams-library

E. Jeannot, E. Meneses, G. Mercier, F. Tessier, and G. Zheng, Communication and topology-aware load balancing in Charm++ with TreeMatch, 2013 IEEE International Conference on Cluster Computing (CLUSTER), 2013.
DOI : 10.1109/CLUSTER.2013.6702666

URL : https://hal.archives-ouvertes.fr/hal-00851148

L. V. Kale and S. Krishnan, Charm++: A portable concurrent object oriented system based on C++, Proceedings of the Eighth Annual Conference on Object-oriented Programming Systems, Languages, and Applications, OOPSLA '93, pp.91-108, 1993.

A. Kamil and K. Yelick, Hierarchical Computation in the SPMD Programming Model, The 26th International Workshop on Languages and Compilers for Parallel Computing, 2013.
DOI : 10.1007/978-3-319-09967-5_1

R. M. Karp, R. E. Miller, and S. Winograd, The Organization of Computations for Uniform Recurrence Equations, Journal of the ACM, vol.14, issue.3, pp.563-590, 1967.
DOI : 10.1145/321406.321418

J. Kim, W. J. Dally, S. Scott, and D. Abts, Technology-Driven, Highly-Scalable Dragonfly Topology, ACM SIGARCH Computer Architecture News, vol.36, issue.3, pp.77-88, 2008.
DOI : 10.1145/1394608.1382129

P. M. Kogge and J. Shalf, Exascale Computing Trends: Adjusting to the "New Normal" for Computer Architecture, Computing in Science & Engineering, vol.15, issue.6, pp.16-26, 2013.
DOI : 10.1109/MCSE.2013.95

J. Li, W. Liao, A. Choudhary, R. Ross, R. Thakur et al., Parallel netCDF, Proceedings of the 2003 ACM/IEEE conference on Supercomputing, SC '03, 2003.
DOI : 10.1145/1048935.1050189

B. Meister, N. Vasilache, D. Wohlford, M. M. Baskaran, A. Leung et al., R-Stream Compiler, Encyclopedia of Parallel Computing, pp.1756-1765, 2011.

J. Mellor-Crummey, L. Adhianto, W. N. Scherer III, and G. Jin, A new vision for coarray Fortran, Proceedings of the Third Conference on Partitioned Global Address Space Programming Models, PGAS '09, pp.1-5, 2009.
DOI : 10.1145/1809961.1809969

URL : http://www.cs.rice.edu/~johnmc/papers/caf2-pgas09.pdf

J. Nakashima, S. Nakatani, and K. Taura, Design and implementation of a customizable work stealing scheduler, Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers, ROSS '13, 2013.
DOI : 10.1145/2491661.2481433

Netloc, Portable Network Locality

T. Nguyen, D. Unat, W. Zhang, A. Almgren, N. Farooqi et al., Perilla: Metadata-Based Optimizations of an Asynchronous Runtime for Adaptive Mesh Refinement, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, pp.81:1-81:12, 2016.
DOI : 10.1109/SC.2016.80

R. W. Numrich and J. Reid, Co-array Fortran for parallel programming, ACM SIGPLAN Fortran Forum, vol.17, issue.2, pp.1-31, 1998.
DOI : 10.1145/289918.289920

URL : http://caf.rice.edu/documentation/nrRAL98060.pdf

S. L. Olivier, A. K. Porterfield, K. B. Wheeler, M. Spiegel, and J. F. Prins, OpenMP task scheduling strategies for multicore NUMA systems, The International Journal of High Performance Computing Applications, vol.26, issue.2, pp.110-124, 2012.

S. G. Parker, A component-based architecture for parallel multi-physics PDE simulation, Future Generation Computer Systems, vol.22, issue.1-2, pp.204-216, 2006.
DOI : 10.1016/j.future.2005.04.001

URL : http://www.cs.utah.edu/projects/sci/publications/sparker06/science.pdf

PADAL Participants, Workshop on Programming Abstractions for Data Locality, PADAL '14, https://sites.google, 2014.

PADAL Participants, Workshop on Programming Abstractions for Data Locality, PADAL '15, https://sites.google, 2015.

J. C. Phillips, R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid et al., Scalable molecular dynamics with NAMD, Journal of Computational Chemistry, vol.26, issue.16, pp.1781-1802, 2005.
DOI : 10.1002/jcc.20289

URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2486339/pdf

L. L. Pilla, P. O. Navaux, C. P. Ribeiro, P. Coucheney, F. Broquedis et al., Asymptotically Optimal Load Balancing for Hierarchical Multi-Core Systems, 2012 IEEE 18th International Conference on Parallel and Distributed Systems, pp.236-243, 2012.
DOI : 10.1109/ICPADS.2012.41

URL : https://hal.archives-ouvertes.fr/hal-00788008

B. Prisacari, G. Rodriguez, P. Heidelberger, D. Chen, C. Minkenberg et al., Efficient task placement and routing of nearest neighbor exchanges in dragonfly networks, Proceedings of the 23rd international symposium on High-performance parallel and distributed computing, HPDC '14, pp.129-140, 2014.
DOI : 10.1145/2600212.2600225

F. Schmuck and R. Haskin, GPFS: A shared-disk file system for large computing clusters, First USENIX Conference on File and Storage Technologies (FAST'02), 2002.

J. Segal and C. Morris, Developing scientific software, IEEE Software, vol.25, issue.4, pp.18-20, 2008.
DOI : 10.1109/MS.2008.85

M. Showerman, J. Enos, J. Fullop, P. Cassella, N. Naksinehaboon et al., Large scale system monitoring and analysis on Blue Waters using OVIS, Proceedings of the 2014 Cray User Group, 2014.

F. Tessier, P. Malakar, V. Vishwanath, E. Jeannot, and F. Isaila, Topology-Aware Data Aggregation for Intensive I/O on Large-Scale Supercomputers, 2016 First International Workshop on Communication Optimizations in HPC (COMHPC), p.10, 2016.
DOI : 10.1109/COMHPC.2016.013

URL : https://hal.archives-ouvertes.fr/hal-01394741

R. Thakur, W. Gropp, and E. Lusk, On implementing MPI-IO portably and with high performance, Proceedings of the Sixth Workshop on Input/Output in Parallel and Distributed Systems, pp.23-32, 1999.

D. Unat, T. Nguyen et al., TiDA: High-Level Programming Abstractions for Data Locality Management.

T. L. Veldhuizen and D. Gannon, Active libraries: Rethinking the roles of compilers and libraries, CoRR, math, 1998.

S. Verdoolaege, J. C. Juega, A. Cohen, J. I. Gómez et al., Polyhedral parallel code generation for CUDA, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, pp.54:1-54:23, 2013.
DOI : 10.1145/2400682.2400713

URL : https://hal.archives-ouvertes.fr/hal-00786677

B. Welch, M. Unangst, Z. Abbasi, G. Gibson, B. Mueller et al., Scalable performance of the Panasas parallel file system, Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST), pp.17-33, 2008.

M. Wimmer, D. Cederman, J. L. Träff, and P. Tsigas, Work-stealing with configurable scheduling strategies, Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, pp.315-316, 2013.
DOI : 10.1145/2517327.2442562

Y. Yan, J. Zhao, Y. Guo, and V. Sarkar, Hierarchical Place Trees: A Portable Abstraction for Task Parallelism and Data Movement, Proceedings of the 22nd International Workshop on Languages and Compilers for Parallel Computing, 2009.
DOI : 10.1007/978-3-642-13374-9_12

URL : http://www.cs.rice.edu/~vs3/PDF/hpt.pdf

K. Yelick, L. Semenzato, G. Pike, C. Miyamoto, B. Liblit et al., Titanium: a high-performance Java dialect, Workshop on Java for High-Performance Network Computing, 1998.
DOI : 10.1002/(SICI)1096-9128(199809/11)10:11/13<825::AID-CPE383>3.0.CO;2-H

URL : http://HTTP.CS.Berkeley.EDU/~dmartin/cs267/readings/titanium.ps

A. B. Yoo, M. A. Jette, and M. Grondona, SLURM: Simple Linux Utility for Resource Management, Job Scheduling Strategies for Parallel Processing, pp.44-60, 2003.
DOI : 10.1007/10968987_3

URL : http://www.cs.huji.ac.il/~feit/parsched/p-03-3.ps.gz

W. Zhang, A. Almgren, M. Day, T. Nguyen, J. Shalf et al., BoxLib with Tiling: An Adaptive Mesh Refinement Software Framework, SIAM Journal on Scientific Computing, vol.38, issue.5, pp.156-172, 2016.
DOI : 10.1137/15M102616X

Y. Zheng, A. Kamil, M. Driscoll, H. Shan, and K. Yelick, UPC++: A PGAS Extension for C++, 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014.
DOI : 10.1109/IPDPS.2014.115

S. Zhou, LSF: Load sharing in large heterogeneous distributed systems, Workshop on Cluster Computing, 1992.