T. Austin and G. Sohi, Dynamic dependency analysis of ordinary programs, ISCA, pp.342-351, 1992.

G. Ballard, J. Demmel, O. Holtz, and O. Schwartz, Minimizing Communication in Numerical Linear Algebra, SIAM Journal on Matrix Analysis and Applications, vol.32, issue.3, pp.866-901, 2011.
DOI : 10.1137/090769156

G. Ballard, J. Demmel, O. Holtz, and O. Schwartz, Graph expansion and communication costs of fast matrix multiplication, Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures, SPAA '11, pp.1-12, 2011.
DOI : 10.1145/1989493.1989495

K. Bergman and S. Borkar, Exascale computing study: Technology challenges in achieving exascale systems, DARPA IPTO, 2008.

G. Bilardi and E. Peserico, A Characterization of Temporal Locality and Its Portability across Memory Hierarchies, Proc. ICALP, pp.128-139, 2001.
DOI : 10.1007/3-540-48224-5_11

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A practical automatic polyhedral parallelizer and locality optimizer, Proc. PLDI, 2008.

M. Bridges, N. Vachharajani, Y. Zhang, T. Jablin, and D. August, Revisiting the Sequential Programming Model for Multi-Core, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007), pp.69-84, 2007.
DOI : 10.1109/MICRO.2007.20

C. Cascaval, E. Duesterwald, P. F. Sweeney, and R. W. Wisniewski, Multiple page size modeling and optimization, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05), 2005.
DOI : 10.1109/PACT.2005.32

J. Demmel, L. Grigori, M. Hoemmen, and J. Langou, Communication-optimal Parallel and Sequential QR and LU Factorizations, SIAM Journal on Scientific Computing, vol.34, issue.1, pp.206-239, 2012.
DOI : 10.1137/080731992

URL : https://hal.archives-ouvertes.fr/hal-00870930

C. Ding and Y. Zhong, Predicting whole-program locality through reuse distance analysis, PLDI, pp.245-257, 2003.

N. Fauzia, V. Elango, M. Ravishankar, L. Pouchet, J. Ramanujam et al., Beyond reuse distance analysis, ACM Transactions on Architecture and Code Optimization, vol.10, issue.4, 2013.
DOI : 10.1145/2541228.2555309

URL : https://hal.archives-ouvertes.fr/hal-00920031

S. H. Fuller and L. I. Millett, The Future of Computing Performance: Game Over or Next Level?, 2011.

S. Garcia, D. Jeon, C. M. Louie, and M. Taylor, Kremlin: Rethinking and rebooting gprof for the multicore age, PLDI, pp.458-469, 2011.

J. L. Hennessy and D. A. Patterson, Computer architecture: a quantitative approach, 2011.

J. Holewinski, R. Ramamurthi, M. Ravishankar, N. Fauzia, L. Pouchet, A. Rountev, and P. Sadayappan, Dynamic trace-based analysis of vectorization potential of applications, Proc. PLDI, pp.371-382, 2012.

F. Irigoin and R. Triolet, Supernode partitioning, Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '88, pp.319-329, 1988.
DOI : 10.1145/73560.73588

S. Jiang and X. Zhang, Making LRU Friendly to Weak Locality Workloads: A Novel Replacement Algorithm to Improve Buffer Cache Performance, IEEE Transactions on Computers, vol.54, issue.8, pp.939-952, 2005.
DOI : 10.1109/TC.2005.130

Y. Jiang, E. Z. Zhang, K. Tian, and X. Shen, Is Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors?, Proc. Compiler Construction (CC), pp.264-282, 2010.
DOI : 10.1007/978-3-642-11970-5_15

K. Kennedy and J. Allen, Optimizing compilers for modern architectures: A dependence-based approach, 2002.

K. Kennedy and K. S. McKinley, Maximizing loop parallelism and improving data locality via loop fusion and distribution, Languages and Compilers for Parallel Computing, pp.301-320, 1993.
DOI : 10.1007/3-540-57659-2_18

A. Ketterlin and P. Clauss, Prediction and trace compression of data access addresses through nested loop recognition, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, CGO '08, pp.94-103, 2008.
DOI : 10.1145/1356058.1356071

URL : https://hal.archives-ouvertes.fr/inria-00504597

M. Kumar, Measuring parallelism in computation-intensive scientific/engineering applications, IEEE Transactions on Computers, vol.37, issue.9, pp.1088-1098, 1988.
DOI : 10.1109/12.2259

M. Lam and R. Wilson, Limits of control flow on parallelism, ISCA, pp.46-57, 1992.

J. Larus, Loop-level parallelism in numeric and symbolic programs, IEEE Transactions on Parallel and Distributed Systems, vol.4, issue.7, pp.812-826, 1993.
DOI : 10.1109/71.238302

J. Mak and A. Mycroft, Limits of parallelism using dynamic dependency graphs, Proceedings of the Seventh International Workshop on Dynamic Analysis, WODA '09, pp.42-48, 2009.
DOI : 10.1145/2134243.2134253

G. Marin and J. Mellor-Crummey, Cross-architecture performance predictions for scientific applications using parameterized models, SIGMETRICS '04, 2004.

R. L. Mattson, J. Gecsei, D. Slutz, and I. Traiger, Evaluation techniques for storage hierarchies, IBM Systems Journal, vol.9, issue.2, pp.78-117, 1970.
DOI : 10.1147/sj.92.0078

A. Nicolau and J. Fisher, Measuring the Parallelism Available for Very Long Instruction Word Architectures, IEEE Transactions on Computers, vol.33, issue.11, pp.968-976, 1984.
DOI : 10.1109/TC.1984.1676371

Q. Niu, J. Dinan, Q. Lu, and P. Sadayappan, PARDA: A Fast Parallel Reuse Distance Analysis Algorithm, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.1284-1294, 2012.
DOI : 10.1109/IPDPS.2012.117

C. Oancea and A. Mycroft, Set-Congruence Dynamic Analysis for Thread-Level Speculation (TLS), LCPC, pp.156-171, 2008.
DOI : 10.1007/978-3-540-89740-8_11

J. Park, M. Penner, and V. K. Prasanna, Optimizing graph algorithms for improved cache performance, IEEE Transactions on Parallel and Distributed Systems, vol.15, issue.9, pp.769-782, 2004.
DOI : 10.1109/TPDS.2004.44

M. Postiff, D. Greene, G. Tyson, and T. Mudge, The limits of instruction level parallelism in SPEC95 applications, ACM SIGARCH Computer Architecture News, vol.27, issue.1, pp.31-34, 1999.
DOI : 10.1145/309758.309771

L. Rauchwerger, P. Dubey, and R. Nair, Measuring limits of parallelism and characterizing its vulnerability to resource constraints, Proceedings of the 26th Annual International Symposium on Microarchitecture, pp.105-117, 1993.
DOI : 10.1109/MICRO.1993.282747

L. Rauchwerger and D. Padua, The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization, PLDI, pp.218-232, 1995.

V. Sarkar and J. L. Hennessy, Compile-time partitioning and scheduling of parallel programs, SIGPLAN Symposium on Compiler Construction, pp.17-26, 1986.

J. Shalf, S. Dosanjh, and J. Morrison, Exascale computing technology challenges, High Performance Computing for Computational Science – VECPAR 2010, pp.1-25, 2010.
DOI : 10.1007/978-3-642-19328-6_1

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.185.3897

X. Shen, Y. Zhong, and C. Ding, Locality phase prediction, Proc. ASPLOS. ACM, 2004.
DOI : 10.1145/1024393.1024414

D. Stefanović and M. Martonosi, Limits and graph structure of available instruction-level parallelism, Euro-Par, pp.1018-1022, 2000.

K. Theobald, G. Gao, and L. Hendren, On the limits of program parallelism and its smoothability, MICRO, pp.10-19, 1992.

C. Tian, M. Feng, V. Nagarajan, and R. Gupta, Copy or discard execution model for speculative parallelization on multicores, MICRO, pp.330-341, 2008.

G. Tournavitis, Z. Wang, B. Franke, and M. F. P. O'Boyle, Towards a holistic approach to auto-parallelization, PLDI, pp.177-187, 2009.

G. Venkataraman, S. Sahni, and S. Mukhopadhyaya, A Blocked All-Pairs Shortest-Paths Algorithm, Journal of Experimental Algorithmics, vol.8, issue.22, 2003.

D. Wall, Limits of instruction-level parallelism, ASPLOS, pp.176-188, 1991.

M. E. Wolf and M. S. Lam, A data locality optimizing algorithm, PLDI '91: ACM SIGPLAN 1991 conference on Programming language design and implementation, pp.30-44, 1991.

P. Wu and A. Kejariwal, Compiler-Driven Dependence Profiling to Guide Program Parallelization, LCPC, pp.232-248, 2008.
DOI : 10.1007/978-3-540-89740-8_16

H. Zhong, M. Mehrara, S. Lieberman, and S. Mahlke, Uncovering hidden loop level parallelism in sequential applications, HPCA, pp.290-301, 2008.

Y. Zhong, S. G. Dropsho, and C. Ding, Miss rate prediction across all program inputs, 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 2003.
DOI : 10.1109/PACT.2003.1238004

Y. Zhong, M. Orlovich, X. Shen, and C. Ding, Array regrouping and structure splitting using whole-program reference affinity, Proc. PLDI. ACM, 2004.