E. Riou, E. Rohou, P. Clauss, N. Hallou, and A. Ketterlin, PADRONE: a Platform for Online Profiling, Analysis, and Optimization, International workshop on Dynamic Compilation Everywhere (DCE), 2014.
URL : https://hal.archives-ouvertes.fr/hal-00917950

M. Amini, B. Creusillet, S. Even, R. Keryell, O. Goubier et al., Par4All: From Convex Array Regions to Heterogeneous Computing, IMPACT 2012 : Second International Workshop on Polyhedral Compilation Techniques HiPEAC 2012, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00744733

S. Aldea, D. R. Llanos, and A. González-Escribano, Support for Thread-Level Speculation into OpenMP, pp.275-278, 2012.
DOI : 10.1007/978-3-642-30961-8_25

G. M. Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the April 18-20, 1967, spring joint computer conference, AFIPS '67 (Spring), pp.483-485, 1967.
DOI : 10.1145/1465482.1465560

C. Bastoul, Code generation in the polyhedral model is easier than you think, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004., pp.7-16, 2004.
DOI : 10.1109/PACT.2004.1342537

URL : https://hal.archives-ouvertes.fr/hal-00017260

C. Bastoul, Improving Data Locality in Static Control Programs, 2004.

J. C. Beyler and P. Clauss, Performance driven data cache prefetching in a dynamic software optimization system, Proceedings of the 21st annual international conference on Supercomputing, ICS '07, pp.202-209, 2007.
DOI : 10.1145/1274971.1275000

URL : https://hal.archives-ouvertes.fr/inria-00504614

F. Bellard, Qemu, a fast and portable dynamic translator, Usenix ATC, Freenix Track, pp.41-46, 2005.

D. Bruening, T. Garnett, and S. Amarasinghe, An infrastructure for adaptive dynamic optimization, International Symposium on Code Generation and Optimization, 2003. CGO 2003., pp.265-275, 2003.
DOI : 10.1109/CGO.2003.1191551

URL : http://cag.lcs.mit.edu/commit/papers/03/RIO-adaptive-CGO03.ps

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A practical automatic polyhedral parallelizer and locality optimizer, Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '08, pp.101-113, 2008.
DOI : 10.1145/1375581.1375595

URL : http://www.cse.ohio-state.edu/~bondhugu/publications/uday-pldi08.pdf

P. P. Bungale and C.-K. Luk, PinOS, Proceedings of the 3rd international conference on Virtual execution environments, VEE '07, pp.137-147, 2007.
DOI : 10.1145/1254810.1254830

G. Bitzes and A. Nowak, The overhead of profiling using PMU hardware, CERN openlab report, 2014.

A. Chernoff, M. Herdeg, R. Hookway, C. Reeve, N. Rubin et al., FX!32 a profile-directed binary translator, IEEE Micro, vol.18, issue.2, pp.56-64, 1998.
DOI : 10.1109/40.671403

URL : http://www.cs.utexas.edu/users/dburger/teaching/spring99/cs395t/papers/11_FX32.pdf

N. Clark, A. Hormati, S. Yehia, S. Mahlke, and K. Flautner, Liquid SIMD: Abstracting SIMD Hardware using Lightweight Dynamic Mapping, 2007 IEEE 13th International Symposium on High Performance Computer Architecture, 2007.
DOI : 10.1109/HPCA.2007.346199

URL : http://www.hpcaconf.org/hpca13/papers/022-clark.pdf

H. Chen, J. Lu, W. Hsu, and P. Yew, Continuous Adaptive Object-Code Re-optimization Framework, Advances in Computer Systems Architecture, 2004.
DOI : 10.1007/978-3-540-30102-8_20

URL : http://www.dtc.umn.edu/publications/reports/2004_28.pdf

J. C. Dehnert, B. K. Grant, J. P. Banning, R. Johnson, T. Kistler et al., The Transmeta Code Morphing™ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges, International Symposium on Code Generation and Optimization, 2003. CGO 2003., 2003.
DOI : 10.1109/CGO.2003.1191529

J. Demme and S. Sethumadhavan, Rapid identification of architectural bottlenecks via precise event counting, Proceedings of the 38th annual international symposium on Computer architecture.
DOI : 10.1145/2024723.2000107

S. Eranian, Perfmon2: a flexible performance monitoring interface for Linux, Proc. Ottawa Linux Symposium, pp.353-364, 2006.

A. E. Eichenberger, P. Wu, and K. O'Brien, Vectorization for SIMD architectures with alignment constraints, Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation, PLDI '04, pp.82-93, 2004.
DOI : 10.1145/996893.996853

P. Feautrier, Dataflow analysis of array and scalar references, International Journal of Parallel Programming, vol.24, issue.4, pp.23-53, 1991.
URL : http://www.prism.uvsq.fr/public/paf/dataflow.ps

P. Feautrier, Toward automatic partitioning of arrays on distributed memory computers, Proceedings of the 7th international conference on Supercomputing, pp.175-184, 1993.

P. Feautrier and C. Lengauer, Polyhedron model, Encyclopedia of Parallel Computing, 2011.

S. Guelton, Sac: An efficient retargetable source-to-source compiler for multimedia instruction sets, 2010.

A. Hartstein and T. R. Puzak, The optimum pipeline depth for a microprocessor, ACM SIGARCH Computer Architecture News, vol.30, issue.2, pp.7-13, 2002.
DOI : 10.1145/545214.545217

URL : http://www.ece.ualberta.ca/~elliott/ece510/seminars/2002f/2002-10-01/p7-hartstein.pdf

F. Irigoin, P. Jouvelot, and R. Triolet, Semantical interprocedural parallelization: An overview of the PIPS project, Proceedings of the 5th International Conference on Supercomputing, ICS '91, pp.244-251, 1991.
DOI : 10.1145/109025.109086

URL : https://hal.archives-ouvertes.fr/hal-00984684

F. Irigoin and R. Triolet, Computing dependence direction vectors and dependence cones with linear systems, 1987.

N. Jia, C. Yang, J. Wang, D. Tong, and K. Wang, Spire: improving dynamic binary translation through SPC-indexed indirect branch redirecting, pp.1-12, 2003.

N. S. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner et al., Leakage current: Moore's law meets static power, Computer, vol.36, issue.12, pp.68-75, 2003.

A. Kotha, K. Anand, M. Smithson, G. Yellareddy, and R. Barua, Automatic Parallelization in a Binary Rewriter, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp.547-557, 2010.
DOI : 10.1109/MICRO.2010.27

URL : http://www.ece.umd.edu/~barua/micro10-aparna.pdf

A. Kudriavtsev and P. Kogge, Generation of permutations for SIMD processors, Proceedings of the 2005 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, LCTES '05, pp.147-156, 2005.
DOI : 10.1145/1065910.1065931

S. Larsen and S. Amarasinghe, Exploiting superword level parallelism with multimedia instruction sets, Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, PLDI '00, pp.145-156, 2000.
DOI : 10.1145/349299.349320

URL : http://cag.lcs.mit.edu/commit/papers/99/SLP-TM.ps

L. Lamport, The parallel execution of DO loops, Communications of the ACM, vol.17, issue.2, pp.83-93, 1974.
DOI : 10.1145/360827.360844

C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser et al., Pin: Building customized program analysis tools with dynamic instrumentation, Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, pp.190-200, 2005.

A. W. Lim and M. S. Lam, Maximizing parallelism and minimizing synchronization with affine transforms, Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '97, pp.201-214, 1997.
DOI : 10.1145/263699.263719

URL : http://suif.stanford.edu//papers/lim97.ps

J. Li, Q. Zhang, S. Xu, and B. Huang, Optimizing dynamic binary translation for SIMD instructions, CGO, 2006.

V. Maslov, Lazy array data-flow dependence analysis, Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '94, pp.311-325, 1994.
DOI : 10.1145/174675.177911

URL : http://www.cs.umd.edu/Library/TRs/CS-TR-3110/CS-TR-3110.ps.Z

W. Mathur and J. Cook, Improved Estimation for Software Multiplexing of Performance Counters, 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, pp.23-32, 2005.
DOI : 10.1109/MASCOTS.2005.34

URL : http://www.ece.nmsu.edu/~jecook/pubs/jcook_multiplexing.pdf

M. Griebl, Polyhedral loop parallelization: LooPo, 1993-2012.
DOI : 10.1007/bfb0017283

URL : http://brahms.fmi.uni-passau.de/cl/papers/GriLe96b.ps

S. Maleki, Y. Gao, M. J. Garzarán, T. Wong, and D. A. Padua, An Evaluation of Vectorizing Compilers, 2011 International Conference on Parallel Architectures and Compilation Techniques, pp.372-382, 2011.
DOI : 10.1109/PACT.2011.68

URL : http://polaris.cs.uiuc.edu/%7Egarzaran/doc/pact11.pdf

M. Maxwell, P. Teller, L. Salayandia et al., Accuracy of performance monitoring hardware, Proc. LACSI Symposium, 2002.

T. Mudge, Power: a first-class architectural design constraint, Computer, vol.34, issue.4, pp.52-58, 2001.
DOI : 10.1109/2.917539

URL : http://web.eecs.umich.edu/~tnm/trev_test/papersPDF/2001.04.Power a First Class Architectural Design Constraint_Computer.pdf

J. Nie, B. Cheng, S. Li, L. Wang, and X. Li, Vectorization for Java, NPC, 2010.
DOI : 10.1007/978-3-642-15672-4_3

URL : https://hal.archives-ouvertes.fr/hal-01054962

D. Nuzman, S. Dyshel, E. Rohou, I. Rosen, K. Williams et al., Vapor SIMD: Auto-vectorize once, run everywhere, CGO, 2011.

D. Nuzman and R. Henderson, Multi-platform autovectorization, CGO, 2006.

T. Nakamura, S. Miki, and S. Oikawa, Automatic vectorization by runtime binary translation, Second International Conference on Networking and Computing, pp.87-94, 2011.

D. Nuzman, I. Rosen, and A. Zaks, Auto-vectorization of interleaved data for SIMD, ACM SIGPLAN Notices, vol.41, issue.6, pp.132-143, 2006.
DOI : 10.1145/1133255.1133997

N. Nethercote and J. Seward, Valgrind: A program supervision framework, Third Workshop on Runtime Verification, 2003.

D. Nuzman, Autovectorization in GCC - two years later, GCC Developer's summit, 2006.

D. Nuzman and A. Zaks, Outer-loop vectorization, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, PACT '08, pp.2-11, 2008.
DOI : 10.1145/1454115.1454119

M. Pettersson, The perfctr interface, 1999.

V. Porpodas and T. M. Jones, Throttling Automatic Vectorization: When Less is More, 2015 International Conference on Parallel Architecture and Compilation (PACT), pp.432-444, 2015.
DOI : 10.1109/PACT.2015.32

B. Pradelle, A. Ketterlin, and P. Clauss, Transparent Parallelization of Binary Code, First International Workshop on Polyhedral Compilation Techniques, IMPACT 2011, 2011.

L.-N. Pouchet, The Polyhedral Compiler Collection package, 2013.

W. Pugh and D. Wonnacott, An exact method for analysis of value-based array data dependences, pp.546-566, 1994.
DOI : 10.1007/3-540-57659-2_31

URL : http://www.cs.umd.edu/Library/TRs/CS-TR-3196/CS-TR-3196.ps.Z

A. Ruiz-Alvarez and K. Hazelwood, Evaluating the impact of dynamic binary translation systems on hardware cache performance, 2008 IEEE International Symposium on Workload Characterization, 2008.
DOI : 10.1109/IISWC.2008.4636098

L. Rauchwerger, N. M. Amato, and D. A. Padua, A scalable method for run-time loop parallelization, International Journal of Parallel Programming, vol.4, issue.1, pp.537-576, 1995.
DOI : 10.1007/978-1-4684-6894-6

URL : http://polaris.cs.uiuc.edu/publications/1444.pdf

E. Rotenberg, S. Bennett, and J. E. Smith, Trace cache: a low latency approach to high bandwidth instruction fetching, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29, pp.24-35, 1996.
DOI : 10.1109/MICRO.1996.566447

URL : http://www.cs.utah.edu/classes/cs7810-rajeev/papers/rotenberg96.pdf

L. Rauchwerger and D. Padua, The privatizing DOALL test, Proceedings of the 8th international conference on Supercomputing, ICS '94, pp.33-43, 1994.
DOI : 10.1145/181181.181254

J. Shin, Introducing Control Flow into Vectorized Code, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), pp.280-291, 2007.
DOI : 10.1109/PACT.2007.4336219

URL : http://www-unix.mcs.anl.gov/~jaewook/papers/shin-nestboscc.pdf

A. Sukumaran-Rajam, Beyond the Realm of the Polyhedral Model: Combining Speculative Program Parallelization with Polyhedral Compilation, 2016.
URL : https://hal.archives-ouvertes.fr/tel-01342007

K. Trifunović, D. Nuzman, A. Cohen, A. Zaks, and I. Rosen, Polyhedral-Model Guided Loop-Nest Auto-Vectorization, 2009 18th International Conference on Parallel Architectures and Compilation Techniques, 2009.
DOI : 10.1109/PACT.2009.18

C. Valensi, A generic approach to the definition of low-level components for multi-architecture binary analysis, 2014.

S. Verdoolaege, isl: An Integer Set Library for the Polyhedral Model, Proceedings of the Third International Congress on Mathematical Software, ICMS'10, pp.299-302, 2010.
DOI : 10.1007/978-3-642-15582-6_49

URL : https://lirias.kuleuven.be/bitstream/123456789/270231/1/icms2010verdoolaege.pdf

S. Verdoolaege and T. Grosser, Polyhedral extraction tool, Second International Workshop on Polyhedral Compilation Techniques, 2012.

S. Verdoolaege, H. Nikolov, and T. Stefanov, On Demand Parametric Array Dataflow Analysis, Proceedings of the 3rd International Workshop on Polyhedral Compilation Techniques, pp.23-36, 2013.

V. Weaver, Self-monitoring overhead of the Linux perf_event performance counter interface, International Symposium on Performance Analysis of Systems and Software, pp.102-111, 2015.

R. Wilson, R. French, C. Wilson, S. Amarasinghe, J. Anderson et al., An overview of the SUIF compiler system.

M.-Y. Wu and D. Gajski, Hypertool: a programming aid for message-passing systems, IEEE Transactions on Parallel and Distributed Systems, vol.1, issue.3, pp.330-343, 1990.
DOI : 10.1109/71.80160

URL : http://www.eece.unm.edu/~shu/lab/paper/htooltrans.pdf

M. Wolfe, Optimizing Supercompilers for Supercomputers, 1989.

E. Yardimci and M. Franz, Dynamic parallelization and mapping of binary executables on hierarchical platforms, Proceedings of the 3rd conference on Computing frontiers, CF '06, pp.127-138, 2006.
DOI : 10.1145/1128022.1128040

J. Yang, K. Skadron, M. L. Soffa, and K. Whitehouse, Feasibility of dynamic binary parallelization, 2011.
DOI : 10.18130/v3479c

URL : https://libraetd.lib.virginia.edu/downloads/9p290955k?filename=dissertation-jingyang.pdf

Q. Zhao, I. Cutcutache, and W. F. Wong, PiPA, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, CGO '08, pp.185-194, 2008.
DOI : 10.1145/1356058.1356083
