PADRONE: a Platform for Online Profiling, Analysis, and Optimization, International workshop on Dynamic Compilation Everywhere (DCE), 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00917950
Par4All: From Convex Array Regions to Heterogeneous Computing, IMPACT 2012 : Second International Workshop on Polyhedral Compilation Techniques HiPEAC 2012, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00744733
Support for Thread-Level Speculation into OpenMP, pp.275-278, 2012. ,
DOI : 10.1007/978-3-642-30961-8_25
Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the April 18-20, 1967, spring joint computer conference on, AFIPS '67 (Spring), pp.483-485, 1967. ,
DOI : 10.1145/1465482.1465560
Code generation in the polyhedral model is easier than you think, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004., pp.7-16, 2004. ,
DOI : 10.1109/PACT.2004.1342537
URL : https://hal.archives-ouvertes.fr/hal-00017260
Improving Data Locality in Static Control Programs, 2004. ,
Performance driven data cache prefetching in a dynamic software optimization system, Proceedings of the 21st annual international conference on Supercomputing, ICS '07, pp.202-209, 2007. ,
DOI : 10.1145/1274971.1275000
URL : https://hal.archives-ouvertes.fr/inria-00504614
Qemu, a fast and portable dynamic translator, Usenix ATC, Freenix Track, pp.41-46, 2005. ,
An infrastructure for adaptive dynamic optimization, International Symposium on Code Generation and Optimization, 2003. CGO 2003., pp.265-275, 2003. ,
DOI : 10.1109/CGO.2003.1191551
URL : http://cag.lcs.mit.edu/commit/papers/03/RIO-adaptive-CGO03.ps
A practical automatic polyhedral parallelizer and locality optimizer, Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '08, pp.101-113, 2008. ,
DOI : 10.1145/1375581.1375595
URL : http://www.cse.ohio-state.edu/~bondhugu/publications/uday-pldi08.pdf
PinOS, Proceedings of the 3rd international conference on Virtual execution environments, VEE '07, pp.137-147, 2007. ,
DOI : 10.1145/1254810.1254830
The overhead of profiling using PMU hardware, CERN openlab report, 2014. ,
FX!32 a profile-directed binary translator, IEEE Micro, vol.18, issue.2, pp.56-64, 1998. ,
DOI : 10.1109/40.671403
URL : http://www.cs.utexas.edu/users/dburger/teaching/spring99/cs395t/papers/11_FX32.pdf
Liquid SIMD: Abstracting SIMD Hardware using Lightweight Dynamic Mapping, 2007 IEEE 13th International Symposium on High Performance Computer Architecture, 2007. ,
DOI : 10.1109/HPCA.2007.346199
URL : http://www.hpcaconf.org/hpca13/papers/022-clark.pdf
Continuous Adaptive Object-Code Re-optimization Framework, Advances in Computer Systems Architecture, 2004. ,
DOI : 10.1007/978-3-540-30102-8_20
URL : http://www.dtc.umn.edu/publications/reports/2004_28.pdf
The Transmeta Code Morphing™ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges, International Symposium on Code Generation and Optimization, 2003. CGO 2003., 2003. ,
DOI : 10.1109/CGO.2003.1191529
Rapid identification of architectural bottlenecks via precise event counting, Proceedings of the 38th annual international symposium on Computer architecture ,
DOI : 10.1145/2024723.2000107
S. Eranian. Perfmon2: a flexible performance monitoring interface for Linux, Proc. Ottawa Linux Symposium, pp.353-364, 2006. ,
Vectorization for simd architectures with alignment constraints, Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation, PLDI '04, pp.82-93, 2004. ,
DOI : 10.1145/996893.996853
Dataflow analysis of array and scalar references, International Journal of Parallel Programming, vol.24, issue.4, pp.23-53, 1991. ,
DOI : 10.1007/BF01407931
URL : http://www.prism.uvsq.fr/public/paf/dataflow.ps
Toward automatic partitioning of arrays on distributed memory computers, Proceedings of the 7th international conference on Supercomputing, pp.175-184, 1993. ,
Polyhedron model, Encyclopedia of Parallel Computing, 2011. ,
Sac: An efficient retargetable source-to-source compiler for multimedia instruction sets, 2010. ,
The optimum pipeline depth for a microprocessor, ACM SIGARCH Computer Architecture News, vol.30, issue.2, pp.7-13, 2002. ,
DOI : 10.1145/545214.545217
URL : http://www.ece.ualberta.ca/~elliott/ece510/seminars/2002f/2002-10-01/p7-hartstein.pdf
Semantical interprocedural parallelization: An overview of the PIPS project, Proceedings of the 5th International Conference on Supercomputing, ICS '91, pp.244-251, 1991. ,
DOI : 10.1145/109025.109086
URL : https://hal.archives-ouvertes.fr/hal-00984684
Computing dependence direction vectors and dependence cones with linear systems, 1987. ,
Spire: improving dynamic binary translation through SPC-indexed indirect branch redirecting, pp.1-12, 2003. ,
Leakage current: Moore's law meets static power, Computer, vol.36, issue.12, pp.68-75, 2003. ,
Automatic Parallelization in a Binary Rewriter, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp.547-557, 2010. ,
DOI : 10.1109/MICRO.2010.27
URL : http://www.ece.umd.edu/~barua/micro10-aparna.pdf
Generation of permutations for SIMD processors, Proceedings of the 2005 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, LCTES '05, pp.147-156, 2005. ,
DOI : 10.1145/1065910.1065931
Exploiting superword level parallelism with multimedia instruction sets, Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, PLDI '00, pp.145-156, 2000. ,
DOI : 10.1145/349299.349320
URL : http://cag.lcs.mit.edu/commit/papers/99/SLP-TM.ps
The parallel execution of DO loops, Communications of the ACM, vol.17, issue.2, pp.83-93, 1974. ,
DOI : 10.1145/360827.360844
Pin: Building customized program analysis tools with dynamic instrumentation, Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, pp.190-200, 2005. ,
Maximizing parallelism and minimizing synchronization with affine transforms, Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '97, pp.201-214, 1997. ,
DOI : 10.1145/263699.263719
URL : http://suif.stanford.edu//papers/lim97.ps
Optimizing dynamic binary translation for SIMD instructions, CGO, 2006. ,
Lazy array data-flow dependence analysis, Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '94, pp.311-325, 1994. ,
DOI : 10.1145/174675.177911
URL : http://www.cs.umd.edu/Library/TRs/CS-TR-3110/CS-TR-3110.ps.Z
Improved Estimation for Software Multiplexing of Performance Counters, 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, pp.23-32, 2005. ,
DOI : 10.1109/MASCOTS.2005.34
URL : http://www.ece.nmsu.edu/~jecook/pubs/jcook_multiplexing.pdf
Polyhedral loop parallelization: LooPo. https, pp.1993-2012 ,
DOI : 10.1007/bfb0017283
URL : http://brahms.fmi.uni-passau.de/cl/papers/GriLe96b.ps
An Evaluation of Vectorizing Compilers, 2011 International Conference on Parallel Architectures and Compilation Techniques, pp.372-382, 2011. ,
DOI : 10.1109/PACT.2011.68
URL : http://polaris.cs.uiuc.edu/%7Egarzaran/doc/pact11.pdf
Accuracy of performance monitoring hardware, Proc. LACSI Symposium, 2002. ,
Power: a first-class architectural design constraint, Computer, vol.34, issue.4, pp.52-58, 2001. ,
DOI : 10.1109/2.917539
URL : http://web.eecs.umich.edu/~tnm/trev_test/papersPDF/2001.04.Power a First Class Architectural Design Constraint_Computer.pdf
Vectorization for Java, NPC, 2010. ,
DOI : 10.1007/978-3-642-15672-4_3
URL : https://hal.archives-ouvertes.fr/hal-01054962
Albert Cohen, and Ayal Zaks. Vapor SIMD: Auto-vectorize once, run everywhere, CGO, 2011. ,
Multi-platform autovectorization, CGO, 2006. ,
Automatic vectorization by runtime binary translation, Second International Conference on Networking and Computing, pp.87-94 ,
Auto-vectorization of interleaved data for SIMD, ACM SIGPLAN Notices, vol.41, issue.6, pp.132-143, 2006. ,
DOI : 10.1145/1133255.1133997
Valgrind: A program supervision framework, Third Workshop on Runtime Verification, 2003. ,
Autovectorization in GCC - two years later, GCC Developer's summit, 2006. ,
Outer-loop vectorization, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, PACT '08, pp.2-11, 2008. ,
DOI : 10.1145/1454115.1454119
The perfctr interface, 1999. ,
Throttling Automatic Vectorization: When Less is More, 2015 International Conference on Parallel Architecture and Compilation (PACT), pp.432-444, 2015. ,
DOI : 10.1109/PACT.2015.32
Transparent Parallelization of Binary Code, First International Workshop on Polyhedral Compilation Techniques, IMPACT 2011, 2011. ,
The Polyhedral Compiler Collection package, 2013. ,
An exact method for analysis of value-based array data dependences, pp.546-566, 1994. ,
DOI : 10.1007/3-540-57659-2_31
URL : http://www.cs.umd.edu/Library/TRs/CS-TR-3196/CS-TR-3196.ps.Z
Evaluating the impact of dynamic binary translation systems on hardware cache performance, 2008 IEEE International Symposium on Workload Characterization, 2008. ,
DOI : 10.1109/IISWC.2008.4636098
A scalable method for run-time loop parallelization, International Journal of Parallel Programming, vol.4, issue.1, pp.537-576, 1995. ,
DOI : 10.1007/978-1-4684-6894-6
URL : http://polaris.cs.uiuc.edu/publications/1444.pdf
Trace cache: a low latency approach to high bandwidth instruction fetching, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29, pp.24-35, 1996. ,
DOI : 10.1109/MICRO.1996.566447
URL : http://www.cs.utah.edu/classes/cs7810-rajeev/papers/rotenberg96.pdf
The privatizing DOALL test, Proceedings of the 8th international conference on Supercomputing, ICS '94, pp.33-43, 1994. ,
DOI : 10.1145/181181.181254
Introducing Control Flow into Vectorized Code, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), pp.280-291, 2007. ,
DOI : 10.1109/PACT.2007.4336219
URL : http://www-unix.mcs.anl.gov/~jaewook/papers/shin-nestboscc.pdf
Beyond the Realm of the Polyhedral Model: Combining Speculative Program Parallelization with Polyhedral Compilation, 2016. ,
URL : https://hal.archives-ouvertes.fr/tel-01342007
Polyhedral-Model Guided Loop-Nest Auto-Vectorization, 2009 18th International Conference on Parallel Architectures and Compilation Techniques, 2009. ,
DOI : 10.1109/PACT.2009.18
A generic approach to the definition of low-level components for multi-architecture binary analysis, 2014. ,
isl: An Integer Set Library for the Polyhedral Model, Proceedings of the Third International Congress Conference on Mathematical Software, ICMS'10, pp.299-302, 2010. ,
DOI : 10.1007/978-3-642-15582-6_49
URL : https://lirias.kuleuven.be/bitstream/123456789/270231/1/icms2010verdoolaege.pdf
Polyhedral extraction tool, Second International Workshop on Polyhedral Compilation Techniques, 2012. ,
On Demand Parametric Array Dataflow Analysis, Proceedings of the 3rd International Workshop on Polyhedral Compilation Techniques, pp.23-36, 2013. ,
Self-monitoring overhead of the Linux perfevent performance counter interface, International Symposium on Performance Analysis of Systems and Software, pp.102-111, 2015. ,
An overview of the SUIF compiler system ,
Hypertool: a programming aid for message-passing systems, IEEE Transactions on Parallel and Distributed Systems, vol.1, issue.3, pp.330-343, 1990. ,
DOI : 10.1109/71.80160
URL : http://www.eece.unm.edu/~shu/lab/paper/htooltrans.pdf
Optimizing Supercompilers for Supercomputers, 1989. ,
Dynamic parallelization and mapping of binary executables on hierarchical platforms, Proceedings of the 3rd conference on Computing frontiers, CF '06, pp.127-138, 2006. ,
DOI : 10.1145/1128022.1128040
Feasibility of dynamic binary parallelization, 2011. ,
DOI : 10.18130/v3479c
URL : https://libraetd.lib.virginia.edu/downloads/9p290955k?filename=dissertation-jingyang.pdf
Pipa, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, CGO '08, pp.185-194, 2008. ,
DOI : 10.1145/1356058.1356083
Search for the largest element in an array, p.42 ,