HPCTOOLKIT: tools for performance analysis of optimized parallel programs, Concurrency and Computation: Practice and Experience, vol.22, pp.685-701, 2010.
Exploiting Hardware Performance Counters with Flow and Context Sensitive Profiling, Proceedings of the ACM SIGPLAN 1997 Conference on Programming Language Design and Implementation (PLDI '97), 1997.
ParaInsight: An Assistant for Quantitatively Analyzing Multi-granularity Parallel Region, High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (HPCC_EUC), pp.698-707, 2013.
Dependence Analysis for Supercomputing, 1988.
PolyCheck: Dynamic Verification of Iteration Space Transformations on Affine Programs, Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '16), pp.539-554, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01234104
Generating loops for scanning polyhedra: CLooG user's guide, Polyhedron, vol.2, p.10, 2004.
QEMU, a Fast and Portable Dynamic Translator, Proceedings of the Annual Conference on USENIX Annual Technical Conference (ATEC '05), 2005.
Fast data-locality profiling of native execution, ACM SIGMETRICS Performance Evaluation Review, vol.33, pp.169-180, 2005.
Discovery of Locality-Improving Refactorings by Reuse Path Analysis, Proceedings of the Second International Conference on High Performance Computing and Communications (HPCC'06), pp.220-229, 2006.
A Practical Automatic Polyhedral Program Optimization System, PLDI, 2008.
Runtime analysis of application binaries for function level parallelism potential using QEMU, Open Source Systems and Technologies (ICOSST), 2012 International Conference on, IEEE, pp.33-39, 2012.
CQA: A code quality analyzer tool at binary level, High Performance Computing (HiPC), 2014.
Rodinia: A benchmark suite for heterogeneous computing, IEEE International Symposium on, 2009.
A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads, Workload Characterization (IISWC), 2010 IEEE International Symposium on, 2010.
Fuzzy array dataflow analysis, ACM SIGPLAN Notices, vol.30, pp.92-101, 1995.
Intel Architecture Code Analyzer: User's Guide, 2009.
The new linux 'perf' tools, Slides from Linux Kongress, vol.18, 2010.
MAQAO: Modular assembler quality analyzer and optimizer for Itanium 2, The 4th Workshop on EPIC architectures and compiler technology, vol.200, 2005.
URL: https://hal.archives-ouvertes.fr/hal-00141075
Spolly: speculative optimizations in the polyhedral model, IMPACT 2013, p.55, 2013.
Embla: data dependence profiling for parallel programming, Complex, Intelligent and Software Intensive Systems, pp.780-785, 2008.
Parametric integer programming, RAIRO-Operations Research, vol.22, pp.243-268, 1988.
Polyhedron model, Encyclopedia of Parallel Computing, pp.1581-1592, 2011.
Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies, Intl. J. of Parallel Programming, vol.34, issue.3, 2006.
Gprof: A call graph execution profiler, ACM SIGPLAN Notices, vol.17, pp.120-126, 1982.
The flame graph, Commun. ACM, vol.59, pp.48-57, 2016.
GCC: the complete reference, 2002.
Polly: Performing polyhedral optimizations on a low-level intermediate representation, Parallel Processing Letters, vol.22, p.4, 2012.
Building of a Polyhedral Representation from an Instrumented Execution: Making Dynamic Analyses of non-Affine Programs Scalable, 2019.
URL: https://hal.archives-ouvertes.fr/hal-01967828
Program Instrumentation with QEMU, Proceedings of the International QEMU User's Forum, 2011.
Nesting of Reducible and Irreducible Loops, ACM Trans. Program. Lang. Syst., vol.19, issue.4, 1997.
SPEC CPU2006 Benchmark Descriptions, SIGARCH Comput. Archit. News, vol.34, pp.1-17, 2006.
Dynamic trace-based analysis of vectorization potential of applications, ACM SIGPLAN Notices, vol.47, pp.371-382, 2012.
VMAD: An Advanced Dynamic Program Analysis and Instrumentation Framework, 2012.
A unifying framework for iteration reordering transformations, Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing, 1995.
Prediction and Trace Compression of Data Access Addresses Through Nested Loop Recognition, Proceedings of the 6th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '08), 2008.
URL: https://hal.archives-ouvertes.fr/inria-00504597
Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization, p.45, 2012.
URL: https://hal.archives-ouvertes.fr/hal-00780782
Annual IEEE/ACM International Symposium on Microarchitecture, pp.437-448.
Prospector: A dynamic data-dependence profiler to help parallel programming, HotPar'10: Proceedings of the USENIX workshop on Hot Topics in parallelism, 2010.
SD3: An Efficient Dynamic Data-Dependence Profiling Mechanism, IEEE Trans. Comput., vol.62, pp.2516-2530, 2013.
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation, Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization (CGO '04), 2004.
DiscoPoP: A profiling tool to identify parallelization opportunities, Tools for High Performance Computing, pp.37-54, 2014.
Fast data-dependence profiling by skipping repeatedly executed memory operations, International Conference on Algorithms and Architectures for Parallel Processing, pp.583-596, 2015.
An efficient data-dependence profiler for sequential and parallel programs, Parallel and Distributed Processing Symposium (IPDPS), pp.484-493, 2015.
Pinpointing data locality problems using data-centric analysis, Code Generation and Optimization (CGO), 2011 9th Annual IEEE/ACM International Symposium on, IEEE, pp.171-180, 2011.
Pin: building customized program analysis tools with dynamic instrumentation, ACM SIGPLAN Notices, vol.40, pp.190-200, 2005.
MIAMI: A framework for application performance diagnosis, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2014.
Full runtime polyhedral optimizing loop transformations with the generation, instantiation, and scheduling of code-bones, Concurrency and Computation: Practice and Experience, vol.29, 2017.
Improving Compiler Scalability: Optimizing Large Programs at Small Price, Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '15), pp.143-152, 2015.
Shadow profiling: Hiding instrumentation costs with parallelism, Code Generation and Optimization (CGO'07), International Symposium on, IEEE, pp.198-208, 2007.
Redux: A dynamic dataflow tracer, Electronic Notes in Theoretical Computer Science, vol.89, pp.149-170, 2003.
Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation, SIGPLAN Not., 2007.
Combinatorial optimization: algorithms and complexity, 1998.
GRAPHITE: Polyhedral analyses and optimizations for GCC, Proceedings of the 2006 GCC Developers Summit, 2006.
Induction Variable Analysis with Delayed Abstractions, Proceedings of the First International Conference on High Performance Embedded Architectures and Compilers (HiPEAC'05), 2005.
URL: https://hal.archives-ouvertes.fr/hal-01257294
Polybench: The polyhedral benchmark suite, pp.2017-2026, 2017.
Scalable hierarchical polyhedral compilation, Parallel Processing (ICPP), pp.432-441, 2016.
On Loops, Dominators, and Dominance Frontiers, ACM Trans. Program. Lang. Syst., vol.24, 2002.
VTune performance analyzer essentials, 2005.
On-the-fly Detection of Precise Loop Nests Across Procedures on a Dynamic Binary Translation System, Proceedings of the 8th ACM International Conference on Computing Frontiers (CF '11), 2011.
Verification of polyhedral optimizations with constant loop bounds in finite state space computations, International Symposium On Leveraging Applications of Formal Methods, Verification and Validation, pp.493-508, 2014.
AddressSanitizer: A Fast Address Sanity Checker, USENIX Annual Technical Conference, pp.309-318, 2012.
PolyJIT: Polyhedral Optimization Just in Time, International Journal of Parallel Programming, p.33, 2018.
Overcoming the Challenges to Feedback-directed Optimization (Keynote Talk), SIGPLAN Not., vol.35, pp.1-11, 2000.
AutoSCOPE: Automatic Suggestions for Code Optimizations Using PerfExpert, 2011.
GNU Parallel: The Command-Line Power Tool, ;login: The USENIX Magazine, vol.36, pp.42-47, 2011.
Semi-automatic extraction and exploitation of hierarchical pipeline parallelism using profiling information, Parallel Architectures and Compilation Techniques (PACT), pp.377-388, 2010.
GRAPHITE Two Years After: First Lessons Learned From Real-World Polyhedral Compilation, GCC Research Opportunities Workshop, 2010.
URL: https://hal.archives-ouvertes.fr/inria-00551516
Efficient symbolic analysis for optimizing compilers, International Conference on Compiler Construction, 2001.
The Paralax infrastructure: automatic parallelization with a helping hand, Parallel Architectures and Compilation Techniques (PACT), pp.389-399, 2010.
isl: An Integer Set Library for the Polyhedral Model, Proceedings of the Third International Congress on Mathematical Software (ICMS '2010), 2010.
Equivalence checking of static affine programs using widening to handle recurrences, ACM Transactions on Programming Languages and Systems (TOPLAS), vol.34, p.11, 2012.
Integrating profile-driven parallelism detection and machine-learning-based mapping, ACM Transactions on Architecture and Code Optimization (TACO), vol.11, issue.2, 2014.
A data locality optimizing algorithm, ACM SIGPLAN Notices, vol.26, pp.30-44, 1991.
Umbra: Efficient and scalable memory shadowing, Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, pp.22-31, 2010.