Detecting SIMDization Opportunities through Static/Dynamic Dependence Analysis, Workshop on Productivity and Performance (PROPER), 2013.
DOI : 10.1007/978-3-642-54420-0_62
URL : https://hal.archives-ouvertes.fr/hal-00858004
QIRAL: A High Level Language for Lattice QCD Code Generation, Programming Language Approaches to Concurrency and Communication-centric Software Workshop, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00666885
Inter-array Data Regrouping, Intl. Workshop on Languages and Compilers for Parallel Computing, pp.149-163, 2000.
DOI : 10.1007/3-540-44905-1_10
Kokkos: Enabling Performance Portability Across Manycore Architectures, 2013 Extreme Scaling Workshop (XSW 2013), pp.18-24, 2013.
DOI : 10.1109/XSW.2013.7
Cyme: A Library Maximizing SIMD Computation on User-Defined Containers, Supercomputing, pp.440-449, 2014.
DOI : 10.1007/978-3-319-07518-1_29
Optimistic Delinearization of Parametrically Sized Arrays, Proceedings of the 29th ACM on International Conference on Supercomputing, ICS '15, pp.351-360, 2015.
DOI : 10.1007/11587514_15
Exploring and Evaluating Array Layout Restructuring for SIMDization, Intl. Workshop on Languages and Compilers for Parallel Computing, pp.351-366.
DOI : 10.1007/978-3-319-17473-0_23
Berkeley lab checkpoint/restart (BLCR) for Linux clusters, Journal of Physics: Conference Series, vol.46, issue.1, p.494, 2006.
DOI : 10.1088/1742-6596/46/1/067
URL : http://iopscience.iop.org/article/10.1088/1742-6596/46/1/067/pdf
Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures, Intl. Conf. on Compiler Construction, pp.225-245, 2011.
DOI : 10.1109/COMPSAC.2009.82
URL : http://users.ece.cmu.edu/~franzf/papers/cc2011.pdf
Array Unification: A Locality Optimization Technique, Compiler Construction, pp.259-273, 2001.
DOI : 10.1007/3-540-45306-7_18
Prediction and trace compression of data access addresses through nested loop recognition, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, CGO '08, pp.94-103, 2008.
DOI : 10.1145/1356058.1356071
URL : https://hal.archives-ouvertes.fr/inria-00504597
When polyhedral transformations meet SIMD code generation, ACM SIGPLAN Conf. on Prog. Lang. Design and Implementation, 2013.
DOI : 10.1145/2491956.2462187
URL : http://users.ece.cmu.edu/~franzf/papers/pldi13.pdf
ArrayTool, Proceedings of the 23rd international conference on Parallel architectures and compilation, PACT '14, pp.405-416, 2014.
DOI : 10.1109/PACT.2011.20
ADHA, Proceedings of the 23rd international conference on Parallel architectures and compilation, PACT '14, pp.479-480, 2014.
DOI : 10.1145/2628071.2628122
StructSlim: a lightweight profiler to guide structure splitting, Proceedings of the 2016 International Symposium on Code Generation and Optimization, CGO 2016, 2016.
DOI : 10.1145/996841.996872
Can traditional programming bridge the ninja performance gap for parallel computing applications?, Intl. Symp. on Computer Arch., pp.440-451, 2012.
Data Layout Optimization for Portable Performance, Intl. Euro-Par Conference, pp.250-262, 2015.
DOI : 10.1007/978-3-662-48096-0_20
DL: A data layout transformation system for heterogeneous computing, 2012 Innovative Parallel Computing (InPar), pp.1-11, 2012.
DOI : 10.1109/InPar.2012.6339606
URL : http://impact.crhc.illinois.edu/shared/papers/dl_inpar2012_ack.pdf
Towards a Semantics-Aware Code Transformation Toolchain for Heterogeneous Systems, Program Transformation for Programmability in Heterogeneous Arch. Workshop, 2016.
DOI : 10.1007/978-3-540-25935-0_13
URL : http://arxiv.org/pdf/1701.03319
Automatic Data Layout Transformation for Heterogeneous Many-Core Systems, Network and Parallel Computing, pp.208-219, 2014.
DOI : 10.1007/978-3-662-44917-2_18
URL : https://hal.archives-ouvertes.fr/hal-01403085
Optimizing 3D Convolutions for Wavelet Transforms on CPUs with SSE Units and GPUs, Intl. Euro-Par Conference, 2013.
DOI : 10.1007/978-3-642-40047-6_82
URL : https://hal.archives-ouvertes.fr/hal-00953056
Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators, PLoS ONE, vol.65, issue.1, pp.1-10, 2014.
DOI : 10.1371/journal.pone.0086484.s001
Vp3: A vectorization potential performance prototype, Workshop on Programming Models for SIMD/Vector Processing, 2015.