O. Aumage, D. Barthou, C. Haine, and T. Meunier, Detecting SIMDization Opportunities through Static/Dynamic Dependence Analysis, Workshop on Productivity and Performance (PROPER), 2013.
DOI : 10.1007/978-3-642-54420-0_62
URL : https://hal.archives-ouvertes.fr/hal-00858004

D. Barthou, G. Grosdidier, M. Kruse, O. Pene, and C. Tadonki, QIRAL: A High Level Language for Lattice QCD Code Generation, Programming Language Approaches to Concurrency and Communication-centric Software Workshop, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00666885

C. Ding and K. Kennedy, Inter-array Data Regrouping, Intl. Workshop on Languages and Compilers for Parallel Computing, pp.149-163, 2000.
DOI : 10.1007/3-540-44905-1_10

H. Edwards and C. Trott, Kokkos: Enabling Performance Portability Across Manycore Architectures, 2013 Extreme Scaling Workshop (xsw 2013), pp.18-24, 2013.
DOI : 10.1109/XSW.2013.7

T. Ewart, F. Delalondre, and F. Schürmann, Cyme: A Library Maximizing SIMD Computation on User-Defined Containers, Supercomputing, pp.440-449, 2014.
DOI : 10.1007/978-3-319-07518-1_29

T. Grosser, J. Ramanujam, L. Pouchet, P. Sadayappan, and S. Pop, Optimistic Delinearization of Parametrically Sized Arrays, Proceedings of the 29th ACM on International Conference on Supercomputing, ICS '15, pp.351-360, 2015.
DOI : 10.1007/11587514_15

C. Haine, O. Aumage, E. Petit, and D. Barthou, Exploring and Evaluating Array Layout Restructuring for SIMDization, Intl. Workshop on Languages and Compilers for Parallel Computing, pp.351-366
DOI : 10.1007/978-3-319-17473-0_23

P. H. Hargrove and J. C. Duell, Berkeley lab checkpoint/restart (BLCR) for Linux clusters, Journal of Physics: Conference Series, vol.46, issue.1, p.494, 2006.
DOI : 10.1088/1742-6596/46/1/067
URL : http://iopscience.iop.org/article/10.1088/1742-6596/46/1/067/pdf

T. Henretty, K. Stock, L. Pouchet, F. Franchetti, J. Ramanujam et al., Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures, Intl. Conf. on Compiler Construction, pp.225-245, 2011.
DOI : 10.1109/COMPSAC.2009.82
URL : http://users.ece.cmu.edu/~franzf/papers/cc2011.pdf

M. Kandemir, Array Unification: A Locality Optimization Technique, Compiler Construction, pp.259-273, 2001.
DOI : 10.1007/3-540-45306-7_18

A. Ketterlin and P. Clauss, Prediction and trace compression of data access addresses through nested loop recognition, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, CGO '08, pp.94-103, 2008.
DOI : 10.1145/1356058.1356071
URL : https://hal.archives-ouvertes.fr/inria-00504597

M. Kong, R. Veras, K. Stock, F. Franchetti, L. Pouchet et al., When polyhedral transformations meet SIMD code generation, ACM SIGPLAN Conf. on Prog. Lang. Design and Implementation, 2013.
DOI : 10.1145/2491956.2462187
URL : http://users.ece.cmu.edu/~franzf/papers/pldi13.pdf

X. Liu, K. Sharma, and J. Mellor-Crummey, ArrayTool, Proceedings of the 23rd international conference on Parallel architectures and compilation, PACT '14, pp.405-416, 2014.
DOI : 10.1109/PACT.2011.20

D. Majeti, K. S. Meel, R. Barik, and V. Sarkar, ADHA, Proceedings of the 23rd international conference on Parallel architectures and compilation, PACT '14, pp.479-480, 2014.
DOI : 10.1145/2628071.2628122

P. Roy and X. Liu, StructSlim: a lightweight profiler to guide structure splitting, Proceedings of the 2016 International Symposium on Code Generation and Optimization, CGO 2016, 2016.
DOI : 10.1145/996841.996872

N. Satish, C. Kim, J. Chhugani, H. Saito, R. Krishnaiyer et al., Can traditional programming bridge the ninja performance gap for parallel computing applications?, Intl. Symp. on Computer Arch., pp.440-451, 2012.

K. Sharma, I. Karlin, J. Keasler, J. R. McGraw, and V. Sarkar, Data Layout Optimization for Portable Performance, Intl. Euro-Par Conference, pp.250-262, 2015.
DOI : 10.1007/978-3-662-48096-0_20

I. Sung, G. Liu, and W. Hwu, DL: A data layout transformation system for heterogeneous computing, 2012 Innovative Parallel Computing (InPar), pp.1-11, 2012.
DOI : 10.1109/InPar.2012.6339606
URL : http://impact.crhc.illinois.edu/shared/papers/dl_inpar2012_ack.pdf

S. Tamarit, J. Mario, G. Vigueras, and M. Carro, Towards a Semantics-Aware Code Transformation Toolchain for Heterogeneous Systems, Program Transformation for Programmability in Heterogeneous Arch. Workshop, 2016.
DOI : 10.1007/978-3-540-25935-0_13
URL : http://arxiv.org/pdf/1701.03319

Y. Tseng, Y. Huang, B. C. Lai, and J. Lin, Automatic Data Layout Transformation for Heterogeneous Many-Core Systems, Network and Parallel Computing, pp.208-219, 2014.
DOI : 10.1007/978-3-662-44917-2_18
URL : https://hal.archives-ouvertes.fr/hal-01403085

B. Videau, V. Marangozova-Martin, L. Genovese, and T. Deutsch, Optimizing 3D Convolutions for Wavelet Transforms on CPUs with SSE Units and GPUs, Intl. Euro-Par Conference, 2013.
DOI : 10.1007/978-3-642-40047-6_82
URL : https://hal.archives-ouvertes.fr/hal-00953056

W. Wang, L. Xu, J. Cavazos, H. H. Huang, and M. Kay, Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators, PLoS ONE, vol.65, issue.1, pp.1-10, 2014.
DOI : 10.1371/journal.pone.0086484.s001

D. C. Wong, D. J. Kuck, D. Palomares, Z. Bendifallah, M. Tribalat et al., Vp3: A vectorization potential performance prototype, Workshop on Programming Models for SIMD/Vector Processing, 2015.