Optimizing compilers for modern architectures: a dependence-based approach, 2002. ,
Direct N-body Kernels for Multicore Platforms, 2009 International Conference on Parallel Processing, pp.379-387, 2009. ,
DOI : 10.1109/ICPP.2009.71
Can cpus match gpus on performance with productivity? technical report rc25033, IBM, 2010. ,
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-12, 2008. ,
DOI : 10.1109/SC.2008.5222004
what every programmer should know about memory. technical report, Red Hat, 2007. ,
Boost. simd: generic programming for portable simdization, International Conference on Parallel architectures and compilation techniques, pp.431-432, 2012. ,
Parallel Computing Experiences with CUDA, Parallel computing experiences with cuda, pp.13-27, 2008. ,
DOI : 10.1109/MM.2008.57
A Combined Corner and Edge Detector, Procedings of the Alvey Vision Conference 1988, 1988. ,
DOI : 10.5244/C.2.23
The use of the genie system in numerical calculation, Annual Review in Automatic Programming, vol.2, pp.1-28, 1961. ,
Multidimensional streams rooted in dataflow, IFIP Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, 1993. ,
Debunking the 100x gpu vs. cpu myth: an evaluation of throughput computing on cpu and gpu, International Symposium on Computer Architecture, pp.451-460, 2010. ,
Iterative optimization in the polyhedral model: part ii, multidimensional time, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'08), pp.90-100, 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-01257273
Numerical Recipes in C book set: Numerical Recipes in C: The Art of Scientific Computing, pp.20-23, 1992. ,
Can traditional programming bridge the ninja performance gap for parallel computing applications, International Symposium on Computer Architecture, pp.440-451, 2012. ,
Improving cache behavior of dynamically allocated data structures, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192), pp.322-329, 1998. ,
DOI : 10.1109/PACT.1998.727268
Better performance at lower occupancy, GPU Technology Conference, 2010. ,
Lu, qr and cholesky factorizations using vector capabilities of gpus, technical report, 2008. ,