L. Genovese, A. Neelov, S. Goedecker, T. Deutsch, S. Ghasemi et al., Daubechies wavelets as a basis set for density functional pseudopotential calculations, The Journal of Chemical Physics, vol.129, issue.1, p.14109, 2008.
DOI : 10.1063/1.2949547

H. Nussbaumer, Fast Fourier Transform and Convolution Algorithms, 1982.
DOI : 10.1007/978-3-662-00551-4

L. Genovese, M. Ospici, T. Deutsch, J. Méhaut, A. Neelov et al., Density functional theory calculation on many-cores hybrid central processing unit-graphic processing unit architectures, The Journal of Chemical Physics, vol.131, issue.3, p.34103, 2009.
DOI : 10.1063/1.3166140

S. Goedecker, Rotating a three-dimensional array in an optimal position for vector processing: case study for a three-dimensional fast Fourier transform, Computer Physics Communications, vol.76, issue.3, pp.294-300, 1993.
DOI : 10.1016/0010-4655(93)90057-J

S. Goedecker, M. Boulet, and T. Deutsch, An efficient 3-dim FFT for plane wave electronic structure calculations on massively parallel machines composed of multiprocessor nodes, Computer Physics Communications, vol.154, issue.2, pp.105-110, 2003.
DOI : 10.1016/S0010-4655(03)00287-X

A. Nukada, Y. Hourai, A. Nishida, and Y. Akiyama, High Performance 3D Convolution for Protein Docking on IBM Blue Gene, Lecture Notes in Computer Science, vol.4742, pp.958-969, 2007.
DOI : 10.1007/978-3-540-74742-0_84

O. Fialka and M. Cadik, FFT and Convolution Performance in Image Filtering on GPU, Tenth International Conference on Information Visualisation (IV'06), pp.609-614, 2006.
DOI : 10.1109/IV.2006.53

V. Podlozhnyuk, Image convolution with CUDA, NVIDIA Corporation white paper, 2007.

Z. Danovich, 16bit 3D Convolution: SSE4+OpenMP implementation on Penryn CPU.

A. Va?ko and M. Šrámek, Optimizing Gaussian Filtering of Volumetric Data Using SSE, Concurrency and Computation: Practice and Experience, pp.100-116, 2011.

M. Hopf and T. Ertl, Accelerating 3D convolution using graphics hardware (case study), Proceedings of the conference on Visualization '99: celebrating ten years, ser. VIS '99, pp.471-474, 1999.

R. C. Whaley and J. Dongarra, Automatically Tuned Linear Algebra Software, Proceedings of the IEEE/ACM SC98 Conference, 1999.
DOI : 10.1109/SC.1998.10004

M. Frigo, A fast Fourier transform compiler, Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation, ser. PLDI '99, pp.169-180, 1999.

J. Xiong, J. Johnson, R. Johnson, and D. Padua, SPL: A language and compiler for DSP algorithms, ACM SIGPLAN Notices, pp.298-308, 2001.

P. Mucci, S. Browne, C. Deane, and G. Ho, PAPI: A portable interface to hardware performance counters, Proc. Dept. of Defense HPCMP Users Group Conference, pp.7-10, 1999.

M. Wolf and M. Lam, A loop transformation theory and an algorithm to maximize parallelism, IEEE Transactions on Parallel and Distributed Systems, vol.2, issue.4, pp.452-471, 1991.

S. Thakkur and T. Huff, Internet Streaming SIMD Extensions, Computer, vol.32, issue.12, pp.26-34, 1999.
DOI : 10.1109/2.809248

Khronos Group, OpenCL: Open Computing Language.

NVIDIA Corporation, NVIDIA OpenCL Best Practices Guide, NVIDIA_OpenCL_BestPracticesGuide.pdf.

S. Alam, G. Fourestey, B. Videau, L. Genovese et al., Overlapping computations with communications and I/O explicitly using OpenMP based heterogeneous threading models, IWOMP, ser. Lecture Notes in Computer Science, B. M. Chapman, F. Massaioli, M. S. Müller, and M. Rorro, Eds., pp.267-270, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00953052