M. Frigo and S. G. Johnson, The Design and Implementation of FFTW3, Proceedings of the IEEE, vol.93, issue.2, pp.216-231, 2005.

B. Cipra, The Best of the 20th Century : Editors Name Top 10 Algorithms, SIAM News, vol.33, issue.4, 2000.

J. R. Johnson and R. W. Johnson, Challenges of Computing the Fast Fourier Transform, DARPA CONFERENCE, 1997.

J. W. Cooley and J. W. Tukey, An algorithm for the machine calculation of complex Fourier series, Mathematics of Computation, vol.19, issue.90, pp.297-301, 1965.
DOI : 10.1090/S0025-5718-1965-0178586-1

E. Quinnell, E. Swartzlander, and C. Lemonds, Floating-Point Fused Multiply-Add Architectures, 2007 Conference Record of the Forty-First Asilomar Conference on Signals, Systems and Computers, pp.331-337, 2007.
DOI : 10.1109/ACSSC.2007.4487224

E. Linzer and E. Feig, Implementation of Efficient FFT Algorithms on Fused Multiply-Add Architectures, IEEE Transactions on Signal Processing, vol.41, issue.1, p.93, 1993.
DOI : 10.1109/TSP.1993.193130

E. Linzer and E. Feig, Modified FFTs for fused multiply-add architectures, Mathematics of Computation, vol.60, issue.201, pp.347-361, 1993.
DOI : 10.1090/S0025-5718-1993-1159169-0

S. Goedecker, Fast Radix 2, 3, 4, and 5 Kernels for Fast Fourier Transformations on Computers with Overlapping Multiply-Add Instructions, SIAM Journal on Scientific Computing, vol.18, issue.6, pp.1605-1611, 1997.
DOI : 10.1137/S1064827595281940

J. Perez-Seva, The optimizations of signal processing algorithms of modern parallel and embedded architectures. Thesis, 2009.
URL : https://hal.archives-ouvertes.fr/tel-00610865

M. A. Bergach, S. Tissot, M. Syska, and R. Simone, Scaling Performance of FFT Computation on an Industrial Integrated GPU Co-processor : Experiments with Algorithm Adaptation, 2014.

C. Van Loan, Computational Frameworks for the Fast Fourier Transform, Frontiers in Applied Mathematics, Society for Industrial and Applied Mathematics, 1992.
DOI : 10.1137/1.9781611970999

W. Cochran, J. W. Cooley, D. Favin, H. Helms, R. Kaenel et al., What is the fast Fourier transform?, Proceedings of the IEEE, vol.55, issue.10, pp.1664-1674, 1967.
DOI : 10.1109/PROC.1967.5957

R. C. Singleton, An algorithm for computing the mixed radix fast Fourier transform, IEEE Transactions on Audio and Electroacoustics, vol.17, issue.2, pp.93-103, 1969.
DOI : 10.1109/TAU.1969.1162042

S. G. Johnson and M. Frigo, A Modified Split-Radix FFT With Fewer Arithmetic Operations, IEEE Transactions on Signal Processing, vol.55, issue.1, pp.111-119, 2007.
DOI : 10.1109/TSP.2006.882087

D. J. Bernstein, The Tangent FFT, Applied Algebra, Algebraic Algorithms and Error-Correcting Codes, Proceedings of the 17th International Conference (AAECC-17), pp.291-300, 2007.

I. J. Good, The interaction algorithm and practical Fourier analysis, Journal of the Royal Statistical Society. Series B, 1960.

P. Duhamel and M. Vetterli, Fast Fourier Transforms, Signal Processing, vol.19, pp.259-299, 1990.
DOI : 10.1201/9781420046076-c7

S. Winograd, On computing the Discrete Fourier Transform, Proceedings of the National Academy of Sciences, vol.73, issue.4, p.175, 1978.
DOI : 10.1073/pnas.73.4.1005

S. Winograd, On the multiplicative complexity of the Discrete Fourier Transform, Advances in Mathematics, vol.32, issue.2, pp.83-117, 1979.
DOI : 10.1016/0001-8708(79)90037-9

T. Hartley, A. Fasih, C. Berdanier, F. Ozguner, and U. Catalyurek, Investigating the use of GPU-accelerated nodes for SAR image formation, 2009 IEEE International Conference on Cluster Computing and Workshops, pp.1-8, 2009.
DOI : 10.1109/CLUSTR.2009.5289125

O. Altun, S. Paker, and M. Kartal, Realization of interpolation-free fast SAR range-Doppler algorithm using parallel processing on GPU, Progress In Electromagnetics Research Symposium Proceedings, pp.998-1002, 2013.

R. Motwani, K. V. Palem, V. Sarkar, and S. Reyen, Combining register allocation and instruction scheduling, tech. rep, 1995.

D. Koufaty and D. Marr, Hyperthreading technology in the NetBurst microarchitecture, IEEE Micro, vol.23, issue.2, pp.56-65, 2003.
DOI : 10.1109/MM.2003.1196115

J. L. Hennessy and D. A. Patterson, Computer Architecture, Fifth Edition : A Quantitative Approach, 2011.

J. Kim, C. Torng, S. Srinath, D. Lockhart, and C. Batten, Microarchitectural Mechanisms to Exploit Value Structure in SIMT Architectures, Proceedings of the 40th Annual International Symposium on Computer Architecture, ISCA '13, pp.130-141, 2013.

F. Franchetti and M. Püschel, Encyclopedia of Parallel Computing, ch. Fast Fourier Transform, 2011.

C. Temperton, Self-Sorting In-Place Fast Fourier Transforms, SIAM Journal on Scientific and Statistical Computing, vol.12, issue.4, pp.808-823, 1991.
DOI : 10.1137/0912043

M. Frigo and S. G. Johnson, FFTW: an adaptive software architecture for the FFT, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), pp.1381-1384, 1998.
DOI : 10.1109/ICASSP.1998.681704

D. Mirković and S. L. Johnsson, Automatic performance tuning in the UHFFT library, Computational Science - ICCS 2001, pp.71-80, 2001.

E. W. Dijkstra, A short introduction to the art of programming, Technische Hogeschool Eindhoven, vol.4, 1971.

A. M. Blake, Computing the fast Fourier transform on SIMD microprocessors. Thesis, 2012.

S. Ocovaj and Z. Lukac, Optimization of conjugate-pair split-radix FFT algorithm for SIMD platforms, 2014 IEEE International Conference on Consumer Electronics (ICCE), pp.373-374, 2014.
DOI : 10.1109/ICCE.2014.6776047

W. Xu, Z. Yan, and D. Shunying, A high performance FFT library with single instruction multiple data (SIMD) architecture, 2011 International Conference on Electronics, Communications and Control (ICECC), pp.630-633, 2011.
DOI : 10.1109/ICECC.2011.6066463

K. Zhang, S. Chen, S. Liu, Y. Wang, and J. Huang, Accelerating the data shuffle operations for FFT algorithms on SIMD DSPs, 2011 9th IEEE International Conference on ASIC, pp.683-686, 2011.
DOI : 10.1109/ASICON.2011.6157297

H. Izumi, K. Sasaki, K. Nakajima, and H. Sato, An efficient technique for corner-turn in SAR image reconstruction by improving cache access, Proceedings 16th International Parallel and Distributed Processing Symposium, p.67, 2002.
DOI : 10.1109/IPDPS.2002.1015471

N. K. Govindaraju, B. Lloyd, Y. Dotsenko, B. Smith, and J. Manferdelli, High performance discrete Fourier transforms on graphics processors, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-2, 2008.
DOI : 10.1109/SC.2008.5213922

Y. Ogata, T. Endo, N. Maruyama, and S. Matsuoka, An efficient, model-based CPU-GPU heterogeneous FFT library, IEEE International Symposium on Parallel and Distributed Processing, pp.1-10, 2008.

V. Volkov and B. Kazian, Fitting FFT onto the G80 architecture, 2008.

M. Daga, A. M. Aji, and W. Feng, On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing, 2011 Symposium on Application Accelerators in High-Performance Computing, pp.141-149, 2011.
DOI : 10.1109/SAAHPC.2011.29

V. Kelefouras, G. Athanasiou, N. Alachiotis, H. Michail, A. Kritikakou et al., A Methodology for Speeding Up Fast Fourier Transform Focusing on Memory Architecture Utilization, IEEE Transactions on Signal Processing, vol.59, issue.12, pp.6217-6226, 2011.
DOI : 10.1109/TSP.2011.2168525

R. Baghdadi, A. Cohen, S. Guelton, S. Verdoolaege, J. Inoue et al., PENCIL : Towards a platform-neutral compute intermediate language for DSLs, 2nd Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC, associated with SC), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00786828

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.4, pp.187-198, 2011.
DOI : 10.1002/cpe.1631
URL : https://hal.archives-ouvertes.fr/inria-00384363

T. Gautier, J. V. Lima, N. Maillard, and B. Raffin, XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, pp.1299-1308, 2013.
DOI : 10.1109/IPDPS.2013.66
URL : https://hal.archives-ouvertes.fr/hal-00799904