W. Chen, P. Kosmas, M. Leeser, and C. Rappaport, An FPGA implementation of the two-dimensional finite-difference time-domain (FDTD) algorithm, Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays, FPGA '04, pp.213-222, 2004.
DOI : 10.1145/968280.968311

C. He, W. Zhao, and M. Lu, Time domain numerical simulation for transient waves on reconfigurable coprocessor platform, Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, pp.127-136, 2005.

J. Cong, M. Huang, and Y. Zou, Accelerating Fluid Registration Algorithm on Multi-FPGA Platforms, 2011 21st International Conference on Field Programmable Logic and Applications, pp.50-57, 2011.
DOI : 10.1109/FPL.2011.20
URL : http://ballade.cs.ucla.edu/%7Econg/papers/fpl11.pdf

P. Diniz, M. Hall, J. Park, B. So, and H. Ziegler, Automatic mapping of C to FPGAs with the DEFACTO compilation and synthesis system, Microprocessors and Microsystems, vol.29, issue.2-3, pp.51-62, 2005.
DOI : 10.1016/j.micpro.2004.06.007

M. Kunz, A. Ostrowski, and P. Zipf, An FPGA-optimized architecture of Horn and Schunck optical flow algorithm for real-time applications, 2014 24th International Conference on Field Programmable Logic and Applications (FPL), pp.1-4, 2014.
DOI : 10.1109/FPL.2014.6927406

W. Luzhou, K. Sano, and S. Yamamoto, Domain-Specific Language and Compiler for Stencil Computation on FPGA-Based Systolic Computational-Memory Array, Proceedings of the 8th International Symposium on Applied Reconfigurable Computing, pp.26-39, 2012.
DOI : 10.1109/71.97902

A. A. Nacci, V. Rana, F. Bruschi, D. Sciuto, I. Beretta et al., A high-level synthesis flow for the implementation of iterative stencil loop algorithms on FPGA devices, Proceedings of the 50th Annual Design Automation Conference, DAC '13, pp.521-52, 2013.
DOI : 10.1145/2463209.2488797

O. Reiche, M. Schmid, F. Hannig, R. Membarth, and J. Teich, Code generation from a domain-specific language for C-based HLS of hardware accelerators, Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis, CODES '14, 2014.
DOI : 10.1145/2656075.2656081

J. Cong, P. Li, B. Xiao, and P. Zhang, An optimal microarchitecture for stencil computation acceleration based on non-uniform partitioning of data reuse buffers, Proceedings of the 51st Annual Design Automation Conference, pp.771-77, 2014.

G. Natale, G. Stramondo, P. Bressana, R. Cattaneo, D. Sciuto et al., A polyhedral model-based framework for dataflow implementation on FPGA devices of iterative stencil loops, Proceedings of the 35th International Conference on Computer-Aided Design, ICCAD '16, pp.771-778, 2016.
DOI : 10.1145/2435264.2435271

S. Verdoolaege, isl: An Integer Set Library for the Polyhedral Model, Proceedings of the 3rd International Congress on Mathematical Software, pp.299-302, 2010.
DOI : 10.1007/978-3-642-15582-6_49

F. Irigoin and R. Triolet, Supernode partitioning, Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '88, pp.319-329, 1988.
DOI : 10.1145/73560.73588

M. E. Wolf and M. S. Lam, A data locality optimizing algorithm, Proceedings of the 12th Conference on Programming Language Design and Implementation, pp.30-44, 1991.
DOI : 10.1145/113445.113449

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A practical automatic polyhedral parallelizer and locality optimizer, Proceedings of the 29th Conference on Programming Language Design and Implementation, pp.101-113, 2008.
DOI : 10.1145/1379022.1375595
URL : http://www.cse.ohio-state.edu/~bondhugu/publications/uday-pldi08.pdf

T. Grosser, A. Cohen, J. Holewinski, P. Sadayappan, and S. Verdoolaege, Hybrid Hexagonal/Classical Tiling for GPUs, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, pp.66-75, 2014.
DOI : 10.1145/2581122.2544160
URL : https://hal.archives-ouvertes.fr/hal-00911177

S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev et al., Effective automatic parallelization of stencil computations, Proceedings of the 28th Conference on Programming Language Design and Implementation, pp.235-244, 2007.
DOI : 10.1145/1250734.1250761
URL : http://www.cse.ohio-state.edu/~bondhugu/publications/pldi196-krishnamoorthy.ps

M. Christen, O. Schenk, and H. Burkhart, PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.676-687, 2011.
DOI : 10.1109/IPDPS.2011.70

Y. Tang, R. A. Chowdhury, B. C. Kuszmaul, C. Luk, and C. E. Leiserson, The Pochoir stencil compiler, Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures, SPAA '11, pp.117-128, 2011.
DOI : 10.1145/1989493.1989508

T. Henretty, R. Veras, F. Franchetti, L. Pouchet, J. Ramanujam et al., A stencil compiler for short-vector SIMD architectures, Proceedings of the 27th international ACM conference on International conference on supercomputing, ICS '13, pp.13-24, 2013.
DOI : 10.1145/2464996.2467268
URL : http://www.cs.ucla.edu/%7Epouchet/doc/ics-article.13.pdf

H. Fu and R. G. Clapp, Eliminating the memory bottleneck, Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays, FPGA '11, pp.65-74, 2011.
DOI : 10.1145/1950413.1950429

J. Hegarty, J. Brunhaver, Z. DeVito, J. Ragan-Kelley, N. Cohen et al., Darkroom, Proceedings of the 41st International Conference on Computer Graphics and Interactive Techniques, 2014.
DOI : 10.1145/2228360.2228472