S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran et al., cuDNN: Efficient primitives for deep learning, arXiv preprint arXiv:1410.0759, 2014.

Y. Chen, T. Luo, S. Liu, S. Zhang, L. He et al., DaDianNao: A Machine-Learning Supercomputer, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp.609-622, 2014.
DOI : 10.1109/MICRO.2014.58

C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao et al., Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '15, pp.161-170, 2015.
DOI : 10.1145/2684746.2689060

J. Qiu, J. Wang, S. Yao, K. Guo, B. Li et al., Going Deeper with Embedded FPGA Platform for Convolutional Neural Network, Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '16, pp.26-35, 2016.
DOI : 10.1145/2847263.2847265

T. S. Czajkowski, U. Aydonat, D. Denisenko, J. Freeman, M. Kinsner et al., From OpenCL to high-performance hardware on FPGAs, 22nd International Conference on Field Programmable Logic and Applications (FPL), pp.531-534, 2012.
DOI : 10.1109/FPL.2012.6339272

D. Lin, S. Talathi, and S. Annapureddy, Fixed point quantization of deep convolutional networks, International Conference on Machine Learning, pp.2849-2858, 2016.

J. Gu, Y. Liu, Y. Gao, and M. Zhu, OpenCL caffe, Proceedings of the 4th International Workshop on OpenCL, IWOCL '16, 2016.
DOI : 10.1145/2909437.2909443

Altera, Altera OpenCL design examples. https://www.altera.com/support/supportresources/design-examples/design-software/opencl/matrix-multiplication

N. Suda, V. Chandra, G. Dasika, A. Mohanty, Y. Ma et al., Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks, Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '16, pp.16-25, 2016.
DOI : 10.1145/2847263.2847276

M. Zhu, L. Liu, C. Wang, and Y. Xie, CNNLab: a novel parallel framework for neural networks using GPU and FPGA-a practical study with trade-off analysis, arXiv preprint, 2016.

J. Zhang and J. Li, Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network, Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '17, pp.25-34, 2017.
DOI : 10.1145/3020078.3021698