cuDNN: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759, 2014.
DaDianNao: A Machine-Learning Supercomputer, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp.609-622, 2014.
DOI : 10.1109/MICRO.2014.58
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '15, pp.161-170, 2015.
DOI : 10.1145/1498765.1498785
Going Deeper with Embedded FPGA Platform for Convolutional Neural Network, Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '16, pp.26-35, 2016.
DOI : 10.1109/92.784091
From OpenCL to high-performance hardware on FPGAs, 22nd International Conference on Field Programmable Logic and Applications (FPL), pp.531-534, 2012.
DOI : 10.1109/FPL.2012.6339272
Fixed point quantization of deep convolutional networks, International Conference on Machine Learning, pp.2849-2858, 2016.
OpenCL Caffe, Proceedings of the 4th International Workshop on OpenCL, IWOCL '16, 2016.
DOI : 10.1145/2909437.2909443
Altera OpenCL design examples. https://www.altera.com/support/supportresources/design-examples/design-software/opencl/matrix-multiplication
Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks, Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '16, pp.16-25, 2016.
DOI : 10.1145/2664666.2664670
CNNLab: a novel parallel framework for neural networks using GPU and FPGA - a practical study with trade-off analysis. arXiv preprint, 2016.
Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network, Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '17, pp.25-34, 2017.
DOI : 10.1201/EBK1439811924