
Optimizing OpenCL Implementation of Deep Convolutional Neural Network on FPGA

Abstract : The rapid growth of data across the Internet has provided enough labeled data to train deep artificial neural networks. While deeper networks bring significant precision gains in many applications, they also demand higher computation capacity, which comes at the expense of power consumption. To this end, various FPGA-based deep neural network accelerators have been proposed for higher performance and lower energy consumption. However, the development cycle of an FPGA application is much longer than that of a CPU or GPU application. Although FPGA vendors such as Altera and Xilinx have released OpenCL frameworks to ease programming, tuning OpenCL code for desirable performance on FPGAs is still challenging. In this paper, we look into the OpenCL implementation of a Convolutional Neural Network (CNN) on FPGA. By analysing how a CPU/GPU-oriented version executes on FPGA, we identify the causes of the performance difference between FPGA and CPU/GPU and locate the performance bottlenecks. Based on this analysis, we put forward a corresponding optimization method focusing on external memory transfers. We implement a prototype system on an Altera Stratix V A7 FPGA, which achieves a 4.76× speedup over the original version. To the best of our knowledge, this implementation outperforms most of the previous OpenCL implementations on FPGA by a large margin.
Document type :
Conference papers

Contributor : Hal Ifip
Submitted on : Friday, February 9, 2018 - 2:26:40 PM
Last modification on : Tuesday, September 3, 2019 - 3:04:02 PM
Long-term archiving on : Friday, May 4, 2018 - 12:11:32 AM


Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License



Yuran Qiao, Junzhong Shen, Dafei Huang, Qianming Yang, Mei Wen, et al. Optimizing OpenCL Implementation of Deep Convolutional Neural Network on FPGA. 14th IFIP International Conference on Network and Parallel Computing (NPC), Oct 2017, Hefei, China. pp.100-111, ⟨10.1007/978-3-319-68210-5_9⟩. ⟨hal-01705448⟩


