Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning, ASPLOS '14, 2014. ,
Eie: Efficient inference engine on compressed deep neural network, ISCA'16 ,
, Amazon ec2 f1
Ese: Efficient speech recognition engine with sparse lstm on fpga, FPGA '17 ,
, Tensorflow xla
Introducing nnvm compiler: A new open end-to-end compiler for ai frameworks, 2017. ,
, Tensorflow architecture
Eigen v3, 2010. ,
Scalable graph traversal on sunway taihulight with ten million cores, IPDPS'17 ,
Learning multiple layers of features from tiny images, 2009. ,
Gradient-based learning applied to document recognition, Proceedings of the IEEE, 1998. ,
Rethinking the inception architecture for computer vision, CoRR, 2015. ,
Identity mappings in deep residual networks, CoRR, 2016. ,
swdnn: A library for accelerating deep learning applications on sunway taihulight, IPDPS'17, 2017. ,
cudnn: Efficient primitives for deep learning, CoRR, 2014. ,
Gradient-based learning applied to document recognition, Proceedings of the IEEE, pp.2278-2324, 1998. ,
Imagenet classification with deep convolutional neural networks, NIPS'12, 2012. ,
Tensorflow: A system for large-scale machine learning, OSDI'16 ,
Caffe: Convolutional architecture for fast feature embedding, MM '14, pp.675-678 ,
Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems, CoRR, 2015. ,
Nvidia cuda c programming guide, Nvidia Corporation, 2011. ,
Eyeriss: A spatial architecture for energyefficient dataflow for convolutional neural networks, ISCA '16, 2016. ,
Scnn: An accelerator for compressedsparse convolutional neural networks, ISCA '17 ,
Throughput-optimized opencl-based fpga accelerator for large-scale convolutional neural networks, FPGA '16, 2016. ,
Going deeper with embedded fpga platform for convolutional neural network, 2016. ,
Bridge the gap between neural networks and neuromorphic hardware with a neural network compiler, ASPLOS '18 ,