This project aims to accelerate the inference and training of Deep Neural Networks (DNNs) using FPGAs, targeting high energy efficiency and low latency in data centers.
We have been developing a CNN (Convolutional Neural Network) accelerator based on an embedded FPGA platform. To improve bandwidth and resource utilization, we propose a dynamic-precision data quantization method and a convolver design that is efficient for all layer types in a CNN. Results show that our data quantization flow introduces only 0.4% accuracy loss for the very deep VGG16 model when 8/4-bit quantization is used. As a case study, VGG16-SVD is implemented on an embedded FPGA platform (Xilinx Zynq). The system on the Xilinx Zynq ZC706 board achieves a frame rate of 4.45 fps with a top-5 accuracy of 86.66% using 16-bit quantization. The average performance of the convolutional layers and of the full CNN is 187.8 GOP/s and 137.0 GOP/s, respectively, at a 150 MHz working frequency.
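The core idea of dynamic-precision quantization is to keep a fixed total bit width but let the radix-point position (the number of fractional bits) vary per layer, chosen to minimize quantization error on that layer's data. A minimal sketch of that per-tensor search, assuming NumPy and a simple sum-of-absolute-errors criterion (the function names and the exact search range are illustrative, not the project's actual API):

```python
import numpy as np

def quantize(x, bits, frac_bits):
    """Round x to signed fixed-point with `bits` total bits and
    `frac_bits` fractional bits, saturating at the representable range."""
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (bits - 1))
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale  # de-quantized value, for measuring error

def best_frac_bits(x, bits):
    """Pick the radix-point position that minimizes quantization error
    for this tensor -- the per-layer search at the heart of
    dynamic-precision quantization. Search range is an assumption."""
    errors = {f: np.abs(quantize(x, bits, f) - x).sum()
              for f in range(-2, bits + 2)}
    return min(errors, key=errors.get)

# Example: quantize one layer's weights to 8 bits with a per-layer
# fractional bit width (4 bits would be used for some layers in the
# 8/4-bit configuration described above).
weights = np.random.randn(64, 3, 3, 3).astype(np.float64) * 0.1
f = best_frac_bits(weights, 8)
w_q = quantize(weights, 8, f)
```

Different layers then simply carry different `frac_bits` values, so the datapath width stays constant while the represented dynamic range adapts per layer.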