Ongoing research on neural networks has begun to focus on reducing computation and storage requirements so that deployment becomes feasible in energy-constrained compute environments. One promising opportunity is to reduce the precision of the compute operators to a few bits, while the networks still achieve accuracy close to that of their state-of-the-art floating-point counterparts. In this talk, we will show an automated framework that implements these reduced-precision (and, in the extreme case, fully binarized) neural networks on reconfigurable logic and scales them onto an FPGA-based inference accelerator, given a set of fixed design constraints.
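To illustrate why the fully binarized case maps well onto reconfigurable logic, here is a minimal sketch of a binarized dot product. It assumes the commonly used XNOR-plus-popcount formulation with weights and activations restricted to {-1, +1}; the abstract does not state that the framework uses exactly this scheme, and the function names (`binarize`, `pack_bits`, `binary_dot`) are hypothetical.

```python
def binarize(xs):
    """Map real values to {-1, +1} by sign (assumption: zero maps to +1)."""
    return [1 if x >= 0 else -1 for x in xs]


def pack_bits(values):
    """Pack a list of +1/-1 values into one integer: +1 -> bit 1, -1 -> bit 0."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors given in packed form.

    XNOR marks the positions where both vectors agree (product +1);
    every other position contributes -1, so dot = matches - (n - matches).
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR restricted to the low n bits
    matches = bin(agree).count("1")    # popcount
    return 2 * matches - n


if __name__ == "__main__":
    a = binarize([0.7, -0.2, 1.5, 0.1])
    b = binarize([0.3, 0.9, -0.4, 2.0])
    print(binary_dot(pack_bits(a), pack_bits(b), len(a)))
```

In this formulation each multiply-accumulate reduces to one XNOR gate and a contribution to a population count, which is the kind of operation that is inexpensive to replicate many times in FPGA logic.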
We show that compute performance can scale well beyond typical floating-point performance, currently demonstrating tens of thousands to millions of images per second for inference and 14 TOps of compute performance at a power consumption below 25 W on today's devices. Accuracy results, an architectural comparison with other approaches, and implementation details for the latest large networks will also be presented.
Session Category : Session 2 | Heterogeneous Computing