High performance computing is an important tool across many scientific domains, and users demand the highest possible performance for their applications. Most HPC machines today provide a heterogeneous architecture, combining different processors, accelerators, and other components tailored to the expected customer workloads and following the latest trends in computer architecture.
Performance and energy efficiency are now critical concerns in high performance scientific computing. To address these twin concerns while continuing to deliver unprecedented computational performance, today's HPC systems (for example, those on the Top500 list) tightly integrate energy-efficient multicore CPUs with accelerators (GPUs, Intel Xeon Phis, FPGAs, etc.). However, this tight integration has created formidable challenges for model and algorithm developers.
Ongoing research on neural networks has begun to focus on reducing their computation and storage requirements to make deployment feasible in energy-constrained compute environments. One promising approach is reducing compute operands to just a few bits of precision, whereby these networks achieve accuracy close to that of their floating-point counterparts. In this talk, we will present an automated framework for implementing these reduced-precision (and, in the extreme case, fully binarized) neural networks on reconfigurable logic, scaling them onto an FPGA-based inference accelerator given a set of fixed design constraints.
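To illustrate why fully binarized networks map so well onto reconfigurable logic, here is a minimal sketch (not code from the talk's framework; all names are illustrative) of the core trick: when weights and activations are constrained to {-1, +1}, a dot product reduces to a bitwise XNOR followed by a popcount, which FPGAs implement very cheaply.

```python
def binarize(values):
    """Map real values to {-1, +1} via the sign function (0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]

def to_bits(binary_values):
    """Encode +1 as bit 1 and -1 as bit 0, packed into one integer."""
    bits = 0
    for v in binary_values:
        bits = (bits << 1) | (1 if v == 1 else 0)
    return bits

def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two {-1, +1} vectors from their packed encodings.

    matches = popcount(XNOR(a, b)) over n bits; each matching position
    contributes +1 and each mismatch -1, so the result is 2*matches - n.
    """
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    return 2 * matches - n

# Cross-check against a plain dot product on the binarized vectors.
x = binarize([0.7, -1.2, 0.1, -0.4])   # -> [ 1, -1,  1, -1]
w = binarize([-0.3, -0.8, 0.9, 0.2])   # -> [-1, -1,  1,  1]
reference = sum(xi * wi for xi, wi in zip(x, w))
fast = xnor_popcount_dot(to_bits(x), to_bits(w), len(x))
assert reference == fast
```

In hardware, the XNOR and popcount replace multiply-accumulate units entirely, which is what makes scaling such networks onto an FPGA-based inference accelerator attractive under tight resource and power budgets.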