Performance and energy efficiency are now critical concerns in high-performance scientific computing. To address these twin concerns while continuing to deliver unprecedented computational performance, today's HPC systems (for example, those on the Top500 list) tightly integrate energy-efficient multicore CPUs with accelerators (GPUs, Intel Xeon Phis, FPGAs, etc.). However, this tight integration has created formidable challenges for model and algorithm developers.
In this talk, I will focus on the new complexities introduced in modern homogeneous parallel platforms composed of multicore and manycore processors, such as resource contention and non-uniform memory access (NUMA). These complexities stem from cores contending for shared on-chip resources such as the Last Level Cache (LLC) and the interconnect. As a result, the performance and energy profiles of real-life scientific applications on these platforms are not smooth and can deviate significantly from the shapes that traditional and state-of-the-art load-balancing algorithms rely on to minimize computation time.
I will then present the latest advances that address the challenges posed by these complexities. These include new model-based methods and algorithms for minimizing the time and energy of computations, which handle the most general shapes of performance and energy profiles observed for data-parallel applications on such platforms. Using these algorithms as building blocks, I will also discuss bi-objective optimization of data-parallel applications for performance and energy on such platforms. I will conclude the talk with ongoing and future research in this direction.
Session Category: Session 2 | Heterogeneous Computing