Dhabaleswar K (DK) Panda
Dhabaleswar K (DK) Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. He is also the Founder and CEO of X-ScaleSolutions, Inc. He has published over 500 papers in the area of high-end computing and networking. The MVAPICH2 (High-Performance MPI and PGAS over InfiniBand, Omni-Path, iWARP, and RoCE) libraries, designed and developed by his research group (http://mvapich.cse.ohio-state.edu), are currently used by more than 3,200 organizations in 89 countries. The software has been downloaded more than 1.45 million times from the project's site and powers several InfiniBand clusters, including the 4th-, 10th-, 12th-, 20th-, and 31st-ranked systems in the TOP500 list and many OpenPOWER clusters. Prof. Panda’s research group at OSU has been focusing on high-performance and scalable distributed training for popular deep learning frameworks (TensorFlow and PyTorch) using MPI-driven libraries. Enhanced versions of these frameworks are available from https://hidl.cse.ohio-state.edu. Prof. Panda is a regular keynote speaker, invited speaker, and tutorial speaker at many events, including those focusing on OpenPOWER platforms (OpenPOWER Summit 2018 and 2020 and several OpenPOWER Academic Summits). During the last 20 years, he has presented 79 keynote talks, 185 invited tutorials, and 363 invited talks. Prof. Panda is an IEEE Fellow. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda
This talk will focus on high-performance and scalable middleware for MPI and DL applications on the OpenPOWER platform. It covers three products for which commercial support is available from X-ScaleSolutions. The first is the OSU MVAPICH2 family of MPI libraries and their capabilities for high-performance computing with both CPUs (OpenPOWER) and GPUs (NVIDIA). The second is a tight integration between the OSU MVAPICH2-GDR MPI library and the Horovod stack, which provides high-performance and scalable Deep Learning (DL) with deep introspection (DI) capabilities for DL frameworks like TensorFlow and PyTorch. The DI capabilities allow DL users and runtime developers to easily optimize their DL applications on modern systems. The third is a high-performance and scalable checkpointing library for HPC and DL applications. Performance results from the ORNL Summit system (ranked 2nd) and Lassen (ranked 20th), with thousands of GPUs and POWER9 CPUs, will be presented.