Keynote Speakers

Prof. Geoffrey Fox, ACM Fellow, University of Virginia, USA

Fox received a Ph.D. in Theoretical Physics from Cambridge University, where he was Senior Wrangler. He is now a Professor in the Biocomplexity Institute & Initiative and Computer Science Department at the University of Virginia. He previously held positions at Caltech, Syracuse University, Florida State University, and Indiana University, after being a postdoc at the Institute for Advanced Study at Princeton, Lawrence Berkeley Laboratory, and Peterhouse College, Cambridge. He has supervised 75 Ph.D. students. He received the High-Performance Parallel and Distributed Computing (HPDC) Achievement Award and the ACM-IEEE CS Ken Kennedy Award for foundational contributions to parallel computing in 2019. He is a Fellow of APS (Physics) and ACM (Computing) and works on the interdisciplinary interface between computing and applications. He is currently active in the industry consortium MLCommons.

Speech Title: AI for Science: Deep Learning for Geospatial Time Series

Abstract: We discuss the use of deep learning to describe geospatial time series. We present a general approach building on previous work on recurrent neural networks and transformers. We give three examples of so-called spatial bags from earthquake, medical time series and particle dynamics and focus on earthquake forecasting. The latter is presented as an MLCommons benchmark challenge with three different implementations: a pure recurrent network, a spatio-temporal science transformer and a version of the Google Temporal Fusion Transformer. We discuss how deep learning is used to both clean up the inputs and describe hidden dynamics. We show that both data engineering (wrangling data into the desired input format) and data science (the deep learning training/inference) are needed and comment on achieving high performance in both. We briefly speculate how such particular examples can drive broad progress in AI for science.

Prof. Dhabaleswar K. (DK) Panda, IEEE Fellow, The Ohio State University, USA

DK Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. He has published over 500 papers in high-end computing and networking. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, Omni-Path, iWARP and RoCE) libraries, designed and developed by his research group, are currently being used by more than 3,200 organizations worldwide (in 89 countries). More than 1.48M downloads of this software have taken place from the project's site. This software powers several InfiniBand clusters in the TOP500 list (including the 4th, 10th, 20th, and 31st ranked ones). His group also provides MPI-driven solutions for high-performance and scalable deep learning with the TensorFlow and PyTorch frameworks, as well as solutions to accelerate Big Data applications. Prof. Panda leads one of the recently funded NSF AI Institutes, ICICLE, to design intelligent cyberinfrastructure for next-generation systems. Prof. Panda is an IEEE Fellow.

Speech Title: Designing High-Performance and Scalable Middleware for HPC and AI: Challenges and Opportunities

Abstract: This talk will focus on challenges and opportunities in designing middleware for HPC, AI (Deep/Machine Learning), and Data Science for on-premise HPC and cloud systems with advances in networking and accelerator technologies. For the HPC domain, we will discuss the challenges in designing runtime environments for MPI+X programming models by taking into account support for multi-core systems (Xeon, ARM and OpenPower), high-performance networks (InfiniBand and RoCE), GPUs (including GPUDirect RDMA), and emerging BlueField-2 DPUs. Features, sample performance numbers and best practices of using MVAPICH2 libraries will be presented. For the Deep/Machine Learning domain, we will focus on MPI-driven solutions to extract performance and scalability for popular Deep Learning frameworks (TensorFlow and PyTorch) and large out-of-core models. Accelerating Deep Learning applications with BlueField-2 DPUs will also be presented. MPI-driven solutions to accelerate data science applications like Dask will be highlighted. Finally, we will outline the challenges and experiences in deploying this middleware to HPC cloud environments on Azure, AWS, and Oracle.

Prof. Baochun Li, IEEE Fellow, University of Toronto, Canada

Baochun Li received his B.Engr. degree from the Department of Computer Science and Technology, Tsinghua University, China, in 1995 and his M.S. and Ph.D. degrees from the Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, in 1997 and 2000, respectively. Since 2000, he has been with the Department of Electrical and Computer Engineering at the University of Toronto, where he is currently a Professor. He has held the Bell Canada Endowed Chair in Computer Engineering since August 2005. His research interests include cloud computing, distributed systems, datacenter networking, and wireless systems.

Dr. Li has co-authored more than 360 research papers, with a total of over 17,000 citations, an h-index of 75 and an i10-index of 233, according to Google Scholar Citations. He was the recipient of the IEEE Communications Society Leonard G. Abraham Award in the Field of Communications Systems in 2000. In 2009, he was a recipient of the Multimedia Communications Best Paper Award from the IEEE Communications Society, and a recipient of the University of Toronto McLean Award. He is a member of ACM and a Fellow of IEEE.

Speech Title: Back to the Basics: Proven Recipes in Distributed Systems Design

Abstract: From training a machine learning model with billions of operations to running a large datacenter with tens of thousands of servers, designing large-scale parallel and distributed systems that are both scalable and fault-tolerant is never an easy task. Many design ideas have failed in the past three decades, but many more have succeeded, and what we have achieved so far are considerable feats of engineering. In this talk, I will share my own ideas and thoughts on the pitfalls and lessons learned in designing and implementing scalable, fast, and practical parallel and distributed systems. To support my ideas, I will share anecdotal evidence from a long history of real-world distributed systems, ranging from a simple time synchronization protocol to distributed machine learning systems in recent academic work.