Geoffrey Charles Fox received the Ph.D. degree in theoretical physics from Cambridge University, Cambridge, U.K. He is currently a Distinguished Professor of Computing, Engineering, and Physics with Indiana University Bloomington, Bloomington, IN, USA, where he is also the Director of the Digital Science Center and the Chair of the Intelligent Systems Engineering Department, School of Informatics and Computing. His research interests include applying computer science, from infrastructure to analytics, in biology, pathology, sensor clouds, earthquake and ice-sheet science, image processing, deep learning, network science, financial systems, and particle physics. The infrastructure work is built around software-defined systems on clouds and clusters. The analytics work focuses on scalable parallelism. Dr. Fox is a Fellow of the American Physical Society and the Association for Computing Machinery. (Based on document published on 5 June 2017.)
DK Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. He has published over 500 papers in high-end computing and networking. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, Omni-Path, iWARP and RoCE) libraries, designed and developed by his research group (http://mvapich.cse.ohio-state.edu), are currently used by more than 3,200 organizations in 89 countries. The software has been downloaded more than 1.48 million times from the project's site and powers several InfiniBand clusters in the TOP500 list, including the 4th, 10th, 20th, and 31st ranked ones. MPI-driven solutions for high-performance and scalable deep learning with the TensorFlow and PyTorch frameworks are available from http://hidl.cse.ohio-state.edu. Solutions to accelerate Big Data applications are available from http://hibd.cse.ohio-state.edu. Prof. Panda leads one of the recently funded NSF AI Institutes, ICICLE (https://icicle.osu.edu), which designs intelligent cyberinfrastructure for next-generation systems. Prof. Panda is an IEEE Fellow. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda.
Speech Title: Designing High-Performance and Scalable Middleware for HPC and AI: Challenges and Opportunities
Abstract: This talk will focus on challenges and opportunities in designing middleware for HPC, AI (Deep/Machine Learning), and Data Science for on-premise HPC and cloud systems, in light of advances in networking and accelerator technologies. For the HPC domain, we will discuss the challenges in designing runtime environments for MPI+X programming models, taking into account support for multi-core systems (Xeon, ARM and OpenPower), high-performance networks (InfiniBand and RoCE), GPUs (including GPUDirect RDMA), and emerging BlueField-2 DPUs. Features, sample performance numbers, and best practices for using the MVAPICH2 libraries (http://mvapich.cse.ohio-state.edu) will be presented. For the Deep/Machine Learning domain, we will focus on MPI-driven solutions (http://hidl.cse.ohio-state.edu) to extract performance and scalability for popular Deep Learning frameworks (TensorFlow and PyTorch) and large out-of-core models. Accelerating Deep Learning applications with BlueField-2 DPUs will also be presented. MPI-driven solutions to accelerate data science applications like Dask (http://hibd.cse.ohio-state.edu) will be highlighted. Finally, we will outline the challenges and experiences in deploying this middleware to HPC cloud environments on Azure, AWS, and Oracle.
Baochun Li received his B.Engr. degree from the Department of Computer Science and Technology, Tsinghua University, China, in 1995 and his M.S. and Ph.D. degrees from the Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, in 1997 and 2000, respectively. Since 2000, he has been with the Department of Electrical and Computer Engineering at the University of Toronto, where he is currently a Professor. He has held the Bell Canada Endowed Chair in Computer Engineering since August 2005. His research interests include cloud computing, distributed systems, datacenter networking, and wireless systems.
Dr. Li has co-authored more than 360 research papers, with a total of over 17,000 citations, an H-index of 75, and an i10-index of 233, according to Google Scholar Citations. He was the recipient of the IEEE Communications Society Leonard G. Abraham Award in the Field of Communications Systems in 2000. In 2009, he was a recipient of the Multimedia Communications Best Paper Award from the IEEE Communications Society, and a recipient of the University of Toronto McLean Award. He is a member of ACM and a Fellow of IEEE.
Speech Title: Back to the Basics: Proven Recipes in Distributed Systems Design
Abstract: From training a machine learning model with billions of operations to running a large datacenter with tens of thousands of servers, designing large-scale parallel and distributed systems that are both scalable and fault-tolerant is never an easy task. Many design ideas have failed in the past three decades, but many more have succeeded, and what we have achieved so far are considerable feats of engineering. In this talk, I will share my own ideas and thoughts on the pitfalls and lessons learned in designing and implementing scalable, fast, and practical parallel and distributed systems. To support my ideas, I will share anecdotal evidence from a long history of real-world distributed systems, ranging from a simple time synchronization protocol to distributed machine learning systems in recent academic work.
Copyright © The 2021 12th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP 2021) All rights reserved.