PITTSBURGH and SUNNYVALE, Calif., July 27, 2021 /PRNewswire/ -- Petuum's CASL research and engineering team has won the OSDI 2021 Best Paper Award. The effort was led by Dr. Aurick Qiao, who heads the Composability, Automatic, and Scalable Learning (CASL) research and engineering team at Petuum.
Dr. Qiao received the Jay Lepreau Best Paper Award at the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2021) for the paper he co-authored, "Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning," which captures the revolutionary work implemented using one of CASL's key components, AdaptDL. The other authors are Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, Gregory R. Ganger, and Eric P. Xing, from Petuum Inc., Carnegie Mellon University, UC Berkeley, and MBZUAI.
Pollux is available today through AdaptDL, which integrates with PyTorch and Microsoft NNI, with Ray integration coming soon.
Pollux, as implemented by AdaptDL, improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing inter-dependent factors at both the per-job level and the cluster-wide level. Pollux models how each job's goodput (a novel metric that combines system throughput with statistical efficiency) would change as resources are added or removed. Leveraging this information, Pollux dynamically (re-)assigns resources to improve cluster-wide goodput, while respecting fairness and continually optimizing each DL job to better utilize those resources.
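The idea of goodput-driven allocation can be illustrated with a minimal Python sketch. This is not Pollux's or AdaptDL's actual code or API: the `Job` fields, the toy throughput and efficiency formulas, and the greedy `allocate` loop are all assumptions chosen for illustration, and the sketch ignores the fairness and per-job tuning that Pollux also performs. It only shows how multiplying throughput by statistical efficiency can steer GPUs toward the jobs that benefit most from them.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job description; every field is an illustrative assumption."""
    name: str
    per_gpu_rate: float    # examples/sec processed on a single GPU
    sync_overhead: float   # scaling penalty added per extra GPU
    noise_scale: float     # larger value: big batches stay statistically efficient
    base_batch: int = 128  # per-GPU batch size


def throughput(job: Job, gpus: int) -> float:
    """Toy system throughput (examples/sec) with diminishing returns."""
    if gpus == 0:
        return 0.0
    return job.per_gpu_rate * gpus / (1.0 + job.sync_overhead * (gpus - 1))


def efficiency(job: Job, gpus: int) -> float:
    """Toy statistical efficiency: progress per example relative to one GPU."""
    if gpus == 0:
        return 0.0
    batch = job.base_batch * gpus
    return (job.noise_scale + job.base_batch) / (job.noise_scale + batch)


def goodput(job: Job, gpus: int) -> float:
    """Goodput combines system throughput with statistical efficiency."""
    return throughput(job, gpus) * efficiency(job, gpus)


def allocate(jobs: list[Job], total_gpus: int) -> dict[str, int]:
    """Greedy stand-in for a scheduler: give each GPU to the job whose
    goodput would rise the most, stopping when no job would benefit."""
    alloc = {job.name: 0 for job in jobs}

    def gain(job: Job) -> float:
        current = alloc[job.name]
        return goodput(job, current + 1) - goodput(job, current)

    for _ in range(total_gpus):
        best = max(jobs, key=gain)
        if gain(best) <= 0:
            break  # extra GPUs would no longer improve any job's goodput
        alloc[best.name] += 1
    return alloc


# Two hypothetical jobs that differ only in how well they tolerate large batches.
jobs = [
    Job("A", per_gpu_rate=100.0, sync_overhead=0.05, noise_scale=2000.0),
    Job("B", per_gpu_rate=100.0, sync_overhead=0.05, noise_scale=100.0),
]
allocation = allocate(jobs, total_gpus=8)
print(allocation)
```

In this toy setup both jobs have identical raw throughput, so a throughput-only scheduler would have no reason to prefer either one. Because job A keeps its statistical efficiency at larger batch sizes, the goodput-based loop gives it most of the GPUs.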
In experiments with real DL jobs and with trace-driven simulations, Pollux reduces average job completion times by 37-50% relative to state-of-the-art DL schedulers, even when they are provided with ideal resource and training configurations for every job. Pollux promotes fairness among DL jobs competing for resources based on a more meaningful measure of useful job progress and reveals a new opportunity for reducing DL cost in cloud environments.
Dr. Eric Xing, Petuum's founder and Chief Scientist and the founding President of Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), says, "We are incredibly excited to see the recognition our CASL AdaptDL open-source project is getting! The 'Goodput' metric (used in AdaptDL) is a new way to divide ML deep learning training jobs over a cluster, leading to faster model training without quality loss. Goodput is one of several ideas behind our Composability, Automatic, and Scalable Learning (CASL) open-source consortium. We believe AI Production can be more sustainable if enabled in a principled, standardized, and modular way. Congratulations to the team and thank you for all the support from Carnegie Mellon University, Petuum, Inc. and Mohamed bin Zayed University of Artificial Intelligence."
To learn more about CASL and other projects, check out the website and follow CASL on Medium.
Petuum, a leading provider of enterprise artificial intelligence (AI) software, brings the most advanced AI methodologies and software solutions to solve challenges that can't be addressed by traditional techniques. Petuum enables enterprises to design, build, experiment, customize, and operate multiple applied AI solutions in a wide range of industries. At Petuum, we believe AI should be made accessible for all and embody this principle by open sourcing many of our advanced AI components through the Composability, Automatic, and Scalable Learning (CASL) open-source consortium. Visit us at https://petuum.com/
Operating Systems Design and Implementation (OSDI) brings together professionals from academic and industrial backgrounds in what has become a premier forum for discussing the design, implementation, and implications of systems software. The symposium emphasizes innovative research as well as quantified or insightful experiences in systems design and implementation.