The Modeling and Management of Computational Sprinting
Tuesday, September 15, 2020 — 3:45PM - 4:30PM
Online data-intensive (OLDI) services process diverse workloads. These services rely on cloud computing's dynamic resource provisioning to reduce operational costs while delivering the expected quality of service, and providers lose revenue when they provision resources inefficiently to avoid service-level objective (SLO) violations. This work studies the modeling and management of computational sprinting for OLDI services in the cloud. Computational sprinting is a resource management technique that allocates ephemeral resources to speed up execution in short bursts: it provisions resources for the common case and temporarily allocates additional resources to improve long-run performance. Workload characteristics and dynamic runtime conditions make it challenging to set good sprinting policies. Greedy policies exhaust the sprinting budget too quickly, missing opportunities to improve tail performance. Conservative policies allow queuing delay to grow unbounded, causing poor performance for requests waiting in the system.

We develop a metric, which we call sprint ability, to measure the efficiency of a sprinting policy, and we use it to compare competing policies against the state of the art and to identify opportunities for performance improvement. We also develop hybrid approaches that accurately model average response time for several architectures and workloads managed by sprinting. These model-driven approaches enable fast exploration of sprinting policies and help find policies that yield lower response times. In one case study, our approach increased revenue by 1.6x for AWS burstable instances without sacrificing throughput. Our work applies sprinting through core scaling, DVFS, and last-level cache (LLC) allocation. We plan to extend it to manage sprinting for machine learning workloads on a serverless architecture.
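The tension between greedy and conservative sprinting policies can be explored with a toy simulation. The sketch below is a hypothetical illustration, not the authors' model: it assumes a single FIFO server with Poisson arrivals and exponential service times, a sprint that makes a request run `speedup` times faster, and a sprint budget that refills continuously up to a cap (loosely analogous to CPU credits on burstable instances). The policy is a queue-length threshold: a request sprints only if enough requests are in the system when it starts and the remaining budget covers its sprinted service time. The function name, parameters, and default values are all assumptions chosen for illustration, and the abstract's sprint ability metric is not reproduced because its definition is not given here.

```python
import bisect
import random


def simulate_sprinting(threshold, n_requests=20_000, arrival_rate=0.9,
                       base_rate=1.0, speedup=2.0, budget_cap=50.0,
                       refill_rate=0.1, seed=1):
    """Mean response time of a hypothetical FIFO queue with a sprint budget.

    A request sprints (runs `speedup` times faster) if, when it starts, at
    least `threshold` requests are in the system (itself plus those queued
    behind it) and the budget covers its sprinted service time.
    """
    rng = random.Random(seed)
    arrivals, work = [], []
    t = 0.0
    for _ in range(n_requests):
        t += rng.expovariate(arrival_rate)       # Poisson arrivals
        arrivals.append(t)
        work.append(rng.expovariate(base_rate))  # service time at base speed

    budget, last_refill = budget_cap, 0.0
    server_free, total_response = 0.0, 0.0
    for i, (arrival, base_time) in enumerate(zip(arrivals, work)):
        start = max(arrival, server_free)
        # Assumed budget model: refills continuously at refill_rate, capped.
        budget = min(budget_cap, budget + refill_rate * (start - last_refill))
        last_refill = start
        # Requests that have arrived but not yet started, including this one.
        in_system = bisect.bisect_right(arrivals, start, i) - i
        sprint_time = base_time / speedup
        if in_system >= threshold and budget >= sprint_time:
            service = sprint_time
            budget -= sprint_time
        else:
            service = base_time
        server_free = start + service
        total_response += server_free - arrival
    return total_response / n_requests


for name, thr in [("greedy", 1), ("conservative", 8)]:
    print(name, round(simulate_sprinting(thr), 2))
```

A threshold of 1 sprints whenever budget remains (greedy), while a high threshold saves budget for long queues (conservative). Sweeping `threshold`, `budget_cap`, and `refill_rate` in a loop is a crude stand-in for the kind of policy exploration the abstract describes; the authors' hybrid response-time models are not shown here.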