AutoExecutor: Predictive Parallelism for Spark SQL Queries

  • ,
  • Abhishek Roy
  • Alekh Jindal
  • Rui Fang
  • Jeff Zheng
  • Xiaolei Liu
  • Ruiping Li

VLDB

Right-sizing resources for query execution is important for cost-efficient performance, but estimating upfront, before query execution, how resource allocations affect performance is difficult. We demonstrate AutoExecutor, a predictive system for Spark SQL queries running on Azure Synapse. It uses machine learning models to predict query run times as a function of the number of allocated executors, which limits the maximum allowed parallelism.
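To make the idea of run time as a function of executor count concrete, here is a minimal sketch. It is not AutoExecutor's model, which the abstract does not specify. It assumes a simple Amdahl-style curve t(n) = serial + parallel / n fitted by least squares to a few observed runs, and then picks the smallest executor count whose predicted run time stays within a tolerated slowdown of the best case. The function names, the curve form, the sample measurements, and the `slowdown` parameter are all hypothetical.

```python
import numpy as np

def fit_runtime_curve(executors, runtimes):
    """Least-squares fit of t(n) = serial + parallel / n (assumed curve form)."""
    n = np.asarray(executors, dtype=float)
    t = np.asarray(runtimes, dtype=float)
    # Design matrix columns: constant (serial) term and 1/n (parallel) term.
    A = np.column_stack([np.ones_like(n), 1.0 / n])
    (serial, parallel), *_ = np.linalg.lstsq(A, t, rcond=None)
    return float(serial), float(parallel)

def predict_runtime(serial, parallel, n):
    """Predicted run time in seconds with n executors."""
    return serial + parallel / n

def pick_executors(serial, parallel, max_executors, slowdown=1.1):
    """Smallest n whose predicted run time is within `slowdown` times
    the predicted run time at max_executors."""
    best = predict_runtime(serial, parallel, max_executors)
    for n in range(1, max_executors + 1):
        if predict_runtime(serial, parallel, n) <= slowdown * best:
            return n
    return max_executors

# Hypothetical measurements: run time (seconds) at each executor count.
observed_executors = [1, 2, 4, 8, 16]
observed_runtimes = [420.0, 220.0, 120.0, 70.0, 45.0]

serial, parallel = fit_runtime_curve(observed_executors, observed_runtimes)
chosen = pick_executors(serial, parallel, max_executors=32, slowdown=1.1)
print(f"serial={serial:.1f}s parallel={parallel:.1f}s chosen executors={chosen}")
```

A fitted curve of this kind makes the cost and performance trade-off explicit: past some executor count, adding more executors buys little run time, so a smaller allocation can be chosen before the query starts.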