Towards Resource-Efficient Compound AI Systems

Compound AI Systems, integrating multiple interacting components like models, retrievers, and external tools, have
emerged as essential for addressing complex AI tasks. However, current implementations suffer from inefficient resource utilization due to tight coupling between application
logic and execution details, a disconnect between the orchestration and resource management layers, and the perceived
mutual exclusivity of efficiency and quality.
We propose a vision for resource-efficient Compound AI
Systems through a declarative workflow programming model
and an adaptive runtime system for dynamic scheduling and
resource-aware decision-making. Decoupling application
logic from low-level details exposes levers for the runtime to
flexibly configure the execution environment and resources,
without compromising on quality. Collaboration
between the workflow orchestrator and the cluster manager
yields higher efficiency through better scheduling and resource management.
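
As a rough illustration of this decoupling, the following minimal sketch separates what a workflow declares (tasks and a quality floor) from what a runtime decides (which model and hardware to use). Every name here (Task, Config, choose_config, plan), the catalog entries, and all quality and energy numbers are hypothetical placeholders chosen for the example; they are not Murakkab's API or measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Declarative part: what the developer asks for (hypothetical schema)."""
    name: str
    min_quality: float  # lowest acceptable quality score for this task


@dataclass(frozen=True)
class Config:
    """Execution detail owned by the runtime, not by the application."""
    model: str
    hardware: str
    quality: float        # assumed profiled quality score
    energy_joules: float  # assumed profiled energy per invocation


def choose_config(task, candidates):
    """Pick the lowest-energy candidate that still meets the quality floor."""
    feasible = [c for c in candidates if c.quality >= task.min_quality]
    if not feasible:
        raise ValueError(f"no configuration meets quality for {task.name}")
    return min(feasible, key=lambda c: c.energy_joules)


def plan(workflow, catalog):
    """Map every declared task to a concrete configuration."""
    return {t.name: choose_config(t, catalog[t.name]) for t in workflow}


# Illustrative inputs only: made-up models and numbers.
CATALOG = {
    "transcribe": [
        Config("large-asr", "gpu", quality=0.95, energy_joules=120.0),
        Config("small-asr", "cpu", quality=0.90, energy_joules=30.0),
    ],
    "summarize": [
        Config("large-llm", "gpu", quality=0.92, energy_joules=400.0),
        Config("small-llm", "gpu", quality=0.80, energy_joules=90.0),
    ],
}
WORKFLOW = [Task("transcribe", min_quality=0.85), Task("summarize", min_quality=0.90)]
PLAN = plan(WORKFLOW, CATALOG)

for name, cfg in PLAN.items():
    print(name, cfg.model, cfg.hardware)
```

Because the workflow states only a quality floor, the selection policy in this sketch can change (for example, to minimize latency instead of energy) without touching the application logic.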
We are building a prototype system, called Murakkab, to
realize this vision. Our preliminary evaluation demonstrates
speedups of up to ∼3.4× in workflow completion times while
delivering ∼4.5× higher energy efficiency, showing promise
in optimizing resources and advancing AI system design.