Artificial Intelligence (AI) systems must typically satisfy service-level objectives including accuracy, latency, and cost. The prevailing model-centric approaches select a monolithic model at design time and apply identical computation regardless of input difficulty, which cannot decompose tasks across specialized components, and have knowledge that is fixed at training time. This can lead to performance degradation and increasing costs during runtime. As the model is the main design variable, it determines the majority of system behavior, coupling operational objectives to a single design-time choice. Addressing these limitations requires shifting from model-centric to system-centric design. Compound AI systems realize this shift by orchestrating multiple models, algorithms, and tools as distributed AI systems through explicit control logic. The performance of such systems depends on their workflow topology, the models assigned to each task, and the parameters governing runtime behavior. We present a design methodology that organizes this space along two dimensions, workflow topology and configuration selection, and identifies eight design patterns, each consolidating techniques to address a specific limitation of monolithic deployment. We validate our methodology through three case studies. Across our case studies, Compound AI configurations approach accuracy of monolithic models within 2.5 to 4 percentage points while reducing latency by up to 60% and cost by up to 71%. We show that model selection and parameter configuration jointly determine system performance, but the resulting design space grows combinatorially, as workflows compose more patterns and components. Thus, we identify five open challenges that define a roadmap from manually configured prototypes towards systems that automatically discover and maintain SLO-compliance in Compound and Distributed AI systems.
Blogger's Review: This article presents an innovative design methodology emphasizing the importance of shifting from model-centric to system-centric approaches, particularly in multi-model orchestration. The case studies demonstrate significant advantages in performance and cost for Compound AI systems, warranting deeper exploration in practical applications.