Abstract
Retrieval-augmented generation (RAG) has emerged as a pivotal technique for improving language models by incorporating external knowledge at inference time. With device-cloud collaborative inference, deploying small language models on edge devices has become feasible, creating a setting where private documents remain on the device and public knowledge resides in the cloud. Privacy and policy constraints often forbid raw document exchange, leading to a document-isolated dual-end RAG setting.
Existing methods rely on frequent remote synchronization and dense evidence transfer, which limits throughput under realistic latency and bandwidth conditions. To address this, we propose CONCORD, an asynchronous sparse aggregation framework for dual-end RAG under document isolation. CONCORD treats the cloud as an asynchronously arriving evidence source rather than a continuously synchronized co-generator. Specifically, we introduce waiting debt control, which decides whether each decoding step should keep waiting for remote participation based on the observed return on waiting. We also design a certificate-guided minimal supplementation mechanism that requests only the remote evidence needed to determine the current greedy decision. Steps that consult the cloud select the same greedy token that dense dual-end aggregation would, while the remaining steps commit locally without remote evidence.
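The abstract does not specify how waiting debt is computed. The sketch below is only a hypothetical reading of "deciding whether to keep waiting based on the observed return on waiting": waiting time accrues as debt, a wait that actually changes the greedy token (relative to the local-only choice) earns a credit, and steps that skip waiting forgive part of the debt so the controller eventually probes the cloud again. The class name, the `budget`, `credit`, and `decay` parameters, and the update rule are all assumptions for illustration, not CONCORD's actual controller.

```python
from dataclasses import dataclass


@dataclass
class WaitingDebtController:
    """Hypothetical waiting-debt rule (illustrative assumption, not CONCORD's).

    Keep waiting for remote evidence only while past waiting has paid off.
    """

    budget: float       # max tolerated unpaid waiting (e.g. seconds); assumed
    credit: float       # debt forgiven when waiting changed the token; assumed
    decay: float = 0.5  # debt shrink factor on steps that skip waiting; assumed
    debt: float = 0.0

    def should_wait(self) -> bool:
        # Wait for the cloud at this decoding step only while debt is within budget.
        return self.debt <= self.budget

    def record_wait(self, wait_time: float, changed_token: bool) -> None:
        # Waiting always costs time; it "returns" something only when the remote
        # evidence changed the greedy token compared with the local-only decision.
        self.debt += wait_time
        if changed_token:
            self.debt -= self.credit
        self.debt = max(0.0, self.debt)

    def record_skip(self) -> None:
        # Committing locally forgives part of the debt, so waiting is retried later.
        self.debt *= self.decay
```

In a decode loop, one would call `should_wait()` before each step, then `record_wait(...)` or `record_skip()` depending on which branch was taken. Two unproductive waits of 0.6 s against a 1.0 s budget push the debt to 1.2 and switch the controller to local commits until the decay brings it back under budget.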
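One way to request "only the remote evidence needed to determine the current greedy decision" while still matching dense aggregation can be sketched with a margin certificate. This is a minimal sketch under assumptions the abstract does not confirm. It assumes a RAG-token-style mixture in which the dense score of token y is the local score plus the sum over remote documents d of w_d * p_d(y). It also assumes that the device knows each remote weight w_d but not p_d, and that `fetch_remote` is a hypothetical call returning one document's full next-token distribution. Because every p_d(y) lies in [0, 1], unseen remote mass R can raise any token's score by at most R. If the local leader beats the runner-up by more than R, the leader is therefore certified as the dense argmax. Otherwise the sketch fetches the heaviest remaining document and re-checks.

```python
from typing import Callable, List, Tuple


def argmax(scores: List[float]) -> int:
    # max() returns the first maximal index, giving a deterministic tie-break.
    return max(range(len(scores)), key=lambda i: scores[i])


def is_certified(scores: List[float], remaining_mass: float) -> bool:
    """True if no token can overtake the current leader.

    Unseen remote documents can add at most `remaining_mass` to any token's
    score (each p_d(y) is in [0, 1]), and the leader's score can only grow.
    """
    if len(scores) < 2:
        return True
    top, second = sorted(scores, reverse=True)[:2]
    return top - second > remaining_mass


def certified_greedy(
    local_scores: List[float],          # local part of the aggregate, one entry per token
    remote_weights: List[float],        # assumed known mixture weight w_d per remote document
    fetch_remote: Callable[[int], List[float]],  # hypothetical RPC returning p_d(.)
) -> Tuple[int, int]:
    """Return (greedy token id, number of remote documents fetched)."""
    scores = list(local_scores)
    remaining = sum(remote_weights)
    # Fetch the heaviest documents first: they can move the scores the most.
    order = sorted(range(len(remote_weights)), key=lambda d: -remote_weights[d])
    fetched = 0
    for d in order:
        if is_certified(scores, remaining):
            break
        dist = fetch_remote(d)
        for y, p in enumerate(dist):
            scores[y] += remote_weights[d] * p
        remaining -= remote_weights[d]
        fetched += 1
    # Either the leader is certified, or every document was fetched (exact scores).
    return argmax(scores), fetched
```

For example, with local scores `[0.7, 0.1, 0.0]` and two remote documents of weight 0.1 each, the margin 0.6 exceeds the remaining mass 0.2, so token 0 is committed with zero remote fetches. With local scores `[0.3, 0.2, 0.0]` and remote weights `[0.4, 0.1]`, one fetch is needed before the decision is certified.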
Experiments on Natural Questions and WikiText-2 show that CONCORD improves end-to-end throughput over baselines by $1.66\times$ and $2.15\times$, respectively, while reducing per-token communication by over two orders of magnitude and maintaining comparable answer quality and perplexity.
Blogger's Review: CONCORD addresses the document isolation problem in device-cloud collaborative inference with an asynchronous sparse aggregation mechanism, substantially improving throughput while keeping private documents on the device, which suggests strong potential for practical deployment. Its waiting debt control and certificate-guided supplementation mechanisms also offer useful ideas for future RAG research.