[CS.DS] Exploring the Limits of Fast Hypothesis Selection

In the hypothesis selection problem, we are given sample and query access to a finite set of candidate distributions (hypotheses), $\mathcal{H} = \{H_1, \ldots, H_n\}$, and samples from an unknown distribution $P$. The goal is to output a distribution $Q$ whose distance to $P$ is comparable to that of the nearest hypothesis in $\mathcal{H}$.

Specifically, if the minimum distance is $\mathsf{OPT}$, we aim to output $Q$ such that, with probability at least $1-\delta$, its total variation distance to $P$ is at most $C \cdot \mathsf{OPT} + \varepsilon$.

The optimal approximation for proper algorithms (where $Q \in \mathcal{H}$) is $C=3$ using $\Theta(\log(n/\delta)/\varepsilon^2)$ samples from $P$, while for improper algorithms (where $Q$ is not necessarily in $\mathcal{H}$) is $C=2$ using $\tilde{\Theta}(\log(n/\delta)/\varepsilon^2)$ samples from $P$.

In the improper setting, the algorithm achieving $C=2$ [Bousquet, Braverman, Kol, Efremenko, Moran, FOCS 2021] runs in time which grows polynomially with $|\mathcal{X}|$ and does not run in finite time for real-valued distributions.

A promising path towards improved runtime is to consider improper algorithms which output a mixture $Q$ of the hypotheses, as such a distribution can be represented in $n$ words of memory.

We show (1) a lower bound that no algorithm which outputs a mixture can achieve an approximation better than $C = 3-2/n$ unless the number of samples is polynomial in $|\mathcal{X}|$, and (2) an algorithm which runs in time $\text{poly}(n)$ and achieves the same approximation guarantee.

In the proper setting, [Aliakbarpour, Bun, Smith, NeurIPS 2024] provided an algorithm with $C=3$ running in $\tilde{O}(n/(\delta^3\varepsilon^3))$ time. We improve this time complexity to $\tilde{O}(n/(\delta \varepsilon^2))$, significantly reducing the dependence on the confidence and error parameters.

Blogger's Review: This paper explores the complexity of hypothesis selection and offers new insights for algorithm optimization. The proposed mixture output method, especially in the improper hypothesis context, presents potential efficiency improvements for practical applications, warranting further investigation. Overall, the authors' improvement in time complexity holds significant theoretical and practical implications.