Proton dissociation constants (pKa) are crucial for functional molecule discovery and molecular modeling. Building on iBonD, the largest experimental pKa database, we and other researchers have developed several prediction methods, including machine-learning-based empirical models and high-accuracy energy calculations. Despite this foundation, the rapid expansion of high-quality pKa data remains fundamentally constrained. We performed large-scale regression-based pKa prediction on unlabeled molecular datasets using a collection of extensively optimized machine-learning models. The results indicate that the feature distributions of these unlabeled datasets are approximately normal, with extremely few samples in the tail regions. Although such augmentation is valuable for improving overall data availability and predictive modeling, it remains insufficient for efficiently discovering molecules that span a broad range of pKa values. To address this, we explore the targeted generation, from the vast chemical space, of molecules whose pKa values fall in sparsely populated regions. Given that traditional VAE-RNN methods with continuous latent spaces suffer from insufficient stability and show no clear advantage in supplementing sparse data, we design and implement a quantum-assisted method for generating sparse-pKa molecules. Feasibility is validated on a simulated quantum annealer, and superior extreme-value sampling is achieved on physical coherent Ising machines (CIMs).
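To give a feel for why an approximately normal distribution leaves so few tail-region samples, here is a minimal Python sketch. It is not from the paper: it assumes an exactly normal distribution (the abstract only says approximately normal), and the function names and the dataset size of 1,000,000 molecules are hypothetical, chosen purely for illustration.

```python
import math


def two_sided_tail_fraction(k: float) -> float:
    """Fraction of an ideal normal distribution lying more than
    k standard deviations away from the mean (both tails combined)."""
    return math.erfc(k / math.sqrt(2.0))


def expected_tail_count(n_molecules: int, k: float) -> float:
    """Expected number of samples beyond k standard deviations,
    assuming the n_molecules values are drawn from a normal distribution."""
    return n_molecules * two_sided_tail_fraction(k)


if __name__ == "__main__":
    n = 1_000_000  # hypothetical dataset size, for illustration only
    for k in (1, 2, 3, 4):
        frac = two_sided_tail_fraction(k)
        count = expected_tail_count(n, k)
        print(f"beyond {k} sigma: {frac:.5%} of samples, about {count:.0f} of {n}")
```

Under this idealized assumption, only about 0.27% of samples lie more than three standard deviations from the mean. This is the kind of scarcity that motivates targeted generation rather than simply labeling more unlabeled molecules.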
Blogger's Review: This paper explores a new direction in molecular generation: using quantum computing to generate molecules, particularly in regimes where data are scarce. If the quantum-assisted method holds up, it could have a significant impact on chemistry and materials science, so its follow-up research and potential applications are worth watching.