Abstract
Contextual pattern matching is the task of finding all $occ$ distinct contexts in which a pattern $P[1,m]$ occurs in a text $T[1,n]$, given a context length $\lambda$. The context consists of the $\lambda$ symbols preceding and following the occurrence.
While this problem can be solved in optimal time $O(m+occ)$ using $O(n)$-space precomputed data structures on $T$, such space can be prohibitive for large repetitive text collections. We present the first optimal-time solution that runs in compressed space, specifically using a symmetric CDAWG (SCDAWG) of $T$.
Furthermore, we show how to enumerate the set of $occ$ solutions with $O(\log\log\lambda)$ delay after $O(m)$-time preprocessing of $P$. To achieve this, we develop an improved linear-space distance-sensitive weighted ancestor data structure.
Blogger's Review: This study presents an efficient contextual pattern matching approach for large repetitive texts, leveraging the advantages of compressed space to significantly reduce space complexity. By introducing the symmetric CDAWG, the authors effectively address the limitations of traditional methods, showcasing substantial potential for practical applications in information retrieval. This innovative approach is likely to see widespread use.