This paper introduces suffixient sets, a novel prefix array (PA) compression technique. Unlike previous methods, which compress the entire array, we subsample PA and store only a few of its entries (in fact, a compressed number of entries). We prove that pattern matching via binary search remains possible, provided that random access to the text is available.
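For readers unfamiliar with the prefix array, here is a minimal sketch of the unsampled baseline: PA lists the end positions of all prefixes of the text, sorted by their reversed (co-lexicographic) order, and a pattern can then be located by binary search using only random access to the text. This is not the suffixient-set search itself, only the full-array search it builds on. The names `prefix_array` and `occurs` are hypothetical, and the quadratic construction is chosen for brevity.

```python
def prefix_array(text):
    """1-based end positions of all prefixes, sorted co-lexicographically."""
    # Assumption: a naive sort over reversed prefixes, fine for tiny inputs only.
    return sorted(range(1, len(text) + 1), key=lambda x: text[:x][::-1])


def occurs(text, pa, pattern):
    """Binary search over pa for a prefix of text that ends with pattern."""
    m = len(pattern)
    target = pattern[::-1]
    lo, hi = 0, len(pa)
    while lo < hi:
        mid = (lo + hi) // 2
        x = pa[mid]
        # Random access: read the last m characters of text[:x], reversed.
        key = text[max(0, x - m):x][::-1]
        if key < target:
            lo = mid + 1
        else:
            hi = mid
    if lo == len(pa):
        return False
    x = pa[lo]
    return text[max(0, x - m):x] == pattern


pa = prefix_array("banana")
print(pa)                           # [2, 4, 6, 1, 3, 5]
print(occurs("banana", pa, "ana"))  # True
print(occurs("banana", pa, "nab"))  # False
```

The search never materializes the sorted prefixes; it only compares the pattern against short windows of the text, which is why random access to the text is the key requirement.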
We focus on solving two key problems:
- Is a given subset of text positions a suffixient set?
- How to find a suffixient set of minimum cardinality?
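To make the first question concrete, here is a brute-force sketch of the test. It assumes the definition commonly used in the suffixient-set literature, which this summary does not restate: a string is right-maximal if it is followed by at least two distinct characters in the text, and a set of positions is suffixient if every one-character extension of a right-maximal string is a suffix of the text prefix ending at some chosen position. The function names are hypothetical, positions are 1-based, and the text is assumed to end with a unique terminator such as `$`. This is a reference for tiny inputs, not the linear-time algorithm.

```python
def right_extensions(text):
    """All strings a + c where a is right-maximal in text and a + c occurs."""
    followers = {}  # substring -> set of characters following an occurrence
    n = len(text)
    for i in range(n):
        for j in range(i, n):  # text[i:j] is followed by text[j]
            followers.setdefault(text[i:j], set()).add(text[j])
    return {a + c
            for a, chars in followers.items() if len(chars) >= 2
            for c in chars}


def is_suffixient(text, positions):
    """Check the (assumed) definition directly; positions are 1-based."""
    prefixes = [text[:x] for x in positions]
    return all(any(p.endswith(e) for p in prefixes)
               for e in right_extensions(text))


print(is_suffixient("banana$", [1, 2, 5, 7]))  # True
print(is_suffixient("banana$", [5, 7]))        # False
```

In the second call, the extension `b` of the empty string is not a suffix of `banan` or `banana$`, so the set fails the test.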
We provide linear-time algorithms for both problems. The two entry points can be outlined as follows:
// Pseudocode outline (signatures only; the bodies are not given here)
function isSuffixientSet(text, positions) {
  // Return true if the given positions form a suffixient set for text
}
function findMinimumSuffixientSet(text) {
  // Return a suffixient set of minimum cardinality for text
}
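For the second problem, a brute-force reference can be sketched under the same assumed definition (a set is suffixient if every one-character extension of a right-maximal string is a suffix of the text prefix ending at some chosen position). The sketch relies on one observation: if two extensions end at the same text position, one is a suffix of the other. So it suffices, and is necessary, to pick one position for each extension that is not a proper suffix of another extension. The function names are hypothetical, positions are 1-based, and the approach is far from linear time; it is only meant for checking small cases by hand.

```python
def right_extensions(text):
    """All strings a + c where a is right-maximal in text and a + c occurs."""
    followers = {}
    for i in range(len(text)):
        for j in range(i, len(text)):
            followers.setdefault(text[i:j], set()).add(text[j])
    return {a + c
            for a, chars in followers.items() if len(chars) >= 2
            for c in chars}


def smallest_suffixient_set(text):
    """Brute force: one end position per suffix-maximal extension."""
    exts = right_extensions(text)
    # Keep extensions that are not a proper suffix of another extension.
    maximal = [e for e in exts
               if not any(f != e and f.endswith(e) for f in exts)]
    # Use the end position (1-based) of the leftmost occurrence of each one.
    return sorted({text.find(e) + len(e) for e in maximal})


print(smallest_suffixient_set("banana$"))  # [1, 2, 5, 7]
```

For `banana$`, the suffix-maximal extensions are `b`, `a`, `anan`, and `ana$`, which is why four positions are needed.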
These algorithms matter because a small suffixient set is what allows efficient pattern matching without storing the full array.
Blogger's Review: The novel definition of suffixient sets offers a more efficient solution for text processing, especially in large-scale data handling, and the introduction of linear-time algorithms is worth further exploration.