In-Depth Exploration of Persistent Segment Trees: Efficie...

Core Logic and Mathematical Principles

The Persistent Segment Tree, commonly referred to as the Chairman Tree in competitive programming, addresses the core problem of: retrieving historical versions of the segment tree and querying the static k-th largest element in a range with logarithmic time and space complexity.

1. Functional State and Path Copying

If we were to naively copy the entire segment tree for every modification, the space complexity would reach $O(M \cdot N)$. The core mathematical logic of the persistent segment tree is: between two adjacent versions (e.g., version $t-1$ and version $t$), since a single point modification only alters a single chain in the tree, there are at most $\lceil \log_2 N \rceil$ nodes that change state, while the vast majority of unmodified subtrees can be shared.

Operational Method: When modifying a leaf node in version $t$, instead of directly overwriting the original tree, we create a new root node $Root[t]$. We recursively traverse down the modification path, copying a new identical node upon reaching each node.
Pointer Redirection: The unmodified branches of the new nodes directly point to the corresponding child nodes in the old version tree, while the modified branches point to the newly created lower-level nodes.
Space Complexity Proof: Each single point modification only creates $\lceil \log_2 N \rceil$ new nodes. With $M$ modifications, the total space complexity converges to $O(N + M \log N)$.

2. Prefix Sum Differential Principle for k-th Largest in Range

The persistent weighted segment tree tackles the non-linear problem of "k-th largest in a range" using the concept of prefix sums. Let $Root[i]$ represent the version of the weighted segment tree formed by inserting only the first $i$ elements of the original sequence. As $i$ increases, the frequency count tree[u] of each node in the segment tree exhibits prefix sum reducibility (differential property).

To find the k-th largest element in the range $[L, R]$, we simultaneously perform a binary search on the weighted segment trees of $Root[R]$ and $Root[L-1]$. For the left subtree of the current value domain:

$$cnt = tree[ls[Root[R]]] - tree[ls[Root[L-1]]]$$

Here, $cnt$ strictly represents: the number of elements within the original sequence interval $[L, R]$ that fall into the current left value domain. Through this difference, we successfully decouple the interval problem and transform it into a global k-th smallest tree binary search logic.

State Design and Algorithm Derivation

Taking the classic static k-th largest in a range as an example:

1. Discretization Mapping

The value range of the original array $A$ may be large, so we first discretize it to obtain an ordered and deduplicated array $raw_v$, with size $U$. All elements of the original array are mapped to their respective ranks in the discretized array.

2. State Definition

Root[i]: Records the physical index of the new segment tree root node after inserting the first $i$ elements.
ls[u], rs[u]: The left and right child node indices of node $u$.
tree[u]: The frequency sum maintained by node $u$.

3. Version Propagation Insert (Insert)

Define the recursive function insert(old_node, &new_node, l, r, val):

Allocate a new node: new_node = ++cnt;
Inherit old state: tree[new_node] = tree[old_node] + 1; (increment frequency of the current node)
Binary Propagation:
- If $l = r$, return.
- Let $m = (l + r) \gg 1$. If $val \le m$, it indicates the modification is in the left subtree. The new node's right pointer directly borrows from the old node: rs[new_node] = rs[old_node], and the left pointer recursively creates: insert(ls[old_node], ls[new_node], l, m, val).
- If $val > m$, similarly borrow the left pointer: ls[new_node] = ls[old_node], and recursively create the right pointer: insert(rs[old_node], rs[new_node], m + 1, r, val).

C++ Standard Source Code (NOIP Style)

#include <iostream>
#include <vector>
#include <algorithm>

using namespace std;

inline int read() {
    int x = 0, f = 1;
    char ch = getchar();
    while (ch < '0' || ch > '9') {
        if (ch == '-') f = -1;
        ch = getchar();
    }
    while (ch >= '0' && ch <= '9') {
        x = (x << 3) + (x << 1) + (ch ^ 48);
        ch = getchar();
    }
    return x * f;
}

const int MAXN = 200005;
// Space estimation: static tree construction N*4 + N insertions, each adding log2(N) nodes.
// 200000 * 4 + 200000 * 18 requires about 4.4 * 10^6 space. Directly allocate to 25 times N.
const int MAX_NODES = MAXN * 25; 

int a[MAXN], raw_v[MAXN];
int Root[MAXN]; // Records the root node index of each version
int tree[MAX_NODES], ls[MAX_NODES], rs[MAX_NODES];
int cnt = 0;

// Build the empty static base tree topology for version 0
int build(int l, int r) {
    int u = ++cnt;
    tree[u] = 0;
    if (l == r) return u;
    int m = (l + r) >> 1;
    ls[u] = build(l, m);
    rs[u] = build(m + 1, r);
    return u;
}

// Core: Advance and derive a new version of the segment tree based on the old version node
void insert(int old_u, int &new_u, int l, int r, int val) {
    new_u = ++cnt; // Physically allocate a new node, implementing path copying
    ls[new_u] = ls[old_u];
    rs[new_u] = rs[old_u];
    tree[new_u] = tree[old_u] + 1; // Accumulate frequency along the path

    if (l == r) return;
    int m = (l + r) >> 1;
    if (val <= m) {
        insert(ls[old_u], ls[new_u], l, m, val); // Critical pitfall: recursively pass the corresponding left subtree pointers of new and old nodes
    } else {
        insert(rs[old_u], rs[new_u], m + 1, r, val); // Critical pitfall: recursively pass the corresponding right subtree pointers of new and old nodes
    }
}

// Core: Dual-version synchronized binary search differential retrieval
int query(int u_L, int u_R, int l, int r, int k) {
    if (l == r) return l;
    int m = (l + r) >> 1;
    // Differential calculation of how many elements in the original array interval [L, R] fall into the value domain of the current left subtree
    int left_cnt = tree[ls[u_R]] - tree[ls[u_L]]; 

    if (k <= left_cnt) {
        return query(ls[u_L], ls[u_R], l, m, k);
    } else {
        return query(rs[u_L], rs[u_R], m + 1, r, k - left_cnt);
    }
}

int main() {
    int n = read();
    int m = read();
    for (int i = 1; i <= n; ++i) {
        a[i] = read();
        raw_v[i] = a[i];
    }

    // Discretization preprocessing
    sort(raw_v + 1, raw_v + n + 1);
    int tot = unique(raw_v + 1, raw_v + n + 1) - (raw_v + 1);

    Root[0] = build(1, tot); // Generate version 0

    for (int i = 1; i <= n; ++i) {
        int pos = lower_bound(raw_v + 1, raw_v + tot + 1, a[i]) - raw_v;
        insert(Root[i - 1], Root[i], 1, tot, pos); // Version i is built on the basis of version i-1
    }

    while (m--) {
        int l = read(), r = read(), k = read();
        // Pass the root nodes of version l-1 and version r for differential query
        int ans_idx = query(Root[l - 1], Root[r], 1, tot, k);
        printf("%d\n", raw_v[ans_idx]);
    }
    return 0;
}

NOIP Practical Pitfall Guide

1. Differential Node Call Errors

When performing the k-th largest query in a range, it is crucial to pass Root[L - 1] and Root[R]. A common mistake made by contestants is to write query(Root[L], Root[R], ...). This is because the Root[L] version already includes the contribution of the $L$-th element itself, and if it is subtracted, it will forcibly exclude the $L$-th element from the interval, leading to severe data underflow at the left endpoint of the interval.

2. Incorrect Space Limit `MAX_NODES` Leading to Dynamic Overflow

The space for a persistent segment tree cannot be covered by the conventional one-dimensional array of $4N$. Since each modification independently adds a chain of length $\log_2(\text{tot})$. The space estimation should satisfy $N \times 4 + N \times \lceil \log_2 N \rceil$. For $N=2 \times 10^5$, it should be opened at least to $N \times 24$. If it is too small, ++cnt will overflow the array boundary during execution, causing online test cases to throw RE or return garbled values.

3. Pointer Inheritance Omission

In the insert function, it is essential to write ls[new_u] = ls[old_u]; rs[new_u] = rs[old_u]; as soon as the function is entered. If these lines are omitted, and only the changes on one side are updated recursively, the pointers of the unchanged side subtree will default to 0 (i.e., become empty subtrees). This will lead to significant structural damage and loss of all non-modified data from the historical versions.

In-Depth Exploration of Persistent Segment Trees: Efficient Implementation and Optimization for Interval K-th Largest Queries