NeFut Logo NeFut
Admin Login

In-Depth Exploration of Persistent Segment Trees: Efficient Implementation and Optimization for Interval K-th Largest Queries

Published at: 2026-05-27 09:17 Last updated: 2026-06-06 13:04
#algorithm #optimization #Segment Tree

Core Logic and Mathematical Principles

The Persistent Segment Tree, commonly referred to as the Chairman Tree in competitive programming, addresses the core problem of: retrieving historical versions of the segment tree and querying the static k-th largest element in a range with logarithmic time and space complexity.

1. Functional State and Path Copying

If we were to naively copy the entire segment tree for every modification, the space complexity would reach $O(M \cdot N)$. The core mathematical logic of the persistent segment tree is: between two adjacent versions (e.g., version $t-1$ and version $t$), since a single point modification only alters a single chain in the tree, there are at most $\lceil \log_2 N \rceil$ nodes that change state, while the vast majority of unmodified subtrees can be shared.

2. Prefix Sum Differential Principle for k-th Largest in Range

The persistent weighted segment tree tackles the non-linear problem of "k-th largest in a range" using the concept of prefix sums. Let $Root[i]$ represent the version of the weighted segment tree formed by inserting only the first $i$ elements of the original sequence. As $i$ increases, the frequency count tree[u] of each node in the segment tree exhibits prefix sum reducibility (differential property).

To find the k-th largest element in the range $[L, R]$, we simultaneously perform a binary search on the weighted segment trees of $Root[R]$ and $Root[L-1]$. For the left subtree of the current value domain:

$$cnt = tree[ls[Root[R]]] - tree[ls[Root[L-1]]]$$

Here, $cnt$ strictly represents: the number of elements within the original sequence interval $[L, R]$ that fall into the current left value domain. Through this difference, we successfully decouple the interval problem and transform it into a global k-th smallest tree binary search logic.


State Design and Algorithm Derivation

Taking the classic static k-th largest in a range as an example:

1. Discretization Mapping

The value range of the original array $A$ may be large, so we first discretize it to obtain an ordered and deduplicated array $raw_v$, with size $U$. All elements of the original array are mapped to their respective ranks in the discretized array.

2. State Definition

3. Version Propagation Insert (Insert)

Define the recursive function insert(old_node, &new_node, l, r, val):

  1. Allocate a new node: new_node = ++cnt;
  2. Inherit old state: tree[new_node] = tree[old_node] + 1; (increment frequency of the current node)
  3. Binary Propagation:
    • If $l = r$, return.
    • Let $m = (l + r) \gg 1$. If $val \le m$, it indicates the modification is in the left subtree. The new node's right pointer directly borrows from the old node: rs[new_node] = rs[old_node], and the left pointer recursively creates: insert(ls[old_node], ls[new_node], l, m, val).
    • If $val > m$, similarly borrow the left pointer: ls[new_node] = ls[old_node], and recursively create the right pointer: insert(rs[old_node], rs[new_node], m + 1, r, val).

C++ Standard Source Code (NOIP Style)

#include <iostream>
#include <vector>
#include <algorithm>

using namespace std;

inline int read() {
    int x = 0, f = 1;
    char ch = getchar();
    while (ch < '0' || ch > '9') {
        if (ch == '-') f = -1;
        ch = getchar();
    }
    while (ch >= '0' && ch <= '9') {
        x = (x << 3) + (x << 1) + (ch ^ 48);
        ch = getchar();
    }
    return x * f;
}

const int MAXN = 200005;
// Space estimation: static tree construction N*4 + N insertions, each adding log2(N) nodes.
// 200000 * 4 + 200000 * 18 requires about 4.4 * 10^6 space. Directly allocate to 25 times N.
const int MAX_NODES = MAXN * 25; 

int a[MAXN], raw_v[MAXN];
int Root[MAXN]; // Records the root node index of each version
int tree[MAX_NODES], ls[MAX_NODES], rs[MAX_NODES];
int cnt = 0;

// Build the empty static base tree topology for version 0
int build(int l, int r) {
    int u = ++cnt;
    tree[u] = 0;
    if (l == r) return u;
    int m = (l + r) >> 1;
    ls[u] = build(l, m);
    rs[u] = build(m + 1, r);
    return u;
}

// Core: Advance and derive a new version of the segment tree based on the old version node
void insert(int old_u, int &new_u, int l, int r, int val) {
    new_u = ++cnt; // Physically allocate a new node, implementing path copying
    ls[new_u] = ls[old_u];
    rs[new_u] = rs[old_u];
    tree[new_u] = tree[old_u] + 1; // Accumulate frequency along the path

    if (l == r) return;
    int m = (l + r) >> 1;
    if (val <= m) {
        insert(ls[old_u], ls[new_u], l, m, val); // Critical pitfall: recursively pass the corresponding left subtree pointers of new and old nodes
    } else {
        insert(rs[old_u], rs[new_u], m + 1, r, val); // Critical pitfall: recursively pass the corresponding right subtree pointers of new and old nodes
    }
}

// Core: Dual-version synchronized binary search differential retrieval
int query(int u_L, int u_R, int l, int r, int k) {
    if (l == r) return l;
    int m = (l + r) >> 1;
    // Differential calculation of how many elements in the original array interval [L, R] fall into the value domain of the current left subtree
    int left_cnt = tree[ls[u_R]] - tree[ls[u_L]]; 

    if (k <= left_cnt) {
        return query(ls[u_L], ls[u_R], l, m, k);
    } else {
        return query(rs[u_L], rs[u_R], m + 1, r, k - left_cnt);
    }
}

int main() {
    int n = read();
    int m = read();
    for (int i = 1; i <= n; ++i) {
        a[i] = read();
        raw_v[i] = a[i];
    }

    // Discretization preprocessing
    sort(raw_v + 1, raw_v + n + 1);
    int tot = unique(raw_v + 1, raw_v + n + 1) - (raw_v + 1);

    Root[0] = build(1, tot); // Generate version 0

    for (int i = 1; i <= n; ++i) {
        int pos = lower_bound(raw_v + 1, raw_v + tot + 1, a[i]) - raw_v;
        insert(Root[i - 1], Root[i], 1, tot, pos); // Version i is built on the basis of version i-1
    }

    while (m--) {
        int l = read(), r = read(), k = read();
        // Pass the root nodes of version l-1 and version r for differential query
        int ans_idx = query(Root[l - 1], Root[r], 1, tot, k);
        printf("%d\n", raw_v[ans_idx]);
    }
    return 0;
}

NOIP Practical Pitfall Guide

1. Differential Node Call Errors

When performing the k-th largest query in a range, it is crucial to pass Root[L - 1] and Root[R]. A common mistake made by contestants is to write query(Root[L], Root[R], ...). This is because the Root[L] version already includes the contribution of the $L$-th element itself, and if it is subtracted, it will forcibly exclude the $L$-th element from the interval, leading to severe data underflow at the left endpoint of the interval.

2. Incorrect Space Limit MAX_NODES Leading to Dynamic Overflow

The space for a persistent segment tree cannot be covered by the conventional one-dimensional array of $4N$. Since each modification independently adds a chain of length $\log_2(\text{tot})$. The space estimation should satisfy $N \times 4 + N \times \lceil \log_2 N \rceil$. For $N=2 \times 10^5$, it should be opened at least to $N \times 24$. If it is too small, ++cnt will overflow the array boundary during execution, causing online test cases to throw RE or return garbled values.

3. Pointer Inheritance Omission

In the insert function, it is essential to write ls[new_u] = ls[old_u]; rs[new_u] = rs[old_u]; as soon as the function is entered. If these lines are omitted, and only the changes on one side are updated recursively, the pointers of the unchanged side subtree will default to 0 (i.e., become empty subtrees). This will lead to significant structural damage and loss of all non-modified data from the historical versions.

Original Source: local://7.3

[h] Back to Home