NeFut Logo NeFut
Admin Login

Efficient String Matching: A Comprehensive Guide to the KMP Algorithm

Published at: 2026-05-27 09:17 Last updated: 2026-06-10 09:00
#algorithm #String #KMP

Core Logic and Mathematical Principles

The primary task of the KMP algorithm is to achieve pattern matching of a single pattern string within a text string in $O(N + M)$ time complexity, breaking the inherent limit of brute-force matching at $O(NM)$. The underlying mathematical foundation is the Prefix Function, often defined as the $next$ array in competitive programming.

1. Mathematical Definition of the Prefix Function

For a string $P$ of length $M$, its prefix function $next[i]$ represents the length of the longest equal proper prefix and proper suffix of the substring $P[0 \dots i]$. The set representation is as follows:

$$next[i] = \max \big\{ k \big| 0 \le k < i+1 \text{ and } P[0 \dots k-1] == P[i-k+1 \dots i] \big\}$$

If this set is empty, then $next[i] = 0$.

2. Proof of Convergence for Mismatch State Transition

When the character $S[i]$ in the text string mismatches with the character $P[j]$ in the pattern string, traditional algorithms backtrack $i$. KMP keeps $i$ stationary and transitions $j$ to $next[j-1]$.

Proof: Given that $P[0 \dots j-1] == S[i-j \dots i-1]$. If there exists a shorter matching block length $k$ such that the new prefix can connect to the current suffix, it must satisfy $P[0 \dots k-1] == P[j-k \dots j-1]$. According to the definition of the prefix function, the maximum $k$ is strictly equal to $next[j-1]$. Since $next[j-1] < j$, the transition state is monotonically decreasing, and iteration must converge to $0$ in a finite number of steps.


State Design and Algorithm Derivation

1. Recursion of the Prefix Function $next$ Array

Assuming $next[0 \dots i-1]$ is known, we now derive the value of $next[i]$. Let $j = next[i-1]$.

$$next[i] = j + 1$$

2. Pattern Matching State Machine

Consider the pattern string $P$ as a finite state automaton, where the state value is the number of characters successfully matched, denoted as $j$. For each character $S[i]$ in the text string:

  1. If $S[i] \neq P[j]$ and $j > 0$, the state backtracks continuously along the mismatch pointer: j = next[j - 1].
  2. If $S[i] == P[j]$, the state advances: j++.
  3. If $j == M$, it indicates a complete match of the pattern string, record the answer position $i - M + 1$, and then forcefully transition the state to j = next[j - 1] to continue matching subsequent overlapping substrings.

C++ Standard Source Code (NOIP Style)

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

using namespace std;

// Efficient input function to handle regular string input
void fast_io() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
}

const int MAXM = 1000005;
int nxt[MAXM]; // Avoid global keyword conflicts with C++ standard libraries by renaming to nxt

// Strictly compute the prefix function based on 0-based indexing
void get_next(const string& p, int m) {
    nxt[0] = 0; // The substring of length 1 has no proper prefix or suffix, hence must be 0
    int j = 0;  // j always represents the current prefix character index to compare, and also the nxt value of the previous position
    for (int i = 1; i < m; ++i) {
        while (j > 0 && p[i] != p[j]) {
            j = nxt[j - 1]; // Critical point: on mismatch, must backtrack to nxt[j-1] instead of nxt[j]
        }
        if (p[i] == p[j]) {
            j++;
        }
        nxt[i] = j;
    }
}

// KMP main matching process
void kmp_search(const string& s, const string& p) {
    int n = s.length();
    int m = p.length();

    get_next(p, m);

    int j = 0; // Pointer for the pattern string
    for (int i = 0; i < n; ++i) {
        while (j > 0 && s[i] != p[j]) {
            j = nxt[j - 1]; // On mismatch, adaptively transition state; i pointer never backtracks
        }
        if (s[i] == p[j]) {
            j++;
        }
        if (j == m) {
            // Successfully matched the pattern in the text string, output the starting index (1-based index common in NOIP evaluations)
            cout << i - m + 2 << "\n"; 
            j = nxt[j - 1]; // Critical point: after a successful match, to find the next overlapping target, must forcibly transition state
        }
    }
}

int main() {
    fast_io();
    string s, p;
    if (cin >> s >> p) {
        kmp_search(s, p);
        int m = p.length();
        for (int i = 0; i < m; ++i) {
            cout << nxt[i] << (i == m - 1 ? "" : " ");
        }
        cout << "\n";
    }
    return 0;
}

NOIP Practical Avoidance Guide

1. Global Variable Naming Conflicts and Compilation Failures

When compiling with g++ -O2 in an offline Linux environment, if certain standard libraries are included, a global declaration int next[MAXN]; may conflict with the standard iterator operator std::next in <algorithm> or built-in namespaces. This will lead to cryptic symbol overloading errors from the compiler. In competitions, the array must be strictly named nxt or nxt_arr.

2. Mismatch Index Out of Bounds and Infinite Loops

Inside the while (j > 0 && p[i] != p[j]) loop, if the jump statement is written as j = nxt[j];, it will directly lead to logical disasters. Because when the character at $j$ mismatches, it should query the longest common prefix-suffix of the successfully matched substring $P[0 \dots j-1]$. If written as nxt[j], it not only loses mathematical correctness but may also lead to an infinite loop due to malicious data when nxt[j] == j, or trigger RE when $j = M$ due to out-of-bounds.

3. Mixing 0-based and 1-based Indexing Leading to State Gaps

The implementation of the KMP algorithm is highly dependent on index boundaries. If a 0-based index is used when reading the string, while the transition logic of the next array borrows from some textbooks' 1-based algorithms (e.g., nxt[j] = nxt[nxt[j]] and initially setting nxt[0] = -1), it will lead to physical deviations of $\pm 1$ in boundary indices, causing the state machine to completely lose its high-level interception capability, resulting in widespread WA in evaluations.


Classic NOIP/Luogu Problems

1. Luogu P3375 [Template] KMP String Matching

2. Luogu P2375 [NOI2014] Zoo

Original Source: local://8.1

[h] Back to Home