Core Logic and Mathematical Principles
The essence of multi-sequence dynamic programming is to perform topological path searches within a state space constructed by multi-dimensional Cartesian products. This section focuses on the Longest Common Subsequence (LCS) and Edit Distance, two problems that illustrate the state evolution during the processes of "alignment" and "matching."
1. Longest Common Subsequence (LCS)
Given two sequences $A$ (length $N$) and $B$ (length $M$), we seek the length of their longest common subsequence. The core logic lies in determining the equality of the current end elements $A[i]$ and $B[j]$:
- If $A[i] == B[j]$, this character must belong to the current optimal common subsequence's end, and the state transitions directly from the predecessor $f[i-1][j-1]$ with an increase of 1.
- If $A[i] \neq B[j]$, it indicates that at least one of $A[i]$ or $B[j]$ is redundant, and the optimal solution must arise from the sub-states of "discarding $A[i]$" and "discarding $B[j]".
2. Edit Distance
Given two strings $A$ and $B$, we calculate the minimum number of operations required to convert $A$ into $B$ (allowing insertion, deletion, and substitution of characters). Geometrically, the edit distance can be viewed as the shortest path problem on a two-dimensional grid, moving from the top left corner $(0,0)$ to the bottom right corner $(N,M)$. Each type of operation corresponds to a displacement vector in the state space:
- Delete $A[i]$: Advance 1 in the dimension of $A$, while the dimension of $B$ remains static. The vector is represented as $(i-1, j) \to (i, j)$.
- Insert $B[j]$: Advance 1 in the dimension of $B$, while the dimension of $A$ remains static. The vector is represented as $(i, j-1) \to (i, j)$.
- Substitute character: Advance both dimensions by 1. The vector is represented as $(i-1, j-1) \to (i, j)$. If $A[i] == B[j]$, no substitution is needed, and the cost is 0; otherwise, the cost is 1.
State Design and Algorithm Derivation
1. LCS State and Equation
- State Design: $f[i][j]$ represents the length of the longest common subsequence of the first $i$ characters of sequence $A$ and the first $j$ characters of sequence $B$.
- Transition Equation:
$$f[i][j] = \begin{cases} f[i-1][j-1] + 1 & \text{if } A[i] == B[j] \\ \max(f[i-1][j], f[i][j-1]) & \text{if } A[i] \neq B[j] \end{cases}$$
- Boundary Conditions: $f[i][0] = f[0][j] = 0$.
2. Edit Distance State and Equation
- State Design: $f[i][j]$ represents the minimum number of operations required to convert the first $i$ characters of string $A$ to the first $j$ characters of string $B$.
- Transition Equation:
$$f[i][j] = \min \begin{cases} f[i-1][j] + 1 & \text{(deletion operation)} \\ f[i][j-1] + 1 & \text{(insertion operation)} \\ f[i-1][j-1] + \text{cost} & \text{(substitution operation)} \end{cases}$$
where if $A[i] == B[j]$, then $\text{cost} = 0$, otherwise $\text{cost} = 1$.
- Boundary Conditions:
- $f[i][0] = i$: To convert a string of length $i$ to an empty string, $i$ operations are required.
- $f[0][j] = j$: To convert an empty string to the first $j$ characters of $B$, $j$ operations are required.
C++ Standard Source Code (NOIP Style)
The following source code implements a naive $O(N \times M)$ algorithm for both the Longest Common Subsequence and Edit Distance in a single program.
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
using namespace std;
const int MAXN = 2005; // For NOIP traditional 2D DP data range
int lcs_f[MAXN][MAXN];
int edit_f[MAXN][MAXN];
int main() {
ios_base::sync_with_stdio(false);
cin.tie(NULL);
string s1, s2;
if (!(cin >> s1 >> s2)) return 0;
int n = s1.length();
int m = s2.length();
// To conform to algorithm habits, convert string indices to start from 1
string a = " " + s1;
string b = " " + s2;
// ==================== 1. Longest Common Subsequence (LCS) ====================
// Explicitly assign boundary values (although global variables default to 0, engineering norms require explicit representation)
for (int i = 0; i <= n; ++i) lcs_f[i][0] = 0;
for (int j = 0; j <= m; ++j) lcs_f[0][j] = 0;
for (int i = 1; i <= n; ++i) {
for (int j = 1; j <= m; ++j) {
if (a[i] == b[j]) {
lcs_f[i][j] = lcs_f[i - 1][j - 1] + 1; // Critical pitfall: this must not be confused with the max operation
} else {
lcs_f[i][j] = max(lcs_f[i - 1][j], lcs_f[i][j - 1]);
}
}
}
// ==================== 2. Edit Distance (Edit Distance) ====================
// Boundary initialization: cost of converting empty string
for (int i = 0; i <= n; ++i) edit_f[i][0] = i;
for (int j = 0; j <= m; ++j) edit_f[0][j] = j;
for (int i = 1; i <= n; ++i) {
for (int j = 1; j <= m; ++j) {
int cost = (a[i] == b[j]) ? 0 : 1;
// Take the minimum of three transformation vectors
int del_op = edit_f[i - 1][j] + 1;
int ins_op = edit_f[i][j - 1] + 1;
int rep_op = edit_f[i - 1][j - 1] + cost;
edit_f[i][j] = min({del_op, ins_op, rep_op});
}
}
cout << "LCS: " << lcs_f[n][m] << "\n";
cout << "Edit Distance: " << edit_f[n][m] << "\n";
return 0;
}
NOIP Practical Pitfall Guide
1. String Index and DP State Index Misalignment
- Phenomenon: In C++,
std::stringindices start from0and end atlen-1. If the DP loop is written asfor (int i = 1; i <= n; ++i), directly accessings1[i]will cause an out-of-bounds error (accessings1[n]) or misss1[0]. - Avoidance Measure: It is recommended to use the "space padding method" in the exam. After reading the string, forcibly concatenate a space at the front
s = " " + s;. This shifts the original string characters, allowings[i]to perfectly align with the physical definition of "the first $i$ characters" in state $f[i][j]$.
2. Ignoring Replacement Cost When $A[i] == B[j]$ in Edit Distance
- Phenomenon: When calculating the edit distance, contestants mistakenly believe that when $A[i] == B[j]$, the entire state $f[i][j]$ directly equals $f[i-1][j-1]$. This overlooks potentially smaller costs from composite operations like "delete first, then insert" in extreme cases (although direct alignment is optimal under normal circumstances, this can lead to fatal logical flaws in modified problems where the costs of various operations are unequal).
- Avoidance Measure: Regardless of whether the current characters are equal, all three operations—deletion, insertion, and substitution—must be included in the
min()operation, withcost = 0or1only adjusting the weight of the substitution operation.
Classic NOIP/Luogu Problems
1. Luogu P2758 Edit Distance
- Problem Description: Let $A$ and $B$ be two strings. We need to determine the minimum number of operations required to convert $A$ into $B$. Allowed operations include deleting a character, inserting a character, and changing one character to another.
- Core Problem Nature and Solution Idea: A standard two-dimensional multi-sequence DP template problem.
- Core Idea: Strictly follow the edit distance equation derived above for state transitions. The outer loop enumerates $i$, while the inner loop enumerates $j$, utilizing three-way branching to relax and update
f[i][j]. Given the small data range (string length $\le 2000$), the $O(N \times M)$ time and space complexity can perfectly meet the constraints.
2. Luogu P1156 Garbage Trap
- Problem Description: Carmen (a cow) is trapped at the bottom of a well with depth $D$. Every so often, a piece of garbage falls, each with a falling time $t_i$, the time it can sustain life $f_i$, and the stacking height $h_i$. Carmen initially has 10 hours of life. Determine the earliest time Carmen can jump out of the well; if not, find out how long she can survive.
- Core Problem Nature and Solution Idea: A variant of the multi-sequence alignment problem (composite alignment of time and height axes). Garbage falls in discrete stages, and must be sorted by time in ascending order.
- State Design: $f[i][j]$ represents the maximum remaining life value Carmen has after processing the first $i$ pieces of garbage and with the current stacking height of $j$.
- Core Idea: Treat garbage as stages. For the $i^{th}$ piece of garbage, its falling time is $t_i$. First, ensure that the remaining life from the previous stage can last until $t_i$, i.e., $f[i-1][j] \ge t_i - t_{i-1}$. Based on this, make two decision updates (reverse push type):
- To stack: $f[i][j + h_i] = \max(f[i][j + h_i], f[i-1][j] - (t_i - t_{i-1}))$. If $j + h_i \ge D$, then output $t_i$ and terminate the program.
- To eat: $f[i][j] = \max(f[i][j], f[i-1][j] - (t_i - t_{i-1}) + f_i)$. If all garbage is processed and Carmen still cannot reach a height of $\ge D$, the global maximum survival time is the maximum span that the state with height 0 can sustain.