Core Logic and Mathematical Principles
The essence of the Minimum Spanning Tree (MST) problem is: in a weighted undirected connected graph $G=(V, E)$ containing $N$ vertices and $M$ edges, select a set of edges $T \subseteq E$ such that $(V, T)$ forms a topologically connected tree, and the sum of the weights of all edges in the tree $\sum_{e \in T} w(e)$ achieves the global minimum value.
Within the NOIP framework, the Kruskal and Prim algorithms are constructed based on completely different algebraic principles of graph theory:
1. Edge Set Greedy Paradigm: Kruskal's Algorithm
The core mathematical foundation of Kruskal's algorithm is the Cut Property and Greedy Expansion.
- Mathematical Principle: Sort all edges in the graph in non-decreasing order (from smallest to largest) based on their weights. Starting from an empty edge set, examine each edge in turn. If the two endpoints of the currently examined edge belong to different connected components in the current spanning forest, then this edge must be the "shortest cut edge" connecting these two components. Forcefully include it in the edge set until $N-1$ edges are collected.
- Disjoint Set Maintenance: The algorithm perfectly models the "identification and merging of connected components" using the
findandmergeoperations of the Disjoint Set Union (DSU). - Complexity Theorem: The time complexity bottleneck of the algorithm lies in edge weight sorting, strictly $O(M \log M)$. Due to its iterative nature focused on edge sets, it is extremely suitable for sparse graphs.
2. Vertex Set Topological Expansion Paradigm: Prim's Algorithm
The core physical essence of Prim's algorithm is the segmentation expansion of blue and white point sets, which is structurally highly isomorphic to Dijkstra's algorithm.
- Mathematical Principle: Partition the vertices in the graph into two sets: the set of matured points $S$ (blue points) that have been added to the spanning tree, and the set of floating points $T$ (white points) that have not been added. Each time, forcibly select a node $u$ from set $T$ that is closest to the current entire spanning tree (rather than the source point), and assign it to $S$, while using all outgoing edges from $u$ to refresh the topological shortest distances from the remaining white points to the entire set $S$.
- Complexity Theorem: By maintaining the minimum distance from white points to the tree with a priority queue (min-heap), the complexity of a single relaxation is $O(\log V)$, and the total time complexity is strictly controlled at $O((V+E)\log V)$. Because its complexity is independent of the logarithm of the number of edges $M$, it exhibits dominating performance in dense graphs ($M \approx V^2$).
Algorithm Derivation and State Design
1. Kruskal State Design
Define the Disjoint Set array fa[i].
- Edge Structure Encapsulation:
struct Edge {
int u, v;
long long w;
bool operator<(const Edge& rhs) const { return w < rhs.w; }
};
- State Transition (Disjoint Set Cycle Check): For the sorted edge $e_i$, let $root_u = \text{find}(e_i.u), \, root_v = \text{find}(e_i.v)$.
$$\text{if } (root_u \neq root_v) \quad \text{fa}[root_u] = root_v, \,\, \text{edge\_cnt} \leftarrow \text{edge\_cnt} + 1$$
2. Prim Heap Optimization State Design
Define $dist[i]$ as the topological shortest distance from white point $i$ to the currently generated MST edge set tree surface.
The state stored in the priority queue is a tuple $(d, u)$, where $u$ is the node index and $d$ is the current $dist[u]$.
The top element of the min-heap drives the state transition equation:
$$dist[v] = \min(dist[v], w(u, v)) \quad (\forall v \in T, \, (u, v) \in E)$$
Physical Meaning: When the blue point set expands to include $u$, the shortest distance from the floating point $v$ to the tree surface may be refreshed by the newly included boundary edge $w(u, v)$.
C++ Standard Source Code (Kruskal Industrial Template)
#include <iostream>
#include <vector>
#include <algorithm>
const int MAXN = 100005;
const int MAXM = 300005;
struct Edge {
int u, v;
long long w;
// Strictly sorted in non-decreasing order of edge weights
bool operator<(const Edge& rhs) const {
return w < rhs.w;
}
};
Edge edges[MAXM];
int fa[MAXN];
// Disjoint Set Core: Path Compression
int find_set(int i) {
if (fa[i] == i) return i;
return fa[i] = find_set(fa[i]); // Key pitfall: path compression must be performed, otherwise chain trees will degrade complexity
}
// Merge connected components in the Disjoint Set
bool union_set(int i, int j) {
int root_i = find_set(i);
int root_j = find_set(j);
if (root_i != root_j) {
fa[root_i] = root_j;
return true; // Successful merge, indicating previously disconnected, this edge is valid
}
return false; // Already connected, adding this edge will create a cycle, reject
}
int main() {
std::ios_base::sync_with_stdio(false);
std::cin.tie(NULL);
int n, m;
if (!(std::cin >> n >> m)) return 0;
for (int i = 0; i < m; ++i) {
std::cin >> edges[i].u >> edges[i].v >> edges[i].w;
}
// Core driver: quickly sort the entire edge set
std::sort(edges, edges + m);
// Initialize the base of the Disjoint Set
for (int i = 1; i <= n; ++i) fa[i] = i;
long long mst_weight = 0;
int edges_selected = 0;
for (int i = 0; i < m; ++i) {
if (union_set(edges[i].u, edges[i].v)) {
mst_weight += edges[i].w;
edges_selected++;
if (edges_selected == n - 1) break; // Early termination: already collected N-1 edges
}
}
// Topological connectivity check
if (edges_selected < n - 1) {
std::cout << "orz\n"; // The graph itself is not connected, unable to construct a spanning tree
} else {
std::cout << mst_weight << "\n";
}
return 0;
}
NOIP Practical Pitfall Guide
- Blindly using Kruskal for dense grid graphs without considering data characteristics: If the given graph is strictly dense, such as a complete graph or a large cardinality grid graph, satisfying $V=5000, M \approx 1.2 \times 10^7$. If Kruskal's algorithm is used, its sorting overhead $M \log M$ will cause memory bloat and CPU time to directly trigger TLE. At this time, it is essential to switch to the naive Prim or heap-optimized Prim algorithm with $O(V^2)$ complexity, as Prim's relaxation state does not depend on the global sorting of the number of edges.
- Omitting Disjoint Set initialization or failing to perform path compression in deep recursion:
In the
mainfunction of Kruskal, it is very easy to miss the initializationfor (int i = 1; i <= n; ++i) fa[i] = i;. If not initialized, allfawill remain 0, causingfind_setto instantly enter an infinite loop or memory overflow. Additionally, if the rolling assignmentfa[i] = find_set(fa[i])(path compression) is omitted infind_set, the disjoint set will degenerate into a long chain, making single merges degrade to $O(V)$, leading Kruskal's algorithm to degenerate to $O(MV)$ brute force, completely losing efficiency.
Classic NOIP/Luogu Problem Examples
1. Luogu P3366 [Template] Minimum Spanning Tree
- Problem Description:
Given a weighted graph containing $N$ vertices and $M$ undirected edges, find the sum of the weights of the minimum spanning tree of the graph. If the graph is not connected, output
orz. - Core Idea and Essence of the Problem: The most standard and pure minimum spanning tree template application. The data range belongs to a typical sparse graph. The core idea is to directly apply Kruskal's algorithm: encapsulate a structure to store triples, sort by weight in ascending order, and traverse each edge using a path-compressed disjoint set. Use a counter to accumulate the number of successful merges; if the final count equals $N-1$, output the accumulated weight, otherwise output the disconnected flag.
2. Luogu P1194 Buying Gifts
- Problem Description: To buy $B$ items, the price for each item individually is $A$. However, there are discount relationships between the items: if item $X$ has already been purchased, then buying item $Y$ will allow a discounted price of $W_{X,Y}$. Find the minimum cost required to purchase all items.
- Core Idea and Essence of the Problem: A typical variation of constructing a dense graph and finding the MST. Since the dependency relationship of "having to buy $X$ to buy $Y$ at a discounted price" and the minimization of costs essentially represents finding the minimum spanning tree in a network, hardcore graph construction strategy: introduce a virtual node 0 as the "market source node". Connect a directed edge with weight $A$ from node 0 to all items $1 \dots B$ (representing the original price purchase as the first item). If there is a discounted price $W_{X,Y}$ between items $X$ and $Y$ and $W_{X,Y} < A$, then connect an undirected edge with weight $W_{X,Y}$ between $X$ and $Y$. Run a standard MST algorithm on this new graph containing $B+1$ nodes. Since the discount relationships between items are often provided in a dense matrix form, either Prim or Kruskal can easily solve this problem. The total cost is the total weight of the MST.