Efficient Algorithms for Generating Random Trees and Graphs

Core Logic and Mathematical Principles

When stress-testing large-scale graph and tree algorithms (network flow, shortest paths, heavy-light (tree chain) decomposition, and so on), naive uniform random data generation tends to degenerate in two ways:

Self-loops and multiple edges: if each edge $(u, v)$ is drawn with $u$ and $v$ independently uniform on $[1, N]$, every edge is a self-loop ($u = v$) with probability $1/N$, and duplicate edges appear quickly once $M$ is comparable to $N$, so in large data both are almost certain. Solutions that assume a simple graph (for example, code that counts paths or augmenting paths, or stores the graph in an adjacency matrix) may then produce wrong counts or, in some implementations, fail to terminate.
Degenerate tree shapes: if node $i$ ($i > 1$) picks its parent uniformly from $[1, i-1]$, parents are biased toward small indices and the expected height of the tree is only $O(\log N)$. Such shallow, bushy trees never exercise the extreme cases that stress heavy-light decomposition, recursion (stack) depth, or long-path handling, such as a single chain of length $O(N)$ or a star (one center adjacent to every other node).
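A minimal sketch to check both failure modes empirically (the helper names and seeds are hypothetical, for illustration only): the first function grows a tree with the naive parent rule and returns its height, the second counts the invalid edges produced by unfiltered uniform endpoints. For $N = 10^5$ the naive tree's height stays at a few dozen (the height of such random recursive trees grows roughly like $e \ln N$), nowhere near the $N - 1$ of a chain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper: height (root depth 0) of the tree built by the naive rule
// "node i (i > 1) picks a uniformly random parent in [1, i-1]".
int naive_tree_height(int n, std::uint64_t seed) {
    assert(n >= 1);
    std::mt19937_64 rng(seed);
    std::vector<int> depth(n + 1, 0);              // node 1 is the root, depth 0
    int height = 0;
    for (int i = 2; i <= n; ++i) {
        std::uniform_int_distribution<int> pick(1, i - 1);
        int parent = pick(rng);
        depth[i] = depth[parent] + 1;              // parent < i, so its depth is already known
        height = std::max(height, depth[i]);
    }
    return height;
}

// Hypothetical helper: number of self-loops plus repeated edges among m edges
// whose endpoints are drawn independently and uniformly from [1, n], unfiltered.
long long naive_bad_edges(int n, int m, std::uint64_t seed) {
    assert(n >= 1 && m >= 0);
    std::mt19937_64 rng(seed);
    std::uniform_int_distribution<int> node(1, n);
    std::set<std::pair<int, int>> seen;
    long long bad = 0;
    for (int k = 0; k < m; ++k) {
        int u = node(rng), v = node(rng);
        if (u == v) { ++bad; continue; }            // self-loop
        if (u > v) std::swap(u, v);                 // undirected: store as (min, max)
        if (!seen.insert({u, v}).second) ++bad;     // same edge was drawn before
    }
    return bad;
}
```

For example, naive_bad_edges(2, 10, seed) is always at least 9: with only two nodes, every draw is either a self-loop or a repeat of the single edge (1, 2).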

To keep test data valid and structurally varied, we use two techniques: random relabeling (shuffling) of node indices and edge order, and Prüfer-sequence decoding.

By Cayley's formula, the number of labeled trees on $N$ nodes, which equals the number of spanning trees of the complete graph $K_N$, is:

$$N^{N-2}$$

The Prüfer sequence is a bijection between labeled trees on $N$ nodes and integer sequences of length $N-2$ with entries in $[1, N]$; there are exactly $N^{N-2}$ such sequences, matching Cayley's formula. Therefore, drawing each of the $N-2$ entries independently and uniformly from $[1, N]$ and decoding the sequence into an edge set (in $O(N)$ with a linear-scan pointer, or $O(N \log N)$ with a min-heap) produces every one of the $N^{N-2}$ labeled trees with the same probability $1/N^{N-2}$.
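To see the bijection concretely, the brute-force sketch below (a check, not part of the generator; the helper names are hypothetical) decodes every sequence of length $N-2$ for a small $N$ and counts the distinct trees obtained. If decoding is a bijection, the count equals $N^{N-2}$: 16 for $N = 4$, 125 for $N = 5$, 1296 for $N = 6$.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// Decode a Prüfer sequence (values in [1, n]) into a sorted edge list.
// Simple O(n^2) version: rescan for the smallest leaf at every step.
std::vector<Edge> decode_prufer_naive(const std::vector<int>& seq, int n) {
    std::vector<int> deg(n + 1, 1);
    for (int x : seq) ++deg[x];                       // d_i = count(P, i) + 1
    std::vector<Edge> edges;
    for (int x : seq) {
        int leaf = 1;
        while (deg[leaf] != 1) ++leaf;                // smallest current leaf
        edges.push_back({std::min(leaf, x), std::max(leaf, x)});
        --deg[leaf];                                  // leaf removed (degree 0)
        --deg[x];
    }
    int u = 1;
    while (deg[u] != 1) ++u;
    int v = u + 1;
    while (deg[v] != 1) ++v;
    edges.push_back({u, v});                          // connect the last two leaves
    std::sort(edges.begin(), edges.end());            // canonical form for comparison
    return edges;
}

// Enumerate all n^(n-2) sequences and count the distinct trees they decode to.
long long count_distinct_trees(int n) {
    assert(n >= 2);
    std::vector<int> seq(n - 2, 1);
    std::set<std::vector<Edge>> trees;
    while (true) {
        trees.insert(decode_prufer_naive(seq, n));
        int i = 0;                                    // odometer-style increment in base n
        while (i < n - 2 && seq[i] == n) seq[i++] = 1;
        if (i == n - 2) break;                        // every sequence has been visited
        ++seq[i];
    }
    return static_cast<long long>(trees.size());
}
```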

Derivation of Tree and Graph Construction Mechanisms

1. Reverse Construction of Trees from the Prüfer Sequence

Let the randomly generated Prüfer sequence be $P = (p_1, p_2, \dots, p_{N-2})$.

Degree derivation: the degree of node $i$ in the tree is $d_i = \text{count}(P, i) + 1$, where $\text{count}(P, i)$ is the number of occurrences of $i$ in $P$.
Edge connection logic: for $j = 1, \dots, N-2$, select the current leaf $u$ (a node with $d_u = 1$) that has the smallest index and connect it to $p_j$. Then decrement $d_u$ to $0$ (removing $u$ from the tree) and decrement $d_{p_j}$ by 1. Once the sequence is exhausted, exactly two nodes with degree $1$ remain; connect them directly. Because this procedure inverts the Prüfer encoding, the result is always a valid tree on $N$ nodes, and a uniformly random sequence yields a uniformly random tree.
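The procedure above can be sketched with a min-heap of current leaves, which is the $O(N \log N)$ option mentioned earlier (the generator further down uses a linear-time pointer instead). The function name is hypothetical. Decoding the textbook sequence $(4, 4, 4, 5)$ with $N = 6$ gives the edges $(1,4), (2,4), (3,4), (4,5), (5,6)$.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Decode a Prüfer sequence P (values in [1, n], length n - 2, n >= 2) into the
// n - 1 tree edges, in the order the procedure produces them. O(n log n).
std::vector<std::pair<int, int>> prufer_decode_heap(const std::vector<int>& P, int n) {
    assert(n >= 2 && static_cast<int>(P.size()) == n - 2);
    std::vector<int> d(n + 1, 1);                 // d_i = count(P, i) + 1
    for (int p : P) ++d[p];

    // Min-heap of current leaves (degree 1), smallest index on top
    std::priority_queue<int, std::vector<int>, std::greater<int>> leaves;
    for (int i = 1; i <= n; ++i)
        if (d[i] == 1) leaves.push(i);

    std::vector<std::pair<int, int>> edges;
    for (int p : P) {
        int u = leaves.top();                     // smallest leaf
        leaves.pop();
        edges.push_back({u, p});                  // connect it to p_j
        d[u] = 0;                                 // u is removed from the tree
        if (--d[p] == 1) leaves.push(p);          // p_j may have just become a leaf
    }
    // Exactly two leaves remain: connect them
    int a = leaves.top();
    leaves.pop();
    int b = leaves.top();
    edges.push_back({a, b});
    return edges;
}
```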

2. Efficient Deduplication and Shuffling in Graph Theory

When generating an undirected simple graph, we must ensure that $|E| \le \frac{N(N-1)}{2}$ and that there are no self-loops or multiple edges; if the graph must also be connected, we additionally need $|E| \ge N - 1$.

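Deduplication: maintain the generated edges in a std::set<std::pair<int, int>>, storing each undirected edge as $(\min(u, v), \max(u, v))$, and reject candidates that are self-loops or already present. While $M$ stays well below $\frac{N(N-1)}{2}$ (say, at most half of it), the expected number of rejected draws is $O(M)$ and the total cost is $O(M \log M)$. As $M$ approaches the bound, rejections dominate (a coupon-collector effect), so for near-complete graphs it is better to enumerate all pairs and shuffle them.
Shuffling node and edge indices: if edges or trees are generated in a fixed order, node 1 tends to be the root and the edge order follows a recognizable pattern, which solutions that overfit to the data (hard-coded special cases, greedy shortcuts) can exploit. We therefore apply the Knuth (Fisher-Yates) shuffle to both the node labels and the final edge order; it produces each of the $N!$ label permutations with equal probability $1/N!$.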
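For near-complete graphs, a sketch that avoids rejection sampling altogether: enumerate all $\frac{N(N-1)}{2}$ pairs, then run only the first $M$ steps of a Knuth (Fisher-Yates) shuffle. The first $M$ entries are then a uniformly random set of $M$ distinct edges in uniformly random order. The function name is hypothetical, and the approach assumes $N$ is small enough (a few thousand nodes) to hold every pair in memory. It does not force connectivity; for a connected graph, combine it with a spanning tree as generate_simple_graph does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Choose m distinct edges of a simple graph on nodes 1..n, uniformly among all
// m-edge subsets and in uniformly random order. O(n^2) time and memory.
std::vector<std::pair<int, int>> dense_simple_edges(int n, long long m, std::uint64_t seed) {
    long long total = 1LL * n * (n - 1) / 2;
    assert(m >= 0 && m <= total);                 // K_n has only n(n-1)/2 edges

    std::vector<std::pair<int, int>> all;
    all.reserve(static_cast<std::size_t>(total));
    for (int u = 1; u <= n; ++u)
        for (int v = u + 1; v <= n; ++v)
            all.push_back({u, v});                // every unordered pair, no self-loops

    // Partial Knuth (Fisher-Yates) shuffle: after step i, all[0..i] is a uniform
    // random ordered sample of the pairs, so we can stop after m steps.
    std::mt19937_64 rng(seed);
    for (long long i = 0; i < m; ++i) {
        std::uniform_int_distribution<long long> pick(i, total - 1);
        std::swap(all[i], all[pick(rng)]);
    }
    all.resize(static_cast<std::size_t>(m));
    return all;
}
```

Even with $M = \frac{N(N-1)}{2}$ (the complete graph) this finishes in exactly $M$ swaps, whereas rejection sampling would need on the order of $M \ln M$ draws.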

C++ Reference Implementation

Python may be unavailable on contest machines or too slow for large inputs (generating a tree with $10^6$ nodes, for example, is very costly in Python), so we provide a fast C++ test-data generator built on the C++11 <random> library and Prüfer-sequence decoding.

```cpp
#include <iostream>
#include <vector>
#include <algorithm>
#include <random>
#include <chrono>
#include <set>

using std::cin;
using std::cout;
using std::vector;
using std::pair;

// 64-bit Mersenne Twister (a software PRNG), seeded from the clock.
// Replace the seed with a fixed constant to make the generated data reproducible.
std::mt19937_64 rng(std::chrono::steady_clock::now().time_since_epoch().count());

// Generate a random integer in the range [l, r]
long long randint(long long l, long long r) {
    std::uniform_int_distribution<long long> dist(l, r);
    return dist(rng);
}

// 1. Generate a uniformly random labeled tree by decoding a random Prüfer sequence in O(n)
void generate_random_tree(int n) {
    if (n <= 0) return;
    if (n == 1) { cout << 1 << "\n"; return; } // a single node: no edges and no Prüfer sequence

    vector<int> prufer(n - 2);
    vector<int> degree(n + 1, 1); // every node starts at degree 1; each occurrence in the sequence adds 1

    for (int i = 0; i < n - 2; ++i) {
        prufer[i] = randint(1, n);
        degree[prufer[i]]++;
    }

    // Pointer p scans upward for the next node with degree 1 (start at the smallest leaf)
    int p = 1;
    while (degree[p] != 1) p++;

    int leaf = p;
    vector<pair<int, int>> edges;

    for (int i = 0; i < n - 2; ++i) {
        int v = prufer[i];
        edges.push_back({leaf, v});

        degree[leaf]--; // the leaf drops to degree 0, i.e. it is removed
        degree[v]--;

        if (degree[v] == 1 && v < p) {
            leaf = v; // v just became a leaf with an index below p, so it is the next smallest leaf
        } else {
            p++;
            while (p <= n && degree[p] != 1) p++;
            leaf = p;
        }
    }

    // Connect the last two nodes with degree 1
    int u = -1, v = -1;
    for (int i = 1; i <= n; ++i) {
        if (degree[i] == 1) {
            if (u == -1) u = i;
            else { v = i; break; }
        }
    }
    if (u != -1 && v != -1) edges.push_back({u, v});

    // Relabel nodes with a random permutation so that labels carry no pattern
    vector<int> mapping(n + 1);
    for (int i = 1; i <= n; ++i) mapping[i] = i;
    std::shuffle(mapping.begin() + 1, mapping.end(), rng);

    // Shuffle the output order of the edge set (decoding emits edges in leaf-removal order)
    std::shuffle(edges.begin(), edges.end(), rng);

    cout << n << "\n";
    for (const auto& edge : edges) {
        // Randomly reverse the order of the edge endpoints
        if (randint(0, 1)) {
            cout << mapping[edge.first] << " " << mapping[edge.second] << "\n";
        } else {
            cout << mapping[edge.second] << " " << mapping[edge.first] << "\n";
        }
    }
}

// 2. Generate a simple connected graph without self-loops and multiple edges
void generate_simple_graph(int n, int m) {
    // m must be at least n - 1 (connectivity) and at most n(n-1)/2 (complete graph);
    // 1LL forces 64-bit multiplication so n * (n - 1) cannot overflow int
    if (n < 1 || m < n - 1 || m > 1LL * n * (n - 1) / 2) {
        std::cerr << "invalid (n, m) for a simple connected graph\n";
        return;
    }

    std::set<pair<int, int>> edge_set;
    vector<pair<int, int>> final_edges;

    // Strategy: first build a spanning tree so the graph is connected
    // (no unreachable nodes left at distance Inf for shortest-path code)
    for (int i = 2; i <= n; ++i) {
        int fa = randint(1, i - 1); // random parent with a smaller index: fast, but a shallow tree
        edge_set.insert({fa, i});   // fa < i, so the pair is already stored as (min, max)
        final_edges.push_back({fa, i});
    }

    // Fill in the remaining m - (n - 1) edges by rejection sampling
    int remaining = m - (n - 1);
    while (remaining > 0) {
        int u = randint(1, n);
        int v = randint(1, n);
        if (u == v) continue;        // reject self-loops

        if (u > v) std::swap(u, v);  // store undirected edges as (min, max)
        if (edge_set.insert({u, v}).second) { // insert() reports whether the edge is new
            final_edges.push_back({u, v});
            remaining--;
        }
    }

    // Relabel nodes so the spanning tree's "parent < child" pattern and node 1's bias are hidden
    vector<int> mapping(n + 1);
    for (int i = 1; i <= n; ++i) mapping[i] = i;
    std::shuffle(mapping.begin() + 1, mapping.end(), rng);

    // Shuffle the edge order
    std::shuffle(final_edges.begin(), final_edges.end(), rng);

    cout << n << " " << m << "\n";
    for (const auto& edge : final_edges) {
        int a = mapping[edge.first], b = mapping[edge.second];
        long long weight = randint(1, 100000); // example: random edge weight in [1, 100000]
        if (randint(0, 1)) std::swap(a, b);    // randomly reverse the endpoint order
        cout << a << " " << b << " " << weight << "\n";
    }
}

int main() {
    std::ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Modify the parameters here to generate specific types of data
    // Example: generate a random tree with 10 nodes
    generate_random_tree(10);

    // Example: generate a random graph with 10 nodes and 15 edges
    // generate_simple_graph(10, 15);

    return 0;
}
```

Common Pitfalls in NOIP Practice

1. Exceeding the Upper Bound on the Number of Edges

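Symptom: when generating a dense graph, the contestant asks for more edges than the complete graph has, for example $N = 1000$, $M = 10^6$ (a simple graph on 1000 nodes has at most $499\,500$ edges). Once every possible edge is already in edge_set, edge_set.insert can never add a new one, so while (remaining > 0) never terminates and the stress-testing harness hangs.
Fix: validate the parameters defensively at the entry point of the generator, rejecting any $M$ above the upper bound $\frac{N(N-1)}{2}$: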
```
if (m > 1LL * n * (n - 1) / 2) {
  std::cerr << "M exceeds the upper bound for a simple graph!" << std::endl; // needs <iostream>
  std::exit(1); // needs <cstdlib>
}
```

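Also note that the multiplication must be promoted with 1LL. Otherwise $N \times (N-1)$ is computed in 32-bit int and overflows as soon as $N \ge 46342$ (so for every $N \ge 5 \times 10^4$). Signed overflow is undefined behavior; in practice the product wraps around to a wrong value, often negative, and the check silently stops working.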
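A quick way to confirm the threshold is to compute it entirely in 64-bit arithmetic. In the sketch below (hypothetical helper names, assuming a 32-bit int as on NOIP judges), $N(N-1)$ first exceeds INT_MAX at $N = 46342$, and the 1LL form correctly gives $499\,500$ edges for $N = 1000$ and $4\,999\,950\,000$ for $N = 10^5$.

```cpp
#include <cassert>
#include <climits>

// Maximum number of edges in a simple undirected graph on n nodes.
// 1LL promotes the product to long long before multiplying, so it cannot overflow for int n.
long long max_simple_edges(int n) {
    assert(n >= 0);
    return 1LL * n * (n - 1) / 2;
}

// Smallest n for which n * (n - 1) no longer fits in a 32-bit int,
// i.e. where the version without 1LL starts to overflow.
int first_overflowing_n() {
    long long n = 2;
    while (n * (n - 1) <= INT_MAX) ++n;   // computed in long long, so the loop itself is safe
    return static_cast<int>(n);
}
```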

2. Range Collapse of the Pseudo-Random Generator (Misusing `rand()`)

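Symptom: node indices for a large graph are drawn with the legacy C library function rand(). Its maximum return value is RAND_MAX, which is only $32767$ with the Windows runtime used by MSVC and MinGW (glibc on Linux uses $2^{31}-1$, so the bug may appear on only some machines). When generating data with $N = 10^5$, rand() % n then only ever produces indices in $[0, 32767]$, and nodes $32768$ through $N - 1$ are never chosen as endpoints. If every edge is drawn this way, those nodes end up isolated and the stress test of the graph algorithm becomes meaningless.
Fix: do not use rand() in the contest environment at all; switch to the C++11 std::mt19937 or std::mt19937_64 generators instead. They implement the Mersenne Twister, have a period of $2^{19937}-1$, and return full 32-bit or 64-bit values. Combined with std::uniform_int_distribution (rather than %), they cover every integer range that appears in NOIP problems without bias.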
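A small sketch to sanity-check the engine and the range (the helper names are hypothetical). The C++ standard requires the 10000th output of a default-constructed std::mt19937 to be 4123659995 (and 9981545732273789042 for std::mt19937_64), and drawing node indices through std::uniform_int_distribution reaches values far above 32767.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// 10000th output of a default-seeded std::mt19937 (value fixed by the standard).
std::uint32_t mt19937_10000th() {
    std::mt19937 gen;                 // default seed 5489
    gen.discard(9999);                // skip the first 9999 outputs
    return static_cast<std::uint32_t>(gen());
}

// 10000th output of a default-seeded std::mt19937_64 (value fixed by the standard).
std::uint64_t mt19937_64_10000th() {
    std::mt19937_64 gen;
    gen.discard(9999);
    return gen();
}

// Largest node index seen in `draws` samples from [1, n].
// uniform_int_distribution replaces rand() % n: no 32767 ceiling and no modulo bias.
int max_node_id(int n, int draws, std::uint32_t seed) {
    assert(n >= 1 && draws >= 0);
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> node(1, n);
    int best = 0;
    for (int k = 0; k < draws; ++k) {
        int x = node(gen);
        if (x > best) best = x;
    }
    return best;
}
```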