NeFut Logo NeFut
Admin Login

Efficient Algorithms for Generating Random Trees and Graphs

Published at: 2026-05-27 09:17 Last updated: 2026-06-06 13:04
#algorithm #Tree #Graph

Core Logic and Mathematical Principles

When testing large-scale graph theory and tree structure algorithms (such as network flow, shortest path, tree chain decomposition), traditional uniform random data generation is prone to two types of logical degeneration:

  1. Illegal contamination by multiple edges and self-loops: If we directly generate the edge set $E = \{(u, v) \mid u, v \in [1, N]$ uniformly at random, it is highly likely to introduce $u = v$ (self-loop) or duplicate edges. This can cause some unprotected graph algorithms (such as Dijkstra's algorithm when weights are not deduplicated, augmenting path counting, etc.) to enter infinite loops or experience exponential deviations in count values.
  2. Severe degeneration of tree structure shapes: If we generate trees by allowing node $i \ (i > 1)$ to randomly connect to $[1, i-1]$, the expected parent node tends to have a smaller index, leading the expected height of the tree to converge to $O(\log N)$. This overly balanced tree structure cannot test extreme cases that specifically challenge tree chain decomposition, stack depth, or long chain analysis (e.g., degenerating into a single chain with $O(N)$ complexity or a star-shaped flower tree).

To ensure the physical legality and variability of graph theory test data, we must introduce topological shuffling and Prüfer sequence mapping techniques.

According to Cayley's Formula, the number of spanning trees for a complete graph with $N$ labeled nodes is:

$$N^{N-2}$$

The Prüfer sequence establishes a bijection between a labeled tree with $N$ nodes and an integer sequence of length $N-2$ with a range of $[1, N]$. By uniformly and randomly constructing a random sequence of length $N-2$ within $[1, N]$, and then using linear time $O(N)$ or $O(N \log N)$ to reconstruct the edge set, we can mathematically generate any shape of tree uniformly from the total set $N^{N-2}$, with consistent probability measures.


Derivation of Tree and Graph Construction Mechanisms

1. Reverse Construction of Trees from the Prüfer Sequence

Let the randomly generated Prüfer sequence be $P = \{p_1, p_2, \dots, p_{n-2}\}$.

2. Efficient Deduplication and Shuffling in Graph Theory

When generating an undirected, unweighted simple graph, we must ensure that $|E| \le \frac{N(N-1)}{2}$ and that there are no self-loops or multiple edges.


C++ Standard Source Code

Considering the potential environmental dependencies or execution efficiency issues with Python on contest machines (for example, generating tree structures of size $10^6$ in Python can be extremely costly), we provide a high-performance, hardcore test data generator source code in C++ production environments, completely based on the C++11 <random> library, smart pointers, and Prüfer sequences.

#include <iostream>
#include <vector>
#include <algorithm>
#include <random>
#include <chrono>
#include <set>

using std::cin;
using std::cout;
using std::vector;
using std::pair;

// High-performance hardware-level random number engine encapsulation
std::mt19937_64 rng(std::chrono::steady_clock::now().time_since_epoch().count());

// Generate a random integer in the range [l, r]
long long randint(long long l, long long r) {
    std::uniform_int_distribution<long long> dist(l, r);
    return dist(rng);
}

// 1. Generate absolutely uniformly random tree structures based on Prüfer sequence
void generate_random_tree(int n) {
    if (n <= 1) return;

    vector<int> prufer(n - 2);
    vector<int> degree(n + 1, 1); // Initial degrees are all 1 (including implicit contribution from one edge)

    for (int i = 0; i < n - 2; ++i) {
        prufer[i] = randint(1, n);
        degree[prufer[i]]++;
    }

    // Pointer p points to the current node with degree 1 that has the smallest index
    int p = 1;
    while (degree[p] != 1) p++;

    int leaf = p;
    vector<pair<int, int>> edges;

    for (int i = 0; i < n - 2; ++i) {
        int v = prufer[i];
        edges.push_back({leaf, v});

        degree[leaf]--;
        degree[v]--;

        if (degree[v] == 1 && v < p) {
            leaf = v; // Optimization: if the current parent's degree decreases to 1 and has a smaller index, take it as the next leaf
        } else {
            p++;
            while (p <= n && degree[p] != 1) p++;
            leaf = p;
        }
    }

    // Connect the last two nodes with degree 1
    int u = -1, v = -1;
    for (int i = 1; i <= n; ++i) {
        if (degree[i] == 1) {
            if (u == -1) u = i;
            else { v = i; break; }
        }
    }
    if (u != -1 && v != -1) edges.push_back({u, v});

    // Node index shuffling mapping to break the special degree pattern of node 1
    vector<int> mapping(n + 1);
    for (int i = 1; i <= n; ++i) mapping[i] = i;
    std::shuffle(mapping.begin() + 1, mapping.end(), rng);

    // Shuffle the output order of the edge set
    std::shuffle(edges.begin(), edges.end(), rng);

    cout << n << "\n";
    for (const auto& edge : edges) {
        // Randomly reverse the order of the edge endpoints
        if (randint(0, 1)) {
            cout << mapping[edge.first] << " " << mapping[edge.second] << "\n";
        } else {
            cout << mapping[edge.second] << " " << mapping[edge.first] << "\n";
        }
    }
}

// 2. Generate a simple connected graph without self-loops and multiple edges
void generate_simple_graph(int n, int m) {
    // Critical pitfall: the number of edges m must meet the minimum requirements for a connected graph and not exceed the upper limit of a complete graph
    if (m < n - 1 || m > 1LL * n * (n - 1) / 2) return;

    std::set<pair<int, int>> edge_set;
    vector<pair<int, int>> final_edges;

    // Strategy: first use a tree structure to ensure absolute connectivity of the graph, preventing isolated points from causing the shortest path algorithm to produce Inf
    for (int i = 2; i <= n; ++i) {
        int fa = randint(1, i - 1); // Quickly generate a tree
        int u = fa, v = i;
        if (u > v) std::swap(u, v);
        edge_set.insert({u, v});
        final_edges.push_back({u, v});
    }

    // Fill in the remaining m - (n - 1) edges
    int remaining = m - (n - 1);
    while (remaining > 0) {
        int u = randint(1, n);
        int v = randint(1, n);
        if (u == v) continue; // Forcefully intercept self-loops

        if (u > v) std::swap(u, v);
        if (edge_set.find({u, v}) == edge_set.end()) {
            edge_set.insert({u, v});
            final_edges.push_back({u, v});
            remaining--;
        }
    }

    // Shuffle and scatter
    std::shuffle(final_edges.begin(), final_edges.end(), rng);

    cout << n << " " << m << "\n";
    for (const auto& edge : final_edges) {
        // Example of outputting random edge weights
        long long weight = randint(1, 100000);
        if (randint(0, 1)) cout << edge.first << " " << edge.second << " " << weight << "\n";
        else cout << edge.second << " " << edge.first << " " << weight << "\n";
    }
}

int main() {
    std::ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Here you can modify parameters to generate specific types of data
    // Example: generate a random tree with 10 nodes
    generate_random_tree(10);

    // Example: generate a random graph with 10 nodes and 15 edges
    // generate_simple_graph(10, 15);

    return 0;
}

NOIP Practical Pitfall Guide

1. Overflow of Edge Count Boundary in Graph Theory

Also, be sure to cast the multiplication operation to 1LL, otherwise $N \times (N-1)$ will trigger int32 overflow and turn negative when $N \ge 5 \times 10^4$, causing the check to fail.

2. Collapse of Pseudo-Random Number Engine Period (Abuse of rand())

Original Source: local://23.5

[h] Back to Home