NeFut Logo NeFut
Admin Login

Maximizing C++ I/O Performance with Advanced Fast Read and Write Techniques

Published at: 2026-05-27 09:17 Last updated: 2026-06-06 13:04
#C++ #Tutorial

Core Logic and Mathematical Principles

In algorithm competitions such as NOIP, when the data size reaches $N, M \ge 10^6$, traditional input/output (I/O) often becomes a performance bottleneck for programs. Without optimization, the time consumed by I/O operations can even exceed the running time of the algorithm's core logic, leading to a TLE (Time Limit Exceeded) due to excessive constants.

The default input/output streams std::cin and std::cout in C++ maintain an explicit synchronization mechanism at the low level to ensure compatibility with the C standard input/output library (scanf / printf). This means that during each I/O operation, the C++ stream forcibly flushes the corresponding C buffer, introducing significant system-level context switch overhead. Additionally, std::cin is by default in a tied state with std::cout, meaning that before reading any data from the input stream, the output stream's buffer is automatically flushed. The time complexity overhead is severely amplified at the constant level.

By executing:

std::ios::sync_with_stdio(false);
std::cin.tie(nullptr);

we can forcibly break the synchronization and binding. The underlying mathematical logic eliminates the constant-level factorial factor of high-frequency buffer flushing, making the I/O efficiency of C++ streams approach or even exceed that of scanf / printf.

However, to achieve extreme performance, we must bypass C++'s stream parsing layer and directly manipulate the underlying standard file descriptor input buffer. This requires introducing more hardcore system-level functions fread and fwrite.

The mathematical essence of fread is blocked I/O. Let the total size of the test data file be $S$ bytes, the time overhead of a single system call from the disk be $T_{\text{sys}}$, and the time overhead of a single memory character scan be $T_{\text{mem}}$ (where $T_{\text{sys}} \gg T_{\text{mem}}$). If we use normal character-by-character reading, the total time overhead is:

$$T_{\text{normal}} = S \times T_{\text{sys}} + S \times T_{\text{mem}}$$

While using fread to allocate a local high-efficiency memory buffer of size $B$ (usually set to $1 \le B \le 1 \le 22$ bytes, such as $1 << 20$ bytes, i.e., 1MB):

$$T_{\text{fread}} = \left\lceil \frac{S}{B} \right\rceil \times T_{\text{sys}} + S \times T_{\text{mem}}$$

Since $\left\lceil \frac{S}{B} \right\rceil \ll S$, the number of system calls is reduced by several orders of magnitude, and the constant overhead of I/O successfully converges to the extreme $\Omega(1)$ basic memory movement overhead.


Fast Read and Write State Machine Derivation

Fast reading (Fast Read) is essentially a deterministic finite state automaton (DFA). For a signed integer, its character sequence consists of an optional negative sign - and a series of consecutive digit characters 0-9.

1. State Transition Logic

If any non-digit character is encountered, the state machine terminates and returns the result $f \times x$.

2. Bitwise Operation Optimization for Shifts

In state $S_1$, the operation of multiplying by $10$ can be accelerated at the hardware level using bitwise operations. Since $10 = 8 + 2 = 2^3 + 2^1$, we have:

$$x \times 10 = (x \ll 3) + (x \ll 1)$$

This directly bypasses the relatively inefficient multiplier instruction within the CPU, further reducing the constant.


C++ Standard Source Code

The following is a complete source code for a highly efficient general fast read and write implementation (including handling negative numbers, __int128 extensions, and fwrite high-speed output buffering) suitable for NOIP/Linux production environments.

#include <iostream>
#include <vector>
#include <algorithm>

// Extreme fast read memory buffer size definition: 1 << 20 bytes, i.e., 1MB
const int IO_BUFF_SIZE = 1 << 20;

class FastIO {
private:
    char in_buf[IO_BUFF_SIZE];
    int in_p1 = 0, in_p2 = 0;

    char out_buf[IO_BUFF_SIZE];
    int out_p = 0;

    // Internal low-level character read, using fread for batch filling
    inline char gc() {
        if (in_p1 == in_p2) {
            in_p2 = (in_p1 = 0) + fread(in_buf, 1, IO_BUFF_SIZE, stdin);
            if (in_p1 == in_p2) return EOF; // End of file
        }
        return in_buf[in_p1++];
    }

public:
    // Must forcibly flush the output buffer upon destruction to prevent data residue causing no output WA
    ~FastIO() {
        flush();
    }

    inline void flush() {
        if (out_p > 0) {
            fwrite(out_buf, 1, out_p, stdout);
            out_p = 0;
        }
    }

    // Internal low-level character output, using fwrite for automatic flush when full
    inline void pc(char c) {
        if (out_p == IO_BUFF_SIZE) {
            flush();
        }
        out_buf[out_p++] = c;
    }

    // Extreme fast read template function, supporting all scalar integers and __int128
    template <typename T>
    inline void read(T &x) {
        x = 0;
        T f = 1;
        char c = gc();
        // State S0: Skip whitespace and illegal characters
        while (c < '0' || c > '9') {
            if (c == '-') f = -1;
            c = gc();
        }
        // State S1: Shift and accumulate digits
        while (c >= '0' && c <= '9') {
            // Fatal pitfall: when optimizing multiplication with bitwise operations, outer parentheses must be rigorous, otherwise calculation errors may occur due to precedence issues
            x = (x << 3) + (x << 1) + (c ^ 48); // c ^ 48 is equivalent to c - '0'
            c = gc();
        }
        x *= f;
    }

    // Extreme fast write template function
    template <typename T>
    inline void write(T x) {
        if (x < 0) {
            pc('-');
            x = -x;
        }
        static char stack[40]; // Stack space for temporarily storing reversed digits
        int top = 0;
        do {
            stack[top++] = (x % 10) ^ 48;
            x /= 10;
        } while (x);
        while (top > 0) {
            pc(stack[--top]);
        }
    }

    inline void write_char(char c) {
        pc(c);
    }

    inline void write_string(const char* s) {
        while (*s) pc(*s++);
    }
} io;

int main() {
    // Fatal pitfall: once using custom FastIO based on fread/fwrite,
    // it is strictly forbidden to mix std::cin, std::cout, scanf, printf in subsequent code, otherwise input stream preemption will cause data confusion!

    int n;
    io.read(n);

    long long sum = 0;
    for (int i = 0; i < n; ++i) {
        long long val;
        io.read(val);
        sum += val;
    }

    io.write_string("Total Sum: ");
    io.write(sum);
    io.write_char('\n');

    return 0; // Upon program exit, the destructor of the io object will automatically trigger the final flush() to write to disk
}

NOIP Practical Pitfall Guide

1. Mixing Custom fread Fast Read with Standard Input Functions Causes Data Gaps

2. Fast Write fwrite Buffer Not Manually Flushed Leads to Output Disappearing


Classic NOIP/Luogu Problems

1. Luogu P3865 [Template] ST Table

2. Luogu P1177 [Template] Quick Sort

Original Source: local://24.1

[h] Back to Home