Deep Dive into Expectation Dynamic Programming: Algorithm...

Core Logic and Mathematical Principles

Expectation Dynamic Programming (Expectation DP) is a core algorithmic tool for dealing with problems characterized by "randomness in decision paths" or "uncertainty in subsequent states." Its primary distinction from conventional dynamic programming lies in the fact that state transitions are not deterministic choices (Max/Min), but rather weighted averages (summation reduction) of the costs of all possible subsequent states.

1. Transition Mechanism of Expectation DP

Let the current state be $u$, which has several discrete random transition paths leading to the set of subsequent states $to(u)$. For each subsequent state $v \in to(u)$, the probability of transition is $p(u \to v)$, and the cost of the transition itself (such as steps, expenses, weights) is $w(u \to v)$.

According to the definition of mathematical expectation and the law of total expectation, the expected total cost $f[u]$ from state $u$ to the endpoint satisfies the following algebraic equation:

$$f[u] = \sum_{v \in to(u)} p(u \to v) \cdot \Big( f[v] + w(u \to v) \Big)$$

Using the property of probability normalization $\sum p(u \to v) = 1$, this formula is often further simplified into a classic form that is easier for code implementation:

$$f[u] = \left( \sum_{v \in to(u)} p(u \to v) \cdot f[v] \right) + \sum_{v \in to(u)} p(u \to v) \cdot w(u \to v)$$

2. Why is Expectation DP Mostly Adopted in "Backward Push (Reverse Substitution)?"

In conventional DP, we are accustomed to "forward push" (pushing from the starting point to the endpoint). However, in Expectation DP, forward pushing often leads to logical collapse.

Dilemma of Forward Push: If we define $f[u]$ as the expected cost of reaching state $u$ from the starting point, then in calculating $f[u]$, we need to know not only the expectations of predecessor states but also the absolute probability of reaching the current state $u$. This necessitates maintaining a probability DP, which easily leads to high coupling of states.
Breakthrough of Backward Push: If we define $f[u]$ as the expected total cost of reaching the endpoint from state $u$.
Clearly Defined Endpoint State: The cost from the endpoint $T$ to itself is evidently $0$, i.e., the boundary condition $f[T] = 0$ holds true.
Completely Detached from Probability Coupling: Since $f[u]$ describes the "future expectation," it is entirely unrelated to "how likely it was to reach state $u$ in the past." This successfully achieves state decoupling.

State Design and Algorithm Derivation

1. Backward Push on a DAG

If the state transition dependencies form a Directed Acyclic Graph (DAG), we can directly use reverse topological sorting (or memoized search) to fill the table starting from the endpoint $T$.

State Equation:

$$f[u] = \sum_{(u, v) \in E} p(u \to v) \cdot (f[v] + w(u \to v))$$

2. Expectation DP with "Self-Loops/Cycles" and Gaussian Elimination

Many expectation problems exhibit characteristics of "staying in place" or "state cycles" (for example: if the dice roll is incorrect, one stays in place, or one randomly walks on a graph). In this case, the state transition equation will exhibit cyclic dependencies, meaning that the algebraic expression for $f[u]$ contains $f[u]$ itself or other mutually dependent unknowns, making it impossible to fill the table in a straightforward manner.

Breakthrough Strategies:

Simplification of Single-Point Self-Loops: If there is only a transition from the current state to itself (with probability $p_{\text{self}}$), perform algebraic substitution directly.

$$f[u] = p_{\text{self}} \cdot (f[u] + w_{\text{self}}) + \sum_{v \neq u} p(u \to v) \cdot (f[v] + w(u \to v))$$

Move all terms containing $f[u]$ to the left side, extract the coefficient $(1 - p_{\text{self}})$, and divide to eliminate the self-loop obstruction. 2. Strongly Connected Component Loops (Full Graph Walk): If states form complex network loops, abandon recursion and treat each state $f[u]$ as an unknown $x_u$. List $N$ linear equations based on the transition topological relationships, construct a matrix, and directly use Gaussian elimination (Gaussian Elimination) to solve it violently within $O(N^3)$ complexity.

C++ Standard Source Code

Core Implementation: Memoized Search for Backward Expectation DP

For typical expectation states with a divide-and-conquer structure, using memoized search can automatically complete the reverse topological flow.

#include <iostream>
#include <vector>
#include <iomanip>

struct Edge {
    int to;
    double prob; // Transition probability
    double weight; // Transition cost
};

const int MAXN = 100000;
std::vector<Edge> graph[MAXN + 5];
double f[MAXN + 5];
bool visited[MAXN + 5];
int target_node;

// Memoized search: calculate the expected total cost from u to target_node
double get_expectation(int u) {
    if (u == target_node) return 0.0; // Endpoint boundary: expected cost to itself is 0
    if (visited[u]) return f[u];      // Intercept already computed states

    visited[u] = true;
    double total_exp = 0.0;

    for (const auto &edge : graph[u]) {
        int v = edge.to;
        // Core backward push formula: p * (f[v] + weight)
        total_exp += edge.prob * (get_expectation(v) + edge.weight);
    }

    f[u] = total_exp;
    return f[u];
}

int main() {
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(NULL);

    int n, m, start_node;
    if (std::cin >> n >> m >> start_node >> target_node) {
        for (int i = 0; i < m; ++i) {
            int u, v;
            double p, w;
            std::cin >> u >> v >> p >> w;
            graph[u].push_back({v, p, w});
        }

        double ans = get_expectation(start_node);
        std::cout << std::fixed << std::setprecision(6) << ans << "\n";
    }
    return 0;
}

NOIP Practical Pitfall Guide

Incorrectly initializing the endpoint state to a non-zero value or forward assigning initial values Some contestants are accustomed to forward thinking, initializing the starting point f[S] = 0 and then attempting to update backwards. This can lead to computed values diverging completely from the expected definition due to the inability to predict the global convergence value of subsequent nodes during branch calculations. It is essential to remember: the natural deterministic boundary of Expectation DP is at the endpoint, f[T] = 0 is the starting engine of recursion.
Failure to handle unreachable states during Gaussian elimination leading to free variable collapse When dealing with graph theory random walk problems that require Gaussian elimination, there may be deadlock nodes that are impossible to reach the endpoint due to unidirectional connectivity or isolated nodes that cannot be reached from the starting point. The unknown equations corresponding to these nodes may cause the matrix determinant to be $0$ (resulting in multiple or no solutions). Before sending the equations to Gaussian elimination, a BFS/DFS must first be run to determine the forward and backward connectivity, removing invalid states from the equation set.

Classic NOIP/Luogu Problems

1. Luogu P4550 Collecting Stamps

Problem Description: There are $n$ different types of stamps, and one can buy one of them at a time. It is known that the probability of buying each type of stamp is $\frac{1}{n}$. The $r$-th purchase costs $r$ yuan. Find the expected cost to collect all $n$ types of stamps.
Essence of the Problem: Dual state coupling backward Expectation DP.
Core Solution Idea: Since the purchase cost is strongly tied to the number of purchases, we cannot just maintain the expected costs. Using divide-and-conquer, define $f[i]$ as the expected number of purchases needed to collect the remaining stamps after already collecting $i$ types; define $g[i]$ as the expected cost to collect the remaining stamps after already collecting $i$ types. By considering whether the next purchase is a duplicate (probability $\frac{i}{n}$) or a new one (probability $\frac{n-i}{n}$), we can establish the backward equation:

$$f[i] = \frac{i}{n} f[i] + \frac{n-i}{n} f[i+1] + 1 \implies f[i] = f[i+1] + \frac{n}{n-i}$$

Using $f[i]$ as the algebraic coefficient to drive the state transitions of $g[i]$, we can push backward from $n$ to $0$, with $g[0]$ being the final solution.

2. Luogu P1850 [NOIP2016 Advanced Group] Change Classroom

Problem Description: There are $n$ classes, each initially in classroom $c_i$, which can be applied to change to $d_i$, with a success probability of $k_i$. A maximum of $m$ applications can be made. Find the minimum expected total distance traveled to complete all $n$ classes under the optimal application strategy.
Essence of the Problem: Multi-dimensional state decision Expectation DP.
Core Solution Idea: This problem organically integrates control flow decisions (optimally choosing which classes to apply for) with expectations. Define the state $f[i][j][0/1]$ as the minimum expected total distance to complete all subsequent courses while currently arranging the $i$-th class, having used $j$ application opportunities, and having 0 (not applied) / 1 (applied) for the $i$-th class. During transitions, since the actual classroom destinations for the $i$-th and $(i+1)$-th classes have $2 \times 2 = 4$ possible probability combinations, we need to use the shortest path distance between the two time-space points (preprocessed by Floyd) and the probabilities $k_i, k_{i+1}$ to perform full expectation combinations, taking std::min for "apply" or "not apply." This is not only a classic expectation backward push but also one of the hardest Expectation DP problems in NOIP history.

Deep Dive into Expectation Dynamic Programming: Algorithm Principles and Implementation