Pitch sequencing is a central topic in baseball analytics, yet previous studies have primarily focused on optimizing the final pitch within a single plate appearance, neglecting both the role of the preceding setup pitches and the impact on season-long performance. To address these gaps, this study conducted counterfactual analyses using MLB Statcast data. A Transformer-based machine-learning model was trained to predict whether a target pitch would result in an in-play outcome or a swing-out.
Counterfactual pitch sequences were generated by replacing either the final pitch or the preceding setup pitch with alternative pitch types and locations while keeping the surrounding contextual information fixed. Optimal counterfactual selections were defined as those that minimized the predicted in-play probability, and their expected effects on pitchers' seasonal statistics were estimated using regression models linking model outputs to season statistics. The results suggest that optimizing both final and setup pitches may substantially influence season-level performance, with estimated improvements of more than 1.0 in K/9 (strikeouts per nine innings).
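To make the counterfactual step concrete, here is a minimal Python sketch of the search described above: swap one pitch in a sequence for every alternative pitch type and location, keep everything else fixed, and pick the candidate with the lowest predicted in-play probability. Everything named here is an assumption for illustration only. The `Pitch` class, the nine-bin `zone` encoding, and especially `toy_predict_in_play` are hypothetical stand-ins; the toy scorer is a hand-written rule, not the study's Transformer model, and the regression step that links model outputs to season statistics is not covered.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterator, Sequence, Tuple


@dataclass(frozen=True)
class Pitch:
    pitch_type: str  # e.g. "FF" (four-seam fastball), "SL" (slider)
    zone: int        # hypothetical location bin, 1..9 (5 = middle)


PitchSeq = Tuple[Pitch, ...]


def counterfactuals(
    seq: PitchSeq, index: int, pitch_types: Sequence[str], zones: Sequence[int]
) -> Iterator[PitchSeq]:
    """Yield copies of seq with only the pitch at `index` replaced."""
    for pitch_type, zone in product(pitch_types, zones):
        alt = Pitch(pitch_type, zone)
        if alt != seq[index]:  # skip the pitch that was actually thrown
            yield seq[:index] + (alt,) + seq[index + 1:]


def best_counterfactual(
    seq: PitchSeq,
    index: int,
    pitch_types: Sequence[str],
    zones: Sequence[int],
    predict_in_play: Callable[[PitchSeq], float],
) -> PitchSeq:
    """Return the counterfactual sequence with the lowest predicted in-play probability."""
    return min(counterfactuals(seq, index, pitch_types, zones), key=predict_in_play)


def toy_predict_in_play(seq: PitchSeq) -> float:
    """Placeholder scorer (assumption): NOT the study's trained model."""
    final = seq[-1]
    prob = 0.6 if final.zone == 5 else 0.4  # middle-zone pitches get hit more
    if len(seq) >= 2 and seq[-2].pitch_type != final.pitch_type:
        prob -= 0.1  # changing pitch type after the setup pitch helps
    return prob


# Observed two-pitch sequence: setup pitch, then final pitch.
observed: PitchSeq = (Pitch("FF", 5), Pitch("FF", 5))
pitch_types = ["FF", "SL"]
zones = [1, 5, 9]

# Replace the final pitch (index 1) or the setup pitch (index 0), one at a time.
best_final = best_counterfactual(observed, 1, pitch_types, zones, toy_predict_in_play)
best_setup = best_counterfactual(observed, 0, pitch_types, zones, toy_predict_in_play)
```

In a real pipeline the scorer would be the trained model's predicted probability, and the candidate set would presumably be limited to pitches the pitcher actually throws; both choices are left open here.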
The analyses also provided practical insights, including the effective locations specific to each velocity band, the importance of pitch command, and the expansion of pitch-selection options through mid-velocity pitches. These findings quantitatively support the strategic importance of pitch sequencing in baseball.
Blogger's Review: This study applies counterfactual analysis to the optimization of pitch sequences and shows how data-driven decision-making could improve pitcher performance. Effective pitching strategies not only improve a pitcher's statistics but can also affect game outcomes, so the work deserves attention from coaches and analysts alike.