Background
The Coupled Model Intercomparison Project Phase 6 (CMIP6) has generated thousands of peer-reviewed publications documenting model configurations, evaluation procedures, emergent constraints, and projection uncertainties. As the community transitions to CMIP7, efficiently extracting this unstructured knowledge and operationalizing it alongside live data analysis has become a critical bottleneck.
CMIP-Forge System
We present CMIP-Forge, a hybrid retrieval-augmented generation (RAG) and autonomous analysis system that bridges the gap between scientific literature and Earth System Grid Federation (ESGF) data archives. The system pairs a curated corpus of 6,581 CMIP6-related open-access publications (101,828 indexed chunks) with an agentic pipeline that plans and executes Python workflows over live climate data, while a panel of independent reviewer models audits its methodology end to end.
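To make the retrieval half of a RAG pipeline concrete, here is a minimal sketch of ranking indexed chunks against a query using TF-IDF weights and cosine similarity. The abstract does not say which retriever CMIP-Forge uses, so TF-IDF is a stand-in here, and the function names (`build_index`, `retrieve`) and the three sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(chunks):
    """Return one sparse TF-IDF vector (a dict) per chunk, plus the IDF table."""
    docs = [Counter(tokenize(c)) for c in chunks]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF: terms present in every chunk still get a small positive weight
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, chunks, vectors, idf, k=2):
    """Rank chunks by similarity to the query and return the top-k (score, chunk) pairs."""
    q_tf = Counter(tokenize(query))
    # Query terms that never occur in the corpus carry no weight
    q_vec = {t: tf * idf[t] for t, tf in q_tf.items() if t in idf}
    scored = [(cosine(q_vec, v), c) for v, c in zip(vectors, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    chunks = [
        "Emergent constraints on equilibrium climate sensitivity in CMIP6 models.",
        "AMOC weakening under SSP5-8.5 across the CMIP6 ensemble.",
        "ENSO teleconnections to North American winter precipitation.",
    ]
    vectors, idf = build_index(chunks)
    for score, chunk in retrieve("ENSO teleconnection precipitation", chunks, vectors, idf):
        print(f"{score:.3f}  {chunk}")
```

In a full RAG system the top-ranked chunks would be inserted into the language model's prompt as grounding context; a BM25 or dense-embedding retriever could replace TF-IDF behind the same `retrieve` interface.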
Multi-layer Defense Architecture
CMIP-Forge introduces a multi-layered Defense-in-Depth architecture that enforces physical and methodological invariants through executable mechanisms: Abstract Syntax Tree (AST) static analysis, audited scientific primitives, and an autonomous adversarial peer-review protocol. We demonstrate the system's capabilities through end-to-end autonomous research pipelines spanning atmospheric teleconnections, ocean dynamics, regional extremes, and global warming projections.
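The AST layer can be illustrated with Python's standard `ast` module. The sketch below rejects generated code that calls `eval`/`exec` or takes an unweighted mean over latitude, which on a regular latitude-longitude grid over-weights the small high-latitude cells. Both rules and the name `check_source` are hypothetical examples of the kind of methodological invariant such a guardrail could enforce, not the rules CMIP-Forge actually ships; the checked snippets assume xarray-style `.weighted(w).mean(dim=...)` calls, but the checker only parses them and never imports xarray.

```python
import ast

# Hypothetical rule set, for illustration only
BANNED_CALLS = {"eval", "exec"}
LAT_NAMES = {"lat", "latitude"}


def _dim_names(node):
    """Extract string dimension names from a dim= value (a str, list, or tuple literal)."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return {node.value}
    if isinstance(node, (ast.List, ast.Tuple)):
        return {e.value for e in node.elts
                if isinstance(e, ast.Constant) and isinstance(e.value, str)}
    return set()


def _is_weighted(receiver):
    """True if the receiver is itself a call like x.weighted(w), as in x.weighted(w).mean(...)."""
    return (isinstance(receiver, ast.Call)
            and isinstance(receiver.func, ast.Attribute)
            and receiver.func.attr == "weighted")


def check_source(source):
    """Parse source without executing it and return a list of (line, message) violations."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in BANNED_CALLS:
            violations.append((node.lineno, f"banned call: {func.id}()"))
        if isinstance(func, ast.Attribute) and func.attr == "mean":
            for kw in node.keywords:
                if kw.arg == "dim" and _dim_names(kw.value) & LAT_NAMES:
                    if not _is_weighted(func.value):
                        violations.append(
                            (node.lineno, "unweighted mean over latitude; use area weights"))
    return violations


if __name__ == "__main__":
    bad = 'gm = tas.mean(dim=["lat", "lon"])\n'
    good = 'gm = tas.weighted(np.cos(np.deg2rad(tas.lat))).mean(dim=["lat", "lon"])\n'
    print(check_source(bad))   # [(1, 'unweighted mean over latitude; use area weights')]
    print(check_source(good))  # []
```

Because the check runs on the syntax tree before any code executes, a violation can be returned to the agent as a rewrite instruction rather than surfacing later as a silently biased result.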
Autonomous Research Workflows
An agentic analysis system grounded in peer-reviewed literature, constrained by automated code guardrails, and audited by an independent adversarial review loop can autonomously complete complex climate research workflows. The same experiments expose concrete failure modes of the review loop (sycophantic regression, unresolved REVISE verdicts, and submission of stub code for review), each diagnosable from the immutable telemetry and provenance record released with the article.
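Two of these failure modes can be made concrete with a minimal sketch of a review loop. Reviewers are plain functions that return a verdict string, a `revise` callback produces the next draft, code containing stub functions is blocked before review, and a draft still drawing REVISE after `max_rounds` ends in an explicit `unresolved_revise` state. The names `review_loop` and `contains_stub`, the `ACCEPT` label, and the round limit are all assumptions; only the REVISE verdict comes from the text. Sycophantic regression is not modeled, since detecting it would require comparing successive drafts against the reviewers' earlier objections.

```python
import ast
from typing import Callable, List, Sequence, Tuple

Reviewer = Callable[[str], str]              # code -> verdict ("ACCEPT" or "REVISE")
Reviser = Callable[[str, List[str]], str]    # (code, verdicts) -> revised code


def _is_stub_stmt(stmt: ast.stmt) -> bool:
    """True for pass, a bare constant (docstring or ...), or raise NotImplementedError."""
    if isinstance(stmt, ast.Pass):
        return True
    if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
        return True
    if isinstance(stmt, ast.Raise) and stmt.exc is not None:
        exc = stmt.exc.func if isinstance(stmt.exc, ast.Call) else stmt.exc
        return isinstance(exc, ast.Name) and exc.id == "NotImplementedError"
    return False


def contains_stub(source: str) -> bool:
    """True if any function body consists only of stub statements."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if all(_is_stub_stmt(s) for s in node.body):
                return True
    return False


def review_loop(code: str, reviewers: Sequence[Reviewer], revise: Reviser,
                max_rounds: int = 3) -> Tuple[str, list]:
    """Run panel review rounds; return the terminal status and a per-round log."""
    log = []
    for rnd in range(1, max_rounds + 1):
        if contains_stub(code):
            log.append((rnd, "BLOCKED_STUB"))
            return "rejected_stub", log
        verdicts = [review(code) for review in reviewers]
        log.append((rnd, tuple(verdicts)))
        if all(v == "ACCEPT" for v in verdicts):
            return "accepted", log
        code = revise(code, verdicts)
    return "unresolved_revise", log


if __name__ == "__main__":
    def strict(code):
        # Toy reviewer: accepts only code that applies area weighting
        return "ACCEPT" if "weighted" in code else "REVISE"

    def no_op(code, verdicts):
        # Toy reviser that never addresses the reviewers' objections
        return code

    print(review_loop("def f(x):\n    return x.mean()\n", [strict, strict], no_op))
    # ('unresolved_revise', [(1, ('REVISE', 'REVISE')), (2, ('REVISE', 'REVISE')), (3, ('REVISE', 'REVISE'))])
    print(review_loop("def f(x):\n    ...\n", [strict], no_op))
    # ('rejected_stub', [(1, 'BLOCKED_STUB')])
```

Returning the per-round log with the status mirrors, in miniature, the idea of diagnosing failures after the fact from a recorded telemetry trail.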
Blogger's Review: CMIP-Forge is a significant step forward for climate research, coupling literature retrieval with live analysis of climate data in a way that could make research considerably more efficient. However, the review mechanism still shows weaknesses, including the failure modes the authors themselves report, and it will need further refinement before the quality and reliability of the system's outputs can be taken for granted.