Abstract
Robot learning and embodied agents now require simulation to serve as a shared execution substrate linking control, skills, and planning, not merely as a renderer, controller testbed, or fixed task environment. Existing pipelines split these layers with "magic" actions, disconnected training environments, or forward-only renders that cannot reproduce, evaluate, and annotate the same episode.
We present MagicSim, an embodied interaction infrastructure built around one deterministic batched runtime and a shared Markov decision process (MDP). From YAML-first specifications that decouple contents, placement, behavior, and agent exposure, MagicSim constructs diverse executable worlds spanning task families, interaction regimes, physics, layouts, sensors, avatars, and robot embodiments in one reset-and-step loop.
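To make the decoupling concrete, here is a minimal Python sketch of a specification that keeps contents, placement, behavior, and agent exposure in separate fields. The class `WorldSpec`, its field names, the asset names, and the rule strings such as `random_on:table` are all hypothetical illustrations. They are not MagicSim's actual YAML keys or API, which the abstract does not give.

```python
from dataclasses import dataclass, replace

# Hypothetical schema: field names are illustrative, not MagicSim's real YAML keys.
@dataclass(frozen=True)
class WorldSpec:
    contents: tuple   # which assets exist in the scene
    placement: dict   # asset name -> layout rule
    behavior: dict    # asset name -> physics or scripted behavior
    exposure: dict    # what the agent may observe and command


def validate(spec: WorldSpec) -> None:
    """Check that placement and behavior only mention declared contents."""
    known = set(spec.contents)
    for section in (spec.placement, spec.behavior):
        unknown = set(section) - known
        if unknown:
            raise ValueError(f"undeclared assets: {sorted(unknown)}")


base = WorldSpec(
    contents=("franka_arm", "table", "mug"),
    placement={"mug": "random_on:table", "franka_arm": "fixed_base"},
    behavior={"mug": "rigid_body"},
    exposure={"observations": ["rgb", "depth"], "actions": "joint_position"},
)
validate(base)

# Changing the layout touches one field; the other three are reused unchanged.
variant = replace(
    base, placement={"mug": "fixed_on:table", "franka_arm": "fixed_base"}
)
validate(variant)
```

The point of the sketch is that a new layout, sensor set, or embodiment is a change to one field rather than a new environment definition.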
A common execution interface grounds high-level commands through controllers, atomic skills, planner primitives, and asynchronous planning, realizing them as robot actions rather than simulator-side state edits. One task definition supports three capabilities: benchmark and RL evaluation, an autocollect interface that automatically turns commands into grounded trajectories, and agent/VLM-facing interaction.
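The difference between grounded execution and a "magic" state edit can be shown with a toy one-dimensional robot. In this sketch a command expands into atomic skills, and each skill emits one bounded action per tick instead of setting the robot's position directly. Every name here (`Robot`, `move_to`, `ground`, `execute`, the `fetch` command) is an assumption made for illustration, not part of MagicSim.

```python
from dataclasses import dataclass


@dataclass
class Robot:
    position: float = 0.0
    max_step: float = 0.25  # largest displacement allowed per physics tick

    def apply(self, action: float) -> None:
        # Actions are clamped, so the robot cannot teleport to a target.
        self.position += max(-self.max_step, min(self.max_step, action))


def move_to(robot: Robot, target: float, tol: float = 1e-6):
    """Atomic skill: yield one action per tick until the target is reached."""
    while abs(target - robot.position) > tol:
        yield target - robot.position


SKILLS = {"move_to": move_to}


def ground(command: str):
    """Map a high-level command to an ordered list of (skill, argument)."""
    table = {"fetch": [("move_to", 1.0), ("move_to", 0.0)]}
    return table[command]


def execute(robot: Robot, command: str) -> int:
    """Run a command through its skills and return the number of ticks used."""
    ticks = 0
    for name, arg in ground(command):
        for action in SKILLS[name](robot, arg):
            robot.apply(action)
            ticks += 1
    return ticks


robot = Robot()
ticks = execute(robot, "fetch")
```

Because the skill only proposes actions and the robot applies them under its own limits, the resulting trajectory is something a real controller could follow, which is the property the abstract contrasts with simulator-side state edits.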
For automatic execution, commands flow through a Command-Skill-Planner-Robot-Record pipeline, while per-environment command, skill, planning, retry, annotation, and episode states advance independently above the shared physics tick. Successful rollouts are saved as structured multimodal trajectories aligning language supervision, action representations, visual/geometric representations, and task-level status with the executed episode. MagicSim thus unifies diverse world construction, embodied execution, task evaluation, automatic rollout generation, and interactive agent interfaces in one planner-in-the-loop runtime.
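The idea of per-environment state advancing independently above a shared tick can be sketched as follows. Each environment in the batch walks through the same stages at its own pace while a single loop drives all of them. The stage names follow the pipeline named above. The `EnvState` class, the per-stage tick counts, and the `run` function are hypothetical, and retry and annotation states are left out for brevity.

```python
from dataclasses import dataclass, field

STAGES = ("command", "skill", "planner", "robot", "record")


@dataclass
class EnvState:
    durations: dict               # ticks each stage needs in this environment
    stage: int = 0                # index into STAGES
    ticks_in_stage: int = 0
    log: list = field(default_factory=list)

    @property
    def done(self) -> bool:
        return self.stage == len(STAGES)

    def advance(self, tick: int) -> None:
        """Advance this environment by one tick, independently of the others."""
        if self.done:
            return
        name = STAGES[self.stage]
        self.ticks_in_stage += 1
        if self.ticks_in_stage >= self.durations.get(name, 1):
            self.log.append((tick, name))  # tick at which the stage finished
            self.stage += 1
            self.ticks_in_stage = 0


def run(envs, max_ticks: int = 100) -> int:
    """Drive all environments from one loop; return the ticks consumed."""
    for tick in range(max_ticks):
        if all(e.done for e in envs):
            return tick
        # A shared, batched physics step would run here, once for all envs.
        for e in envs:
            e.advance(tick)
    return max_ticks


fast = EnvState(durations={"planner": 1})
slow = EnvState(durations={"planner": 4})  # planning takes longer here
total = run([fast, slow])
```

In this run the fast environment reaches its record stage while the slow one is still planning, yet both share the same tick counter, so their logs stay aligned in time.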
Blogger's Review: MagicSim folds world construction, grounded execution, evaluation, and data collection into one runtime, which should cut the glue code that usually sits between simulators, skill libraries, and planners. Because a single task definition serves benchmark and RL evaluation, automatic trajectory collection, and agent/VLM interaction, the same episode can be reproduced, scored, and annotated without re-implementing the task. The YAML-first specification, which keeps contents, placement, behavior, and agent exposure separate, is what gives the design its flexibility and room to scale.