We introduce COMET (Causal Object-centric Model for Efficient Tree search), a model-based reinforcement learning algorithm that performs Monte Carlo Tree Search in a slot-structured latent space. COMET pairs a frozen, unsupervised object-centric encoder with a transformer-based world model. A novel action-slot fusion mechanism binds actions to objects, and the world model uses the fused representation to predict slot transitions.
Policy and value heads use object-causal attention, modulating token interactions by learned per-slot relevance scores so that decision-making concentrates on task-relevant entities. COMET adds an explicit object-level inductive bias to MuZero-style latent planning.
Across eight visually and dynamically diverse tasks from the Object-Centric Visual RL benchmark, COMET achieves a higher mean normalized score in the early stages of training than both object-centric and monolithic baselines.
Blogger's Review: COMET targets planning efficiency in reinforcement learning by combining a causal object-centric model with a transformer world model and tree search. Its action-slot fusion mechanism binds actions to individual objects, which should make predictions and decisions more specific to the entities that matter. The reported gains are for early training on eight benchmark tasks, so the approach looks promising for more complex tasks, though that remains to be shown.
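
To make the idea of "modulating token interactions by learned per-slot relevance scores" more concrete, here is a minimal sketch in plain Python. It is not COMET's implementation: the abstract does not say how the relevance scores enter the attention computation, so this sketch assumes one simple option, adding the log of each slot's relevance to the attention logits for that slot as a key. The function name `relevance_modulated_attention`, the use of slots directly as queries, keys, and values (no learned projections), and the requirement that scores lie in (0, 1] are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relevance_modulated_attention(slots, relevance):
    """Self-attention over slots where each key is scaled by its relevance.

    slots:     list of N slot vectors (lists of floats), each of length d.
    relevance: list of N scores in (0, 1]; a hypothetical stand-in for the
               learned per-slot relevance scores mentioned in the abstract.

    Returns (outputs, weights). weights[i][j] is how much slot i attends to
    slot j, and is proportional to relevance[j] * exp(q_i . k_j / sqrt(d)).
    """
    d = len(slots[0])
    outputs, weights = [], []
    for q in slots:
        # Scaled dot-product logit plus a log-relevance bias on the key.
        logits = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) + math.log(r)
            for k, r in zip(slots, relevance)
        ]
        w = softmax(logits)
        weights.append(w)
        # Output is the attention-weighted average of the slot vectors.
        outputs.append(
            [sum(wj * v[i] for wj, v in zip(w, slots)) for i in range(d)]
        )
    return outputs, weights


# Three toy slots; the third is marked as nearly irrelevant.
slots = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = relevance_modulated_attention(slots, [1.0, 1.0, 1e-6])
for row in weights:
    print([round(w, 4) for w in row])
```

With this choice, a slot whose relevance is close to zero receives almost no attention from any other slot, while a relevance of 1 leaves ordinary scaled dot-product attention unchanged. Other designs (for example, gating the values instead of biasing the logits) would fit the abstract's description equally well.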