PiL-World is a chunk-wise world model designed for policy-in-the-loop evaluation of vision-language-action (VLA) policies. Unlike existing world models, which are limited to open-loop prediction, PiL-World generates multi-view future observations conditioned on both the current observation and the action trajectory rolled out by a VLA policy. By alternating between VLA inference and world-model prediction, it enables closed-loop evaluation without real robot execution at every step.
- Closed-loop VLA evaluation: The first world model to support closed-loop VLA evaluation, in which each action chunk is conditioned on the observation generated by executing the previous chunk.
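The alternation between VLA inference and world-model prediction can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `closed_loop_rollout`, `Rollout`, `is_success`, and `max_chunks`, the camera names, and the toy policy and world model are all hypothetical stand-ins. In particular, how success is judged on generated observations is not specified here, so it is passed in as a callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types: a multi-view observation maps a camera name to some
# image-like value (a float stands in for an image in this toy example),
# and an action chunk is a short list of actions.
Observation = Dict[str, float]
ActionChunk = List[float]


@dataclass
class Rollout:
    observations: List[Observation]
    action_chunks: List[ActionChunk]
    success: bool


def closed_loop_rollout(
    policy: Callable[[Observation], ActionChunk],
    world_model: Callable[[Observation, ActionChunk], Observation],
    is_success: Callable[[Observation], bool],
    initial_obs: Observation,
    max_chunks: int = 10,
) -> Rollout:
    """Alternate policy inference and world-model prediction."""
    obs = initial_obs
    observations = [obs]
    chunks: List[ActionChunk] = []
    for _ in range(max_chunks):
        # 1. The policy proposes an action chunk from the current observation.
        chunk = policy(obs)
        # 2. The world model predicts the observation after that chunk,
        #    conditioned on the current observation and the chunk.
        obs = world_model(obs, chunk)
        chunks.append(chunk)
        observations.append(obs)
        # 3. The generated observation is what the next chunk is conditioned on.
        if is_success(obs):
            return Rollout(observations, chunks, True)
    return Rollout(observations, chunks, False)


# Toy stand-ins (assumptions for illustration only).
def toy_policy(obs: Observation) -> ActionChunk:
    return [1.0, 1.0]


def toy_world_model(obs: Observation, chunk: ActionChunk) -> Observation:
    return {view: value + sum(chunk) for view, value in obs.items()}


def toy_success(obs: Observation) -> bool:
    return obs["head_cam"] >= 4.0


result = closed_loop_rollout(
    toy_policy, toy_world_model, toy_success,
    initial_obs={"head_cam": 0.0, "wrist_cam": 0.0},
)
print(result.success, len(result.action_chunks))
```

The point of the sketch is the data flow: after the initial observation, the policy only ever sees observations produced by the world model, which is what distinguishes closed-loop evaluation from open-loop prediction on a fixed action sequence.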
PiL-World takes two inputs at each step:
The model generates multi-view future observations that are:
Key architectural elements:
Evaluated on three real dual-arm manipulation tasks:
PiL-World generates imagined rollouts that closely match real robot executions, narrowing the gap between real-world and simulated evaluation of VLA policies.
The paper focuses on manipulation tasks with dual-arm robots; applicability to navigation or other embodied domains is not explored. Future work could extend to more complex task hierarchies and longer-horizon evaluations.
PiL-World addresses a critical gap in world model evaluation: most world models are evaluated open-loop, but real VLA deployment is closed-loop. This work provides a methodology for evaluating world models as VLA evaluation tools, not just as video generators. The reported 51-percentage-point reduction in success rate estimation error indicates that world models can serve as effective VLA evaluation proxies when designed for closed-loop conditioning.
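To make the "success rate estimation error" concrete, the sketch below assumes it is the absolute difference, in percentage points, between the success rate measured on real executions and the one measured on imagined rollouts. That definition, the function names, and the outcome lists are assumptions chosen for illustration; the numbers are made up and are not results from the paper.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of successful trials."""
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(outcomes) / len(outcomes)


def estimation_error_pp(
    real_outcomes: Sequence[bool],
    imagined_outcomes: Sequence[bool],
) -> float:
    """Absolute success-rate gap in percentage points (assumed definition)."""
    gap = success_rate(real_outcomes) - success_rate(imagined_outcomes)
    return abs(gap) * 100.0


# Made-up outcomes for illustration only.
real = [True, True, False, False]      # 50% success on the real robot
imagined = [True, True, True, False]   # 75% success in imagined rollouts
print(estimation_error_pp(real, imagined))
```

Under this reading, a reduction in estimation error is the difference between this quantity for a baseline evaluator and for the closed-loop world model, measured against the same real executions.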