Gamma-World presents a generative multi-agent world model for interactive video simulation that extends beyond single-agent or two-player settings. It introduces Simplex Rotary Agent Encoding, a parameter-free extension of 3D RoPE that represents agents as vertices of a regular simplex in rotary angle space, enabling permutation-symmetric agent identities without learned per-slot embeddings. Sparse Hub Attention reduces cross-agent attention from quadratic to linear complexity. A full-context diffusion teacher is distilled into a causal student with KV caching for real-time 24 FPS action-responsive generation. The model generalizes from two to four players without retraining.
- First generative multi-agent world model supporting scalable agent counts (beyond two players) with principled permutation symmetry
Architecture: Gamma-World builds on a video diffusion transformer architecture, extended for multi-agent control.
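Simplex Rotary Agent Encoding: Each agent is assigned a unique phase in a rotary embedding space, with the agents placed at the vertices of a regular simplex (3 vertices form a triangle in 2D, 4 form a tetrahedron in 3D, etc.). Because every pair of vertices is equidistant, no agent slot is privileged, which gives permutation-symmetric agent identities without learned per-slot embeddings.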
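The geometry behind this encoding can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names (`simplex_vertices`, `agent_angles`, `rotate_pairs`), the `scale` parameter, and the choice to use one angle per channel pair are all assumptions made for the example. The vertices are embedded in R^n as centered one-hot vectors for convenience; they span an (n-1)-dimensional subspace.

```python
import numpy as np

def simplex_vertices(n_agents: int) -> np.ndarray:
    """Rows are unit vectors at the vertices of a regular simplex.

    Built from centered one-hot vectors, so permuting agents is the same
    as permuting coordinates. Requires at least two agents.
    """
    if n_agents < 2:
        raise ValueError("a simplex needs at least two vertices")
    eye = np.eye(n_agents)
    centered = eye - eye.mean(axis=0, keepdims=True)
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)

def agent_angles(n_agents: int, scale: float = np.pi / 2) -> np.ndarray:
    """Hypothetical mapping from simplex vertices to rotary angles.

    Row i holds one rotation angle per channel pair for agent i.
    The scale factor is an assumption, not a value from the source.
    """
    return scale * simplex_vertices(n_agents)

def rotate_pairs(x: np.ndarray, angles: np.ndarray) -> np.ndarray:
    """Standard RoPE-style rotation of consecutive channel pairs."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out
```

Nothing here is learned: the agent identity is fully determined by the agent count. Rotating a query and a key by the same agent's angles leaves their dot product unchanged, so the encoding only affects attention between different agents.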
Sparse Hub Attention: Instead of all-to-all cross-agent attention, whose cost is quadratic in the agent count, learnable hub tokens mediate interactions. Each agent exchanges information only with the hub tokens, never directly with other agents, which reduces cross-agent attention cost from O(n^2) to O(n) in the number of agents n.
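A hub-mediated exchange can be sketched as two attention passes: the hubs first gather from all agent tokens, then every agent token reads back from the hubs. The sketch below is an assumption about the general pattern, not the paper's layer: it omits learned query/key/value projections, multiple heads, and normalization, and the names `attend` and `hub_attention` are hypothetical.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Plain scaled dot-product attention without projections."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def hub_attention(agent_tokens: np.ndarray, hub_tokens: np.ndarray):
    """Cross-agent mixing through a fixed number of hub tokens.

    agent_tokens: (n_agents, tokens_per_agent, dim)
    hub_tokens:   (n_hubs, dim)
    """
    n, t, d = agent_tokens.shape
    flat = agent_tokens.reshape(n * t, d)
    # Gather: hubs attend to every agent token (n_hubs x n*t scores).
    hubs = hub_tokens + attend(hub_tokens, flat, flat)
    # Broadcast: every agent token attends to the hubs (n*t x n_hubs scores).
    mixed = flat + attend(flat, hubs, hubs)
    return mixed.reshape(n, t, d), hubs
```

With a fixed number of hubs, both passes compute a number of attention scores proportional to `n_agents * tokens_per_agent * n_hubs`, compared with `(n_agents * tokens_per_agent) ** 2` for dense attention over all agent tokens. The gather step is a weighted sum over a set, so reordering the agents only reorders the output rows.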
Distillation for Real-Time: A full-context diffusion teacher, which processes the entire video at once, is distilled into a causal student that generates temporal blocks sequentially with KV caching, enabling action-responsive generation at 24 FPS.
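The role of the KV cache in block-wise causal generation can be illustrated with a toy rollout. This sketch covers only the caching pattern, under strong simplifying assumptions: there is no diffusion denoising or distillation loss, keys and values are the block features themselves (identity projections), and the names `KVCache`, `student_step`, `rollout`, and `block_causal_reference` are hypothetical. In the actual system each block would also be conditioned on the agents' current actions.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class KVCache:
    """Grows by one block of keys and values per generation step."""

    def __init__(self, dim: int):
        self.keys = np.zeros((0, dim))
        self.values = np.zeros((0, dim))

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        self.keys = np.concatenate([self.keys, k], axis=0)
        self.values = np.concatenate([self.values, v], axis=0)

def student_step(block: np.ndarray, cache: KVCache) -> np.ndarray:
    """Attend from the new block to all cached blocks plus itself."""
    cache.append(block, block)  # identity K/V projections (assumption)
    scores = block @ cache.keys.T / np.sqrt(block.shape[-1])
    return softmax(scores) @ cache.values

def rollout(blocks: list) -> list:
    """Process blocks one at a time, reusing cached keys and values."""
    cache = KVCache(blocks[0].shape[-1])
    return [student_step(b, cache) for b in blocks]

def block_causal_reference(blocks: list) -> np.ndarray:
    """Recompute all blocks at once with a block-causal mask."""
    x = np.concatenate(blocks, axis=0)
    block_id = np.repeat(np.arange(len(blocks)), [len(b) for b in blocks])
    mask = block_id[:, None] >= block_id[None, :]
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ x
```

The cached rollout produces the same result as recomputing the full block-causal attention, but each step only computes scores for the newest block. That is what lets a causal student respond to a new action without reprocessing earlier frames.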
Training: The model is trained in multiplayer virtual environments with ground-truth state information. Agents remain independently controllable and permutation-symmetric.
- Currently demonstrated in multiplayer virtual environments; real-world embodied agents not yet tested
Multi-agent world modeling is an important frontier beyond single-agent video prediction. The parameter-free simplex encoding is an elegant solution to the permutation symmetry problem that could inspire other world model architectures. The linear-scaling attention mechanism is critical for practical deployment. This connects to Voyager/NVIDIA world model work and generalizes single-agent world models (like Sora/VDM) to multi-agent settings.
---