title: "PRISM: PRior-guided Imagination Sampling in World Models"

arxiv_id: "2606.07974"

date: "2026-06-06"

tags: [world-model, jepa, model-based-rl, mpc, planning, continuous-control]

---

Abstract

PRISM tackles model-based continuous control with a learned latent (JEPA-style) world model and argues that the bottleneck is not the simulator's fidelity but which candidate actions the planner evaluates. It re-uses the world model's frozen encoder to predict a state-conditioned Gaussian action prior, then fuses that prior into the planner's sampling distribution through a parameter-free, precision-weighted Product-of-Gaussians update — no extra VLM, no extra visual encoder.

Key Contributions

Method Details

The architecture is a standard JEPA-style latent world model — a vision encoder that maps observations to latents, and a latent dynamics predictor that rolls out future latent states conditioned on actions. On top of this:

  1. Frozen encoder + prior MLP: The encoder weights are frozen; a small MLP is trained to read the encoder's representation of the current state and emit the parameters (μ, σ) of a state-conditioned Gaussian over candidate actions. Training uses the same dataset as the world model: the expert demonstrations serve as action labels for supervising the prior, not as demonstrations in their own right.
  2. Planner sampling distribution: A model-predictive-control (MPC) planner samples candidate action sequences from a base distribution (typically the cross-entropy method, CEM, with isotropic Gaussian proposals).
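  3. Product-of-Gaussians fusion (precision-weighted): At each planning step, the base distribution's precision (1/σ²) is added to the prior's precision to give the fused precision, and the fused mean is the precision-weighted average of the two means. The update is parameter-free and closed-form, and it degrades gracefully to the base sampler when the prior is uncertain: a large prior σ means a near-zero precision, so the prior contributes almost nothing to the fused mean or variance.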
  4. Closed-loop rollouts: The world model scores each sampled trajectory; the first action of the best-scoring sequence is executed, and the process repeats.
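
The fusion and planning steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `fuse_gaussians` and `plan_first_action` are hypothetical, `score_fn` stands in for the world model's trajectory score, and broadcasting the current-state prior across the whole planning horizon is an assumption the summary does not confirm.

```python
import numpy as np


def fuse_gaussians(mu_base, sigma_base, mu_prior, sigma_prior):
    """Product of two diagonal Gaussians: precisions add, means are precision-weighted."""
    prec_base = 1.0 / np.square(sigma_base)
    prec_prior = 1.0 / np.square(sigma_prior)
    prec_fused = prec_base + prec_prior
    mu_fused = (prec_base * mu_base + prec_prior * mu_prior) / prec_fused
    return mu_fused, 1.0 / np.sqrt(prec_fused)


def plan_first_action(score_fn, mu_prior, sigma_prior, horizon=5,
                      n_samples=256, n_elites=32, n_iters=3, seed=0):
    """CEM-style planning step that samples from the fused distribution.

    Assumptions (not from the source): the prior for the current state is
    broadcast over the whole horizon, and score_fn is a stand-in for the
    world model's trajectory score (higher is better).
    """
    rng = np.random.default_rng(seed)
    action_dim = np.shape(mu_prior)[-1]
    mu = np.zeros((horizon, action_dim))      # isotropic base proposal
    sigma = np.ones((horizon, action_dim))
    best_seq, best_score = None, -np.inf
    for _ in range(n_iters):
        # Fuse the current base proposal with the action prior, then sample.
        mu_s, sigma_s = fuse_gaussians(mu, sigma, mu_prior, sigma_prior)
        noise = rng.standard_normal((n_samples, horizon, action_dim))
        samples = mu_s + sigma_s * noise
        scores = np.array([score_fn(seq) for seq in samples])
        order = np.argsort(scores)
        if scores[order[-1]] > best_score:
            best_score, best_seq = scores[order[-1]], samples[order[-1]]
        # Refit the base proposal to the elites (standard CEM update).
        elites = samples[order[-n_elites:]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return best_seq[0]  # execute only the first action, then replan
```

With equal variances the fused mean is the midpoint of the two means: `fuse_gaussians(0.0, 1.0, 2.0, 1.0)` gives a mean of 1.0 and a standard deviation of about 0.707. Setting the prior's σ very large leaves the base distribution essentially unchanged, which is the graceful-degradation behaviour described in step 3.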

The key insight is that the world model already encodes the agent's action intuition — extracting it via a single MLP head avoids the architectural bloat of pairing the world model with a separate large VLM.
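
To make the "single MLP head" concrete, here is a hedged forward-pass sketch in NumPy. Everything in it is illustrative: the layer sizes, the `tanh` activation, the log-σ parameterisation with clipping, and the Gaussian negative log-likelihood objective are assumptions, since the summary only states that the head emits (μ, σ) and is supervised with action labels. The latent `z` is a random stand-in for the frozen encoder's output; in real training only the head's parameters would be updated.

```python
import numpy as np


def init_prior_head(latent_dim, hidden_dim, action_dim, seed=0):
    """Randomly initialise a two-layer MLP head (sizes are hypothetical)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.standard_normal((latent_dim, hidden_dim)) / np.sqrt(latent_dim),
        "b1": np.zeros(hidden_dim),
        "W2": rng.standard_normal((hidden_dim, 2 * action_dim)) / np.sqrt(hidden_dim),
        "b2": np.zeros(2 * action_dim),
    }


def prior_head(params, z):
    """Map frozen-encoder latents z to the (mu, sigma) of a diagonal Gaussian."""
    h = np.tanh(z @ params["W1"] + params["b1"])
    out = h @ params["W2"] + params["b2"]
    mu, log_sigma = np.split(out, 2, axis=-1)
    # Clip log-sigma so sigma stays positive and numerically well-behaved.
    sigma = np.exp(np.clip(log_sigma, -5.0, 2.0))
    return mu, sigma


def gaussian_nll(mu, sigma, action):
    """Mean negative log-likelihood of the labelled actions (assumed objective)."""
    per_dim = 0.5 * ((action - mu) / sigma) ** 2 + np.log(sigma) + 0.5 * np.log(2 * np.pi)
    return np.mean(np.sum(per_dim, axis=-1))


# Stand-in batch: 4 latents of size 16, 2-D actions.
rng = np.random.default_rng(1)
z = rng.standard_normal((4, 16))
actions = rng.standard_normal((4, 2))
params = init_prior_head(latent_dim=16, hidden_dim=32, action_dim=2)
mu, sigma = prior_head(params, z)
loss = gaussian_nll(mu, sigma, actions)
```

The predicted σ is what the precision-weighted fusion consumes: a head that is unsure about a state outputs a large σ, and the planner then leans on its base proposal instead.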

Key Results

Limitations and Future Work

Relevance to Patrick's Research

PRISM sits at the intersection of Patrick's interests: JEPA-style latent world models, model-based planning, and the question of how a world model is actually *used* rather than how accurately it predicts. The architectural minimalism — one MLP, no new modules — is a useful counterpoint to the "throw a VLM at it" trend in action prior learning. The +35pp / +32pp numbers also provide a clean baseline for comparing any future action-prior work.