ForesightFlow is a self-guided flow-matching policy that augments generated action chunks with a learned success-potential trajectory, enabling a single model to propose AND score candidate actions (best-of-K) without an external critic.
- Decoupled advantage-weighted flow matching: Exponentiated advantage weights apply only to action velocities (not potential coordinates), preventing value hallucination from overconfident scores.
ForesightFlow trains a conditional flow-matching model over action chunks, conditioned on vision-language inputs. Each action chunk is augmented with a learned "success-potential" scalar field. The flow model simultaneously proposes candidate action chunks and scores them with this potential, so no external critic is needed.
The key architectural insight is separating the velocity field into two heads: one for action velocities (weighted by exponentiated advantages during training) and one for potential velocities (trained with uniform weights). Training the potential head uniformly keeps gradients from failed trajectories from being suppressed during policy improvement.
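The decoupled weighting described above can be sketched as a loss function. This is a minimal NumPy illustration, not the authors' implementation: the function name `decoupled_awfm_loss`, the temperature `beta`, the clipping value `max_weight`, and the layout (potential stored as the last coordinate of each augmented chunk) are all assumptions made for the example.

```python
import numpy as np

def decoupled_awfm_loss(v_pred, v_target, advantages, action_dim,
                        beta=1.0, max_weight=20.0):
    """Hypothetical decoupled advantage-weighted flow-matching loss.

    v_pred, v_target: arrays of shape (B, H, action_dim + 1); the last
        coordinate is assumed to hold the potential velocity.
    advantages: array of shape (B,), one advantage per sample.
    """
    sq_err = (v_pred - v_target) ** 2

    # Per-sample errors for the two heads, averaged over horizon and coords.
    action_err = sq_err[..., :action_dim].mean(axis=(1, 2))      # (B,)
    potential_err = sq_err[..., action_dim:].mean(axis=(1, 2))   # (B,)

    # Exponentiated advantages; clipping is an assumed stabiliser.
    weights = np.minimum(np.exp(advantages / beta), max_weight)

    # Weights touch only the action head; the potential head is uniform.
    action_loss = float((weights * action_err).mean())
    potential_loss = float(potential_err.mean())
    return action_loss + potential_loss, action_loss, potential_loss
```

Under these assumptions, changing the advantages rescales only `action_loss`; `potential_loss` is identical for successful and failed samples, which is the property the two-head split is meant to preserve.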
Applied to: 5 BEHAVIOR-1K simulation tasks + 5 real-world bimanual manipulation tasks. Uses a VLA backbone (RT-series architecture) with flow matching heads.
- Best-of-K inference at test time is computationally expensive; the one-step estimator mitigates but doesn't eliminate this.
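For reference, the best-of-K selection step can be sketched as follows. This is a rough illustration under stated assumptions: `select_best_of_k` is a hypothetical helper, the candidates are assumed to be already-sampled augmented chunks with the potential in the last coordinate, and scoring by the potential at the final timestep is one plausible choice rather than the method confirmed by the source.

```python
import numpy as np

def select_best_of_k(candidates, action_dim):
    """Pick the candidate chunk with the highest self-predicted potential.

    candidates: array of shape (K, H, action_dim + 1), where the last
        coordinate is assumed to be the success potential.
    Returns the chosen action chunk (H, action_dim), its index, and all scores.
    """
    # Assumption: score each candidate by its potential at the last timestep.
    scores = candidates[:, -1, action_dim]
    best = int(np.argmax(scores))
    return candidates[best, :, :action_dim], best, scores
```

The selection itself is cheap; the cost grows with K because every candidate must be generated before it can be scored.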
Directly relevant to VLA world-model policy learning. The flow-matching paradigm offers an alternative to diffusion or autoregressive action generation for world-model-based control. The key idea of using the "potential" as an internal world model to guide exploration aligns with predictive architecture research.