title: "FAWAM: Force-Aware World Action Models for Closed-Loop Contact-Rich Manipulation"

arxiv_id: "2606.08555"

date: "2026-06-07"

tags: [world-model, robot-learning, contact-rich, force-aware, manipulation, closed-loop]

---

Abstract

FAWAM extends the World Action Model (WAM) paradigm to contact-rich manipulation by incorporating force signals at three levels: perception, prediction, and closed-loop execution. It encodes the history of 6-axis force/torque (F/T) signals to modulate action generation, jointly predicts future actions and end-effector wrenches to model how contact evolves, and uses the predicted wrench trajectory as an execution-time reference for online residual correction.

Key Contributions

Method Details

The model is a WAM with three force-aware extensions:

  1. Perception-level force encoding: A small encoder processes the history of 6-axis F/T signals (three force and three torque components) and produces a force embedding. This embedding modulates the action head via FiLM or cross-attention, conditioning action generation on the contact state.
  2. Prediction-level wrench forecasting: The model has two heads sharing a backbone: one predicts future actions and the other predicts future end-effector wrenches, so the model captures how contact evolves.
  3. Execution-level residual correction: At runtime, the predicted wrench trajectory serves as a reference. A small residual policy reads the predicted wrench, the real-time measured wrench, and the current state, and outputs a correction that is added to the planned action. This makes the system robust to model error and to unexpected contact events.
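
The first and third extensions can be illustrated with a minimal sketch. Everything named here is an assumption for illustration, not the paper's implementation: `film_modulate` shows the generic FiLM operation (per-channel scale and shift), and `residual_correction` substitutes a hand-written, clipped proportional rule on the wrench tracking error for the learned residual policy. The `gain` and `limit` values and the sign convention are hypothetical, and the stand-in ignores the current state that the learned policy would also read.

```python
from typing import List, Sequence

Vector = List[float]


def film_modulate(features: Sequence[float],
                  gamma: Sequence[float],
                  beta: Sequence[float]) -> Vector:
    """FiLM: scale and shift each feature channel.

    In a force-aware model, gamma and beta would be produced from the
    force embedding; here they are passed in directly (assumption).
    """
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma and beta must have equal length")
    return [g * f + b for f, g, b in zip(features, gamma, beta)]


def residual_correction(predicted_wrench: Sequence[float],
                        measured_wrench: Sequence[float],
                        gain: float = 0.001,
                        limit: float = 0.005) -> Vector:
    """Hand-written stand-in for the learned residual policy (assumption).

    Maps the wrench tracking error (predicted minus measured) to an
    action offset, clipped to +/- limit per axis.
    """
    error = [p - m for p, m in zip(predicted_wrench, measured_wrench)]
    return [max(-limit, min(limit, gain * e)) for e in error]


def corrected_action(planned_action: Sequence[float],
                     predicted_wrench: Sequence[float],
                     measured_wrench: Sequence[float]) -> Vector:
    """Add the residual correction to the planned action."""
    delta = residual_correction(predicted_wrench, measured_wrench)
    return [a + d for a, d in zip(planned_action, delta)]


if __name__ == "__main__":
    planned = [0.0] * 6                              # planned 6-D action
    predicted = [0.0, 0.0, -10.0, 0.0, 0.0, 0.0]     # reference wrench
    measured = [0.0, 0.0, -14.0, 0.0, 0.0, 0.0]      # sensed wrench
    print(corrected_action(planned, predicted, measured))
```

In a closed loop, `corrected_action` would be called at every control step with the latest sensor reading, so the planned action is nudged whenever the measured wrench drifts from the predicted reference.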

The backbone is a video + force transformer. The training objective combines three terms: a behavior-cloning loss, a wrench-prediction loss, and a wrench-trajectory consistency loss between the contact dynamics implied by the predicted actions and the predicted wrench trajectory.
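
As a rough sketch of how such a three-term objective could be assembled, the snippet below sums the terms with scalar weights. Mean squared error for every term, the weights `w_wrench` and `w_consistency`, and the `implied_wrench_fn` callable (a hypothetical map from a predicted action to the wrench it implies) are all assumptions; the summary does not specify any of them.

```python
from typing import Callable, List, Sequence

Trajectory = Sequence[Sequence[float]]


def mse(a: Trajectory, b: Trajectory) -> float:
    """Mean squared error over all timesteps and dimensions."""
    diffs = [(x - y) ** 2
             for row_a, row_b in zip(a, b)
             for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def combined_loss(pred_actions: Trajectory,
                  expert_actions: Trajectory,
                  pred_wrenches: Trajectory,
                  measured_wrenches: Trajectory,
                  implied_wrench_fn: Callable[[Sequence[float]], List[float]],
                  w_wrench: float = 1.0,
                  w_consistency: float = 0.1) -> float:
    """Weighted sum of the three terms (weights are illustrative)."""
    # Behavior cloning: predicted actions vs. demonstrated actions.
    bc = mse(pred_actions, expert_actions)
    # Wrench prediction: predicted wrenches vs. recorded wrenches.
    wrench = mse(pred_wrenches, measured_wrenches)
    # Consistency: wrench implied by each predicted action vs. predicted wrench.
    implied = [implied_wrench_fn(a) for a in pred_actions]
    consistency = mse(implied, pred_wrenches)
    return bc + w_wrench * wrench + w_consistency * consistency


if __name__ == "__main__":
    # Toy stiffness-like map from action to wrench (purely hypothetical).
    stiffness = lambda action: [2.0 * x for x in action]
    loss = combined_loss([[1.0, 0.0]], [[0.0, 0.0]],
                         [[2.0, 0.0]], [[2.0, 2.0]],
                         stiffness)
    print(loss)
```

The consistency term is the only one that couples the two heads: it penalizes action predictions whose implied contact forces disagree with the wrench head's own forecast.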

Key Results

Limitations and Future Work

Relevance to Patrick's Research

FAWAM is relevant to Patrick's interest in world models for control because it shows that predicting auxiliary physical signals (wrenches) alongside actions materially improves performance on contact-rich tasks. This generalizes a useful principle: the value of a world model is not just in rolling out future observations, but in rolling out task-relevant physical quantities. The three-level integration pattern (perception / prediction / execution) is also a clean architectural template for any future work on multi-modal world models.