Abstract
The paper challenges the assumption that more data plus a strong predictor suffices for world modeling. Across hundreds of structural causal models (SCMs), predictors and Bayesian baselines recover the diagonal (the posterior over observed worlds) but collapse to a single point on the off-diagonal, the coupling between counterfactual worlds. On 28% of models that point is a value no valid model can produce, whereas the true answer is an interval that more data never narrows. WorldKernel casts a world model as a single positive semidefinite (PSD) coupling kernel K(T,T′) over admissible worlds: the diagonal is the ordinary posterior, the off-diagonal is the cross-world coupling that fixes counterfactuals, and PSD-enforcement yields polynomial-time bounds on quantities for which the exact response-type program is intractable.
Key Contributions
- Failure-mode identification — empirical demonstration that strong predictors fail on 28% of SCMs because point predictions cannot represent uncertainty over counterfactual couplings
- WorldKernel framework — a world model = a single PSD coupling kernel K(T,T′) over admissible worlds; diagonal = posterior, off-diagonal = counterfactual coupling
- Tractable counterfactual bounds — PSD-enforcement gives polynomial-time bounds where the exact response-type program is intractable
- Logical structure sharpens the bound — ontology axioms tighten bounds by up to 1/3, propagating to couplings they never directly touch
- Targeted scars for efficient acquisition — constraints learned from encountered infeasibilities close the gap "several times faster" than untargeted ones
- Computational complexity result — full kernel reconstruction ≈ counting admissible worlds; tractable below the Sly–Sun threshold, inapproximable above
Method Details
Conceptual architecture:
- Admissible worlds — the set of all structural causal models (SCMs) consistent with the observed data
- Coupling kernel K(T,T′) — a positive semidefinite matrix indexed by pairs of admissible worlds; entries encode joint plausibility of two worlds under the same observations
- Diagonal = posterior — K(T,T) is exactly the Bayesian posterior over worlds given the data (what predictors recover)
- Off-diagonal = counterfactual coupling — K(T,T′) for T≠T′ fixes the answer to "what would have happened in T′ if the world were T," a quantity no point predictor can express
- PSD-enforcement as partial identification — positive semidefiniteness is a constraint the marginal posteriors lack; it bounds counterfactuals in polynomial time
- Ontology axioms — logical structure over admissible worlds tightens the PSD bound by propagating constraints to couplings that were not directly constrained (up to 1/3 tighter)
- Targeted scars — additional constraints learned from observed infeasibilities during data collection, prioritized to close the kernel gap faster than uniform sampling
- Reconstruction complexity — exact K is equivalent to #P-hard counting of admissible worlds; tractable below the Sly–Sun threshold, inapproximable above
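To make the PSD-enforcement bullet concrete, here is a minimal sketch of the simplest case: two admissible worlds whose diagonal entries are known. It is not the paper's procedure, only the standard 2×2 determinant condition for PSD matrices, which gives |K(T,T′)| ≤ sqrt(K(T,T)·K(T′,T′)). The diagonal alone therefore leaves an interval for the coupling rather than a point. The function names and the values 0.6 and 0.4 are hypothetical, chosen for illustration.

```python
import numpy as np

def offdiag_interval(k_tt: float, k_uu: float) -> tuple[float, float]:
    """Range of K(T,T') allowed by PSD when only the two diagonal entries are known."""
    if k_tt < 0 or k_uu < 0:
        raise ValueError("diagonal entries of a PSD matrix must be non-negative")
    bound = float(np.sqrt(k_tt * k_uu))
    return -bound, bound

def is_psd(matrix: np.ndarray, tol: float = 1e-10) -> bool:
    """True if the symmetric matrix has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(matrix).min() >= -tol)

# Hypothetical diagonal entries for two admissible worlds T and T'.
k_tt, k_uu = 0.6, 0.4
lo, hi = offdiag_interval(k_tt, k_uu)

def kernel_with(coupling: float) -> np.ndarray:
    """2x2 kernel with the fixed diagonal and a candidate off-diagonal value."""
    return np.array([[k_tt, coupling], [coupling, k_uu]])

# A coupling inside the interval keeps the kernel PSD; one outside does not.
print(f"feasible coupling interval: [{lo:.4f}, {hi:.4f}]")
print("inside is PSD: ", is_psd(kernel_with(0.9 * hi)))
print("outside is PSD:", is_psd(kernel_with(1.1 * hi)))
```

A point predictor that reports a single coupling value is choosing one element of this interval. If that value falls outside it, no PSD kernel with the given diagonal can reproduce it.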
Key Results
- On 28% of tested SCMs, point predictors and Bayesian baselines collapse to counterfactual values that no valid world model can produce
- More data does not narrow the off-diagonal interval — the gap is structural, not statistical
- PSD-enforcement gives polynomial-time counterfactual bounds where exact response-type computation is intractable
- Ontology axioms tighten the PSD bound by up to one third (about 33%) on couplings they never directly touch
- Targeted scars close the kernel gap "several times faster" than untargeted acquisition
- Full K reconstruction is tractable below Sly–Sun, inapproximable above — a clean complexity-theoretic boundary
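One way to see how a constraint can tighten a coupling it never touches is a three-world toy case. The sketch below uses a general fact about 3×3 PSD matrices as a stand-in for the paper's ontology axioms, whose actual form this summary does not specify. Once the couplings K[0,1] and K[1,2] are pinned down, the determinant condition restricts K[0,2] to an interval centred on their normalised product. All names and numbers are hypothetical.

```python
import numpy as np

def is_psd(matrix, tol=1e-10):
    """True if the symmetric matrix has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(matrix).min() >= -tol)

def third_coupling_interval(diag, k12, k23):
    """PSD-feasible interval for K[0, 2] given the diagonal, K[0, 1] and K[1, 2]."""
    d1, d2, d3 = diag
    if min(d1, d2, d3) <= 0:
        raise ValueError("this sketch assumes strictly positive diagonal entries")
    # Rescale the known couplings to correlation-like values in [-1, 1].
    r12 = k12 / np.sqrt(d1 * d2)
    r23 = k23 / np.sqrt(d2 * d3)
    if abs(r12) > 1 or abs(r23) > 1:
        raise ValueError("given couplings already violate the 2x2 PSD condition")
    # det >= 0 for the 3x3 matrix gives r13 in r12*r23 +/- sqrt((1-r12^2)(1-r23^2)).
    centre = r12 * r23
    half_width = np.sqrt((1 - r12**2) * (1 - r23**2))
    scale = np.sqrt(d1 * d3)
    return float(scale * (centre - half_width)), float(scale * (centre + half_width))

# Hypothetical setup: unit diagonal, two couplings fixed at 0.8.
diag = (1.0, 1.0, 1.0)
lo, hi = third_coupling_interval(diag, k12=0.8, k23=0.8)

def kernel_with(k13):
    """3x3 kernel with the fixed entries and a candidate value for K[0, 2]."""
    return np.array([[1.0, 0.8, k13],
                     [0.8, 1.0, 0.8],
                     [k13, 0.8, 1.0]])

print(f"K[0,2] must lie in [{lo:.2f}, {hi:.2f}]")
print("0.5 is feasible:", is_psd(kernel_with(0.5)))
print("0.2 is feasible:", is_psd(kernel_with(0.2)))
```

With a unit diagonal and no other information, K[0,2] could be anywhere in [-1, 1]. Fixing the other two couplings at 0.8 shrinks that to roughly [0.28, 1.0], even though nothing constrained K[0,2] directly.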
Limitations and Future Work
- The complexity result says K is inapproximable above Sly–Sun — the authors do not claim to beat the worst case, only to provide useful partial identification
- Empirical results are on synthetic SCM families; scaling to real-world causal graphs is open
- Targeted scars require encountering infeasibilities in deployment, which is a chicken-and-egg problem for the very domains where bounds matter most
- The framework is theoretical; turning K into a deployable world-model architecture for embodied agents is not addressed
Relevance to Patrick's Research
WorldKernel is the rare world-model paper that attacks the epistemic foundations rather than the architecture. For Patrick's tracking, it formalizes a sharp distinction between diagonal (what any predictor learns) and off-diagonal (what counterfactual reasoning needs) — a distinction that doesn't show up in JEPA, Genie, or Sora-style models but matters for any agent that asks "what would have happened if I had done X." The PSD-coupling formulation is also conceptually aligned with the kernel-method tradition Yann LeCun's JEPA lineage draws on. The 28% failure rate is a concrete number to cite when arguing that generative world models alone are insufficient for counterfactual planning.