The JEPA Bet: Predict Before Act

A world model is a learned model of how an environment responds to actions. For Masar the environment is program construction: the state is a partial Orb program, an action is a behavior dispatch, and the next state is the expanded program. A world model over these (state, action, next_state) triples could predict the outcome of a construction choice before it is made — ranking or rejecting choices in latent space instead of building, validating, and backtracking.
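As a rough illustration of the (state, action, next_state) framing, the sketch below records one construction step as a triple. Every name here (`ProgramState`, `Dispatch`, `Triple`, `expand`, `record`) is hypothetical, and a state is reduced to a set of behavior names; a real Orb program carries far more structure than this.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical, simplified stand-ins: a real partial Orb program is much
# richer than a set of behavior names.
@dataclass(frozen=True)
class ProgramState:
    behaviors: FrozenSet[str]

@dataclass(frozen=True)
class Dispatch:
    behavior: str

@dataclass(frozen=True)
class Triple:
    state: ProgramState
    action: Dispatch
    next_state: ProgramState

def expand(state: ProgramState, action: Dispatch) -> ProgramState:
    """Toy transition: dispatching a behavior adds it to the partial program."""
    return ProgramState(state.behaviors | {action.behavior})

def record(state: ProgramState, action: Dispatch) -> Triple:
    """Capture one construction step as a (state, action, next_state) triple."""
    return Triple(state, action, expand(state, action))

step = record(ProgramState(frozenset({"list"})), Dispatch("filter"))
```

A world model trained on such triples would try to predict `step.next_state` (or a latent of it) from `step.state` and `step.action` without running `expand`.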

This is the predict-before-act idea. It is a deliberate research bet, currently parked behind a clear gate. This page reports what was built and what was measured; it is not a product claim.

The idea

Formalize construction as a trajectory of (state_t, action_t, state_{t+1}) triples drawn from real build traces. Train a Joint-Embedding Predictive Architecture (JEPA): an online encoder maps the current state to a latent, a slow (EMA) target encoder maps the next state, and a predictor learns to map (latent_state, action) → predicted next latent. The self-supervised objective follows the VICReg → LeJEPA/SIGReg line (Bardes, Ponce & LeCun, 2022, arXiv:2105.04906; Balestriero & LeCun, 2025, arXiv:2511.08544).
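The three components can be sketched in NumPy as follows. This is a minimal forward-pass illustration under assumed shapes, not the actual training code: the dimensions, single-layer encoders, and function names are all hypothetical, gradients are not computed, and the VICReg/SIGReg regularizer that a real objective needs to prevent collapse is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, LATENT_DIM = 16, 4, 8  # assumed sizes, for illustration only

def init_linear(n_in, n_out):
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))

online_enc = init_linear(STATE_DIM, LATENT_DIM)   # trained by gradient descent
target_enc = online_enc.copy()                    # slow EMA copy, never trained directly
predictor = init_linear(LATENT_DIM + ACTION_DIM, LATENT_DIM)

def encode(weights, x):
    return np.tanh(x @ weights)

def predict_next_latent(z, action):
    # (latent_state, action) -> predicted next latent
    return np.concatenate([z, action], axis=-1) @ predictor

def jepa_loss(state, action, next_state):
    z = encode(online_enc, state)
    z_next = encode(target_enc, next_state)  # treated as a fixed target in real training
    z_pred = predict_next_latent(z, action)
    # Prediction term only; the anti-collapse regularizer is omitted here.
    return float(np.mean((z_pred - z_next) ** 2))

def ema_update(target, online, decay=0.99):
    """Move the target encoder a small step toward the online encoder."""
    return decay * target + (1.0 - decay) * online

# One illustrative step on random data.
state = rng.normal(size=(32, STATE_DIM))
action = rng.normal(size=(32, ACTION_DIM))
next_state = rng.normal(size=(32, STATE_DIM))
loss = jepa_loss(state, action, next_state)
target_enc = ema_update(target_enc, online_enc)
```

The point of the slow target encoder is that the prediction target changes more gradually than the online encoder does, which is one of the standard ways such architectures try to avoid trivial solutions.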

If it works, three advisory uses follow: score whether a plan's steps stay coherent, suggest the orbital a near-complete app is missing, and flag knob/wiring combinations that drift off the valid manifold — all before spending a compile.

The honest result

We built it and ran the proof of concept. It did not validate. Across seeds, discrimination sat at chance (AUC ≈ 0.49–0.51). A single lucky run looked good, but an action-shuffle test showed that it was not using the action at all: the signal had leaked from the target encoder.
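One way to phrase an action-shuffle check is sketched below: score the real triples, score the same triples with actions permuted across rows, and compute the AUC between the two score sets. A scorer that ignores the action cannot separate them, so its AUC sits at 0.5. The function names, the toy additive dynamics, and both toy scorers are assumptions made for illustration; this is not the evaluation harness used in the experiment.

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def action_shuffle_auc(score_fn, states, actions, next_states, rng):
    """AUC of real triples against the same triples with actions permuted across rows."""
    real = score_fn(states, actions, next_states)
    shuffled = score_fn(states, actions[rng.permutation(len(actions))], next_states)
    return auc(real, shuffled)

# Toy scorers (hypothetical): higher score means "more plausible transition".
def action_blind_score(states, actions, next_states):
    return -np.linalg.norm(next_states - states, axis=1)

def action_aware_score(states, actions, next_states):
    return -np.linalg.norm(next_states - (states + actions), axis=1)

rng = np.random.default_rng(0)
states = rng.normal(size=(200, 4))
actions = rng.normal(size=(200, 4))
next_states = states + actions  # toy dynamics, assumed purely for the demo

blind = action_shuffle_auc(action_blind_score, states, actions, next_states, rng)
aware = action_shuffle_auc(action_aware_score, states, actions, next_states, rng)
```

Here `blind` comes out at chance because shuffling the actions leaves its scores unchanged, while `aware` lands well above chance. A model whose apparent skill survives the shuffle is getting its signal from somewhere other than the action.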

We then did a literature-grounded redesign: paper-correct sliced SIGReg, a FiLM action-conditioned predictor, an inverse-dynamics auxiliary head, an action-shuffle contrastive term, and input-norm rebalancing. That made the architecture demonstrably correct: seed-to-seed variance collapsed to near zero and every diagnostic moved as theory predicts. On synthetic data, however, the learning curve still sits below the self-supervised break-even floor. JEPA-style objectives are known to need many thousands of examples, and we have fewer than ten thousand synthetic ones. The model is not broken; it is data-starved.
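Of the redesign components, FiLM conditioning is the easiest to show in a few lines: the action produces a per-feature scale and shift that modulate the state latent, so the predictor cannot simply route around the action input. The sketch below is a generic FiLM layer in NumPy with hypothetical names, sizes, and random weights; it is not the predictor from the redesign and it omits training entirely.

```python
import numpy as np

class FiLMPredictor:
    """Feature-wise linear modulation: out = f(gamma(action) * z + beta(action))."""

    def __init__(self, latent_dim, action_dim, rng):
        scale = 1.0 / np.sqrt(action_dim)
        self.w_gamma = rng.normal(0.0, scale, size=(action_dim, latent_dim))
        self.w_beta = rng.normal(0.0, scale, size=(action_dim, latent_dim))
        self.w_out = rng.normal(0.0, 1.0 / np.sqrt(latent_dim),
                                size=(latent_dim, latent_dim))

    def __call__(self, z, action):
        gamma = 1.0 + action @ self.w_gamma  # centered at 1: a zero action leaves z unscaled
        beta = action @ self.w_beta          # per-feature shift driven by the action
        return np.tanh(gamma * z + beta) @ self.w_out

rng = np.random.default_rng(0)
predictor = FiLMPredictor(latent_dim=8, action_dim=4, rng=rng)
z = rng.normal(size=(3, 8))
pred_a = predictor(z, rng.normal(size=(3, 4)))
pred_b = predictor(z, rng.normal(size=(3, 4)))
```

With the same latent `z` and two different actions, `pred_a` and `pred_b` differ, which is the property the action-shuffle test looks for.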

Why it is parked — and the gate

The right tool depends on the regime, and at our current scale supervised models already win:

  • A linear co-occurrence model recovers base-build composition.
  • An intent-conditioned MLP handles intent×set interactions.
  • A graph network reads wiring when the behavior set is fixed.

JEPA only earns its place when labels become the bottleneck — when unlabeled traces vastly outnumber labeled ones and the supervised curve has plateaued. The concrete gate:

  1. Ship the supervised picker to production.
  2. Log real (state, action, next_state) traces from live builds.
  3. Once there are ≥ 10k real triples, re-run the gated sweep on the real distribution (which carries more signal than uniform synthetic data).
  4. If the pretrained representation beats supervised-from-scratch, wire the advisory scorers into the planner.
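The four steps above can be read as a small decision procedure. The sketch below encodes them that way; `GateStatus`, its field names, and the returned strings are hypothetical, the only number taken from the text is the 10k threshold, and "beats" is modeled as a plain comparison of two scores on whatever metric the sweep reports.

```python
from dataclasses import dataclass
from typing import Optional

MIN_REAL_TRIPLES = 10_000  # threshold from step 3

@dataclass
class GateStatus:
    supervised_shipped: bool
    real_triples: int
    pretrained_score: Optional[float] = None  # sweep result; higher is better (assumed)
    scratch_score: Optional[float] = None     # supervised-from-scratch baseline

def next_gate_step(status: GateStatus) -> str:
    """Return the next action implied by the gate, checked in order."""
    if not status.supervised_shipped:
        return "ship the supervised picker"
    if status.real_triples < MIN_REAL_TRIPLES:
        return "keep logging real traces"
    if status.pretrained_score is None or status.scratch_score is None:
        return "re-run the gated sweep"
    if status.pretrained_score > status.scratch_score:
        return "wire advisory scorers into the planner"
    return "stay parked"

current = next_gate_step(GateStatus(supervised_shipped=True, real_triples=2_500))
```

In this example `current` is "keep logging real traces", since the picker has shipped but the trace count is below the threshold.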

So JEPA is a gated research investment with the instrumentation already in place, not a failed experiment and not a shipped feature. The supervised stack delivers value today; JEPA stacks on top if and when the data regime arrives.

Next steps