The JEPA Bet: Predict Before Act
A world model is a learned model of how an environment responds to actions. For Masar, the environment is program construction: the state is a partial Orb program, an action is a behavior dispatch, and the next state is the expanded program. A world model over these (state, action, next_state) triples could predict the outcome of a construction choice before it is made, ranking or rejecting choices in latent space instead of building, validating, and backtracking.
This is the predict-before-act idea. It is a deliberate, honest research bet, currently parked behind a clear gate, and this page tells the real story rather than making a product claim.
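The (state, action, next_state) framing can be made concrete with a small data structure. This is a minimal sketch under stated assumptions: a partial Orb program is reduced to the tuple of behaviors dispatched so far, and `Transition`, `expand`, `trace_to_triples`, and the behavior names are hypothetical, not Masar's actual types.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical stand-in for a partial Orb program: the behaviors dispatched so far.
State = Tuple[str, ...]


@dataclass(frozen=True)
class Transition:
    state: State       # partial program before the dispatch
    action: str        # behavior dispatched (hypothetical string id)
    next_state: State  # expanded program after the dispatch


def expand(state: State, action: str) -> State:
    """Toy transition (assumption): dispatching a behavior appends it to the program."""
    return state + (action,)


def trace_to_triples(actions: List[str]) -> List[Transition]:
    """Replay a build trace from the empty program into (state, action, next_state) triples."""
    triples: List[Transition] = []
    state: State = ()
    for action in actions:
        next_state = expand(state, action)
        triples.append(Transition(state, action, next_state))
        state = next_state
    return triples


triples = trace_to_triples(["auth", "list", "detail"])  # hypothetical behavior names
for t in triples:
    print(t.state, "--", t.action, "->", t.next_state)
```

Consecutive triples chain: each `next_state` is the following triple's `state`, which is the trajectory structure a world model is trained on.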
The idea
Formalize construction as a trajectory of (state_t, action_t, state_{t+1}) triples drawn from real build traces. Train a Joint-Embedding Predictive Architecture (JEPA) on them: an online encoder maps the current state to a latent, a slow target encoder (an exponential moving average, EMA, of the online encoder's weights) maps the next state to a target latent, and a predictor maps (latent_state, action) → predicted next latent, trained to match that target latent. The self-supervised objective follows the VICReg → LeJEPA/SIGReg line (Bardes, Ponce & LeCun, 2022, arXiv:2105.04906; Balestriero & LeCun, 2025, arXiv:2511.08544).
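The training signal can be sketched in plain Python. This is a minimal sketch, assuming latents are lists of floats and a batch is a list of latents; it shows the EMA target update, the prediction loss, and VICReg's variance and covariance regularizers applied to the online latents. It illustrates the VICReg starting point of the cited line, not the sliced SIGReg used later, and the function names and loss weights are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]
Batch = List[Vector]


def ema_update(target: Vector, online: Vector, tau: float = 0.99) -> Vector:
    """Slow target weights: theta_target <- tau * theta_target + (1 - tau) * theta_online."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


def prediction_loss(pred: Batch, target: Batch) -> float:
    """Mean squared error between predicted next latents and target-encoder latents."""
    n, d = len(pred), len(pred[0])
    return sum((p - t) ** 2 for pv, tv in zip(pred, target) for p, t in zip(pv, tv)) / (n * d)


def variance_term(z: Batch, gamma: float = 1.0, eps: float = 1e-4) -> float:
    """VICReg variance hinge: penalize dimensions whose batch std falls below gamma."""
    n, d = len(z), len(z[0])
    total = 0.0
    for j in range(d):
        col = [row[j] for row in z]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)  # needs n >= 2
        total += max(0.0, gamma - math.sqrt(var + eps))
    return total / d


def covariance_term(z: Batch) -> float:
    """VICReg covariance penalty: squared off-diagonal covariances, summed and divided by d."""
    n, d = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(d)]
    total = 0.0
    for a in range(d):
        for b in range(d):
            if a != b:
                cov = sum((row[a] - means[a]) * (row[b] - means[b]) for row in z) / (n - 1)
                total += cov ** 2
    return total / d


def jepa_loss(pred: Batch, target: Batch, online: Batch,
              lam: float = 25.0, mu: float = 25.0, nu: float = 1.0) -> float:
    """Prediction term plus anti-collapse regularizers on the online latents (illustrative weights)."""
    return (lam * prediction_loss(pred, target)
            + mu * variance_term(online)
            + nu * covariance_term(online))
```

The variance and covariance terms exist because the prediction loss alone can be minimized by collapsing every latent to a constant; the regularizers make that collapse expensive.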
If it works, three advisory uses follow: score whether a plan's steps stay coherent, suggest the orbital a near-complete app is missing, and flag knob/wiring combinations that drift off the valid manifold — all before spending a compile.
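The first advisory use, plan coherence, can be illustrated as latent "surprise": how far the predictor's guess for each step lands from the encoded state the plan actually expects. This is a hypothetical sketch; `plan_coherence`, the squared-distance score, the threshold, and the toy encoder and predictor are assumptions for illustration, not the trained models.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two latents."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def plan_coherence(
    states: Sequence[object],
    actions: Sequence[object],
    encode: Callable[[object], Vector],
    predict: Callable[[Vector, object], Vector],
    threshold: float,
) -> Tuple[float, List[int]]:
    """Mean latent surprise along a plan, plus the indices of steps whose surprise exceeds threshold.

    states[t + 1] is the state the plan expects after actions[t], so len(states) == len(actions) + 1.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("need exactly one more state than actions")
    if not actions:
        return 0.0, []
    surprises = [
        sq_dist(predict(encode(states[t]), a), encode(states[t + 1]))
        for t, a in enumerate(actions)
    ]
    flagged = [t for t, s in enumerate(surprises) if s > threshold]
    return sum(surprises) / len(surprises), flagged


# Toy stand-ins (hypothetical): the latent is program size, and the predictor expects +1 per dispatch.
def toy_encode(state: object) -> Vector:
    return [float(len(state))]


def toy_predict(z: Vector, action: object) -> Vector:
    return [z[0] + 1.0]


score, flagged = plan_coherence(
    [(), ("a",), ("a", "b", "c")], ["a", "b"], toy_encode, toy_predict, threshold=0.5
)
print(score, flagged)
```

Here the second step adds two behaviors where the toy predictor expects one, so it is the step that gets flagged, without anything being compiled.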
The honest result
We built it and ran the proof of concept. It did not validate. Across seeds, discrimination sat at chance (AUC ≈ 0.49–0.51), and the one run that looked good failed an action-shuffle test: it was not using the action at all, because the signal had leaked from the target encoder.
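Both diagnostics are easy to state in code. AUC can be computed as the Mann-Whitney probability that a positive example outscores a negative one (0.5 is chance), and the action-shuffle test re-evaluates the model with actions permuted across examples. This is a sketch: `evaluate` is a hypothetical callable that runs the model on a fixed example set with the given actions and returns a metric such as AUC.

```python
import random
from typing import Callable, List, Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as P(positive outscores negative), counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def action_shuffle_gap(evaluate: Callable[[List[object]], float],
                       actions: List[object], seed: int = 0) -> float:
    """Metric with true actions minus metric with actions permuted across examples.

    A gap near zero means the score does not depend on which action was taken,
    i.e. any apparent signal is coming from somewhere other than the action.
    """
    shuffled = list(actions)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    return evaluate(actions) - evaluate(shuffled)
```

A model that ignores its action input produces a gap of zero however good its headline metric looks, which is exactly the failure the lucky run exhibited.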
We then did a literature-grounded redesign: paper-correct sliced SIGReg, a FiLM action-conditioned predictor, an inverse-dynamics auxiliary head, an action-shuffle contrastive term, and input-norm rebalancing. That made the architecture behave as designed: seed-to-seed variance collapsed to near zero, and every diagnostic moved as theory predicts. On synthetic data, however, the learning curve still sits below the self-supervised break-even floor. JEPA-style objectives are known to need large datasets, and we have fewer than ten thousand synthetic examples. The model is not broken; it is data-starved.
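Two of the redesign pieces can be sketched in a few lines. FiLM (feature-wise linear modulation) conditions the predictor by scaling and shifting each hidden feature with parameters generated from the action, so the action acts multiplicatively rather than as one more concatenated input; an inverse-dynamics head predicts the action from consecutive latents. This is a plain-Python sketch with hypothetical weight matrices and shapes, not the trained model; FiLM variants often parameterize the scale as (1 + gamma), which is omitted here.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def film(h: Vector, action_emb: Vector, w_gamma: Matrix, w_beta: Matrix) -> Vector:
    """FiLM: per-feature scale (gamma) and shift (beta), both generated from the action embedding."""
    gamma = matvec(w_gamma, action_emb)
    beta = matvec(w_beta, action_emb)
    return [g * x + b for g, x, b in zip(gamma, h, beta)]


def film_predictor(z: Vector, action_emb: Vector, w_in: Matrix,
                   w_gamma: Matrix, w_beta: Matrix, w_out: Matrix) -> Vector:
    """Predict the next latent: project z, modulate it with the action, then read out."""
    h = matvec(w_in, z)
    h = relu(film(h, action_emb, w_gamma, w_beta))
    return matvec(w_out, h)


def inverse_dynamics_logits(z_t: Vector, z_next: Vector, w_inv: Matrix) -> Vector:
    """Auxiliary head: score each candidate action from the concatenated (z_t, z_next) pair."""
    return matvec(w_inv, z_t + z_next)  # list concatenation, not elementwise addition
```

The inverse-dynamics head only succeeds if the latents encode what the action changed, which gives the encoder a direct reason to keep action-relevant information.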
Why it is parked — and the gate
The right tool depends on the regime, and at our current scale supervised models already win:
- A linear co-occurrence model recovers base-build composition.
- An intent-conditioned MLP handles intent×set interactions.
- A graph network reads wiring when the behavior set is fixed.
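To show why the supervised baselines are cheap to beat with, the first of them can be illustrated with a count-based scorer. This is a hypothetical sketch of a linear co-occurrence model, not the production picker: it counts how often behaviors appear together in base builds and ranks missing behaviors by an additive sum of P(candidate | present behavior). The function names and behavior names are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def fit_cooccurrence(builds: Iterable[Set[str]]) -> Tuple[Counter, Counter]:
    """Count each behavior's occurrences and each unordered pair's co-occurrences."""
    single: Counter = Counter()
    pair: Counter = Counter()
    for build in builds:
        single.update(build)
        pair.update(frozenset(p) for p in combinations(sorted(build), 2))
    return single, pair


def suggest(partial: Set[str], single: Counter, pair: Counter, k: int = 3) -> List[str]:
    """Rank missing behaviors by a linear score: the sum of P(candidate | p) over present p."""
    scores = {}
    for cand in single:
        if cand in partial:
            continue
        scores[cand] = sum(pair[frozenset((cand, p))] / single[p]
                           for p in partial if single[p])
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


builds = [{"auth", "list", "detail"}, {"auth", "list"}, {"auth", "search"}]  # hypothetical
single, pair = fit_cooccurrence(builds)
print(suggest({"list"}, single, pair, k=2))
```

With a few thousand labeled builds, counts like these are already well estimated, which is why a model this simple can win at the current scale.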
JEPA only earns its place when labels become the bottleneck — when unlabeled traces vastly outnumber labeled ones and the supervised curve has plateaued. The concrete gate:
- Ship the supervised picker to production.
- Log real (state, action, next_state) traces from live builds.
- Once there are ≥ 10k real triples, re-run the gated sweep on the real distribution (which carries more signal than uniform synthetic data).
- If the pretrained representation beats supervised-from-scratch, wire the advisory scorers into the planner.
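The gate is simple enough to state as a decision function. This is a hypothetical sketch: `GateStatus`, `next_step`, and the assumption that the sweep reports one held-out metric per model (higher is better) are illustrative, not part of the existing instrumentation.

```python
from dataclasses import dataclass
from typing import Optional

MIN_REAL_TRIPLES = 10_000  # the ">= 10k real triples" threshold from the gate


@dataclass
class GateStatus:
    supervised_picker_shipped: bool
    real_triples_logged: int
    pretrained_score: Optional[float] = None  # held-out metric, JEPA-pretrained (assumed higher is better)
    scratch_score: Optional[float] = None     # same metric, supervised-from-scratch


def next_step(s: GateStatus) -> str:
    """Walk the gate in order and return the first step that is not yet satisfied."""
    if not s.supervised_picker_shipped:
        return "ship supervised picker"
    if s.real_triples_logged < MIN_REAL_TRIPLES:
        return "keep logging traces"
    if s.pretrained_score is None or s.scratch_score is None:
        return "run gated sweep"
    if s.pretrained_score > s.scratch_score:
        return "wire advisory scorers"
    return "stay parked"
```

Ordering the checks this way means JEPA work is never scheduled ahead of the supervised picker or before the real data exists.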
So JEPA is a gated research investment with the instrumentation already in place, not a failed experiment and not a shipped feature. The supervised stack delivers value today; JEPA stacks on top if and when the data regime arrives.