System 2 for AI Agents
Daniel Kahneman's framing distinguishes two modes of thought: System 1 (fast, intuitive, automatic) and System 2 (slow, deliberate, analytical). An LLM is System 1. Masar is the System 2 layer beside it — though, importantly, Masar's "deliberation" is not another, bigger model: it is a symbolic compiler and a pair of verifiers.
System 1: the LLM
An LLM generates its answer token by token, one forward pass per token, with no separate deliberation step. That is powerful for tasks near its training distribution: interpreting intent, handling ambiguity, producing plausible structure. But autoregressive generation has structural limits:
- No guaranteed correctness. On its own, the model cannot tell whether its output compiles or runs, and its expressed confidence is not a reliable indicator of validity.
- No bounded action space. Asked for code, it can emit anything — including undeclared or unsafe constructs.
- No execution feedback on its own. Unless something external runs the output and reports back, the model does not know what its output did.
These are properties of the paradigm, not bugs in any one model.
System 2: the symbolic layer
Masar closes those gaps without a second large model:
| System 1 limitation | System 2 answer |
|---|---|
| Output might not be valid | The compiler must finish with zero errors and zero warnings; two verifiers gate every build |
| Unbounded action space | The model may only invoke declared, pre-verified behaviors |
| No execution feedback | Training and verification signals come from real dispatch + validation |
When an LLM says "here is your schema," nothing has been checked. When Masar accepts a build, it has been compiled and executed through both verifiers. The guarantee is symbolic and earned, not a probability the model asserts about itself.
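To make the bounded action space concrete, here is a minimal sketch of a check that only admits declared behaviors with correctly typed parameters. It is an illustration, not Masar's actual API: `DECLARED_BEHAVIORS`, `Proposal`, `check_proposal`, and the two example behaviors are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass

# Hypothetical registry (an assumption, not Masar's real one):
# behavior name -> required parameter names and their types.
DECLARED_BEHAVIORS = {
    "create_table": {"name": str, "columns": list},
    "add_index": {"table": str, "column": str},
}


@dataclass(frozen=True)
class Proposal:
    """What the model emits: a behavior name plus its parameters."""
    behavior: str
    params: dict


def check_proposal(proposal: Proposal) -> list[str]:
    """Return a list of errors; an empty list means the proposal is admissible."""
    spec = DECLARED_BEHAVIORS.get(proposal.behavior)
    if spec is None:
        # Anything outside the registry is rejected outright.
        return [f"undeclared behavior: {proposal.behavior}"]

    errors = []
    for key, expected in spec.items():
        if key not in proposal.params:
            errors.append(f"missing parameter: {key}")
        elif not isinstance(proposal.params[key], expected):
            errors.append(f"parameter {key} must be {expected.__name__}")
    for key in proposal.params:
        if key not in spec:
            errors.append(f"unexpected parameter: {key}")
    return errors
```

In this sketch a proposal such as `Proposal("drop_database", {})` never reaches a compiler, because the name is not in the registry. The model's freedom is reduced to choosing among declared entries and filling in typed values.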
How they work together
The LLM proposes; the symbolic layer disposes. A typical flow: the model reads the intent and selects a behavior and its typed parameters; the deterministic compiler composes and validates the program; the verifiers execute it; if something fails, the error feeds a bounded repair turn rather than a free-form guess.
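The flow above can be sketched as a small loop: propose, compile, verify, and on failure feed the error into a limited number of repair turns. This is a sketch under stated assumptions rather than Masar's implementation. Every name here (`run_turn`, `MAX_REPAIR_TURNS`, the stub functions) is hypothetical, the repair limit of 2 is an assumed value that the text does not specify, and failures are modeled as `ValueError` purely for illustration.

```python
from typing import Callable, Optional

MAX_REPAIR_TURNS = 2  # assumed bound; the source does not state a number


def run_turn(
    intent: str,
    propose: Callable[[str, Optional[str]], dict],
    compile_program: Callable[[dict], list],
    verifiers: list,
) -> dict:
    """Propose, compile, verify; on failure, retry with the error as context."""
    error = None
    for attempt in range(1 + MAX_REPAIR_TURNS):
        # The model sees the intent and, on repair turns, the previous error.
        proposal = propose(intent, error)
        try:
            program = compile_program(proposal)  # deterministic step
            for verify in verifiers:             # every verifier must pass
                verify(program)
        except ValueError as exc:
            error = str(exc)
            continue
        return {"accepted": True, "program": program, "attempts": attempt + 1}
    return {"accepted": False, "error": error, "attempts": 1 + MAX_REPAIR_TURNS}


# Stubs standing in for the model, the compiler, and the two verifiers.
def stub_propose(intent, last_error):
    if last_error is None:
        return {"behavior": "create_table", "params": {}}  # first try omits "name"
    return {"behavior": "create_table", "params": {"name": "users"}}


def stub_compile(proposal):
    if "name" not in proposal["params"]:
        raise ValueError("missing parameter: name")
    return [("create_table", proposal["params"]["name"])]


def verify_nonempty(program):
    if not program:
        raise ValueError("empty program")


def verify_known_ops(program):
    for op, _ in program:
        if op != "create_table":
            raise ValueError(f"unknown op: {op}")


result = run_turn(
    "make a users table",
    stub_propose,
    stub_compile,
    [verify_nonempty, verify_known_ops],
)
```

With these stubs the first proposal fails compilation, the error message is passed back, and the second proposal is accepted. A proposer that never corrects itself exhausts the repair budget and the turn ends as rejected instead of looping indefinitely.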
The model never has to invent structure, and the symbolic layer never has to interpret ambiguous language. Each does what it is good at — which is also why the model can stay small and run locally.