System 2 Thinking for AI Agents
Daniel Kahneman's research distinguishes two modes of thought: System 1 (fast, intuitive, automatic) and System 2 (slow, deliberate, analytical). LLMs are System 1. Masar is System 2. Together, they form a complete agent.
System 1: The LLM
An LLM responds instantly. Give it a prompt, and it generates an answer in one forward pass. This is powerful for tasks that match its training distribution: writing prose, translating languages, answering factual questions, generating code snippets.
But System 1 has known failure modes:
- No planning: It emits tokens strictly left to right, committing to each one before seeing what follows. It cannot reason about dependencies or ordering before it starts generating.
- No verification: It has no built-in way to check whether its output is correct, and its stated confidence tracks actual accuracy poorly.
- No memory across sessions: Each conversation starts from zero. Last week's successful approach is forgotten.
- Overconfidence on edge cases: Novel combinations of familiar elements get treated the same as common cases.
These aren't bugs in any particular model. They're structural properties of autoregressive generation.
System 2: Masar
Masar addresses each of these gaps:
| System 1 Gap | System 2 Solution |
|---|---|
| No planning | Dependency-ordered instruction sequences |
| No verification | Validity prediction with error categorization |
| No memory | Episode storage, recall, and pattern extraction |
| Overconfidence | Calibrated probability scores |
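The "no planning" gap, for example, comes down to ordering: steps are emitted only after everything they depend on has been produced. That is a topological sort over a dependency graph. A minimal sketch using Python's standard `graphlib`; the step names are illustrative stand-ins, not Masar's actual plan format:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph for a schema build:
# each step maps to the set of steps it depends on.
steps = {
    "tickets_table": set(),
    "users_table": set(),
    "assignments": {"tickets_table", "users_table"},
    "sla_rules": {"tickets_table"},
    "reports": {"assignments", "sla_rules"},
}

# static_order() yields steps so that every step
# appears after all of its dependencies.
order = list(TopologicalSorter(steps).static_order())
```

Handing the LLM one step at a time in this order means it never has to generate content whose prerequisites don't exist yet.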
When an LLM says "here's your schema," it has no idea whether it's valid. When Masar says "this has a 94% chance of being valid with no predicted errors," that probability is calibrated against thousands of evaluated schemas.
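Calibration here means predicted probabilities match observed frequencies: of all the schemas scored at 94%, roughly 94% should actually turn out valid. A standard way to measure the gap is expected calibration error; the sketch below is a generic binned implementation, not Masar's internal method:

```python
def calibration_error(predictions, outcomes, bins=10):
    """Weighted mean gap between predicted probability and
    observed success frequency, per probability bin (a simple ECE)."""
    buckets = [[] for _ in range(bins)]
    for p, y in zip(predictions, outcomes):
        idx = min(int(p * bins), bins - 1)  # clamp p == 1.0 into last bin
        buckets[idx].append((p, y))
    total, weighted_gap = len(predictions), 0.0
    for bucket in buckets:
        if not bucket:
            continue
        avg_p = sum(p for p, _ in bucket) / len(bucket)  # mean prediction
        freq = sum(y for _, y in bucket) / len(bucket)   # observed rate
        weighted_gap += abs(avg_p - freq) * len(bucket) / total
    return weighted_gap
```

A score of 0 means predictions and outcomes line up bin by bin; a system that says "90%" but succeeds only half the time scores around 0.4.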
How They Work Together
The LLM and Masar have complementary strengths:
LLM strengths: Understanding natural language intent, generating syntactically correct code, handling ambiguity, proposing creative solutions.
Masar strengths: Structural planning, fast validation, experience-based learning, calibrated confidence.
A typical interaction:
- User says: "Build me a helpdesk system"
- LLM interprets the intent and identifies the domain
- Masar recalls past helpdesk builds and generates a 14-step plan
- LLM executes each step, generating the actual schema content
- Masar verifies each intermediate result
- If something fails, Masar suggests repairs; LLM applies them
- Masar stores the completed episode
The LLM never has to guess what to build next. Masar never has to generate natural language or interpret ambiguous requests. Each system does what it's good at.
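The interaction above is a loop: plan, generate, verify, repair, remember. A sketch of that loop in Python, where `StubLLM`, `StubMasar`, and every method name are assumptions for illustration, not the real interfaces:

```python
from dataclasses import dataclass, field

@dataclass
class Check:
    valid: bool
    errors: list = field(default_factory=list)

class StubLLM:
    def interpret(self, request):      # steps 1-2: parse intent
        return {"domain": "helpdesk", "request": request}
    def generate(self, step):          # step 4: produce the artifact
        return f"schema for {step}"
    def apply(self, repair):           # step 6: apply a suggested fix
        return repair

class StubMasar:
    def __init__(self):
        self.episodes = []
    def plan(self, intent):            # step 3: recall + plan
        return ["tickets", "users", "assignments"]
    def verify(self, artifact):        # step 5: check validity
        return Check(valid="schema" in artifact)
    def suggest_repair(self, check):
        return "schema (repaired)"
    def store_episode(self, *episode): # step 7: remember the run
        self.episodes.append(episode)

def build(request, llm, masar):
    intent = llm.interpret(request)
    results = []
    for step in masar.plan(intent):
        artifact = llm.generate(step)
        check = masar.verify(artifact)
        while not check.valid:         # repair loop until the step passes
            artifact = llm.apply(masar.suggest_repair(check))
            check = masar.verify(artifact)
        results.append(artifact)
    masar.store_episode(intent, results)
    return results
```

The division of labor is visible in the signatures: the LLM methods consume and produce language-shaped content, while the Masar methods consume and produce structure (plans, checks, episodes).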
The Result
Agents with both systems consistently outperform agents with only an LLM:
- Higher first-attempt success rates
- Fewer wasted LLM calls (no generating in the wrong direction)
- Improving performance over time (memory accumulation)
- Reliable confidence estimates (know when to ask for help)
Next Steps
- How Masar Works - Technical details on the four capabilities
- Agent Integration - Build an agent that uses both systems