System 2 Thinking for AI Agents

Daniel Kahneman's research distinguishes two modes of thought: System 1 (fast, intuitive, automatic) and System 2 (slow, deliberate, analytical). LLMs are System 1. Masar is System 2. Together, they form a complete agent.

System 1: The LLM

An LLM responds instantly. Give it a prompt, and it generates an answer in one forward pass. This is powerful for tasks that match its training distribution: writing prose, translating languages, answering factual questions, generating code snippets.

But System 1 has known failure modes:

  • No planning: It generates tokens left to right; it cannot reason about dependencies or ordering before it starts.
  • No verification: It has no way to check whether its output is correct. Confidence and accuracy are uncorrelated.
  • No memory across sessions: Each conversation starts from zero. Last week's successful approach is forgotten.
  • Overconfidence on edge cases: Novel combinations of familiar elements get treated the same as common cases.

These aren't bugs in any particular model. They're structural properties of autoregressive generation.

System 2: Masar

Masar addresses each of these gaps:

  • No planning → dependency-ordered instruction sequences
  • No verification → validity prediction with error categorization
  • No memory → episode storage, recall, and pattern extraction
  • Overconfidence → calibrated probability scores

When an LLM says "here's your schema," it has no idea whether it's valid. When Masar says "this has a 94% chance of being valid with no predicted errors," that probability is calibrated against thousands of evaluated schemas.
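
Calibration is a checkable property: bucket past predictions by predicted probability and compare each bucket against its observed validity rate. A minimal sketch of that check, with illustrative data (this is not Masar's actual scoring API):

```python
# Sketch: checking that predicted validity probabilities are calibrated.
# The history data below is illustrative, not real evaluation results.
from collections import defaultdict

def calibration_report(predictions, bins=10):
    """predictions: list of (predicted_probability, was_actually_valid)."""
    buckets = defaultdict(list)
    for p, valid in predictions:
        buckets[min(int(p * bins), bins - 1)].append(valid)
    report = {}
    for b, outcomes in sorted(buckets.items()):
        midpoint = (b + 0.5) / bins
        observed = sum(outcomes) / len(outcomes)
        report[round(midpoint, 2)] = round(observed, 2)
    return report

# A well-calibrated predictor: schemas scored ~0.9 are valid ~90% of the time.
history = [(0.9, True)] * 9 + [(0.9, False)] + [(0.3, True)] * 3 + [(0.3, False)] * 7
print(calibration_report(history))  # {0.35: 0.3, 0.95: 0.9}
```

When the report diverges (e.g. schemas scored 0.9 are valid only 60% of the time), the scores are overconfident and cannot be trusted as probabilities.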

How They Work Together

The LLM and Masar have complementary strengths:

LLM strengths: Understanding natural language intent, generating syntactically correct code, handling ambiguity, creative solutions.

Masar strengths: Structural planning, fast validation, experience-based learning, calibrated confidence.

A typical interaction:

  1. User says: "Build me a helpdesk system"
  2. LLM interprets the intent and identifies the domain
  3. Masar recalls past helpdesk builds and generates a 14-step plan
  4. LLM executes each step, generating the actual schema content
  5. Masar verifies each intermediate result
  6. If something fails, Masar suggests repairs; LLM applies them
  7. Masar stores the completed episode
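
The loop above can be sketched in code. Everything here is a stand-in: `Planner`, `generate`, and the toy verify/repair logic are hypothetical interfaces for illustration, not Masar's real API:

```python
# Sketch of the plan → generate → verify → repair → store loop.
# All names and behaviors are illustrative stand-ins.
from dataclasses import dataclass, field

@dataclass
class Planner:
    """Stand-in for System 2: plan, verify, repair, remember."""
    episodes: list = field(default_factory=list)

    def plan(self, intent):
        # Hypothetical recall of past builds; here, a fixed toy plan.
        return [f"step {i}: {part}" for i, part in
                enumerate(["tickets", "agents", "queues"], 1)]

    def verify(self, artifact):
        return "INVALID" not in artifact          # toy validity check

    def suggest_repair(self, artifact):
        return artifact.replace("INVALID", "fixed")

    def store(self, episode):
        self.episodes.append(episode)

def generate(step):                               # stand-in for the LLM (System 1)
    return f"schema for {step}"

def run(intent, planner):
    results = []
    for step in planner.plan(intent):             # dependency-ordered plan
        artifact = generate(step)                 # LLM generates the content
        if not planner.verify(artifact):          # verify each intermediate result
            artifact = planner.suggest_repair(artifact)  # repair on failure
        results.append(artifact)
    planner.store({"intent": intent, "results": results})  # store the episode
    return results

run("Build me a helpdesk system", Planner())
```

The division of labor is visible in the loop: only `generate` touches natural language, and only `Planner` decides what happens next.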

The LLM never has to guess what to build next. Masar never has to generate natural language or interpret ambiguous requests. Each system does what it's good at.

The Result

Agents with both systems consistently outperform agents with only an LLM:

  • Higher first-attempt success rates
  • Fewer wasted LLM calls (no generating in the wrong direction)
  • Improving performance over time (memory accumulation)
  • Reliable confidence estimates (know when to ask for help)
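
The last point can be made concrete: a calibrated score supports a simple escalation rule. A sketch, where the 0.8 threshold and action names are arbitrary illustrative choices, not part of Masar:

```python
# Sketch: a confidence-gated decision. The threshold is an assumed value;
# calibration is what makes a fixed cutoff like this meaningful.
def next_action(confidence, threshold=0.8):
    if confidence >= threshold:
        return "proceed"
    return "ask_for_help"

assert next_action(0.94) == "proceed"
assert next_action(0.55) == "ask_for_help"
```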

Next Steps