← AI Agents Course9 / 14

Planner–Executor & Self-Critique

Two architectures that make agents more reliable — split planning from doing so the agent works to a plan, and add a critic pass so it checks its own work before finishing.

Ad 728×90

Plan first, then execute

Why: on complex tasks, deciding the whole plan up front beats improvising step by step — fewer wrong turns, and you can inspect the plan before any action runs. When: use planner–executor for multi-step jobs with a clear goal. Where: the planner produces steps; the executor runs each with your existing agent loop.

def plan(goal):
    response = client.messages.create(
        model="claude-opus-4-8", max_tokens=1024,
        messages=[{"role": "user", "content":
            f"Break this goal into a numbered list of concrete steps.\nGoal: {goal}"}],
    )
    return text_of(response)

def execute(goal, tools):
    steps = plan(goal)                       # decide everything up front
    return run_agent(f"Follow this plan:\n{steps}\n\nGoal: {goal}", tools)

print(execute("Find this week's top news story and write a 3-line summary.",
              tools=[search_tool]))

Add a self-critique pass

Why: a second look catches mistakes the first pass misses — the agent grades its own answer against the requirements and revises. When: use it for tasks with checkable criteria (format, completeness, factual grounding). Where: feed the draft back with the rubric and ask for a verdict plus a fix.

def with_critique(goal, tools):
    draft = execute(goal, tools)

    review = text_of(client.messages.create(
        model="claude-opus-4-8", max_tokens=1024,
        messages=[{"role": "user", "content":
            f"Goal: {goal}\n\nDraft answer:\n{draft}\n\n"
            "Does the draft fully meet the goal? Reply PASS or FAIL with a "
            "reason. If FAIL, output a corrected version."}],
    ))
    return review

print(with_critique("Summarise the news story in exactly 3 lines.", [search_tool]))

A fresh critic beats self-review

Why: an agent reviewing its own context shares its blind spots; a critic with a clean slate, given only the task and the output, catches more. When: for high-stakes work, run the critic as a separate call with no access to the executor's reasoning. Where: pass just the goal and the final answer — nothing else.

Weaker:   same conversation reviews itself
            (carries the same assumptions and mistakes)

Stronger: a SEPARATE call sees only { goal, final answer }
            and judges against the rubric from scratch

The fresh critic has no stake in the draft, so it's a tougher,
more honest reviewer.

Plan first, then execute

def plan(goal):
    response = client.messages.create(
        model="claude-opus-4-8", max_tokens=1024,
        messages=[{"role": "user", "content":
            f"Break this goal into a numbered list of concrete steps.\nGoal: {goal}"}],
    )
    return text_of(response)

def execute(goal, tools):
    steps = plan(goal)                       # decide everything up front
    return run_agent(f"Follow this plan:\n{steps}\n\nGoal: {goal}", tools)

print(execute("Find this week's top news story and write a 3-line summary.",
              tools=[search_tool]))

Add a self-critique pass

def with_critique(goal, tools):
    draft = execute(goal, tools)

    review = text_of(client.messages.create(
        model="claude-opus-4-8", max_tokens=1024,
        messages=[{"role": "user", "content":
            f"Goal: {goal}\n\nDraft answer:\n{draft}\n\n"
            "Does the draft fully meet the goal? Reply PASS or FAIL with a "
            "reason. If FAIL, output a corrected version."}],
    ))
    return review

print(with_critique("Summarise the news story in exactly 3 lines.", [search_tool]))

A fresh critic beats self-review

Weaker:   same conversation reviews itself
            (carries the same assumptions and mistakes)

Stronger: a SEPARATE call sees only { goal, final answer }
            and judges against the rubric from scratch

The fresh critic has no stake in the draft, so it's a tougher,
more honest reviewer.