Two architectures that make agents more reliable — split planning from doing so the agent works to a plan, and add a critic pass so it checks its own work before finishing.
Why: on complex tasks, deciding the whole plan up front beats improvising step by step — fewer wrong turns, and you can inspect the plan before any action runs. When: use planner–executor for multi-step jobs with a clear goal. Where: the planner produces steps; the executor runs each with your existing agent loop.
def plan(goal):
response = client.messages.create(
model="claude-opus-4-8", max_tokens=1024,
messages=[{"role": "user", "content":
f"Break this goal into a numbered list of concrete steps.\nGoal: {goal}"}],
)
return text_of(response)
def execute(goal, tools):
steps = plan(goal) # decide everything up front
return run_agent(f"Follow this plan:\n{steps}\n\nGoal: {goal}", tools)
print(execute("Find this week's top news story and write a 3-line summary.",
tools=[search_tool]))Why: a second look catches mistakes the first pass misses — the agent grades its own answer against the requirements and revises. When: use it for tasks with checkable criteria (format, completeness, factual grounding). Where: feed the draft back with the rubric and ask for a verdict plus a fix.
def with_critique(goal, tools):
draft = execute(goal, tools)
review = text_of(client.messages.create(
model="claude-opus-4-8", max_tokens=1024,
messages=[{"role": "user", "content":
f"Goal: {goal}\n\nDraft answer:\n{draft}\n\n"
"Does the draft fully meet the goal? Reply PASS or FAIL with a "
"reason. If FAIL, output a corrected version."}],
))
return review
print(with_critique("Summarise the news story in exactly 3 lines.", [search_tool]))Why: an agent reviewing its own context shares its blind spots; a critic with a clean slate, given only the task and the output, catches more. When: for high-stakes work, run the critic as a separate call with no access to the executor's reasoning. Where: pass just the goal and the final answer — nothing else.
Weaker: same conversation reviews itself
(carries the same assumptions and mistakes)
Stronger: a SEPARATE call sees only { goal, final answer }
and judges against the rubric from scratch
The fresh critic has no stake in the draft, so it's a tougher,
more honest reviewer.