An agent with no memory forgets everything between turns. Give it short-term memory inside the prompt, long-term memory that persists across sessions, and a compression strategy that keeps the history from overflowing the context window.
Why: the model is stateless — it remembers a conversation only because you resend the whole message history each turn. When: this is short-term (working) memory; it lasts as long as the list and lives entirely inside the prompt. Where: append every user and assistant turn so context carries forward.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def text_of(response):
    # Helper assumed by the examples here: join the text blocks of a response.
    return "".join(b.text for b in response.content if b.type == "text")

class Conversation:
    def __init__(self):
        self.messages = []  # this list IS the short-term memory

    def ask(self, user_text):
        self.messages.append({"role": "user", "content": user_text})
        response = client.messages.create(
            model="claude-opus-4-8", max_tokens=1024, messages=self.messages,
        )
        # Store the reply so the next turn sees it too.
        self.messages.append({"role": "assistant", "content": response.content})
        return text_of(response)

chat = Conversation()
chat.ask("My name is Sam.")
print(chat.ask("What's my name?"))  # e.g. "Sam": it remembers within the list

Why: short-term memory dies when the program stops; long-term memory survives by writing facts somewhere durable: a file, a database, or a vector store. When: use it for user preferences and facts that must outlive a single conversation. Where: load it at the start of a session and save new facts as you learn them.

import json, pathlib

STORE = pathlib.Path("memory.json")

def load_memory():
    return json.loads(STORE.read_text()) if STORE.exists() else {}

def remember(key, value):
    mem = load_memory()
    mem[key] = value
    STORE.write_text(json.dumps(mem))

# At the start of a session, fold long-term memory into the system prompt:
mem = load_memory()  # e.g. {"name": "Sam", "plan": "Pro"}
system = "Known facts about the user: " + json.dumps(mem)

Why: not all memory is the same. Episodic memory is what happened (past conversations, events); semantic memory is distilled facts (the user prefers metric units). When: store raw turns as episodic, then summarise them into semantic facts you actually reuse. Where: vector databases suit episodic recall; a key-value store suits semantic facts.
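
The episodic-to-semantic split can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Memory` class and the `toy_extract` function are hypothetical names, and the rule-based extractor stands in for the model-generated summary that a real system would more likely use.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Memory:
    """Hypothetical container: raw episodes plus distilled facts."""
    episodes: list = field(default_factory=list)  # episodic: what happened
    facts: dict = field(default_factory=dict)     # semantic: distilled facts

    def log_episode(self, day: date, text: str) -> None:
        self.episodes.append({"date": day.isoformat(), "text": text})

    def distil(self, extract) -> None:
        """Fold every logged episode into semantic facts.

        `extract` is an assumed callable: episode text -> dict of facts.
        """
        for episode in self.episodes:
            self.facts.update(extract(episode["text"]))

    def prompt_context(self) -> str:
        # Only the compact semantic facts go back into the prompt.
        return "; ".join(f"{k}: {v}" for k, v in sorted(self.facts.items()))


def toy_extract(text: str) -> dict:
    """Stand-in extractor that spots a currency code. Illustration only."""
    for code in ("EUR", "USD", "GBP"):
        if code in text:
            return {"default_currency": code}
    return {}


memory = Memory()
memory.log_episode(date(2026, 6, 20), "User asked about refunds for order ORD-5512.")
memory.log_episode(date(2026, 6, 21), "User asked for prices in EUR.")
memory.distil(toy_extract)
context = memory.prompt_context()  # "default_currency: EUR"
```

Both episodes stay in the log for later recall, but only the single distilled fact is carried into the prompt.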

Episodic -> "On 2026-06-20 the user asked about refunds and
             mentioned order ORD-5512."          (what happened)
Semantic -> "User's default currency is EUR."    (a distilled fact)

Pattern: log episodes as they happen, periodically summarise them
into semantic facts, and feed only the semantic facts back into
the prompt; they're smaller and more useful turn to turn.

Why: the context window is finite, so a long conversation must eventually be compressed or it stops fitting (and gets expensive). When: once history grows past a threshold, summarise the oldest turns into one note and drop them. This is a forgetting strategy. Where: keep recent turns verbatim and replace only the distant ones.

def compress(messages, keep_recent=6):
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = text_of(client.messages.create(
        model="claude-opus-4-8", max_tokens=512,
        messages=old + [{"role": "user",
                         "content": "Summarise the conversation so far in 3 lines."}],
    ))
    # Replace the old turns with one compact summary note.
    return [{"role": "user", "content": f"Earlier summary: {summary}"}] + recent
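
The "past a threshold" trigger could look something like the sketch below. It rests on assumptions not in the original: `estimate_tokens` and `maybe_compress` are hypothetical names, the four-characters-per-token ratio is only a rough rule of thumb, and `summarise` is an injected callable so the example runs without an API call.

```python
def estimate_tokens(messages):
    """Very rough size estimate: about 4 characters per token (an assumption)."""
    return sum(len(str(m["content"])) for m in messages) // 4


def maybe_compress(messages, summarise, max_tokens=2000, keep_recent=6):
    """Compress only once the history exceeds `max_tokens`.

    `summarise` is an assumed callable: list of messages -> summary string.
    """
    if estimate_tokens(messages) <= max_tokens or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    note = {"role": "user", "content": f"Earlier summary: {summarise(old)}"}
    return [note] + recent


# Usage with a stub summariser in place of a model call.
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": "x" * 400}
    for i in range(30)
]
short = maybe_compress(history, lambda old: f"{len(old)} earlier turns", max_tokens=1000)
```

Here the 30-turn history is estimated at 3000 tokens, so the 24 oldest turns collapse into one summary note and the 6 most recent stay verbatim. In a real agent, `summarise` would wrap a model call, and a proper tokenizer count would replace the character heuristic.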