
Agent Memory

An agent with no memory forgets everything between turns. Give it short-term memory inside the prompt, long-term memory that persists across sessions, and a strategy to keep the context window from overflowing.


Short-term memory is the message list

Why: the model is stateless — it remembers a conversation only because you resend the whole message history each turn. When: this is short-term (working) memory; it lasts as long as the list and lives entirely inside the prompt. Where: append every user and assistant turn so context carries forward.

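import anthropic

# Assumed setup, not shown in this lesson: an SDK client and a small helper
# that joins the text blocks of a response. Skip these if already defined.
client = anthropic.Anthropic()      # reads ANTHROPIC_API_KEY from the environment

def text_of(response):
    return "".join(b.text for b in response.content if b.type == "text")

class Conversation:
    def __init__(self):
        self.messages = []          # this list IS the short-term memory

    def ask(self, user_text):
        # Record the user turn, then resend the whole history to the model.
        self.messages.append({"role": "user", "content": user_text})
        response = client.messages.create(
            model="claude-opus-4-8", max_tokens=1024, messages=self.messages,
        )
        # Record the reply so the next call can see it too.
        self.messages.append({"role": "assistant", "content": response.content})
        return text_of(response)

chat = Conversation()
chat.ask("My name is Sam.")
print(chat.ask("What's my name?"))   # "Sam": it remembers within the list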
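
To see that the memory really is just the list, the same pattern can be run offline with a stand-in for the model. This is an illustration only: `fake_model` and `OfflineConversation` are hypothetical names, not part of any SDK. The stub can answer the name question only if the name appears in the messages it is handed.

```python
import re

def fake_model(messages):
    """Hypothetical stand-in for an LLM call: it sees ONLY `messages`."""
    history = " ".join(m["content"] for m in messages if m["role"] == "user")
    match = re.search(r"My name is (\w+)", history)
    if "What's my name?" in messages[-1]["content"]:
        return match.group(1) if match else "I don't know."
    return "OK."

class OfflineConversation:
    def __init__(self):
        self.messages = []  # short-term memory: everything the model will see

    def ask(self, user_text):
        self.messages.append({"role": "user", "content": user_text})
        reply = fake_model(self.messages)  # whole history is resent each turn
        self.messages.append({"role": "assistant", "content": reply})
        return reply

chat = OfflineConversation()
chat.ask("My name is Sam.")
remembered = chat.ask("What's my name?")

fresh = OfflineConversation()            # new list, so nothing carried over
forgotten = fresh.ask("What's my name?")
```

The second conversation fails to recall the name for one reason only: its list never contained it.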

Long-term memory persists across sessions

Why: short-term memory dies when the program stops; long-term memory survives by writing facts somewhere durable — a file, a database, or a vector store. When: use it for user preferences and facts that must outlive a single conversation. Where: load it at the start of a session and save new facts as you learn them.

import json, pathlib

STORE = pathlib.Path("memory.json")

def load_memory():
    return json.loads(STORE.read_text()) if STORE.exists() else {}

def remember(key, value):
    mem = load_memory()
    mem[key] = value
    STORE.write_text(json.dumps(mem))

# At the start of a session, fold long-term memory into the system prompt:
mem = load_memory()                     # {"name": "Sam", "plan": "Pro"}
system = "Known facts about the user: " + json.dumps(mem)
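
The load-at-start, save-as-you-learn cycle can be sketched end to end with a temporary directory, so it runs anywhere. `MemoryStore` and `build_system_prompt` are hypothetical names introduced for illustration. Writing through a temp file and then calling `os.replace` is an added hardening choice, not something the lesson prescribes.

```python
import json
import os
import pathlib
import tempfile

class MemoryStore:
    """Hypothetical wrapper around a JSON file of user facts."""

    def __init__(self, path):
        self.path = pathlib.Path(path)

    def load(self):
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def remember(self, key, value):
        mem = self.load()
        mem[key] = value
        # Write to a sibling temp file, then swap it in, so a crash mid-write
        # cannot leave a half-written memory file behind.
        tmp = self.path.with_name(self.path.name + ".tmp")
        tmp.write_text(json.dumps(mem), encoding="utf-8")
        os.replace(tmp, self.path)

def build_system_prompt(mem):
    if not mem:
        return "No known facts about the user yet."
    return "Known facts about the user: " + json.dumps(mem, sort_keys=True)

with tempfile.TemporaryDirectory() as folder:
    path = pathlib.Path(folder) / "memory.json"

    # Session 1: learn two facts, then "exit".
    session_one = MemoryStore(path)
    session_one.remember("name", "Sam")
    session_one.remember("plan", "Pro")

    # Session 2: a brand-new object sees what session 1 saved.
    session_two = MemoryStore(path)
    recalled = session_two.load()
    system = build_system_prompt(recalled)
```

With the Anthropic SDK, a string like `system` would typically be passed as the `system` argument of `client.messages.create`, alongside `messages`.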

Episodic vs semantic memory

Why: not all memory is the same — episodic memory is what happened (past conversations, events), semantic memory is distilled facts (the user prefers metric units). When: store raw turns as episodic, then summarise them into semantic facts you actually reuse. Where: vector databases suit episodic recall; a key-value store suits semantic facts.

Episodic  ->  "On 2026-06-20 the user asked about refunds and
               mentioned order ORD-5512."   (what happened)

Semantic  ->  "User's default currency is EUR."   (a distilled fact)

Pattern: log episodes as they happen, periodically summarise them
into semantic facts, and feed only the semantic facts back into
the prompt — they're smaller and more useful turn to turn.
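
The pattern above can be sketched as a small pipeline. Everything here is illustrative: `Episode`, `extract_facts`, and `distill` are hypothetical names, and `extract_facts` is a rule-based stand-in for what would normally be a model call that summarises an episode into facts.

```python
from dataclasses import dataclass
import re

@dataclass
class Episode:
    date: str   # ISO date of the event
    text: str   # raw record of what happened

def extract_facts(episode):
    """Hypothetical stand-in for an LLM summariser: returns {key: value} facts."""
    facts = {}
    currency = re.search(r"prices? in ([A-Z]{3})", episode.text)
    if currency:
        facts["default_currency"] = currency.group(1)
    order = re.search(r"ORD-\d+", episode.text)
    if order:
        facts["last_order"] = order.group(0)
    return facts

def distill(episodes, extractor=extract_facts):
    """Fold episodes, oldest first, into one dict; later facts overwrite earlier."""
    semantic = {}
    for episode in sorted(episodes, key=lambda e: e.date):
        semantic.update(extractor(episode))
    return semantic

episodic_log = [
    Episode("2026-06-20", "User asked about refunds and mentioned order ORD-5512."),
    Episode("2026-06-22", "User asked to see prices in EUR from now on."),
    Episode("2026-06-25", "User mentioned order ORD-6001 arrived late."),
]

semantic = distill(episodic_log)
prompt_note = "Known facts: " + "; ".join(
    f"{k}={v}" for k, v in sorted(semantic.items())
)
```

Only `prompt_note` goes back into the prompt; the full `episodic_log` stays in storage for later lookup.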

Keep memory from overflowing

Why: the context window is finite and every resent token is billed again each turn, so a long conversation gets more expensive as it grows and eventually stops fitting. When: once history grows past a threshold, summarise the oldest turns into one note and drop the originals; this is a deliberate forgetting strategy. Where: keep recent turns verbatim and replace only the distant ones.

def compress(messages, keep_recent=6):
    if len(messages) <= keep_recent:
        return messages

    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = text_of(client.messages.create(
        model="claude-opus-4-8", max_tokens=512,
        messages=old + [{"role": "user",
                         "content": "Summarise the conversation so far in 3 lines."}],
    ))
    # Replace the old turns with one compact summary note.
    return [{"role": "user", "content": f"Earlier summary: {summary}"}] + recent
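
The function above triggers on message count. A variant that triggers on estimated size may track the real limit more closely. The sketch below assumes message contents are plain strings and uses a rough 4-characters-per-token estimate, which is an assumption and not a real tokenizer. `estimate_tokens`, `compress_if_needed`, and `stub_summarise` are hypothetical names. The summariser is passed in so the logic can be tested without a model call.

```python
def estimate_tokens(messages):
    """Very rough size estimate: about 4 characters per token (an assumption)."""
    return sum(len(m["content"]) for m in messages) // 4

def compress_if_needed(messages, summarise, max_tokens=50, keep_recent=4):
    """Summarise old turns only once the estimated size passes max_tokens."""
    if estimate_tokens(messages) <= max_tokens or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    note = {"role": "user", "content": f"Earlier summary: {summarise(old)}"}
    return [note] + recent

def stub_summarise(old_messages):
    """Hypothetical stand-in for the model call that writes the summary."""
    return f"{len(old_messages)} earlier messages condensed."

# Build a 12-message history that is well over the toy budget.
history = []
for i in range(6):
    history.append({"role": "user", "content": f"Question {i}: " + "x" * 40})
    history.append({"role": "assistant", "content": f"Answer {i}: " + "y" * 40})

short = history[:2]
compressed = compress_if_needed(history, stub_summarise)
untouched = compress_if_needed(short, stub_summarise)
```

In practice `max_tokens` would be set well below the model's context limit, and `summarise` would wrap a model call like the one in `compress`.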
