agent-claw/agentcore-memory-research.md
daniel 0369a74ac1 Initial research: OpenClaw on AgentCore architecture
- Architecture comparison (OpenClaw daemon vs AgentCore serverless)
- Component compatibility analysis
- Fargate analysis
- AgentCore rebuild plan (Telegram, zero always-on compute)
- Memory strategy: AgentCore Memory + factbase as structured KB
- Serverless relay patterns per channel
- All open questions resolved
- OpenClaw feature delta March→May 2026
- Build phases and cost estimates
2026-05-04 08:28:52 -05:00


AgentCore Memory — Deep Dive & MEMORY.md Replacement Analysis

How AgentCore Memory Works

Architecture

Memory is a managed service completely separate from the Runtime container. Two tiers:

Short-term (events)         Long-term (extracted records)
─────────────────────       ──────────────────────────────
actor → session → events    actor → namespace → records
                             (semantic search available)
                             (cross-session, persistent)

Short-term Memory

  • Stores raw conversation events (turns) via CreateEvent API
  • Keyed by actor_id + session_id
  • Retrieval: ListEvents(actor_id, session_id) → full conversation history for that session
  • Survives microVM termination — stored in the managed service, not the container
  • This replaces JSONL session transcripts completely
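
The two short-term calls above can be sketched as follows. This is a minimal illustration, assuming the boto3 `bedrock-agentcore` data-plane client and its `create_event` / `list_events` operations; the parameter names reflect the API as understood at the time of research and should be checked against the installed SDK. The helper names `store_turn` and `load_history` are hypothetical. The client is passed in so the helpers can be exercised without AWS credentials.

```python
from datetime import datetime, timezone


def store_turn(client, memory_id, actor_id, session_id, role, text):
    """Append one conversation turn as a short-term event (CreateEvent)."""
    return client.create_event(
        memoryId=memory_id,
        actorId=actor_id,
        sessionId=session_id,
        eventTimestamp=datetime.now(timezone.utc),
        # Assumed payload shape: one conversational message per event
        payload=[{"conversational": {"role": role, "content": {"text": text}}}],
    )


def load_history(client, memory_id, actor_id, session_id, page_size=100):
    """Return all events for one actor/session, following pagination (ListEvents)."""
    events, token = [], None
    while True:
        kwargs = dict(memoryId=memory_id, actorId=actor_id, sessionId=session_id,
                      includePayloads=True, maxResults=page_size)
        if token:
            kwargs["nextToken"] = token
        page = client.list_events(**kwargs)
        events.extend(page.get("events", []))
        token = page.get("nextToken")
        if not token:
            return events
```

In a real deployment the client would come from `boto3.client("bedrock-agentcore")`; with the Strands session manager described later, these calls are made on the agent's behalf.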

Long-term Memory — Three Built-in Strategies

Configured when creating a Memory resource. Extraction runs asynchronously in the background after each CreateEvent. Model costs for built-in extraction are included in AgentCore Memory pricing (confirmed by AWS support).

Strategy          What it extracts                       Namespace pattern
───────────────   ────────────────────────────────────   ─────────────────────────────────
SUMMARIZATION     Session summaries                      /summaries/{actorId}/{sessionId}/
USER_PREFERENCE   Preferences, habits, recurring facts   /preferences/{actorId}/
SEMANTIC          Raw facts, entities, knowledge         /facts/{actorId}/

All three can run on the same memory resource simultaneously.
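
As an illustration of running all three strategies on one resource, the sketch below builds the arguments for a CreateMemory call. It assumes the boto3 `bedrock-agentcore-control` client and the strategy keys `summaryMemoryStrategy`, `userPreferenceMemoryStrategy`, and `semanticMemoryStrategy`; those key names, the strategy names, and the 30-day event expiry are assumptions to verify against the current API reference. `build_memory_request` is a hypothetical helper.

```python
def build_memory_request(name, event_expiry_days=30):
    """Build CreateMemory kwargs enabling all three built-in strategies."""
    return {
        "name": name,
        "eventExpiryDuration": event_expiry_days,  # assumed: retention of raw events, in days
        "memoryStrategies": [
            {"summaryMemoryStrategy": {
                "name": "SessionSummaries",
                "namespaces": ["/summaries/{actorId}/{sessionId}/"]}},
            {"userPreferenceMemoryStrategy": {
                "name": "UserPreferences",
                "namespaces": ["/preferences/{actorId}/"]}},
            {"semanticMemoryStrategy": {
                "name": "Facts",
                "namespaces": ["/facts/{actorId}/"]}},
        ],
    }
```

The result would be passed as `boto3.client("bedrock-agentcore-control").create_memory(**build_memory_request("agent_claw_memory"))`. Keeping the request as plain data makes the namespace patterns easy to review before anything is provisioned.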

Self-managed Strategy

You control the entire extraction pipeline:

  1. Configure triggers: message count (messageCount: 6), token count (tokenCount: 1000), or idle timeout (idleSessionTimeout: 30)
  2. AgentCore writes conversation payload to your S3 bucket
  3. Publishes notification to your SNS topic
  4. Your Lambda picks it up, runs whatever extraction logic you want
  5. You write results back via BatchCreateMemoryRecords

This is the MEMORY.md pattern but managed in the cloud — you decide what to write and how.
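
Steps 3 to 5 can be sketched as a Lambda-side handler. This is a rough outline under several assumptions: the SNS message is JSON carrying the S3 location in a field called `s3PayloadLocation` (a guess at the field name, not confirmed), the extraction logic is supplied by the caller as `extract`, and records are written with the boto3 `batch_create_memory_records` operation using the `/curated/{actorId}/` namespace from the architecture diagram. All helper names are hypothetical.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlparse


def parse_notification(sns_event):
    """Yield (bucket, key) per SNS record; 's3PayloadLocation' is an assumed field name."""
    for record in sns_event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        url = urlparse(message["s3PayloadLocation"])
        yield url.netloc, url.path.lstrip("/")


def build_records(actor_id, facts, now=None):
    """Turn extracted strings into BatchCreateMemoryRecords entries."""
    now = now or datetime.now(timezone.utc)
    return [
        {"requestIdentifier": f"curated-{i}",
         "namespaces": [f"/curated/{actor_id}/"],
         "content": {"text": fact},
         "timestamp": now}
        for i, fact in enumerate(facts)
    ]


def handle(sns_event, s3, memory, memory_id, actor_id, extract):
    """Fetch each payload, run the caller's extract(bytes) -> list[str], write results back."""
    written = 0
    for bucket, key in parse_notification(sns_event):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        facts = extract(body)
        if facts:
            memory.batch_create_memory_records(
                memoryId=memory_id, records=build_records(actor_id, facts))
            written += len(facts)
    return written
```

Passing the S3 and memory clients in as arguments keeps the extraction logic testable offline; the real Lambda entry point would construct them with boto3 and call `handle`.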

Strands Integration

The Strands AgentCoreMemorySessionManager stores each turn and flushes its buffer without any extra code in the agent:

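# Import paths as found in the bedrock-agentcore and strands-agents packages;
# verify against the installed versions
from bedrock_agentcore.memory.integrations.strands.config import AgentCoreMemoryConfig
from bedrock_agentcore.memory.integrations.strands.session_manager import AgentCoreMemorySessionManager
from strands import Agent

config = AgentCoreMemoryConfig(
    memory_id=MEMORY_ID,
    session_id=SESSION_ID,    # maps to Telegram chat_id + date
    actor_id=ACTOR_ID,        # = user identity
    batch_size=5,             # buffer 5 turns before flushing to save API calls
)

with AgentCoreMemorySessionManager(config) as session_manager:
    agent = Agent(
        system_prompt=build_system_prompt(),  # SOUL.md + AGENTS.md + retrieved memories
        session_manager=session_manager,
    )
    response = agent(user_message)
# on exit: buffers flushed, async long-term extraction kicks off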

Every conversation turn is automatically stored. batch_size reduces API calls for rapid exchanges.


MEMORY.md vs AgentCore Memory

What MEMORY.md Does Today

  • Curated long-term memory the agent manually edits
  • Loaded wholesale into the system prompt each session
  • Agent writes specific things it wants to remember
  • Human-readable markdown

What AgentCore Memory Provides

  • Short-term: full conversation history per session (replaces JSONL)
  • Long-term SUMMARIZATION: session summaries auto-extracted
  • Long-term USER_PREFERENCE: preferences auto-extracted and consolidated across sessions
  • Long-term SEMANTIC: facts/entities auto-extracted
  • Semantic search: RetrieveMemoryRecords(query="...") → relevant memories surfaced into system prompt
  • Self-managed strategy: explicit "write this to memory" control, just like the agent writing MEMORY.md

Verdict: Replace MEMORY.md with AgentCore Memory

AgentCore Memory is more capable than MEMORY.md on nearly every axis:

  • Auto-extraction means the agent doesn't have to manually curate (though it can via self-managed strategy)
  • Semantic search means you don't inject ALL memories into the system prompt — you inject the RELEVANT ones
  • No MEMORY.md bloat: today MEMORY.md grows unbounded; AgentCore Memory consolidates automatically
  • Cross-session persistence without any file I/O

The tradeoff: less direct control over what gets written. Mitigated with self-managed strategy for explicit writes.


The S3 Round-Trip Concern — Addressed

Daniel's concern: S3 round-trip on every interaction.

With AgentCore Memory + Strands:

What                               When                           Round-trip?
────────────────────────────────   ────────────────────────────   ─────────────────────────────────────
Conversation turns (short-term)    Each turn, async/batched       Non-blocking, buffered by batch_size
Long-term extraction               Background async after turns   Zero latency impact
Memory retrieval (session start)   Once per session               One RetrieveMemoryRecords call, ~50ms
Personality files (SOUL.md etc.)   Once per session start         See below

For personality files specifically: load them once when the session starts and cache them in an in-memory dict inside the container. As long as the microVM stays warm, it handles every message in the session (up to 8 hours), so SOUL.md loads once per session, not once per message. There are no per-message S3 calls.

In practice, the flow is:

Session start (once):
  1. Load SOUL.md, AGENTS.md, USER.md from S3 → cache in container memory
  2. RetrieveMemoryRecords(query="important context, preferences") → top-k memories
  3. Build system_prompt = static_files + retrieved_memories
  4. Pass to Strands agent

Each message (no extra round-trips):
  - Strands auto-stores turns to AgentCore Memory (async/batched)
  - Long-term extraction runs in background

┌─────────────────────────────────────────────────────────────┐
│  S3 (persona bucket)                                        │
│  SOUL.md, AGENTS.md, IDENTITY.md, USER.md, HEARTBEAT.md    │
│  → Loaded ONCE at session start, cached in container memory │
│  → Updated rarely (when Daniel edits them)                  │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│  AgentCore Memory (replaces MEMORY.md + JSONL transcripts)  │
│                                                             │
│  Short-term: conversation turns (per session)               │
│  → Strands session_manager handles automatically            │
│                                                             │
│  Long-term strategies:                                      │
│  SUMMARIZATION → /summaries/{actorId}/{sessionId}/          │
│  USER_PREFERENCE → /preferences/{actorId}/                  │
│  SEMANTIC → /facts/{actorId}/                               │
│                                                             │
│  Self-managed strategy (for explicit "remember this"):      │
│  Trigger: idle timeout or message count                     │
│  SNS → Lambda → custom extraction → BatchCreateMemoryRecords│
│  → "/curated/{actorId}/" namespace                          │
│  → This is the MEMORY.md equivalent, automated             │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│  DynamoDB                                                   │
│  telegram_chat_id → agentcore_session_id + actor_id         │
│  heartbeat state (last check timestamps)                    │
│  cron job definitions                                       │
└─────────────────────────────────────────────────────────────┘
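
The DynamoDB mapping in the last box could be populated along these lines. This is only a sketch: the `tg-` prefix, the attribute names, and both helper functions are hypothetical, and the session ID format simply follows the "chat_id + date" convention noted in the Strands config earlier. Any length or character constraints that the services impose on session IDs are not handled here.

```python
from datetime import date


def session_id_for(chat_id, day):
    """Derive one session per Telegram chat per calendar day."""
    return f"tg-{chat_id}-{day.isoformat()}"


def session_item(chat_id, actor_id, day):
    """Build the item stored under telegram_chat_id (attribute names are illustrative)."""
    return {
        "telegram_chat_id": str(chat_id),
        "agentcore_session_id": session_id_for(chat_id, day),
        "actor_id": str(actor_id),
    }
```

Deriving the ID from the chat and date means the relay can recompute it without a read, and the table entry then serves mainly as a record of which actor a chat maps to.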

Session Start Pattern

@app.entrypoint
async def main(payload, context):
    # memory_session, MEMORY_ID and PERSONA_CACHE (a plain dict) are assumed
    # to be created at module import time
    actor_id = payload["actor_id"]      # = Telegram user ID
    session_id = payload["session_id"]  # = from DynamoDB lookup
    config = AgentCoreMemoryConfig(
        memory_id=MEMORY_ID, session_id=session_id, actor_id=actor_id
    )

    # Load static files once per warm container; an empty dict means "not loaded yet"
    if not PERSONA_CACHE:
        PERSONA_CACHE.update(load_from_s3(["SOUL.md", "AGENTS.md", "USER.md"]))

    # Retrieve relevant long-term memories (semantic search).
    # As written this runs on every invocation, keyed on the incoming message.
    memories = memory_session.search_long_term_memories(
        query=payload["message"],
        namespace_prefix=f"/preferences/{actor_id}/",
        top_k=5,
    )

    # Build system prompt
    system_prompt = build_prompt(PERSONA_CACHE, memories)

    # Run agent (session_manager handles turn storage automatically)
    with AgentCoreMemorySessionManager(config) as session_manager:
        agent = Agent(system_prompt=system_prompt, session_manager=session_manager)
        result = await agent.invoke_async(payload["message"])
        return {"response": result.message}
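
The `build_prompt` helper used above is not defined in these notes. One possible shape is sketched below, assuming the cache is a dict of filename to text and that each retrieved record is dict-like with its text under `content.text`; both the record shape and the "Relevant memories" heading are assumptions.

```python
def build_prompt(persona, memories, order=("SOUL.md", "AGENTS.md", "USER.md")):
    """Join persona files in a fixed order, then append retrieved memories as bullets."""
    sections = [persona[name].strip() for name in order if name in persona]
    # Skip records that carry no text (assumed shape: {"content": {"text": ...}})
    facts = [m["content"]["text"].strip()
             for m in memories
             if m.get("content", {}).get("text")]
    if facts:
        sections.append("Relevant memories:\n" + "\n".join(f"- {f}" for f in facts))
    return "\n\n".join(sections)
```

Because only the top-k retrieved records are appended, the prompt size stays bounded regardless of how much long-term memory accumulates.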

What AgentCore Memory Pricing Covers

From the pricing page and AWS re:Post confirmation:

  • Built-in strategies (SUMMARIZATION, USER_PREFERENCE, SEMANTIC): model extraction costs are included in Memory pricing
  • Self-managed strategy: you pay for your own Lambda + Bedrock calls
  • Memory storage: billed per GB stored
  • RetrieveMemoryRecords (semantic search): billed per search

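Exact rates were not clearly published at the time of research; at personal-assistant scale the cost is expected to be low, but this needs confirming against the pricing page when provisioning.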


Open Questions Remaining

  1. Pricing for AgentCore Memory: exact rates for storage + retrieval not clearly published yet. Need to check when actually provisioning.
  2. S3 persona file cache invalidation: when SOUL.md is updated in S3, the warm container won't know. Need a mechanism — either DynamoDB version flag checked at session start, or just accept ~8hr staleness (fine for persona files).
  3. Self-managed extraction timing: confirm whether idle-session trigger in self-managed strategy fires reliably at session end vs requiring explicit trigger. This determines whether the "write to memory" tool works reliably.
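
For question 2, the version-flag option might look like the sketch below. It assumes some cheap source of a current version value (for example a DynamoDB attribute or an S3 ETag, neither of which is decided here) and a caller-supplied `load` function; the class name and interface are hypothetical.

```python
class PersonaCache:
    """Holds persona files and reloads them when the stored version changes."""

    def __init__(self):
        self.version = None
        self.files = {}

    def get(self, current_version, load):
        """Return cached files, calling load() only when current_version differs."""
        if current_version != self.version:
            self.files = load()
            # Record the version only after a successful load
            self.version = current_version
        return self.files
```

The cost is one small version lookup per session start instead of re-reading every persona file; accepting the staleness instead would avoid even that lookup.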

Research: 2026-05-04. Sources: AgentCore Memory docs (memory-types, memory-strategies, memory-organization, strands integration), AgentCore pricing page.