
daniel 0369a74ac1 Initial research: OpenClaw on AgentCore architecture

- Architecture comparison (OpenClaw daemon vs AgentCore serverless)
- Component compatibility analysis
- Fargate analysis
- AgentCore rebuild plan (Telegram, zero always-on compute)
- Memory strategy: AgentCore Memory + factbase as structured KB
- Serverless relay patterns per channel
- All open questions resolved
- OpenClaw feature delta March→May 2026
- Build phases and cost estimates

2026-05-04 08:28:52 -05:00


AgentCore Memory — Deep Dive & MEMORY.md Replacement Analysis

How AgentCore Memory Works

Architecture

Memory is a managed service completely separate from the Runtime container. Two tiers:

Short-term (events)         Long-term (extracted records)
─────────────────────       ──────────────────────────────
actor → session → events    actor → namespace → records
                             (semantic search available)
                             (cross-session, persistent)

Short-term Memory

Stores raw conversation events (turns) via CreateEvent API
Keyed by actor_id + session_id
Retrieval: ListEvents(session_id) → full conversation history
Survives microVM termination — stored in the managed service, not the container
This replaces JSONL session transcripts completely
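
To make the keying concrete, here is a toy in-memory model of the short-term tier. It is not the AgentCore API: `ShortTermStore`, `create_event`, and `list_events` are hypothetical names that only mirror the CreateEvent / ListEvents behaviour described above, and the IDs are made-up examples.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ShortTermStore:
    """Toy stand-in for the managed short-term tier (hypothetical, not the real service)."""
    _events: dict = field(default_factory=lambda: defaultdict(list))

    def create_event(self, actor_id: str, session_id: str, role: str, text: str) -> None:
        # Events are appended in arrival order under the (actor, session) key.
        self._events[(actor_id, session_id)].append({"role": role, "text": text})

    def list_events(self, actor_id: str, session_id: str) -> list:
        # Returns the full history for one session; other sessions stay isolated.
        return list(self._events.get((actor_id, session_id), []))


store = ShortTermStore()
store.create_event("user-1", "chat-42-2026-05-04", "USER", "Remind me about the dentist.")
store.create_event("user-1", "chat-42-2026-05-04", "ASSISTANT", "Noted.")
store.create_event("user-1", "chat-42-2026-05-05", "USER", "Good morning.")

history = store.list_events("user-1", "chat-42-2026-05-04")
```

Because the store lives outside the container in the real service, the same lookup would still return the history after the microVM is gone; the toy version only shows the keying.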

Long-term Memory — Three Built-in Strategies

Configured when creating a Memory resource. Extraction runs asynchronously in the background after each CreateEvent. Model costs for built-in extraction are included in AgentCore Memory pricing (confirmed by AWS support).

Strategy	What it extracts	Namespace pattern
`SUMMARIZATION`	Session summaries	`/summaries/{actorId}/{sessionId}/`
`USER_PREFERENCE`	Preferences, habits, recurring facts	`/preferences/{actorId}/`
`SEMANTIC`	Raw facts, entities, knowledge	`/facts/{actorId}/`

All three can run on the same memory resource simultaneously.
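
As a small illustration of how the namespace patterns in the table expand, the sketch below fills in the placeholders for a given actor and session. `resolve_namespace` is a hypothetical helper written for this note, not part of any SDK.

```python
# Patterns copied from the strategy table above.
NAMESPACE_PATTERNS = {
    "SUMMARIZATION": "/summaries/{actorId}/{sessionId}/",
    "USER_PREFERENCE": "/preferences/{actorId}/",
    "SEMANTIC": "/facts/{actorId}/",
}


def resolve_namespace(strategy: str, actor_id: str, session_id: str = "") -> str:
    """Expand a strategy's namespace pattern (hypothetical helper)."""
    pattern = NAMESPACE_PATTERNS[strategy]
    if "{sessionId}" in pattern and not session_id:
        raise ValueError(f"{strategy} needs a session_id")
    return pattern.replace("{actorId}", actor_id).replace("{sessionId}", session_id)


prefs_ns = resolve_namespace("USER_PREFERENCE", "user-1")
summary_ns = resolve_namespace("SUMMARIZATION", "user-1", "sess-9")
```

Only the summary namespace is scoped to a session; preferences and facts are scoped to the actor, which is what makes them cross-session.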

Self-managed Strategy

You control the entire extraction pipeline:

Configure triggers: message count (messageCount: 6), token count (tokenCount: 1000), or idle timeout (idleSessionTimeout: 30)
AgentCore writes conversation payload to your S3 bucket
Publishes notification to your SNS topic
Your Lambda picks it up, runs whatever extraction logic you want
You write results back via BatchCreateMemoryRecords

This is the MEMORY.md pattern but managed in the cloud — you decide what to write and how.
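
The pipeline above could be sketched as a Lambda-style handler like the one below. Several things here are assumptions: the fields inside the SNS message (`bucket`, `key`, `actorId`) are illustrative and not the documented notification format, `extract_facts` is a placeholder for real extraction logic, and the S3 read and the BatchCreateMemoryRecords write are injected as plain callables so the sketch runs without AWS access. Only the outer `Records[].Sns.Message` envelope is the standard SNS-to-Lambda shape.

```python
import json
from typing import Dict, List


def extract_facts(turns: List[dict]) -> List[str]:
    """Placeholder extraction: keep user turns that begin with 'remember'."""
    return [
        t["text"] for t in turns
        if t["role"] == "USER" and t["text"].lower().startswith("remember")
    ]


def handle_notification(event, fetch_payload, write_records) -> int:
    """Process SNS records; returns the number of memory records written."""
    written = 0
    for record in event["Records"]:
        # SNS delivers the published message to Lambda as a JSON string.
        message = json.loads(record["Sns"]["Message"])
        # ASSUMPTION: "bucket", "key", "actorId" are illustrative field names.
        turns = fetch_payload(message["bucket"], message["key"])
        facts = extract_facts(turns)
        if facts:
            # In a real deployment this callable would wrap BatchCreateMemoryRecords.
            write_records(f"/curated/{message['actorId']}/", facts)
            written += len(facts)
    return written


# Local dry run with fakes instead of S3 and AgentCore Memory.
fake_s3 = {("payloads", "s1.json"): [
    {"role": "USER", "text": "Remember that I prefer morning meetings."},
    {"role": "ASSISTANT", "text": "Got it."},
]}
stored: Dict[str, List[str]] = {}
event = {"Records": [{"Sns": {"Message": json.dumps(
    {"bucket": "payloads", "key": "s1.json", "actorId": "user-1"})}}]}

count = handle_notification(
    event,
    fetch_payload=lambda bucket, key: fake_s3[(bucket, key)],
    write_records=lambda ns, facts: stored.setdefault(ns, []).extend(facts),
)
```

Injecting the two I/O callables keeps the extraction decision testable on its own, which matters here because that decision is the part you own in the self-managed strategy.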

Strands Integration

The Strands AgentCoreMemorySessionManager stores each conversation turn and flushes any buffered turns when the context manager exits:

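from strands import Agent
from bedrock_agentcore.memory.integrations.strands.config import AgentCoreMemoryConfig
from bedrock_agentcore.memory.integrations.strands.session_manager import AgentCoreMemorySessionManager

# MEMORY_ID, SESSION_ID, ACTOR_ID, build_system_prompt, and user_message are defined elsewhere
config = AgentCoreMemoryConfig(
    memory_id=MEMORY_ID,
    session_id=SESSION_ID,    # maps to Telegram chat_id + date
    actor_id=ACTOR_ID,        # = user identity
    batch_size=5,             # buffer 5 turns before flushing to save API calls
)

with AgentCoreMemorySessionManager(config) as session_manager:
    agent = Agent(
        system_prompt=build_system_prompt(),  # SOUL.md + AGENTS.md + retrieved memories
        session_manager=session_manager,
    )
    response = agent(user_message)
# on exit: buffers flushed, async long-term extraction kicks off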

Every conversation turn is stored automatically. With batch_size=5, turns are buffered and written in groups, which reduces API calls during rapid exchanges.
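
The effect of buffering can be shown with a toy writer. This is not how Strands implements batch_size internally; `BufferedTurnWriter` and `send_batch` are hypothetical names that only demonstrate the flush-on-threshold and flush-on-exit behaviour described above.

```python
class BufferedTurnWriter:
    """Toy buffer: sends turns in batches and flushes leftovers on exit."""

    def __init__(self, send_batch, batch_size: int):
        self._send_batch = send_batch
        self._batch_size = batch_size
        self._buffer = []

    def add(self, turn: str) -> None:
        self._buffer.append(turn)
        # Flush as soon as the buffer reaches the configured size.
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._send_batch(list(self._buffer))
            self._buffer.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Leftover turns are written when the context exits.
        self.flush()
        return False


batches = []
with BufferedTurnWriter(batches.append, batch_size=5) as writer:
    for i in range(12):
        writer.add(f"turn-{i}")

batch_sizes = [len(b) for b in batches]
```

Twelve turns end up in three writes instead of twelve. The flush on exit is why the session manager is used as a context manager: without it, the last partial batch would stay in the buffer.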

MEMORY.md vs AgentCore Memory

What MEMORY.md Does Today

Curated long-term memory the agent manually edits
Loaded wholesale into the system prompt each session
Agent writes specific things it wants to remember
Human-readable markdown

What AgentCore Memory Provides

Short-term: full conversation history per session (replaces JSONL)
Long-term SUMMARIZATION: session summaries auto-extracted
Long-term USER_PREFERENCE: preferences auto-extracted and consolidated across sessions
Long-term SEMANTIC: facts/entities auto-extracted
Semantic search: RetrieveMemoryRecords(query="...") → relevant memories surfaced into system prompt
Self-managed strategy: explicit "write this to memory" control, just like the agent writing MEMORY.md

Verdict: Replace MEMORY.md with AgentCore Memory

AgentCore Memory covers what MEMORY.md does today and adds several capabilities:

Auto-extraction means the agent doesn't have to manually curate (though it can via self-managed strategy)
Semantic search means you don't inject ALL memories into the system prompt — you inject the RELEVANT ones
No MEMORY.md bloat: today MEMORY.md grows unbounded; AgentCore Memory consolidates automatically
Cross-session persistence without any file I/O

The tradeoff: less direct control over what gets written. Mitigated with self-managed strategy for explicit writes.

The S3 Round-Trip Concern — Addressed

Daniel's concern: S3 round-trip on every interaction.

With AgentCore Memory + Strands:

What	When	Round-trip?
Conversation turns (short-term)	Each turn, async/batched	Non-blocking, buffered by `batch_size`
Long-term extraction	Background async after turns	Zero latency impact
Memory retrieval (session start)	Once per session	One `RetrieveMemoryRecords` call, ~50ms
Personality files (SOUL.md etc.)	Once per session start	See below

For personality files specifically: load them once when the session starts and cache them in an in-memory dict inside the container. The same warm microVM handles all messages in a session of up to 8 hours, so SOUL.md loads once per session rather than once per message, and no per-message S3 calls are needed.

In practice, the flow is:

Session start (once):
  1. Load SOUL.md, AGENTS.md, USER.md from S3 → cache in container memory
  2. RetrieveMemoryRecords(query="important context, preferences") → top-k memories
  3. Build system_prompt = static_files + retrieved_memories
  4. Pass to Strands agent

Each message (no extra round-trips):
  - Strands auto-stores turns to AgentCore Memory (async/batched)
  - Long-term extraction runs in background
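
The session-start half of this flow could look roughly like the sketch below. `get_persona`, `build_system_prompt`, and the `load_file` callable are hypothetical names for this note; the S3 read and the memory retrieval are replaced by stand-ins so the example runs locally.

```python
PERSONA_FILES = ("SOUL.md", "AGENTS.md", "USER.md")
_persona_cache: dict = {}  # module-level, so it survives while the container is warm


def get_persona(load_file) -> dict:
    """Load persona files on first use only; later calls hit the cache."""
    if not _persona_cache:
        for name in PERSONA_FILES:
            _persona_cache[name] = load_file(name)
    return _persona_cache


def build_system_prompt(persona: dict, memories: list) -> str:
    """Static persona files first, then the retrieved memories."""
    sections = [persona[name] for name in PERSONA_FILES]
    if memories:
        sections.append("Relevant memories:\n" + "\n".join(f"- {m}" for m in memories))
    return "\n\n".join(sections)


# Stand-in for the S3 read; counts how often a file is actually fetched.
load_calls = []

def fake_load(name: str) -> str:
    load_calls.append(name)
    return f"<contents of {name}>"


first = build_system_prompt(get_persona(fake_load), ["Prefers morning meetings"])
second = build_system_prompt(get_persona(fake_load), [])
```

Both prompts are built, but the three files are fetched only once, which is the property the flow depends on.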

Recommended Storage Architecture

┌─────────────────────────────────────────────────────────────┐
│  S3 (persona bucket)                                        │
│  SOUL.md, AGENTS.md, IDENTITY.md, USER.md, HEARTBEAT.md    │
│  → Loaded ONCE at session start, cached in container memory │
│  → Updated rarely (when Daniel edits them)                  │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│  AgentCore Memory (replaces MEMORY.md + JSONL transcripts)  │
│                                                             │
│  Short-term: conversation turns (per session)               │
│  → Strands session_manager handles automatically            │
│                                                             │
│  Long-term strategies:                                      │
│  SUMMARIZATION → /summaries/{actorId}/{sessionId}/          │
│  USER_PREFERENCE → /preferences/{actorId}/                  │
│  SEMANTIC → /facts/{actorId}/                               │
│                                                             │
│  Self-managed strategy (for explicit "remember this"):      │
│  Trigger: idle timeout or message count                     │
│  SNS → Lambda → custom extraction → BatchCreateMemoryRecords│
│  → "/curated/{actorId}/" namespace                          │
│  → This is the MEMORY.md equivalent, automated             │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│  DynamoDB                                                   │
│  telegram_chat_id → agentcore_session_id + actor_id         │
│  heartbeat state (last check timestamps)                    │
│  cron job definitions                                       │
└─────────────────────────────────────────────────────────────┘
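
A minimal sketch of the chat-to-session mapping item, assuming the session ID is derived from the Telegram chat ID plus the date as noted in the Strands config above. The ID formats (`tg-...`, `tg-user-...`) and both helper names are illustrative assumptions, and the item is shown as a plain dict rather than a DynamoDB write.

```python
from datetime import date


def derive_session_id(chat_id: int, day: date) -> str:
    """One session per chat per calendar day (assumed scheme)."""
    return f"tg-{chat_id}-{day.isoformat()}"


def mapping_item(chat_id: int, user_id: int, day: date) -> dict:
    """Shape of the lookup row keyed by telegram_chat_id (illustrative)."""
    return {
        "telegram_chat_id": str(chat_id),
        "agentcore_session_id": derive_session_id(chat_id, day),
        "actor_id": f"tg-user-{user_id}",
    }


item = mapping_item(chat_id=12345, user_id=67890, day=date(2026, 5, 4))
```

Deriving the session ID deterministically means the relay can recompute it without a read, while the table row remains useful if the scheme later changes.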

Session Start Pattern

# Module-level cache: survives across invocations while the microVM stays warm
PERSONA_CACHE = {}

@app.entrypoint
async def main(payload, context):
    actor_id = payload["actor_id"]      # Telegram user ID
    session_id = payload["session_id"]  # from the DynamoDB lookup

    # Load static files once per warm container; an empty dict means "not loaded yet"
    if not PERSONA_CACHE:
        PERSONA_CACHE.update(load_from_s3(["SOUL.md", "AGENTS.md", "USER.md"]))

    # Retrieve relevant long-term memories (semantic search).
    # memory_session is assumed to be created for this actor_id/session_id beforehand.
    memories = memory_session.search_long_term_memories(
        query=payload["message"],
        namespace_prefix=f"/preferences/{actor_id}/",
        top_k=5,
    )

    # Build system prompt
    system_prompt = build_prompt(PERSONA_CACHE, memories)

    # Run agent (session_manager handles turn storage automatically).
    # config is the AgentCoreMemoryConfig built from actor_id and session_id.
    with AgentCoreMemorySessionManager(config) as session_manager:
        agent = Agent(system_prompt=system_prompt, session_manager=session_manager)
        return {"response": agent(payload["message"]).message}

What AgentCore Memory Pricing Covers

From the pricing page and AWS re:Post confirmation:

Built-in strategies (SUMMARIZATION, USER_PREFERENCE, SEMANTIC): model extraction costs are included in Memory pricing
Self-managed strategy: you pay for your own Lambda + Bedrock calls
Memory storage: billed per GB stored
RetrieveMemoryRecords (semantic search): billed per search

Exact rates were not clearly published at the time of research; the expectation is that costs stay low at personal-assistant scale.

Open Questions Remaining

Pricing for AgentCore Memory: exact rates for storage + retrieval not clearly published yet. Need to check when actually provisioning.
S3 persona file cache invalidation: when SOUL.md is updated in S3, the warm container won't know. Need a mechanism — either DynamoDB version flag checked at session start, or just accept ~8hr staleness (fine for persona files).
Self-managed extraction timing: confirm whether idle-session trigger in self-managed strategy fires reliably at session end vs requiring explicit trigger. This determines whether the "write to memory" tool works reliably.
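
For the cache-invalidation question, the version-flag option could look roughly like this. `VersionedCache`, `read_version`, and `load_all` are hypothetical names; in a real setup `read_version` would read a small DynamoDB item and `load_all` would fetch the persona files from S3, but here both are local stand-ins.

```python
class VersionedCache:
    """Reloads cached files only when an external version flag changes."""

    def __init__(self, read_version, load_all):
        self._read_version = read_version  # cheap check, run at session start
        self._load_all = load_all          # expensive reload, run only on change
        self._version = None
        self._data = {}

    def get(self) -> dict:
        current = self._read_version()
        if current != self._version:
            self._data = self._load_all()
            self._version = current
        return self._data


# Local stand-ins for the version item and the persona bucket.
state = {"version": 1, "soul": "v1 persona"}
reloads = []

def fake_load_all() -> dict:
    reloads.append(state["version"])
    return {"SOUL.md": state["soul"]}


cache = VersionedCache(lambda: state["version"], fake_load_all)
before = cache.get()["SOUL.md"]
cache.get()                      # same version: no reload
state.update(version=2, soul="v2 persona")
after = cache.get()["SOUL.md"]   # version bumped: reload
```

The cost is one small read per session start in exchange for bounded staleness; accepting ~8hr staleness instead needs no code at all.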

Research: 2026-05-04. Sources: AgentCore Memory docs (memory-types, memory-strategies, memory-organization, strands integration), AgentCore pricing page.
