# AgentCore Memory — Deep Dive & MEMORY.md Replacement Analysis ## How AgentCore Memory Works ### Architecture Memory is a managed service completely separate from the Runtime container. Two tiers: ``` Short-term (events) Long-term (extracted records) ───────────────────── ────────────────────────────── actor → session → events actor → namespace → records (semantic search available) (cross-session, persistent) ``` ### Short-term Memory - Stores raw conversation events (turns) via `CreateEvent` API - Keyed by `actor_id` + `session_id` - Retrieval: `ListEvents(session_id)` → full conversation history - **Survives microVM termination** — stored in the managed service, not the container - This replaces JSONL session transcripts completely ### Long-term Memory — Three Built-in Strategies Configured when creating a Memory resource. Extraction runs **asynchronously in the background** after each `CreateEvent`. Model costs for built-in extraction are **included in AgentCore Memory pricing** (confirmed by AWS support). | Strategy | What it extracts | Namespace pattern | |---|---|---| | `SUMMARIZATION` | Session summaries | `/summaries/{actorId}/{sessionId}/` | | `USER_PREFERENCE` | Preferences, habits, recurring facts | `/preferences/{actorId}/` | | `SEMANTIC` | Raw facts, entities, knowledge | `/facts/{actorId}/` | All three can run on the same memory resource simultaneously. ### Self-managed Strategy You control the entire extraction pipeline: 1. Configure triggers: message count (`messageCount: 6`), token count (`tokenCount: 1000`), or idle timeout (`idleSessionTimeout: 30`) 2. AgentCore writes conversation payload to **your S3 bucket** 3. Publishes notification to **your SNS topic** 4. Your Lambda picks it up, runs whatever extraction logic you want 5. You write results back via `BatchCreateMemoryRecords` This is the **MEMORY.md pattern but managed in the cloud** — you decide what to write and how. ### Strands Integration The Strands `AgentCoreMemorySessionManager` handles everything automatically: ```python config = AgentCoreMemoryConfig( memory_id=MEMORY_ID, session_id=SESSION_ID, # maps to Telegram chat_id + date actor_id=ACTOR_ID, # = user identity batch_size=5, # buffer 5 turns before flushing to save API calls ) with AgentCoreMemorySessionManager(config) as session_manager: agent = Agent( system_prompt=build_system_prompt(), # SOUL.md + AGENTS.md + retrieved memories session_manager=session_manager, ) response = agent(user_message) # on exit: buffers flushed, async long-term extraction kicks off ``` Every conversation turn is automatically stored. `batch_size` reduces API calls for rapid exchanges. --- ## MEMORY.md vs AgentCore Memory ### What MEMORY.md Does Today - Curated long-term memory the agent manually edits - Loaded wholesale into the system prompt each session - Agent writes specific things it wants to remember - Human-readable markdown ### What AgentCore Memory Provides - **Short-term**: full conversation history per session (replaces JSONL) - **Long-term SUMMARIZATION**: session summaries auto-extracted - **Long-term USER_PREFERENCE**: preferences auto-extracted and consolidated across sessions - **Long-term SEMANTIC**: facts/entities auto-extracted - **Semantic search**: `RetrieveMemoryRecords(query="...")` → relevant memories surfaced into system prompt - **Self-managed strategy**: explicit "write this to memory" control, just like the agent writing MEMORY.md ### Verdict: Replace MEMORY.md with AgentCore Memory AgentCore Memory is strictly more powerful: - Auto-extraction means the agent doesn't have to manually curate (though it can via self-managed strategy) - Semantic search means you don't inject ALL memories into the system prompt — you inject the RELEVANT ones - No MEMORY.md bloat: today MEMORY.md grows unbounded; AgentCore Memory consolidates automatically - Cross-session persistence without any file I/O **The tradeoff**: less direct control over what gets written. Mitigated with self-managed strategy for explicit writes. --- ## The S3 Round-Trip Concern — Addressed Daniel's concern: S3 round-trip on every interaction. With AgentCore Memory + Strands: | What | When | Round-trip? | |---|---|---| | Conversation turns (short-term) | Each turn, async/batched | Non-blocking, buffered by `batch_size` | | Long-term extraction | Background async after turns | Zero latency impact | | Memory retrieval (session start) | Once per session | One `RetrieveMemoryRecords` call, ~50ms | | Personality files (SOUL.md etc.) | Once per session start | See below | **For personality files specifically**: load them once when the session starts, cache in the container's in-memory dict. The same warm microVM handles all messages in an 8-hour session — SOUL.md loads once, not once per message. No per-message S3 calls. In practice, the flow is: ``` Session start (once): 1. Load SOUL.md, AGENTS.md, USER.md from S3 → cache in container memory 2. RetrieveMemoryRecords(query="important context, preferences") → top-k memories 3. Build system_prompt = static_files + retrieved_memories 4. Pass to Strands agent Each message (no extra round-trips): - Strands auto-stores turns to AgentCore Memory (async/batched) - Long-term extraction runs in background ``` --- ## Recommended Storage Architecture ``` ┌─────────────────────────────────────────────────────────────┐ │ S3 (persona bucket) │ │ SOUL.md, AGENTS.md, IDENTITY.md, USER.md, HEARTBEAT.md │ │ → Loaded ONCE at session start, cached in container memory │ │ → Updated rarely (when Daniel edits them) │ └─────────────────────────────────────────────────────────────┘ ┌─────────────────────────────────────────────────────────────┐ │ AgentCore Memory (replaces MEMORY.md + JSONL transcripts) │ │ │ │ Short-term: conversation turns (per session) │ │ → Strands session_manager handles automatically │ │ │ │ Long-term strategies: │ │ SUMMARIZATION → /summaries/{actorId}/{sessionId}/ │ │ USER_PREFERENCE → /preferences/{actorId}/ │ │ SEMANTIC → /facts/{actorId}/ │ │ │ │ Self-managed strategy (for explicit "remember this"): │ │ Trigger: idle timeout or message count │ │ SNS → Lambda → custom extraction → BatchCreateMemoryRecords│ │ → "/curated/{actorId}/" namespace │ │ → This is the MEMORY.md equivalent, automated │ └─────────────────────────────────────────────────────────────┘ ┌─────────────────────────────────────────────────────────────┐ │ DynamoDB │ │ telegram_chat_id → agentcore_session_id + actor_id │ │ heartbeat state (last check timestamps) │ │ cron job definitions │ └─────────────────────────────────────────────────────────────┘ ``` ### Session Start Pattern ```python @app.entrypoint async def main(payload, context): actor_id = payload["actor_id"] # = Telegram user ID session_id = payload["session_id"] # = from DynamoDB lookup # Load static files (once per warm session, cached) if not PERSONA_CACHE.loaded: PERSONA_CACHE.update(load_from_s3(["SOUL.md", "AGENTS.md", "USER.md"])) # Retrieve relevant long-term memories (semantic search) memories = memory_session.search_long_term_memories( query=payload["message"], namespace_prefix=f"/preferences/{actor_id}/", top_k=5 ) # Build system prompt system_prompt = build_prompt(PERSONA_CACHE, memories) # Run agent (session_manager handles turn storage automatically) with AgentCoreMemorySessionManager(config) as session_manager: agent = Agent(system_prompt=system_prompt, session_manager=session_manager) return {"response": agent(payload["message"]).message} ``` --- ## What AgentCore Memory Pricing Covers From the pricing page and AWS re:Post confirmation: - **Built-in strategies** (SUMMARIZATION, USER_PREFERENCE, SEMANTIC): model extraction costs are **included** in Memory pricing - **Self-managed strategy**: you pay for your own Lambda + Bedrock calls - Memory storage: billed per GB stored - `RetrieveMemoryRecords` (semantic search): billed per search Exact rates not yet published clearly, but designed to be low for personal assistant scale. --- ## Open Questions Remaining 1. **Pricing for AgentCore Memory**: exact rates for storage + retrieval not clearly published yet. Need to check when actually provisioning. 2. **S3 persona file cache invalidation**: when SOUL.md is updated in S3, the warm container won't know. Need a mechanism — either DynamoDB version flag checked at session start, or just accept ~8hr staleness (fine for persona files). 3. **Self-managed extraction timing**: confirm whether idle-session trigger in self-managed strategy fires reliably at session end vs requiring explicit trigger. This determines whether the "write to memory" tool works reliably. --- *Research: 2026-05-04. Sources: AgentCore Memory docs (memory-types, memory-strategies, memory-organization, strands integration), AgentCore pricing page.*