Initial research: OpenClaw on AgentCore architecture

- Architecture comparison (OpenClaw daemon vs AgentCore serverless)
- Component compatibility analysis
- Fargate analysis
- AgentCore rebuild plan (Telegram, zero always-on compute)
- Memory strategy: AgentCore Memory + factbase as structured KB
- Serverless relay patterns per channel
- All open questions resolved
- OpenClaw feature delta March→May 2026
- Build phases and cost estimates
2026-05-04 08:28:52 -05:00
parent 4afa16a9cd
commit 0369a74ac1
13 changed files with 1876 additions and 1 deletions
--- a/build-plan.md
+++ b/build-plan.md
@@ -0,0 +1,236 @@
+# Plan: AgentCore-Native OpenClaw (Telegram, Zero Always-On Compute)
+
+## Target Architecture
+
+```
+[Telegram User]
+      │ message
+      ▼
+[Telegram Servers]
+      │ POST (webhook)
+      ▼
+[API Gateway (HTTP API)]
+      │
+      ▼
+[Lambda: tg-ingest]          ← verify secret token, send typing action, enqueue
+      │ SQS message
+      ▼
+[SQS: agent-queue]
+      │ trigger
+      ▼
+[Lambda: agent-runner]       ← load workspace from S3, build system prompt,
+      │ InvokeAgentRuntime       map chat_id → session_id
+      ▼
+[AgentCore Runtime]          ← Strands agent container (ARM64)
+      │ streaming response       tools: web_search, read/write S3, memory
+      ▼
+[Lambda: agent-runner]       ← stream reply back
+      │ Telegram Bot API
+      ▼
+[Telegram User]              ← receives message
+
+[EventBridge Scheduler]      ← every 30m
+      │
+      ▼
+[Lambda: heartbeat-trigger]  ← InvokeAgentRuntime (heartbeat prompt)
+      │ same response routing as agent-runner
+      ▼
+[Telegram Bot API]
+```
+
+**No 24/7 compute anywhere.** Everything is event-driven.
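
The tg-ingest step can stay very small. Below is a minimal sketch with several assumptions. It assumes an API Gateway HTTP API using payload format 2.0, where header names arrive lowercased. It also assumes the webhook was registered with `setWebhook`'s `secret_token`, which Telegram echoes back in the `X-Telegram-Bot-Api-Secret-Token` header. The env var names `TG_WEBHOOK_SECRET` and `AGENT_QUEUE_URL` are hypothetical. The typing action is omitted, and the SQS client is injected so the logic runs without AWS.

```python
import base64
import hmac
import json
import os

# Hypothetical configuration names; the plan does not fix them.
SECRET_TOKEN = os.environ.get("TG_WEBHOOK_SECRET", "")
QUEUE_URL = os.environ.get("AGENT_QUEUE_URL", "")


def handle_webhook(event, sqs, secret_token=SECRET_TOKEN, queue_url=QUEUE_URL):
    """Verify Telegram's secret-token header, enqueue the Update, ack fast."""
    headers = event.get("headers") or {}
    received = headers.get("x-telegram-bot-api-secret-token", "")
    # Constant-time comparison; reject everything if no secret is configured.
    if not secret_token or not hmac.compare_digest(
        received.encode("utf-8"), secret_token.encode("utf-8")
    ):
        return {"statusCode": 403}

    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    update = json.loads(body)

    # The agent work happens downstream; this Lambda only hands off.
    sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(update))
    return {"statusCode": 204}


def lambda_handler(event, context):
    import boto3  # imported lazily so the core logic is testable without it

    return handle_webhook(event, boto3.client("sqs"))
```

Any 2xx tells Telegram the update was delivered. If `send_message` raises, the request fails and Telegram redelivers later. The downstream runner should therefore tolerate a repeated `update_id`.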
+
+---
+
+## What We've Answered
+
+- ✅ AgentCore is the right runtime (stateless container, event-driven)
+- ✅ Telegram supports full webhook mode (all message types)
+- ✅ SQS decoupling handles the webhook ack requirement (respond 204 in <10s)
+- ✅ OpenClaw workspace files (SOUL.md, AGENTS.md, MEMORY.md) reusable via S3
+- ✅ System prompt construction logic is portable (pure string ops)
+- ✅ Tool schemas (web_search, read, write, edit) translatable to Strands @tool
+- ✅ EventBridge handles heartbeat and cron (no gateway process needed)
+- ✅ AgentCore Memory SDK exists and supports conversation history + long-term extraction
+- ✅ InvokeAgentRuntime supports streaming responses
+- ✅ Lifecycle settings: idleRuntimeSessionTimeout is configurable (min 60s, default 900s)
+- ✅ Cold start: Firecracker microVM ~2-5 seconds on first invocation
+- ✅ Language/framework: Python + Strands + bedrock-agentcore SDK (ARM64 container)
+- ✅ AgentCore Memory SDK: MemorySessionManager, actor_id + session_id model, search_long_term_memories()
+
+---
+
+## Open Questions (Not Yet Answered)
+
+### 🔴 Critical — blocks architecture decisions
+
+**Q1: Response routing for async runs**
+When InvokeAgentRuntime is called from the agent-runner Lambda, does it block synchronously until the agent finishes? Lambda max timeout is 15 minutes. AgentCore sessions can run up to 8 hours. What's the maximum synchronous response wait? Is there a callback/webhook pattern for long agent runs, or do we always need to poll?
+
+*Why it matters*: If an agent run takes 3 minutes (web browsing + LLM), the agent-runner Lambda needs to sit open for 3 minutes. That's fine up to ~15 minutes. But longer runs (coding tasks, deep research) need a different pattern.
+
+*Research needed*: InvokeAgentRuntime streaming behavior + max Lambda concurrency implications.
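
For Q1, one way to keep the agent-runner from overrunning its own timeout is to read the stream chunk by chunk and stop when the remaining Lambda time drops below a margin. This sketch assumes the agent streams server-sent-event lines prefixed with `data: `, the pattern AgentCore's streaming examples use for `text/event-stream` responses. Those lines would come from something like `response["response"].iter_lines()`. The 10-second margin is an arbitrary assumption. The deadline is only checked between chunks, so a read that blocks still needs a client-side limit such as botocore's `Config(read_timeout=...)`.

```python
def collect_sse_reply(lines, context, margin_ms=10_000):
    """Join `data:` payloads until the stream ends or the deadline nears.

    `lines` is any iterable of bytes/str lines; `context` is the Lambda
    context object. Returns (text, finished); finished=False means we
    stopped early and the caller should send a partial or fallback reply.
    """
    parts = []
    for raw in lines:
        if context.get_remaining_time_in_millis() < margin_ms:
            return "".join(parts), False
        if not raw:
            continue  # blank lines separate SSE events
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if line.startswith("data: "):
            parts.append(line[len("data: "):])
        # other SSE fields (event:, id:, comments) are ignored in this sketch
    return "".join(parts), True
```

A `finished=False` result on real workloads is the signal that longer runs need the async pattern Q1 asks about.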
+
+**Q2: Session ID strategy and daily session lifecycle**
+`idleRuntimeSessionTimeout` is configurable (60s–8hr, default 15min). For a personal assistant, set it to 4-6 hours — the session stays warm all day. Max lifetime is 8 hours, after which a new session is created.
+
+- Map Telegram `chat_id` → `runtimeSessionId` in DynamoDB (create new session ID at start of day / when previous session maxes out)
+- On new session creation, load MEMORY.md + SOUL.md from S3 into system prompt — that's the context restoration
+- The 8hr session boundary is a daily rhythm, not a UX problem
+
+*Simplified*: One session per user per day. Session stays warm between messages. After 8hr, start a new one and reload workspace from S3.
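
The Q2 rotation rule reduces to a small predicate the agent-runner can evaluate against the stored mapping. This sketch assumes the idle timeout is set to 5 hours (inside the 4-6 hour range above). It rotates a few minutes before the 8hr cap so a run does not straddle it. `SessionRecord` is a hypothetical shape for the DynamoDB item. The 33-character minimum on `runtimeSessionId` reflects our reading of the InvokeAgentRuntime API reference and is worth re-checking in the Phase 0 spike.

```python
import uuid
from dataclasses import dataclass

IDLE_TIMEOUT_S = 5 * 3600   # assumption: idleRuntimeSessionTimeout set to 5h
MAX_LIFETIME_S = 8 * 3600   # 8hr max session lifetime
ROTATE_MARGIN_S = 5 * 60    # rotate early so a run doesn't straddle the cap


@dataclass
class SessionRecord:
    session_id: str
    started_at: float    # epoch seconds when the session was created
    last_used_at: float  # epoch seconds of the last invocation


def session_usable(rec, now):
    """True if the stored session can take another message."""
    if rec is None:
        return False
    if now - rec.last_used_at >= IDLE_TIMEOUT_S:
        return False  # AgentCore has likely already ended it on idle
    if now - rec.started_at >= MAX_LIFETIME_S - ROTATE_MARGIN_S:
        return False
    return True


def new_session_id(chat_id):
    # runtimeSessionId needs at least 33 characters; 32 hex chars plus a
    # "tg-<chat_id>-" prefix always clears that.
    return f"tg-{chat_id}-{uuid.uuid4().hex}"
```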
+
+**Q3: AgentCore Memory — is long-term extraction automatic or manual?**
+The SDK docs mention "long-term memory automatically extracts and stores key insights." Is this extraction triggered on every `add_turns()` call, after a session ends, or does it require an explicit extraction call? Does it cost extra (separate LLM call)?
+
+*Why it matters*: If extraction isn't automatic, MEMORY.md-equivalent content needs to be managed explicitly.
+
+**Q4: Workspace file mutations (MEMORY.md writes) — S3 vs AgentCore Memory**
+When the agent wants to write to MEMORY.md (e.g., "remember this for next time"), there are two paths:
+- Write to S3, reload on next invocation — simple but doesn't benefit from semantic search
+- Write to AgentCore Memory — benefits from extraction + search but changes the access pattern
+
+Which approach for MEMORY.md? Can we use BOTH — S3 for large curated memory, AgentCore Memory for semantic search over conversation history?
+
+**Q5: Cold start UX impact — first session only**
+AgentCore keeps the microVM alive between requests (no cold start for warm sessions). The only startup cost is on the *first* invocation of a brand new session (container image pull + process start). Subsequent requests to the same warm session skip that startup cost and pay only request and model latency.
+
+- Does the Telegram "typing..." indicator cover the one-time startup gap on new session creation?
+- What happens when the Lambda itself is cold (~500ms Lambda cold start, separate from the AgentCore session)?
+
+**Q6: Strands agent + bedrock-agentcore container — ARM64 build complexity**
+AgentCore requires ARM64 containers. Strands is Python. The base image needs:
+- Python 3.11+
+- `strands-agents`, `bedrock-agentcore` pip packages
+- AWS credentials via the runtime's IAM execution role
+- Access to Bedrock models (need to check regional availability for the models we want)
+
+What's the actual container build + push + deploy flow? Is there a starter template?
+
+---
+
+### 🟡 Important — needs answer before first spike
+
+**Q7: Which Bedrock model and region?**
+AgentCore Runtime is available in us-east-1, us-west-2, and several other regions. The Bedrock models we want (Claude Sonnet 4, etc.) need to be available in the same region. Cross-region inference adds latency. 
+
+Need to confirm: which model for the agent (Sonnet? Haiku for speed?), which region for AgentCore, does the region support the model?
+
+**Q8: Telegram → AgentCore payload structure**
+The Telegram Update object contains `message.chat.id`, `message.from.id`, `message.text`, etc. The InvokeAgentRuntime payload is arbitrary JSON. What does the agent container expect to receive? How do we thread Telegram context (group vs DM, sender info, reply_to) through the SQS → Lambda → AgentCore chain?
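
One possible answer to Q8 is to flatten the Update into a small, explicit payload before it leaves the agent-runner. That way the container never parses raw Telegram JSON. The Telegram-side field names (`chat.type`, `from`, `reply_to_message`, `caption`) come from the Bot API. The output schema and `encode_payload` helper are a hypothetical proposal, not a settled contract.

```python
import json
from typing import Optional


def build_agent_payload(update: dict) -> Optional[dict]:
    """Flatten a Telegram Update into the payload sent to InvokeAgentRuntime."""
    msg = update.get("message") or update.get("edited_message")
    if msg is None:
        return None  # e.g. callback queries: out of scope for this sketch
    chat = msg["chat"]
    sender = msg.get("from") or {}
    reply_to = msg.get("reply_to_message")
    return {
        "source": "telegram",
        "update_id": update.get("update_id"),
        "chat_id": chat["id"],
        "chat_type": chat.get("type"),  # private | group | supergroup | channel
        "is_group": chat.get("type") in ("group", "supergroup"),
        "sender": {
            "id": sender.get("id"),
            "username": sender.get("username"),
            "first_name": sender.get("first_name"),
        },
        "message_id": msg.get("message_id"),
        "text": msg.get("text") or msg.get("caption") or "",
        "reply_to": (
            {"message_id": reply_to.get("message_id"), "text": reply_to.get("text")}
            if reply_to
            else None
        ),
    }


def encode_payload(payload: dict) -> bytes:
    # The InvokeAgentRuntime payload is an opaque blob; JSON is our choice.
    return json.dumps(payload).encode("utf-8")
```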
+
+**Q9: Telegram response back to user — token management**
+The agent-runner Lambda needs to call `api.telegram.org/bot{token}/sendMessage` after the agent responds. The Bot Token must be available to the Lambda. Secrets Manager is the right answer — but it needs to be in the architecture from day one.
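
For Q9, the reply itself is a plain HTTPS POST. Telegram caps `sendMessage` text at 4,096 characters, so long agent replies need splitting. This sketch splits on the last newline before the limit. It counts Python characters, which may not match Telegram's counting exactly for some emoji. It uses only the standard library, so the Lambda needs no Telegram SDK; fetching `token` from Secrets Manager is left out.

```python
import json
import urllib.request

TELEGRAM_TEXT_LIMIT = 4096  # max length of sendMessage text


def split_message(text, limit=TELEGRAM_TEXT_LIMIT):
    """Split text into chunks <= limit, preferring newline boundaries."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind("\n", 0, limit)
        if cut <= 0:
            cut = limit  # no usable newline: hard split
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks


def build_send_message(token, chat_id, text):
    """Build the Bot API sendMessage request without sending it."""
    return urllib.request.Request(
        f"https://api.telegram.org/bot{token}/sendMessage",
        data=json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_reply(token, chat_id, text, opener=urllib.request.urlopen):
    """Send each chunk in order; urlopen raises on HTTP errors."""
    for chunk in split_message(text):
        with opener(build_send_message(token, chat_id, chunk), timeout=10) as resp:
            resp.read()
```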
+
+**Q10: Heartbeat response delivery**
+The heartbeat EventBridge rule fires every 30 minutes. The heartbeat Lambda invokes AgentCore. The agent produces a response (either HEARTBEAT_OK to suppress, or an actual message to deliver). 
+
+Where does the heartbeat response go? The Lambda needs to know: "if the agent produces a non-HEARTBEAT_OK response, send it to Telegram chat_id X." This routing config (target Telegram chat ID for heartbeat delivery) needs to be stored somewhere (DynamoDB, Secrets Manager, or baked into the Lambda env).
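
Here is a sketch of the Q10 routing decision. It assumes the simplest option listed: the target chat ID is baked into the Lambda environment as a hypothetical `HEARTBEAT_CHAT_ID`. It treats only an exact `HEARTBEAT_OK` reply, ignoring surrounding whitespace, as the suppression signal. OpenClaw's own matching rules may be looser, so this is not a port of them.

```python
import os

HEARTBEAT_OK = "HEARTBEAT_OK"


def resolve_heartbeat_chat(env=os.environ):
    """Read the delivery target; fail loudly if it was never configured."""
    raw = env.get("HEARTBEAT_CHAT_ID")  # hypothetical variable name
    if not raw:
        raise RuntimeError("HEARTBEAT_CHAT_ID is not set")
    return int(raw)


def plan_heartbeat_delivery(reply, chat_id):
    """Return (chat_id, text) to send, or None when the agent acked silently."""
    text = (reply or "").strip()
    if not text or text == HEARTBEAT_OK:
        return None
    return chat_id, text
```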
+
+**Q11: Multi-turn within a single AgentCore session**
+If a user sends 3 rapid messages (before the session expires), do they all land in the same `runtimeSessionId`? The agent-runner Lambda needs to look up the current active session ID for a given Telegram chat_id from DynamoDB, or create a new one if expired.
+
+Race condition: two messages arrive simultaneously → both Lambdas look up session → both see "no active session" → both create new sessions. Need a DynamoDB conditional write / lock.
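
For the Q11 race, a conditional `put_item` lets exactly one invocation win the session rotation. This sketch assumes a hypothetical table keyed on `chat_id` (stored as a string) and the boto3 `Table` resource API. The first-ever session is written with `attribute_not_exists`. A rotation only succeeds if the stored `session_id` is still the one the caller saw. The loser gets `ConditionalCheckFailedException`, surfaced here as `SessionConflict`, and should re-read the item and use the winner's session.

```python
import time
import uuid


class SessionConflict(Exception):
    """Another invocation created or rotated the session first."""


def claim_session(table, chat_id, previous_session_id=None, now=None):
    """Write a new session for chat_id only if nobody else beat us to it."""
    # int(): the DynamoDB resource API rejects Python floats.
    now = int(time.time() if now is None else now)
    session_id = f"tg-{chat_id}-{uuid.uuid4().hex}"
    item = {
        "chat_id": str(chat_id),
        "session_id": session_id,
        "started_at": now,
        "last_used_at": now,
    }
    if previous_session_id is None:
        condition = {"ConditionExpression": "attribute_not_exists(chat_id)"}
    else:
        condition = {
            "ConditionExpression": "session_id = :prev",
            "ExpressionAttributeValues": {":prev": previous_session_id},
        }
    try:
        table.put_item(Item=item, **condition)
    except Exception as exc:
        # botocore's ClientError carries the service error code here.
        code = (getattr(exc, "response", None) or {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            raise SessionConflict(str(chat_id)) from exc
        raise
    return session_id
```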
+
+**Q12: Telegram send_chat_action ("typing") timing**
+Telegram's chat action expires in ~5 seconds. For a 30-second agent run, we need to refresh the typing indicator periodically. The agent-runner Lambda needs to refresh it while waiting for InvokeAgentRuntime to stream. Is this easy to do in a Lambda while streaming?
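
For Q12, a daemon thread can keep the typing indicator fresh while the runner blocks on the stream. It re-sends the action every few seconds. This sketch assumes a hypothetical `send_action` callable that POSTs `sendChatAction` with `action="typing"`. The 4-second interval sits inside Telegram's 5-second (or less) display window. Joining the thread before the handler returns keeps it from running on into the next invocation on a warm container.

```python
import threading


class TypingIndicator:
    """Re-send Telegram's "typing" chat action until the block exits."""

    def __init__(self, send_action, interval_s=4.0):
        self._send_action = send_action
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            try:
                self._send_action()
            except Exception:
                pass  # a failed indicator must never break the actual reply
            # wait() returns True as soon as stop is set, ending the loop
            if self._stop.wait(self._interval_s):
                return

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._stop.set()
        self._thread.join()
        return False  # never swallow exceptions from the wrapped block
```

Wrapping the stream-reading call in `with TypingIndicator(...):` gives the refresh without touching the streaming code.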
+
+---
+
+### 🟢 Lower priority — figure out during build
+
+**Q13: What tools does the container expose?**
+OpenClaw has ~20 tools. For an MVP, what's the minimum viable tool set?
+- `read_file(path)` — S3 workspace
+- `write_file(path, content)` — S3 workspace
+- `web_search(query)` — Brave API
+- `web_fetch(url)` — HTTP + readability
+- `memory_search(query)` — AgentCore Memory
+- `send_telegram_message(text)` — for multi-message replies? or just return the response?
+
+Tools NOT in scope for v1: exec, browser, canvas, cron management, image generation.
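
The two S3 workspace tools from the list above can be plain functions over an injected S3 client. This sketch assumes a hypothetical bucket with files under a `workspace/` prefix. Normalizing the path against a fake root keeps `../` tricks inside the prefix. In the container, these methods would be wrapped as Strands `@tool` functions; that wiring is omitted so the sketch runs without the SDK.

```python
import posixpath


class Workspace:
    """read_file/write_file over an S3 prefix (hypothetical layout)."""

    def __init__(self, s3, bucket, prefix="workspace/"):
        self.s3 = s3
        self.bucket = bucket
        self.prefix = prefix

    def _key(self, path):
        # Resolve against a fake root so "../" cannot climb above the prefix.
        rel = posixpath.normpath("/" + path).lstrip("/")
        if rel in ("", "."):
            raise ValueError(f"not a file path: {path!r}")
        return self.prefix + rel

    def read_file(self, path):
        obj = self.s3.get_object(Bucket=self.bucket, Key=self._key(path))
        return obj["Body"].read().decode("utf-8")

    def write_file(self, path, content):
        self.s3.put_object(
            Bucket=self.bucket,
            Key=self._key(path),
            Body=content.encode("utf-8"),
            ContentType="text/markdown; charset=utf-8",
        )
        return f"wrote {len(content)} characters to {path}"
```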
+
+**Q14: Cron job management from within the agent**
+OpenClaw lets the agent create/delete cron jobs dynamically. With EventBridge, a `create_cron_job` tool would need to call `put_rule()` and `put_targets()` on the boto3 `events` client. Doable but needs IAM permissions baked in. Scope for v2.
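
Below is a sketch of that v2 tool using the boto3 `events` client. It makes three assumptions. First, the target is a Lambda like heartbeat-trigger that reads a JSON `prompt` (a hypothetical shape). Second, that Lambda already grants `events.amazonaws.com` invoke permission. Third, rule names are forced under a hypothetical `agent-cron-` prefix so the agent cannot overwrite unrelated rules. The name check mirrors EventBridge's rule-name constraints as we understand them: letters, digits, `.`, `-`, `_`, up to 64 characters.

```python
import json
import re

PREFIX = "agent-cron-"  # hypothetical namespace for agent-created rules
NAME_RE = re.compile(r"^[A-Za-z0-9._-]{1,64}$")
SCHEDULE_RE = re.compile(r"^(rate|cron)\(.+\)$")


def create_cron_job(events, name, schedule_expression, target_arn, prompt):
    """Create (or update) an EventBridge rule that re-invokes the agent."""
    rule_name = PREFIX + name
    if not NAME_RE.match(rule_name):
        raise ValueError(f"invalid rule name: {rule_name!r}")
    if not SCHEDULE_RE.match(schedule_expression):
        raise ValueError("expected rate(...) or cron(...)")
    events.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule_expression,
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{
            "Id": "agent",
            "Arn": target_arn,
            # Constant JSON passed to the target instead of the raw event.
            "Input": json.dumps({"job": rule_name, "prompt": prompt}),
        }],
    )
    return rule_name


def delete_cron_job(events, name):
    rule_name = PREFIX + name
    # EventBridge requires removing targets before the rule can be deleted.
    events.remove_targets(Rule=rule_name, Ids=["agent"])
    events.delete_rule(Name=rule_name)
    return rule_name
```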
+
+**Q15: Secrets rotation**
+Bot token, Brave API key, etc. — Secrets Manager. Need to decide: Lambda env vars (loaded on cold start) vs Secrets Manager SDK calls (per-invocation). For personal scale, env vars baked in at deploy time are fine. Secrets Manager adds ~50ms latency per call.
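
There is a middle option between deploy-time env vars and a Secrets Manager call on every invocation. The Lambda can fetch once per container and cache the value in module scope. The ~50ms cost is then paid only on cold start and after an optional TTL, which also picks up rotated values. This sketch injects the client. `get_secret_value(SecretId=...)` returning `SecretString` is the boto3 Secrets Manager call. The 1-hour TTL is an arbitrary assumption.

```python
import time


class SecretCache:
    """Per-container cache for Secrets Manager values with an optional TTL."""

    def __init__(self, client, ttl_s=3600.0, clock=time.monotonic):
        self._client = client
        self._ttl_s = ttl_s
        self._clock = clock
        self._values = {}  # secret_id -> (value, fetched_at)

    def get(self, secret_id):
        now = self._clock()
        cached = self._values.get(secret_id)
        if cached is not None and now - cached[1] < self._ttl_s:
            return cached[0]
        value = self._client.get_secret_value(SecretId=secret_id)["SecretString"]
        self._values[secret_id] = (value, now)
        return value
```

Created once at module level with `boto3.client("secretsmanager")`, the cache survives across warm invocations of the same container.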
+
+**Q16: IaC choice**
+CDK (TypeScript) or Terraform or SAM. CDK is most AWS-native and has the highest-level constructs. SAM is simpler for Lambda-centric stacks. Terraform if portability matters.
+
+---
+
+## Proposed Build Phases
+
+### Phase 0 — Spike (1-2 days)
+Answer Q1, Q2, Q5 by actually running the thing:
+- Deploy the smallest possible Strands container to AgentCore
+- Send it a test InvokeAgentRuntime call
+- Measure cold start latency in practice
+- Test what happens when a session expires and you reinvoke with the same ID
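
For the latency measurement, a tiny harness is enough. It times the first call on a fresh session ID, then repeated calls on the same ID. `invoke` is a hypothetical zero-argument callable wrapping one InvokeAgentRuntime request. The clock is injectable so the sketch can be checked without AWS.

```python
import statistics
import time


def measure_session_latency(invoke, n=5, clock=time.perf_counter):
    """Time n sequential calls; the first approximates the new-session cost."""
    if n < 1:
        raise ValueError("n must be >= 1")
    durations = []
    for _ in range(n):
        start = clock()
        invoke()
        durations.append(clock() - start)
    warm = durations[1:]
    return {
        "first_s": durations[0],
        "warm_median_s": statistics.median(warm) if warm else None,
        "all_s": durations,
    }
```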
+
+### Phase 1 — Telegram → Agent → Response (1 week)
+- API Gateway + tg-ingest Lambda (check the `X-Telegram-Bot-Api-Secret-Token` header set via `setWebhook`, SQS enqueue, return 204)
+- SQS queue
+- agent-runner Lambda (maps chat_id → session_id, invokes AgentCore, sends Telegram reply)
+- AgentCore container: minimal Strands agent, system prompt from S3 workspace, web_search tool
+- S3 workspace bucket with SOUL.md, AGENTS.md, USER.md
+- DynamoDB: chat_id → session_id mapping
+
+**Done when**: can send a Telegram message and get a reply from the agent, personality intact.
+
+### Phase 2 — Memory + Workspace (1 week)
+- AgentCore Memory provisioned (memory_id per user)
+- Conversation history stored after each turn
+- Long-term memory extraction confirmed working
+- MEMORY.md sync pattern: S3 for curated, AgentCore Memory for semantic search
+- write_file / read_file tools pointing at S3 workspace
+
+**Done when**: agent remembers things across sessions (i.e. after a session ends on idle timeout or at the 8hr limit).
+
+### Phase 3 — Heartbeat + Cron (3-4 days)
+- EventBridge rule (every 30m)
+- heartbeat-trigger Lambda
+- HEARTBEAT_OK suppression logic
+- Delivery to configurable Telegram chat ID
+
+**Done when**: heartbeat fires, agent checks HEARTBEAT.md, delivers alerts to Telegram.
+
+### Phase 4 — Polish (ongoing)
+- Typing indicator refresh during long runs
+- Additional tools (image gen, TTS)
+- Error handling + DLQ
+- CDK/IaC for reproducible deploys
+- Cost monitoring
+
+---
+
+## Cost Estimate (Personal Scale, ~50 agent runs/day)
+
+| Service | Est. Monthly Cost | Notes |
+|---|---|---|
+| API Gateway (HTTP) | ~$0.01 | <1M requests/mo |
+| Lambda (ingest + runner + heartbeat) | ~$0.50 | ~2000 invocations/day, avg 30s |
+| SQS | ~$0.00 | Free tier |
+| AgentCore Runtime | ~$5-15 | 50 runs/day × 30s avg × ~$0.0x/compute-sec |
+| AgentCore Memory | TBD | Pricing not fully public yet |
+| S3 (workspace files) | ~$0.01 | <1 MB total |
+| DynamoDB (session mapping) | ~$0.01 | On-demand, minimal reads/writes |
+| Bedrock LLM calls | $20-80 | Same as today — model-dependent |
+| EventBridge | ~$0.00 | ~1,440 heartbeat invocations/mo (48/day) |
+| Secrets Manager | ~$0.40 | $0.40/secret/mo |
+| **Total infra (ex-LLM)** | **~$6-20/mo** | vs ~$26/mo for Fargate |
+
+**Zero always-on compute cost.** Pay only when messages arrive.
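
The Lambda row is easy to sanity-check with a little arithmetic. This sketch assumes AWS's published us-east-1 Lambda prices at the time of writing: ARM duration at $0.0000133334 per GB-second, $0.20 per million requests, and the always-free 400,000 GB-seconds and 1M requests per month. Re-check these before relying on them. The free tier is per account, so all three functions share it.

```python
# Assumed list prices (us-east-1, at the time of writing); re-check before use.
ARM_PRICE_PER_GB_S = 0.0000133334
REQUEST_PRICE_PER_MILLION = 0.20
FREE_GB_S_PER_MONTH = 400_000
FREE_REQUESTS_PER_MONTH = 1_000_000


def lambda_monthly_cost(invocations_per_day, avg_duration_s, memory_mb,
                        days=30, apply_free_tier=True,
                        price_per_gb_s=ARM_PRICE_PER_GB_S):
    """Duration + request cost for one month of Lambda usage."""
    invocations = invocations_per_day * days
    gb_s = invocations * avg_duration_s * memory_mb / 1024
    if apply_free_tier:
        gb_s = max(0.0, gb_s - FREE_GB_S_PER_MONTH)
        invocations = max(0, invocations - FREE_REQUESTS_PER_MONTH)
    return gb_s * price_per_gb_s + invocations / 1_000_000 * REQUEST_PRICE_PER_MILLION
```

Take the table's own inputs: 2,000 invocations/day at 30s. The result swings with memory size. At 128 MB, the ~225,000 GB-s/month fits inside the free tier. At 512 MB, the ~900,000 GB-s/month comes to roughly $6.67 after the free tier. Agent-runner is billed for the whole time it waits on InvokeAgentRuntime, so its memory size and wait time dominate this row.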
+
+---
+
+## Immediate Next Steps
+
+1. **Answer Q1 + Q2 with a spike** — deploy toy Strands container, measure cold start, test session expiry behavior
+2. **Clarify AgentCore Memory extraction** (Q3) — read the full SDK docs + test
+3. **Lock the Telegram payload schema** (Q8) — define what goes in InvokeAgentRuntime payload
+4. **Pick region + model** (Q7) — confirm Sonnet availability in target region
+5. **Start Phase 1 build**
+
+---
+
+*Updated 2026-05-04*