# Plan: AgentCore-Native OpenClaw (Telegram, Zero Always-On Compute)

## Target Architecture

```
[Telegram User]
    │  message
    ▼
[Telegram Servers]
    │  POST (webhook)
    ▼
[API Gateway (HTTP API)]
    │
    ▼
[Lambda: tg-ingest]          ← verify secret token, send typing action, enqueue
    │  SQS message
    ▼
[SQS: agent-queue]
    │  trigger
    ▼
[Lambda: agent-runner]       ← load workspace from S3, build system prompt,
    │  InvokeAgentRuntime       map chat_id → session_id
    ▼
[AgentCore Runtime]          ← Strands agent container (ARM64)
    │  streaming response       tools: web_search, read/write S3, memory
    ▼
[Lambda: agent-runner]       ← stream reply back
    │  Telegram Bot API
    ▼
[Telegram User]              ← receives message

[EventBridge Scheduler]      ← every 30m → Lambda → InvokeAgentRuntime (heartbeat prompt)
    │                            │
    ▼                            ▼  same response routing
[Lambda: heartbeat-trigger]  [Telegram Bot API]
```

**No 24/7 compute anywhere.** Everything is event-driven.

---

## What We've Answered

- ✅ AgentCore is the right runtime (stateless container, event-driven)
- ✅ Telegram supports full webhook mode (all message types)
- ✅ SQS decoupling handles the webhook ack requirement (respond 204 in <10s)
- ✅ OpenClaw workspace files (SOUL.md, AGENTS.md, MEMORY.md) reusable via S3
- ✅ System prompt construction logic is portable (pure string ops)
- ✅ Tool schemas (web_search, read, write, edit) translatable to Strands @tool
- ✅ EventBridge handles heartbeat and cron (no gateway process needed)
- ✅ AgentCore Memory SDK exists and supports conversation history + long-term extraction
- ✅ InvokeAgentRuntime supports streaming responses
- ✅ Lifecycle settings: idleRuntimeSessionTimeout is configurable (min 60s, default 900s)
- ✅ Cold start: Firecracker microVM ~2-5 seconds on first invocation
- ✅ Language/framework: Python + Strands + bedrock-agentcore SDK (ARM64 container)
- ✅ AgentCore Memory SDK: MemorySessionManager, actor_id + session_id model, search_long_term_memories()

---

## Open Questions (Not Yet Answered)

### 🔴 Critical: blocks architecture decisions

**Q1: Response routing for async runs**

When InvokeAgentRuntime is called from the agent-runner Lambda, does it block synchronously until the agent finishes? Lambda max timeout is 15 minutes. AgentCore sessions can run up to 8 hours. What's the maximum synchronous response wait? Is there a callback/webhook pattern for long agent runs, or do we always need to poll?

*Why it matters*: If an agent run takes 3 minutes (web browsing + LLM), the agent-runner Lambda needs to sit open for 3 minutes. That's fine up to ~15 minutes. But longer runs (coding tasks, deep research) need a different pattern.

*Research needed*: InvokeAgentRuntime streaming behavior + max Lambda concurrency implications.

**Q2: Session ID strategy and daily session lifecycle**

`idleRuntimeSessionTimeout` is configurable (60s–8hr, default 15min). For a personal assistant, set it to 4-6 hours so the session stays warm all day. Max lifetime is 8 hours, after which a new session is created.

- Map Telegram `chat_id` → `runtimeSessionId` in DynamoDB (create new session ID at start of day / when previous session maxes out)
- On new session creation, load MEMORY.md + SOUL.md from S3 into the system prompt; that is the context restoration step
- The 8hr session boundary is a daily rhythm, not a UX problem

*Simplified*: One session per user per day. Session stays warm between messages. After 8hr, start a new one and reload workspace from S3.

**Q3: Is AgentCore Memory long-term extraction automatic or manual?**

The SDK docs mention "long-term memory automatically extracts and stores key insights." Is this extraction triggered on every `add_turns()` call, after a session ends, or does it require an explicit extraction call? Does it cost extra (separate LLM call)?

*Why it matters*: If extraction isn't automatic, MEMORY.md-equivalent content needs to be managed explicitly.

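The Q2 mapping (`chat_id` → `runtimeSessionId`) could look roughly like the sketch below. It assumes a DynamoDB table keyed on `chat_id` (string) and accessed through a boto3 `Table`-like object; the attribute names (`session_id`, `created_at`, `last_used_at`), the 5-hour idle value, and the `tg-...` ID format are all placeholders, not decisions. The ID is kept longer than 33 characters because AgentCore reportedly enforces a minimum length on `runtimeSessionId`; confirm that during the spike. The conditional write keeps two concurrent invocations from each creating their own session.

```python
import time
import uuid

IDLE_TIMEOUT_S = 5 * 3600   # assumption: idleRuntimeSessionTimeout set to 5h
MAX_LIFETIME_S = 8 * 3600   # 8h max session lifetime


def is_reusable(item, now):
    """True if the stored session is neither idle-expired nor past max lifetime."""
    if not item:
        return False
    return (now - item["last_used_at"] < IDLE_TIMEOUT_S
            and now - item["created_at"] < MAX_LIFETIME_S)


def _condition_failed(exc):
    """Detect botocore's ConditionalCheckFailedException without importing botocore."""
    code = getattr(exc, "response", {}).get("Error", {}).get("Code")
    return code == "ConditionalCheckFailedException"


def resolve_session(table, chat_id, now=None):
    """Return the session ID to use for chat_id, creating a new one if needed."""
    now = int(time.time()) if now is None else now
    key = {"chat_id": str(chat_id)}
    item = table.get_item(Key=key, ConsistentRead=True).get("Item")
    if is_reusable(item, now):
        return item["session_id"]

    fresh = {**key,
             "session_id": f"tg-{chat_id}-{uuid.uuid4().hex}",
             "created_at": now,
             "last_used_at": now}
    # Only write if nobody replaced (or created) the record since we read it.
    if item:
        cond = {"ConditionExpression": "session_id = :prev",
                "ExpressionAttributeValues": {":prev": item["session_id"]}}
    else:
        cond = {"ConditionExpression": "attribute_not_exists(chat_id)"}
    try:
        table.put_item(Item=fresh, **cond)
        return fresh["session_id"]
    except Exception as exc:
        if not _condition_failed(exc):
            raise
        # Another invocation won the race; use the session it created.
        winner = table.get_item(Key=key, ConsistentRead=True)["Item"]
        return winner["session_id"]
```

Not shown: bumping `last_used_at` after each successful invocation, which the idle check depends on. Timestamps are integers because the boto3 resource layer rejects Python floats.
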
**Q4: Workspace file mutations (MEMORY.md writes): S3 vs AgentCore Memory**

When the agent wants to write to MEMORY.md (e.g., "remember this for next time"), there are two paths:

- Write to S3, reload on next invocation: simple, but doesn't benefit from semantic search
- Write to AgentCore Memory: benefits from extraction + search, but changes the access pattern

Which approach for MEMORY.md? Can we use BOTH: S3 for large curated memory, AgentCore Memory for semantic search over conversation history?

**Q5: Cold start UX impact (first session only)**

AgentCore keeps the microVM alive between requests (no cold start for warm sessions). The only startup cost is on the *first* invocation of a brand new session (container image pull + process start). Subsequent requests to the same warm session are instant.

- Does the Telegram "typing..." indicator cover the one-time startup gap on new session creation?
- What happens when the Lambda itself is cold (~500ms Lambda cold start, separate from the AgentCore session)?

**Q6: Strands agent + bedrock-agentcore container: ARM64 build complexity**

AgentCore requires ARM64 containers. Strands is Python. The base image needs:

- Python 3.11+
- `strands-agents`, `bedrock-agentcore` pip packages
- AWS credentials via the runtime's IAM execution role
- Access to Bedrock models (need to check regional availability for the models we want)

What's the actual container build + push + deploy flow? Is there a starter template?

---

### 🟡 Important: needs answer before first spike

**Q7: Which Bedrock model and region?**

AgentCore Runtime is available in us-east-1, us-west-2, and several other regions. The Bedrock models we want (Claude Sonnet 4, etc.) need to be available in the same region. Cross-region inference adds latency. Need to confirm: which model for the agent (Sonnet? Haiku for speed?), which region for AgentCore, and whether that region supports the model.

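For Q6, the container entrypoint itself looks small. The sketch below follows the shape of the published `bedrock-agentcore` starter example (`BedrockAgentCoreApp`, `@app.entrypoint`, `app.run()`) as best understood at the time of writing; treat the exact import path, the `"prompt"` payload key, and `str(result)` as a way to get reply text from a Strands result as assumptions to verify in the spike. The SDK imports are deferred into `serve()` so the handler logic can be exercised without the SDKs installed.

```python
def make_handler(agent):
    """Wrap any callable agent (text in, result out) as a payload handler."""

    def handle(payload):
        # Assumption: the caller puts the user text under "prompt".
        text = (payload or {}).get("prompt", "").strip()
        if not text:
            return {"error": "empty prompt"}
        result = agent(text)
        # Assumption: str() on the agent result yields the reply text.
        return {"result": str(result)}

    return handle


def serve(system_prompt):
    """Start the AgentCore HTTP server; call this from the container entrypoint."""
    # Deferred imports: only needed inside the ARM64 container.
    from bedrock_agentcore.runtime import BedrockAgentCoreApp
    from strands import Agent

    app = BedrockAgentCoreApp()
    handler = make_handler(Agent(system_prompt=system_prompt))

    @app.entrypoint
    def invoke(payload):
        return handler(payload)

    app.run()
```

For the build side, `docker buildx build --platform linux/arm64` produces the ARM64 image from an x86 machine. There also appears to be a `bedrock-agentcore-starter-toolkit` package with an `agentcore` CLI for configure + launch; whether it fits this setup is worth checking before hand-rolling the push/deploy steps.
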
**Q8: Telegram → AgentCore payload structure**

The Telegram Update object contains `message.chat.id`, `message.from.id`, `message.text`, etc. The InvokeAgentRuntime payload is arbitrary JSON. What does the agent container expect to receive? How do we thread Telegram context (group vs DM, sender info, reply_to) through the SQS → Lambda → AgentCore chain?

**Q9: Telegram response back to user: token management**

The agent-runner Lambda needs to call `api.telegram.org/bot{token}/sendMessage` after the agent responds. The Bot Token must be available to the Lambda. Secrets Manager is the right answer, but it needs to be in the architecture from day one.

**Q10: Heartbeat response delivery**

The heartbeat EventBridge rule fires every 30 minutes. The heartbeat Lambda invokes AgentCore. The agent produces a response (either HEARTBEAT_OK to suppress, or an actual message to deliver). Where does the heartbeat response go? The Lambda needs to know: "if the agent produces a non-HEARTBEAT_OK response, send it to Telegram chat_id X." This routing config (target Telegram chat ID for heartbeat delivery) needs to be stored somewhere (DynamoDB, Secrets Manager, or baked into the Lambda env).

**Q11: Multi-turn within a single AgentCore session**

If a user sends 3 rapid messages (before the session expires), do they all land in the same `runtimeSessionId`? The agent-runner Lambda needs to look up the current active session ID for a given Telegram chat_id from DynamoDB, or create a new one if expired.

Race condition: two messages arrive simultaneously → both Lambdas look up session → both see "no active session" → both create new sessions. Need a DynamoDB conditional write / lock.

**Q12: Telegram send_chat_action ("typing") timing**

Telegram's chat action expires in ~5 seconds. For a 30-second agent run, we need to refresh the typing indicator periodically. The agent-runner Lambda needs to refresh it while waiting for InvokeAgentRuntime to stream.
Is this easy to do in a Lambda while streaming?

---

### 🟢 Lower priority: figure out during build

**Q13: What tools does the container expose?**

OpenClaw has ~20 tools. For an MVP, what's the minimum viable tool set?

- `read_file(path)`: S3 workspace
- `write_file(path, content)`: S3 workspace
- `web_search(query)`: Brave API
- `web_fetch(url)`: HTTP + readability
- `memory_search(query)`: AgentCore Memory
- `send_telegram_message(text)`: for multi-message replies, or just return the response?

Tools NOT in scope for v1: exec, browser, canvas, cron management, image generation.

**Q14: Cron job management from within the agent**

OpenClaw lets the agent create/delete cron jobs dynamically. With EventBridge, a `create_cron_job` tool would need to call `put_rule()` on the EventBridge (`events`) client, or `create_schedule()` if the schedules live in EventBridge Scheduler. Doable but needs IAM permissions baked in. Scope for v2.

**Q15: Secrets rotation**

Bot token, Brave API key, etc. live in Secrets Manager. Need to decide: Lambda env vars (loaded on cold start) vs Secrets Manager SDK calls (per-invocation). For personal scale, env vars baked in at deploy time are fine. Secrets Manager adds ~50ms latency per call.

**Q16: IaC choice**

CDK (TypeScript) or Terraform or SAM. CDK is most AWS-native and has the highest-level constructs. SAM is simpler for Lambda-centric stacks. Terraform if portability matters.

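On Q12: refreshing the typing indicator while the agent-runner Lambda consumes the stream looks manageable with a background thread. The sketch below is one way to do it, not a settled design. `typing_indicator` and `make_send_typing` are hypothetical helper names; the 4-second interval is chosen only to stay under the ~5-second expiry. `sendChatAction` with `action="typing"` is the Bot API method behind the indicator.

```python
import json
import threading
import urllib.request
from contextlib import contextmanager


def make_send_typing(bot_token, chat_id):
    """Build a zero-argument function that posts one 'typing' chat action."""
    url = f"https://api.telegram.org/bot{bot_token}/sendChatAction"
    body = json.dumps({"chat_id": chat_id, "action": "typing"}).encode()

    def send():
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req, timeout=3).close()

    return send


@contextmanager
def typing_indicator(send_action, interval_s=4.0):
    """Call send_action() now and every interval_s seconds until the block exits."""
    stop = threading.Event()

    def loop():
        while True:
            try:
                send_action()
            except Exception:
                pass  # a failed refresh should never break the reply path
            if stop.wait(interval_s):  # returns True once stop is set
                return

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    try:
        yield
    finally:
        stop.set()
        worker.join(timeout=5)


# Intended use inside agent-runner (names are placeholders):
#
#   with typing_indicator(make_send_typing(token, chat_id)):
#       reply = consume_stream(invoke_agent(...))
#   send_message(chat_id, reply)
```

The thread only runs while the handler is executing, so nothing outlives the invocation.
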
---

## Proposed Build Phases

### Phase 0: Spike (1-2 days)

Answer Q1, Q2, Q5 by actually running the thing:

- Deploy the smallest possible Strands container to AgentCore
- Send it a test InvokeAgentRuntime call
- Measure cold start latency in practice
- Test what happens when a session expires and you reinvoke with the same ID

### Phase 1: Telegram → Agent → Response (1 week)

- API Gateway + tg-ingest Lambda (verify the webhook secret token, SQS enqueue, return 204)
- SQS queue
- agent-runner Lambda (maps chat_id → session_id, invokes AgentCore, sends Telegram reply)
- AgentCore container: minimal Strands agent, system prompt from S3 workspace, web_search tool
- S3 workspace bucket with SOUL.md, AGENTS.md, USER.md
- DynamoDB: chat_id → session_id mapping

**Done when**: can send a Telegram message and get a reply from the agent, personality intact.

### Phase 2: Memory + Workspace (1 week)

- AgentCore Memory provisioned (memory_id per user)
- Conversation history stored after each turn
- Long-term memory extraction confirmed working
- MEMORY.md sync pattern: S3 for curated, AgentCore Memory for semantic search
- write_file / read_file tools pointing at S3 workspace

**Done when**: agent remembers things across sessions (>15min gaps).

### Phase 3: Heartbeat + Cron (3-4 days)

- EventBridge rule (every 30m)
- heartbeat-trigger Lambda
- HEARTBEAT_OK suppression logic
- Delivery to configurable Telegram chat ID

**Done when**: heartbeat fires, agent checks HEARTBEAT.md, delivers alerts to Telegram.

### Phase 4: Polish (ongoing)

- Typing indicator refresh during long runs
- Additional tools (image gen, TTS)
- Error handling + DLQ
- CDK/IaC for reproducible deploys
- Cost monitoring

---

## Cost Estimate (Personal Scale, ~50 agent runs/day)

| Service | Est. Monthly Cost | Notes |
|---|---|---|
| API Gateway (HTTP) | ~$0.01 | <1M requests/mo |
| Lambda (ingest + runner + heartbeat) | ~$0.50 | ~2000 invocations/day, avg 30s |
| SQS | ~$0.00 | Free tier |
| AgentCore Runtime | ~$5-15 | 50 runs/day × 30s avg × ~$0.0x/compute-sec |
| AgentCore Memory | TBD | Pricing not fully public yet |
| S3 (workspace files) | ~$0.01 | <1 MB total |
| DynamoDB (session mapping) | ~$0.01 | On-demand, minimal reads/writes |
| Bedrock LLM calls | $20-80 | Same as today; model-dependent |
| EventBridge | ~$0.00 | <100 rules/events/mo |
| Secrets Manager | ~$0.40 | $0.40/secret/mo |
| **Total infra (ex-LLM)** | **~$6-20/mo** | vs ~$26/mo for Fargate |

**Zero always-on compute cost.** Pay only when messages arrive.

---

## Immediate Next Steps

1. **Answer Q1 + Q2 with a spike**: deploy toy Strands container, measure cold start, test session expiry behavior
2. **Clarify AgentCore Memory extraction** (Q3): read the full SDK docs + test
3. **Lock the Telegram payload schema** (Q8): define what goes in InvokeAgentRuntime payload
4. **Pick region + model** (Q7): confirm Sonnet availability in target region
5. **Start Phase 1 build**

---

*Updated 2026-05-04*