
daniel 0369a74ac1 Initial research: OpenClaw on AgentCore architecture

- Architecture comparison (OpenClaw daemon vs AgentCore serverless)
- Component compatibility analysis
- Fargate analysis
- AgentCore rebuild plan (Telegram, zero always-on compute)
- Memory strategy: AgentCore Memory + factbase as structured KB
- Serverless relay patterns per channel
- All open questions resolved
- OpenClaw feature delta March→May 2026
- Build phases and cost estimates

2026-05-04 08:28:52 -05:00


Plan: AgentCore-Native OpenClaw (Telegram, Zero Always-On Compute)

Target Architecture

[Telegram User]
      │ message
      ▼
[Telegram Servers]
      │ POST (webhook)
      ▼
[API Gateway (HTTP API)]
      │
      ▼
[Lambda: tg-ingest]          ← verify secret token, send typing action, enqueue
      │ SQS message
      ▼
[SQS: agent-queue]
      │ trigger
      ▼
[Lambda: agent-runner]       ← load workspace from S3, build system prompt,
      │ InvokeAgentRuntime       map chat_id → session_id
      ▼
[AgentCore Runtime]          ← Strands agent container (ARM64)
      │ streaming response       tools: web_search, read/write S3, memory
      ▼
[Lambda: agent-runner]       ← stream reply back
      │ Telegram Bot API
      ▼
[Telegram User]              ← receives message

[EventBridge Scheduler]      ← every 30m → Lambda → InvokeAgentRuntime (heartbeat prompt)
      │                                                      │
      ▼                                                      ▼ same response routing
[Lambda: heartbeat-trigger]                          [Telegram Bot API]

No 24/7 compute anywhere. Everything is event-driven.
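The tg-ingest step in the diagram can be sketched as a pure function, shown here only as one possible shape. It assumes the webhook was registered with Telegram's `secret_token` option, so each call carries the `X-Telegram-Bot-Api-Secret-Token` header. The names `handle_webhook` and `enqueue` are hypothetical; in a real Lambda, `enqueue` would presumably wrap an SQS `send_message` call.

```python
import hmac
import json
from typing import Callable, Mapping

# Header Telegram sends when the webhook was registered with secret_token.
SECRET_HEADER = "x-telegram-bot-api-secret-token"


def handle_webhook(
    headers: Mapping[str, str],
    body: str,
    expected_secret: str,
    enqueue: Callable[[str], None],
) -> dict:
    """Validate a Telegram webhook call and hand the raw update to a queue."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    supplied = lowered.get(SECRET_HEADER, "")
    # Constant-time comparison avoids leaking the secret through timing.
    if not hmac.compare_digest(supplied.encode(), expected_secret.encode()):
        return {"statusCode": 403}
    try:
        update = json.loads(body)
    except json.JSONDecodeError:
        # Acknowledge anyway so Telegram does not keep retrying a bad payload.
        return {"statusCode": 204}
    # Only message updates are forwarded in this sketch.
    if isinstance(update, dict) and "message" in update:
        enqueue(json.dumps(update))
    return {"statusCode": 204}
```

Keeping the handler free of AWS clients makes it easy to unit test; the Lambda entry point only has to pass in the request headers, the body, and a small closure around the queue.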

What We've Answered

✅ AgentCore is the right runtime (stateless container, event-driven)
✅ Telegram supports full webhook mode (all message types)
✅ SQS decoupling handles the webhook ack requirement (respond 204 in <10s)
✅ OpenClaw workspace files (SOUL.md, AGENTS.md, MEMORY.md) reusable via S3
✅ System prompt construction logic is portable (pure string ops)
✅ Tool schemas (web_search, read, write, edit) translatable to Strands @tool
✅ EventBridge handles heartbeat and cron (no gateway process needed)
✅ AgentCore Memory SDK exists and supports conversation history + long-term extraction
✅ InvokeAgentRuntime supports streaming responses
✅ Lifecycle settings: idleRuntimeSessionTimeout is configurable (min 60s, default 900s)
✅ Cold start: Firecracker microVM ~2-5 seconds on first invocation
✅ Language/framework: Python + Strands + bedrock-agentcore SDK (ARM64 container)
✅ AgentCore Memory SDK: MemorySessionManager, actor_id + session_id model, search_long_term_memories()

Open Questions (Not Yet Answered)

🔴 Critical — blocks architecture decisions

Q1: Response routing for async runs

When InvokeAgentRuntime is called from the agent-runner Lambda, does it block synchronously until the agent finishes? Lambda max timeout is 15 minutes. AgentCore sessions can run up to 8 hours. What's the maximum synchronous response wait? Is there a callback/webhook pattern for long agent runs, or do we always need to poll?

Why it matters: If an agent run takes 3 minutes (web browsing + LLM), the agent-runner Lambda needs to sit open for 3 minutes. That's fine up to ~15 minutes. But longer runs (coding tasks, deep research) need a different pattern.

Research needed: InvokeAgentRuntime streaming behavior + max Lambda concurrency implications.
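Until Q1 is answered, one defensive pattern is to read the stream only while the Lambda still has time left. The sketch below is an assumption about how that could look, not a confirmed AgentCore behaviour: `collect_until_deadline` and `SAFETY_MARGIN_MS` are hypothetical names, `chunks` stands in for whatever iterable the streaming response yields, and `remaining_ms` would be the Lambda context's `get_remaining_time_in_millis` method.

```python
from typing import Callable, Iterable, List, Tuple

SAFETY_MARGIN_MS = 10_000  # hypothetical headroom kept for sending the reply


def collect_until_deadline(
    chunks: Iterable[str],
    remaining_ms: Callable[[], int],
    margin_ms: int = SAFETY_MARGIN_MS,
) -> Tuple[str, bool]:
    """Concatenate streamed chunks, stopping when the time budget is nearly spent.

    Returns (text, completed). completed is False when reading stopped early.
    """
    parts: List[str] = []
    iterator = iter(chunks)
    while True:
        # Check the budget before blocking on the next chunk.
        if remaining_ms() <= margin_ms:
            return "".join(parts), False
        try:
            chunk = next(iterator)
        except StopIteration:
            return "".join(parts), True
        parts.append(chunk)
```

The check only runs between chunks, so a stream that stalls for a long time inside a single read would still hit the Lambda timeout. Runs that regularly exceed the budget need the separate pattern the question asks about.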

Q2: Session ID strategy and daily session lifecycle

idleRuntimeSessionTimeout is configurable (60s–8hr, default 15min). For a personal assistant, set it to 4-6 hours so the session stays warm through the working day. Max lifetime is 8 hours, after which a new session is created.

Map Telegram chat_id → runtimeSessionId in DynamoDB (create new session ID at start of day / when previous session maxes out)
On new session creation, load MEMORY.md + SOUL.md from S3 into system prompt — that's the context restoration
The 8hr session boundary is a daily rhythm, not a UX problem

Simplified: One session per user per day. Session stays warm between messages. After 8hr, start a new one and reload workspace from S3.
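The "one session per user per day" rule can be made deterministic, so that any Lambda derives the same ID without coordination. This is a sketch under assumptions: `daily_session_id` and the `generation` counter are hypothetical, and the ID is deliberately long (44 characters) because the runtime session ID is assumed to have a minimum length that should be checked against the API reference.

```python
import hashlib
from datetime import datetime, timezone


def daily_session_id(chat_id: int, now: datetime, generation: int = 0) -> str:
    """Derive a stable session ID for one chat on one UTC day.

    generation is bumped when the day's session reaches the 8-hour maximum
    and a fresh session is needed before the date changes.
    """
    day = now.astimezone(timezone.utc).strftime("%Y%m%d")
    seed = f"{chat_id}:{day}:{generation}"
    # Hashing keeps the raw chat_id out of the session ID itself.
    digest = hashlib.sha256(seed.encode()).hexdigest()[:32]
    return f"tg-{day}-{digest}"
```

The DynamoDB mapping is still useful for tracking the current `generation` and expiry, but the ID itself no longer depends on who wrote the row first.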

Q3: Is AgentCore Memory long-term extraction automatic or manual?

The SDK docs mention "long-term memory automatically extracts and stores key insights." Is this extraction triggered on every add_turns() call, after a session ends, or does it require an explicit extraction call? Does it cost extra (separate LLM call)?

Why it matters: If extraction isn't automatic, MEMORY.md-equivalent content needs to be managed explicitly.

Q4: Workspace file mutations (MEMORY.md writes): S3 vs AgentCore Memory

When the agent wants to write to MEMORY.md (e.g., "remember this for next time"), there are two paths:

Write to S3, reload on next invocation — simple but doesn't benefit from semantic search
Write to AgentCore Memory — benefits from extraction + search but changes the access pattern

Which approach for MEMORY.md? Can we use BOTH — S3 for large curated memory, AgentCore Memory for semantic search over conversation history?

Q5: Cold start UX impact (first session only)

AgentCore keeps the microVM alive between requests (no cold start for warm sessions). The only startup cost is on the first invocation of a brand new session (container image pull + process start). Subsequent requests to the same warm session are instant.

Does the Telegram "typing..." indicator cover the one-time startup gap on new session creation?
What happens when the Lambda itself is cold (~500ms Lambda cold start, separate from the AgentCore session)?

Q6: Strands agent + bedrock-agentcore container: ARM64 build complexity

AgentCore requires ARM64 containers. Strands is Python. The base image needs:

Python 3.11+
strands-agents, bedrock-agentcore pip packages
AWS credentials via the runtime's IAM execution role
Access to Bedrock models (need to check regional availability for the models we want)

What's the actual container build + push + deploy flow? Is there a starter template?

🟡 Important — needs answer before first spike

Q7: Which Bedrock model and region?

AgentCore Runtime is available in us-east-1, us-west-2, and several other regions. The Bedrock models we want (Claude Sonnet 4, etc.) need to be available in the same region. Cross-region inference adds latency.

Need to confirm: which model for the agent (Sonnet? Haiku for speed?), which region for AgentCore, does the region support the model?

Q8: Telegram → AgentCore payload structure

The Telegram Update object contains message.chat.id, message.from.id, message.text, etc. The InvokeAgentRuntime payload is arbitrary JSON. What does the agent container expect to receive? How do we thread Telegram context (group vs DM, sender info, reply_to) through the SQS → Lambda → AgentCore chain?
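One way to pin the schema down is a small mapping function that both the Lambda and the container agree on. The field names on the left (`prompt`, `channel`, `chat_type`, and so on) are a hypothetical proposal, not an AgentCore requirement; the fields read from the update are standard Telegram Bot API fields.

```python
from typing import Optional


def build_agent_payload(update: dict) -> Optional[dict]:
    """Flatten a Telegram Update into the JSON payload sent to the agent.

    Returns None for updates this sketch does not handle (no text message).
    """
    message = update.get("message")
    if not message or "text" not in message:
        return None
    chat = message["chat"]
    sender = message.get("from", {})
    reply = message.get("reply_to_message")
    return {
        "prompt": message["text"],
        "channel": "telegram",
        "chat_id": chat["id"],
        # Telegram chat types: private, group, supergroup, channel.
        "chat_type": chat.get("type", "private"),
        "sender_id": sender.get("id"),
        "sender_name": sender.get("first_name"),
        "message_id": message.get("message_id"),
        "reply_to_text": reply.get("text") if reply else None,
    }
```

Carrying `chat_id` and `message_id` inside the payload means the response path never has to re-parse the original update.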

Q9: Token management for the Telegram response back to the user

The agent-runner Lambda needs to call api.telegram.org/bot{token}/sendMessage after the agent responds. The Bot Token must be available to the Lambda. Secrets Manager is the right answer, but it needs to be in the architecture from day one.
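The reply path can be sketched with the standard library alone. Telegram limits a message's text to 4096 characters, so long agent replies have to be split first. `split_for_telegram` and `build_send_request` are hypothetical helper names, and the token is assumed to have been loaded from Secrets Manager before this code runs.

```python
import json
import urllib.request
from typing import List

TELEGRAM_TEXT_LIMIT = 4096  # maximum characters per sendMessage call


def split_for_telegram(text: str, limit: int = TELEGRAM_TEXT_LIMIT) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring newlines."""
    chunks: List[str] = []
    remaining = text
    while len(remaining) > limit:
        # Look for the last newline that still keeps the chunk within the limit.
        cut = remaining.rfind("\n", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable newline, fall back to a hard cut
        chunks.append(remaining[:cut])
        remaining = remaining[cut:].lstrip("\n")
    if remaining:
        chunks.append(remaining)
    return chunks


def build_send_request(token: str, chat_id: int, text: str) -> urllib.request.Request:
    """Build (but do not send) the sendMessage POST request."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    data = json.dumps({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
```

Sending is then one `urllib.request.urlopen(request)` call per chunk. Because the token is part of the URL, request URLs should be kept out of logs.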

Q10: Heartbeat response delivery

The heartbeat EventBridge rule fires every 30 minutes. The heartbeat Lambda invokes AgentCore. The agent produces a response (either HEARTBEAT_OK to suppress, or an actual message to deliver).

Where does the heartbeat response go? The Lambda needs to know: "if the agent produces a non-HEARTBEAT_OK response, send it to Telegram chat_id X." This routing config (target Telegram chat ID for heartbeat delivery) needs to be stored somewhere (DynamoDB, Secrets Manager, or baked into the Lambda env).
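The suppression rule described above is small enough to show in full. This is a sketch: `heartbeat_delivery` is a hypothetical name, and `target_chat_id` is assumed to come from whichever store is chosen (DynamoDB, Secrets Manager, or a Lambda environment variable).

```python
from typing import Optional

HEARTBEAT_OK = "HEARTBEAT_OK"


def heartbeat_delivery(agent_reply: str, target_chat_id: int) -> Optional[dict]:
    """Decide whether a heartbeat reply should be delivered to Telegram.

    Returns the sendMessage parameters, or None when nothing should be sent.
    """
    text = agent_reply.strip()
    # An empty reply or the sentinel means "nothing to report".
    if not text or text == HEARTBEAT_OK:
        return None
    return {"chat_id": target_chat_id, "text": text}
```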

Q11: Multi-turn within a single AgentCore session

If a user sends 3 rapid messages (before the session expires), do they all land in the same runtimeSessionId? The agent-runner Lambda needs to look up the current active session ID for a given Telegram chat_id from DynamoDB, or create a new one if expired.

Race condition: two messages arrive simultaneously → both Lambdas look up session → both see "no active session" → both create new sessions. Need a DynamoDB conditional write / lock.
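The conditional write can be expressed as the arguments to a DynamoDB `put_item` call. The sketch below only builds those arguments, so it runs without AWS access. The table layout (`chat_id` as partition key, plus `session_id` and `expires_at` attributes) and the function name are assumptions for illustration.

```python
def claim_session_request(
    table_name: str,
    chat_id: int,
    session_id: str,
    now_epoch: int,
    ttl_seconds: int,
) -> dict:
    """Build put_item arguments that succeed only if no live session exists."""
    return {
        "TableName": table_name,
        "Item": {
            "chat_id": {"S": str(chat_id)},
            "session_id": {"S": session_id},
            "expires_at": {"N": str(now_epoch + ttl_seconds)},
        },
        # Write only when the row is missing or the stored session has expired.
        "ConditionExpression": "attribute_not_exists(chat_id) OR expires_at < :now",
        "ExpressionAttributeValues": {":now": {"N": str(now_epoch)}},
    }
```

With a boto3 DynamoDB client the result would be passed as `client.put_item(**request)`. The Lambda that loses the race receives a `ConditionalCheckFailedException` and should then read the winner's row with `get_item` and reuse that `session_id`.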

Q12: Telegram send_chat_action ("typing") timing

Telegram's chat action expires in ~5 seconds. For a 30-second agent run, we need to refresh the typing indicator periodically. The agent-runner Lambda needs to refresh it while waiting for InvokeAgentRuntime to stream. Is this easy to do in a Lambda while streaming?
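A background thread is probably the simplest way to refresh the indicator while the main thread reads the stream. The context manager below is a sketch: `typing_indicator` is a hypothetical name, `send_typing` is assumed to wrap a `sendChatAction` call with `action="typing"`, and the 4-second default interval is chosen to stay under the ~5-second expiry.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def typing_indicator(send_typing: Callable[[], None], interval: float = 4.0) -> Iterator[None]:
    """Call send_typing immediately and then every `interval` seconds until exit."""
    stop = threading.Event()

    def loop() -> None:
        while True:
            try:
                send_typing()
            except Exception:
                pass  # a failed refresh should never break the agent run
            # wait() returns True as soon as stop is set, ending the loop.
            if stop.wait(interval):
                return

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    try:
        yield
    finally:
        stop.set()
        worker.join()
```

Usage would be `with typing_indicator(send): reply = run_agent(...)`, where `run_agent` is whatever function performs the streaming invocation. The worker is joined on exit, so no refresh is sent after the reply is ready.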

🟢 Lower priority — figure out during build

Q13: What tools does the container expose?

OpenClaw has ~20 tools. For an MVP, what's the minimum viable tool set?

read_file(path) — S3 workspace
write_file(path, content) — S3 workspace
web_search(query) — Brave API
web_fetch(url) — HTTP + readability
memory_search(query) — AgentCore Memory
send_telegram_message(text) — for multi-message replies? or just return the response?

Tools NOT in scope for v1: exec, browser, canvas, cron management, image generation.

Q14: Cron job management from within the agent

OpenClaw lets the agent create/delete cron jobs dynamically. On AWS, a create_cron_job tool would need to call EventBridge Scheduler's create_schedule() (or put_rule() plus put_targets() on the boto3 events client for classic EventBridge rules). Doable, but it needs IAM permissions baked in. Scope for v2.

Q15: Secrets rotation

Bot token, Brave API key, etc. live in Secrets Manager. Need to decide: Lambda env vars (loaded on cold start) vs Secrets Manager SDK calls (per-invocation). For personal scale, env vars baked in at deploy time are fine, though a rotated secret then requires a redeploy. Secrets Manager adds ~50ms latency per call.
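A middle path between the two options is to fetch each secret once per Lambda execution environment and cache it in module scope. This is a sketch: `get_secret` is a hypothetical helper, and `fetch` is assumed to wrap the Secrets Manager `get_secret_value` call and return its `SecretString`.

```python
from typing import Callable, Dict

# Module-level state survives across invocations of a warm Lambda.
_cache: Dict[str, str] = {}


def get_secret(name: str, fetch: Callable[[str], str]) -> str:
    """Return the secret, calling fetch only on the first request for `name`."""
    if name not in _cache:
        _cache[name] = fetch(name)
    return _cache[name]
```

The per-call latency is then paid only on cold start. The trade-off is that a rotated secret is not picked up until the execution environment is recycled, which seems acceptable at personal scale.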

Q16: IaC choice

CDK (TypeScript), Terraform, or SAM. CDK is the most AWS-native and has the highest-level constructs. SAM is simpler for Lambda-centric stacks. Terraform makes sense if portability matters.

Proposed Build Phases

Phase 0 — Spike (1-2 days)

Answer Q1, Q2, Q5 by actually running the thing:

Deploy the smallest possible Strands container to AgentCore
Send it a test InvokeAgentRuntime call
Measure cold start latency in practice
Test what happens when a session expires and you reinvoke with the same ID

Phase 1 — Telegram → Agent → Response (1 week)

API Gateway + tg-ingest Lambda (verify secret token, SQS enqueue, return 204)
SQS queue
agent-runner Lambda (maps chat_id → session_id, invokes AgentCore, sends Telegram reply)
AgentCore container: minimal Strands agent, system prompt from S3 workspace, web_search tool
S3 workspace bucket with SOUL.md, AGENTS.md, USER.md
DynamoDB: chat_id → session_id mapping

Done when: can send a Telegram message and get a reply from the agent, personality intact.
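Since system prompt construction is described as pure string operations, the Phase 1 version could be as small as the sketch below. The file order and the `## <filename>` section headers are assumptions made for illustration, not OpenClaw's actual assembly logic; `files` is assumed to map workspace file names to their contents as already read from S3.

```python
from typing import Mapping, Sequence

# Hypothetical ordering: personality first, memory last.
DEFAULT_ORDER = ("SOUL.md", "AGENTS.md", "USER.md", "MEMORY.md")


def build_system_prompt(
    files: Mapping[str, str], order: Sequence[str] = DEFAULT_ORDER
) -> str:
    """Join the workspace files into one system prompt, skipping empty ones."""
    sections = []
    for name in order:
        content = files.get(name, "").strip()
        if content:
            sections.append(f"## {name}\n{content}")
    return "\n\n".join(sections)
```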

Phase 2 — Memory + Workspace (1 week)

AgentCore Memory provisioned (memory_id per user)
Conversation history stored after each turn
Long-term memory extraction confirmed working
MEMORY.md sync pattern: S3 for curated, AgentCore Memory for semantic search
write_file / read_file tools pointing at S3 workspace

Done when: agent remembers things across sessions (>15min gaps).

Phase 3 — Heartbeat + Cron (3-4 days)

EventBridge rule (every 30m)
heartbeat-trigger Lambda
HEARTBEAT_OK suppression logic
Delivery to configurable Telegram chat ID

Done when: heartbeat fires, agent checks HEARTBEAT.md, delivers alerts to Telegram.

Phase 4 — Polish (ongoing)

Typing indicator refresh during long runs
Additional tools (image gen, TTS)
Error handling + DLQ
CDK/IaC for reproducible deploys
Cost monitoring
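For the error handling and DLQ item, the agent-runner can report failures per message so that one bad update does not force the whole SQS batch to be retried. This sketch assumes the event source mapping has `ReportBatchItemFailures` enabled and that the queue has a redrive policy pointing at a dead-letter queue; `handle_sqs_batch` and `process` are hypothetical names.

```python
import json
from typing import Callable


def handle_sqs_batch(event: dict, process: Callable[[dict], None]) -> dict:
    """Process each SQS record and report only the failed ones back to Lambda."""
    failures = []
    for record in event.get("Records", []):
        try:
            process(json.loads(record["body"]))
        except Exception:
            # SQS retries this message; after maxReceiveCount it moves to the DLQ.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Returning an empty `batchItemFailures` list tells Lambda the whole batch succeeded, so the messages are deleted from the queue.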

Cost Estimate (Personal Scale, ~50 agent runs/day)

Service	Est. Monthly Cost	Notes
API Gateway (HTTP)	~$0.01	<1M requests/mo
Lambda (ingest + runner + heartbeat)	~$0.50	~2000 invocations/day, avg 30s
SQS	~$0.00	Free tier
AgentCore Runtime	~$5-15	50 runs/day × 30s avg × ~$0.0x/compute-sec
AgentCore Memory	TBD	Pricing not fully public yet
S3 (workspace files)	~$0.01	<1 MB total
DynamoDB (session mapping)	~$0.01	On-demand, minimal reads/writes
Bedrock LLM calls	$20-80	Same as today — model-dependent
EventBridge	~$0.00	~1,440 heartbeat invocations/mo (48/day)
Secrets Manager	~$0.40	$0.40/secret/mo
Total infra (ex-LLM)	~$6-20/mo	vs ~$26/mo for Fargate

Zero always-on compute cost. Pay only when messages arrive.

Immediate Next Steps

Answer Q1 + Q2 with a spike — deploy toy Strands container, measure cold start, test session expiry behavior
Clarify AgentCore Memory extraction (Q3) — read the full SDK docs + test
Lock the Telegram payload schema (Q8) — define what goes in InvokeAgentRuntime payload
Pick region + model (Q7) — confirm Sonnet availability in target region
Start Phase 1 build

Updated 2026-05-04
