Initial research: OpenClaw on AgentCore architecture
- Architecture comparison (OpenClaw daemon vs AgentCore serverless)
- Component compatibility analysis
- Fargate analysis
- AgentCore rebuild plan (Telegram, zero always-on compute)
- Memory strategy: AgentCore Memory + factbase as structured KB
- Serverless relay patterns per channel
- All open questions resolved
- OpenClaw feature delta March→May 2026
- Build phases and cost estimates
236
build-plan.md
Normal file
@@ -0,0 +1,236 @@
# Plan: AgentCore-Native OpenClaw (Telegram, Zero Always-On Compute)

## Target Architecture

```
[Telegram User]
    │ message
    ▼
[Telegram Servers]
    │ POST (webhook)
    ▼
[API Gateway (HTTP API)]
    │
    ▼
[Lambda: tg-ingest]          ← verify secret token, send typing action, enqueue
    │ SQS message
    ▼
[SQS: agent-queue]
    │ trigger
    ▼
[Lambda: agent-runner]       ← load workspace from S3, build system prompt,
    │ InvokeAgentRuntime       map chat_id → session_id
    ▼
[AgentCore Runtime]          ← Strands agent container (ARM64)
    │ streaming response       tools: web_search, read/write S3, memory
    ▼
[Lambda: agent-runner]       ← stream reply back
    │ Telegram Bot API
    ▼
[Telegram User]              ← receives message

[EventBridge Scheduler]      ← every 30m → Lambda → InvokeAgentRuntime (heartbeat prompt)
    │                            │
    ▼                            ▼  same response routing
[Lambda: heartbeat-trigger]  [Telegram Bot API]
```

**No 24/7 compute anywhere.** Everything is event-driven.
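
A minimal sketch of the tg-ingest step, assuming the webhook was registered with Telegram's `secret_token` option (Telegram then echoes that value in the `X-Telegram-Bot-Api-Secret-Token` header) and that API Gateway delivers an HTTP API (payload v2) event. `handle_webhook` and the `enqueue` callable are hypothetical names for illustration; the typing action is left out to keep the example short.

```python
import hmac
import json


def handle_webhook(event, expected_secret, enqueue):
    """Validate a Telegram webhook call and hand the update to a queue.

    `event` is an API Gateway HTTP API event, `expected_secret` is the value
    registered via setWebhook's secret_token, and `enqueue` is any callable
    that accepts the message body as a string (assumed, e.g. an SQS send).
    """
    # Header names are matched case-insensitively.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    supplied = headers.get("x-telegram-bot-api-secret-token", "")
    # Constant-time comparison avoids leaking the secret through timing.
    if not hmac.compare_digest(supplied.encode(), expected_secret.encode()):
        return {"statusCode": 403}

    try:
        update = json.loads(event.get("body") or "")
    except json.JSONDecodeError:
        return {"statusCode": 400}

    # Acknowledge quickly; the slow agent run happens behind the queue.
    enqueue(json.dumps(update))
    return {"statusCode": 204}
```

In the Lambda itself, `enqueue` could be something like `lambda body: sqs.send_message(QueueUrl=queue_url, MessageBody=body)` with a boto3 SQS client, where `queue_url` comes from configuration.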

---

## What We've Answered

- ✅ AgentCore is the right runtime (stateless container, event-driven)
- ✅ Telegram supports full webhook mode (all message types)
- ✅ SQS decoupling handles the webhook ack requirement (respond 204 in <10s)
- ✅ OpenClaw workspace files (SOUL.md, AGENTS.md, MEMORY.md) reusable via S3
- ✅ System prompt construction logic is portable (pure string ops)
- ✅ Tool schemas (web_search, read, write, edit) translatable to Strands @tool
- ✅ EventBridge handles heartbeat and cron (no gateway process needed)
- ✅ AgentCore Memory SDK exists and supports conversation history + long-term extraction
- ✅ InvokeAgentRuntime supports streaming responses
- ✅ Lifecycle settings: idleRuntimeSessionTimeout is configurable (min 60s, default 900s)
- ✅ Cold start: Firecracker microVM ~2-5 seconds on first invocation
- ✅ Language/framework: Python + Strands + bedrock-agentcore SDK (ARM64 container)
- ✅ AgentCore Memory SDK: MemorySessionManager, actor_id + session_id model, search_long_term_memories()

---

## Open Questions (Not Yet Answered)

### 🔴 Critical: blocks architecture decisions

**Q1: Response routing for async runs**
When InvokeAgentRuntime is called from the agent-runner Lambda, does it block synchronously until the agent finishes? Lambda max timeout is 15 minutes. AgentCore sessions can run up to 8 hours. What's the maximum synchronous response wait? Is there a callback/webhook pattern for long agent runs, or do we always need to poll?

*Why it matters*: If an agent run takes 3 minutes (web browsing + LLM), the agent-runner Lambda has to stay open for the full 3 minutes. That works up to Lambda's 15-minute limit, but longer runs (coding tasks, deep research) need a different pattern.

*Research needed*: InvokeAgentRuntime streaming behavior + max Lambda concurrency implications.

**Q2: Session ID strategy and daily session lifecycle**
`idleRuntimeSessionTimeout` is configurable (60s–8hr, default 15min). For a personal assistant, set it to 4-6 hours so the session stays warm through the day. Max lifetime is 8 hours, after which a new session has to be started.

- Map Telegram `chat_id` → `runtimeSessionId` in DynamoDB (create new session ID at start of day / when previous session maxes out)
- On new session creation, load MEMORY.md + SOUL.md from S3 into the system prompt; that is the context restoration
- The 8hr session boundary is a daily rhythm, not a UX problem

*Simplified*: One session per user per day. Session stays warm between messages. After 8hr, start a new one and reload workspace from S3.
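
The session rule above can be sketched as a small pure function. This is an illustration under assumptions: the record shape, the `tg-` prefix, and the function names are hypothetical, and the ID is padded with a UUID on the understanding that the runtime expects fairly long session IDs (worth confirming the exact minimum in the spike).

```python
import uuid
from datetime import datetime, timedelta

MAX_SESSION_LIFETIME = timedelta(hours=8)  # lifetime cap quoted above


def new_session_id(chat_id: int) -> str:
    """Build a fresh session ID; the uuid4 hex keeps it unique and long."""
    return f"tg-{chat_id}-{uuid.uuid4().hex}"


def resolve_session(record, chat_id: int, now: datetime):
    """Return (session_id, created_at, is_new) for this chat.

    `record` is the stored mapping for the chat (or None), shaped like
    {"session_id": str, "created_at": datetime}. When `is_new` is True the
    caller should persist the mapping and reload the workspace from S3.
    """
    if record and now - record["created_at"] < MAX_SESSION_LIFETIME:
        return record["session_id"], record["created_at"], False
    return new_session_id(chat_id), now, True
```

Keeping the decision free of I/O makes the 8-hour boundary easy to unit test; the DynamoDB read and write stay in the Lambda handler.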

**Q3: AgentCore Memory: is long-term extraction automatic or manual?**
The SDK docs mention "long-term memory automatically extracts and stores key insights." Is this extraction triggered on every `add_turns()` call, after a session ends, or does it require an explicit extraction call? Does it cost extra (separate LLM call)?

*Why it matters*: If extraction isn't automatic, MEMORY.md-equivalent content needs to be managed explicitly.

**Q4: Workspace file mutations (MEMORY.md writes): S3 vs AgentCore Memory**
When the agent wants to write to MEMORY.md (e.g., "remember this for next time"), there are two paths:
- Write to S3, reload on next invocation: simple, but doesn't benefit from semantic search
- Write to AgentCore Memory: benefits from extraction + search, but changes the access pattern

Which approach for MEMORY.md? Can we use BOTH: S3 for large curated memory, AgentCore Memory for semantic search over conversation history?

**Q5: Cold start UX impact (first session only)**
AgentCore keeps the microVM alive between requests (no cold start for warm sessions). The only startup cost is on the *first* invocation of a brand new session (container image pull + process start). Subsequent requests to the same warm session are instant.

- Does the Telegram "typing..." indicator cover the one-time startup gap on new session creation?
- What happens when the Lambda itself is cold (~500ms Lambda cold start, separate from the AgentCore session)?

**Q6: Strands agent + bedrock-agentcore container (ARM64 build complexity)**
AgentCore requires ARM64 containers. Strands is Python. The base image needs:
- Python 3.11+
- `strands-agents`, `bedrock-agentcore` pip packages
- AWS credentials via the runtime's IAM execution role
- Access to Bedrock models (need to check regional availability for the models we want)

What's the actual container build + push + deploy flow? Is there a starter template?

---

### 🟡 Important: needs answer before first spike

**Q7: Which Bedrock model and region?**
AgentCore Runtime is available in us-east-1, us-west-2, and several other regions. The Bedrock models we want (Claude Sonnet 4, etc.) need to be available in the same region. Cross-region inference adds latency.

Need to confirm: which model for the agent (Sonnet? Haiku for speed?), which region for AgentCore, does the region support the model?

**Q8: Telegram → AgentCore payload structure**
The Telegram Update object contains `message.chat.id`, `message.from.id`, `message.text`, etc. The InvokeAgentRuntime payload is arbitrary JSON. What does the agent container expect to receive? How do we thread Telegram context (group vs DM, sender info, reply_to) through the SQS → Lambda → AgentCore chain?
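
One possible shape for that payload, as a starting point for locking the schema. The input field names follow the Telegram Update object; the output keys (`prompt`, `channel`, `is_group`, and so on) are placeholders chosen for this sketch, not something AgentCore or OpenClaw prescribes.

```python
from typing import Any, Optional


def build_agent_payload(update: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Flatten a Telegram Update into the JSON the agent container receives.

    Returns None for updates without a text message (edits, callbacks, ...).
    The output field names are illustrative assumptions.
    """
    message = update.get("message")
    if not message or "text" not in message:
        return None
    chat = message["chat"]
    sender = message.get("from", {})
    reply_to = message.get("reply_to_message")
    return {
        "prompt": message["text"],
        "channel": "telegram",
        "chat_id": chat["id"],
        "chat_type": chat.get("type", "private"),
        "is_group": chat.get("type") in ("group", "supergroup"),
        "sender": {
            "id": sender.get("id"),
            "username": sender.get("username"),
            "first_name": sender.get("first_name"),
        },
        "reply_to_message_id": reply_to["message_id"] if reply_to else None,
        "update_id": update.get("update_id"),
    }
```

Carrying `update_id` through the chain gives the agent-runner a natural key for deduplicating redelivered SQS messages.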

**Q9: Telegram response back to user (token management)**
The agent-runner Lambda needs to call `api.telegram.org/bot{token}/sendMessage` after the agent responds. The Bot Token must be available to the Lambda. Secrets Manager is the right answer, but it needs to be in the architecture from day one.
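
A sketch of the reply side using only the standard library. It builds the `sendMessage` request rather than sending it, and splits long agent replies because Telegram caps message text at 4096 characters. The helper names are hypothetical, and the token is assumed to have been loaded from Secrets Manager elsewhere.

```python
import json
import urllib.request

TELEGRAM_TEXT_LIMIT = 4096  # max characters per sendMessage call


def split_message(text, limit=TELEGRAM_TEXT_LIMIT):
    """Split text into chunks of at most `limit` characters, preferring newlines."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind("\n", 0, limit)
        if cut <= 0:
            cut = limit  # no usable newline: hard cut at the limit
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks


def build_send_request(token, chat_id, text):
    """Build (but do not send) a sendMessage request for one chunk."""
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"https://api.telegram.org/bot{token}/sendMessage",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The Lambda would then loop over `split_message(reply)` and pass each request to `urllib.request.urlopen(req, timeout=10)`.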

**Q10: Heartbeat response delivery**
The heartbeat EventBridge rule fires every 30 minutes. The heartbeat Lambda invokes AgentCore. The agent produces a response (either HEARTBEAT_OK to suppress, or an actual message to deliver).

Where does the heartbeat response go? The Lambda needs to know: "if the agent produces a non-HEARTBEAT_OK response, send it to Telegram chat_id X." This routing config (target Telegram chat ID for heartbeat delivery) needs to be stored somewhere (DynamoDB, Secrets Manager, or baked into the Lambda env).
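
The suppression rule is small enough to sketch directly. This assumes the agent replies with exactly `HEARTBEAT_OK` (ignoring surrounding whitespace) when there is nothing to report; `heartbeat_delivery` is a hypothetical helper, and where `target_chat_id` is stored remains the open question above.

```python
HEARTBEAT_OK = "HEARTBEAT_OK"


def heartbeat_delivery(agent_reply, target_chat_id):
    """Return a sendMessage payload, or None if the heartbeat should stay silent."""
    text = (agent_reply or "").strip()
    if not text or text == HEARTBEAT_OK:
        return None
    return {"chat_id": target_chat_id, "text": text}
```

For a single-user deployment, reading `target_chat_id` from a Lambda environment variable is probably the least machinery; DynamoDB only pays off once there are several users.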

**Q11: Multi-turn within a single AgentCore session**
If a user sends 3 rapid messages (before the session expires), do they all land in the same `runtimeSessionId`? The agent-runner Lambda needs to look up the current active session ID for a given Telegram chat_id from DynamoDB, or create a new one if expired.

Race condition: two messages arrive simultaneously → both Lambdas look up session → both see "no active session" → both create new sessions. Need a DynamoDB conditional write / lock.
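
A sketch of the conditional write, assuming a table keyed on `chat_id` with a numeric `expires_at` attribute (both names hypothetical). `table` is expected to behave like a boto3 DynamoDB `Table` resource, and `conflict_error` is the exception class raised when the condition fails, passed in so the function runs without boto3 installed.

```python
def claim_session(table, chat_id, session_id, now, ttl_seconds, conflict_error):
    """Try to register `session_id` as the active session for `chat_id`.

    Returns True if this caller won the race, False if another invocation
    already holds an unexpired session (the caller should then re-read it).
    """
    try:
        table.put_item(
            Item={
                "chat_id": str(chat_id),
                "session_id": session_id,
                "expires_at": now + ttl_seconds,
            },
            # Succeeds only if no row exists or the existing row has expired.
            ConditionExpression="attribute_not_exists(chat_id) OR #exp < :now",
            ExpressionAttributeNames={"#exp": "expires_at"},
            ExpressionAttributeValues={":now": now},
        )
    except conflict_error:
        return False
    return True
```

With boto3 this would be called with `conflict_error=table.meta.client.exceptions.ConditionalCheckFailedException`. The losing Lambda reads the winner's row and reuses its `session_id`, so both messages land in the same session.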

**Q12: Telegram send_chat_action ("typing") timing**
Telegram's chat action expires in ~5 seconds. For a 30-second agent run, we need to refresh the typing indicator periodically. The agent-runner Lambda needs to refresh it while waiting for InvokeAgentRuntime to stream. Is this easy to do in a Lambda while streaming?
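
One way this could work is a background thread that re-sends the chat action while the main thread consumes the stream. The sketch below assumes a caller-supplied `send_typing` callable (hypothetical) that performs the actual `sendChatAction` request; the 4-second default interval is a guess that sits just under the ~5-second expiry.

```python
import threading
from contextlib import contextmanager


@contextmanager
def typing_indicator(send_typing, interval=4.0):
    """Call `send_typing()` immediately, then every `interval` seconds,
    until the with-block exits."""
    stop = threading.Event()

    def loop():
        while True:
            try:
                send_typing()
            except Exception:
                pass  # a failed refresh must not abort the agent run
            # wait() returns True as soon as stop is set, ending the loop.
            if stop.wait(interval):
                return

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    try:
        yield
    finally:
        stop.set()
        worker.join()
```

Usage would look like `with typing_indicator(lambda: send_chat_action(chat_id)): reply = run_agent(payload)`, where both inner functions are placeholders for the real calls.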

---

### 🟢 Lower priority: figure out during build

**Q13: What tools does the container expose?**
OpenClaw has ~20 tools. For an MVP, what's the minimum viable tool set?
- `read_file(path)`: S3 workspace
- `write_file(path, content)`: S3 workspace
- `web_search(query)`: Brave API
- `web_fetch(url)`: HTTP + readability
- `memory_search(query)`: AgentCore Memory
- `send_telegram_message(text)`: for multi-message replies? or just return the response?

Tools NOT in scope for v1: exec, browser, canvas, cron management, image generation.
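
As an illustration of the two workspace tools, here is a sketch written as plain functions against an injected S3 client. The `workspace/` prefix, the function signatures, and the path clamping are assumptions of this example; in the container these would presumably be wrapped with Strands' `@tool` decorator with the client and bucket bound, which is omitted so the example runs without the SDK.

```python
import posixpath


def workspace_key(path, prefix="workspace/"):
    """Map an agent-supplied path to an S3 key that stays under `prefix`."""
    # Normalizing against "/" collapses any ".." segments at the root,
    # so "../../etc/passwd" cannot climb out of the workspace prefix.
    normalized = posixpath.normpath("/" + path.replace("\\", "/")).lstrip("/")
    if not normalized:
        raise ValueError("path must name a file inside the workspace")
    return prefix + normalized


def read_file(s3, bucket, path):
    """Read a UTF-8 workspace file; `s3` is a boto3-style S3 client."""
    obj = s3.get_object(Bucket=bucket, Key=workspace_key(path))
    return obj["Body"].read().decode("utf-8")


def write_file(s3, bucket, path, content):
    """Write a UTF-8 workspace file and return a short confirmation."""
    s3.put_object(Bucket=bucket, Key=workspace_key(path), Body=content.encode("utf-8"))
    return f"wrote {len(content)} characters to {path}"
```

Returning a short confirmation string from `write_file` gives the model something concrete to reason about after the tool call.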

**Q14: Cron job management from within the agent**
OpenClaw lets the agent create/delete cron jobs dynamically. With EventBridge, a `create_cron_job` tool would need to call `put_rule()` and `put_targets()` on the boto3 `events` client. Doable but needs IAM permissions baked in. Scope for v2.

**Q15: Secrets rotation**
Bot token, Brave API key, etc. live in Secrets Manager. Need to decide: Lambda env vars (loaded on cold start) vs Secrets Manager SDK calls (per-invocation). For personal scale, env vars baked in at deploy time are fine. Secrets Manager adds ~50ms latency per call.
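
A middle ground between the two options is to fetch from Secrets Manager lazily and cache the value for the life of the warm Lambda. The class below is a sketch: `CachedSecret`, the 5-minute TTL, and the injected `fetch` callable are all assumptions made for illustration.

```python
import time


class CachedSecret:
    """Fetch a secret lazily and reuse it for `ttl` seconds within a warm Lambda."""

    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self._fetch = fetch      # callable returning the secret string
        self._ttl = ttl
        self._clock = clock      # injectable for tests
        self._value = None
        self._expires = 0.0

    def get(self):
        now = self._clock()
        if self._value is None or now >= self._expires:
            # Only this path pays the Secrets Manager round trip.
            self._value = self._fetch()
            self._expires = now + self._ttl
        return self._value
```

With boto3, `fetch` could be `lambda: sm.get_secret_value(SecretId=secret_id)["SecretString"]`, where `sm` is a `secretsmanager` client and `secret_id` is whatever name the secret is stored under. A rotated token is then picked up within one TTL without a redeploy.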

**Q16: IaC choice**
CDK (TypeScript) or Terraform or SAM. CDK is most AWS-native and has the highest-level constructs. SAM is simpler for Lambda-centric stacks. Terraform if portability matters.

---

## Proposed Build Phases

### Phase 0: Spike (1-2 days)
Answer Q1, Q2, Q5 by actually running the thing:
- Deploy the smallest possible Strands container to AgentCore
- Send it a test InvokeAgentRuntime call
- Measure cold start latency in practice
- Test what happens when a session expires and you reinvoke with the same ID

### Phase 1: Telegram → Agent → Response (1 week)
- API Gateway + tg-ingest Lambda (verify secret token, SQS enqueue, return 204)
- SQS queue
- agent-runner Lambda (maps chat_id → session_id, invokes AgentCore, sends Telegram reply)
- AgentCore container: minimal Strands agent, system prompt from S3 workspace, web_search tool
- S3 workspace bucket with SOUL.md, AGENTS.md, USER.md
- DynamoDB: chat_id → session_id mapping

**Done when**: can send a Telegram message and get a reply from the agent, personality intact.

### Phase 2: Memory + Workspace (1 week)
- AgentCore Memory provisioned (memory_id per user)
- Conversation history stored after each turn
- Long-term memory extraction confirmed working
- MEMORY.md sync pattern: S3 for curated, AgentCore Memory for semantic search
- write_file / read_file tools pointing at S3 workspace

**Done when**: agent remembers things across sessions (>15min gaps).

### Phase 3: Heartbeat + Cron (3-4 days)
- EventBridge rule (every 30m)
- heartbeat-trigger Lambda
- HEARTBEAT_OK suppression logic
- Delivery to configurable Telegram chat ID

**Done when**: heartbeat fires, agent checks HEARTBEAT.md, delivers alerts to Telegram.

### Phase 4: Polish (ongoing)
- Typing indicator refresh during long runs
- Additional tools (image gen, TTS)
- Error handling + DLQ
- CDK/IaC for reproducible deploys
- Cost monitoring

---

## Cost Estimate (Personal Scale, ~50 agent runs/day)

| Service | Est. Monthly Cost | Notes |
|---|---|---|
| API Gateway (HTTP) | ~$0.01 | <1M requests/mo |
| Lambda (ingest + runner + heartbeat) | ~$0.50 | ~2000 invocations/day, avg 30s |
| SQS | ~$0.00 | Free tier |
| AgentCore Runtime | ~$5-15 | 50 runs/day × 30s avg × ~$0.0x/compute-sec |
| AgentCore Memory | TBD | Pricing not fully public yet |
| S3 (workspace files) | ~$0.01 | <1 MB total |
| DynamoDB (session mapping) | ~$0.01 | On-demand, minimal reads/writes |
| Bedrock LLM calls | $20-80 | Same as today; model-dependent |
| EventBridge | ~$0.00 | 1 schedule, ~1,440 heartbeat events/mo (every 30m) |
| Secrets Manager | ~$0.40 | $0.40/secret/mo |
| **Total infra (ex-LLM)** | **~$6-20/mo** | vs ~$26/mo for Fargate |

**Zero always-on compute cost.** Pay only when messages arrive.
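
The volume figures behind the table can be rederived from the plan's own inputs. The sketch below uses only numbers stated in this document (50 runs/day, 30s average, a heartbeat every 30 minutes) plus two assumptions: a 30-day month, and heartbeat runs taking the same 30s as user-triggered runs. No prices are included because the per-second AgentCore rate is not pinned down above.

```python
RUNS_PER_DAY = 50            # from the section title
HEARTBEAT_INTERVAL_MIN = 30  # from the architecture diagram
AVG_RUN_SECONDS = 30         # from the table notes
DAYS_PER_MONTH = 30          # assumption

heartbeats_per_day = 24 * 60 // HEARTBEAT_INTERVAL_MIN

# Each user message costs one tg-ingest and one agent-runner invocation;
# each heartbeat costs one heartbeat-trigger invocation.
lambda_invocations_per_day = 2 * RUNS_PER_DAY + heartbeats_per_day

# Heartbeats also reach AgentCore, on top of the user-triggered runs.
agentcore_runs_per_day = RUNS_PER_DAY + heartbeats_per_day
agentcore_seconds_per_month = agentcore_runs_per_day * AVG_RUN_SECONDS * DAYS_PER_MONTH

scheduler_events_per_month = heartbeats_per_day * DAYS_PER_MONTH

print(f"Lambda invocations/day:      {lambda_invocations_per_day}")
print(f"AgentCore runs/day:          {agentcore_runs_per_day}")
print(f"AgentCore compute-sec/month: {agentcore_seconds_per_month}")
print(f"Scheduler events/month:      {scheduler_events_per_month}")
```

Under these assumptions the Lambda count lands well below the ~2000/day budgeted in the table, while AgentCore sees roughly twice the 50 runs/day once heartbeats are included, which is worth factoring into the runtime line.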

---

## Immediate Next Steps

1. **Answer Q1, Q2, and Q5 with a spike**: deploy toy Strands container, measure cold start, test session expiry behavior
2. **Clarify AgentCore Memory extraction** (Q3): read the full SDK docs + test
3. **Lock the Telegram payload schema** (Q8): define what goes in InvokeAgentRuntime payload
4. **Pick region + model** (Q7): confirm Sonnet availability in target region
5. **Start Phase 1 build**

---

*Updated 2026-05-04*