agent-claw/build-plan.md
daniel 0369a74ac1 Initial research: OpenClaw on AgentCore architecture
- Architecture comparison (OpenClaw daemon vs AgentCore serverless)
- Component compatibility analysis
- Fargate analysis
- AgentCore rebuild plan (Telegram, zero always-on compute)
- Memory strategy: AgentCore Memory + factbase as structured KB
- Serverless relay patterns per channel
- All open questions resolved
- OpenClaw feature delta March→May 2026
- Build phases and cost estimates
2026-05-04 08:28:52 -05:00

# Plan: AgentCore-Native OpenClaw (Telegram, Zero Always-On Compute)
## Target Architecture
```
[Telegram User]
      │ message
      ▼
[Telegram Servers]
      │ POST (webhook)
      ▼
[API Gateway (HTTP API)]
      │
      ▼
[Lambda: tg-ingest]         ← verify sig, send typing action, enqueue
      │ SQS message
      ▼
[SQS: agent-queue]
      │ trigger
      ▼
[Lambda: agent-runner]      ← load workspace from S3, build system prompt,
      │ InvokeAgentRuntime     map chat_id → session_id
      ▼
[AgentCore Runtime]         ← Strands agent container (ARM64)
      │ streaming response     tools: web_search, read/write S3, memory
      ▼
[Lambda: agent-runner]      ← stream reply back
      │ Telegram Bot API
      ▼
[Telegram User]             ← receives message

[EventBridge Scheduler]     ← every 30m → Lambda → InvokeAgentRuntime (heartbeat prompt)
      │                             │
      ▼                             ▼ same response routing
[Lambda: heartbeat-trigger]   [Telegram Bot API]
```
**No 24/7 compute anywhere.** Everything is event-driven.
---
## What We've Answered
- ✅ AgentCore is the right runtime (stateless container, event-driven)
- ✅ Telegram supports full webhook mode (all message types)
- ✅ SQS decoupling handles the webhook ack requirement (respond 204 in <10s)
- ✅ OpenClaw workspace files (SOUL.md, AGENTS.md, MEMORY.md) reusable via S3
- ✅ System prompt construction logic is portable (pure string ops)
- ✅ Tool schemas (web_search, read, write, edit) translatable to Strands @tool
- ✅ EventBridge handles heartbeat and cron (no gateway process needed)
- ✅ AgentCore Memory SDK exists and supports conversation history + long-term extraction
- ✅ InvokeAgentRuntime supports streaming responses
- ✅ Lifecycle settings: idleRuntimeSessionTimeout is configurable (min 60s, default 900s)
- ✅ Cold start: Firecracker microVM ~2-5 seconds on first invocation
- ✅ Language/framework: Python + Strands + bedrock-agentcore SDK (ARM64 container)
- ✅ AgentCore Memory SDK: MemorySessionManager, actor_id + session_id model, search_long_term_memories()
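To make the "pure string ops" claim concrete, here is a minimal sketch of prompt assembly from workspace files. The function name `build_system_prompt`, the `WORKSPACE_ORDER` tuple, and the `## <file name>` section headers are assumptions for illustration; OpenClaw's actual prompt layout may differ.

```python
from typing import Mapping

# Order in which workspace files are stitched into the prompt (assumed).
WORKSPACE_ORDER = ("SOUL.md", "AGENTS.md", "MEMORY.md")


def build_system_prompt(files: Mapping[str, str]) -> str:
    """Concatenate workspace files into one system prompt.

    `files` maps a file name to its text, e.g. as fetched from S3.
    Missing or empty files are skipped rather than treated as errors.
    """
    sections = []
    for name in WORKSPACE_ORDER:
        text = files.get(name, "").strip()
        if text:
            sections.append(f"## {name}\n{text}")
    return "\n\n".join(sections)


prompt = build_system_prompt({"SOUL.md": "Be concise.", "MEMORY.md": "User likes tea.\n"})
print(prompt)
```

Because the function takes a plain mapping, the same code runs in the agent-runner Lambda or inside the container, whichever ends up owning prompt construction.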
---
## Open Questions (Not Yet Answered)
### 🔴 Critical — blocks architecture decisions
**Q1: Response routing for async runs**
When InvokeAgentRuntime is called from the agent-runner Lambda, does it block synchronously until the agent finishes? Lambda max timeout is 15 minutes. AgentCore sessions can run up to 8 hours. What's the maximum synchronous response wait? Is there a callback/webhook pattern for long agent runs, or do we always need to poll?
*Why it matters*: If an agent run takes 3 minutes (web browsing + LLM), the agent-runner Lambda needs to sit open for 3 minutes. That's fine up to ~15 minutes. But longer runs (coding tasks, deep research) need a different pattern.
*Research needed*: InvokeAgentRuntime streaming behavior + max Lambda concurrency implications.
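One way to stay inside the Lambda limit while the answer to Q1 is pending is to stop reading the stream when the remaining execution time runs low. The sketch below assumes the response can be consumed as an iterable of text chunks; `collect_until_deadline` and `reserve_ms` are hypothetical names. In a real handler, `remaining_ms` would be the Lambda context's `get_remaining_time_in_millis` method.

```python
from typing import Callable, Iterable, List, Tuple


def collect_until_deadline(
    chunks: Iterable[str],
    remaining_ms: Callable[[], int],
    reserve_ms: int = 10_000,
) -> Tuple[str, bool]:
    """Join streamed chunks, stopping early when the time budget runs low.

    Returns (text, completed). completed=False means the budget ran out
    after the last chunk read, so the stream may have had more to deliver.
    """
    parts: List[str] = []
    for chunk in chunks:
        parts.append(chunk)
        if remaining_ms() < reserve_ms:
            return "".join(parts), False
    return "".join(parts), True
```

The check only runs between chunks, so a long gap with no output can still overrun the budget. Long runs would still need the callback or polling pattern that Q1 asks about.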
**Q2: Session ID strategy and daily session lifecycle**
`idleRuntimeSessionTimeout` is configurable (60s to 8hr, default 15min). For a personal assistant, set it to 4-6 hours so the session stays warm all day. Max lifetime is 8 hours, after which a new session is created.
- Map Telegram `chat_id` → `runtimeSessionId` in DynamoDB (create new session ID at start of day / when previous session maxes out)
- On new session creation, load MEMORY.md + SOUL.md from S3 into system prompt — that's the context restoration
- The 8hr session boundary is a daily rhythm, not a UX problem
*Simplified*: One session per user per day. Session stays warm between messages. After 8hr, start a new one and reload workspace from S3.
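The reuse-or-rotate rule can be sketched as a pure function over the stored mapping row. `SessionRecord`, `resolve_session`, the 4-hour `IDLE_TIMEOUT_S`, and the `tg-<chat_id>-<uuid>` ID format are all assumptions; persistence to DynamoDB is left out.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

MAX_LIFETIME_S = 8 * 3600  # max session lifetime stated above
IDLE_TIMEOUT_S = 4 * 3600  # assumed idleRuntimeSessionTimeout setting


@dataclass
class SessionRecord:
    session_id: str
    created_at: float    # epoch seconds
    last_used_at: float  # epoch seconds


def is_reusable(rec: SessionRecord, now: float) -> bool:
    """True if the session is inside both the idle and lifetime windows."""
    within_lifetime = (now - rec.created_at) < MAX_LIFETIME_S
    within_idle = (now - rec.last_used_at) < IDLE_TIMEOUT_S
    return within_lifetime and within_idle


def resolve_session(rec: Optional[SessionRecord], chat_id: int, now: float) -> SessionRecord:
    """Reuse the stored session if still valid, otherwise mint a new one."""
    if rec is not None and is_reusable(rec, now):
        return SessionRecord(rec.session_id, rec.created_at, now)
    new_id = f"tg-{chat_id}-{uuid.uuid4().hex}"
    return SessionRecord(new_id, now, now)
```

A freshly minted record is the caller's signal to reload SOUL.md and MEMORY.md from S3 before the first invocation.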
**Q3: AgentCore Memory — is long-term extraction automatic or manual?**
The SDK docs mention "long-term memory automatically extracts and stores key insights." Is this extraction triggered on every `add_turns()` call, after a session ends, or does it require an explicit extraction call? Does it cost extra (separate LLM call)?
*Why it matters*: If extraction isn't automatic, MEMORY.md-equivalent content needs to be managed explicitly.
**Q4: Workspace file mutations (MEMORY.md writes) — S3 vs AgentCore Memory**
When the agent wants to write to MEMORY.md (e.g., "remember this for next time"), there are two paths:
- Write to S3, reload on next invocation — simple but doesn't benefit from semantic search
- Write to AgentCore Memory — benefits from extraction + search but changes the access pattern
Which approach for MEMORY.md? Can we use BOTH — S3 for large curated memory, AgentCore Memory for semantic search over conversation history?
**Q5: Cold start UX impact — first session only**
AgentCore keeps the microVM alive between requests (no cold start for warm sessions). The only startup cost is on the *first* invocation of a brand new session (container image pull + process start). Subsequent requests to the same warm session are instant.
- Does the Telegram "typing..." indicator cover the one-time startup gap on new session creation?
- What happens when the Lambda itself is cold (~500ms Lambda cold start, separate from the AgentCore session)?
**Q6: Strands agent + bedrock-agentcore container — ARM64 build complexity**
AgentCore requires ARM64 containers. Strands is Python. The base image needs:
- Python 3.11+
- `strands-agents`, `bedrock-agentcore` pip packages
- AWS credentials via the runtime's IAM execution role
- Access to Bedrock models (need to check regional availability for the models we want)
What's the actual container build + push + deploy flow? Is there a starter template?
---
### 🟡 Important — needs answer before first spike
**Q7: Which Bedrock model and region?**
AgentCore Runtime is available in us-east-1, us-west-2, and several other regions. The Bedrock models we want (Claude Sonnet 4, etc.) need to be available in the same region. Cross-region inference adds latency.
Need to confirm: which model for the agent (Sonnet? Haiku for speed?), which region for AgentCore, does the region support the model?
**Q8: Telegram → AgentCore payload structure**
The Telegram Update object contains `message.chat.id`, `message.from.id`, `message.text`, etc. The InvokeAgentRuntime payload is arbitrary JSON. What does the agent container expect to receive? How do we thread Telegram context (group vs DM, sender info, reply_to) through the SQS → Lambda → AgentCore chain?
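As a starting point for the schema discussion, a sketch that flattens a Telegram Update into one candidate payload. The output keys (`prompt`, `channel`, `chat_type`, and so on) are hypothetical and not something AgentCore prescribes; the input field names come from the Telegram Update object.

```python
from typing import Any, Dict, Optional


def build_agent_payload(update: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Flatten a Telegram Update into a candidate agent payload.

    Returns None for updates that carry no text message (edits, joins, etc.).
    """
    msg = update.get("message")
    if not msg or "text" not in msg:
        return None
    chat = msg["chat"]
    sender = msg.get("from", {})
    reply = msg.get("reply_to_message")
    return {
        "prompt": msg["text"],
        "channel": "telegram",
        "chat_id": chat["id"],
        # Telegram chat types: private, group, supergroup, channel
        "chat_type": chat.get("type", "private"),
        "sender_id": sender.get("id"),
        "sender_name": sender.get("first_name"),
        "reply_to_message_id": reply["message_id"] if reply else None,
    }
```

Keeping the payload flat and channel-tagged means the same container could later accept other channels without changing its entrypoint.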
**Q9: Telegram response back to user — token management**
The agent-runner Lambda needs to call `api.telegram.org/bot{token}/sendMessage` after the agent responds. The Bot Token must be available to the Lambda. Secrets Manager is the right answer — but it needs to be in the architecture from day one.
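A small sketch of the reply call using only the standard library. `build_send_message_request` is a hypothetical helper; it builds the request without sending it, and the token is assumed to have been read from Secrets Manager by the caller.

```python
import json
import urllib.request

TELEGRAM_API = "https://api.telegram.org"


def build_send_message_request(token: str, chat_id: int, text: str) -> urllib.request.Request:
    """Build (but do not send) a Bot API sendMessage request."""
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{TELEGRAM_API}/bot{token}/sendMessage",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then `urllib.request.urlopen(req, timeout=10)`. Since the token is part of the URL, request URLs should stay out of logs.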
**Q10: Heartbeat response delivery**
The heartbeat EventBridge rule fires every 30 minutes. The heartbeat Lambda invokes AgentCore. The agent produces a response (either HEARTBEAT_OK to suppress, or an actual message to deliver).
Where does the heartbeat response go? The Lambda needs to know: "if the agent produces a non-HEARTBEAT_OK response, send it to Telegram chat_id X." This routing config (target Telegram chat ID for heartbeat delivery) needs to be stored somewhere (DynamoDB, Secrets Manager, or baked into the Lambda env).
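The suppression rule itself is tiny. A sketch, assuming the agent signals "nothing to report" by replying with exactly `HEARTBEAT_OK` (ignoring surrounding whitespace); `heartbeat_delivery` is a hypothetical name and the target chat ID comes from wherever the routing config ends up.

```python
from typing import Optional

HEARTBEAT_OK = "HEARTBEAT_OK"


def heartbeat_delivery(agent_reply: str, target_chat_id: int) -> Optional[dict]:
    """Return a delivery instruction, or None when the reply should be suppressed."""
    text = agent_reply.strip()
    if not text or text == HEARTBEAT_OK:
        return None
    return {"chat_id": target_chat_id, "text": text}
```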
**Q11: Multi-turn within a single AgentCore session**
If a user sends 3 rapid messages (before the session expires), do they all land in the same `runtimeSessionId`? The agent-runner Lambda needs to look up the current active session ID for a given Telegram chat_id from DynamoDB, or create a new one if expired.
Race condition: two messages arrive simultaneously → both Lambdas look up session → both see "no active session" → both create new sessions. Need a DynamoDB conditional write / lock.
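One way to close the race is a conditional `PutItem`: only the first writer succeeds, and the loser re-reads the winner's row. The sketch below only builds the request arguments for the low-level DynamoDB client; the attribute names (`chat_id`, `session_id`, `expires_at`) and the helper name are assumptions.

```python
def claim_session_kwargs(table: str, chat_id: int, session_id: str, now: int, ttl_s: int) -> dict:
    """Keyword arguments for a DynamoDB PutItem that only one racer can win."""
    return {
        "TableName": table,
        "Item": {
            "chat_id": {"N": str(chat_id)},
            "session_id": {"S": session_id},
            "expires_at": {"N": str(now + ttl_s)},
        },
        # Succeeds only if no row exists yet, or the existing row has expired.
        "ConditionExpression": "attribute_not_exists(chat_id) OR expires_at < :now",
        "ExpressionAttributeValues": {":now": {"N": str(now)}},
    }
```

With boto3 this would be passed as `client.put_item(**kwargs)`. The losing Lambda gets a `ConditionalCheckFailedException` and should fetch the row again to pick up the winning session ID.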
**Q12: Telegram send_chat_action ("typing") timing**
Telegram's chat action expires in ~5 seconds. For a 30-second agent run, we need to refresh the typing indicator periodically. The agent-runner Lambda needs to refresh it while waiting for InvokeAgentRuntime to stream. Is this easy to do in a Lambda while streaming?
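A background thread is probably the simplest way to do this in a Lambda. The sketch below is a context manager that calls an injected `send_typing` callback on a fixed interval while the wrapped block runs; the name `typing_indicator` and the 4-second default are assumptions, chosen to stay under the ~5-second expiry.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def typing_indicator(send_typing: Callable[[], None], interval_s: float = 4.0) -> Iterator[None]:
    """Call send_typing immediately, then every interval_s seconds until the block exits."""
    stop = threading.Event()

    def loop() -> None:
        send_typing()
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_s):
            send_typing()

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    try:
        yield
    finally:
        stop.set()
        worker.join()
```

Usage would look like `with typing_indicator(lambda: send_chat_action(chat_id)): ...` around the streaming read, where `send_chat_action` is a hypothetical wrapper over the Bot API `sendChatAction` method. If the callback raises, the refresher thread stops quietly, so the callback should swallow transient HTTP errors.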
---
### 🟢 Lower priority — figure out during build
**Q13: What tools does the container expose?**
OpenClaw has ~20 tools. For an MVP, what's the minimum viable tool set?
- `read_file(path)` — S3 workspace
- `write_file(path, content)` — S3 workspace
- `web_search(query)` — Brave API
- `web_fetch(url)` — HTTP + readability
- `memory_search(query)` — AgentCore Memory
- `send_telegram_message(text)` — for multi-message replies? or just return the response?
Tools NOT in scope for v1: exec, browser, canvas, cron management, image generation.
**Q14: Cron job management from within the agent**
OpenClaw lets the agent create/delete cron jobs dynamically. With EventBridge, a `create_cron_job` tool would need to call `eventbridge.put_rule()`. Doable but needs IAM permissions baked in. Scope for v2.
**Q15: Secrets rotation**
Bot token, Brave API key, etc. — Secrets Manager. Need to decide: Lambda env vars (loaded on cold start) vs Secrets Manager SDK calls (per-invocation). For personal scale, env vars baked in at deploy time are fine. Secrets Manager adds ~50ms latency per call.
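A middle path is to fetch from Secrets Manager once per cold start and cache the value in module scope, so warm invocations pay no extra latency. A sketch with an injected `fetch` callable; `get_secret` and `_cache` are hypothetical names.

```python
from typing import Callable, Dict

# Module-level cache: survives across invocations of a warm Lambda container.
_cache: Dict[str, str] = {}


def get_secret(name: str, fetch: Callable[[str], str]) -> str:
    """Return a secret, calling fetch only on the first request per container."""
    if name not in _cache:
        _cache[name] = fetch(name)
    return _cache[name]
```

In the Lambda, `fetch` would wrap the Secrets Manager `get_secret_value` call. The trade-off is that a rotated secret is not picked up until the container is recycled.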
**Q16: IaC choice**
CDK (TypeScript) or Terraform or SAM. CDK is most AWS-native and has the highest-level constructs. SAM is simpler for Lambda-centric stacks. Terraform if portability matters.
---
## Proposed Build Phases
### Phase 0 — Spike (1-2 days)
Answer Q1, Q2, Q5 by actually running the thing:
- Deploy the smallest possible Strands container to AgentCore
- Send it a test InvokeAgentRuntime call
- Measure cold start latency in practice
- Test what happens when a session expires and you reinvoke with the same ID
### Phase 1 — Telegram → Agent → Response (1 week)
- API Gateway + tg-ingest Lambda (verify signature, SQS enqueue, return 204)
- SQS queue
- agent-runner Lambda (maps chat_id → session_id, invokes AgentCore, sends Telegram reply)
- AgentCore container: minimal Strands agent, system prompt from S3 workspace, web_search tool
- S3 workspace bucket with SOUL.md, AGENTS.md, USER.md
- DynamoDB: chat_id → session_id mapping
**Done when**: can send a Telegram message and get a reply from the agent, personality intact.
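For the tg-ingest step, note that Telegram does not sign webhook bodies. The check is a shared secret, set as `secret_token` on `setWebhook` and echoed back in the `X-Telegram-Bot-Api-Secret-Token` header. A minimal sketch of the handler core, assuming the API Gateway HTTP API event shape (lowercased header names); `handle_webhook` and the injected `enqueue` callable are hypothetical.

```python
import hmac
from typing import Any, Callable, Dict

# HTTP API (payload v2) delivers header names in lowercase.
SECRET_HEADER = "x-telegram-bot-api-secret-token"


def handle_webhook(
    event: Dict[str, Any],
    expected_secret: str,
    enqueue: Callable[[str], None],
) -> Dict[str, Any]:
    """Check the webhook secret, enqueue the raw update, and ack quickly."""
    supplied = event.get("headers", {}).get(SECRET_HEADER, "")
    # Constant-time comparison avoids leaking the secret through timing.
    if not hmac.compare_digest(supplied.encode(), expected_secret.encode()):
        return {"statusCode": 403}
    enqueue(event.get("body") or "{}")
    return {"statusCode": 204}
```

In the deployed Lambda, `enqueue` would wrap the SQS `send_message` call, and the typing action can be sent right after enqueueing.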
### Phase 2 — Memory + Workspace (1 week)
- AgentCore Memory provisioned (memory_id per user)
- Conversation history stored after each turn
- Long-term memory extraction confirmed working
- MEMORY.md sync pattern: S3 for curated, AgentCore Memory for semantic search
- write_file / read_file tools pointing at S3 workspace
**Done when**: agent remembers things across sessions (>15min gaps).
### Phase 3 — Heartbeat + Cron (3-4 days)
- EventBridge rule (every 30m)
- heartbeat-trigger Lambda
- HEARTBEAT_OK suppression logic
- Delivery to configurable Telegram chat ID
**Done when**: heartbeat fires, agent checks HEARTBEAT.md, delivers alerts to Telegram.
### Phase 4 — Polish (ongoing)
- Typing indicator refresh during long runs
- Additional tools (image gen, TTS)
- Error handling + DLQ
- CDK/IaC for reproducible deploys
- Cost monitoring
---
## Cost Estimate (Personal Scale, ~50 agent runs/day)
| Service | Est. Monthly Cost | Notes |
|---|---|---|
| API Gateway (HTTP) | ~$0.01 | <1M requests/mo |
| Lambda (ingest + runner + heartbeat) | ~$0.50 | ~2000 invocations/day, avg 30s |
| SQS | ~$0.00 | Free tier |
| AgentCore Runtime | ~$5-15 | 50 runs/day × 30s avg × ~$0.0x/compute-sec |
| AgentCore Memory | TBD | Pricing not fully public yet |
| S3 (workspace files) | ~$0.01 | <1 MB total |
| DynamoDB (session mapping) | ~$0.01 | On-demand, minimal reads/writes |
| Bedrock LLM calls | $20-80 | Same as today — model-dependent |
| EventBridge | ~$0.00 | ~1,440 heartbeat events/mo (every 30m) |
| Secrets Manager | ~$0.40 | $0.40/secret/mo |
| **Total infra (ex-LLM)** | **~$6-20/mo** | vs ~$26/mo for Fargate |
**Zero always-on compute cost.** Pay only when messages arrive.
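As a sanity check on the total row, the line items can be summed directly. The figures below are copied from the table; AgentCore Memory is left out because its cost is still TBD.

```python
# (low, high) monthly USD per line item, copied from the cost table.
# AgentCore Memory is omitted because its cost is still TBD.
INFRA_COSTS = {
    "API Gateway (HTTP)": (0.01, 0.01),
    "Lambda": (0.50, 0.50),
    "SQS": (0.00, 0.00),
    "AgentCore Runtime": (5.00, 15.00),
    "S3": (0.01, 0.01),
    "DynamoDB": (0.01, 0.01),
    "EventBridge": (0.00, 0.00),
    "Secrets Manager": (0.40, 0.40),
}

low = round(sum(lo for lo, _ in INFRA_COSTS.values()), 2)
high = round(sum(hi for _, hi in INFRA_COSTS.values()), 2)
print(f"infra ex-LLM, ex-Memory: ${low:.2f} to ${high:.2f} per month")
```

The listed items come to $5.93 to $15.93, so the ~$6-20 total leaves roughly $4/mo of headroom at the top end for AgentCore Memory once its pricing is known.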
---
## Immediate Next Steps
1. **Answer Q1 + Q2 with a spike** — deploy toy Strands container, measure cold start, test session expiry behavior
2. **Clarify AgentCore Memory extraction** (Q3) — read the full SDK docs + test
3. **Lock the Telegram payload schema** (Q8) — define what goes in InvokeAgentRuntime payload
4. **Pick region + model** (Q7) — confirm Sonnet availability in target region
5. **Start Phase 1 build**
---
*Updated 2026-05-04*