# Plan: AgentCore-Native OpenClaw (Telegram, Zero Always-On Compute)

## Target Architecture

```
[Telegram User]
      │ message
      ▼
[Telegram Servers]
      │ POST (webhook)
      ▼
[API Gateway (HTTP API)]
      │
      ▼
[Lambda: tg-ingest]        ← verify sig, send typing action, enqueue
      │ SQS message
      ▼
[SQS: agent-queue]
      │ trigger
      ▼
[Lambda: agent-runner]     ← load workspace from S3, build system prompt,
      │ InvokeAgentRuntime   map chat_id → session_id
      ▼
[AgentCore Runtime]        ← Strands agent container (ARM64)
      │ streaming response   tools: web_search, read/write S3, memory
      ▼
[Lambda: agent-runner]     ← stream reply back
      │ Telegram Bot API
      ▼
[Telegram User]            ← receives message

[EventBridge Scheduler]    ← every 30m → Lambda → InvokeAgentRuntime (heartbeat prompt)
      │                            │
      ▼                            ▼ same response routing
[Lambda: heartbeat-trigger]   [Telegram Bot API]
```

**No 24/7 compute anywhere.** Everything is event-driven.

---

## What We've Answered

- ✅ AgentCore is the right runtime (stateless container, event-driven)
- ✅ Telegram supports full webhook mode (all message types)
- ✅ SQS decoupling handles the webhook ack requirement (respond 204 in <10s)
- ✅ OpenClaw workspace files (SOUL.md, AGENTS.md, MEMORY.md) reusable via S3
- ✅ System prompt construction logic is portable (pure string ops)
- ✅ Tool schemas (web_search, read, write, edit) translatable to Strands @tool
- ✅ EventBridge handles heartbeat and cron (no gateway process needed)
- ✅ AgentCore Memory SDK exists and supports conversation history + long-term extraction
- ✅ InvokeAgentRuntime supports streaming responses
- ✅ Lifecycle settings: idleRuntimeSessionTimeout is configurable (min 60s, default 900s)
- ✅ Cold start: Firecracker microVM ~2-5 seconds on first invocation
- ✅ Language/framework: Python + Strands + bedrock-agentcore SDK (ARM64 container)
- ✅ AgentCore Memory SDK: MemorySessionManager, actor_id + session_id model, search_long_term_memories()

---

## Open Questions (Not Yet Answered)

### 🔴 Critical: blocks architecture decisions

**Q1: Response routing for async runs**

When InvokeAgentRuntime is called from the agent-runner Lambda, does it block synchronously until the agent finishes? Lambda max timeout is 15 minutes. AgentCore sessions can run up to 8 hours. What's the maximum synchronous response wait? Is there a callback/webhook pattern for long agent runs, or do we always need to poll?

*Why it matters*: If an agent run takes 3 minutes (web browsing + LLM), the agent-runner Lambda needs to sit open for 3 minutes. That's fine up to ~15 minutes. But longer runs (coding tasks, deep research) need a different pattern.

*Research needed*: InvokeAgentRuntime streaming behavior + max Lambda concurrency implications.

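While Q1 is open, one way to keep the synchronous pattern safe is to read the stream against the Lambda's own clock and stop before the 15-minute limit. The sketch below is an illustration only, not the AgentCore API: `collect_stream` and `safety_margin_ms` are hypothetical names, `chunks` stands in for whatever text pieces the InvokeAgentRuntime stream yields, and `remaining_ms` would be `context.get_remaining_time_in_millis` in a real handler.

```python
from typing import Callable, Iterable, List, Tuple


def collect_stream(
    chunks: Iterable[str],
    remaining_ms: Callable[[], int],
    safety_margin_ms: int = 10_000,
) -> Tuple[str, bool]:
    """Concatenate streamed text until the stream ends or time runs short.

    Returns (text, finished). finished is False when reading stopped early
    because the Lambda was close to its timeout. If the final chunk arrives
    exactly at the margin, the run is conservatively reported as unfinished.
    """
    parts: List[str] = []
    for chunk in chunks:
        parts.append(chunk)
        if remaining_ms() < safety_margin_ms:
            return "".join(parts), False
    return "".join(parts), True
```

When `finished` is false, the runner could deliver the partial text and hand the rest to a longer-running pattern; which pattern that should be is what the spike needs to determine.
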
**Q2: Session ID strategy and daily session lifecycle**

`idleRuntimeSessionTimeout` is configurable (60s–8hr, default 15min). For a personal assistant, set it to 4-6 hours so the session stays warm between messages. Max lifetime is 8 hours, after which a new session is created.

- Map Telegram `chat_id` → `runtimeSessionId` in DynamoDB (create a new session ID at start of day / when the previous session maxes out)
- On new session creation, load MEMORY.md + SOUL.md from S3 into the system prompt; that is the context restoration step
- The 8hr session boundary is a daily rhythm, not a UX problem

*Simplified*: One session per user per day. Session stays warm between messages. After 8hr, start a new one and reload workspace from S3.

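The reuse-or-rotate rule above can be sketched as a pure function over the stored mapping. Everything named here is hypothetical (`SessionRecord`, `resolve_session`, the `tg-` prefix), and the 4-hour idle value is just one choice from the 4-6 hour range. The ID is kept longer than 33 characters because, to my understanding, `runtimeSessionId` has a minimum length in that region; that should be confirmed against the API reference.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Optional

MAX_LIFETIME_S = 8 * 3600   # session cap described above
IDLE_TIMEOUT_S = 4 * 3600   # assumed idleRuntimeSessionTimeout setting


@dataclass(frozen=True)
class SessionRecord:
    session_id: str
    created_at: float
    last_used_at: float


def new_session_id(chat_id: int) -> str:
    # uuid4().hex is 32 characters, so prefix + hex exceeds 33 characters.
    return f"tg-{chat_id}-{uuid.uuid4().hex}"


def resolve_session(
    record: Optional[SessionRecord], chat_id: int, now: float
) -> SessionRecord:
    """Reuse the stored session if it is still live, otherwise start a new one."""
    if record is not None:
        too_old = now - record.created_at >= MAX_LIFETIME_S
        idle = now - record.last_used_at >= IDLE_TIMEOUT_S
        if not too_old and not idle:
            return replace(record, last_used_at=now)
    return SessionRecord(new_session_id(chat_id), created_at=now, last_used_at=now)
```

A fresh `SessionRecord` is the signal to reload MEMORY.md and SOUL.md from S3 before invoking the agent.
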
**Q3: AgentCore Memory: is long-term extraction automatic or manual?**

The SDK docs mention "long-term memory automatically extracts and stores key insights." Is this extraction triggered on every `add_turns()` call, after a session ends, or does it require an explicit extraction call? Does it cost extra (separate LLM call)?

*Why it matters*: If extraction isn't automatic, MEMORY.md-equivalent content needs to be managed explicitly.

**Q4: Workspace file mutations (MEMORY.md writes): S3 vs AgentCore Memory**

When the agent wants to write to MEMORY.md (e.g., "remember this for next time"), there are two paths:

- Write to S3, reload on next invocation: simple, but doesn't benefit from semantic search
- Write to AgentCore Memory: benefits from extraction + search, but changes the access pattern

Which approach for MEMORY.md? Can we use BOTH: S3 for large curated memory, AgentCore Memory for semantic search over conversation history?

**Q5: Cold start UX impact (first session only)**

AgentCore keeps the microVM alive between requests (no cold start for warm sessions). The only startup cost is on the *first* invocation of a brand new session (container image pull + process start). Subsequent requests to the same warm session are instant.

- Does the Telegram "typing..." indicator cover the one-time startup gap on new session creation?
- What happens when the Lambda itself is cold (~500ms Lambda cold start, separate from the AgentCore session)?

**Q6: Strands agent + bedrock-agentcore container: ARM64 build complexity**

AgentCore requires ARM64 containers. Strands is Python. The base image needs:

- Python 3.11+
- `strands-agents`, `bedrock-agentcore` pip packages
- AWS credentials via the runtime's IAM execution role
- Access to Bedrock models (need to check regional availability for the models we want)

What's the actual container build + push + deploy flow? Is there a starter template?

---

### 🟡 Important: needs answer before first spike

**Q7: Which Bedrock model and region?**

AgentCore Runtime is available in us-east-1, us-west-2, and several other regions. The Bedrock models we want (Claude Sonnet 4, etc.) need to be available in the same region. Cross-region inference adds latency.

Need to confirm: which model for the agent (Sonnet? Haiku for speed?), which region for AgentCore, does the region support the model?

**Q8: Telegram → AgentCore payload structure**

The Telegram Update object contains `message.chat.id`, `message.from.id`, `message.text`, etc. The InvokeAgentRuntime payload is arbitrary JSON. What does the agent container expect to receive? How do we thread Telegram context (group vs DM, sender info, reply_to) through the SQS → Lambda → AgentCore chain?

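Until the schema is locked, a small envelope builder makes the question concrete. The sketch below assumes a hypothetical envelope: `prompt`, `channel`, `is_group`, `sender` and the other keys are illustrative names, not an AgentCore requirement. Only the fields it reads (`update_id`, `message.chat`, `message.from`, `message.reply_to_message`) come from the Telegram Update object.

```python
from typing import Any, Dict, Optional


def build_agent_payload(update: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Flatten a Telegram Update into the JSON the agent container receives.

    Returns None for updates this sketch does not handle (edits, stickers,
    member joins, and anything else without message.text).
    """
    message = update.get("message")
    if not message or "text" not in message:
        return None
    chat = message["chat"]
    sender = message.get("from", {})
    reply_to = message.get("reply_to_message")
    return {
        "prompt": message["text"],
        "channel": "telegram",
        "chat_id": chat["id"],
        "chat_type": chat["type"],
        "is_group": chat["type"] in ("group", "supergroup"),
        "sender": {
            "id": sender.get("id"),
            "username": sender.get("username"),
            "first_name": sender.get("first_name"),
        },
        "message_id": message["message_id"],
        "reply_to_message_id": reply_to["message_id"] if reply_to else None,
        "update_id": update["update_id"],
    }
```

Building the envelope in tg-ingest or in agent-runner is a design choice; doing it in agent-runner keeps the SQS message identical to what Telegram sent, which helps when replaying from a DLQ.
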
**Q9: Telegram response back to user: token management**

The agent-runner Lambda needs to call `api.telegram.org/bot{token}/sendMessage` after the agent responds. The Bot Token must be available to the Lambda. Secrets Manager is the right answer, but it needs to be in the architecture from day one.

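A minimal sketch of the reply path, using only the standard library. The token is passed in as a parameter; in the Lambda it would come from Secrets Manager (for example via boto3's `get_secret_value`). The helper names `split_for_telegram` and `build_send_request` are hypothetical. The split exists because a single Telegram text message is limited to 4096 characters, and agent replies can exceed that.

```python
import json
import urllib.request
from typing import List

TELEGRAM_MAX_CHARS = 4096  # Bot API limit for one text message


def split_for_telegram(text: str, limit: int = TELEGRAM_MAX_CHARS) -> List[str]:
    """Split a long reply into chunks, preferring to cut at a newline."""
    chunks: List[str] = []
    while len(text) > limit:
        cut = text.rfind("\n", 0, limit)
        if cut <= 0:
            cut = limit  # no usable newline: hard cut at the limit
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks


def build_send_request(token: str, chat_id: int, text: str) -> urllib.request.Request:
    """Build (but do not send) a sendMessage POST with a JSON body."""
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"https://api.telegram.org/bot{token}/sendMessage",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then one `urllib.request.urlopen(request, timeout=10)` call per chunk. Keeping the token out of logs matters here, since it appears in the URL.
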
**Q10: Heartbeat response delivery**

The heartbeat EventBridge rule fires every 30 minutes. The heartbeat Lambda invokes AgentCore. The agent produces a response (either HEARTBEAT_OK to suppress, or an actual message to deliver).

Where does the heartbeat response go? The Lambda needs to know: "if the agent produces a non-HEARTBEAT_OK response, send it to Telegram chat_id X." This routing config (target Telegram chat ID for heartbeat delivery) needs to be stored somewhere (DynamoDB, Secrets Manager, or baked into the Lambda env).

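The suppression rule is small enough to sketch directly. This version assumes the simplest of the three storage options, a Lambda environment variable, and the variable name `HEARTBEAT_CHAT_ID` and function name `route_heartbeat` are hypothetical. It also assumes the agent returns exactly `HEARTBEAT_OK` (ignoring surrounding whitespace) when there is nothing to report.

```python
from typing import Mapping, Optional, Tuple

HEARTBEAT_OK = "HEARTBEAT_OK"


def route_heartbeat(
    agent_reply: str, env: Mapping[str, str]
) -> Optional[Tuple[int, str]]:
    """Return (chat_id, text) to deliver, or None to stay silent."""
    text = agent_reply.strip()
    if not text or text == HEARTBEAT_OK:
        return None
    # Assumption: the target chat is configured at deploy time.
    return int(env["HEARTBEAT_CHAT_ID"]), text
```

In the heartbeat Lambda this would be called as `route_heartbeat(reply, os.environ)`; swapping the mapping for a DynamoDB lookup later does not change the rule itself.
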
**Q11: Multi-turn within a single AgentCore session**

If a user sends 3 rapid messages (before the session expires), do they all land in the same `runtimeSessionId`? The agent-runner Lambda needs to look up the current active session ID for a given Telegram chat_id from DynamoDB, or create a new one if expired.

Race condition: two messages arrive simultaneously → both Lambdas look up session → both see "no active session" → both create new sessions. Need a DynamoDB conditional write / lock.

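The lock can be a single conditional `put_item`. The sketch below assumes a hypothetical table keyed on `chat_id` with `session_id` and `expires_at` attributes, and takes a boto3 DynamoDB `Table` resource as `table`; `claim_session` is an illustrative name. It models only the maximum session lifetime, not the idle timeout.

```python
from typing import Any


def claim_session(
    table: Any, chat_id: int, candidate_id: str, now: int, ttl_s: int = 8 * 3600
) -> str:
    """Return the active session ID for chat_id, creating one atomically.

    Two concurrent callers both attempt the write; DynamoDB lets exactly one
    succeed, and the loser reads back the winner's session ID.
    """
    try:
        table.put_item(
            Item={
                "chat_id": chat_id,
                "session_id": candidate_id,
                "expires_at": now + ttl_s,
            },
            # Succeeds only if no row exists or the existing row has expired.
            ConditionExpression="attribute_not_exists(chat_id) OR expires_at <= :now",
            ExpressionAttributeValues={":now": now},
        )
        return candidate_id
    except table.meta.client.exceptions.ConditionalCheckFailedException:
        # A live session already exists (or another invocation won the race).
        item = table.get_item(Key={"chat_id": chat_id}, ConsistentRead=True)["Item"]
        return item["session_id"]
```

A related option is a FIFO queue with the chat ID as message group, which serializes messages per chat before they reach the Lambda; that is a different trade-off and is not evaluated here.
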
**Q12: Telegram `sendChatAction` ("typing") timing**

Telegram's chat action expires in ~5 seconds. For a 30-second agent run, we need to refresh the typing indicator periodically. The agent-runner Lambda needs to refresh it while waiting for InvokeAgentRuntime to stream. Is this easy to do in a Lambda while streaming?

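Refreshing from a background thread is one workable approach, since the main thread is busy reading the stream. A minimal sketch, assuming `send_typing` is a caller-supplied function that posts `sendChatAction` with `action=typing` for the chat; `keep_typing` and the 4-second interval are illustrative choices. The agent call goes inside the `with` block, and the indicator stops refreshing when the block exits.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def keep_typing(
    send_typing: Callable[[], None], interval_s: float = 4.0
) -> Iterator[None]:
    """Call send_typing immediately and then every interval_s until exit."""
    stop = threading.Event()

    def loop() -> None:
        while not stop.is_set():
            try:
                send_typing()
            except Exception:
                pass  # a failed indicator should never fail the agent run
            stop.wait(interval_s)  # returns early once stop is set

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    try:
        yield
    finally:
        stop.set()
        worker.join()
```
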

---

### 🟢 Lower priority: figure out during build

**Q13: What tools does the container expose?**

OpenClaw has ~20 tools. For an MVP, what's the minimum viable tool set?

- `read_file(path)`: S3 workspace
- `write_file(path, content)`: S3 workspace
- `web_search(query)`: Brave API
- `web_fetch(url)`: HTTP + readability
- `memory_search(query)`: AgentCore Memory
- `send_telegram_message(text)`: for multi-message replies, or just return the response?

Tools NOT in scope for v1: exec, browser, canvas, cron management, image generation.

**Q14: Cron job management from within the agent**

OpenClaw lets the agent create/delete cron jobs dynamically. With EventBridge, a `create_cron_job` tool would need to call the EventBridge API (`put_rule` plus `put_targets`, or `create_schedule` on EventBridge Scheduler). Doable but needs IAM permissions baked in. Scope for v2.

**Q15: Secrets rotation**

Bot token, Brave API key, etc. live in Secrets Manager. Need to decide: Lambda env vars (loaded on cold start) vs Secrets Manager SDK calls (per-invocation). For personal scale, env vars baked in at deploy time are fine. Secrets Manager adds ~50ms latency per call.

**Q16: IaC choice**

CDK (TypeScript) or Terraform or SAM. CDK is most AWS-native and has the highest-level constructs. SAM is simpler for Lambda-centric stacks. Terraform if portability matters.

---

## Proposed Build Phases

### Phase 0: Spike (1-2 days)

Answer Q1, Q2, Q5 by actually running the thing:

- Deploy the smallest possible Strands container to AgentCore
- Send it a test InvokeAgentRuntime call
- Measure cold start latency in practice
- Test what happens when a session expires and you reinvoke with the same ID

### Phase 1: Telegram → Agent → Response (1 week)

- API Gateway + tg-ingest Lambda (verify signature, SQS enqueue, return 204)
- SQS queue
- agent-runner Lambda (maps chat_id → session_id, invokes AgentCore, sends Telegram reply)
- AgentCore container: minimal Strands agent, system prompt from S3 workspace, web_search tool
- S3 workspace bucket with SOUL.md, AGENTS.md, USER.md
- DynamoDB: chat_id → session_id mapping

**Done when**: can send a Telegram message and get a reply from the agent, personality intact.

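The tg-ingest step can be sketched as follows. Telegram does not sign webhook bodies; the check it offers is the secret token set through `setWebhook`'s `secret_token` parameter, which Telegram sends back in the `X-Telegram-Bot-Api-Secret-Token` header, so the sketch reads "verify signature" as that comparison. `make_ingest_handler` and `enqueue` are hypothetical names; in the Lambda, `enqueue` would wrap an SQS `send_message` call. The typing action from the diagram is left out for brevity.

```python
import base64
import hmac
import json
from typing import Any, Callable, Dict


def make_ingest_handler(
    expected_secret: str, enqueue: Callable[[str], None]
) -> Callable[..., Dict[str, Any]]:
    """Build a Lambda-style handler: check the secret, enqueue, ack with 204."""

    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
        headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
        supplied = headers.get("x-telegram-bot-api-secret-token", "")
        # Constant-time comparison avoids leaking the secret through timing.
        if not hmac.compare_digest(supplied.encode(), expected_secret.encode()):
            return {"statusCode": 403}

        body = event.get("body") or ""
        if event.get("isBase64Encoded"):
            body = base64.b64decode(body).decode("utf-8")
        try:
            json.loads(body)  # reject anything that is not JSON
        except ValueError:
            return {"statusCode": 400}

        enqueue(body)  # hand off; the agent run happens behind the queue
        return {"statusCode": 204}

    return handler
```

Returning 204 before any agent work starts is what keeps the webhook acknowledgement fast regardless of how long the run takes.
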
### Phase 2: Memory + Workspace (1 week)

- AgentCore Memory provisioned (memory_id per user)
- Conversation history stored after each turn
- Long-term memory extraction confirmed working
- MEMORY.md sync pattern: S3 for curated, AgentCore Memory for semantic search
- write_file / read_file tools pointing at S3 workspace

**Done when**: agent remembers things across sessions (>15min gaps).

### Phase 3: Heartbeat + Cron (3-4 days)

- EventBridge rule (every 30m)
- heartbeat-trigger Lambda
- HEARTBEAT_OK suppression logic
- Delivery to configurable Telegram chat ID

**Done when**: heartbeat fires, agent checks HEARTBEAT.md, delivers alerts to Telegram.

### Phase 4: Polish (ongoing)

- Typing indicator refresh during long runs
- Additional tools (image gen, TTS)
- Error handling + DLQ
- CDK/IaC for reproducible deploys
- Cost monitoring

---

## Cost Estimate (Personal Scale, ~50 agent runs/day)

| Service | Est. Monthly Cost | Notes |
|---|---|---|
| API Gateway (HTTP) | ~$0.01 | <1M requests/mo |
| Lambda (ingest + runner + heartbeat) | ~$0.50 | ~2000 invocations/day, avg 30s |
| SQS | ~$0.00 | Free tier |
| AgentCore Runtime | ~$5-15 | 50 runs/day × 30s avg × ~$0.0x/compute-sec |
| AgentCore Memory | TBD | Pricing not fully public yet |
| S3 (workspace files) | ~$0.01 | <1 MB total |
| DynamoDB (session mapping) | ~$0.01 | On-demand, minimal reads/writes |
| Bedrock LLM calls | $20-80 | Same as today; model-dependent |
| EventBridge | ~$0.00 | ~1,440 scheduled invocations/mo (one every 30 min) |
| Secrets Manager | ~$0.40 | $0.40/secret/mo |
| **Total infra (ex-LLM)** | **~$6-20/mo** | vs ~$26/mo for Fargate |

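The AgentCore Runtime row can be sanity-checked with a few lines of arithmetic. The sketch below only counts compute-seconds from the figures in the table; it deliberately leaves the per-second rate as a parameter, since the table does not pin it down. The 30-second heartbeat duration is an assumption to be replaced with a measured value.

```python
DAYS_PER_MONTH = 30


def monthly_compute_seconds(
    runs_per_day: int, avg_seconds: float, days: int = DAYS_PER_MONTH
) -> float:
    """Total runtime seconds per month for a given daily run count."""
    return runs_per_day * avg_seconds * days


def monthly_cost(compute_seconds: float, rate_per_second: float) -> float:
    """Multiply by whatever per-second rate the pricing page gives."""
    return compute_seconds * rate_per_second


# Chat traffic: 50 runs/day at 30 s each, as in the table.
chat_seconds = monthly_compute_seconds(50, 30)

# Heartbeats: one every 30 minutes.
heartbeats_per_day = 24 * 60 // 30
# Assumption: a heartbeat run also averages 30 s.
heartbeat_seconds = monthly_compute_seconds(heartbeats_per_day, 30)
```

Under that assumption the heartbeat adds 43,200 seconds on top of the 45,000 from chat traffic, so it may deserve its own line in the estimate.
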
**Zero always-on compute cost.** Pay only when messages arrive.

---

## Immediate Next Steps

1. **Answer Q1 + Q2 with a spike**: deploy a toy Strands container, measure cold start, test session expiry behavior
2. **Clarify AgentCore Memory extraction** (Q3): read the full SDK docs + test
3. **Lock the Telegram payload schema** (Q8): define what goes in the InvokeAgentRuntime payload
4. **Pick region + model** (Q7): confirm Sonnet availability in target region
5. **Start Phase 1 build**

---

*Updated 2026-05-04*