agent-claw/build-plan.md

# Plan: AgentCore-Native OpenClaw (Telegram, Zero Always-On Compute)

## Target Architecture

```
[Telegram User]
      │ message
      ▼
[Telegram Servers]
      │ POST (webhook)
      ▼
[API Gateway (HTTP API)]
      │
      ▼
[Lambda: tg-ingest]          ← verify secret token, send typing action, enqueue
      │ SQS message
      ▼
[SQS: agent-queue]
      │ trigger
      ▼
[Lambda: agent-runner]       ← load workspace from S3, build system prompt,
      │ InvokeAgentRuntime       map chat_id → session_id
      ▼
[AgentCore Runtime]          ← Strands agent container (ARM64)
      │ streaming response       tools: web_search, read/write S3, memory
      ▼
[Lambda: agent-runner]       ← stream reply back
      │ Telegram Bot API
      ▼
[Telegram User]              ← receives message

[EventBridge Scheduler]      ← every 30m → Lambda → InvokeAgentRuntime (heartbeat prompt)
      │                                                      │
      ▼                                                      ▼ same response routing
[Lambda: heartbeat-trigger]                          [Telegram Bot API]
```

**No 24/7 compute anywhere.** Everything is event-driven.

---

## What We've Answered

- ✅ AgentCore is the right runtime (stateless container, event-driven)
- ✅ Telegram supports full webhook mode (all message types)
- ✅ SQS decoupling handles the webhook ack requirement (respond 204 in <10s)
- ✅ OpenClaw workspace files (SOUL.md, AGENTS.md, MEMORY.md) reusable via S3
- ✅ System prompt construction logic is portable (pure string ops)
- ✅ Tool schemas (web_search, read, write, edit) translatable to Strands @tool
- ✅ EventBridge handles heartbeat and cron (no gateway process needed)
- ✅ AgentCore Memory SDK exists and supports conversation history + long-term extraction
- ✅ InvokeAgentRuntime supports streaming responses
- ✅ Lifecycle settings: idleRuntimeSessionTimeout is configurable (min 60s, default 900s)
- ✅ Cold start: Firecracker microVM ~2-5 seconds on first invocation
- ✅ Language/framework: Python + Strands + bedrock-agentcore SDK (ARM64 container)
- ✅ AgentCore Memory SDK: MemorySessionManager, actor_id + session_id model, search_long_term_memories()

---

## Open Questions (Not Yet Answered)

### 🔴 Critical — blocks architecture decisions

**Q1: Response routing for async runs**
When InvokeAgentRuntime is called from the agent-runner Lambda, does it block synchronously until the agent finishes? Lambda max timeout is 15 minutes. AgentCore sessions can run up to 8 hours. What's the maximum synchronous response wait? Is there a callback/webhook pattern for long agent runs, or do we always need to poll?

*Why it matters*: If an agent run takes 3 minutes (web browsing + LLM), the agent-runner Lambda needs to sit open for 3 minutes. That's fine up to ~15 minutes. But longer runs (coding tasks, deep research) need a different pattern.

*Research needed*: InvokeAgentRuntime streaming behavior + max Lambda concurrency implications.
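
One possible way to stay inside Lambda's 15-minute ceiling is to consume the stream against a deadline and stop early, leaving enough time to tell the user the run is still in progress. The sketch below is illustrative only and does not confirm how AgentCore streams: `collect_stream`, `remaining_ms`, `reserve_ms`, and the `data:` line framing are assumptions. In a real handler, `lines` would presumably come from the streaming body returned by InvokeAgentRuntime, and `remaining_ms` from `context.get_remaining_time_in_millis`.

```python
from typing import Callable, Iterable, List, Tuple


def collect_stream(
    lines: Iterable[bytes],
    remaining_ms: Callable[[], int],
    reserve_ms: int = 10_000,
) -> Tuple[str, bool]:
    """Join streamed text chunks until the stream ends or time runs low.

    Returns (text, completed). completed is False when we stopped early
    to keep `reserve_ms` of Lambda time for a fallback message.
    """
    parts: List[str] = []
    for raw in lines:
        # Check the budget before handling each chunk.
        if remaining_ms() < reserve_ms:
            return "".join(parts), False
        line = raw.decode("utf-8").rstrip("\r\n")
        # Assumed SSE-style framing: payload lines look like "data: <text>".
        if line.startswith("data:"):
            chunk = line[len("data:"):]
            if chunk.startswith(" "):
                chunk = chunk[1:]  # drop the single separator space only
            parts.append(chunk)
    return "".join(parts), True
```

The budget check only runs when a chunk arrives, so a stalled stream would also need a socket read timeout on the client.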

**Q2: Session ID strategy and daily session lifecycle**
`idleRuntimeSessionTimeout` is configurable (60s–8hr, default 15min). For a personal assistant, set it to 4-6 hours — the session stays warm all day. Max lifetime is 8 hours, after which a new session is created.

- Map Telegram `chat_id` → `runtimeSessionId` in DynamoDB (create new session ID at start of day / when previous session maxes out)
- On new session creation, load MEMORY.md + SOUL.md from S3 into system prompt — that's the context restoration
- The 8hr session boundary is a daily rhythm, not a UX problem

*Simplified*: One session per user per day. Session stays warm between messages. After 8hr, start a new one and reload workspace from S3.
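
The lifecycle rule above can be sketched as two small helpers. This is a minimal illustration, not a confirmed design: `new_session_id`, `session_is_live`, and the 4-hour `IDLE_TIMEOUT` value are hypothetical, and the ID format is chosen on the assumption that `runtimeSessionId` has a minimum length of roughly 33 characters (worth verifying during the spike).

```python
import uuid
from datetime import datetime, timedelta, timezone

MAX_SESSION_LIFETIME = timedelta(hours=8)  # max lifetime stated in the plan
IDLE_TIMEOUT = timedelta(hours=4)          # assumed idleRuntimeSessionTimeout setting


def new_session_id(chat_id: int) -> str:
    """Build a fresh, unguessable session ID tied to a Telegram chat."""
    # 32 hex characters from the UUID keep the ID comfortably long.
    return f"tg-{chat_id}-{uuid.uuid4().hex}"


def session_is_live(created_at: datetime, last_used_at: datetime, now: datetime) -> bool:
    """True while the session is inside both the idle and max-lifetime windows."""
    within_lifetime = (now - created_at) < MAX_SESSION_LIFETIME
    within_idle = (now - last_used_at) < IDLE_TIMEOUT
    return within_lifetime and within_idle
```

When `session_is_live` returns False, the runner would mint a new ID and rebuild the system prompt from the S3 workspace.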

**Q3: AgentCore Memory — is long-term extraction automatic or manual?**
The SDK docs mention "long-term memory automatically extracts and stores key insights." Is this extraction triggered on every `add_turns()` call, after a session ends, or does it require an explicit extraction call? Does it cost extra (separate LLM call)?

*Why it matters*: If extraction isn't automatic, MEMORY.md-equivalent content needs to be managed explicitly.

**Q4: Workspace file mutations (MEMORY.md writes) — S3 vs AgentCore Memory**
When the agent wants to write to MEMORY.md (e.g., "remember this for next time"), there are two paths:
- Write to S3, reload on next invocation — simple but doesn't benefit from semantic search
- Write to AgentCore Memory — benefits from extraction + search but changes the access pattern

Which approach for MEMORY.md? Can we use BOTH — S3 for large curated memory, AgentCore Memory for semantic search over conversation history?

**Q5: Cold start UX impact — first session only**
AgentCore keeps the microVM alive between requests (no cold start for warm sessions). The only startup cost is on the *first* invocation of a brand new session (container image pull + process start). Subsequent requests to the same warm session are instant.

- Does the Telegram "typing..." indicator cover the one-time startup gap on new session creation?
- What happens when the Lambda itself is cold (~500ms Lambda cold start, separate from the AgentCore session)?

**Q6: Strands agent + bedrock-agentcore container — ARM64 build complexity**
AgentCore requires ARM64 containers. Strands is Python. The base image needs:
- Python 3.11+
- `strands-agents`, `bedrock-agentcore` pip packages
- AWS credentials via the runtime's IAM execution role
- Access to Bedrock models (need to check regional availability for the models we want)

What's the actual container build + push + deploy flow? Is there a starter template?

---

### 🟡 Important — needs answer before first spike

**Q7: Which Bedrock model and region?**
AgentCore Runtime is available in us-east-1, us-west-2, and several other regions. The Bedrock models we want (Claude Sonnet 4, etc.) need to be available in the same region. Cross-region inference adds latency.

Need to confirm: which model for the agent (Sonnet? Haiku for speed?), which region for AgentCore, does the region support the model?

**Q8: Telegram → AgentCore payload structure**
The Telegram Update object contains `message.chat.id`, `message.from.id`, `message.text`, etc. The InvokeAgentRuntime payload is arbitrary JSON. What does the agent container expect to receive? How do we thread Telegram context (group vs DM, sender info, reply_to) through the SQS → Lambda → AgentCore chain?
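
As a starting point for locking the schema, here is one possible shape for the payload. The field names under `telegram` and the `build_agent_payload` helper are hypothetical; only the Telegram Update fields being read (`message.chat`, `message.from`, `message.text`, `reply_to_message`) come from the Bot API.

```python
from typing import Any, Dict, Optional


def build_agent_payload(update: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Reduce a Telegram Update to the fields the agent needs.

    Returns None for updates without a text message (edits, stickers, ...),
    which this sketch simply ignores.
    """
    message = update.get("message") or {}
    text = message.get("text")
    if not text:
        return None
    chat = message.get("chat") or {}
    sender = message.get("from") or {}
    reply_to = message.get("reply_to_message") or {}
    return {
        "prompt": text,
        "telegram": {
            "chat_id": chat.get("id"),
            "chat_type": chat.get("type"),  # "private", "group", "supergroup", "channel"
            "sender_id": sender.get("id"),
            "sender_name": sender.get("first_name"),
            "message_id": message.get("message_id"),
            "reply_to_message_id": reply_to.get("message_id"),
        },
    }
```

Keeping the Telegram context in its own sub-object means the same payload shape could later carry heartbeat prompts, which have no Telegram message behind them.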

**Q9: Telegram response back to user — token management**
The agent-runner Lambda needs to call `api.telegram.org/bot{token}/sendMessage` after the agent responds. The Bot Token must be available to the Lambda. Secrets Manager is the right answer — but it needs to be in the architecture from day one.
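
A minimal sketch of the reply path, using only the standard library. The helper names are hypothetical, and the token is passed in as a plain argument on the assumption that the caller has already loaded it from Secrets Manager. The 4096-character split reflects Telegram's per-message text limit.

```python
import json
import urllib.request
from typing import List

TELEGRAM_MAX_LEN = 4096  # Telegram's per-message text limit


def split_message(text: str, limit: int = TELEGRAM_MAX_LEN) -> List[str]:
    """Cut a long reply into chunks Telegram will accept (empty text -> no chunks)."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]


def build_send_message_request(token: str, chat_id: int, text: str) -> urllib.request.Request:
    """Prepare (but do not send) a sendMessage call for one chunk."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The runner would then loop over `split_message(reply)` and pass each request to `urllib.request.urlopen` with a timeout.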

**Q10: Heartbeat response delivery**
The heartbeat EventBridge rule fires every 30 minutes. The heartbeat Lambda invokes AgentCore. The agent produces a response (either HEARTBEAT_OK to suppress, or an actual message to deliver).

Where does the heartbeat response go? The Lambda needs to know: "if the agent produces a non-HEARTBEAT_OK response, send it to Telegram chat_id X." This routing config (target Telegram chat ID for heartbeat delivery) needs to be stored somewhere (DynamoDB, Secrets Manager, or baked into the Lambda env).
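
The suppression and routing rule could look like the sketch below, assuming the simplest of the storage options above (a Lambda environment variable). `HEARTBEAT_CHAT_ID` and `plan_heartbeat_delivery` are hypothetical names, and the exact-match check on `HEARTBEAT_OK` is an assumption about how strictly the agent follows the convention.

```python
from typing import Mapping, Optional, Tuple

HEARTBEAT_OK = "HEARTBEAT_OK"


def plan_heartbeat_delivery(
    agent_reply: str, env: Mapping[str, str]
) -> Optional[Tuple[int, str]]:
    """Decide whether a heartbeat reply should be sent, and to which chat.

    Returns None when the agent reported nothing worth delivering,
    otherwise (chat_id, text).
    """
    text = agent_reply.strip()
    if not text or text == HEARTBEAT_OK:
        return None
    # Assumed config: target chat ID supplied via a Lambda env var.
    return int(env["HEARTBEAT_CHAT_ID"]), text
```

In the Lambda, `env` would be `os.environ`; passing it in keeps the function easy to test.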

**Q11: Multi-turn within a single AgentCore session**
If a user sends 3 rapid messages (before the session expires), do they all land in the same `runtimeSessionId`? The agent-runner Lambda needs to look up the current active session ID for a given Telegram chat_id from DynamoDB, or create a new one if expired.

Race condition: two messages arrive simultaneously → both Lambdas look up session → both see "no active session" → both create new sessions. Need a DynamoDB conditional write / lock.
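
One way to close this race is a conditional `put_item`: the first writer wins, and the loser reads back the winner's session ID. The sketch assumes a boto3 DynamoDB `Table` resource keyed on a string attribute named `chat_id`, with an `expires_at` epoch-seconds attribute. The table layout and the `claim_session` helper are hypothetical.

```python
import time
import uuid
from typing import Any, Optional


def claim_session(
    table: Any,
    chat_id: int,
    now: Optional[int] = None,
    max_lifetime_s: int = 8 * 3600,
) -> str:
    """Return the active session ID for a chat, creating one atomically if needed."""
    now = int(time.time()) if now is None else now
    candidate = f"tg-{chat_id}-{uuid.uuid4().hex}"
    try:
        # Succeeds only if no row exists or the existing row has expired.
        table.put_item(
            Item={
                "chat_id": str(chat_id),
                "session_id": candidate,
                "expires_at": now + max_lifetime_s,
            },
            ConditionExpression="attribute_not_exists(chat_id) OR expires_at < :now",
            ExpressionAttributeValues={":now": now},
        )
        return candidate
    except table.meta.client.exceptions.ConditionalCheckFailedException:
        # Another invocation won the race; reuse its session.
        item = table.get_item(Key={"chat_id": str(chat_id)}, ConsistentRead=True)["Item"]
        return item["session_id"]
```

This only guards the maximum-lifetime boundary. Idle expiry would need either a `last_used_at` attribute or a retry when AgentCore reports the session as gone.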

**Q12: Telegram send_chat_action ("typing") timing**
Telegram's chat action expires in ~5 seconds. For a 30-second agent run, we need to refresh the typing indicator periodically. The agent-runner Lambda needs to refresh it while waiting for InvokeAgentRuntime to stream. Is this easy to do in a Lambda while streaming?
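
A background thread is probably the least invasive way to do this, since it leaves the streaming loop untouched. The sketch below is an assumption about how the runner might be structured: `typing_indicator` and `send_typing` are hypothetical names, and `send_typing` stands in for whatever function calls Telegram's `sendChatAction` with `action="typing"`.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def typing_indicator(send_typing: Callable[[], None], interval_s: float = 4.0) -> Iterator[None]:
    """Keep the 'typing...' status alive while the body of the with-block runs."""
    stop = threading.Event()

    def loop() -> None:
        while not stop.is_set():
            try:
                send_typing()
            except Exception:
                pass  # a failed refresh should never break the agent run
            # Sleep for the interval, but wake immediately when asked to stop.
            stop.wait(interval_s)

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    try:
        yield
    finally:
        stop.set()
        worker.join()
```

Usage would be `with typing_indicator(send_typing): reply = run_agent(...)`, where `run_agent` is whatever consumes the stream. The 4-second default sits just under the ~5-second expiry.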

---

### 🟢 Lower priority — figure out during build

**Q13: What tools does the container expose?**
OpenClaw has ~20 tools. For an MVP, what's the minimum viable tool set?
- `read_file(path)` — S3 workspace
- `write_file(path, content)` — S3 workspace
- `web_search(query)` — Brave API
- `web_fetch(url)` — HTTP + readability
- `memory_search(query)` — AgentCore Memory
- `send_telegram_message(text)` — for multi-message replies? or just return the response?

Tools NOT in scope for v1: exec, browser, canvas, cron management, image generation.

**Q14: Cron job management from within the agent**
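OpenClaw lets the agent create/delete cron jobs dynamically. With EventBridge, a `create_cron_job` tool would need to call `put_rule()` on the boto3 `events` client (or `create_schedule()` on the `scheduler` client, if using EventBridge Scheduler). Doable, but it needs IAM permissions baked in. Scope for v2.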

**Q15: Secrets rotation**
Bot token, Brave API key, etc. — Secrets Manager. Need to decide: Lambda env vars (loaded on cold start) vs Secrets Manager SDK calls (per-invocation). For personal scale, env vars baked in at deploy time are fine. Secrets Manager adds ~50ms latency per call.
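
A middle path between the two options is to fetch from Secrets Manager once per execution environment and cache the value with a TTL, so warm invocations skip the network call while rotated secrets are still picked up eventually. The sketch below is illustrative: `get_secret`, the 300-second TTL, and the injected `fetch` callable are hypothetical choices.

```python
import time
from typing import Callable, Dict, Tuple

# Module-level cache survives across warm invocations of the same Lambda.
_cache: Dict[str, Tuple[float, str]] = {}


def get_secret(
    secret_id: str,
    fetch: Callable[[str], str],
    ttl_s: float = 300.0,
    now: Callable[[], float] = time.monotonic,
) -> str:
    """Return a cached secret, refetching once the cached copy is older than ttl_s."""
    hit = _cache.get(secret_id)
    if hit is not None and now() - hit[0] < ttl_s:
        return hit[1]
    value = fetch(secret_id)
    _cache[secret_id] = (now(), value)
    return value
```

In the Lambda, `fetch` could wrap `boto3.client("secretsmanager").get_secret_value(SecretId=secret_id)["SecretString"]`.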

**Q16: IaC choice**
CDK (TypeScript) or Terraform or SAM. CDK is most AWS-native and has the highest-level constructs. SAM is simpler for Lambda-centric stacks. Terraform if portability matters.

---

## Proposed Build Phases

### Phase 0 — Spike (1-2 days)
Answer Q1, Q2, Q5 by actually running the thing:
- Deploy the smallest possible Strands container to AgentCore
- Send it a test InvokeAgentRuntime call
- Measure cold start latency in practice
- Test what happens when a session expires and you reinvoke with the same ID

### Phase 1 — Telegram → Agent → Response (1 week)
- API Gateway + tg-ingest Lambda (verify the `X-Telegram-Bot-Api-Secret-Token` header, SQS enqueue, return 204)
- SQS queue
- agent-runner Lambda (maps chat_id → session_id, invokes AgentCore, sends Telegram reply)
- AgentCore container: minimal Strands agent, system prompt from S3 workspace, web_search tool
- S3 workspace bucket with SOUL.md, AGENTS.md, USER.md
- DynamoDB: chat_id → session_id mapping

**Done when**: can send a Telegram message and get a reply from the agent, personality intact.
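
For the first item in this phase, a minimal sketch of the tg-ingest logic follows. It assumes the API Gateway HTTP API v2 event shape (lower-cased header names, optional base64 body) and that the webhook was registered with a `secret_token`, which Telegram echoes back in the `X-Telegram-Bot-Api-Secret-Token` header. `handle_webhook` and the injected `enqueue` callable are hypothetical; the typing action is omitted here.

```python
import base64
import hmac
from typing import Any, Callable, Dict


def handle_webhook(
    event: Dict[str, Any],
    expected_secret: str,
    enqueue: Callable[[str], None],
) -> Dict[str, Any]:
    """Authenticate a Telegram webhook call, queue the raw update, and ack."""
    headers = event.get("headers") or {}
    supplied = headers.get("x-telegram-bot-api-secret-token", "")
    # Constant-time comparison avoids leaking the secret through timing.
    if not hmac.compare_digest(supplied.encode("utf-8"), expected_secret.encode("utf-8")):
        return {"statusCode": 403}

    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")

    # Hand the untouched update to the queue; parsing happens in agent-runner.
    enqueue(body)
    return {"statusCode": 204}
```

In the deployed Lambda, `enqueue` could wrap `sqs.send_message(QueueUrl=..., MessageBody=body)` on a boto3 SQS client.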

### Phase 2 — Memory + Workspace (1 week)
- AgentCore Memory provisioned (memory_id per user)
- Conversation history stored after each turn
- Long-term memory extraction confirmed working
- MEMORY.md sync pattern: S3 for curated, AgentCore Memory for semantic search
- write_file / read_file tools pointing at S3 workspace

**Done when**: agent remembers things across sessions (>15min gaps).

### Phase 3 — Heartbeat + Cron (3-4 days)
- EventBridge rule (every 30m)
- heartbeat-trigger Lambda
- HEARTBEAT_OK suppression logic
- Delivery to configurable Telegram chat ID

**Done when**: heartbeat fires, agent checks HEARTBEAT.md, delivers alerts to Telegram.

### Phase 4 — Polish (ongoing)
- Typing indicator refresh during long runs
- Additional tools (image gen, TTS)
- Error handling + DLQ
- CDK/IaC for reproducible deploys
- Cost monitoring

---

## Cost Estimate (Personal Scale, ~50 agent runs/day)

| Service | Est. Monthly Cost | Notes |
|---|---|---|
| API Gateway (HTTP) | ~$0.01 | <1M requests/mo |
| Lambda (ingest + runner + heartbeat) | ~$0.50 | ~150 invocations/day (50 ingest + 50 runner + 48 heartbeat); runner avg 30s |
| SQS | ~$0.00 | Free tier |
| AgentCore Runtime | ~$5-15 | 50 runs/day × 30s avg × ~$0.0x/compute-sec |
| AgentCore Memory | TBD | Pricing not fully public yet |
| S3 (workspace files) | ~$0.01 | <1 MB total |
| DynamoDB (session mapping) | ~$0.01 | On-demand, minimal reads/writes |
| Bedrock LLM calls | $20-80 | Same as today — model-dependent |
| EventBridge | ~$0.00 | ~1,440 heartbeat triggers/mo (one every 30m) |
| Secrets Manager | ~$0.40 | $0.40/secret/mo |
| **Total infra (ex-LLM)** | **~$6-20/mo** | vs ~$26/mo for Fargate |

**Zero always-on compute cost.** Pay only when messages arrive.

---

## Immediate Next Steps

1. **Answer Q1, Q2, and Q5 with a spike** — deploy a toy Strands container, measure cold start, test session expiry behavior
2. **Clarify AgentCore Memory extraction** (Q3) — read the full SDK docs + test
3. **Lock the Telegram payload schema** (Q8) — define what goes in InvokeAgentRuntime payload
4. **Pick region + model** (Q7) — confirm Sonnet availability in target region
5. **Start Phase 1 build**

---

*Updated 2026-05-04*