Initial research: OpenClaw on AgentCore architecture

- Architecture comparison (OpenClaw daemon vs AgentCore serverless)
- Component compatibility analysis
- Fargate analysis
- AgentCore rebuild plan (Telegram, zero always-on compute)
- Memory strategy: AgentCore Memory + factbase as structured KB
- Serverless relay patterns per channel
- All open questions resolved
- OpenClaw feature delta March→May 2026
- Build phases and cost estimates
daniel
2026-05-04 08:28:52 -05:00
parent 4afa16a9cd
commit 0369a74ac1
13 changed files with 1876 additions and 1 deletions


@@ -1,2 +1,15 @@
# agent-claw
# OpenClaw on AWS AgentCore — Research Project
Research into the feasibility of running [OpenClaw](https://github.com/openclaw/openclaw) on [AWS Bedrock AgentCore Runtime](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/agents-tools-runtime.html).
## Files
- `architecture-comparison.md` — Side-by-side architecture comparison
- `compatibility-analysis.md` — Detailed component-by-component compatibility analysis
- `offload-requirements.md` — What needs to move to external services
- `feasibility-verdict.md` — Bottom-line assessment for AgentCore
- `fargate-analysis.md` — ECS Fargate deployment analysis (the better fit)
- `agentcore-memory-research.md` — AgentCore Memory deep dive + MEMORY.md replacement analysis
- `agentcore-rebuild.md` — What's reusable in an AgentCore-native rebuild
- `serverless-relay-patterns.md` — Lambda/webhook patterns per channel (Discord deep dive)
- `build-plan.md` — **START HERE**: full build plan, open questions, phases, cost estimate


@@ -0,0 +1,217 @@
# AgentCore Memory — Deep Dive & MEMORY.md Replacement Analysis
## How AgentCore Memory Works
### Architecture
Memory is a managed service completely separate from the Runtime container. Two tiers:
```
Short-term (events)        Long-term (extracted records)
─────────────────────      ──────────────────────────────
actor → session → events   actor → namespace → records
                           (semantic search available)
                           (cross-session, persistent)
```
### Short-term Memory
- Stores raw conversation events (turns) via `CreateEvent` API
- Keyed by `actor_id` + `session_id`
- Retrieval: `ListEvents(session_id)` → full conversation history
- **Survives microVM termination** — stored in the managed service, not the container
- This replaces JSONL session transcripts completely
### Long-term Memory — Three Built-in Strategies
Configured when creating a Memory resource. Extraction runs **asynchronously in the background** after each `CreateEvent`. Model costs for built-in extraction are **included in AgentCore Memory pricing** (confirmed by AWS support).
| Strategy | What it extracts | Namespace pattern |
|---|---|---|
| `SUMMARIZATION` | Session summaries | `/summaries/{actorId}/{sessionId}/` |
| `USER_PREFERENCE` | Preferences, habits, recurring facts | `/preferences/{actorId}/` |
| `SEMANTIC` | Raw facts, entities, knowledge | `/facts/{actorId}/` |
All three can run on the same memory resource simultaneously.
### Self-managed Strategy
You control the entire extraction pipeline:
1. Configure triggers: message count (`messageCount: 6`), token count (`tokenCount: 1000`), or idle timeout (`idleSessionTimeout: 30`)
2. AgentCore writes conversation payload to **your S3 bucket**
3. Publishes notification to **your SNS topic**
4. Your Lambda picks it up, runs whatever extraction logic you want
5. You write results back via `BatchCreateMemoryRecords`
This is the **MEMORY.md pattern but managed in the cloud** — you decide what to write and how.
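The five steps above could be wired together roughly as follows. This is a minimal sketch under stated assumptions, not the documented notification schema: the SNS message is assumed to be JSON with hypothetical `s3Bucket` and `s3Key` fields, the S3 payload is assumed to contain an `actorId` and a list of `turns`, and `extract_facts` is a placeholder for whatever extraction logic you choose. The `read_payload` and `write_records` callables are injected so the handler stays testable; in a real Lambda, `write_records` would wrap `BatchCreateMemoryRecords`.

```python
import json
from typing import Iterable


def extract_facts(turns: Iterable[dict]) -> list[str]:
    """Placeholder extraction: keep user turns that start with 'remember'."""
    facts = []
    for turn in turns:
        text = turn.get("text", "").strip()
        if turn.get("role") == "user" and text.lower().startswith("remember"):
            facts.append(text)
    return facts


def handle_sns_event(event: dict, read_payload, write_records) -> int:
    """Process each SNS record: load the payload, extract facts, write them back.

    read_payload(bucket, key) -> dict   (hypothetical wrapper around S3 GetObject)
    write_records(namespace, facts)     (hypothetical wrapper around BatchCreateMemoryRecords)
    Returns the number of facts written.
    """
    written = 0
    for record in event.get("Records", []):
        # Standard SNS-to-Lambda envelope; the inner field names are assumptions.
        message = json.loads(record["Sns"]["Message"])
        payload = read_payload(message["s3Bucket"], message["s3Key"])
        facts = extract_facts(payload.get("turns", []))
        if facts:
            namespace = f"/curated/{payload['actorId']}/"
            write_records(namespace, facts)
            written += len(facts)
    return written
```

Writing to a dedicated `/curated/{actorId}/` namespace keeps explicit "remember this" records separate from what the built-in strategies extract.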
### Strands Integration
The Strands `AgentCoreMemorySessionManager` handles everything automatically:
```python
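# Import paths follow the bedrock-agentcore Strands integration; verify against the installed SDK version.
from strands import Agent
from bedrock_agentcore.memory.integrations.strands.config import AgentCoreMemoryConfig
from bedrock_agentcore.memory.integrations.strands.session_manager import AgentCoreMemorySessionManager

config = AgentCoreMemoryConfig(
    memory_id=MEMORY_ID,
    session_id=SESSION_ID,  # maps to Telegram chat_id + date
    actor_id=ACTOR_ID,      # = user identity
    batch_size=5,           # buffer 5 turns before flushing to save API calls
)
with AgentCoreMemorySessionManager(config) as session_manager:
    agent = Agent(
        system_prompt=build_system_prompt(),  # SOUL.md + AGENTS.md + retrieved memories
        session_manager=session_manager,
    )
    response = agent(user_message)
# on exit: buffers flushed, async long-term extraction kicks off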
```
Every conversation turn is automatically stored. `batch_size` reduces API calls for rapid exchanges.
---
## MEMORY.md vs AgentCore Memory
### What MEMORY.md Does Today
- Curated long-term memory the agent manually edits
- Loaded wholesale into the system prompt each session
- Agent writes specific things it wants to remember
- Human-readable markdown
### What AgentCore Memory Provides
- **Short-term**: full conversation history per session (replaces JSONL)
- **Long-term SUMMARIZATION**: session summaries auto-extracted
- **Long-term USER_PREFERENCE**: preferences auto-extracted and consolidated across sessions
- **Long-term SEMANTIC**: facts/entities auto-extracted
- **Semantic search**: `RetrieveMemoryRecords(query="...")` → relevant memories surfaced into system prompt
- **Self-managed strategy**: explicit "write this to memory" control, just like the agent writing MEMORY.md
### Verdict: Replace MEMORY.md with AgentCore Memory
AgentCore Memory is more capable on nearly every axis:
- Auto-extraction means the agent doesn't have to manually curate (though it can via self-managed strategy)
- Semantic search means you don't inject ALL memories into the system prompt — you inject the RELEVANT ones
- No MEMORY.md bloat: today MEMORY.md grows unbounded; AgentCore Memory consolidates automatically
- Cross-session persistence without any file I/O
**The tradeoff**: less direct control over what gets written. Mitigated with self-managed strategy for explicit writes.
---
## The S3 Round-Trip Concern — Addressed
Daniel's concern: S3 round-trip on every interaction.
With AgentCore Memory + Strands:
| What | When | Round-trip? |
|---|---|---|
| Conversation turns (short-term) | Each turn, async/batched | Non-blocking, buffered by `batch_size` |
| Long-term extraction | Background async after turns | Zero latency impact |
| Memory retrieval (session start) | Once per session | One `RetrieveMemoryRecords` call, ~50ms |
| Personality files (SOUL.md etc.) | Once per session start | See below |
**For personality files specifically**: load them once when the session starts and cache them in an in-memory dict inside the container. The same warm microVM handles every message in a session (up to the 8-hour maximum, as long as the idle timeout does not expire first), so SOUL.md loads once per session rather than once per message. No per-message S3 calls.
In practice, the flow is:
```
Session start (once):
1. Load SOUL.md, AGENTS.md, USER.md from S3 → cache in container memory
2. RetrieveMemoryRecords(query="important context, preferences") → top-k memories
3. Build system_prompt = static_files + retrieved_memories
4. Pass to Strands agent
Each message (no extra round-trips):
- Strands auto-stores turns to AgentCore Memory (async/batched)
- Long-term extraction runs in background
```
---
## Recommended Storage Architecture
```
┌───────────────────────────────────────────────────────────────┐
│  S3 (persona bucket)                                          │
│  SOUL.md, AGENTS.md, IDENTITY.md, USER.md, HEARTBEAT.md       │
│  → Loaded ONCE at session start, cached in container memory   │
│  → Updated rarely (when Daniel edits them)                    │
└───────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────┐
│  AgentCore Memory (replaces MEMORY.md + JSONL transcripts)    │
│                                                               │
│  Short-term: conversation turns (per session)                 │
│    → Strands session_manager handles automatically            │
│                                                               │
│  Long-term strategies:                                        │
│    SUMMARIZATION   → /summaries/{actorId}/{sessionId}/        │
│    USER_PREFERENCE → /preferences/{actorId}/                  │
│    SEMANTIC        → /facts/{actorId}/                        │
│                                                               │
│  Self-managed strategy (for explicit "remember this"):        │
│    Trigger: idle timeout or message count                     │
│    SNS → Lambda → custom extraction → BatchCreateMemoryRecords│
│    → "/curated/{actorId}/" namespace                          │
│    → This is the MEMORY.md equivalent, automated              │
└───────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────┐
│  DynamoDB                                                     │
│  telegram_chat_id → agentcore_session_id + actor_id           │
│  heartbeat state (last check timestamps)                      │
│  cron job definitions                                         │
└───────────────────────────────────────────────────────────────┘
```
### Session Start Pattern
```python
# Assumed to be defined at module level: app, config, memory_session, load_from_s3, build_prompt.
PERSONA_CACHE: dict = {}  # survives across invocations while the microVM stays warm

@app.entrypoint
async def main(payload, context):
    actor_id = payload["actor_id"]      # = Telegram user ID
    session_id = payload["session_id"]  # = from DynamoDB lookup
    # Load static files (once per warm session, cached)
    if not PERSONA_CACHE:
        PERSONA_CACHE.update(load_from_s3(["SOUL.md", "AGENTS.md", "USER.md"]))
    # Retrieve relevant long-term memories (semantic search)
    memories = memory_session.search_long_term_memories(
        query=payload["message"],
        namespace_prefix=f"/preferences/{actor_id}/",
        top_k=5,
    )
    # Build system prompt
    system_prompt = build_prompt(PERSONA_CACHE, memories)
    # Run agent (session_manager handles turn storage automatically)
    with AgentCoreMemorySessionManager(config) as session_manager:
        agent = Agent(system_prompt=system_prompt, session_manager=session_manager)
        return {"response": agent(payload["message"]).message}
```
---
## What AgentCore Memory Pricing Covers
From the pricing page and AWS re:Post confirmation:
- **Built-in strategies** (SUMMARIZATION, USER_PREFERENCE, SEMANTIC): model extraction costs are **included** in Memory pricing
- **Self-managed strategy**: you pay for your own Lambda + Bedrock calls
- Memory storage: billed per GB stored
- `RetrieveMemoryRecords` (semantic search): billed per search
Exact rates were not clearly published at the time of research. Costs are expected to be low at personal-assistant scale, but this is unverified until the resource is actually provisioned.
---
## Open Questions Remaining
1. **Pricing for AgentCore Memory**: exact rates for storage + retrieval not clearly published yet. Need to check when actually provisioning.
2. **S3 persona file cache invalidation**: when SOUL.md is updated in S3, the warm container won't know. Need a mechanism — either DynamoDB version flag checked at session start, or just accept ~8hr staleness (fine for persona files).
3. **Self-managed extraction timing**: confirm whether idle-session trigger in self-managed strategy fires reliably at session end vs requiring explicit trigger. This determines whether the "write to memory" tool works reliably.
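For the cache invalidation question above, the version-flag option could look something like the sketch below. It assumes a single version string (for example a counter or hash stored in DynamoDB) that is bumped whenever a persona file changes; `PersonaCache`, `refresh`, and the `loader` callable are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class PersonaCache:
    """In-memory cache of persona files, keyed to a version flag."""
    files: Dict[str, str] = field(default_factory=dict)
    version: Optional[str] = None

    def refresh(self, current_version: str, loader: Callable[[], Dict[str, str]]) -> bool:
        """Reload files only when the stored version flag has changed.

        Returns True if a reload happened, False if the cache was still valid.
        """
        if self.files and self.version == current_version:
            return False
        self.files = loader()  # e.g. fetch SOUL.md, AGENTS.md, USER.md from S3
        self.version = current_version
        return True
```

The cost of this approach is one cheap key lookup per session start (or per message, if fresher persona files matter); accepting staleness instead avoids even that.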
---
*Research: 2026-05-04. Sources: AgentCore Memory docs (memory-types, memory-strategies, memory-organization, strands integration), AgentCore pricing page.*

agentcore-rebuild.md Normal file

@@ -0,0 +1,308 @@
# AgentCore Rebuild: What's Reusable vs What's New
## The Premise
Instead of porting OpenClaw's monolithic gateway to AgentCore, build an **AgentCore-native personal assistant** that reuses the best parts of OpenClaw's design. Think of it as "the OpenClaw experience, built on AWS primitives."
---
## ✅ Directly Reusable (copy/adapt, no rewrite)
### 1. Personality & Workspace Files
All of these are just text that gets injected into the system prompt:
- `SOUL.md` — persona, tone, boundaries
- `AGENTS.md` — operating instructions
- `IDENTITY.md` — name, emoji, vibe
- `USER.md` — human's profile
- `TOOLS.md` — tool notes
- `MEMORY.md` — long-term memory
- `HEARTBEAT.md` — periodic task checklist
- `BOOTSTRAP.md` — first-run ritual
- `memory/YYYY-MM-DD.md` — daily notes
**Where they live**: S3 bucket (one prefix per user/agent). Loaded on each invocation, written back after mutations.
**Or**: AgentCore Memory for the conversational parts, S3 for the static persona files.
### 2. System Prompt Construction Logic
OpenClaw's context engine builds a rich system prompt from workspace files + tool descriptions + channel context + runtime metadata. This logic is pure string templating — framework-independent. Could be extracted and reused in a Strands/LangGraph agent or a custom Python agent.
Key pieces:
- Bootstrap file injection (with truncation markers for large files)
- Runtime context block (timezone, channel, OS, model, capabilities)
- Inbound message metadata (sender, group, timestamps)
- Tool policy injection
- Heartbeat/cron prompt variants
- Reply tag system (`[[reply_to_current]]` etc.)
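Because the pieces listed above are plain string operations, a Python port can stay small. The sketch below shows only bootstrap file injection with a truncation marker and a runtime context block. The marker text, section headings, and the `max_chars` default are illustrative assumptions, not OpenClaw's actual format.

```python
def inject_file(name: str, text: str, max_chars: int = 4000) -> str:
    """Render one workspace file as a prompt section, truncating large files."""
    if len(text) > max_chars:
        omitted = len(text) - max_chars
        text = text[:max_chars] + f"\n[... truncated {omitted} chars ...]"
    return f"## {name}\n{text}"


def build_system_prompt(files: dict, runtime: dict, max_chars: int = 4000) -> str:
    """Concatenate workspace files and a runtime context block into one prompt."""
    sections = [inject_file(name, text, max_chars) for name, text in files.items()]
    runtime_block = "\n".join(f"{key}: {value}" for key, value in sorted(runtime.items()))
    sections.append("## Runtime\n" + runtime_block)
    return "\n\n".join(sections)
```

For example, `build_system_prompt({"SOUL.md": soul_text}, {"timezone": "UTC", "channel": "telegram"})` yields one section per file followed by a sorted runtime block.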
### 3. Tool Definitions & Schemas
OpenClaw defines ~20+ tools. The **schemas** (parameters, descriptions) can be translated to any framework's tool format:
| OpenClaw Tool | AgentCore Equivalent | Notes |
|---|---|---|
| `read` | Custom tool (S3 or container FS) | Read files from workspace |
| `write` | Custom tool (S3 or container FS) | Write workspace files |
| `edit` | Custom tool | String replacement in files |
| `exec` | Custom tool (container shell) | Limited vs OpenClaw's full PTY |
| `web_search` | Custom tool or AgentCore Gateway | Brave API wrapper |
| `web_fetch` | Custom tool | HTTP fetch + readability extraction |
| `browser` | **AgentCore Browser Tool** | Built-in! Better than rolling your own |
| `message` (Discord/Slack) | **AgentCore Gateway** → Slack/Discord tool | 1-click integrations available |
| `memory_search` | **AgentCore Memory** | Semantic search over memory |
| `tts` | Custom tool (ElevenLabs API call) | Straightforward |
| `sessions_spawn` | AgentCore Runtime (A2A) | Agent-to-agent protocol |
| `canvas` | Custom tool or drop | Needs client-side renderer |
| `nodes` | Drop or custom | Requires physical devices |
| `cron` | EventBridge Scheduler API | Via custom tool |
### 4. Channel Integration Patterns
The **logic** of how to handle group chats, mentions, reply threading, chunking, etc. is reusable even if the transport changes:
- Group chat rules (when to speak, when to stay silent)
- Reply tag system
- Message chunking for long responses
- Typing indicators / presence
- Platform-specific formatting (Discord markdown vs WhatsApp formatting)
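Message chunking is one of the easier pieces to carry over. A minimal sketch, assuming a simple policy of splitting on the last newline (or, failing that, the last space) before the limit; the default of 4096 matches Telegram's message length limit, and Discord's 2000 can be passed instead. `chunk_message` is a hypothetical helper, not OpenClaw's implementation.

```python
def chunk_message(text: str, limit: int = 4096) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers to break at a newline, then at a space, and only falls back
    to a hard cut when a single run of text exceeds the limit.
    """
    chunks = []
    while len(text) > limit:
        cut = text.rfind("\n", 0, limit + 1)
        if cut <= 0:
            cut = text.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no natural break point: hard cut
        piece = text[:cut].rstrip()
        if piece:
            chunks.append(piece)
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```

Note that this sketch ignores markup: a split inside a code fence or bold span would need platform-specific handling on top.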
### 5. Skills System (Concept)
The skill discovery pattern (scan descriptions → load SKILL.md → follow instructions) works in any agent framework. The actual skill files are just markdown instructions.
### 6. Heartbeat Logic
The prompt, the HEARTBEAT_OK ack contract, the "check inbox/calendar/weather" patterns — all reusable. Just the **trigger mechanism** changes (EventBridge instead of internal timer).
---
## 🔧 Needs Rebuilding (new code, same concept)
### 7. Agent Loop
**OpenClaw**: pi-mono TypeScript (LLM call → tool parse → execute → loop)
**AgentCore**: Use **Strands Agents** (Python, AWS-native) or **LangGraph** or custom.
Strands is the path of least resistance on AgentCore — it's AWS-built, has native Bedrock integration, and the AgentCore SDK wraps it cleanly.
```python
from strands import Agent, tool
from bedrock_agentcore.runtime import BedrockAgentCoreApp

app = BedrockAgentCoreApp()

@tool
def read_workspace_file(path: str) -> str:
    """Read a file from the agent workspace."""
    # Load from S3
    ...

@app.entrypoint
def main(payload):
    prompt = payload.get("prompt")
    # Strands takes the system prompt at construction time, so build the agent per invocation.
    agent = Agent(
        system_prompt=build_system_prompt(),  # ← Reuse OpenClaw's logic (helper defined elsewhere)
        tools=[read_workspace_file],          # add further tools here
    )
    return {"message": agent(prompt).message}

if __name__ == "__main__":
    app.run()
```
### 8. Session / Memory Management
**OpenClaw**: JSONL files on disk, compaction algorithm
**AgentCore**:
- **Short-term**: AgentCore Memory (per-session turn history)
- **Long-term**: AgentCore Memory (extracted insights, preferences)
- **Workspace files**: S3 (MEMORY.md, SOUL.md, etc.)
- **Daily notes**: S3 or DynamoDB
The compaction algorithm could be reimplemented as a post-session hook that summarizes and stores to long-term memory.
### 9. Channel Relay Service
This is the **biggest new piece**. Options:
**Option A: Lightweight Fargate relay (recommended)**
- Small ECS Fargate task running a stripped-down Node.js service
- Maintains WS connections to WhatsApp/Discord/Telegram/Slack
- On inbound message → `InvokeAgentRuntime` (AgentCore)
- On agent response → route back to channel
- ~$10-15/mo for a tiny Fargate task
**Option B: Webhook-only channels**
- Telegram (webhook mode), Slack (Events API), Discord (interactions endpoint)
- API Gateway → Lambda → InvokeAgentRuntime
- No always-on infra needed
- But: no WhatsApp (Baileys needs persistent WS), no real-time Discord
**Option C: AgentCore Gateway integrations**
- AgentCore Gateway has 1-click Slack integration
- Could handle Slack as a tool (agent → Slack) but not as an inbound channel
- Would still need a relay for inbound messages
**Option D: SNS/SQS fan-out**
- Channels → SQS → Lambda → InvokeAgentRuntime
- Good for decoupling, adds latency
### 10. Scheduling (Heartbeat + Cron)
**EventBridge Scheduler** replaces OpenClaw's internal cron:
```
EventBridge Rule (every 30m)
→ Lambda function
→ InvokeAgentRuntime(prompt="Read HEARTBEAT.md...")
→ Route response to last channel
```
For dynamic cron (the agent creates its own schedules), the agent needs a tool that creates and deletes EventBridge Scheduler schedules via the SDK.
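A dynamic cron tool could be a thin wrapper over the EventBridge Scheduler API. The sketch below takes the client as a parameter (in practice `boto3.client("scheduler")`) so it can be exercised without AWS access. The payload shape `{"prompt": ...}`, the function names, and the idea of targeting the trigger Lambda are assumptions for illustration.

```python
import json


def create_agent_schedule(scheduler, name: str, expression: str,
                          target_lambda_arn: str, role_arn: str, prompt: str) -> dict:
    """Create a schedule that invokes the trigger Lambda with a prompt payload.

    `expression` uses Scheduler syntax, e.g. "rate(30 minutes)" or
    "cron(0 9 * * ? *)". `role_arn` is the role Scheduler assumes to
    invoke the target.
    """
    return scheduler.create_schedule(
        Name=name,
        ScheduleExpression=expression,
        FlexibleTimeWindow={"Mode": "OFF"},
        Target={
            "Arn": target_lambda_arn,
            "RoleArn": role_arn,
            "Input": json.dumps({"prompt": prompt}),  # assumed payload shape
        },
    )


def delete_agent_schedule(scheduler, name: str) -> dict:
    """Remove a schedule the agent created earlier."""
    return scheduler.delete_schedule(Name=name)
```

Exposed as agent tools, these two functions would give the agent the same create/delete control it has over OpenClaw's internal cron, with the schedule definitions mirrored in DynamoDB if they need to be listed cheaply.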
### 11. File Operations on Workspace
**OpenClaw**: Direct filesystem read/write/edit
**AgentCore**: S3-backed workspace
```python
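import boto3
from strands import tool

s3 = boto3.client("s3")
WORKSPACE_BUCKET, AGENT_ID = "my-workspace-bucket", "default"  # placeholders

@tool
def write_file(path: str, content: str) -> str:
    """Write content to a workspace file."""
    body = content.encode("utf-8")  # so the reported size is in bytes
    s3.put_object(Bucket=WORKSPACE_BUCKET, Key=f"{AGENT_ID}/{path}", Body=body)
    return f"Written {len(body)} bytes to {path}"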
```
The `edit` tool (find-and-replace) needs to download, modify, re-upload. Slightly more complex but straightforward.
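A sketch of that download, modify, re-upload cycle follows. The S3 client is passed in (in practice `boto3.client("s3")`), and requiring exactly one match is an assumed policy chosen here to avoid ambiguous edits; `edit_file` is a hypothetical name.

```python
def edit_file(s3, bucket: str, key: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new` in an S3 object."""
    # Download the current contents.
    text = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match for edit, found {count}")
    # Modify in memory, then re-upload the whole object.
    updated = text.replace(old, new)
    s3.put_object(Bucket=bucket, Key=key, Body=updated.encode("utf-8"))
    return f"Edited {key}"
```

Unlike a local filesystem edit, this read-modify-write is not atomic, so two concurrent invocations editing the same file could overwrite each other.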
---
## ❌ Must Drop or Significantly Redesign
### 12. Shell Exec (Full PTY)
AgentCore containers can run basic commands, but:
- No persistent background processes (session dies)
- No PTY for interactive CLIs
- No host-level access
- **Coding agent sub-processes** (Codex, Claude Code) don't fit the session model
**Alternative**: Use AgentCore's A2A protocol to spin up specialized coding agent sessions, or use a separate Fargate task for heavy compute.
### 13. Device Nodes (Camera, Screen, Location)
Physical device features can't run on AgentCore. But:
- iOS/Android/macOS nodes could connect to the channel relay
- The relay could expose node commands as tools via AgentCore Gateway
- This is a stretch — likely better to keep nodes connecting to a local gateway
### 14. Browser Extension Relay
The Chrome extension relay requires a persistent WS connection to the gateway. Would need the relay service to proxy this.
### 15. Canvas / A2UI
Requires a client-side renderer (macOS app, browser). The AgentCore agent could generate canvas commands, but delivery depends on having a client.
---
## Architecture: "OpenClaw Experience on AgentCore"
```
┌─────────────────────┐
│  Channel Relay      │  ECS Fargate (tiny, always-on, ~$10/mo)
│  WA/Discord/TG/Slack│  Inbound msgs → InvokeAgentRuntime
│  + webhook endpoints│  Agent responses → route to channel
└──────────┬──────────┘
┌─────────────────────┐
│  AgentCore Runtime  │  Serverless container (pay per use)
│  Strands Agent      │
│  ├─ System prompt   │  ← SOUL.md, AGENTS.md from S3
│  ├─ Tools           │  ← read/write (S3), web_search, browser, message
│  ├─ Memory          │  ← AgentCore Memory (short + long term)
│  └─ LLM (Bedrock)   │  ← Direct IAM role access
└──────────┬──────────┘
   ┌───────┼──────────┬──────────────┐
   ▼       ▼          ▼              ▼
┌──────┐ ┌─────┐ ┌─────────┐ ┌───────────────┐
│  S3  │ │ DDB │ │AgentCore│ │  AgentCore    │
│ Work-│ │Cron │ │ Memory  │ │  Gateway      │
│ space│ │State│ │         │ │  (MCP tools)  │
└──────┘ └─────┘ └─────────┘ └───────────────┘
                               ┌─────┼─────┐
                               ▼     ▼     ▼
                             Slack  Jira  Custom
                             Tool   Tool  Lambda
                                          Tools
```
### EventBridge Triggers
```
┌─────────────────────┐
│  EventBridge        │
│  ├─ Heartbeat (30m) │  → Lambda → InvokeAgentRuntime
│  ├─ Cron jobs       │  → Lambda → InvokeAgentRuntime
│  └─ Webhook events  │  → Lambda → InvokeAgentRuntime
└─────────────────────┘
```
---
## Effort Estimate (Ground-Up Build)
| Component | Effort | Tech |
|---|---|---|
| Agent container (Strands + tools) | 2-3 weeks | Python, bedrock-agentcore SDK |
| System prompt builder | 3-5 days | Port from OpenClaw TS → Python |
| S3 workspace tools (read/write/edit) | 2-3 days | boto3 |
| Web search + fetch tools | 2-3 days | Brave API, readability |
| AgentCore Memory integration | 3-5 days | AgentCore Memory SDK |
| Channel relay (Fargate) | 2-3 weeks | Node.js (reuse OpenClaw channel code) |
| EventBridge scheduling | 2-3 days | CDK/Terraform |
| Webhook ingress (API GW) | 2-3 days | CDK/Terraform |
| AgentCore Gateway tools | 1 week | Slack, custom Lambda tools |
| IaC (CDK or Terraform) | 1 week | Full stack deployment |
| Testing + integration | 1-2 weeks | End-to-end |
| **Total** | **~8-12 weeks** | For one person, part-time |
---
## Cost Estimate (Monthly)
| Service | Cost |
|---|---|
| AgentCore Runtime (agent compute) | ~$5-15 (consumption-based, depends on usage) |
| Channel relay (Fargate 0.25 vCPU) | ~$9 |
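| NAT Gateway | ~$3 (data processing only; a managed NAT Gateway also bills hourly, about $0.045/hr in us-east-1, or roughly $33/mo if left running) |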
| S3 (workspace files) | ~$0.02 |
| DynamoDB (cron state, metadata) | ~$1 |
| AgentCore Memory | TBD (managed service pricing) |
| EventBridge | ~$0.01 |
| Bedrock LLM calls | $20-100+ (model-dependent, same as today) |
| **Infrastructure total (ex-LLM)** | **~$20-30/mo** |
Comparable to Fargate-only ($26/mo) but with better scaling characteristics and per-use billing for the agent compute.
---
## What You Gain vs Fargate-Only
| Benefit | Fargate-Only | AgentCore Rebuild |
|---|---|---|
| Effort to deploy | Days | Months |
| Full OpenClaw feature set | ✅ Yes | ~70% (no PTY, no nodes, no canvas) |
| Per-invocation billing | ❌ Always-on | ✅ Pay per use |
| Session isolation (security) | ❌ Shared process | ✅ Per-session microVM |
| Built-in observability | ❌ DIY logging | ✅ AgentCore tracing |
| Built-in auth (OAuth/SigV4) | ❌ DIY | ✅ AgentCore Identity |
| Multi-user scalability | ❌ Single user | ✅ Designed for it |
| AgentCore Memory | ❌ File-based | ✅ Managed, semantic |
| AgentCore Gateway tools | ❌ N/A | ✅ 1-click Slack/Jira/etc |
| Browser Tool | DIY Playwright | ✅ Built-in |
| Future AWS integrations | Manual | ✅ First-class |
---
## When the Rebuild Makes Sense
**Do it if**:
- You want to offer this as a **multi-user product/service** (AgentCore's per-session isolation is purpose-built for this)
- You want to go deep on **AWS-native agent infra** (Memory, Gateway, Identity, observability)
- You're OK with a Python agent (Strands) instead of the pi-mono TypeScript stack
- You want consumption-based billing instead of always-on compute
- This is a learning/exploration project for AgentCore itself
**Don't do it if**:
- You just want your personal assistant running on AWS (Fargate in a day)
- You need the full OpenClaw feature set (nodes, canvas, PTY, coding agents)
- You want to stay on the OpenClaw upgrade path (community updates, new channels, skills)
---
*Added 2026-03-10*

architecture-comparison.md Normal file

@@ -0,0 +1,111 @@
# Architecture Comparison: OpenClaw vs AgentCore Runtime
## OpenClaw Architecture
OpenClaw is a **long-lived, stateful daemon** designed to run on personal hardware (Mac, Linux, Pi, VPS).
### Core Components
**Gateway (the heart)**
- Single long-lived Node.js process (Node ≥22)
- Binds to a local port (default `127.0.0.1:18789`)
- Multiplexed WebSocket server: control plane, RPC, events
- Also serves HTTP (OpenAI-compatible API, hooks, Control UI, Canvas)
- Maintains **persistent outbound connections** to messaging providers
- Hot-reloads config changes; supervised via launchd/systemd
**Channel Plugins (always-on connections)**
- WhatsApp (Baileys — maintains a persistent WebSocket to WhatsApp servers)
- Telegram (grammY — long-polling or webhook)
- Discord (discord.js — persistent WebSocket gateway connection)
- Slack (Bolt — Socket Mode or webhook)
- Signal (signal-cli subprocess)
- iMessage/BlueBubbles (local macOS integration)
- IRC, Matrix, MS Teams, Google Chat, LINE, Feishu, Nostr, Twitch, Zalo, etc.
- Each channel maintains its own long-lived connection or polling loop
**Agent Runtime (Pi agent)**
- Embedded "pi-mono" agent runtime, invoked via internal RPC
- Tool execution: `exec` (shell commands), `read/write/edit` (filesystem), `browser` (Playwright/CDP)
- Session transcripts stored as JSONL files: `~/.openclaw/agents/<agentId>/sessions/*.jsonl`
- Workspace files: AGENTS.md, SOUL.md, MEMORY.md, etc. — filesystem-based memory
**Persistent State (filesystem)**
- Config: `~/.openclaw/openclaw.json`
- Session transcripts: `~/.openclaw/agents/*/sessions/*.jsonl`
- Pairing store (device trust)
- Secrets store
- WhatsApp auth state (Baileys session)
- Cron state
- Agent workspace (MEMORY.md, daily notes, etc.)
**Scheduled Tasks**
- Heartbeat: periodic agent turns in main session (default every 30m)
- Cron jobs: scheduled commands/agent invocations
- Both rely on the gateway being continuously running
**Connected Clients (inbound WS)**
- macOS menu bar app
- iOS/Android nodes (camera, screen, location, voice)
- CLI tools
- WebChat UI
- Browser control (Playwright/CDP to managed Chrome)
### Key Characteristics
- **Always-on process** — not request/response
- **Persistent outbound connections** — channels maintain socket connections
- **Local filesystem** — session transcripts, config, workspace, WhatsApp state
- **PTY/process spawning** — exec tool runs shell commands, coding agents
- **Native integrations** — macOS APIs (iMessage, voice), Bonjour discovery
- **Single-user, single-host** — designed for one person's machine
---
## AgentCore Runtime Architecture
AgentCore is a **serverless, request-driven container hosting environment** for AI agents.
### Core Components
**Runtime Container**
- ARM64 Docker container on a dedicated microVM
- Must listen on `0.0.0.0:8080` (HTTP) / `0.0.0.0:8000` (MCP) / `0.0.0.0:9000` (A2A)
- Invoked via `InvokeAgentRuntime` API with a session ID
**Session Model**
- Each session gets its own **isolated microVM**
- Session identified by `runtimeSessionId` (≥33 chars)
- Session states: Active → Idle → Terminated
- **Max lifetime: 8 hours**
- **Idle timeout: 15 minutes by default** (configurable via `idleRuntimeSessionTimeout`): the session is terminated after that long with no requests
- After termination: microVM destroyed, memory sanitized
- Filesystem is **ephemeral** — nothing persists beyond session lifetime
**Health Contract**
- Must implement `/ping` endpoint
- Returns `Healthy` (idle, can accept work) or `HealthyBusy` (processing async tasks)
- Used for session lifecycle management
**Networking**
- Optional VPC connectivity (ENIs in your VPC)
- Outbound internet: requires NAT gateway in VPC config
- Without VPC: default internet access for API calls
- No persistent inbound listener — invoked via AWS API
**Persistent Storage**
- None built-in — filesystem is ephemeral
- **AgentCore Memory**: managed service for short-term (per-session) and long-term (cross-session) memory
- External services: DynamoDB, S3, RDS for durable state
**Scaling**
- Consumption-based pricing (pay for active CPU; I/O wait is typically free)
- Auto-scales sessions
- No pre-provisioning needed
### Key Characteristics
- **Request-driven** — not always-on
- **Ephemeral filesystem** — nothing persists after session ends
- **Session-scoped** — 15min idle timeout, 8hr max
- **Container-based** — ARM64 Docker image
- **No persistent outbound connections** — designed for request/response + async tasks
- **Multi-tenant** — designed for scaling across users

build-plan.md Normal file

@@ -0,0 +1,236 @@
# Plan: AgentCore-Native OpenClaw (Telegram, Zero Always-On Compute)
## Target Architecture
```
[Telegram User]
      │ message
[Telegram Servers]
      │ POST (webhook)
[API Gateway (HTTP API)]
      │
[Lambda: tg-ingest]        ← verify sig, send typing action, enqueue
      │ SQS message
[SQS: agent-queue]
      │ trigger
[Lambda: agent-runner]     ← load workspace from S3, build system prompt,
      │ InvokeAgentRuntime   map chat_id → session_id
[AgentCore Runtime]        ← Strands agent container (ARM64)
      │ streaming response   tools: web_search, read/write S3, memory
[Lambda: agent-runner]     ← stream reply back
      │ Telegram Bot API
[Telegram User]            ← receives message

[EventBridge Scheduler]    ← every 30m → Lambda → InvokeAgentRuntime (heartbeat prompt)
      │                        │
      ▼                        ▼  same response routing
[Lambda: heartbeat-trigger]  [Telegram Bot API]
```
**No 24/7 compute anywhere.** Everything is event-driven.
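The `tg-ingest` step could be as small as the sketch below. Telegram does not sign webhook requests; the verification mechanism it offers is the `secret_token` passed to `setWebhook`, which Telegram echoes back in the `X-Telegram-Bot-Api-Secret-Token` header. The event shape assumes an API Gateway HTTP API proxy event with a plain (not base64-encoded) body, and `enqueue` is a hypothetical callable that would wrap an SQS `send_message` call.

```python
import hmac
import json


def handle_webhook(event: dict, expected_secret: str, enqueue) -> dict:
    """Verify the Telegram secret token, enqueue the update, and ack quickly."""
    # Header names may arrive in any case; normalise before lookup.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    supplied = headers.get("x-telegram-bot-api-secret-token", "")
    # Constant-time comparison to avoid leaking the secret via timing.
    if not hmac.compare_digest(supplied.encode(), expected_secret.encode()):
        return {"statusCode": 403, "body": ""}
    update = json.loads(event.get("body") or "{}")
    enqueue(json.dumps(update))  # agent work happens later, off the webhook path
    return {"statusCode": 200, "body": ""}
```

Returning as soon as the update is queued keeps the webhook response fast regardless of how long the agent run takes.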
---
## What We've Answered
- ✅ AgentCore is the right runtime (stateless container, event-driven)
- ✅ Telegram supports full webhook mode (all message types)
- ✅ SQS decoupling handles the webhook ack requirement (respond 204 in <10s)
- ✅ OpenClaw workspace files (SOUL.md, AGENTS.md, MEMORY.md) reusable via S3
- ✅ System prompt construction logic is portable (pure string ops)
- ✅ Tool schemas (web_search, read, write, edit) translatable to Strands @tool
- ✅ EventBridge handles heartbeat and cron (no gateway process needed)
- ✅ AgentCore Memory SDK exists and supports conversation history + long-term extraction
- ✅ InvokeAgentRuntime supports streaming responses
- ✅ Lifecycle settings: idleRuntimeSessionTimeout is configurable (min 60s, default 900s)
- ✅ Cold start: Firecracker microVM ~2-5 seconds on first invocation
- ✅ Language/framework: Python + Strands + bedrock-agentcore SDK (ARM64 container)
- ✅ AgentCore Memory SDK: MemorySessionManager, actor_id + session_id model, search_long_term_memories()
---
## Open Questions (Not Yet Answered)
### 🔴 Critical — blocks architecture decisions
**Q1: Response routing for async runs**
When InvokeAgentRuntime is called from the agent-runner Lambda, does it block synchronously until the agent finishes? Lambda max timeout is 15 minutes. AgentCore sessions can run up to 8 hours. What's the maximum synchronous response wait? Is there a callback/webhook pattern for long agent runs, or do we always need to poll?
*Why it matters*: If an agent run takes 3 minutes (web browsing + LLM), the agent-runner Lambda needs to sit open for 3 minutes. That's fine up to ~15 minutes. But longer runs (coding tasks, deep research) need a different pattern.
*Research needed*: InvokeAgentRuntime streaming behavior + max Lambda concurrency implications.
**Q2: Session ID strategy and daily session lifecycle**
`idleRuntimeSessionTimeout` is configurable (60s to 8hr, default 15min). For a personal assistant, set it to 4-6 hours so the session stays warm all day. Max lifetime is 8 hours, after which a new session is created.
- Map Telegram `chat_id` → `runtimeSessionId` in DynamoDB (create new session ID at start of day / when previous session maxes out)
- On new session creation, load MEMORY.md + SOUL.md from S3 into system prompt — that's the context restoration
- The 8hr session boundary is a daily rhythm, not a UX problem
*Simplified*: One session per user per day. Session stays warm between messages. After 8hr, start a new one and reload workspace from S3.
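The lookup-or-create logic can be sketched as below. A plain dict stands in for the DynamoDB table, the ID format is a hypothetical choice that satisfies the 33-character minimum on `runtimeSessionId`, and only the 8-hour maximum lifetime is modelled (idle-timeout expiry would need a `last_used` timestamp as well).

```python
import uuid
from datetime import datetime, timedelta, timezone

MAX_SESSION_AGE = timedelta(hours=8)  # AgentCore session max lifetime


def new_session_id(chat_id: int, now: datetime) -> str:
    """Build a session ID; the uuid4 hex alone guarantees 33+ characters overall."""
    return f"tg-{chat_id}-{now:%Y%m%d}-{uuid.uuid4().hex}"


def get_or_create_session(store: dict, chat_id: int, now: datetime) -> str:
    """Return the active session ID for a chat, creating one if it has aged out.

    `store` is a stand-in for a DynamoDB table keyed by Telegram chat_id.
    """
    entry = store.get(chat_id)
    if entry and now - entry["created_at"] < MAX_SESSION_AGE:
        return entry["session_id"]
    session_id = new_session_id(chat_id, now)
    store[chat_id] = {"session_id": session_id, "created_at": now}
    return session_id
```

With a real table, the create path would likely use a conditional write so that two near-simultaneous messages do not each mint a session.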
**Q3: AgentCore Memory — is long-term extraction automatic or manual?**
The SDK docs mention "long-term memory automatically extracts and stores key insights." Is this extraction triggered on every `add_turns()` call, after a session ends, or does it require an explicit extraction call? Does it cost extra (separate LLM call)?
*Why it matters*: If extraction isn't automatic, MEMORY.md-equivalent content needs to be managed explicitly.
**Q4: Workspace file mutations (MEMORY.md writes) — S3 vs AgentCore Memory**
When the agent wants to write to MEMORY.md (e.g., "remember this for next time"), there are two paths:
- Write to S3, reload on next invocation — simple but doesn't benefit from semantic search
- Write to AgentCore Memory — benefits from extraction + search but changes the access pattern
Which approach for MEMORY.md? Can we use BOTH — S3 for large curated memory, AgentCore Memory for semantic search over conversation history?
**Q5: Cold start UX impact — first session only**
AgentCore keeps the microVM alive between requests (no cold start for warm sessions). The only startup cost is on the *first* invocation of a brand new session (container image pull + process start). Subsequent requests to the same warm session are instant.
- Does the Telegram "typing..." indicator cover the one-time startup gap on new session creation?
- What happens when the Lambda itself is cold (~500ms Lambda cold start, separate from the AgentCore session)?
**Q6: Strands agent + bedrock-agentcore container — ARM64 build complexity**
AgentCore requires ARM64 containers. Strands is Python. The base image needs:
- Python 3.11+
- `strands-agents`, `bedrock-agentcore` pip packages
- AWS credentials via the AgentCore execution role (IAM)
- Access to Bedrock models (need to check regional availability for the models we want)
What's the actual container build + push + deploy flow? Is there a starter template?
---
### 🟡 Important — needs answer before first spike
**Q7: Which Bedrock model and region?**
AgentCore Runtime is available in us-east-1, us-west-2, and several other regions. The Bedrock models we want (Claude Sonnet 4, etc.) need to be available in the same region. Cross-region inference adds latency.
Need to confirm: which model for the agent (Sonnet? Haiku for speed?), which region for AgentCore, does the region support the model?
**Q8: Telegram → AgentCore payload structure**
The Telegram Update object contains `message.chat.id`, `message.from.id`, `message.text`, etc. The InvokeAgentRuntime payload is arbitrary JSON. What does the agent container expect to receive? How do we thread Telegram context (group vs DM, sender info, reply_to) through the SQS → Lambda → AgentCore chain?
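One possible answer, sketched under assumptions: the payload keys below (`channel`, `prompt`, `is_group`, and so on) are a hypothetical schema, not something AgentCore or OpenClaw defines. Only the Telegram Update field names (`message.chat.id`, `message.chat.type`, `message.from.id`, `message.reply_to_message`) come from the Bot API.

```python
from typing import Any, Optional


def build_agent_payload(update: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Flatten a Telegram Update into a hypothetical agent payload.

    Returns None for updates without a text message (edits, joins, etc.).
    """
    message = update.get("message")
    if not message or "text" not in message:
        return None
    chat = message["chat"]
    sender = message.get("from", {})
    reply = message.get("reply_to_message")
    return {
        "channel": "telegram",
        "prompt": message["text"],
        "chat_id": chat["id"],
        # Telegram chat types: "private", "group", "supergroup", "channel"
        "is_group": chat.get("type") in ("group", "supergroup"),
        "sender_id": sender.get("id"),
        "sender_name": sender.get("first_name"),
        "message_id": message["message_id"],
        "reply_to_message_id": reply["message_id"] if reply else None,
    }
```

Keeping the payload flat and channel-tagged would let the same container accept other channels later without changing the agent entry point.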
**Q9: Telegram response back to user — token management**
The agent-runner Lambda needs to call `api.telegram.org/bot{token}/sendMessage` after the agent responds. The Bot Token must be available to the Lambda. Secrets Manager is the right answer — but it needs to be in the architecture from day one.
**Q10: Heartbeat response delivery**
The heartbeat EventBridge rule fires every 30 minutes. The heartbeat Lambda invokes AgentCore. The agent produces a response (either HEARTBEAT_OK to suppress, or an actual message to deliver).
Where does the heartbeat response go? The Lambda needs to know: "if the agent produces a non-HEARTBEAT_OK response, send it to Telegram chat_id X." This routing config (target Telegram chat ID for heartbeat delivery) needs to be stored somewhere (DynamoDB, Secrets Manager, or baked into the Lambda env).
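The suppression and routing decision is small enough to sketch. Assumptions: suppression is an exact match on `HEARTBEAT_OK` after trimming whitespace, and `target_chat_id` comes from wherever the routing config ends up (env var, DynamoDB). The function name is hypothetical.

```python
from typing import Optional

HEARTBEAT_OK = "HEARTBEAT_OK"


def heartbeat_delivery(agent_reply: str, target_chat_id: int) -> Optional[dict]:
    """Return a Telegram sendMessage body, or None when the reply is suppressed."""
    text = agent_reply.strip()
    if not text or text == HEARTBEAT_OK:
        return None
    return {"chat_id": target_chat_id, "text": text}
```

The heartbeat Lambda would POST the returned dict to `sendMessage` only when it is not `None`.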
**Q11: Multi-turn within a single AgentCore session**
If a user sends 3 rapid messages (before the session expires), do they all land in the same `runtimeSessionId`? The agent-runner Lambda needs to look up the current active session ID for a given Telegram chat_id from DynamoDB, or create a new one if expired.
Race condition: two messages arrive simultaneously → both Lambdas look up session → both see "no active session" → both create new sessions. Need a DynamoDB conditional write / lock.
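A conditional write could look roughly like this. The sketch only builds the `PutItem` arguments so it stays testable without AWS; the table name and the attribute names (`chat_id`, `session_id`, `expires_at`) are hypothetical.

```python
def claim_session_request(table: str, chat_id: int, session_id: str,
                          now: int, ttl_seconds: int) -> dict:
    """Build kwargs for a DynamoDB PutItem that only succeeds when no
    unexpired session row exists for this chat."""
    return {
        "TableName": table,
        "Item": {
            "chat_id": {"S": str(chat_id)},
            "session_id": {"S": session_id},
            "expires_at": {"N": str(now + ttl_seconds)},
        },
        # Fails with ConditionalCheckFailedException if a live row exists.
        "ConditionExpression": "attribute_not_exists(chat_id) OR expires_at < :now",
        "ExpressionAttributeValues": {":now": {"N": str(now)}},
    }
```

Usage would be `dynamodb.put_item(**claim_session_request(...))` with a boto3 DynamoDB client. The Lambda that loses the race catches `ConditionalCheckFailedException` and re-reads the row to pick up the winner's session ID.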
**Q12: Telegram send_chat_action ("typing") timing**
Telegram's chat action expires in ~5 seconds. For a 30-second agent run, we need to refresh the typing indicator periodically. The agent-runner Lambda needs to refresh it while waiting for InvokeAgentRuntime to stream. Is this easy to do in a Lambda while streaming?
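One way to do it, assuming the runner uses `asyncio`: run the agent call and a small refresher task side by side. `with_typing` is a hypothetical helper; `send_typing` stands in for a call to Telegram's `sendChatAction` with `action="typing"`, and `work` for the streamed InvokeAgentRuntime call.

```python
import asyncio
import contextlib
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_typing(work: Awaitable[T],
                      send_typing: Callable[[], Awaitable[None]],
                      interval: float = 4.0) -> T:
    """Await `work` while re-sending the typing action every `interval` seconds."""

    async def pulse() -> None:
        while True:
            await send_typing()
            await asyncio.sleep(interval)

    pulser = asyncio.create_task(pulse())
    try:
        return await work
    finally:
        pulser.cancel()
        # A failed or cancelled typing call must not break the actual reply.
        with contextlib.suppress(asyncio.CancelledError, Exception):
            await pulser
```

The 4-second default sits just under the ~5-second expiry. In a synchronous Lambda handler this would be driven with `asyncio.run(...)`.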
---
### 🟢 Lower priority — figure out during build
**Q13: What tools does the container expose?**
OpenClaw has ~20 tools. For an MVP, what's the minimum viable tool set?
- `read_file(path)` — S3 workspace
- `write_file(path, content)` — S3 workspace
- `web_search(query)` — Brave API
- `web_fetch(url)` — HTTP + readability
- `memory_search(query)` — AgentCore Memory
- `send_telegram_message(text)` — for multi-message replies? or just return the response?
Tools NOT in scope for v1: exec, browser, canvas, cron management, image generation.
**Q14: Cron job management from within the agent**
OpenClaw lets the agent create/delete cron jobs dynamically. With EventBridge, a `create_cron_job` tool would need to call the EventBridge `PutRule` and `PutTargets` APIs (`put_rule()` / `put_targets()` on the boto3 `events` client). Doable but needs IAM permissions baked in. Scope for v2.
**Q15: Secrets rotation**
Bot token, Brave API key, etc. — Secrets Manager. Need to decide: Lambda env vars (loaded on cold start) vs Secrets Manager SDK calls (per-invocation). For personal scale, env vars baked in at deploy time are fine. Secrets Manager adds ~50ms latency per call.
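A middle path between the two options is to fetch from Secrets Manager once per warm container and cache in module scope. A minimal sketch, assuming a hypothetical `get_secret` helper; in the Lambda, `fetch` would wrap `secretsmanager.get_secret_value(SecretId=name)["SecretString"]`.

```python
import time
from typing import Callable, Dict, Tuple

_cache: Dict[str, Tuple[float, str]] = {}


def get_secret(name: str, fetch: Callable[[str], str],
               ttl: float = 300.0,
               clock: Callable[[], float] = time.monotonic) -> str:
    """Return a cached secret, calling `fetch` at most once per `ttl` seconds."""
    now = clock()
    hit = _cache.get(name)
    if hit is not None and now - hit[0] < ttl:
        return hit[1]
    value = fetch(name)
    _cache[name] = (now, value)
    return value
```

With this shape the Secrets Manager latency is paid on cold start and after each TTL expiry rather than on every message, and a rotated secret is picked up within `ttl` seconds.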
**Q16: IaC choice**
CDK (TypeScript) or Terraform or SAM. CDK is most AWS-native and has the highest-level constructs. SAM is simpler for Lambda-centric stacks. Terraform if portability matters.
---
## Proposed Build Phases
### Phase 0 — Spike (1-2 days)
Answer Q1, Q2, Q5 by actually running the thing:
- Deploy the smallest possible Strands container to AgentCore
- Send it a test InvokeAgentRuntime call
- Measure cold start latency in practice
- Test what happens when a session expires and you reinvoke with the same ID
### Phase 1 — Telegram → Agent → Response (1 week)
- API Gateway + tg-ingest Lambda (verify Telegram's webhook secret-token header, SQS enqueue, return 204)
- SQS queue
- agent-runner Lambda (maps chat_id → session_id, invokes AgentCore, sends Telegram reply)
- AgentCore container: minimal Strands agent, system prompt from S3 workspace, web_search tool
- S3 workspace bucket with SOUL.md, AGENTS.md, USER.md
- DynamoDB: chat_id → session_id mapping
**Done when**: can send a Telegram message and get a reply from the agent, personality intact.
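The tg-ingest step in Phase 1 could be sketched as below. Assumptions: the webhook was registered with a `secret_token`, which Telegram echoes back in the `X-Telegram-Bot-Api-Secret-Token` header; the event is an API Gateway HTTP API proxy event with a plain-text body; `handle_webhook` and the injected `enqueue` callable (a thin wrapper over SQS `send_message`) are hypothetical names.

```python
import hmac
import json
from typing import Any, Callable, Dict

SECRET_HEADER = "x-telegram-bot-api-secret-token"  # compared case-insensitively


def handle_webhook(event: Dict[str, Any], expected_secret: str,
                   enqueue: Callable[[str], None]) -> Dict[str, Any]:
    """Validate a Telegram webhook call and hand the raw update to a queue."""
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    supplied = headers.get(SECRET_HEADER, "")
    # Constant-time comparison avoids leaking the secret through timing.
    if not hmac.compare_digest(supplied.encode(), expected_secret.encode()):
        return {"statusCode": 403}
    body = event.get("body") or ""
    try:
        json.loads(body)  # reject malformed bodies before they reach the queue
    except ValueError:
        return {"statusCode": 400}
    enqueue(body)
    return {"statusCode": 204}
```

Returning immediately after the enqueue keeps the webhook response fast; the slow agent call happens in the agent-runner Lambda behind SQS. Base64-encoded bodies (`isBase64Encoded`) are not handled in this sketch.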
### Phase 2 — Memory + Workspace (1 week)
- AgentCore Memory provisioned (memory_id per user)
- Conversation history stored after each turn
- Long-term memory extraction confirmed working
- MEMORY.md sync pattern: S3 for curated, AgentCore Memory for semantic search
- write_file / read_file tools pointing at S3 workspace
**Done when**: agent remembers things across sessions (>15min gaps).
### Phase 3 — Heartbeat + Cron (3-4 days)
- EventBridge rule (every 30m)
- heartbeat-trigger Lambda
- HEARTBEAT_OK suppression logic
- Delivery to configurable Telegram chat ID
**Done when**: heartbeat fires, agent checks HEARTBEAT.md, delivers alerts to Telegram.
### Phase 4 — Polish (ongoing)
- Typing indicator refresh during long runs
- Additional tools (image gen, TTS)
- Error handling + DLQ
- CDK/IaC for reproducible deploys
- Cost monitoring
---
## Cost Estimate (Personal Scale, ~50 agent runs/day)
| Service | Est. Monthly Cost | Notes |
|---|---|---|
| API Gateway (HTTP) | ~$0.01 | <1M requests/mo |
| Lambda (ingest + runner + heartbeat) | ~$0.50 | ~2000 invocations/day, avg 30s |
| SQS | ~$0.00 | Free tier |
| AgentCore Runtime | ~$5-15 | 50 runs/day × 30s avg × ~$0.0x/compute-sec |
| AgentCore Memory | TBD | Pricing not fully public yet |
| S3 (workspace files) | ~$0.01 | <1 MB total |
| DynamoDB (session mapping) | ~$0.01 | On-demand, minimal reads/writes |
| Bedrock LLM calls | $20-80 | Same as today — model-dependent |
| EventBridge | ~$0.00 | ~1,440 heartbeat events/mo at one per 30 min |
| Secrets Manager | ~$0.40 | $0.40/secret/mo |
| **Total infra (ex-LLM)** | **~$6-20/mo** | vs ~$26/mo for Fargate |
**Zero always-on compute cost.** Pay only when messages arrive.
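As a sanity check on the total row, the priced line items can be summed directly. The figures are copied from the table; AgentCore Memory is excluded because its cost is still TBD.

```python
# (low, high) monthly USD, copied from the table above; AgentCore Memory excluded
line_items = {
    "API Gateway": (0.01, 0.01),
    "Lambda": (0.50, 0.50),
    "SQS": (0.00, 0.00),
    "AgentCore Runtime": (5.00, 15.00),
    "S3": (0.01, 0.01),
    "DynamoDB": (0.01, 0.01),
    "EventBridge": (0.00, 0.00),
    "Secrets Manager": (0.40, 0.40),
}

low = round(sum(lo for lo, _ in line_items.values()), 2)
high = round(sum(hi for _, hi in line_items.values()), 2)
print(f"ex-LLM, ex-Memory: ${low:.2f} to ${high:.2f} per month")
```

The priced items come to roughly $5.93 to $15.93, so the ~$6-20 total leaves about $4/mo of headroom for AgentCore Memory. AgentCore Runtime dominates the infra bill either way.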
---
## Immediate Next Steps
1. **Answer Q1 + Q2 with a spike** — deploy toy Strands container, measure cold start, test session expiry behavior
2. **Clarify AgentCore Memory extraction** (Q3) — read the full SDK docs + test
3. **Lock the Telegram payload schema** (Q8) — define what goes in InvokeAgentRuntime payload
4. **Pick region + model** (Q7) — confirm Sonnet availability in target region
5. **Start Phase 1 build**
---
*Updated 2026-05-04*

103
compatibility-analysis.md Normal file

@@ -0,0 +1,103 @@
# Component-by-Component Compatibility Analysis
## 🔴 Incompatible (fundamental architecture mismatch)
### 1. Gateway (Long-Lived Daemon)
**OpenClaw**: Single always-on process that multiplexes WS server + HTTP + channel connections.
**AgentCore**: Container is invoked per-request/session, idle-killed at 15min, max 8hr.
**Verdict**: 🔴 **Cannot run as-is.** The Gateway assumes it's a long-running daemon. AgentCore will kill it after 15 minutes of no inbound invocations. Even if you keep it warm with pings, the 8-hour max session kills any long-running process.
### 2. Channel Connections (WhatsApp, Discord, Telegram, etc.)
**OpenClaw**: Maintains persistent outbound WebSocket/polling connections to each messaging service. WhatsApp (Baileys) requires a persistent session with auth state. Discord uses a persistent gateway WebSocket.
**AgentCore**: Ephemeral sessions. No persistent outbound connections survive session termination.
**Verdict**: 🔴 **Fundamentally incompatible.** WhatsApp's Baileys library maintains a stateful WebSocket with auth keys that must persist. Discord.js maintains a real-time gateway connection. These cannot be started/stopped per request — they need to be always-on or you lose the connection and have to re-auth.
### 3. Filesystem Persistence (Session Transcripts, Config, Workspace)
**OpenClaw**: Stores everything on local filesystem — session JSONL files, config, WhatsApp auth state, pairing store, agent workspace (MEMORY.md, daily notes), secrets.
**AgentCore**: Filesystem is ephemeral. Destroyed when session terminates.
**Verdict**: 🔴 **All persistent state must be externalized.** Every file that OpenClaw writes and expects to read later would need to be backed by S3, DynamoDB, or AgentCore Memory.
### 4. Shell Exec / PTY (Agent Tool)
**OpenClaw**: The `exec` tool spawns real shell processes, supports PTY for interactive commands, runs coding agents (Codex, Claude Code) as child processes.
**AgentCore**: Runs inside a container, so basic exec is possible, but:
- No host-level access
- Container filesystem is ephemeral
- PTY support depends on container config
- Long-running background processes die with session (15min idle / 8hr max)
**Verdict**: 🟡 **Partially possible.** Basic shell commands work in containers. But coding agent subprocesses that run for extended periods will be killed. No access to host-level tools.
### 5. Heartbeat System
**OpenClaw**: Gateway-driven timer that fires periodic agent turns in the main session (default every 30m). Relies on the gateway being continuously running.
**AgentCore**: No built-in periodic task scheduler. Container only runs when invoked.
**Verdict**: 🔴 **Must be offloaded.** Would need EventBridge Scheduler or a Lambda cron to periodically invoke the agent. The heartbeat logic itself could run, but the trigger mechanism must be external.
### 6. Cron Jobs
**OpenClaw**: Built-in cron scheduler (`croner` library) that runs inside the gateway process.
**AgentCore**: No built-in scheduler.
**Verdict**: 🔴 **Must be offloaded.** Same as heartbeat — EventBridge Scheduler → InvokeAgentRuntime.
---
## 🟡 Partially Compatible (needs adaptation)
### 7. Agent Runtime (Pi-Mono)
**OpenClaw**: Embedded pi-mono agent runtime with RPC-based tool calling, streaming, and multi-turn sessions.
**AgentCore**: Expects your container to implement `/invocations` (POST) and `/ping` (GET). Returns JSON or SSE.
**Verdict**: 🟡 **Core agent loop could work.** The pi-mono agent runtime could be wrapped behind the AgentCore HTTP contract. The tool-calling loop would need to be adapted to the HTTP request/response pattern instead of internal RPC. The main challenge is the session model (see below).
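The HTTP contract itself is thin. The sketch below shows its shape in Python for consistency with the other sketches here; a pi-mono wrapper would be TypeScript. Assumptions: the `{"status": "Healthy"}` ping body and port 8080 are what the AgentCore contract is understood to expect and should be checked against the docs; `route`, `make_handler`, and `run_agent` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable, Tuple


def route(method: str, path: str, body: bytes,
          run_agent: Callable[[dict], dict]) -> Tuple[int, dict]:
    """Map the two contract endpoints onto an agent callable."""
    if method == "GET" and path == "/ping":
        return 200, {"status": "Healthy"}
    if method == "POST" and path == "/invocations":
        try:
            payload = json.loads(body or b"{}")
        except ValueError:
            return 400, {"error": "invalid JSON"}
        return 200, run_agent(payload)
    return 404, {"error": "not found"}


def make_handler(run_agent: Callable[[dict], dict]):
    """Bind `route` to a stdlib request handler class."""

    class Handler(BaseHTTPRequestHandler):
        def _reply(self, body: bytes) -> None:
            status, result = route(self.command, self.path, body, run_agent)
            data = json.dumps(result).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def do_GET(self) -> None:
            self._reply(b"")

        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            self._reply(self.rfile.read(length))

    return Handler
```

Serving it would be `HTTPServer(("0.0.0.0", 8080), make_handler(agent)).serve_forever()` with `HTTPServer` from `http.server`. The sketch returns plain JSON only; streaming (SSE) responses would need a different write path.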
### 8. Session Management
**OpenClaw**: Sessions are long-lived, stored as JSONL, persist indefinitely. The "main session" for a user is eternal and accumulates context over days/weeks.
**AgentCore**: Sessions max 8 hours. State is ephemeral. Cross-session continuity requires AgentCore Memory or external storage.
**Verdict**: 🟡 **Needs redesign.** Could use AgentCore Memory for cross-session context, but OpenClaw's JSONL-based session model (with compaction) would need to be completely rewritten to use AgentCore Memory or a database.
### 9. Browser Tool (Playwright)
**OpenClaw**: Launches a managed Chromium instance via Playwright, controls via CDP.
**AgentCore**: Containers can run headless browsers, but:
- ARM64 container (Chromium ARM builds exist)
- Ephemeral — browser state lost on session end
- Network access needed for web browsing (VPC + NAT or default internet)
**Verdict**: 🟡 **Possible but fragile.** Headless Chrome can run in containers, but you need the right base image, enough memory, and network egress. Browser sessions won't persist. AgentCore actually has a built-in Browser Tool you might use instead.
### 10. Web Search / Web Fetch
**OpenClaw**: Makes HTTP requests to Brave Search API, fetches web pages.
**AgentCore**: Outbound HTTP works fine (with internet access via VPC+NAT or default).
**Verdict**: 🟢 **Compatible.** Just needs outbound internet access.
### 11. TTS (Text-to-Speech)
**OpenClaw**: Calls external TTS APIs (ElevenLabs, Edge TTS, etc.)
**AgentCore**: Outbound API calls work fine.
**Verdict**: 🟢 **Compatible.**
### 12. LLM Provider Calls
**OpenClaw**: Calls Bedrock, Anthropic, OpenAI, etc. via HTTP APIs.
**AgentCore**: Outbound API calls work. Bedrock calls can use IAM roles.
**Verdict**: 🟢 **Compatible, and potentially better.** Bedrock calls from AgentCore can use the execution role directly — no API keys needed.
---
## 🟢 Compatible (works as-is or with minimal changes)
### 13. Model Provider Abstraction
The model routing/failover/selection logic is pure application code — works anywhere.
### 14. System Prompt Construction
Building system prompts from workspace files is pure logic — works anywhere (but workspace files need external storage).
### 15. Context Engine (Compaction, Pruning)
Session compaction/pruning logic is algorithmic — works in any runtime. But needs adapted storage backend.
---
## Node / Platform Features (N/A for AgentCore)
These features are inherently tied to physical devices and cannot run on AgentCore:
- **macOS app** (menu bar, Voice Wake, Talk Mode)
- **iOS/Android nodes** (camera, screen, location, voice)
- **iMessage** (requires macOS + Messages.app)
- **Signal** (requires signal-cli subprocess)
- **Canvas** (visual workspace rendered on client device)
- **Bonjour discovery** (LAN-based device pairing)
- **WhatsApp QR pairing** (requires interactive QR scan flow)
These would remain on the user's device, connecting to an AgentCore-hosted agent via API.

211
fargate-analysis.md Normal file

@@ -0,0 +1,211 @@
# OpenClaw on ECS Fargate — Analysis
## TL;DR
**Fargate is the natural AWS home for OpenClaw.** Unlike AgentCore, Fargate's model is a long-running container with persistent storage — exactly what OpenClaw needs. The existing Docker support means this is largely a deployment/ops exercise, not a rewrite.
## Why Fargate Works
| OpenClaw Need | Fargate Support |
|---|---|
| Long-lived daemon process | ✅ ECS Services run indefinitely (no idle timeout) |
| Persistent outbound WS (WhatsApp, Discord) | ✅ Outbound connections stay alive as long as the task runs |
| Persistent filesystem | ✅ EFS volume mount for all state |
| Inbound WS/HTTP (clients, webhooks) | ✅ Via ALB or NLB |
| Shell exec / PTY | ✅ Full Linux container, exec works |
| Cron / Heartbeat | ✅ Runs inside the gateway process as normal |
| Node.js ≥22 | ✅ Any Node version in your container image |
| ARM64 support | ✅ Fargate supports ARM (Graviton) — cheaper |
## Architecture
```
Internet / Messaging APIs
┌──────────────┐     ┌────────────────────────────────────┐
│  ALB / NLB   │────▶│  ECS Fargate Task                  │
│  (optional)  │     │  ┌──────────────────────────────┐  │
└──────────────┘     │  │  OpenClaw Gateway            │  │
                     │  │  (Node.js, always-on)        │  │
                     │  │                              │  │
                     │  │  ├─ WhatsApp (Baileys WS)    │  │
                     │  │  ├─ Discord (discord.js WS)  │  │
                     │  │  ├─ Telegram (grammY)        │  │
                     │  │  ├─ Slack (Bolt)             │  │
                     │  │  ├─ Agent runtime (pi-mono)  │  │
                     │  │  ├─ Cron / Heartbeat         │  │
                     │  │  └─ WebSocket server (:18789)│  │
                     │  └──────────────────────────────┘  │
                     │                 │                  │
                     │                 ▼                  │
                     │         ┌────────────────┐         │
                     │         │   EFS Mount    │         │
                     │         │  /home/node/   │         │
                     │         │  ~/.openclaw/  │         │
                     │         └────────────────┘         │
                     └────────────────────────────────────┘
```
## Deployment Details
### Container Image
OpenClaw already publishes Docker images:
- `ghcr.io/openclaw/openclaw:latest` (stable)
- `ghcr.io/openclaw/openclaw:main` (latest main)
- Base: `node:22-bookworm`
- Can build custom with `docker-setup.sh`
### EFS for Persistent State
Mount an EFS filesystem to persist:
- `~/.openclaw/` (config, sessions, pairing store, secrets, WhatsApp auth)
- `~/.openclaw/workspace/` (AGENTS.md, SOUL.md, MEMORY.md, daily notes)
EFS is ideal here because:
- Shared access if you ever run multiple tasks (blue/green deploys)
- Survives task restarts, deployments, Fargate spot interruptions
- Low-latency NFS for the small files OpenClaw uses
- Cost: ~$0.30/GB-month (Infrequent Access even cheaper)
### Fargate Task Sizing
OpenClaw is mostly I/O-bound (waiting on LLM APIs, channel WS):
| Config | vCPU | Memory | Monthly Cost (on-demand, us-east-1) |
|---|---|---|---|
| **Minimal** | 0.25 | 0.5 GB | ~$9/mo |
| **Recommended** | 0.5 | 1 GB | ~$18/mo |
| **With browser** | 1 | 2 GB | ~$36/mo |
| **Heavy (coding agents)** | 2 | 4 GB | ~$72/mo |
ARM (Graviton) is ~20% cheaper than x86. OpenClaw's Docker image supports both.
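The table values follow from a simple per-hour formula. The two rates below are assumed us-east-1 Linux/x86 on-demand prices and should be checked against the Fargate pricing page; `fargate_monthly_usd` is a hypothetical helper.

```python
# Assumed us-east-1 Linux/x86 on-demand rates; check the Fargate pricing page.
VCPU_HOUR_USD = 0.04048
GB_HOUR_USD = 0.004445
HOURS_PER_MONTH = 730


def fargate_monthly_usd(vcpu: float, memory_gb: float, arm: bool = False) -> float:
    """Estimate the always-on monthly cost of one Fargate task."""
    hourly = vcpu * VCPU_HOUR_USD + memory_gb * GB_HOUR_USD
    if arm:
        hourly *= 0.80  # the ~20% Graviton discount quoted above
    return hourly * HOURS_PER_MONTH
```

For example, `fargate_monthly_usd(0.5, 1)` comes to about $18, matching the Recommended row. Passing `arm=True` applies the approximate Graviton discount.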
**Fargate Spot** could save up to 70%, but spot interruptions would kill channel connections (WhatsApp re-auth is painful). **Not recommended** for the gateway.
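**Savings Plans**: a 1-year commitment lowers the compute rate (the cost comparison below assumes ~$18 → ~$13/mo); discounts approaching 50% generally require a 3-year term. For an always-on personal assistant, this makes sense.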
### Networking
**Outbound (channels)**:
- Task in a private subnet with NAT Gateway for internet egress
- Or: task in public subnet with public IP (simpler, slightly less secure)
- All channel connections (WhatsApp WS, Discord WS, Telegram polling) work through NAT
**Inbound (webhooks, clients)**:
- ALB for HTTPS termination (Telegram webhooks, Slack Events API, WebChat)
- NLB for raw TCP/WebSocket passthrough
- Or: no LB at all if using only outbound channels (WhatsApp Baileys doesn't need inbound)
- Alternative: Tailscale sidecar container for private access
**Security Groups**:
- Outbound: allow all (channels need various ports/IPs)
- Inbound: port 18789 from ALB/NLB only (or restricted IPs)
### Service Configuration
```json
{
"serviceName": "openclaw-gateway",
"taskDefinition": "openclaw-gateway",
"desiredCount": 1,
"launchType": "FARGATE",
"deploymentConfiguration": {
"minimumHealthyPercent": 0,
"maximumPercent": 100
}
}
```
Key: `desiredCount: 1` — OpenClaw is single-instance by design (one WhatsApp session). Use `minimumHealthyPercent: 0` for rolling deploys (brief downtime is fine for a personal assistant).
### Health Check
- Container health: `curl http://localhost:18789/` (Control UI responds)
- Or: implement a lightweight `/health` endpoint
- ECS will restart the task if health checks fail
## What Still Needs Work
### 1. WhatsApp Re-Auth on Restart
WhatsApp Baileys stores session auth in the filesystem. With EFS, this persists across task restarts. But if the task is replaced (new deployment, Fargate maintenance), the WS connection drops and needs to reconnect. Baileys handles this automatically if the auth state is intact (on EFS).
**Risk**: LOW if using EFS. Baileys reconnects with stored creds.
### 2. No macOS/iOS Integration
Fargate containers can't run macOS APIs. No iMessage, no Voice Wake, no camera.
**Mitigation**: Run OpenClaw nodes (iOS/macOS/Android) at home, connecting to the Fargate gateway via Tailscale or WS tunnel.
### 3. Browser Tool
Playwright/Chromium needs more memory (2+ GB recommended). Runs fine in containers but adds cost.
**Alternative**: Use the OpenClaw Docker sandbox for browser isolation.
### 4. Signal
`signal-cli` is a Java subprocess. Runs in the container but adds ~200MB+ to image size and memory usage.
### 5. Gateway Token / Auth
With a public ALB, you need `gateway.auth.token` or `gateway.auth.password` set. Store in Secrets Manager, inject via ECS task definition environment/secrets.
## Cost Comparison
| Hosting Option | Monthly Cost | Effort |
|---|---|---|
| **Fargate (0.5 vCPU, 1GB)** | ~$18 + $5 EFS + $3 NAT = **~$26/mo** | Moderate (CDK/Terraform) |
| **Fargate w/ Savings Plan** | ~$13 + $5 + $3 = **~$21/mo** | Same + commitment |
| **EC2 t4g.micro** | ~$6/mo (or free tier) | Manual ops |
| **EC2 t4g.small** | ~$12/mo | Manual ops |
| **Lightsail (1GB)** | **$5/mo** | Easiest |
| **Hetzner VPS (CX22)** | **~$4/mo** | Non-AWS |
Fargate is more expensive than raw EC2/Lightsail, but you get:
- Auto-restart on crash
- No OS patching
- Easy deploys (update image, ECS rolls)
- CloudWatch integration
- IAM task roles (for Bedrock)
## Recommended Setup
### Minimum Viable Fargate Deployment
1. **VPC**: Default VPC or simple 2-AZ setup
2. **EFS**: One filesystem, mounted at `/home/node`
3. **Fargate Service**: 1 task, 0.5 vCPU / 1 GB, ARM64
4. **ALB** (optional): Only if using webhook-based channels or remote access
5. **NAT Gateway**: For outbound internet (channel connections, LLM APIs)
6. **Secrets Manager**: Gateway token, API keys
7. **IAM Task Role**: Bedrock access for LLM calls
8. **CloudWatch Logs**: Container stdout/stderr
### IaC Options
- **CDK**: Best for AWS-native, type-safe infra
- **Terraform**: More portable
- **Copilot CLI**: Fastest to prototype (`copilot init` → `copilot deploy`)
### Deploy Flow
```bash
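# Authenticate Docker to ECR (required before the push)
aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com
# Build & push image
docker build -t openclaw-gateway .
docker tag openclaw-gateway:latest <account>.dkr.ecr.<region>.amazonaws.com/openclaw:latest
docker push <account>.dkr.ecr.<region>.amazonaws.com/openclaw:latest
# Update ECS service (rolls to new image)
aws ecs update-service --cluster openclaw --service openclaw-gateway --force-new-deployment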
```
## vs AgentCore Runtime
| Dimension | Fargate | AgentCore |
|---|---|---|
| Architecture match | ✅ Long-lived daemon | ❌ Request-driven, ephemeral |
| Channel connections | ✅ Persistent WS | ❌ Killed on idle |
| State persistence | ✅ EFS | ❌ Ephemeral (need Memory service) |
| Code changes needed | Minimal (Docker already works) | Major rewrite |
| Scheduling | ✅ Built-in (gateway cron) | ❌ External (EventBridge) |
| Session isolation | Same container | Per-session microVM |
| Scaling | Manual (desiredCount) | Auto |
| Cost model | Pay for uptime | Pay for CPU usage |
**Bottom line**: Fargate is "run what you have on AWS." AgentCore would be "rewrite OpenClaw as a different kind of system."
---
*Research completed 2026-03-10. Sources: OpenClaw Docker docs, AWS Fargate pricing page, EFS/Fargate integration docs.*

109
feasibility-verdict.md Normal file

@@ -0,0 +1,109 @@
# Feasibility Verdict: OpenClaw on AgentCore
## TL;DR
**Can OpenClaw run on AgentCore Runtime?** Not as-is. The architectures are fundamentally different. OpenClaw is a **long-lived daemon** with persistent connections; AgentCore is a **serverless, request-driven container** with ephemeral sessions.
You could run a **subset** of OpenClaw on AgentCore — specifically, the agent reasoning/tool-calling core — but you'd need to completely redesign the messaging layer, state management, and scheduling. At that point, you're essentially building a new system that borrows OpenClaw's agent logic.
## The Core Tension
| Dimension | OpenClaw | AgentCore |
|---|---|---|
| Process model | Always-on daemon | Request-invoked container |
| Session lifetime | Indefinite | 8 hours max, 15min idle kill |
| State | Local filesystem | Ephemeral (use AgentCore Memory) |
| Connections | Persistent WS to channels | No persistent outbound connections |
| Scheduling | Internal cron/heartbeat | None (use EventBridge) |
| User model | Single user, single host | Multi-user, multi-session |
## What Would Actually Work
### Realistic Architecture: "Split Gateway"
```
┌─────────────────────────────┐
│  Channel Relay              │  ← ECS Fargate (always-on)
│  WhatsApp, Discord, Slack,  │    Maintains persistent channel connections
│  Telegram, Signal, etc.     │    Translates messages → InvokeAgentRuntime
└──────────────┬──────────────┘
               │ InvokeAgentRuntime
┌─────────────────────────────┐
│  AgentCore Runtime          │  ← Serverless container (ARM64)
│  Pi-mono agent loop         │    Handles reasoning, tool calls, LLM calls
│  /invocations + /ping       │    Ephemeral per-session
└──────────────┬──────────────┘
    ┌──────────┼──────────┐
    ▼          ▼          ▼
┌────────┐ ┌──────────┐ ┌──────────────┐
│   S3   │ │ DynamoDB │ │  AgentCore   │
│ State  │ │ Sessions │ │   Memory     │
└────────┘ └──────────┘ └──────────────┘
```
**Channel Relay** (must be always-on):
- Runs on ECS Fargate, EC2, or similar
- Extracted from OpenClaw's gateway — just the channel plugins + message routing
- On inbound message → calls `InvokeAgentRuntime` with session ID
- On agent response → routes back to the correct channel
**Agent Container** (runs on AgentCore):
- Pi-mono agent runtime wrapped in HTTP server
- Implements `/invocations`, `/ping`, optionally `/ws`
- Loads workspace files from S3 on session start
- Writes session state to DynamoDB/AgentCore Memory
- Makes LLM calls, web searches, etc.
**External Scheduling** (EventBridge):
- Heartbeat: EventBridge rule every 30m → Lambda → InvokeAgentRuntime
- Cron: dynamic EventBridge rules managed via an API
## Pros of This Approach
- **No infrastructure management** for the agent runtime (scaling, patching, etc.)
- **Cost-efficient** — pay only for active agent CPU time (I/O wait is free)
- **Security isolation** — each session in its own microVM
- **Built-in auth** — SigV4/OAuth for agent endpoints
- **Built-in observability** — agent tracing, tool invocations
- **Bedrock-native** — direct IAM-role access to Bedrock models
## Cons / Risks
- **Massive refactoring effort** — this is not a "deploy and go" situation
- **Channel relay still needs always-on infra** — you don't eliminate ops completely
- **Session continuity is harder** — 15min idle timeout means sessions are short-lived; need careful state management for multi-turn conversations
- **Cold start latency** — new sessions need to spin up a microVM
- **Loss of local features** — no macOS integrations, no device nodes, no browser extension relay
- **WhatsApp is the hardest** — Baileys requires persistent WebSocket + auth state; this alone might need a dedicated EC2 instance
- **Agent workspace semantics change** — MEMORY.md, daily notes, etc. need to be loaded from S3 and written back; the "personal local assistant" feel is lost
## Alternative: Don't Do This
Honestly? OpenClaw's design philosophy is **personal, local-first, always-on**. AgentCore's philosophy is **serverless, multi-user, request-driven**. These are almost diametrically opposed.
### Better alternatives for "OpenClaw on AWS":
1. **EC2/ECS + Docker** — Run the full OpenClaw gateway as a container on EC2 or ECS. This is what the existing Docker support does. You get the full feature set, persistent connections, local filesystem. Just add an EBS volume for state.
2. **Lightsail** — Cheap VPS that runs the gateway exactly as designed.
3. **ECS Fargate** — Run the gateway as a Fargate task with EFS for persistence. More serverless-y without the architecture mismatch.
### Where AgentCore _would_ make sense for OpenClaw:
- **Sub-agent offloading** — Run expensive coding agent tasks on AgentCore (Codex-style), keeping the gateway local but offloading heavy compute.
- **Tool hosting** — Host MCP tool servers on AgentCore (e.g., browser tool, code interpreter) and connect them to a local OpenClaw gateway.
- **Multi-user deployment** — If you wanted to offer OpenClaw-as-a-service to multiple users, AgentCore's per-session isolation would be valuable. But you'd still need the channel relay.
## Verdict
| Question | Answer |
|---|---|
| Can OpenClaw run on AgentCore? | Not without fundamental redesign |
| Is the agent core (reasoning loop) compatible? | Yes, with HTTP wrapper |
| Can channel connections run on AgentCore? | No — need separate always-on infra |
| Is the effort worth it? | Probably not for personal use. Maybe for multi-user SaaS. |
| Best AWS hosting for OpenClaw today? | EC2 or ECS with Docker |
| Where AgentCore adds value? | Sub-agent compute, MCP tool hosting |
---
*Research completed 2026-03-10. Sources: OpenClaw docs (docs.openclaw.ai), OpenClaw GitHub (github.com/openclaw/openclaw), AWS AgentCore docs (docs.aws.amazon.com/bedrock-agentcore), installed OpenClaw v2026.3.2 source inspection.*

48
framework-notes.md Normal file

@@ -0,0 +1,48 @@
# Framework & Runtime Notes
## OpenClaw's Agent Framework
OpenClaw does **not** use any mainstream agent framework. No LangChain, LangGraph, CrewAI, Strands, or similar.
### Custom Stack: pi-mono
The agent runtime is built on **pi-mono** by Mario Zechner (`@mariozechner`), a custom TypeScript agent runtime:
| Package | Role |
|---|---|
| `@mariozechner/pi-agent-core` | Core agent loop: LLM call → tool parse → execute → loop |
| `@mariozechner/pi-ai` | LLM provider abstraction (Bedrock, Anthropic, OpenAI, Ollama, etc.) |
| `@mariozechner/pi-coding-agent` | Coding agent runtime (Codex/Claude Code style sub-agents) |
| `@mariozechner/pi-tui` | Terminal UI for interactive sessions |
### OpenClaw Owns Everything Else
On top of pi-mono, OpenClaw implements:
- Session management (JSONL transcripts, compaction, pruning)
- Channel routing (inbound message → agent → response → channel)
- Tool wiring (exec, read, write, edit, browser, canvas, nodes, message, etc.)
- Context engine (system prompt construction, bootstrap file injection)
- Heartbeat / Cron scheduling
- Device pairing and trust
- Config management and hot-reload
- Multi-agent routing
- Webhook/hook system
### Observability
No LangSmith, LangFuse, Arize, or OpenTelemetry integration.
- Internal JSONL session transcripts
- stdout/stderr logging (CloudWatch-compatible)
- `openclaw logs --follow` for live tailing
- Control UI for session inspection
### Implications for AgentCore
- No off-the-shelf framework adapter for AgentCore (unlike LangGraph/Strands starters)
- Would need custom `/invocations` + `/ping` HTTP wrapper around pi-mono
- LLM provider layer (`pi-ai`) could potentially be swapped for direct Bedrock SDK calls
- Tool definitions would need to be re-expressed as AgentCore-compatible tool schemas
---
*Added 2026-03-10*

88
offload-requirements.md Normal file

@@ -0,0 +1,88 @@
# What Must Be Offloaded to External Services
## Must-Offload Components
### 1. Messaging Channel Connections → External Service / Sidecar
**Current**: WhatsApp (Baileys WS), Discord (discord.js WS), Telegram (grammY), Slack (Bolt), etc. all run as persistent connections inside the gateway.
**Required**: A separate always-on service that:
- Maintains persistent connections to each messaging platform
- Receives inbound messages
- Translates them into `InvokeAgentRuntime` calls to AgentCore
- Receives agent responses and routes them back to the correct channel
**Options**:
- **ECS/Fargate task** running a stripped-down "channel relay" service (the OpenClaw gateway minus the agent runtime)
- **Lambda + API Gateway** for webhook-based channels (Telegram webhook mode, Slack Events API)
- **EC2 or ECS** for WebSocket-based channels (WhatsApp Baileys, Discord)
- A lightweight **"gateway lite"** that keeps channel connections and forwards to AgentCore
**Complexity**: HIGH. This is the single biggest piece of work. The channel connection logic is deeply intertwined with the gateway.
### 2. Persistent State → S3 + DynamoDB / AgentCore Memory
**Current**: All state lives on the filesystem.
**Required**:
| State | Proposed External Store |
|---|---|
| Session transcripts (JSONL) | S3 (keyed by agent+session ID) or DynamoDB |
| Agent workspace (MEMORY.md, SOUL.md, etc.) | S3 bucket or EFS |
| WhatsApp auth state | DynamoDB or Secrets Manager |
| Config (openclaw.json) | S3 or Parameter Store |
| Pairing store (device trust) | DynamoDB |
| Secrets | Secrets Manager |
| Cron state | DynamoDB |
| Agent conversation history | AgentCore Memory (short-term + long-term) |
**Complexity**: HIGH. Every file read/write in OpenClaw assumes a local filesystem. Would need a storage abstraction layer.
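The abstraction layer could be as small as a two-method interface. This is a sketch under assumptions, shown in Python rather than OpenClaw's TypeScript: `WorkspaceStore`, `MemoryStore`, `S3Store`, and `append_memory` are hypothetical names, and `client` is expected to be a boto3 S3 client (only `get_object`, `put_object`, and `exceptions.NoSuchKey` are used).

```python
from typing import Any, Dict, Protocol


class WorkspaceStore(Protocol):
    def read(self, path: str) -> str: ...
    def write(self, path: str, content: str) -> None: ...


class MemoryStore:
    """In-process stand-in, useful for tests and local runs."""

    def __init__(self) -> None:
        self._files: Dict[str, str] = {}

    def read(self, path: str) -> str:
        return self._files[path]  # raises KeyError when missing

    def write(self, path: str, content: str) -> None:
        self._files[path] = content


class S3Store:
    """Backs the same interface with S3; `client` is a boto3 S3 client."""

    def __init__(self, client: Any, bucket: str, prefix: str = "") -> None:
        self._client, self._bucket, self._prefix = client, bucket, prefix

    def read(self, path: str) -> str:
        try:
            obj = self._client.get_object(Bucket=self._bucket, Key=self._prefix + path)
        except self._client.exceptions.NoSuchKey:
            # Normalize to the same error the in-memory store raises.
            raise KeyError(path) from None
        return obj["Body"].read().decode("utf-8")

    def write(self, path: str, content: str) -> None:
        self._client.put_object(Bucket=self._bucket, Key=self._prefix + path,
                                Body=content.encode("utf-8"))


def append_memory(store: WorkspaceStore, note: str, path: str = "MEMORY.md") -> None:
    """Example caller: appends a line without knowing which backend is in use."""
    try:
        current = store.read(path)
    except KeyError:
        current = ""
    store.write(path, current + note.rstrip("\n") + "\n")
```

Callers such as `append_memory` depend only on the interface, which is the property that makes the backend swap tractable. Read-modify-write on S3 is not atomic, so concurrent writers would need extra care.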
### 3. Heartbeat → EventBridge Scheduler
**Current**: Gateway-internal timer fires every 30m.
**Required**: EventBridge Scheduler rule that periodically calls `InvokeAgentRuntime` with the heartbeat prompt.
**Complexity**: LOW. Straightforward EventBridge → Lambda → InvokeAgentRuntime.
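The schedule definition can be sketched as plain data. The function below only builds arguments for EventBridge Scheduler's `CreateSchedule` call (boto3 `scheduler.create_schedule(**kwargs)`); the function names and the `{"kind": "heartbeat"}` input shape are hypothetical.

```python
import json


def rate_expression(minutes: int) -> str:
    """EventBridge rate syntax uses a singular unit for a value of 1."""
    if minutes < 1:
        raise ValueError("minutes must be >= 1")
    unit = "minute" if minutes == 1 else "minutes"
    return f"rate({minutes} {unit})"


def heartbeat_schedule(name: str, lambda_arn: str, role_arn: str,
                       minutes: int = 30) -> dict:
    """Build kwargs for EventBridge Scheduler's CreateSchedule call."""
    return {
        "Name": name,
        "ScheduleExpression": rate_expression(minutes),
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": lambda_arn,
            "RoleArn": role_arn,
            # Hypothetical event shape consumed by the heartbeat Lambda.
            "Input": json.dumps({"kind": "heartbeat"}),
        },
    }
```

`role_arn` is the role Scheduler assumes to invoke the Lambda; it needs `lambda:InvokeFunction` on the target.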
### 4. Cron Jobs → EventBridge Scheduler
**Current**: `croner` library inside the gateway process.
**Required**: Each cron job becomes an EventBridge Scheduler rule.
**Complexity**: MEDIUM. Need a way to manage/create/delete rules dynamically (since OpenClaw supports runtime cron management via the agent).
### 5. Webhook Ingress → API Gateway / ALB
**Current**: Gateway's HTTP server handles webhook endpoints for channels like Telegram (webhook mode), Slack Events API.
**Required**: API Gateway or ALB fronting the channel relay service or Lambda functions.
**Complexity**: MEDIUM.
### 6. Browser Control → AgentCore Browser Tool or Sidecar
**Current**: Playwright controlling a local Chromium process.
**Required**: Either:
- Use AgentCore's built-in Browser Tool
- Run headless Chrome inside the container (heavy, ephemeral)
- Separate browser service (like Browserless.io or an ECS sidecar)
**Complexity**: MEDIUM. AgentCore has its own browser tool which could replace OpenClaw's.
## Can Stay Inside the Container
These components work within an AgentCore container:
- **Agent reasoning loop** (pi-mono core logic)
- **LLM API calls** (Bedrock, Anthropic, OpenAI — outbound HTTP)
- **Web search / web fetch** (outbound HTTP to Brave API, arbitrary URLs)
- **TTS calls** (outbound HTTP to ElevenLabs/Edge TTS)
- **System prompt construction** (pure logic)
- **Context windowing / compaction** (algorithmic)
- **Model selection / failover** (pure logic)
- **Basic shell exec** (within container, ephemeral)
- **File read/write** (within session, ephemeral — but see state offloading above)
## Summary: Effort Estimate
| Component | Effort | Notes |
|---|---|---|
| Channel relay service | XL | Core gateway refactor; messaging is deeply coupled |
| Storage abstraction layer | L | Every fs operation needs a backend swap |
| Heartbeat/Cron offload | S-M | EventBridge Scheduler schedules; dynamic cron management adds effort |
| Agent HTTP wrapper | M | Wrap pi-mono behind /invocations + /ping |
| Session model redesign | L | JSONL → AgentCore Memory or DynamoDB |
| Webhook ingress | M | API Gateway + routing |
| Browser adaptation | S-M | Use AgentCore Browser Tool or containerized Chrome |
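
For the "Agent HTTP wrapper" row, a minimal standard-library sketch of the two endpoints is shown here. The `run_agent` function is a placeholder, and the request and response body shapes (including the `{"status": "Healthy"}` ping body) are assumptions that should be checked against the AgentCore Runtime service contract.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_agent(prompt: str) -> str:
    """Placeholder for the real agent loop (hypothetical)."""
    return f"echo: {prompt}"


def route(method: str, path: str, body: bytes) -> tuple:
    """Map a request to (status, JSON body); kept pure so it is easy to test."""
    if method == "GET" and path == "/ping":
        return 200, {"status": "Healthy"}
    if method == "POST" and path == "/invocations":
        payload = json.loads(body or b"{}")
        return 200, {"response": run_agent(payload.get("prompt", ""))}
    return 404, {"error": "not found"}


class Handler(BaseHTTPRequestHandler):
    def _reply(self, method: str) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, data = route(method, self.path, self.rfile.read(length))
        out = json.dumps(data).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)

    def do_GET(self) -> None:
        self._reply("GET")

    def do_POST(self) -> None:
        self._reply("POST")


def serve(port: int = 8080) -> None:
    """Start the blocking HTTP server on the given port."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `serve()` starts the server; in practice a framework such as FastAPI or the AgentCore SDK's own app wrapper would likely replace this hand-rolled handler.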

129
open-questions-resolved.md Normal file

@@ -0,0 +1,129 @@
# Open Questions — Final Research Findings
*Updated 2026-05-04 after research pass*
---
## Q1: Direct Code Deployment vs Container — ✅ RESOLVED
**CodeZip is the default and recommended path. No Docker needed.**
The AgentCore CLI scaffolds CodeZip by default:
```bash
agentcore create --name MyAgent --framework Strands --model-provider Bedrock --build CodeZip
agentcore deploy # AWS CodeBuild packages it; no local Docker required
```
Container mode is opt-in (`--build Container`). Q4 (ARM64 Dockerfile) is moot for initial build.
---
## Q2: Secrets in the Container — ✅ RESOLVED (with known limitation)
AgentCore Runtime env vars are **plaintext only** today. GitHub issue #396 (filed ~April 2026) requests ECS-style `valueFrom` Secrets Manager references — not yet implemented.
**Recommended pattern: IAM role + SDK fetch at startup**
```python
import os

import boto3


def load_secrets():
    """Fetch API keys once and expose them as environment variables."""
    sm = boto3.client('secretsmanager')
    secret = sm.get_secret_value(SecretId='openclaw/agent/keys')
    # Assumes the secret holds a single plain-string value
    os.environ['BRAVE_API_KEY'] = secret['SecretString']
    # etc.


# Call once at module load → cached for the 6-8hr warm session
load_secrets()
```
The container's IAM execution role grants Secrets Manager access. Runs once per session start — negligible cost. Don't pass secrets through the invocation payload.
---
## Q3: AgentCore Memory Pricing — ✅ RESOLVED (low risk for personal scale)
**Pricing structure confirmed:**
- Long-term retrieval: billed **per retrieve request**
- Built-in strategy model costs (extraction + consolidation): **included in Memory pricing** (confirmed by AWS re:Post)
- Storage: per GB
Exact per-event and per-GB rates not yet clearly published (still preview pricing). At personal assistant scale (~100 turns/day), cost will be pennies. Validate after first test deployment.
---
## Q4: ARM64 Container Build — ✅ RESOLVED (moot, but documented)
Superseded by CodeZip (Q1). If container mode ever needed:
```dockerfile
FROM --platform=linux/arm64 ghcr.io/astral-sh/uv:python3.11-bookworm-slim
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-cache
COPY agent.py ./
EXPOSE 8080
CMD ["uv", "run", "uvicorn", "agent:app", "--host", "0.0.0.0", "--port", "8080"]
```
Build: `docker buildx build --platform linux/arm64 -t <ecr-uri>:latest --push .`
⚠️ Hard requirement: ARM64 only. x86 image → `ValidationException: Architecture incompatible` on CreateAgentRuntime.
---
## Q5: Region + Model — ✅ RESOLVED
**Region: us-east-1** (broadest service availability, aligns with existing AWS work)
**Models (Bedrock cross-region inference, `us.` prefix):**
| Use | Model ID | Notes |
|---|---|---|
| Main agent | `us.anthropic.claude-3-7-sonnet-20250219-v1:0` | Primary workhorse |
| Heartbeats | `us.anthropic.claude-3-5-haiku-20241022-v1:0` | Fast, cheap |
| Experiment | `us.anthropic.claude-sonnet-4-*` | Sonnet 4 now on Bedrock (1M ctx preview) |
Strands defaults to Bedrock + Sonnet when AWS creds are present. No extra config needed for basic setup.
---
## Q6: Self-Managed Memory Strategy — ⚠️ NOT SUPPORTED YET
**Finding:** AgentCore CLI issue #677 (March 26, 2026): *"AgentCore memory does not currently support self-managed strategies."* Docs describe it; CLI doesn't implement it.
**Impact:** The "bring your own Lambda extraction pipeline" pattern is blocked via CLI.
**What still works:**
- ✅ Built-in strategies: SUMMARIZATION, USER_PREFERENCE, SEMANTIC — fully supported, automatic
- ✅ Strands `AgentCoreMemorySessionManager` — auto-stores turns, handles extraction
-`BatchCreateMemoryRecords` API directly — works for explicit writes, bypasses CLI
**Recommended mitigation:**
- Use built-in strategies for automatic extraction (covers ~90% of MEMORY.md value)
- Add `write_memory_record` as an agent tool that calls `BatchCreateMemoryRecords` directly
- This gives explicit "remember this" control without the self-managed strategy pipeline
```python
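import os

import boto3
from strands import tool

# Assumed setup: data-plane client plus a memory ID supplied via an env var
memory_client = boto3.client("bedrock-agentcore")
MEMORY_ID = os.environ["MEMORY_ID"]


@tool
def write_memory_record(content: str, namespace: str = "/curated/daniel/") -> str:
    """Explicitly write an important fact or lesson to long-term memory."""
    # Request shape as noted during research; verify field names against the API reference
    memory_client.batch_create_memory_records(
        memoryId=MEMORY_ID,
        memoryRecords=[{"content": {"text": content}, "namespace": namespace}]
    )
    return f"Written to memory: {content[:50]}..."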
```
---
## Summary
| # | Question | Status | Decision |
|---|---|---|---|
| 1 | Direct code deploy vs container | ✅ | Use CodeZip — no Docker |
| 2 | Secrets in container | ✅ | IAM role + SDK fetch at startup |
| 3 | Memory pricing | ✅ | Unknown exact rates, low risk at personal scale |
| 4 | ARM64 Dockerfile | ✅ | Moot (CodeZip), documented for reference |
| 5 | Region + model | ✅ | us-east-1, Claude Sonnet (cross-region) |
| 6 | Self-managed memory trigger | ⚠️ | Not supported yet; use built-in + BatchCreateMemoryRecords as tool |
**All open questions resolved (Q6 via workaround). Ready for Phase 0 spike.**

110
openclaw-feature-delta.md Normal file

@@ -0,0 +1,110 @@
# OpenClaw Feature Delta: March 2026 → May 2026
*Covers releases 2026.3.7 through 2026.5.3 (current)*
*Research baseline was 2026.3.2*
---
## Features That Change the AgentCore Comparison
### 🔴 New features we'd need to replicate — not trivial
**Agents/Commitments** (2026.4.29)
New opt-in system where the agent infers follow-up commitments from conversations and delivers them via heartbeat. Config: `commitments.enabled`, `commitments.maxPerDay`. Commitments are extracted in a background sub-agent, stored with due times, and surfaced at the next appropriate heartbeat.
This is NEW OpenClaw behavior that our AgentCore build doesn't account for. We'd need:
- A "commitment extractor" pass after each conversation (similar to long-term memory extraction)
- Storage for commitments with due times (DynamoDB)
- Heartbeat Lambda checks due commitments and delivers them
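
A minimal in-memory sketch of the due-check step from the list above. The `Commitment` fields, the `max_per_run` cap, and the `send` callback are hypothetical; in the real build the items would be read from and written back to DynamoDB rather than held in a list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Commitment:
    commitment_id: str
    chat_id: str
    text: str
    due_at: datetime  # timezone-aware, UTC
    delivered: bool = False


def due_commitments(items, now: datetime, max_per_run: int = 3):
    """Return undelivered commitments whose due time has passed, oldest first."""
    due = [c for c in items if not c.delivered and c.due_at <= now]
    return sorted(due, key=lambda c: c.due_at)[:max_per_run]


def heartbeat_check(items, now: datetime, send) -> int:
    """Deliver due commitments via `send(chat_id, text)` and mark them delivered."""
    delivered = 0
    for c in due_commitments(items, now):
        send(c.chat_id, f"Reminder: {c.text}")
        c.delivered = True
        delivered += 1
    return delivered


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    demo = [Commitment("c1", "42", "follow up on the invoice", now)]
    heartbeat_check(demo, now, lambda chat_id, text: print(chat_id, text))
```

Capping deliveries per run is a rough stand-in for OpenClaw's `commitments.maxPerDay`; a faithful port would track a daily counter instead.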
**Memory → People Wiki** (2026.4.29)
OpenClaw's memory system has grown significantly: canonical people aliases, person cards, relationship graphs, privacy/provenance reports, evidence-kind drilldown. This is well beyond what we analyzed as "MEMORY.md replacement." The agent now has a structured knowledge graph about people and relationships.
AgentCore Memory's built-in strategies (SUMMARIZATION, USER_PREFERENCE, SEMANTIC) don't have a direct equivalent. This is a gap — OpenClaw's memory is now substantially more sophisticated.
**REM Dreaming** (2026.4.29)
Referenced in the `doctor.memory.remHarness` RPC — OpenClaw now has a "REM" background consolidation pass that synthesizes and reorganizes memories asynchronously. Similar to our planned "self-managed memory strategy" but built-in and richer.
No direct AgentCore Memory equivalent. Would need custom Lambda + BatchCreateMemoryRecords.
**Active Memory / Partial Recall** (2026.4.29)
- Per-conversation `allowedChatIds`/`deniedChatIds` filters for memory recall
- Returns partial recall results on timeout instead of failing closed
AgentCore Memory doesn't have conversation-scoped recall filters. Retrieving memories without a filter would return everything, not just conversation-relevant context.
---
### 🟡 Design decisions our build should incorporate
**Streaming Progress Drafts** (2026.5.3)
`streaming.mode: "progress"` — live draft messages across all channels (Discord, Telegram, Matrix, Slack, Teams) that update in-place as the agent processes. In Telegram, this means an editing message that shows progress without spamming new messages.
**Directly relevant to our Telegram build.** Instead of the "typing indicator" hack (which expires in 5s), we can send an initial "thinking..." message and edit it in-place with progress, then finalize. Much cleaner UX. We should implement this from day one.
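
A rough sketch of the edit-in-place pattern using the Telegram Bot API's `sendMessage` and `editMessageText` methods. The `ProgressDraft` class and the injected `call` function are hypothetical names; `telegram_call` is one possible implementation of that function using only the standard library.

```python
import json
import urllib.request


def telegram_call(token: str, method: str, params: dict) -> dict:
    """POST a Bot API method as JSON and return the decoded `result` field."""
    req = urllib.request.Request(
        f"https://api.telegram.org/bot{token}/{method}",
        data=json.dumps(params).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.load(resp)
    if not body.get("ok"):
        raise RuntimeError(body.get("description", "Telegram API error"))
    return body["result"]


class ProgressDraft:
    """One chat message that is created once and then edited in place."""

    def __init__(self, chat_id: int, call) -> None:
        self.chat_id, self.call = chat_id, call
        self.message_id = None
        self.last_text = None

    def update(self, text: str) -> None:
        if text == self.last_text:
            return  # Telegram rejects edits that do not change the message
        if self.message_id is None:
            sent = self.call("sendMessage", {"chat_id": self.chat_id, "text": text})
            self.message_id = sent["message_id"]
        else:
            self.call("editMessageText", {
                "chat_id": self.chat_id,
                "message_id": self.message_id,
                "text": text,
            })
        self.last_text = text
```

Usage would look like `draft = ProgressDraft(chat_id, lambda m, p: telegram_call(TOKEN, m, p))`, followed by `draft.update("⏳ thinking...")` and later `draft.update(final_answer)`. Edits should be throttled in practice, since the Bot API rate-limits frequent updates to the same chat.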
**`sessions_yield`** (2026.3.12)
New tool: orchestrators can end the current turn immediately and carry a hidden payload into the next session turn. Enables cleaner multi-agent hand-offs.
For our AgentCore build: this would be handled differently (sub-agent via A2A), but the pattern is worth knowing.
**Queue Mode: steer is now default** (2026.4.29)
Active-run queueing now defaults to `steer` with 500ms debounce. This means multiple rapid messages don't queue sequentially — later messages "steer" the current run. The SQS batching approach we designed (bundle all queued messages at once) is aligned with this.
**`/steer` command** (2026.5.3)
Explicit steering of active session runs without starting a new turn when idle. Our architecture handles this naturally through SQS FIFO batching.
**ACP: resumeSessionId** (2026.3.11)
`sessions_spawn` with `runtime: "acp"` can now resume existing sessions. Relevant if the AgentCore build eventually spawns coding agents.
---
### 🟢 Improvements to existing features (no gap impact)
**Multimodal memory indexing** (2026.3.11)
Image and audio indexing for memory search with Gemini embedding. OpenClaw-only feature (uses local embedding + Gemini). AgentCore Memory handles multimodal separately via AgentCore Browser/Code Interpreter context.
**Dashboard v2** (2026.3.12)
Control UI overhaul: modular overview, chat, config, agent, and session views, command palette, mobile bottom tabs. Not relevant to AgentCore build (we won't have the OpenClaw Control UI).
**SQLite plugin state store** (2026.4.29)
Restart-safe keyed registries with TTL for plugins. Analogous to DynamoDB in our AgentCore build.
**Bedrock Opus 4.7 thinking** (2026.4.29)
Now available. Update model selection in the build plan — Opus 4.7 with thinking is an option.
**Telegram improvements** (2026.3.12, 2026.4.29, 2026.5.3)
- Model picker via inline buttons
- Chunking improvements
- Proxy/webhook/polling resilience
These are fixes and refinements to OpenClaw's Telegram channel; our build sidesteps most of them by calling the simpler Telegram Bot API directly.
---
## Updated Gap Assessment
| Feature | OpenClaw (now) | AgentCore Build Plan | Gap |
|---|---|---|---|
| Streaming progress | ✅ Live draft edits | ❌ Not planned | Add Telegram edit-in-place |
| Commitments | ✅ Background extraction + heartbeat | ❌ Not designed | Medium effort to add |
| People wiki | ✅ Structured knowledge graph | ❌ Flat memories only | Significant gap |
| REM dreaming | ✅ Background consolidation | ❌ Built-in strategies only | Partial via built-ins |
| Memory filters | ✅ Per-conversation recall | ❌ Global retrieval | Small gap |
| Multimodal memory | ✅ Images + audio | ❌ N/A | Lower priority |
| Queue/steer default | ✅ Built-in | ✅ SQS FIFO batching | Equivalent |
| sessions_yield | ✅ Built-in | ✅ A2A equivalent | Equivalent |
---
## Recommendations for Build Plan Updates
1. **Add streaming progress to Phase 1** — don't defer this. In Telegram: send initial "⏳ thinking..." message, edit it with progress updates, replace with final answer. Better UX than typing indicator. OpenClaw ships this by default now.
2. **Add commitments to Phase 3 scope** — extract from conversations, store in DynamoDB with due_at, check in heartbeat Lambda, deliver if overdue. Simple but valuable.
3. **Acknowledge the memory gap** — OpenClaw's memory is now a full people-knowledge-graph system. AgentCore Memory's built-in strategies (SUMMARIZATION, USER_PREFERENCE, SEMANTIC) cover the basics but not the people wiki or REM dreaming depth. This is a real differentiation. The AgentCore build starts simpler — that's OK for a personal assistant, but worth calling out.
4. **Update model options** — Bedrock Opus 4.7 with thinking is now available. Consider for complex tasks.
---
*Updated 2026-05-04. Covers delta from v2026.3.2 → v2026.5.3.*


@@ -0,0 +1,192 @@
# Serverless Channel Relay Patterns — Research
## The Core Question
Can we replace the always-on channel relay (Fargate task) with event-driven serverless (Lambda + API Gateway), eliminating the persistent connection cost?
**Short answer**: Depends entirely on the channel. Telegram and Slack — yes, fully serverless. Discord — yes for slash commands, **no for regular chat messages**. WhatsApp (Baileys/personal) — no.
---
## Channel-by-Channel: Serverless Viability
### ✅ Telegram — Fully Serverless
Telegram bots support **full webhook mode** for all message events. Every message type (text, photos, reactions, etc.) can be delivered as an HTTP POST to your endpoint.
**Pattern**: `Telegram servers → API Gateway → Lambda → SQS → InvokeAgentRuntime`
```
User sends message to Telegram bot
→ Telegram POSTs to your HTTPS endpoint
→ API Gateway (HTTP API, ~$1/million requests)
→ Lambda (verify, parse, enqueue)
→ SQS (buffer for async processing)
→ Lambda → InvokeAgentRuntime (AgentCore)
→ Agent responds
→ Lambda calls Telegram Bot API to send reply
```
- **Well-proven pattern**: multiple serverless frameworks have Telegram webhook plugins
- Lambda timeout: use deferred/async pattern for long agent runs (respond 204 immediately, process async)
- Cost: essentially **free** at personal assistant scale (~$0.01/mo)
- grammY (used by OpenClaw) supports webhook mode natively
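
The "verify, parse, enqueue" Lambda could be sketched as follows. It assumes the webhook was registered with a `secret_token`, which Telegram then echoes in the `X-Telegram-Bot-Api-Secret-Token` header, and an API Gateway HTTP API event shape. The `enqueue` callback and the queued message fields are hypothetical; in deployment `enqueue` would wrap an SQS `send_message` call.

```python
import hmac
import json


def handle_webhook(event: dict, expected_secret: str, enqueue) -> dict:
    """Validate a Telegram webhook delivery, enqueue it, and ack immediately."""
    # Normalise header names, since casing varies between API Gateway flavours.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    supplied = headers.get("x-telegram-bot-api-secret-token", "")
    if not hmac.compare_digest(supplied.encode(), expected_secret.encode()):
        return {"statusCode": 403, "body": ""}

    update = json.loads(event.get("body") or "{}")
    message = update.get("message")
    if message and "text" in message:
        enqueue({
            "channel": "telegram",
            "update_id": update["update_id"],
            "chat_id": message["chat"]["id"],
            "text": message["text"],
        })
    # Ack everything else too, so Telegram does not retry updates we ignore.
    return {"statusCode": 200, "body": ""}
```

Carrying `update_id` through the queue gives the processing side a natural deduplication key if Telegram redelivers an update.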
### ✅ Slack — Fully Serverless
Slack's **Events API** delivers all events (messages, reactions, mentions) as HTTP webhooks. No persistent connection needed.
```
User sends Slack message
→ Slack POSTs the event to your endpoint
→ API Gateway → Lambda (must respond 200 within 3s or Slack retries)
→ Lambda: ack immediately, process async via SQS/Lambda
→ Agent runs → Slack Web API to post reply
```
- Standard pattern: Bolt for JavaScript/Python works in Lambda
- **3-second ack requirement**: must decouple ack from agent processing
- OpenClaw's Slack (Bolt) supports Events API mode (vs Socket Mode which needs WS)
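- AgentCore Gateway also offers a **1-click Slack integration**; it appears aimed at exposing Slack as an agent tool rather than relaying inbound events, so verify it fits before skipping the custom code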
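
The ack Lambda for Slack mainly needs request verification and a fast response. The sketch below follows Slack's documented signing scheme (HMAC-SHA256 over `v0:{timestamp}:{body}`, compared against the `X-Slack-Signature` header); the function names and the 300-second replay window are illustrative choices rather than anything taken from OpenClaw.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_slack_signature(signing_secret: str, timestamp: str, body: str,
                           signature: str, now: Optional[float] = None,
                           max_skew_s: int = 300) -> bool:
    """Check the X-Slack-Signature header against the raw request body."""
    now = time.time() if now is None else now
    try:
        if abs(now - int(timestamp)) > max_skew_s:
            return False  # stale request: possible replay
    except ValueError:
        return False
    base = f"v0:{timestamp}:{body}".encode()
    digest = hmac.new(signing_secret.encode(), base, hashlib.sha256).hexdigest()
    return hmac.compare_digest(f"v0={digest}".encode(), signature.encode())


def slack_ack(payload: dict) -> dict:
    """Answer the URL verification handshake, or ack with an empty 200."""
    if payload.get("type") == "url_verification":
        return {"statusCode": 200, "body": payload["challenge"]}
    return {"statusCode": 200, "body": ""}
```

The signature must be computed over the raw body exactly as received, so verification has to happen before any JSON parsing or re-serialisation.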
### ⚠️ Discord — PARTIAL (big asterisk)
This is the tricky one. Discord has two separate systems:
#### What's serverless: Interactions (slash commands, buttons, select menus)
Discord sends these as HTTP POSTs to an **Interactions Endpoint URL**. Lambda handles them fine.
- Pattern: `Discord → API Gateway → Lambda → respond within 3s (or defer)`
- Works for: `/ask`, `/chat`, button clicks, context menu commands
- **3-second hard limit** on initial response — need deferred responses for agent invocations
#### What's NOT serverless: Regular chat messages (MESSAGE_CREATE)
Regular messages in channels/DMs — the kind an AI assistant primarily needs — **only come via the Gateway WebSocket**. Discord does not send MESSAGE_CREATE as a webhook.
Discord has an outgoing webhook events system (separate from Interactions), but as of 2025/2026 it only supports 3 event types:
- `APPLICATION_AUTHORIZED` (user installs app)
- `ENTITLEMENT_CREATE` (premium purchase)
- `QUEST_USER_ENROLLMENT`
**No MESSAGE_CREATE, no REACTION_ADD, no DM events via webhook.** There's a long-standing feature request for this but Discord hasn't shipped it.
**Source**: Discord changelog: *"You can now subscribe to a limited number of HTTP-based outgoing webhook events... Currently, 3 events are available: APPLICATION_AUTHORIZED, ENTITLEMENT_CREATE, and QUEST_USER_ENROLLMENT."*
#### So for Discord, options are:
1. **Slash commands only** → Lambda, fully serverless. User has to use `/chat <message>` instead of plain messages.
2. **Regular messages** → still needs Gateway WS connection (always-on process)
3. **Hybrid**: slash commands via Lambda + tiny always-on relay just for MESSAGE_CREATE
### ❌ WhatsApp (Baileys/personal) — Not Serverless
Baileys uses the WhatsApp Web protocol — a persistent WebSocket with stateful auth. No webhook mode exists for personal WhatsApp. Needs always-on.
**Exception**: **WhatsApp Business API (Cloud API)** is fully webhook-based:
- Meta sends message events to your HTTPS endpoint
- Lambda + API Gateway pattern is well-documented
- But: requires a Business account, phone number registration through Meta, not the same as personal WhatsApp
If you're willing to switch from Baileys (personal WA) to the Business API, this is fully serverless. Different UX though — it's a separate business number, not your personal number.
---
## Recommended Architecture: Hybrid Serverless + Tiny Relay
```
┌─────────────────────────────────────────────────────┐
│ SERVERLESS LAYER (Lambda + API Gateway) │
│ │
│ Telegram ────────→ Lambda → SQS → Lambda │
│ Slack (Events) ──→ Lambda → SQS → Lambda ─────→ InvokeAgentRuntime
│ Discord (slash) ──→ Lambda → defer → Lambda │
│ WA Business ─────→ Lambda → SQS → Lambda │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│ TINY RELAY (Fargate, only if you need it) │
│ │
│ Discord (chat msgs) ─→ tiny Node.js relay ─────→ InvokeAgentRuntime
│ WhatsApp (Baileys) ─→ tiny Node.js relay ─────→ InvokeAgentRuntime
│ │
│ ~0.25 vCPU / 0.5 GB ≈ $9/mo │
└─────────────────────────────────────────────────────┘
```
If Discord and Baileys are the only things that need persistent connections, you can minimize the relay to the smallest possible Fargate task (~$9/mo). Everything else is serverless.
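
The ~$9/mo figure can be reproduced with a short calculation. The hourly rates below are assumed us-east-1 Linux/x86 on-demand Fargate prices and should be checked against current pricing; NAT and data transfer are excluded.

```python
HOURS_PER_MONTH = 730  # average month

# Assumed on-demand rates in USD; verify against the current Fargate price list.
VCPU_PER_HOUR = 0.04048
GB_PER_HOUR = 0.004445


def fargate_monthly_cost(vcpu: float, memory_gb: float) -> float:
    """Monthly USD cost for one always-on task, excluding NAT and data transfer."""
    return (vcpu * VCPU_PER_HOUR + memory_gb * GB_PER_HOUR) * HOURS_PER_MONTH


# Smallest task size used for the relay: 0.25 vCPU / 0.5 GB
print(f"${fargate_monthly_cost(0.25, 0.5):.2f}/mo")
```

Under these assumptions the smallest task comes to roughly $9 per month, consistent with the diagram.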
---
## Discord-Specific: Slash-Command-Only Pattern
If you're OK with slash commands as the interface instead of plain messages, Discord becomes fully serverless:
```
/ask What's on my calendar today?
→ Discord POSTs to API Gateway
→ Lambda: validate sig, respond DEFERRED_CHANNEL_MESSAGE_WITH_SOURCE
→ async Lambda → InvokeAgentRuntime
→ agent processes
→ Lambda calls Discord webhook to post followup
```
The UX difference: users type `/ask <prompt>` instead of just `<prompt>`. For a personal assistant on your own server, this is tolerable. For a general-purpose assistant reading all messages, it's a significant regression.
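
The deferred flow above could be sketched as follows. The numeric values are Discord's documented interaction and response types; the `enqueue` callback and the queued fields are hypothetical. Signature validation (Ed25519 over the timestamp plus raw body) is omitted because it needs a third-party library such as PyNaCl, but it is mandatory in a real endpoint.

```python
import json

# Interaction types sent by Discord
PING, APPLICATION_COMMAND = 1, 2
# Interaction response types
PONG, DEFERRED_CHANNEL_MESSAGE_WITH_SOURCE = 1, 5


def handle_interaction(body: str, enqueue) -> dict:
    """Answer a Discord interaction within the 3-second window."""
    interaction = json.loads(body)
    if interaction["type"] == PING:
        return {"type": PONG}  # endpoint validation handshake
    if interaction["type"] == APPLICATION_COMMAND:
        options = interaction["data"].get("options", [])
        prompt = options[0]["value"] if options else ""
        enqueue({
            "channel": "discord",
            "application_id": interaction["application_id"],
            "interaction_token": interaction["token"],  # needed for the follow-up
            "prompt": prompt,
        })
        return {"type": DEFERRED_CHANNEL_MESSAGE_WITH_SOURCE}
    raise ValueError(f"unsupported interaction type: {interaction['type']}")


def followup_url(application_id: str, interaction_token: str) -> str:
    """URL to PATCH once the agent finishes, replacing the deferred placeholder."""
    return (
        "https://discord.com/api/v10/webhooks/"
        f"{application_id}/{interaction_token}/messages/@original"
    )
```

Interaction tokens are only valid for a limited time after the interaction, so the agent run plus follow-up must finish within that window.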
---
## The SQS Pattern (Decouple Ack from Processing)
Applies to all webhook channels — Telegram, Slack, Discord interactions. The webhook Lambda must respond within 3 seconds (Slack/Discord) or 10 seconds (Telegram). Agent runs can take 30-120 seconds. Solution:
```
1. Webhook Lambda:
- Validate signature
- Enqueue to SQS (< 100ms)
- Return 200/204 immediately
2. Processing Lambda (SQS trigger):
- Pull message from SQS
- InvokeAgentRuntime (async, up to 15 min Lambda timeout)
- On completion: call channel API to post reply
```
Cost at personal assistant scale: effectively $0.
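
A minimal sketch of the processing Lambda, assuming the standard SQS event shape and an event source mapping with `ReportBatchItemFailures` enabled. The `invoke_agent` and `send_reply` callables and the message fields are hypothetical stand-ins for the `InvokeAgentRuntime` call and the per-channel reply API.

```python
import json


def process_batch(event: dict, invoke_agent, send_reply) -> dict:
    """SQS-triggered Lambda body: run the agent per message, report failures."""
    failures = []
    for record in event.get("Records", []):
        try:
            msg = json.loads(record["body"])
            answer = invoke_agent(msg["chat_id"], msg["text"])
            send_reply(msg["channel"], msg["chat_id"], answer)
        except Exception:
            # Only this message returns to the queue for retry (or the DLQ).
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Reporting failures per item avoids re-running the agent for messages that already succeeded when one message in the batch fails.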
---
## Cost Comparison
| Architecture | Monthly Cost | Discord UX |
|---|---|---|
| Full Fargate relay (all channels) | ~$9-15 + NAT | Full chat |
| Full Lambda (all channels, slash commands only) | ~$0.01 | Slash commands |
| Hybrid: Lambda + tiny Fargate (Discord chat + Baileys) | ~$9 + $0.01 | Full chat |
| Lambda only + WhatsApp Business API (no Baileys) | ~$0.01 | Slash commands (WhatsApp: full chat via business number) |
---
## Verdict Per Channel
| Channel | Serverless? | Lambda Pattern | Notes |
|---|---|---|---|
| Telegram | ✅ Fully | API GW → Lambda → SQS | grammY webhook mode |
| Slack | ✅ Fully | API GW → Lambda → SQS | Bolt Events API mode |
| Discord (slash only) | ✅ With limits | API GW → Lambda (deferred) | No chat message listening |
| Discord (all messages) | ❌ | Needs Gateway WS | Feature request open, not shipped |
| WhatsApp Business API | ✅ Fully | API GW → Lambda → SQS | Needs Meta Business account |
| WhatsApp Baileys (personal) | ❌ | Needs persistent WS | Personal account protocol |
| Signal | ❌ | Needs signal-cli subprocess | No webhook mode |
| iMessage (BlueBubbles) | ❌ | Local Mac process | By definition |
---
## What This Means for the Rebuild
A lean serverless relay is possible if you:
1. Use Telegram and/or Slack as primary channels (both fully serverless)
2. Accept slash-command UX for Discord OR run a tiny Fargate relay just for Discord chat
3. Optionally switch to WhatsApp Business API instead of Baileys
The relay doesn't have to be a full-fat OpenClaw gateway — for the serverless channels it's literally just "validate signature, enqueue to SQS, return 200." ~100 lines of code per channel.
---
*Research: 2026-05-04. Sources: Discord webhook events docs, Discord changelog (only 3 webhook event types confirmed), Telegram setWebhook docs, Slack Events API docs, WhatsApp Business API webhook docs.*