Initial research: OpenClaw on AgentCore architecture

- Architecture comparison (OpenClaw daemon vs AgentCore serverless)
- Component compatibility analysis
- Fargate analysis
- AgentCore rebuild plan (Telegram, zero always-on compute)
- Memory strategy: AgentCore Memory + factbase as structured KB
- Serverless relay patterns per channel
- All open questions resolved
- OpenClaw feature delta March→May 2026
- Build phases and cost estimates
2026-05-04 08:28:52 -05:00
parent 4afa16a9cd
commit 0369a74ac1
13 changed files with 1876 additions and 1 deletions
--- a/feasibility-verdict.md
+++ b/feasibility-verdict.md
@@ -0,0 +1,109 @@
+# Feasibility Verdict: OpenClaw on AgentCore
+
+## TL;DR
+
+**Can OpenClaw run on AgentCore Runtime?** Not as-is. The architectures are fundamentally different. OpenClaw is a **long-lived daemon** with persistent connections; AgentCore is a **serverless, request-driven container** with ephemeral sessions.
+
+You could run a **subset** of OpenClaw on AgentCore — specifically, the agent reasoning/tool-calling core — but you'd need to completely redesign the messaging layer, state management, and scheduling. At that point, you're essentially building a new system that borrows OpenClaw's agent logic.
+
+## The Core Tension
+
+| Dimension | OpenClaw | AgentCore |
+|---|---|---|
+| Process model | Always-on daemon | Request-invoked container |
+| Session lifetime | Indefinite | 8 hours max; terminated after 15 minutes idle |
+| State | Local filesystem | Ephemeral (use AgentCore Memory) |
+| Connections | Persistent WebSocket connections to channels | No persistent outbound connections |
+| Scheduling | Internal cron/heartbeat | None (use EventBridge) |
+| User model | Single user, single host | Multi-user, multi-session |
+
+## What Would Actually Work
+
+### Realistic Architecture: "Split Gateway"
+
+```
+┌─────────────────────────────┐
+│  Channel Relay              │  ← ECS Fargate (always-on)
+│  WhatsApp, Discord, Slack,  │     Maintains persistent channel connections
+│  Telegram, Signal, etc.     │     Translates messages → InvokeAgentRuntime
+└──────────────┬──────────────┘
+               │ InvokeAgentRuntime
+               ▼
+┌─────────────────────────────┐
+│  AgentCore Runtime          │  ← Serverless container (ARM64)
+│  Pi-mono agent loop         │     Handles reasoning, tool calls, LLM calls
+│  /invocations + /ping       │     Ephemeral per-session
+└──────────────┬──────────────┘
+               │
+    ┌──────────┼──────────┐
+    ▼          ▼          ▼
+┌────────┐ ┌──────────┐ ┌──────────────┐
+│ S3     │ │ DynamoDB │ │ AgentCore    │
+│ State  │ │ Sessions │ │ Memory       │
+└────────┘ └──────────┘ └──────────────┘
+```
+
+**Channel Relay** (must be always-on):
+- Runs on ECS Fargate, EC2, or similar
+- Extracted from OpenClaw's gateway — just the channel plugins + message routing
+- On inbound message → calls `InvokeAgentRuntime` with session ID
+- On agent response → routes back to the correct channel
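The relay's inbound path can be sketched roughly as follows. This is a minimal illustration, not OpenClaw code: `session_id_for` and `relay_inbound` are hypothetical names, the payload fields (`channel`, `chat_id`, `prompt`, `reply`) are invented for the example, and `client` is assumed to behave like the boto3 `bedrock-agentcore` client, whose `invoke_agent_runtime` call is assumed to return the agent's body as a readable stream under `response`. The session ID is hashed to 64 hex characters on the assumption that AgentCore expects fairly long session IDs (33+ characters).

```python
import hashlib
import json


def session_id_for(channel: str, chat_id: str) -> str:
    """Derive a stable session ID so every message in one chat reuses one session."""
    # A SHA-256 hex digest is always 64 characters long, which satisfies the
    # assumed minimum session-ID length.
    return hashlib.sha256(f"{channel}:{chat_id}".encode("utf-8")).hexdigest()


def relay_inbound(client, agent_runtime_arn: str, channel: str, chat_id: str, text: str) -> str:
    """Forward one inbound channel message to the agent and return its reply text."""
    # Hypothetical payload shape; the real agent container defines its own schema.
    payload = json.dumps({"channel": channel, "chat_id": chat_id, "prompt": text})
    response = client.invoke_agent_runtime(
        agentRuntimeArn=agent_runtime_arn,
        runtimeSessionId=session_id_for(channel, chat_id),
        payload=payload.encode("utf-8"),
    )
    # Assumption: the response body is a readable stream containing JSON.
    body = json.loads(response["response"].read())
    return body["reply"]
```

Passing the client in as a parameter keeps the relay logic testable without AWS credentials; in a real relay it would be created once at startup and shared across channel plugins.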
+
+**Agent Container** (runs on AgentCore):
+- Pi-mono agent runtime wrapped in HTTP server
+- Implements `/invocations`, `/ping`, optionally `/ws`
+- Loads workspace files from S3 on session start
+- Writes session state to DynamoDB/AgentCore Memory
+- Makes LLM calls, web searches, etc.
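A minimal sketch of the HTTP wrapper, using only the Python standard library. The `run_agent` function is a hypothetical stand-in for the Pi-mono agent loop, the request and response fields (`prompt`, `reply`) are invented for illustration, and the `{"status": "Healthy"}` ping body and port 8080 are assumptions about the AgentCore container contract that should be checked against the AWS docs.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def run_agent(prompt: str) -> str:
    """Hypothetical stand-in for the real agent loop (LLM calls, tools, etc.)."""
    return f"echo: {prompt}"


def route(method: str, path: str, body: bytes) -> tuple[int, dict]:
    """Map a request to (status code, JSON body); kept pure so it is easy to test."""
    if method == "GET" and path == "/ping":
        return 200, {"status": "Healthy"}  # assumed health-check response shape
    if method == "POST" and path == "/invocations":
        request = json.loads(body or b"{}")
        return 200, {"reply": run_agent(request.get("prompt", ""))}
    return 404, {"error": "not found"}


class AgentHandler(BaseHTTPRequestHandler):
    def _respond(self, body: bytes = b"") -> None:
        status, payload = route(self.command, self.path, body)
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self) -> None:
        self._respond()

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        self._respond(self.rfile.read(length))


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Create the server; call .serve_forever() on the result to start it."""
    return ThreadingHTTPServer(("0.0.0.0", port), AgentHandler)
```

Starting it is a matter of calling `make_server().serve_forever()` in the container entrypoint. The optional `/ws` endpoint is left out here because the standard library has no WebSocket support.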
+
+**External Scheduling** (EventBridge):
+- Heartbeat: EventBridge rule every 30m → Lambda → InvokeAgentRuntime
+- Cron: dynamic EventBridge rules managed via an API
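The heartbeat Lambda could look something like the sketch below, assuming an EventBridge rule with a schedule expression such as `rate(30 minutes)`. The names `build_heartbeat` and `make_handler` and the payload shape are hypothetical; `invoke` is any callable that forwards bytes to the agent (for example, a thin wrapper around `InvokeAgentRuntime`). The only event fields relied on are `detail-type` and `time`, which EventBridge sets on scheduled events.

```python
import json


def build_heartbeat(event: dict) -> dict:
    """Turn an EventBridge scheduled event into a heartbeat payload for the agent."""
    if event.get("detail-type") != "Scheduled Event":
        raise ValueError("not a scheduled EventBridge event")
    # Hypothetical payload shape; the agent container decides what a heartbeat means.
    return {"type": "heartbeat", "fired_at": event["time"]}


def make_handler(invoke):
    """Build a Lambda handler around `invoke`, a callable that sends bytes to the agent."""

    def handler(event, context):
        payload = build_heartbeat(event)
        invoke(json.dumps(payload).encode("utf-8"))
        return {"ok": True, "fired_at": payload["fired_at"]}

    return handler
```

Injecting `invoke` keeps the handler free of AWS client setup, so the same function can be exercised locally with a list's `append` method standing in for the real call.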
+
+## Pros of This Approach
+
+- **No infrastructure management** for the agent runtime (scaling, patching, etc.)
+- **Cost-efficient**: CPU is billed only while the agent is actively computing, so time spent waiting on I/O (for example, an LLM response) adds no CPU charge
+- **Security isolation**: each session runs in its own microVM
+- **Built-in auth**: SigV4/OAuth for agent endpoints
+- **Built-in observability**: agent tracing, tool invocations
+- **Bedrock-native**: direct IAM-role access to Bedrock models
+
+## Cons / Risks
+
+- **Massive refactoring effort**: this is not a "deploy and go" situation
+- **Channel relay still needs always-on infra**: you don't eliminate ops completely
+- **Session continuity is harder**: the 15-minute idle timeout makes sessions short-lived, so multi-turn conversations need their state persisted outside the container
+- **Cold start latency**: each new session has to spin up a microVM
+- **Loss of local features**: no macOS integrations, no device nodes, no browser extension relay
+- **WhatsApp is the hardest**: Baileys requires a persistent WebSocket plus auth state; this alone might need a dedicated EC2 instance
+- **Agent workspace semantics change**: MEMORY.md, daily notes, etc. need to be loaded from S3 and written back; the "personal local assistant" feel is lost
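To make the session-continuity risk concrete, here is a small sketch of the check a relay might run before each invocation to decide whether the previous AgentCore session has probably been torn down and conversation state must be reloaded. The function name `needs_rehydration` is hypothetical, and the limits are the 15-minute idle and 8-hour maximum figures from the table above; treating the boundary as inclusive is an arbitrary choice for the example.

```python
from datetime import datetime, timedelta

IDLE_TIMEOUT = timedelta(minutes=15)  # idle limit quoted in this document
MAX_LIFETIME = timedelta(hours=8)     # maximum session lifetime quoted in this document


def needs_rehydration(started_at: datetime, last_active_at: datetime, now: datetime) -> bool:
    """Return True if the session has likely expired and state must be reloaded."""
    idle_for = now - last_active_at
    age = now - started_at
    return idle_for >= IDLE_TIMEOUT or age >= MAX_LIFETIME
```

In this sketch the relay would track `started_at` and `last_active_at` per chat (for example, in the DynamoDB sessions table) and, when the check returns True, start a fresh session that loads workspace files and history from S3 before handling the message.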
+
+## Alternative: Don't Do This
+
+Honestly? OpenClaw's design philosophy is **personal, local-first, always-on**. AgentCore's philosophy is **serverless, multi-user, request-driven**. These are almost diametrically opposed.
+
+### Better alternatives for "OpenClaw on AWS":
+1. **EC2/ECS + Docker** — Run the full OpenClaw gateway as a container on EC2 or ECS. This is what the existing Docker support does. You get the full feature set, persistent connections, local filesystem. Just add an EBS volume for state.
+2. **Lightsail** — Cheap VPS that runs the gateway exactly as designed.
+3. **ECS Fargate** — Run the gateway as a Fargate task with EFS for persistence. More serverless-y without the architecture mismatch.
+
+### Where AgentCore _would_ make sense for OpenClaw:
+- **Sub-agent offloading** — Run expensive coding agent tasks on AgentCore (Codex-style), keeping the gateway local but offloading heavy compute.
+- **Tool hosting** — Host MCP tool servers on AgentCore (e.g., browser tool, code interpreter) and connect them to a local OpenClaw gateway.
+- **Multi-user deployment** — If you wanted to offer OpenClaw-as-a-service to multiple users, AgentCore's per-session isolation would be valuable. But you'd still need the channel relay.
+
+## Verdict
+
+| Question | Answer |
+|---|---|
+| Can OpenClaw run on AgentCore? | Not without fundamental redesign |
+| Is the agent core (reasoning loop) compatible? | Yes, with an HTTP wrapper |
+| Can channel connections run on AgentCore? | No — need separate always-on infra |
+| Is the effort worth it? | Probably not for personal use. Maybe for multi-user SaaS. |
+| Best AWS hosting for OpenClaw today? | EC2 or ECS with Docker |
+| Where does AgentCore add value? | Sub-agent compute, MCP tool hosting |
+
+---
+
+*Research completed 2026-03-10. Sources: OpenClaw docs (docs.openclaw.ai), OpenClaw GitHub (github.com/openclaw/openclaw), AWS AgentCore docs (docs.aws.amazon.com/bedrock-agentcore), installed OpenClaw v2026.3.2 source inspection.*