agent-claw/fargate-analysis.md
daniel 0369a74ac1 Initial research: OpenClaw on AgentCore architecture
2026-05-04 08:28:52 -05:00


OpenClaw on ECS Fargate — Analysis

TL;DR

Fargate is the natural AWS home for OpenClaw. Unlike AgentCore, Fargate runs a long-lived container, and an EFS mount gives that container persistent storage, which is exactly what OpenClaw needs. Because OpenClaw already ships Docker support, this is largely a deployment/ops exercise, not a rewrite.

Why Fargate Works

| OpenClaw Need | Fargate Support |
|---|---|
| Long-lived daemon process | ECS Services run indefinitely (no idle timeout) |
| Persistent outbound WS (WhatsApp, Discord) | Outbound connections stay alive as long as the task runs |
| Persistent filesystem | EFS volume mount for all state |
| Inbound WS/HTTP (clients, webhooks) | Via ALB or NLB |
| Shell exec / PTY | Full Linux container, exec works |
| Cron / Heartbeat | Runs inside the gateway process as normal |
| Node.js ≥22 | Any Node version in your container image |
| ARM64 support | Fargate supports ARM (Graviton), which is cheaper |

Architecture

Internet / Messaging APIs
        │
        ▼
┌──────────────┐     ┌────────────────────────────────────┐
│   ALB / NLB  │────▶│  ECS Fargate Task                  │
│   (optional) │     │  ┌──────────────────────────────┐  │
└──────────────┘     │  │  OpenClaw Gateway            │  │
                     │  │  (Node.js, always-on)        │  │
                     │  │                              │  │
                     │  │  ├─ WhatsApp (Baileys WS)    │  │
                     │  │  ├─ Discord (discord.js WS)  │  │
                     │  │  ├─ Telegram (grammY)        │  │
                     │  │  ├─ Slack (Bolt)             │  │
                     │  │  ├─ Agent runtime (pi-mono)  │  │
                     │  │  ├─ Cron / Heartbeat         │  │
                     │  │  └─ WebSocket server (:18789)│  │
                     │  └──────────────────────────────┘  │
                     │           │                        │
                     │           ▼                        │
                     │  ┌────────────────┐                │
                     │  │  EFS Mount     │                │
                     │  │  /home/node/   │                │
                     │  │  ~/.openclaw/  │                │
                     │  └────────────────┘                │
                     └────────────────────────────────────┘

Deployment Details

Container Image

OpenClaw already publishes Docker images:

  • ghcr.io/openclaw/openclaw:latest (stable)
  • ghcr.io/openclaw/openclaw:main (latest main)
  • Base: node:22-bookworm
  • Can build custom with docker-setup.sh

EFS for Persistent State

Mount an EFS filesystem to persist:

  • ~/.openclaw/ (config, sessions, pairing store, secrets, WhatsApp auth)
  • ~/.openclaw/workspace/ (AGENTS.md, SOUL.md, MEMORY.md, daily notes)

EFS is ideal here because:

  • Shared access if you ever run multiple tasks (blue/green deploys)
  • Survives task restarts, deployments, Fargate spot interruptions
  • Low-latency NFS for the small files OpenClaw uses
  • Cost: ~$0.30/GB-month (Infrequent Access even cheaper)
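A minimal sketch of how the EFS mount might appear in an ECS task definition, written as plain Python dictionaries that mirror the task-definition JSON. The filesystem ID, the volume name `openclaw-state`, and the container path `/home/node/.openclaw` (assuming the container runs as the `node` user, so `~` resolves to `/home/node`) are illustrative assumptions, not values taken from the OpenClaw docs.

```python
import json


def efs_volume(file_system_id: str, volume_name: str = "openclaw-state") -> dict:
    """Build the task-level `volumes` entry that points at an EFS filesystem."""
    return {
        "name": volume_name,
        "efsVolumeConfiguration": {
            "fileSystemId": file_system_id,   # hypothetical ID supplied by the caller
            "rootDirectory": "/",
            "transitEncryption": "ENABLED",   # encrypt NFS traffic in transit
        },
    }


def efs_mount_point(volume_name: str = "openclaw-state",
                    container_path: str = "/home/node/.openclaw") -> dict:
    """Build the container-level `mountPoints` entry for that volume."""
    return {
        "sourceVolume": volume_name,
        "containerPath": container_path,      # assumed location of ~/.openclaw
        "readOnly": False,
    }


if __name__ == "__main__":
    fragment = {
        "volumes": [efs_volume("fs-0123456789abcdef0")],  # placeholder filesystem ID
        "mountPoints": [efs_mount_point()],
    }
    print(json.dumps(fragment, indent=2))
```

The `name` under `volumes` and the `sourceVolume` under `mountPoints` must match, which is why both helpers share the same default.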

Fargate Task Sizing

OpenClaw is mostly I/O-bound (waiting on LLM APIs, channel WS):

| Config | vCPU | Memory | Monthly Cost (on-demand, us-east-1) |
|---|---|---|---|
| Minimal | 0.25 | 0.5 GB | ~$9/mo |
| Recommended | 0.5 | 1 GB | ~$18/mo |
| With browser | 1 | 2 GB | ~$36/mo |
| Heavy (coding agents) | 2 | 4 GB | ~$72/mo |

ARM (Graviton) is ~20% cheaper than x86. OpenClaw's Docker image supports both.
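The monthly figures follow from per-hour vCPU and memory rates. The sketch below reproduces that arithmetic; the hourly rates and the 730-hour month are assumptions based on the us-east-1 on-demand prices at the time of writing, and should be checked against the current Fargate pricing page.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours

# Assumed us-east-1 on-demand rates in USD per hour (verify before relying on them).
RATES = {
    "x86": {"vcpu": 0.04048, "gb": 0.004445},
    "arm": {"vcpu": 0.03238, "gb": 0.00356},
}


def monthly_cost(vcpu: float, memory_gb: float, arch: str = "x86") -> float:
    """Return the estimated monthly cost of one always-on Fargate task."""
    rate = RATES[arch]
    hourly = vcpu * rate["vcpu"] + memory_gb * rate["gb"]
    return hourly * HOURS_PER_MONTH


if __name__ == "__main__":
    configs = [
        ("Minimal", 0.25, 0.5),
        ("Recommended", 0.5, 1),
        ("With browser", 1, 2),
        ("Heavy (coding agents)", 2, 4),
    ]
    for label, vcpu, mem in configs:
        x86 = monthly_cost(vcpu, mem, "x86")
        arm = monthly_cost(vcpu, mem, "arm")
        print(f"{label:<22} x86 ${x86:6.2f}   arm ${arm:6.2f}")
```

Under these assumed rates the x86 column rounds to the table values, and the ARM column comes out about 20% lower.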

Fargate Spot could save up to 70%, but spot interruptions would kill channel connections (WhatsApp re-auth is painful). Not recommended for the gateway.

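Savings Plans: AWS advertises Fargate savings of up to ~50%, but that figure is tied to the longest commitment terms. The 1-year estimate used in the cost comparison ($13 vs $18) is a saving of about 28%. For an always-on personal assistant, a commitment still makes sense.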

Networking

Outbound (channels):

  • Task in a private subnet with NAT Gateway for internet egress
  • Or: task in public subnet with public IP (simpler, slightly less secure)
  • All channel connections (WhatsApp WS, Discord WS, Telegram polling) work through NAT

Inbound (webhooks, clients):

  • ALB for HTTPS termination (Telegram webhooks, Slack Events API, WebChat)
  • NLB for raw TCP/WebSocket passthrough
  • Or: no LB at all if using only outbound channels (WhatsApp Baileys doesn't need inbound)
  • Alternative: Tailscale sidecar container for private access

Security Groups:

  • Outbound: allow all (channels need various ports/IPs)
  • Inbound: port 18789 from ALB/NLB only (or restricted IPs)
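As a sketch, the inbound rule can be written in the `IpPermissions` shape that the EC2 API accepts for security group ingress. The group IDs are placeholders. The second rule (NFS on port 2049 from the task to the EFS mount targets) is not listed above; it is included on the assumption that state lives on EFS, which needs that port open on the mount-target security group.

```python
GATEWAY_PORT = 18789  # OpenClaw gateway WebSocket/HTTP port
NFS_PORT = 2049       # standard NFS port used by EFS mount targets


def ingress_from_group(port: int, source_group_id: str, description: str) -> dict:
    """Allow TCP traffic on one port from members of another security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [
            {"GroupId": source_group_id, "Description": description}
        ],
    }


if __name__ == "__main__":
    # Placeholder IDs; real ones come from your VPC.
    lb_sg = "sg-0aaaaaaaaaaaaaaaa"
    task_sg = "sg-0bbbbbbbbbbbbbbbb"

    # Rule for the task's security group: only the load balancer may reach the gateway.
    task_ingress = [ingress_from_group(GATEWAY_PORT, lb_sg, "gateway from ALB/NLB")]
    # Rule for the EFS mount-target security group: only the task may mount.
    efs_ingress = [ingress_from_group(NFS_PORT, task_sg, "NFS from gateway task")]

    print(task_ingress)
    print(efs_ingress)
```

Referencing a source security group rather than a CIDR range keeps the rule valid when the load balancer's addresses change.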

Service Configuration

{
  "serviceName": "openclaw-gateway",
  "taskDefinition": "openclaw-gateway",
  "desiredCount": 1,
  "launchType": "FARGATE",
  "deploymentConfiguration": {
    "minimumHealthyPercent": 0,
    "maximumPercent": 100
  }
}

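Key: desiredCount: 1. OpenClaw is single-instance by design (one WhatsApp session). With minimumHealthyPercent: 0 and maximumPercent: 100, ECS stops the old task before starting its replacement, so two gateways never run at the same time. The brief downtime during a deploy is fine for a personal assistant.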
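To see why these two percentages matter, the sketch below computes how many tasks ECS may keep running during a rolling deployment. It assumes the lower bound is rounded up and the upper bound rounded down, as the ECS documentation describes. The function name is hypothetical.

```python
import math


def deployment_bounds(desired_count: int,
                      minimum_healthy_percent: int,
                      maximum_percent: int) -> tuple[int, int]:
    """Return (min_running, max_running) tasks allowed during a rolling deploy."""
    lower = math.ceil(desired_count * minimum_healthy_percent / 100)
    upper = math.floor(desired_count * maximum_percent / 100)
    return lower, upper


if __name__ == "__main__":
    # Settings from the service configuration: old task stops first, then the new one starts.
    print(deployment_bounds(1, 0, 100))
    # Common 100/200 settings: old and new tasks overlap, so two gateways
    # would briefly share one WhatsApp session.
    print(deployment_bounds(1, 100, 200))
```

With a desired count of 1, an upper bound of 1 is what prevents two gateways from overlapping. The lower bound of 0 is what allows ECS to stop the old task at all.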

Health Check

  • Container health: curl http://localhost:18789/ (Control UI responds)
  • Or: implement a lightweight /health endpoint
  • ECS will restart the task if health checks fail
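A sketch of the container-level `healthCheck` block built around that curl probe. It assumes `curl` is present in the image. The interval, timeout, retry, and start-period values are illustrative starting points rather than tuned recommendations.

```python
def container_health_check(port: int = 18789, path: str = "/") -> dict:
    """Build an ECS container healthCheck that probes the gateway over HTTP."""
    return {
        # CMD-SHELL runs the string through the container's shell;
        # curl -f exits non-zero on HTTP errors.
        "command": [
            "CMD-SHELL",
            f"curl -fsS http://localhost:{port}{path} > /dev/null || exit 1",
        ],
        "interval": 30,     # seconds between probes
        "timeout": 5,       # seconds before a probe counts as failed
        "retries": 3,       # consecutive failures before the container is unhealthy
        "startPeriod": 60,  # grace period while the gateway boots and channels connect
    }


if __name__ == "__main__":
    print(container_health_check())
    print(container_health_check(path="/health"))  # if a dedicated endpoint is added
```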

What Still Needs Work

1. WhatsApp Re-Auth on Restart

WhatsApp Baileys stores session auth in the filesystem. With EFS, this persists across task restarts. But if the task is replaced (new deployment, Fargate maintenance), the WS connection drops and needs to reconnect. Baileys handles this automatically if the auth state is intact (on EFS).

Risk: LOW if using EFS. Baileys reconnects with stored creds.

2. No macOS/iOS Integration

Fargate containers can't run macOS APIs. No iMessage, no Voice Wake, no camera.

Mitigation: Run OpenClaw nodes (iOS/macOS/Android) at home, connecting to the Fargate gateway via Tailscale or WS tunnel.

3. Browser Tool

Playwright/Chromium needs more memory (2+ GB recommended). Runs fine in containers but adds cost.

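Alternative: the OpenClaw Docker sandbox gives browser isolation, but it presumably needs access to a Docker daemon, which Fargate tasks do not provide, so it would have to run outside the Fargate task.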

4. Signal

signal-cli is a Java subprocess. Runs in the container but adds ~200MB+ to image size and memory usage.

5. Gateway Token / Auth

With a public ALB, you need gateway.auth.token or gateway.auth.password set. Store in Secrets Manager, inject via ECS task definition environment/secrets.
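A sketch of the two pieces involved: generating a strong token, and the `secrets` entry that has ECS inject it into the container from Secrets Manager. The environment variable name `OPENCLAW_GATEWAY_TOKEN` and the ARN are assumptions for illustration. Check the OpenClaw configuration docs for the variable the gateway actually reads.

```python
import secrets


def new_gateway_token(num_bytes: int = 32) -> str:
    """Generate a URL-safe random token suitable for gateway.auth.token."""
    return secrets.token_urlsafe(num_bytes)


def task_secret(env_name: str, secret_arn: str) -> dict:
    """Build one entry for the container definition's `secrets` list."""
    return {"name": env_name, "valueFrom": secret_arn}


if __name__ == "__main__":
    token = new_gateway_token()
    print(f"store this value in Secrets Manager: {token}")

    # Placeholder ARN; ECS resolves it at task start using the task execution role.
    arn = "arn:aws:secretsmanager:us-east-1:123456789012:secret:openclaw/gateway-token"
    print(task_secret("OPENCLAW_GATEWAY_TOKEN", arn))
```

Using `secrets` rather than `environment` keeps the token value out of the task definition itself.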

Cost Comparison

| Hosting Option | Monthly Cost | Effort |
|---|---|---|
| Fargate (0.5 vCPU, 1GB) | $18 + $5 EFS + ~$4 public IPv4 = **~$27/mo** (a NAT Gateway instead of a public IP runs ~$33/mo, for ~$56/mo total) | Moderate (CDK/Terraform) |
| Fargate w/ Savings Plan | $13 + $5 + ~$4 = **~$22/mo** | Same + commitment |
| EC2 t4g.micro | ~$6/mo (or free tier) | Manual ops |
| EC2 t4g.small | ~$12/mo | Manual ops |
| Lightsail (1GB) | $5/mo | Easiest |
| Hetzner VPS (CX22) | ~$4/mo | Non-AWS |

Fargate is more expensive than raw EC2/Lightsail, but you get:

  • Auto-restart on crash
  • No OS patching
  • Easy deploys (update image, ECS rolls)
  • CloudWatch integration
  • IAM task roles (for Bedrock)
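The egress choice drives the Fargate total more than task size does. The sketch below compares the two outbound options from the Networking section. The hourly rates are assumed us-east-1 prices (NAT Gateway hourly charge and the public IPv4 address charge), and NAT data-processing fees are left out. Verify both against current AWS pricing.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours

# Assumed us-east-1 hourly rates in USD; NAT per-GB processing is not included.
EGRESS_HOURLY = {
    "nat_gateway": 0.045,
    "public_ip": 0.005,
}


def monthly_total(compute: float, efs: float, egress: str) -> float:
    """Sum compute, storage, and the fixed monthly cost of the egress option."""
    return compute + efs + EGRESS_HOURLY[egress] * HOURS_PER_MONTH


if __name__ == "__main__":
    for option in EGRESS_HOURLY:
        on_demand = monthly_total(18, 5, option)     # 0.5 vCPU / 1 GB on-demand
        savings_plan = monthly_total(13, 5, option)  # same task with a Savings Plan
        print(f"{option:<12} on-demand ${on_demand:6.2f}   savings plan ${savings_plan:6.2f}")
```

Under these assumptions a private subnet behind a NAT Gateway roughly doubles the monthly bill compared with a task in a public subnet.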

Minimum Viable Fargate Deployment

  1. VPC: Default VPC or simple 2-AZ setup
  2. EFS: One filesystem, mounted at /home/node
  3. Fargate Service: 1 task, 0.5 vCPU / 1 GB, ARM64
  4. ALB (optional): Only if using webhook-based channels or remote access
  5. NAT Gateway (or a public subnet with a public IP): For outbound internet (channel connections, LLM APIs)
  6. Secrets Manager: Gateway token, API keys
  7. IAM Task Role: Bedrock access for LLM calls
  8. CloudWatch Logs: Container stdout/stderr

IaC Options

  • CDK: Best for AWS-native, type-safe infra
  • Terraform: More portable
  • Copilot CLI: Fastest to prototype (copilot init → copilot deploy)

Deploy Flow

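# Authenticate Docker to ECR (required before the push)
aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com

# Build & push image
# (when building on an x86 host for an ARM64 task, add --platform linux/arm64)
docker build -t openclaw-gateway .
docker tag openclaw-gateway:latest <account>.dkr.ecr.<region>.amazonaws.com/openclaw:latest
docker push <account>.dkr.ecr.<region>.amazonaws.com/openclaw:latest

# Update ECS service (rolls to new image)
aws ecs update-service --cluster openclaw --service openclaw-gateway --force-new-deployment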
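For repeatable deploys, the same commands can be generated from a small script. This is a sketch only: the account ID and region are placeholders, the helper names are hypothetical, and it assumes Docker is already authenticated to ECR. By default it prints the commands without running them.

```python
import subprocess


def image_uri(account: str, region: str, repo: str = "openclaw", tag: str = "latest") -> str:
    """Build the ECR image URI for the given account and region."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


def deploy_commands(account: str, region: str,
                    cluster: str = "openclaw",
                    service: str = "openclaw-gateway") -> list[list[str]]:
    """Return the build, tag, push, and redeploy commands as argv lists."""
    uri = image_uri(account, region)
    return [
        ["docker", "build", "-t", "openclaw-gateway", "."],
        ["docker", "tag", "openclaw-gateway:latest", uri],
        ["docker", "push", uri],
        ["aws", "ecs", "update-service", "--cluster", cluster,
         "--service", service, "--force-new-deployment"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run(deploy_commands("123456789012", "us-east-1"))  # placeholder account and region
```

Passing argv lists to `subprocess.run` avoids shell quoting issues, and `check=True` stops the sequence if the build or push fails, so the service is never redeployed against a stale image.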

vs AgentCore Runtime

| Dimension | Fargate | AgentCore |
|---|---|---|
| Architecture match | Long-lived daemon | Request-driven, ephemeral |
| Channel connections | Persistent WS | Killed on idle |
| State persistence | EFS | Ephemeral (need Memory service) |
| Code changes needed | Minimal (Docker already works) | Major rewrite |
| Scheduling | Built-in (gateway cron) | External (EventBridge) |
| Session isolation | Same container | Per-session microVM |
| Scaling | Manual (desiredCount) | Auto |
| Cost model | Pay for uptime | Pay for CPU usage |

Bottom line: Fargate is "run what you have on AWS." AgentCore would be "rewrite OpenClaw as a different kind of system."


Research completed 2026-03-10. Sources: OpenClaw Docker docs, AWS Fargate pricing page, EFS/Fargate integration docs.