# OpenClaw on ECS Fargate: Analysis

## TL;DR

**Fargate is the natural AWS home for OpenClaw.** Unlike AgentCore, Fargate runs a long-lived container, and with an EFS mount it gets persistent storage, which is exactly what OpenClaw needs. The existing Docker support means this is largely a deployment/ops exercise, not a rewrite.

## Why Fargate Works

| OpenClaw Need | Fargate Support |
|---|---|
| Long-lived daemon process | ✅ ECS Services run indefinitely (no idle timeout) |
| Persistent outbound WS (WhatsApp, Discord) | ✅ Outbound connections stay alive as long as the task runs |
| Persistent filesystem | ✅ EFS volume mount for all state |
| Inbound WS/HTTP (clients, webhooks) | ✅ Via ALB or NLB |
| Shell exec / PTY | ✅ Full Linux container, exec works |
| Cron / Heartbeat | ✅ Runs inside the gateway process as normal |
| Node.js ≥22 | ✅ Any Node version in your container image |
| ARM64 support | ✅ Fargate supports ARM (Graviton), which is cheaper |

## Architecture

```
Internet / Messaging APIs
        │
        ▼
┌──────────────┐     ┌────────────────────────────────────┐
│  ALB / NLB   │────▶│  ECS Fargate Task                  │
│  (optional)  │     │  ┌──────────────────────────────┐  │
└──────────────┘     │  │ OpenClaw Gateway             │  │
                     │  │ (Node.js, always-on)         │  │
                     │  │                              │  │
                     │  │ ├─ WhatsApp (Baileys WS)     │  │
                     │  │ ├─ Discord (discord.js WS)   │  │
                     │  │ ├─ Telegram (grammY)         │  │
                     │  │ ├─ Slack (Bolt)              │  │
                     │  │ ├─ Agent runtime (pi-mono)   │  │
                     │  │ ├─ Cron / Heartbeat          │  │
                     │  │ └─ WebSocket server (:18789) │  │
                     │  └──────────────────────────────┘  │
                     │          │                         │
                     │          ▼                         │
                     │  ┌────────────────┐                │
                     │  │  EFS Mount     │                │
                     │  │  /home/node/   │                │
                     │  │  ~/.openclaw/  │                │
                     │  └────────────────┘                │
                     └────────────────────────────────────┘
```

## Deployment Details

### Container Image

OpenClaw already publishes Docker images:

- `ghcr.io/openclaw/openclaw:latest` (stable)
- `ghcr.io/openclaw/openclaw:main` (latest main)
- Base: `node:22-bookworm`
- A custom image can be built with `docker-setup.sh`

### EFS for Persistent State

Mount an EFS filesystem to persist:

- `~/.openclaw/` (config, sessions, pairing store, secrets, WhatsApp auth)
- `~/.openclaw/workspace/` (AGENTS.md, SOUL.md, MEMORY.md, daily notes)

EFS is ideal here because:

- Shared access if you ever run multiple tasks (blue/green deploys)
- Survives task restarts, deployments, Fargate Spot interruptions
- Low-latency NFS for the small files OpenClaw uses
- Cost: ~$0.30/GB-month for Standard storage (Infrequent Access is cheaper still)

### Fargate Task Sizing

OpenClaw is mostly I/O-bound (waiting on LLM APIs, channel WS):

| Config | vCPU | Memory | Monthly Cost (on-demand, us-east-1) |
|---|---|---|---|
| **Minimal** | 0.25 | 0.5 GB | ~$9/mo |
| **Recommended** | 0.5 | 1 GB | ~$18/mo |
| **With browser** | 1 | 2 GB | ~$36/mo |
| **Heavy (coding agents)** | 2 | 4 GB | ~$72/mo |

ARM (Graviton) is ~20% cheaper than x86. OpenClaw's Docker image supports both.

**Fargate Spot** could save up to 70%, but Spot interruptions would kill channel connections (WhatsApp re-auth is painful). **Not recommended** for the gateway.

**Savings Plans**: a Compute Savings Plan can cut the compute rate by up to ~50%, with the deepest discounts on longer terms; a 1-year commitment saves less (the cost comparison below assumes ~$13/mo instead of ~$18/mo). For an always-on personal assistant, this makes sense.

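
The figures in the sizing table can be reproduced with a little arithmetic. The sketch below assumes us-east-1 on-demand Linux rates of $0.04048 per vCPU-hour and $0.004445 per GB-hour on x86, $0.03238 and $0.00356 on ARM64, and 730 hours per month. These rates are assumptions that the table does not state, so check them against the current Fargate pricing page. `monthly_cost`, `RATES`, and `SIZES` are illustrative names.

```python
# Rough Fargate cost estimate. All rates are ASSUMED us-east-1 on-demand
# Linux prices in USD; verify them on the Fargate pricing page before use.
HOURS_PER_MONTH = 730  # assumed average month length in hours

RATES = {
    "x86": {"vcpu_hour": 0.04048, "gb_hour": 0.004445},
    "arm64": {"vcpu_hour": 0.03238, "gb_hour": 0.00356},
}

# Task sizes from the sizing table: name -> (vCPU, memory in GB)
SIZES = {
    "Minimal": (0.25, 0.5),
    "Recommended": (0.5, 1.0),
    "With browser": (1.0, 2.0),
    "Heavy (coding agents)": (2.0, 4.0),
}


def monthly_cost(vcpu: float, memory_gb: float, arch: str = "x86") -> float:
    """Return the estimated monthly compute cost of one always-on task."""
    rate = RATES[arch]
    hourly = vcpu * rate["vcpu_hour"] + memory_gb * rate["gb_hour"]
    return hourly * HOURS_PER_MONTH


for name, (vcpu, memory_gb) in SIZES.items():
    x86 = monthly_cost(vcpu, memory_gb, "x86")
    arm = monthly_cost(vcpu, memory_gb, "arm64")
    print(f"{name}: x86 ~${x86:.2f}/mo, arm64 ~${arm:.2f}/mo")
```

The estimate covers compute only; EFS storage, load balancers, and NAT or public-IP charges are billed separately.
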
### Networking

**Outbound (channels)**:

- Task in a private subnet with NAT Gateway for internet egress
- Or: task in public subnet with public IP (simpler, slightly less secure)
- All channel connections (WhatsApp WS, Discord WS, Telegram polling) work through NAT

**Inbound (webhooks, clients)**:

- ALB for HTTPS termination (Telegram webhooks, Slack Events API, WebChat)
- NLB for raw TCP/WebSocket passthrough
- Or: no LB at all if using only outbound channels (WhatsApp Baileys doesn't need inbound)
- Alternative: Tailscale sidecar container for private access

**Security Groups**:

- Outbound: allow all (channels need various ports/IPs)
- Inbound: port 18789 from ALB/NLB only (or restricted IPs)

### Service Configuration

```json
{
  "serviceName": "openclaw-gateway",
  "taskDefinition": "openclaw-gateway",
  "desiredCount": 1,
  "launchType": "FARGATE",
  "deploymentConfiguration": {
    "minimumHealthyPercent": 0,
    "maximumPercent": 100
  }
}
```

Key: `desiredCount: 1`, because OpenClaw is single-instance by design (one WhatsApp session). With `minimumHealthyPercent: 0` and `maximumPercent: 100`, ECS stops the old task before starting the new one during a deploy, so two gateways never run at once; the brief downtime is fine for a personal assistant.

### Health Check

- Container health: `curl -f http://localhost:18789/` (the Control UI responds; `-f` makes curl exit non-zero on HTTP errors)
- Or: implement a lightweight `/health` endpoint
- ECS replaces the task if health checks fail

## What Still Needs Work

### 1. WhatsApp Re-Auth on Restart

WhatsApp Baileys stores session auth in the filesystem. With EFS, this persists across task restarts. But if the task is replaced (new deployment, Fargate maintenance), the WS connection drops and needs to reconnect. Baileys handles this automatically if the auth state is intact (on EFS).

**Risk**: LOW if using EFS. Baileys reconnects with stored creds.

### 2. No macOS/iOS Integration

Fargate containers can't run macOS APIs. No iMessage, no Voice Wake, no camera.

**Mitigation**: Run OpenClaw nodes (iOS/macOS/Android) at home, connecting to the Fargate gateway via Tailscale or WS tunnel.

### 3. Browser Tool

Playwright/Chromium needs more memory (2+ GB recommended). It runs fine in containers but adds cost.

**Alternative**: Use the OpenClaw Docker sandbox for browser isolation.

### 4. Signal

`signal-cli` is a Java subprocess. It runs in the container but adds ~200MB+ to image size and memory usage.

### 5. Gateway Token / Auth

With a public ALB, you need `gateway.auth.token` or `gateway.auth.password` set. Store the value in Secrets Manager and inject it through the `secrets` field of the ECS task definition rather than plain `environment`.

## Cost Comparison

| Hosting Option | Monthly Cost | Effort |
|---|---|---|
| **Fargate (0.5 vCPU, 1GB)** | ~$18 + $5 EFS + $3 NAT = **~$26/mo** | Moderate (CDK/Terraform) |
| **Fargate w/ Savings Plan** | ~$13 + $5 + $3 = **~$21/mo** | Same + commitment |
| **EC2 t4g.micro** | ~$6/mo (or free tier) | Manual ops |
| **EC2 t4g.small** | ~$12/mo | Manual ops |
| **Lightsail (1GB)** | **$5/mo** | Easiest |
| **Hetzner VPS (CX22)** | **~$4/mo** | Non-AWS |

The ~$3 NAT line is too low for a dedicated NAT Gateway, which bills about $0.045/hour (roughly $33/mo in us-east-1) before data-processing charges. It is realistic only if the gateway is shared with other workloads or the task runs in a public subnet with a public IP instead.

Fargate is more expensive than raw EC2/Lightsail, but you get:

- Auto-restart on crash
- No OS patching
- Easy deploys (update image, ECS rolls)
- CloudWatch integration
- IAM task roles (for Bedrock)

## Recommended Setup

### Minimum Viable Fargate Deployment

1. **VPC**: Default VPC or simple 2-AZ setup
2. **EFS**: One filesystem, mounted at `/home/node`
3. **Fargate Service**: 1 task, 0.5 vCPU / 1 GB, ARM64
4. **ALB** (optional): Only if using webhook-based channels or remote access
5. **NAT Gateway**: For outbound internet (channel connections, LLM APIs)
6. **Secrets Manager**: Gateway token, API keys
7. **IAM Task Role**: Bedrock access for LLM calls
8. **CloudWatch Logs**: Container stdout/stderr

### IaC Options

- **CDK**: Best for AWS-native, type-safe infra
- **Terraform**: More portable
- **Copilot CLI**: Fastest to prototype (`copilot init` → `copilot deploy`)

### Deploy Flow

```bash
# Assumes AWS_ACCOUNT_ID and AWS_REGION are set in the environment
REGISTRY="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"

# Authenticate Docker to ECR
aws ecr get-login-password --region "${AWS_REGION}" \
  | docker login --username AWS --password-stdin "${REGISTRY}"

# Build & push image
docker build -t openclaw-gateway .
docker tag openclaw-gateway:latest "${REGISTRY}/openclaw:latest"
docker push "${REGISTRY}/openclaw:latest"

# Update ECS service (rolls to new image)
aws ecs update-service --cluster openclaw --service openclaw-gateway --force-new-deployment
```

## vs AgentCore Runtime

| Dimension | Fargate | AgentCore |
|---|---|---|
| Architecture match | ✅ Long-lived daemon | ❌ Request-driven, ephemeral |
| Channel connections | ✅ Persistent WS | ❌ Killed on idle |
| State persistence | ✅ EFS | ❌ Ephemeral (need Memory service) |
| Code changes needed | Minimal (Docker already works) | Major rewrite |
| Scheduling | ✅ Built-in (gateway cron) | ❌ External (EventBridge) |
| Session isolation | Same container | Per-session microVM |
| Scaling | Manual (desiredCount) | Auto |
| Cost model | Pay for uptime | Pay for CPU usage |

**Bottom line**: Fargate is "run what you have on AWS." AgentCore would be "rewrite OpenClaw as a different kind of system."

---

*Research completed 2026-03-10. Sources: OpenClaw Docker docs, AWS Fargate pricing page, EFS/Fargate integration docs.*