diff --git a/README.md b/README.md index 1433d322..718a991a 100644 --- a/README.md +++ b/README.md @@ -42,7 +42,7 @@ This demo highlights the core developer experience and "Agentic Infrastructure" 2. **State Persistence:** Persistent working memory (volatile RAM) and filesystem state preserved perfectly across hibernation cycles via full-state snapshots. 3. **Agent Swarm Multiplexing:** Demonstrates 30x+ oversubscription by "juggling" a large registry of stateful actors onto a small pool of shared physical pods. -To reproduce this demo in your own cluster, please refer to the detailed walkthroughs in the **[Counter Demo](demos/counter/README.md)** and **[Secret Agent Demo](demos/agent-secret/README.md)**. +To reproduce this demo in your own cluster, please refer to the detailed walkthroughs in the **[Counter Demo](demos/counter/README.md)**, **[Secret Agent Demo](demos/agent-secret/README.md)**, and **[OpenClaw Multiplexing](demos/openclaw/README.md)**. For more videos and walkthroughs, visit our YouTube channel: **[agent-substrate](https://www.youtube.com/channel/UCN9PPqlTtVxlcpbQ-NWpfZQ)**. @@ -53,6 +53,7 @@ Agent Substrate is designed to be **framework and agent harness agnostic**. Beca * **Agent Development Kit (ADK):** Native support for ADK-compatible session identity and persistent working memory. * **LangChain:** Ideal execution environment for long-running, stateful LangChain agents and sandboxed tool-calling. * **Claude Code & CodeX:** Support for high-density, stateful coding environments that preserve terminal and filesystem state across sessions. +* **OpenClaw (Google Claw):** Optimized for multiplexing stateful TypeScript agents with persistent in-memory reasoning and conversation history. * **Model Context Protocol (MCP):** Deploy secure, sandboxed MCP servers as Substrate Actors to provide durable tools for any LLM. ## Ecosystem & Examples @@ -188,6 +189,7 @@ We provide several sample applications demonstrating Agent Substrate's capabilit 2. 
**[Sandbox Demo (Antigravity)](demos/sandbox/README.md)**: A secure, sandboxed execution environment (running Alpine Linux) that allows arbitrary shell execution while preserving filesystem state across sessions. 3. **[Claude Code Multiplex](demos/claude-code-multiplex/README.md)**: Demonstrates oversubscribing physical hardware by multiplexing multiple Claude Code agents onto a limited pool of workers. 4. **[Secret Agent](demos/agent-secret/README.md)**: Highlights Substrate's "Zero-Idle" self-suspension and re-animation of volatile process memory. +5. **[OpenClaw Multiplexing](demos/openclaw/README.md)**: Showcases 1.5x hardware oversubscription using Google Claw agents, demonstrating stateful rehydration and rapid agent rotation across physical pods. ### Documentation & Guides * [API Configuration Guide](docs/api-guide.md): Detailed reference for configuring WorkerPools, ActorTemplates, Secrets, and Volumes. diff --git a/demos/openclaw/DEMO_SCRIPT.md b/demos/openclaw/DEMO_SCRIPT.md new file mode 100644 index 00000000..d4764364 --- /dev/null +++ b/demos/openclaw/DEMO_SCRIPT.md @@ -0,0 +1,51 @@ +# OpenClaw on Substrate: "Liquid Hardware" Demo Script + +This document provides a structured narrative for recording the OpenClaw-on-Substrate PoC demonstration. + +## **Metadata** +* **Environment**: `http://` +* **Logical Identities**: Claw-Luna (Blue 🟦), Claw-Mars (Pink 🟪), Claw-Nova (Gold 🟨) +* **Physical Constraint**: 2 Worker Pods (Replica Pool) +* **Core Value**: 1.5x Hardware Oversubscription without state loss. + +--- + +## **Phase 1: The Static Constraint (Setup)** +* **Action**: Open the dashboard. Ensure history is clear (Click **Reset Dashboard** if needed). +* **Narrative**: + > "Welcome to the OpenClaw Substrate PoC. Today we're demonstrating the next evolution of AI infrastructure: **Liquid Hardware**. + > + > Look at the bottom of the screen. We have **three logical agents**—Luna, Mars, and Nova—but we're only paying for **two physical worker pods**. 
In a traditional cloud setup, one agent would be permanently offline or require a slow cold-boot. With Substrate, hardware flows where the tasks are." + +## **Phase 2: Individual Process Rehydration** +* **Action**: Click **Give a task**. Wait for the agent to transition to `RESUMING`. +* **Narrative**: + > "I'll assign a task to Claw-Luna. Watch the 'Actors' panel. Luna is currently **RESUMING**. + > + > Substrate is reaching into Google Cloud Storage, pulling Luna's exact memory snapshot, and rehydrating it into one of our two worker pods. This isn't just starting a container—it's restoring a live process state in about 5 seconds." +* **Action**: Wait for task to move to `RUNNING`. Point to the **Live Logs**. + > "Now Luna is **RUNNING**. You can see the live telemetry in the pod log. Once the task completes, Substrate will automatically checkpoint the state and free the pod for the next agent." + +## **Phase 3: High-Concurrency Contention (The Pulse)** +* **Action**: Click **Pulse (10 Tasks)**. +* **Narrative**: + > "Now, let's put the system under pressure. I'm assigning 10 parallel tasks across all three agents. + > + > Watch the dashboard come alive. With 3 agents fighting for 2 slots, Substrate is performing a high-speed multiplex. Luna, Mars, and Nova are constantly swapping positions. When one agent finishes a short 3-second job, Substrate immediately 'hot-swaps' it for a queued agent." +* **Visual Cue**: Point out the **`SUSPENDING` (Orange)** and **`RESUMING` (Yellow)** badges flashing as the rotation happens. + +## **Phase 4: Latency & Cost Efficiency** +* **Action**: Scroll to the **Approximate Cost** card. +* **Narrative**: + > "This fluidity is made possible by our snapshot performance. We're currently seeing a **1.2-second suspend latency**. While resume is currently 5 seconds from a cold GCS fetch, moving this to a local SSD cache would bring us to sub-second rehydration. 
+ > + > The business impact is clear: We are hosting **1.5x more agents** on the same physical hardware, reducing our simulated OpenClaw infrastructure costs by 33% while maintaining 100% state persistence. + > + > This is Liquid Hardware. This is OpenClaw on Substrate." + +--- + +## **Recording Tips** +1. **Cursor Movement**: Use slow, deliberate mouse movements to highlight the panels you are discussing. +2. **Timing**: Don't rush Phase 2. Let the viewer see the `RESUMING` -> `RUNNING` transition clearly before hitting the Pulse. +3. **The Reveal**: Ensure the **Live Logs** are visible during the Pulse so the viewer sees the agent ownership (telemetry) switching on the same pod name. diff --git a/demos/openclaw/Dockerfile b/demos/openclaw/Dockerfile new file mode 100644 index 00000000..d88a2766 --- /dev/null +++ b/demos/openclaw/Dockerfile @@ -0,0 +1,73 @@ +# Google Claw on Agent Substrate PoC +# Portable Dockerfile for OSS Substrate Migration + +# Stage 1: Build the standalone bundles +FROM node:22-slim AS builder + +WORKDIR /app + +# Install build dependencies +RUN apt-get update && apt-get install -y --no-install-recommends \ + ca-certificates \ + curl \ + && rm -rf /var/lib/apt/lists/* + +# Copy standalone package files +COPY package.json ./ +# Use npm install for simplicity and portability in the standalone package +RUN npm install + +# Copy source code +COPY src/ ./src/ + +# Build zero-dependency bundles +RUN ./node_modules/.bin/esbuild src/agent.ts \ + --bundle \ + --platform=node \ + --target=node22 \ + --outfile=dist/agent.js \ + --external:node:* + +RUN ./node_modules/.bin/esbuild src/demo-ui.ts \ + --bundle \ + --platform=node \ + --target=node22 \ + --outfile=dist/demo-ui.js \ + --external:node:* + +# Stage 2: Final Production Image +FROM node:22-slim AS runner + +WORKDIR /app + +# Copy the entire context to check for local binaries +COPY . . 
+ +# Install runtime dependencies (tini for signal forwarding, kubectl for dashboard sync) +RUN apt-get update && apt-get install -y --no-install-recommends \ + ca-certificates \ + curl \ + tini \ + && curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" \ + && chmod +x kubectl \ + && mv kubectl /usr/local/bin/ \ + && rm -rf /var/lib/apt/lists/* + +# Copy built assets +COPY --from=builder /app/dist/ ./dist/ +# Copy kubectl-ate binary if it exists in context, otherwise download it +# This makes the Dockerfile portable across environments +RUN if [ -f "./kubectl-ate" ]; then \ + mv ./kubectl-ate /usr/local/bin/kubectl-ate; \ + else \ + curl -L -o /usr/local/bin/kubectl-ate https://github.com/agent-substrate/substrate/releases/latest/download/kubectl-ate-linux-amd64; \ + fi && chmod +x /usr/local/bin/kubectl-ate + +# Create a /pause hook for Substrate rehydration +RUN echo '#!/bin/sh' > /pause && \ + echo 'echo "[pause] Starting Google Claw agent..."' >> /pause && \ + echo 'exec /usr/bin/tini -- /usr/local/bin/node /app/dist/agent.js' >> /pause && \ + chmod +x /pause + +# Default entrypoint (can be overridden by deployment to run demo-ui) +ENTRYPOINT ["/usr/bin/tini", "--", "node", "dist/agent.js"] diff --git a/demos/openclaw/README.md b/demos/openclaw/README.md new file mode 100644 index 00000000..fedb0cdc --- /dev/null +++ b/demos/openclaw/README.md @@ -0,0 +1,213 @@ +# OpenClaw on Agent Substrate: Multiplexing Demo + +A high-density demonstration of three stateful **OpenClaw** agents (`Claw-Luna`, `Claw-Mars`, `Claw-Nova`) sharing two physical **Agent Substrate** worker pods. This PoC showcases **Liquid Hardware**: Substrate automatically suspends idle agents and rehydrates them on-demand, allowing a cluster to host significantly more logical agents than physical compute slots. 
+ +**Live Demo URL:** [http://136.119.224.22](http://136.119.224.22) (Internal/GCP) + +> [!NOTE] +> This demo intentionally provisions **two pods for three agents** to force hardware contention. Substrate manages the state teleportation (checkpointing to GCS), ensuring that process memory (task counters) survives migration between physical pods. + +## System Information + +- **Google Claw Version**: `2026.3.14` +- **Substrate Mode**: Multi-Actor Multiplexing (1.5x oversubscription) +- **Runtime**: Node.js 22 (Debian Slim) +- **Isolation**: gVisor (runsc) + +## What this shows + +- **High-Density Multiplexing**: Three logical OpenClaw identities running on only two physical pods (1.5x oversubscription). +- **State Persistence**: A `taskCounter` maintained in the Node.js process memory survives multiple suspend/resume cycles. +- **Dynamic Rotation**: Agents finish work at different times (3-6s), forcing Substrate to constantly rotate pod ownership. +- **Visual Identity Tracking**: Color-coded agents (Blue/Pink/Gold) and live log tailing to make infrastructure sharing intuitively obvious. + +## Audience + +This guide is intended for engineers exploring Agent Substrate for hosting large-scale agentic workloads where cost-efficiency and stateful rehydration are critical. + +## Prerequisites + +- **Agent Substrate Cluster**: A Kubernetes cluster with Substrate installed. +- **Docker**: For building and pushing the unified actor/UI image. +- **GCS Bucket**: Configured for Substrate state snapshots (e.g., `gs://snapshot-substrate-gke-ai-eco-dev/`). +- **kubectl & kubectl-ate**: The Substrate CLI tool for managing logical actors. + +## Components + +| Path | Purpose | +|---|---| +| `substrate/src/agent.ts` | The workload: A Hono server with persistent memory state. | +| `substrate/src/demo-ui.ts` | The dashboard: A Node.js backend providing live logs, task queueing, and visual tracking. 
| +| `substrate/manifests/worker-pool.yaml` | The physical pool configuration (2 replicas). | +| `substrate/manifests/actor-template.yaml` | The logical identity definition (snapshots, container spec). | +| `substrate/manifests/valkey-init.yaml` | Utility Job for re-initializing the Valkey metadata store. | +| `substrate/Dockerfile` | Unified OCI image containing both the actor workload and the dashboard UI. | +| `substrate/DEMO_SCRIPT.md` | The narrative script for the demonstration recording. | + +## How to Run + +### 1. Provision Hardware +Scale the physical `WorkerPool` to the desired replica count (2 for this demo): +```bash +kubectl apply -f substrate/manifests/worker-pool.yaml +``` + +### 2. Deploy logical Agents +Create the three "fun-named" actors using the Substrate CLI. +```bash + + + +``` + +### 3. Launch the Dashboard +The dashboard runs as a standard Kubernetes Deployment with a LoadBalancer. +```bash +kubectl apply -f substrate/manifests/demo-ui.yaml +``` + +## Drive the Demo + +Open the dashboard and use the following interaction patterns: + +- **Pulse (10 Tasks)**: The primary demo button. It parallelizes 10 tasks across the registry. Watch the **colored icons** rapidly cycle through the 2 worker slots. +- **Live Logs**: Observe the pod log cards. You will see different Agent IDs appearing in the **same log stream**, proving that physical hardware is being recycled in real-time. + +## Integrating a Real LLM API + +Integrating an LLM into an OpenClaw logical actor is straightforward. Because Substrate persists the **entire process memory**, any in-memory conversation history or KV-cache will survive multiple suspend/resume cycles without requiring an external database. + +### 1. Add the LLM SDK +Add your preferred SDK (e.g., OpenAI or Anthropic) to the `substrate/package.json`: +```bash +npm install openai +``` + +### 2. 
Update the Actor Logic +Modify `substrate/src/agent.ts` to initialize the client and maintain a local chat history: +```typescript +import OpenAI from "openai"; + +const openai = new OpenAI({ apiKey: process.env.LLM_API_KEY }); +let history: any[] = []; // This array will survive Substrate snapshots! + +app.post("/v1/chat", async (c) => { + const { message } = await c.req.json(); + history.push({ role: "user", content: message }); + + const response = await openai.chat.completions.create({ + model: "gpt-4", + messages: history, + }); + + const aiMessage = response.choices[0].message; + history.push(aiMessage); + return c.json(aiMessage); +}); +``` + +### 3. Provide the API Key +Add the credential to the environment variables in `substrate/manifests/actor-template.yaml`: +```yaml +spec: + containers: + - name: agent + env: + - name: LLM_API_KEY + value: "sk-proj-..." # Or use a Kubernetes Secret reference +``` + +### 4. Rebuild & Deploy +Rebuild the image and Substrate will automatically pick up the new logic for any resumed actors. + +## Teardown + +```bash +kubectl delete -f substrate/manifests/demo-ui.yaml + +kubectl delete -f substrate/manifests/worker-pool.yaml +``` + +## Nuances & Workarounds + +This demo handles several environment-specific challenges to ensure stable multiplexing: + +- **Debian-Based Runtime**: Both the builder and runner use `node:22-slim` to ensure `glibc` parity during gVisor checkpointing. Alpine/Musl images are avoided to prevent snapshot corruption. +- **Tini Wrapper**: The `/pause` hook and the Node.js process are wrapped in `tini` to ensure signals are forwarded correctly, preventing zombie processes during the gVisor freeze cycle. +- **Valkey Recovery**: In the event of a "split-brain" cluster state (where Substrate loses track of free workers), the `valkey-init.yaml` Job is provided to reset the metadata hash slots. 
+- **Hermetic Bundling**: `esbuild` is used to create zero-dependency binaries for the actor and UI, ensuring that rehydration doesn't fail due to missing `node_modules` in the restored process tree. + +## Project Structure + +This folder is a standalone Node.js package, decoupled from the main Google Claw repository for easy migration to the [OSS Substrate repository](https://github.com/agent-substrate/substrate). + +```text +substrate/ +├── src/ # Standalone Hono source code (Actor & UI) +├── manifests/ # Kubernetes & Agent Substrate YAMLs +├── scripts/ # Environment-agnostic deployment utilities +├── demo/OpenClaw/ # High-fidelity recording script +├── Dockerfile # Self-contained build definition +├── package.json # Decoupled dependencies (Hono, esbuild) +├── tsconfig.json # Independent TypeScript configuration +└── README.md # Integrated documentation & System Info +``` +## The Claw Agent Pattern + +The core of this demo is the `ClawAgent` class found in `workload/agent.ts`. This class demonstrates the "stateful actor" pattern: + +1. **Native State**: The agent logic and state (like `taskCounter`) live in standard TypeScript variables. +2. **Infrastructure Rehydration**: Substrate transparently snapshots the entire process memory to GCS. When an agent is resumed on a different physical pod, this memory is rehydrated exactly as it was. +3. **No External DB Required**: Reasoning history, LLM context, and local state survive without the need for an external database or state-management code. + +## Code Navigation + +The OpenClaw demo is organized into specialized subdirectories to separate the agent logic from the demonstration infrastructure: + +- **`workload/`**: Contains the core agent logic. + - `agent.ts`: The stateful Node.js (Hono) server that runs inside the logical actors. This is where you implement reasoning logic and in-memory state management. +- **`ui/`**: Contains the demonstration dashboard. 
+ - `demo-ui.ts`: The backend logic for the real-time dashboard, including the "Proactive Preemption" scheduler and state synchronization. +- **`manifests/`**: Kubernetes and Agent Substrate resource definitions. + - `actor-template.yaml`: Defines the logical agent identity, including container images and state storage locations. + - `worker-pool.yaml`: Configures the physical compute pool (Pods) that host the actors. +- **`scripts/`**: Automation for deployment and testing. + - `deploy-substrate-poc.sh`: A unified script for provisioning the environment. + +## Setup & Reproduction Guide + +To reproduce this demo in your own cluster, follow these steps: + +### 1. Build the Unified Image +The Dockerfile is self-contained and builds both the actor workload and the dashboard UI. +```bash +cd demos/openclaw +docker build -t . +docker push +``` + +### 2. Configure Manifests +Update the image field in `manifests/actor-template.yaml` and `manifests/demo-ui.yaml` to point to your built image. Also, ensure the `location` field in `actor-template.yaml` points to a valid GCS/S3 bucket for state storage. + +### 3. Deploy the Environment +```bash +# 1. Provision the worker pool (2 pods) +./hack/install-ate.sh --deploy-demo-openclaw + +# 2. Define the agent template + + +# 3. Create 3 logical agents + + + + +# 4. Launch the dashboard + +``` + +### 4. Verify Multiplexing +- Access the dashboard via the LoadBalancer IP. +- Click **Pulse (10 Tasks)**. +- Observe the **Worker Pods** section; you will see 3 agents rotating through 2 available slots. +- Check the **Live Logs**; logs from different Agent IDs will appear in the same pod log stream, proving stateful rehydration. 
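The density figures quoted in this demo follow from the agent-to-pod ratio: three logical agents on two worker pods give 1.5x oversubscription, and running two pods instead of three is roughly a 33% reduction in pod count. A minimal TypeScript sketch of that arithmetic follows. The names `oversubscription` and `DensityReport` are hypothetical and not part of the demo source, and pod count is used here as a simple stand-in for cost.

```typescript
// Hypothetical helper for illustration only; not part of the demo source.
interface DensityReport {
  ratio: number;      // logical agents per physical pod
  savingsPct: number; // pods avoided vs. one pod per agent, in percent
}

function oversubscription(agents: number, pods: number): DensityReport {
  if (!Number.isInteger(agents) || !Number.isInteger(pods) || agents <= 0 || pods <= 0) {
    throw new RangeError("agents and pods must be positive integers");
  }
  return {
    ratio: agents / pods,
    // Assumption: cost scales linearly with pod count.
    savingsPct: (1 - pods / agents) * 100,
  };
}

// The demo's configuration: 3 agents sharing 2 worker pods.
const report = oversubscription(3, 2);
console.log(`${report.ratio}x oversubscription, ~${Math.round(report.savingsPct)}% fewer pods`);
// prints "1.5x oversubscription, ~33% fewer pods"
```

The same function can be used to sanity-check other pool sizes before editing the `WorkerPool` replica count; a negative `savingsPct` means there are more pods than agents.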
diff --git a/demos/openclaw/openclaw-multiplex.yaml.tmpl b/demos/openclaw/openclaw-multiplex.yaml.tmpl new file mode 100644 index 00000000..9d16f470 --- /dev/null +++ b/demos/openclaw/openclaw-multiplex.yaml.tmpl @@ -0,0 +1,146 @@ +# Copyright 2026 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +apiVersion: v1 +kind: Namespace +metadata: + name: openclaw + +--- + +apiVersion: v1 +kind: ServiceAccount +metadata: + name: demo-ui + namespace: openclaw + +--- + +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: demo-ui-cluster-role +rules: +- apiGroups: [""] + resources: ["pods", "services", "pods/log"] + verbs: ["get", "list", "watch"] +- apiGroups: [""] + resources: ["pods/portforward"] + verbs: ["create"] +- apiGroups: ["ate.dev"] + resources: ["*"] + verbs: ["*"] + +--- + +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: demo-ui-cluster-rb +subjects: +- kind: ServiceAccount + name: demo-ui + namespace: openclaw +roleRef: + kind: ClusterRole + name: demo-ui-cluster-role + apiGroup: rbac.authorization.k8s.io + +--- + +apiVersion: ate.dev/v1alpha1 +kind: WorkerPool +metadata: + name: agent-pool + namespace: openclaw +spec: + replicas: 2 + ateomImage: ko://github.com/agent-substrate/substrate/cmd/ateom-gvisor + +--- + +apiVersion: ate.dev/v1alpha1 +kind: ActorTemplate +metadata: + name: openclaw-agent + namespace: openclaw +spec: + runsc: + amd64: + url: 
"gs://gvisor/releases/nightly/2026-05-19/x86_64/runsc" + sha256Hash: "a397be1abc2420d26bce6c70e6e2ff96c73aaaab929756c56f5e2089ea842b63" + arm64: + url: "gs://gvisor/releases/nightly/2026-05-19/aarch64/runsc" + sha256Hash: "1ba2366ae2efceba166046f51a4104f9261c9cb72c6db8f5b3fe2dc57dea86b9" + pauseImage: "registry.k8s.io/pause:3.10.2@sha256:f548e0e8e3dc1896ca956272154dde3314e8cc4fde0a57577ee9fa1c63f5baf4" + containers: + - name: agent + image: ${OPENCLAW_IMAGE} + command: ["/usr/bin/tini", "--", "/usr/local/bin/node", "/app/dist/agent.js"] + ports: + - containerPort: 8080 + env: + - name: OPENCLAW_SANDBOX_BACKEND + value: "native" + workerPoolRef: + name: agent-pool + namespace: openclaw + snapshotsConfig: + location: gs://${BUCKET_NAME}/openclaw-agent/ + +--- + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: demo-ui + namespace: openclaw +spec: + replicas: 1 + selector: + matchLabels: + app: demo-ui + template: + metadata: + labels: + app: demo-ui + spec: + serviceAccountName: demo-ui + containers: + - name: ui + image: ${OPENCLAW_IMAGE} + imagePullPolicy: Always + command: ["node", "dist/demo-ui.js"] + ports: + - containerPort: 8090 + env: + - name: PORT + value: "8090" + - name: DEMO_NAMESPACE + value: "openclaw" + +--- + +apiVersion: v1 +kind: Service +metadata: + name: demo-ui + namespace: openclaw +spec: + selector: + app: demo-ui + ports: + - port: 80 + targetPort: 8090 + type: LoadBalancer diff --git a/demos/openclaw/package.json b/demos/openclaw/package.json new file mode 100644 index 00000000..71b49279 --- /dev/null +++ b/demos/openclaw/package.json @@ -0,0 +1,20 @@ +{ + "name": "openclaw-substrate-poc", + "version": "1.0.0", + "description": "Google Claw on Agent Substrate PoC", + "private": true, + "scripts": { + "build": "esbuild src/agent.ts --bundle --platform=node --target=node22 --outfile=dist/agent.js --external:node:* && esbuild src/demo-ui.ts --bundle --platform=node --target=node22 --outfile=dist/demo-ui.js --external:node:*", + 
"start:ui": "node dist/demo-ui.js", + "start:agent": "node dist/agent.js" + }, + "dependencies": { + "@hono/node-server": "^1.11.1", + "hono": "^4.4.2" + }, + "devDependencies": { + "@types/node": "^22.0.0", + "esbuild": "^0.21.5", + "typescript": "^5.5.2" + } +} diff --git a/demos/openclaw/tsconfig.json b/demos/openclaw/tsconfig.json new file mode 100644 index 00000000..ebd4c702 --- /dev/null +++ b/demos/openclaw/tsconfig.json @@ -0,0 +1,14 @@ +{ + "compilerOptions": { + "target": "ESNext", + "module": "ESNext", + "moduleResolution": "bundler", + "strict": true, + "skipLibCheck": true, + "isolatedModules": true, + "esModuleInterop": true, + "allowImportingTsExtensions": true, + "noEmit": true + }, + "include": ["./**/*"] +} diff --git a/demos/openclaw/tsdown.config.ts b/demos/openclaw/tsdown.config.ts new file mode 100644 index 00000000..dcdc59ee --- /dev/null +++ b/demos/openclaw/tsdown.config.ts @@ -0,0 +1,23 @@ +import { defineConfig } from "tsdown"; + +const env = { + NODE_ENV: "production", + OPENCLAW_SANDBOX_BACKEND: "native", +}; + +export default defineConfig({ + entry: { + "substrate/actor-wrapper": "substrate/workload/actor-wrapper.ts", + "substrate/demo-ui": "substrate/ui/demo-ui.ts", + }, + env, + fixedExtension: false, + platform: "node", + // Ensure we bundle everything + unbundle: false, + deps: { + // Only externalize core node modules + external: [/node:/, "fs", "path", "os", "child_process", "crypto", "http", "https", "net", "url", "util", "zlib", "stream", "events", "tty", "readline", "dns", "buffer"], + skipNodeModulesBundle: false, + }, +}); diff --git a/demos/openclaw/ui/demo-ui.ts b/demos/openclaw/ui/demo-ui.ts new file mode 100644 index 00000000..0a73ea63 --- /dev/null +++ b/demos/openclaw/ui/demo-ui.ts @@ -0,0 +1,698 @@ +import { Hono } from "hono"; +import { serve } from "@hono/node-server"; +import { exec } from "node:child_process"; + +const app = new Hono(); + +// --- Configuration --- +const DEMO_NAMESPACE = process.env.DEMO_NAMESPACE 
|| "openclaw"; + +// --- State Management --- +const predefinedTasks = [ + "Analyze repo for security vulnerabilities", + "Summarize latest PR for team review", + "Write unit tests for the message gateway", + "Refactor the plugin discovery logic", + "Draft a response to Buganizer b/392182", + "Generate a cost report for GKE nodes", + "Optimize the gVisor memory mapping", + "Verify snapshot integrity on GCS", +]; + +interface Assignment { + id: string; + agent: string; + task: string; + state: "queued" | "running" | "completed"; + durationSec: number; + created_at: number; + started_at?: number; + completed_at?: number; +} + +let assignments: Assignment[] = []; +let taskCursor = 0; + +const AGENT_META: Record<string, { color: string; emoji: string; id: string }> = { + "Claw-Luna": { color: "#79c0ff", emoji: "🟦", id: "agent-luna" }, + "Claw-Mars": { color: "#ff79c6", emoji: "🟪", id: "agent-mars" }, + "Claw-Nova": { color: "#f1fa8c", emoji: "🟨", id: "agent-nova" }, +}; + +const ID_TO_DISPLAY: Record<string, string> = Object.entries(AGENT_META).reduce((acc, [display, meta]) => { + acc[meta.id] = display; + return acc; +}, {} as Record<string, string>); + +// --- Shared State Cache --- +let clusterState = { + pods: [] as any[], + actors: [] as any[], +}; + +const agentLocks: Record<string, boolean> = {}; +const podLogLocks: Record<string, boolean> = {}; + +const nowSec = () => Date.now() / 1000; + +// Operation Queue for STATE-CHANGING commands only (resume/suspend) +let commandQueue: { cmd: string, resolve: (val: string) => void, reject: (err: any) => void }[] = []; +let isProcessingQueue = false; + +const runCmd = (cmd: string): Promise<string> => { + return new Promise((resolve, reject) => { + exec(cmd, (error, stdout, stderr) => { + if (error) reject(new Error(stderr || error.message)); + else resolve(stdout); + }); + }); +}; + +const enqueueLifecycleCmd = (cmd: string): Promise<string> => { + return new Promise((resolve, reject) => { + commandQueue.push({ cmd, resolve, reject }); + processQueue(); + }); +}; + +async function processQueue() { + if (isProcessingQueue || commandQueue.length === 0) 
return; + isProcessingQueue = true; + + const { cmd, resolve, reject } = commandQueue.shift()!; + + exec(cmd, (error, stdout, stderr) => { + isProcessingQueue = false; + if (error) reject(new Error(stderr || error.message)); + else resolve(stdout); + setTimeout(processQueue, 20); + }); +} + +// --- Background State Syncer --- +async function syncState() { + try { + const [actorsOut, podsOut] = await Promise.all([ + runCmd("kubectl-ate get actors -o json"), + runCmd(`kubectl get pods -n ${DEMO_NAMESPACE} -l app=agent-pool -o json`) + ]); + + const actors = JSON.parse(actorsOut).actors || []; + const podsRaw = JSON.parse(podsOut).items || []; + + clusterState.actors = actors.filter((a: any) => (a.actorId || a.actor_id).startsWith("agent-")).map((a: any) => { + const id = a.actorId || a.actor_id; + return { + name: id, + displayName: ID_TO_DISPLAY[id] || id, + template: a.actorTemplateName || a.actor_template_name || "openclaw-agent", + phase: a.status.replace("STATUS_", ""), + pod: a.ateomPodName || a.ateom_pod_name || "none", + ip: a.ateomPodIp || a.ateom_pod_ip || "n/a", + status: a.status + }; + }); + + clusterState.pods = podsRaw.map((p: any) => { + const activeActor = actors.find((a: any) => (a.ateomPodName || a.ateom_pod_name) === p.metadata.name); + const actorId = activeActor ? (activeActor.actorId || activeActor.actor_id) : "idle"; + return { + name: p.metadata.name, + phase: p.status.phase, + ready: p.status.containerStatuses?.[0]?.ready || false, + ip: p.status.podIP, + activeActor: ID_TO_DISPLAY[actorId] || "idle" + }; + }); + } catch (e) { + console.error("[syncer] Sync error:", e); + } + setTimeout(syncState, 400); +} + +// --- High-Utilization Substrate Scheduler --- +async function schedulerLoop() { + const activeAssignments = assignments.filter(a => a.state !== "completed"); + const queuedAgents = new Set(activeAssignments.filter(a => a.state === "queued").map(a => AGENT_META[a.agent].id)); + + // 1. 
PROGRESS: Move assignments through lifecycle FIRST + for (const a of activeAssignments) { + const actorMeta = AGENT_META[a.agent]; + const actor = clusterState.actors.find((act: any) => act.name === actorMeta.id); + + if (!actor) continue; + if (agentLocks[actor.name]) continue; + + const status = actor.status; + + // Transition: Queued -> Running (Confirmed by Substrate) + if (a.state === "queued" && status === "STATUS_RUNNING") { + a.state = "running"; + a.started_at = nowSec(); + const ms = a.durationSec * 1000; + runCmd(`curl -s -X POST http://${actor.ip}:8080/task -d '{"durationMs": ${ms}}'`).catch(() => {}); + continue; + } + + // Transition: Running -> Completed + if (a.state === "running" && a.started_at && (nowSec() - a.started_at) > a.durationSec) { + a.state = "completed"; + a.completed_at = nowSec(); + // Only suspend if no other tasks are queued for THIS agent + const moreTasksForMe = activeAssignments.some(other => other.agent === a.agent && other.state !== "completed" && other.id !== a.id); + if (!moreTasksForMe) { + console.log(`[scheduler] Agent ${actor.displayName} finished all tasks. Suspending.`); + agentLocks[actor.name] = true; + enqueueLifecycleCmd(`kubectl-ate suspend actor ${actor.name}`).finally(() => { agentLocks[actor.name] = false; }); + } + continue; + } + + // Demand: Resume + if (a.state === "queued" && status === "STATUS_SUSPENDED") { + console.log(`[scheduler] Agent ${actor.displayName} has queued tasks. 
Resuming.`); + agentLocks[actor.name] = true; + enqueueLifecycleCmd(`kubectl-ate resume actor ${actor.name}`).catch(async (e: any) => { + if (e.message.includes("no free workers available")) { + // Preempt a running agent that has ZERO tasks (queued or running) + const contender = clusterState.actors.find((act: any) => { + if (act.status !== "STATUS_RUNNING" || agentLocks[act.name]) return false; + const actHasWork = activeAssignments.some(asg => AGENT_META[asg.agent].id === act.name); + return !actHasWork; + }); + + if (contender) { + console.log(`[scheduler] Contention! Preempting idle ${contender.displayName} for busy ${actor.displayName}`); + agentLocks[contender.name] = true; + await enqueueLifecycleCmd(`kubectl-ate suspend actor ${contender.name}`).finally(() => { agentLocks[contender.name] = false; }); + } else { + // Fallback: If ALL running agents are busy, we just have to wait for one to finish. + } + } + }).finally(() => { agentLocks[actor.name] = false; }); + continue; + } + } + + // 2. CLEANUP: Suspend agents that are RUNNING but have ZERO tasks (queued or running), IF others are waiting + if (queuedAgents.size > 0) { + for (const actor of clusterState.actors) { + const hasAnyTasks = activeAssignments.some(a => AGENT_META[a.agent].id === actor.name); + if (actor.status === "STATUS_RUNNING" && !hasAnyTasks && !agentLocks[actor.name]) { + console.log(`[scheduler] Proactive Cleanup: ${actor.displayName} is idle, freeing pod for queued tasks.`); + agentLocks[actor.name] = true; + enqueueLifecycleCmd(`kubectl-ate suspend actor ${actor.name}`).finally(() => { agentLocks[actor.name] = false; }); + } + } + } + + setTimeout(schedulerLoop, 500); +} + +syncState(); +schedulerLoop(); + +// --- Dashboard Implementation --- + +app.get("/", (c) => { + return c.html(` + + + + + +OpenClaw Substrate Demo + + + +
+

OpenClaw multiplex demo

+
connecting…
+
+ +

+ This demo runs 3 OpenClaw agents on + 2 substrate worker pods. Each agent maintains its own in-memory context and state. + While an agent is idle, substrate can suspend it + (snapshot its process state, free its pod) and let a different agent borrow + that pod — that’s the multiplex. The dashboard + refreshes every second; pods, actors, and logs are all live so you + can watch the rotation happen. +

+ +
+ Approximate cost while running + GCP infrastructure: ~$0.40/hr + OpenClaw (3 agents): ~$1/hr simulated + Total: ~$1.40/hr typical + + GCP figure is one n2-standard-8 VM in us-central1. OpenClaw figure + assumes lightweight reasoning multiplexed across 2 pods. Substrate enables + us to host 1.5x as many agents (3 agents on 2 pods) on the same hardware without losing state. + 
+ +
+
+

Tasks

+
+ + + +
+ +
+

+ Click Give a task to assign a randomly chosen task to the + next agent in round-robin order. Each task moves through three states: + queued + while the agent is suspended, + running + while it owns a worker pod, and + completed + once it finishes; substrate then suspends the agent again if it has no other pending tasks. + 

+
+
no tasks yet — click Give a task to start
+
+
+ +
+
+

Worker pods

+

+ The pool of substrate-managed pods that actually host running agents. The + WorkerPool is configured with 2 replicas for + 3 agents, so substrate is forced to share — at any + moment at most 2 agents own pods, and the third is suspended waiting its + turn. Substrate rotates ownership as agents transition between active and + idle phases. +

+
loading…
+
+
+

Actors & templates

+

+ ActorTemplates define each agent (container image, + prompt, idle interval). Actors are the live instances + bound — or not — to a worker pod. Watch the + phase: Running means the actor is on a pod + executing right now; Suspended means its state is stored and + it’s waiting for a free pod; Resuming / + Suspending are the substrate transitions in between. +

+
loading…
+
+
+ +

Live logs (per pod, last 25 lines)

+

+ Each card below tails the logs of one worker pod. Because substrate moves + agents between pods, the log stream you see in a single card switches + ownership over time. +

+
+ +
+ Google Claw v2026.3.14 + Agent Substrate PoC +
+ + + + + `); +}); + +// API Implementation + +app.get("/api/pods", async (c) => { + return c.json({ pods: clusterState.pods }); +}); + +app.get("/api/actors", async (c) => { + return c.json({ actors: clusterState.actors }); +}); + +app.get("/api/logs/:pod", async (c) => { + const pod = c.req.param("pod"); + // Pod names are DNS-1123 subdomains; reject anything else before it reaches the shell. + if (!/^[a-z0-9]([-a-z0-9.]*[a-z0-9])?$/.test(pod)) return c.json({ logs: "(invalid pod name)" }, 400); + if (podLogLocks[pod]) return c.json({ logs: "(fetching...)" }); + podLogLocks[pod] = true; + try { + const logs = await runCmd("kubectl logs -n " + DEMO_NAMESPACE + " " + pod + " --tail=25"); + return c.json({ logs: logs }); + } catch (e: any) { + return c.json({ logs: "(error fetching logs)" }); + } finally { + podLogLocks[pod] = false; + } +}); + +app.get("/api/task-status", async (c) => { + return c.json({ assignments: [...assignments].reverse() }); +}); + +app.post("/api/reset", async (c) => { + assignments = []; + // Proactively suspend all actors via the authenticated service account + for (const id of ["agent-luna", "agent-mars", "agent-nova"]) { + enqueueLifecycleCmd(`kubectl-ate suspend actor ${id}`).catch(() => {}); + } + return c.json({ success: true }); +}); + +app.post("/api/give-task", async (c) => { + try { + // Agents are picked round-robin via taskCursor; only the task and its duration are random. + const agentDisplays = Object.keys(AGENT_META); + const targetAgentDisplay = agentDisplays[taskCursor % agentDisplays.length]; + taskCursor++; + const targetTask = predefinedTasks[Math.floor(Math.random() * predefinedTasks.length)]; + const durationSec = Math.floor(Math.random() * 4) + 3; // Random 3-6s + const asg: Assignment = { + id: "asg-" + Date.now() + "-" + Math.floor(Math.random()*1000), + agent: targetAgentDisplay, + task: targetTask, + state: "queued", + durationSec: durationSec, + created_at: nowSec(), + }; + assignments.push(asg); + if (assignments.length > 50) assignments.shift(); + return c.json(asg); + } catch (e: any) { + return c.json({ error: e.message }, 500); + } +}); + +const port = process.env.PORT ? 
parseInt(process.env.PORT) : 8090; +serve({ fetch: app.fetch, port }); diff --git a/demos/openclaw/workload/agent.ts b/demos/openclaw/workload/agent.ts new file mode 100644 index 00000000..76d9ba36 --- /dev/null +++ b/demos/openclaw/workload/agent.ts @@ -0,0 +1,90 @@ +import { Hono } from "hono"; +import { serve } from "@hono/node-server"; + +/** + * OpenClaw Stateful Agent + * + * This class represents the logical agent. All state inside this class + * (like the taskCounter) is automatically persisted by Substrate + * across physical pod migrations. + */ +class ClawAgent { + private taskCounter: number = 0; + private readonly actorId: string; + + constructor() { + this.actorId = process.env.ATE_ACTOR_ID || "unknown"; + console.log(`[ClawAgent] Identity ${this.actorId} initialized.`); + } + + public async performTask(durationMs: number) { + this.taskCounter++; + console.log(`[ClawAgent] Starting task. Counter: ${this.taskCounter}. Working for ${durationMs}ms...`); + await new Promise((resolve) => setTimeout(resolve, durationMs)); + console.log(`[ClawAgent] Task completed.`); + return { success: true, count: this.taskCounter }; + } + + public getSecret(body: string) { + this.taskCounter++; + const identity = `AGENT-${this.actorId.slice(0, 4).toUpperCase()}`; + return `Identity: ${identity} | Session: "${this.actorId}" | TaskCount: ${this.taskCounter} | Input: ${body}\n`; + } + + public getStatus() { + return { + actorId: this.actorId, + taskCounter: this.taskCounter, + uptime: Math.floor(process.uptime()), + status: "healthy", + }; + } + + public incrementCounter() { + this.taskCounter++; + return this.taskCounter; + } +} + +const agent = new ClawAgent(); +const app = new Hono(); + +// --- Substrate Demo API --- + +// T1: Standard Counter Demo +app.get("/v1/counter", (c) => { + const count = agent.incrementCounter(); + return c.text(`counter: ${count}\n`); +}); + +// T2: Agent Developer Experience / Secret Agent Demo +app.post("/v1/agent-secret", async (c) => { + 
const body = await c.req.text(); + return c.text(agent.getSecret(body)); +}); + +// --- Lifecycle & Health Endpoints --- + +app.get("/state", (c) => { + return c.json(agent.getStatus()); +}); + +app.post("/task", async (c) => { + const body = await c.req.json(); + const result = await agent.performTask(body.durationMs || 1000); + return c.json({ ...result, actorId: agent.getStatus().actorId }); +}); + +const port = process.env.PORT ? parseInt(process.env.PORT) : 8080; +console.log(`[agent] OpenClaw Actor starting on port ${port}`); + +serve({ + fetch: app.fetch, + port, +}); + +// Periodic heartbeat +setInterval(() => { + const status = agent.getStatus(); + console.log(`[agent] Heartbeat: count=${status.taskCounter}, uptime=${status.uptime}s`); +}, 10000); diff --git a/hack/install-ate.sh b/hack/install-ate.sh index 8e043e85..826cf9ea 100755 --- a/hack/install-ate.sh +++ b/hack/install-ate.sh @@ -44,6 +44,7 @@ source "${ROOT}"/hack/install-demo-counter.sh source "${ROOT}"/hack/install-demo-sandbox.sh source "${ROOT}"/hack/install-demo-claude-code-multiplex.sh source "${ROOT}"/hack/install-demo-agent-secret.sh +source "${ROOT}"/hack/install-demo-openclaw.sh # ANSI color codes for prettier output COLOR_CYAN='\033[1;36m' diff --git a/hack/install-demo-openclaw.sh b/hack/install-demo-openclaw.sh new file mode 100644 index 00000000..1104b0ce --- /dev/null +++ b/hack/install-demo-openclaw.sh @@ -0,0 +1,87 @@ +# Copyright 2026 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# +# This is sourced as part of install-ate.sh. Do not run directly. + +ATE_DEMOS+=(demo-openclaw) # register demo-openclaw + +demo-openclaw_cmdline() { + case "${1}" in + --deploy-demo-openclaw) demo-openclaw_deploy ;; + --delete-demo-openclaw) demo-openclaw_delete ;; + *) + return 1 + ;; + esac + return 0 +} + +# Build the unified image (UI + Workload), push to ${KO_DOCKER_REPO}, and echo +# the resolved digest-pinned reference. +# This is a TypeScript application, so it uses docker buildx rather than ko. +demo-openclaw_build_image() { + local repo="${KO_DOCKER_REPO}/openclaw-demo" + local stage_tag="${repo}:build-$(date +%s)" + docker buildx build \ + --platform=linux/amd64 \ + --push \ + -t "${stage_tag}" \ + demos/openclaw >&2 + local digest + digest=$(docker buildx imagetools inspect "${stage_tag}" --format '{{json .}}' \ + | jq -r '.manifest.digest') + if [[ -z "${digest}" || "${digest}" == "null" ]]; then + echo "Failed to resolve openclaw image digest from ${stage_tag}" >&2 + return 1 + fi + echo "${repo}@${digest}" +} + +demo-openclaw_deploy() { + log_step "demo-openclaw_deploy" + if [[ -z "${BUCKET_NAME:-}" ]]; then + echo "BUCKET_NAME must be set" >&2 + return 1 + fi + if [[ -z "${KO_DOCKER_REPO:-}" ]]; then + echo "KO_DOCKER_REPO must be set (see hack/ate-dev-env.sh.example)" >&2 + return 1 + fi + + local openclaw_image + openclaw_image=$(demo-openclaw_build_image) + if [[ -z "${openclaw_image}" ]]; then + return 1 + fi + log_step " openclaw image: ${openclaw_image}" + + sed -e "s|\${BUCKET_NAME}|${BUCKET_NAME}|g" \ + -e "s|\${OPENCLAW_IMAGE}|${openclaw_image}|g" \ + demos/openclaw/openclaw-multiplex.yaml.tmpl \ + | run_kubectl apply -f - +} + +demo-openclaw_delete() { + log_step "demo-openclaw_delete" + sed -e "s|\${BUCKET_NAME}|${BUCKET_NAME:-placeholder}|g" \ + -e "s|\${OPENCLAW_IMAGE}|placeholder|g" \ + 
demos/openclaw/openclaw-multiplex.yaml.tmpl \ + | run_kubectl delete --ignore-not-found -f - +} + +demo-openclaw_usage() { + echo "" + echo " Required env: BUCKET_NAME, KO_DOCKER_REPO" + echo " See demos/openclaw/README.md for the walkthrough." +} diff --git a/internal/ateompath/ateompath.go b/internal/ateompath/ateompath.go index a0a7ba66..80349105 100644 --- a/internal/ateompath/ateompath.go +++ b/internal/ateompath/ateompath.go @@ -22,7 +22,7 @@ import ( const ( // The base path. This is both the path of the root shared folder on the // host filesystem, and when it is mounted into ateom and atelet containers. - BasePath = "/run/ateom-gvisor" + BasePath = "/var/lib/ateom-gvisor" ) var ( diff --git a/internal/controllers/utils.go b/internal/controllers/utils.go index ea275e5c..aa074bb3 100644 --- a/internal/controllers/utils.go +++ b/internal/controllers/utils.go @@ -71,7 +71,7 @@ func createActorDeploymentSpec(name string, replicas int32, wpName string, ateom VolumeMounts: []corev1.VolumeMount{ { Name: "run-ateom", - MountPath: "/run/ateom-gvisor", + MountPath: "/var/lib/ateom-gvisor", }, }, }, @@ -85,7 +85,7 @@ func createActorDeploymentSpec(name string, replicas int32, wpName string, ateom Name: "run-ateom", VolumeSource: corev1.VolumeSource{ HostPath: &corev1.HostPathVolumeSource{ - Path: "/run/ateom-gvisor", + Path: "/var/lib/ateom-gvisor", Type: ptr.To(corev1.HostPathDirectoryOrCreate), }, }, diff --git a/manifests/ate-install/atelet.yaml b/manifests/ate-install/atelet.yaml index def5f21c..0938e9df 100644 --- a/manifests/ate-install/atelet.yaml +++ b/manifests/ate-install/atelet.yaml @@ -90,9 +90,9 @@ spec: protocol: TCP volumeMounts: - name: run-ateom - mountPath: /run/ateom-gvisor + mountPath: /var/lib/ateom-gvisor volumes: - name: run-ateom hostPath: - path: /run/ateom-gvisor + path: /var/lib/ateom-gvisor type: DirectoryOrCreate \ No newline at end of file
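
Review note: the per-assignment transitions in `schedulerLoop` (queued to running once Substrate reports `STATUS_RUNNING`, running to completed once the duration has elapsed, and a resume request while the actor is `STATUS_SUSPENDED`) can be read as a small pure state machine. The sketch below is a simplified model for reasoning about and unit-testing those transitions, not the code in this PR. The names `Task`, `Action`, and `step` are hypothetical, and the side effects (the `curl` to `/task`, `kubectl-ate suspend`/`resume`, agent locks, preemption) are reduced to a returned action label.

```typescript
// Hypothetical, simplified model of one scheduler tick for a single assignment.
type TaskState = "queued" | "running" | "completed";
type ActorStatus = "STATUS_RUNNING" | "STATUS_SUSPENDED";
type Action = "start-task" | "resume-actor" | "suspend-if-idle" | "none";

interface Task {
  state: TaskState;
  durationSec: number;
  startedAt?: number; // seconds, set when the task starts running
}

// Returns the next task state plus the side effect the caller should perform.
function step(
  task: Task,
  status: ActorStatus,
  nowSec: number
): { task: Task; action: Action } {
  // Queued -> Running: the actor already owns a pod, so the task can start.
  if (task.state === "queued" && status === "STATUS_RUNNING") {
    return { task: { ...task, state: "running", startedAt: nowSec }, action: "start-task" };
  }
  // Running -> Completed: the simulated duration has elapsed.
  if (
    task.state === "running" &&
    task.startedAt !== undefined &&
    nowSec - task.startedAt > task.durationSec
  ) {
    return { task: { ...task, state: "completed" }, action: "suspend-if-idle" };
  }
  // Demand: a queued task on a suspended actor asks for a resume.
  if (task.state === "queued" && status === "STATUS_SUSPENDED") {
    return { task, action: "resume-actor" };
  }
  return { task, action: "none" };
}

// Walk one task through its lifecycle.
const t0: Task = { state: "queued", durationSec: 3 };
const r1 = step(t0, "STATUS_SUSPENDED", 100);     // asks for a resume, stays queued
const r2 = step(r1.task, "STATUS_RUNNING", 101);  // starts running
const r3 = step(r2.task, "STATUS_RUNNING", 103);  // 2s elapsed, still running
const r4 = step(r3.task, "STATUS_RUNNING", 105);  // 4s elapsed, completes
console.log([r1, r2, r3, r4].map((r) => `${r.task.state}/${r.action}`).join(" -> "));
```

Keeping the transition logic free of I/O like this would let the preemption and cleanup rules be tested without a cluster; whether that refactor is worth it for a demo is a judgment call for the author.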