4 changes: 3 additions & 1 deletion README.md
@@ -42,7 +42,7 @@ This demo highlights the core developer experience and "Agentic Infrastructure"
2. **State Persistence:** Persistent working memory (volatile RAM) and filesystem state preserved perfectly across hibernation cycles via full-state snapshots.
3. **Agent Swarm Multiplexing:** Demonstrates 30x+ oversubscription by "juggling" a large registry of stateful actors onto a small pool of shared physical pods.

To reproduce this demo in your own cluster, please refer to the detailed walkthroughs in the **[Counter Demo](demos/counter/README.md)** and **[Secret Agent Demo](demos/agent-secret/README.md)**.
To reproduce this demo in your own cluster, please refer to the detailed walkthroughs in the **[Counter Demo](demos/counter/README.md)**, **[Secret Agent Demo](demos/agent-secret/README.md)**, and **[OpenClaw Multiplexing](demos/openclaw/README.md)**.

For more videos and walkthroughs, visit our YouTube channel: **[agent-substrate](https://www.youtube.com/channel/UCN9PPqlTtVxlcpbQ-NWpfZQ)**.

@@ -53,6 +53,7 @@ Agent Substrate is designed to be **framework and agent harness agnostic**. Beca
* **Agent Development Kit (ADK):** Native support for ADK-compatible session identity and persistent working memory.
* **LangChain:** Ideal execution environment for long-running, stateful LangChain agents and sandboxed tool-calling.
* **Claude Code & CodeX:** Support for high-density, stateful coding environments that preserve terminal and filesystem state across sessions.
* **OpenClaw (Google Claw):** Optimized for multiplexing stateful TypeScript agents with persistent in-memory reasoning and conversation history.
* **Model Context Protocol (MCP):** Deploy secure, sandboxed MCP servers as Substrate Actors to provide durable tools for any LLM.

## Ecosystem & Examples
@@ -188,6 +189,7 @@ We provide several sample applications demonstrating Agent Substrate's capabilit
2. **[Sandbox Demo (Antigravity)](demos/sandbox/README.md)**: A secure, sandboxed execution environment (running Alpine Linux) that allows arbitrary shell execution while preserving filesystem state across sessions.
3. **[Claude Code Multiplex](demos/claude-code-multiplex/README.md)**: Demonstrates oversubscribing physical hardware by multiplexing multiple Claude Code agents onto a limited pool of workers.
4. **[Secret Agent](demos/agent-secret/README.md)**: Highlights Substrate's "Zero-Idle" self-suspension and re-animation of volatile process memory.
5. **[OpenClaw Multiplexing](demos/openclaw/README.md)**: Showcases 1.5x hardware oversubscription using Google Claw agents, demonstrating stateful rehydration and rapid agent rotation across physical pods.

### Documentation & Guides
* [API Configuration Guide](docs/api-guide.md): Detailed reference for configuring WorkerPools, ActorTemplates, Secrets, and Volumes.
51 changes: 51 additions & 0 deletions demos/openclaw/DEMO_SCRIPT.md
@@ -0,0 +1,51 @@
# OpenClaw on Substrate: "Liquid Hardware" Demo Script

This document provides a structured narrative for recording the OpenClaw-on-Substrate PoC demonstration.

## **Metadata**
* **Environment**: `http://<YOUR_DASHBOARD_IP>`
* **Logical Identities**: Claw-Luna (Blue 🟦), Claw-Mars (Pink 🟪), Claw-Nova (Gold 🟨)

Collaborator

Who creates these 3 actors? is there a missing script?

* **Physical Constraint**: 2 Worker Pods (Replica Pool)
* **Core Value**: 1.5x Hardware Oversubscription without state loss.

---

## **Phase 1: The Static Constraint (Setup)**
* **Action**: Open the dashboard. Ensure history is clear (Click **Reset Dashboard** if needed).
* **Narrative**:
> "Welcome to the OpenClaw Substrate PoC. Today we're demonstrating the next evolution of AI infrastructure: **Liquid Hardware**.
>
> Look at the bottom of the screen. We have **three logical agents**—Luna, Mars, and Nova—but we're only paying for **two physical worker pods**. In a traditional cloud setup, one agent would be permanently offline or require a slow cold-boot. With Substrate, hardware flows where the tasks are."
## **Phase 2: Individual Process Rehydration**
* **Action**: Click **Give a task**. Wait for the agent to transition to `RESUMING`.
* **Narrative**:
> "I'll assign a task to Claw-Luna. Watch the 'Actors' panel. Luna is currently **RESUMING**.
>
> Substrate is reaching into Google Cloud Storage, pulling Luna's exact memory snapshot, and rehydrating it into one of our two worker pods. This isn't just starting a container—it's restoring a live process state in about 5 seconds."
* **Action**: Wait for task to move to `RUNNING`. Point to the **Live Logs**.
> "Now Luna is **RUNNING**. You can see the live telemetry in the pod log. Once the task completes, Substrate will automatically checkpoint the state and free the pod for the next agent."
## **Phase 3: High-Concurrency Contention (The Pulse)**
* **Action**: Click **Pulse (10 Tasks)**.
* **Narrative**:
> "Now, let's put the system under pressure. I'm assigning 10 parallel tasks across all three agents.
>
> Watch the dashboard come alive. With 3 agents fighting for 2 slots, Substrate is performing a high-speed multiplex. Luna, Mars, and Nova are constantly swapping positions. When one agent finishes a short 3-second job, Substrate immediately 'hot-swaps' it for a queued agent."
* **Visual Cue**: Point out the **`SUSPENDING` (Orange)** and **`RESUMING` (Yellow)** badges flashing as the rotation happens.

## **Phase 4: Latency & Cost Efficiency**
* **Action**: Scroll to the **Approximate Cost** card.
* **Narrative**:
> "This fluidity is made possible by our snapshot performance. We're currently seeing a **1.2-second suspend latency**. While resume is currently 5 seconds from a cold GCS fetch, moving this to a local SSD cache would bring us to sub-second rehydration.
>
> The business impact is clear: we are hosting **1.5x as many agents** on the same physical hardware (three agents on two pods instead of three), reducing our simulated OpenClaw infrastructure costs by 33% while maintaining 100% state persistence.
>
> This is Liquid Hardware. This is OpenClaw on Substrate."
---

## **Recording Tips**
1. **Cursor Movement**: Use slow, deliberate mouse movements to highlight the panels you are discussing.
2. **Timing**: Don't rush Phase 2. Let the viewer see the `RESUMING` -> `RUNNING` transition clearly before hitting the Pulse.
3. **The Reveal**: Ensure the **Live Logs** are visible during the Pulse so the viewer sees the agent ownership (telemetry) switching on the same pod name.
73 changes: 73 additions & 0 deletions demos/openclaw/Dockerfile
@@ -0,0 +1,73 @@
# Google Claw on Agent Substrate PoC
# Portable Dockerfile for OSS Substrate Migration

# Stage 1: Build the standalone bundles
FROM node:22-slim AS builder

WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy standalone package files
COPY package.json ./
# Use npm install for simplicity and portability in the standalone package
RUN npm install

# Copy source code
COPY src/ ./src/

Collaborator

Does this work?

Collaborator

i needed to do

--- a/demos/openclaw/Dockerfile
+++ b/demos/openclaw/Dockerfile
@@ builder stage @@
 # Copy source code
-COPY src/ ./src/
+COPY workload/ ./workload/
+COPY ui/ ./ui/

 # Build zero-dependency bundles
-RUN ./node_modules/.bin/esbuild src/agent.ts \
+RUN ./node_modules/.bin/esbuild workload/agent.ts \
     --bundle --platform=node --target=node22 \
     --outfile=dist/agent.js --external:node:*

-RUN ./node_modules/.bin/esbuild src/demo-ui.ts \
+RUN ./node_modules/.bin/esbuild ui/demo-ui.ts \
     --bundle --platform=node --target=node22 \
     --outfile=dist/demo-ui.js --external:node:*


# Build zero-dependency bundles
RUN ./node_modules/.bin/esbuild src/agent.ts \
    --bundle \
    --platform=node \
    --target=node22 \
    --outfile=dist/agent.js \
    --external:node:*

RUN ./node_modules/.bin/esbuild src/demo-ui.ts \
    --bundle \
    --platform=node \
    --target=node22 \
    --outfile=dist/demo-ui.js \
    --external:node:*

# Stage 2: Final Production Image
FROM node:22-slim AS runner

WORKDIR /app

# Copy the entire context to check for local binaries
COPY . .

# Install runtime dependencies (tini for signal forwarding, kubectl for dashboard sync)
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    curl \
    tini \
    && curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" \
    && chmod +x kubectl \
    && mv kubectl /usr/local/bin/ \
    && rm -rf /var/lib/apt/lists/*

# Copy built assets
COPY --from=builder /app/dist/ ./dist/
# Copy kubectl-ate binary if it exists in context, otherwise download it
# This makes the Dockerfile portable across environments
RUN if [ -f "./kubectl-ate" ]; then \
        mv ./kubectl-ate /usr/local/bin/kubectl-ate; \
    else \
        curl -L -o /usr/local/bin/kubectl-ate https://github.com/agent-substrate/substrate/releases/latest/download/kubectl-ate-linux-amd64; \
    fi && chmod +x /usr/local/bin/kubectl-ate

# Create a /pause hook for Substrate rehydration
RUN echo '#!/bin/sh' > /pause && \
    echo 'echo "[pause] Starting Google Claw agent..."' >> /pause && \
    echo 'exec /usr/bin/tini -- /usr/local/bin/node /app/dist/agent.js' >> /pause && \
    chmod +x /pause

# Default entrypoint (can be overridden by deployment to run demo-ui)
ENTRYPOINT ["/usr/bin/tini", "--", "node", "dist/agent.js"]
213 changes: 213 additions & 0 deletions demos/openclaw/README.md
@@ -0,0 +1,213 @@
# OpenClaw on Agent Substrate: Multiplexing Demo

A high-density demonstration of three stateful **OpenClaw** agents (`Claw-Luna`, `Claw-Mars`, `Claw-Nova`) sharing two physical **Agent Substrate** worker pods. This PoC showcases **Liquid Hardware**: Substrate automatically suspends idle agents and rehydrates them on-demand, allowing a cluster to host significantly more logical agents than physical compute slots.

**Live Demo URL:** [http://136.119.224.22](http://136.119.224.22) (Internal/GCP)

Collaborator

Let's drop these?


> [!NOTE]
> This demo intentionally provisions **two pods for three agents** to force hardware contention. Substrate manages the state teleportation (checkpointing to GCS), ensuring that process memory (task counters) survives migration between physical pods.

## System Information

- **Google Claw Version**: `2026.3.14`
- **Substrate Mode**: Multi-Actor Multiplexing (1.5x oversubscription)
- **Runtime**: Node.js 22 (Debian Slim)
- **Isolation**: gVisor (runsc)

## What this shows

- **High-Density Multiplexing**: Three logical OpenClaw identities running on only two physical pods (1.5x oversubscription).
- **State Persistence**: A `taskCounter` maintained in the Node.js process memory survives multiple suspend/resume cycles.
- **Dynamic Rotation**: Agents finish work at different times (3-6s), forcing Substrate to constantly rotate pod ownership.
- **Visual Identity Tracking**: Color-coded agents (Blue/Pink/Gold) and live log tailing to make infrastructure sharing intuitively obvious.
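
The 1.5x figure follows directly from the agent and pod counts. The sketch below works through that arithmetic; the `densityReport` helper and its field names are hypothetical and only the 3-agent, 2-pod inputs come from this demo. The cost figure assumes cost scales linearly with pod count and compares against one dedicated pod per agent.

```typescript
// Hypothetical helper: not part of the demo code, just the arithmetic behind "1.5x".
interface DensityReport {
  oversubscription: number; // logical agents per physical pod
  podsSaved: number;        // pods avoided versus one dedicated pod per agent
  costReduction: number;    // fraction of pod cost avoided, between 0 and 1
}

function densityReport(logicalAgents: number, physicalPods: number): DensityReport {
  if (logicalAgents <= 0 || physicalPods <= 0) {
    throw new RangeError("agent and pod counts must be positive");
  }
  const podsSaved = logicalAgents - physicalPods;
  return {
    oversubscription: logicalAgents / physicalPods,
    podsSaved,
    // Assumes cost is proportional to the number of pods.
    costReduction: podsSaved / logicalAgents,
  };
}

const report = densityReport(3, 2);
console.log(report.oversubscription);         // 1.5
console.log(report.costReduction.toFixed(2)); // 0.33
```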

## Audience

This guide is intended for engineers exploring Agent Substrate for hosting large-scale agentic workloads where cost-efficiency and stateful rehydration are critical.

## Prerequisites

- **Agent Substrate Cluster**: A Kubernetes cluster with Substrate installed.
- **Docker**: For building and pushing the unified actor/UI image.
- **GCS Bucket**: Configured for Substrate state snapshots (e.g., `gs://snapshot-substrate-gke-ai-eco-dev/`).
- **kubectl & kubectl-ate**: The Substrate CLI tool for managing logical actors.

## Components

| Path | Purpose |
|---|---|
| `substrate/src/agent.ts` | The workload: A Hono server with persistent memory state. |
| `substrate/src/demo-ui.ts` | The dashboard: A Node.js backend providing live logs, task queueing, and visual tracking. |
| `substrate/manifests/worker-pool.yaml` | The physical pool configuration (2 replicas). |
| `substrate/manifests/actor-template.yaml` | The logical identity definition (snapshots, container spec). |
| `substrate/manifests/valkey-init.yaml` | Utility Job for re-initializing the Valkey metadata store. |
| `substrate/Dockerfile` | Unified OCI image containing both the actor workload and the dashboard UI. |
| `substrate/DEMO_SCRIPT.md` | The narrative script for the demonstration recording. |

## How to Run

### 1. Provision Hardware
Scale the physical `WorkerPool` to the desired replica count (2 for this demo):
```bash
kubectl apply -f substrate/manifests/worker-pool.yaml
```

### 2. Deploy Logical Agents
Create the three "fun-named" actors using the Substrate CLI.
```bash



```

### 3. Launch the Dashboard
The dashboard runs as a standard Kubernetes Deployment with a LoadBalancer.
```bash
kubectl apply -f substrate/manifests/demo-ui.yaml
```

## Drive the Demo

Open the dashboard and use the following interaction patterns:

- **Pulse (10 Tasks)**: The primary demo button. It parallelizes 10 tasks across the registry. Watch the **colored icons** rapidly cycle through the 2 worker slots.
- **Live Logs**: Observe the pod log cards. You will see different Agent IDs appearing in the **same log stream**, proving that physical hardware is being recycled in real-time.
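
To build intuition for what the Pulse button triggers, here is a toy model of three agents contending for two slots. It is a sketch only: the least-recently-used eviction policy, the `simulate` function, and the `SlotEvent` type are assumptions made for illustration and do not describe Substrate's actual scheduler.

```typescript
// Toy model of slot contention. Names and the LRU policy are illustrative assumptions.
type SlotEvent = { kind: "resume" | "suspend" | "run"; agent: string; slot: number };

function simulate(tasks: string[], slotCount: number): SlotEvent[] {
  const slots = new Array<string | null>(slotCount).fill(null);
  const lastUsed = new Map<string, number>(); // agent -> tick of its most recent task
  const events: SlotEvent[] = [];

  tasks.forEach((agent, tick) => {
    let slot = slots.indexOf(agent);
    if (slot === -1) {
      // Agent is not resident: take a free slot, or evict the least recently used agent.
      slot = slots.indexOf(null);
      if (slot === -1) {
        slot = 0;
        for (let i = 1; i < slots.length; i++) {
          if ((lastUsed.get(slots[i]!) ?? -1) < (lastUsed.get(slots[slot]!) ?? -1)) {
            slot = i;
          }
        }
        events.push({ kind: "suspend", agent: slots[slot]!, slot });
      }
      slots[slot] = agent;
      events.push({ kind: "resume", agent, slot });
    }
    lastUsed.set(agent, tick);
    events.push({ kind: "run", agent, slot });
  });
  return events;
}

const events = simulate(["Claw-Luna", "Claw-Mars", "Claw-Nova", "Claw-Luna"], 2);
for (const e of events) {
  console.log(`${e.kind.padEnd(7)} ${e.agent} on slot ${e.slot}`);
}
```

In this model the third task forces Claw-Luna out of slot 0, and the fourth task brings her back on slot 1, which mirrors the rotation the dashboard shows: the same agent reappearing on a different physical slot.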

## Integrating a Real LLM API

Integrating an LLM into an OpenClaw logical actor is straightforward. Because Substrate persists the **entire process memory**, any in-memory conversation history or KV-cache will survive multiple suspend/resume cycles without requiring an external database.

### 1. Add the LLM SDK
Add your preferred SDK (e.g., OpenAI or Anthropic) to the `substrate/package.json`:
```bash
npm install openai
```

### 2. Update the Actor Logic
Modify `substrate/src/agent.ts` to initialize the client and maintain a local chat history:
```typescript
import OpenAI from "openai";

// `app` is assumed to be the Hono instance already defined in agent.ts.
const openai = new OpenAI({ apiKey: process.env.LLM_API_KEY });

// Held in process memory, so this array survives Substrate snapshots.
const history: OpenAI.Chat.ChatCompletionMessageParam[] = [];

app.post("/v1/chat", async (c) => {
  const { message } = await c.req.json();
  history.push({ role: "user", content: message });

  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: history,
  });

  // Store the assistant reply so the next turn sees the full conversation.
  const aiMessage = response.choices[0].message;
  history.push(aiMessage);
  return c.json(aiMessage);
});
```

### 3. Provide the API Key
Add the credential to the environment variables in `substrate/manifests/actor-template.yaml`:
```yaml
spec:
  containers:
    - name: agent
      env:
        - name: LLM_API_KEY
          value: "sk-proj-..." # Or use a Kubernetes Secret reference
```
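
Because the key arrives through the environment, it can help to fail fast at startup when it is missing rather than on the first chat request. The following is a minimal sketch; `requireEnv` is a hypothetical helper and not part of the demo code.

```typescript
// Hypothetical helper: returns the variable's value or throws a descriptive error.
function requireEnv(name: string, env: Record<string, string | undefined>): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In the actor you would pass process.env; a literal map keeps this sketch runnable anywhere.
const apiKey = requireEnv("LLM_API_KEY", { LLM_API_KEY: "example-key" });
console.log(apiKey.length > 0); // true
```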

### 4. Rebuild & Deploy
Rebuild the image and Substrate will automatically pick up the new logic for any resumed actors.

## Teardown

```bash
kubectl delete -f substrate/manifests/demo-ui.yaml

kubectl delete -f substrate/manifests/worker-pool.yaml
```

## Nuances & Workarounds

This demo handles several environment-specific challenges to ensure stable multiplexing:

- **Debian-Based Runtime**: Both the builder and runner use `node:22-slim` to ensure `glibc` parity during gVisor checkpointing. Alpine/Musl images are avoided to prevent snapshot corruption.
- **Tini Wrapper**: The `/pause` hook and the Node.js process are wrapped in `tini` to ensure signals are forwarded correctly, preventing zombie processes during the gVisor freeze cycle.
- **Valkey Recovery**: In the event of a "split-brain" cluster state (where Substrate loses track of free workers), the `valkey-init.yaml` Job is provided to reset the metadata hash slots.
- **Hermetic Bundling**: `esbuild` is used to create zero-dependency binaries for the actor and UI, ensuring that rehydration doesn't fail due to missing `node_modules` in the restored process tree.

## Project Structure

This folder is a standalone Node.js package, decoupled from the main Google Claw repository for easy migration to the [OSS Substrate repository](https://github.com/agent-substrate/substrate).

```text
substrate/
├── src/ # Standalone Hono source code (Actor & UI)
├── manifests/ # Kubernetes & Agent Substrate YAMLs
├── scripts/ # Environment-agnostic deployment utilities
├── demo/OpenClaw/ # High-fidelity recording script
├── Dockerfile # Self-contained build definition
├── package.json # Decoupled dependencies (Hono, esbuild)
├── tsconfig.json # Independent TypeScript configuration
└── README.md # Integrated documentation & System Info
```
## The Claw Agent Pattern

The core of this demo is the `ClawAgent` class found in `workload/agent.ts`. This class demonstrates the "stateful actor" pattern:

1. **Native State**: The agent logic and state (like `taskCounter`) live in standard TypeScript variables.
2. **Infrastructure Rehydration**: Substrate transparently snapshots the entire process memory to GCS. When an agent is resumed on a different physical pod, this memory is rehydrated exactly as it was.
3. **No External DB Required**: Reasoning history, LLM context, and local state survive without the need for an external database or state-management code.
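
The pattern above can be sketched in a few lines. `SketchAgent` below is a hypothetical stand-in written for illustration, not the demo's actual `ClawAgent` class; the point is that nothing in it serializes or persists state, since that is left to the process-level snapshot.

```typescript
// Illustrative stand-in for the stateful-actor pattern; not the real ClawAgent.
class SketchAgent {
  readonly id: string;
  private taskCounter = 0;                 // plain variable, lives only in process memory
  private readonly history: string[] = []; // no database or serialization code

  constructor(id: string) {
    this.id = id;
  }

  // Handle one task and return which task number this was for the agent.
  runTask(description: string): { agent: string; taskNumber: number } {
    this.taskCounter += 1;
    this.history.push(description);
    return { agent: this.id, taskNumber: this.taskCounter };
  }

  // Read-only view of the in-memory state, e.g. for a status endpoint.
  stateView(): { taskCounter: number; historyLength: number } {
    return { taskCounter: this.taskCounter, historyLength: this.history.length };
  }
}

const luna = new SketchAgent("Claw-Luna");
luna.runTask("summarize logs");
luna.runTask("draft report");
console.log(luna.stateView()); // { taskCounter: 2, historyLength: 2 }
```

If the process is suspended between the two `runTask` calls and resumed elsewhere, the second call would still report task number 2, provided the snapshot restores process memory as described.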

## Code Navigation

The OpenClaw demo is organized into specialized subdirectories to separate the agent logic from the demonstration infrastructure:

- **`workload/`**: Contains the core agent logic.
  - `agent.ts`: The stateful Node.js (Hono) server that runs inside the logical actors. This is where you implement reasoning logic and in-memory state management.
- **`ui/`**: Contains the demonstration dashboard.
  - `demo-ui.ts`: The backend logic for the real-time dashboard, including the "Proactive Preemption" scheduler and state synchronization.
- **`manifests/`**: Kubernetes and Agent Substrate resource definitions.
  - `actor-template.yaml`: Defines the logical agent identity, including container images and state storage locations.
  - `worker-pool.yaml`: Configures the physical compute pool (Pods) that host the actors.
- **`scripts/`**: Automation for deployment and testing.
  - `deploy-substrate-poc.sh`: A unified script for provisioning the environment.

## Setup & Reproduction Guide

To reproduce this demo in your own cluster, follow these steps:

### 1. Build the Unified Image
The Dockerfile is self-contained and builds both the actor workload and the dashboard UI.
```bash
cd demos/openclaw
docker build -t <YOUR_IMAGE_TAG> .
docker push <YOUR_IMAGE_TAG>
```

### 2. Configure Manifests
Update the image field in `manifests/actor-template.yaml` and `manifests/demo-ui.yaml` to point to your built image. Also, ensure the `location` field in `actor-template.yaml` points to a valid GCS/S3 bucket for state storage.

### 3. Deploy the Environment
```bash
# 1. Provision the worker pool (2 pods)
./hack/install-ate.sh --deploy-demo-openclaw

# 2. Define the agent template


# 3. Create 3 logical agents




# 4. Launch the dashboard

```

### 4. Verify Multiplexing
- Access the dashboard via the LoadBalancer IP.
- Click **Pulse (10 Tasks)**.
- Observe the **Worker Pods** section; you will see 3 agents rotating through 2 available slots.
- Check the **Live Logs**; logs from different Agent IDs will appear in the same pod log stream, proving stateful rehydration.