
feat: add distributed mode #9124

Draft
mudler wants to merge 8 commits into master from feat/distributed-mode
Conversation

@mudler mudler commented Mar 23, 2026

Description

The objective of this PR is to make LocalAI scalable horizontally, and delegate processing to remote gRPC LocalAI workers.

Distributed mode enables horizontal scaling of LocalAI across multiple machines, using PostgreSQL for state and the node registry, and NATS for real-time coordination. Unlike P2P mode, distributed mode is designed for production deployments and Kubernetes environments where you need centralized management, health monitoring, and deterministic routing. To enable it, pass --distributed to LocalAI. A Docker Compose file is also provided to quickly start the full stack with a single command.

Note: unlike other ways to run LocalAI, distributed mode requires authentication to be enabled and a PostgreSQL database; SQLite is not supported. This is because the node registry, job store, and other distributed state are stored in PostgreSQL tables.

Architecture:

                    ┌──────────────────┐
                    │   Load Balancer  │
                    └─────────┬────────┘
                              │
              ┌───────────────┼───────────────┐
              │               │               │
       ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
       │ Frontend #1 │ │ Frontend #2 │ │ Frontend #N │
       │  (LocalAI)  │ │  (LocalAI)  │ │  (LocalAI)  │
       └──────┬──────┘ └──────┬──────┘ └──────┬──────┘
              │               │               │
       ┌──────▼───────────────▼───────────────▼──────┐
       │              PostgreSQL + NATS              │
       │     (node registry, jobs, coordination)     │
       └──────┬───────────────┬───────────────┬──────┘
              │               │               │
       ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
       │  Worker #1  │ │  Worker #2  │ │  Worker #N  │
       │  (generic)  │ │  (generic)  │ │   (agent)   │
       └─────────────┘ └─────────────┘ └─────────────┘

Frontends are stateless LocalAI instances that receive API requests and route them to worker nodes via the SmartRouter. All frontends share state through PostgreSQL and coordinate via NATS.

Workers are generic processes that self-register with a frontend. They don't have a fixed backend type — the SmartRouter dynamically installs the required backend via NATS backend.install events when a model request arrives.

Scheduling Algorithm

The SmartRouter uses idle-first scheduling:

  1. If the model is already loaded on a node → use it (least in-flight)
  2. If no node has the model → prefer truly idle nodes (zero models loaded, zero in-flight requests), trying to fit the model within each node's reported free VRAM/RAM
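The two steps above can be sketched in Go; the `Node` fields and the `pickNode` helper are illustrative stand-ins for whatever state the SmartRouter actually tracks, not the PR's real types:

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical view of per-worker state.
type Node struct {
	ID       string
	Models   []string // models currently loaded
	InFlight int      // requests currently being served
	FreeVRAM uint64   // bytes reported free by the worker
}

// pickNode implements idle-first scheduling:
//  1. among nodes that already have the model loaded, pick least in-flight;
//  2. otherwise prefer truly idle nodes (no models, no in-flight work)
//     whose reported free VRAM can fit the model.
func pickNode(nodes []Node, model string, modelVRAM uint64) *Node {
	var loaded []*Node
	for i := range nodes {
		for _, m := range nodes[i].Models {
			if m == model {
				loaded = append(loaded, &nodes[i])
			}
		}
	}
	if len(loaded) > 0 {
		sort.Slice(loaded, func(a, b int) bool { return loaded[a].InFlight < loaded[b].InFlight })
		return loaded[0]
	}
	var candidate *Node
	for i := range nodes {
		n := &nodes[i]
		if len(n.Models) == 0 && n.InFlight == 0 && n.FreeVRAM >= modelVRAM {
			// prefer the idle node with the most headroom
			if candidate == nil || n.FreeVRAM > candidate.FreeVRAM {
				candidate = n
			}
		}
	}
	return candidate // nil if nothing fits
}

func main() {
	nodes := []Node{
		{ID: "w1", Models: []string{"llama"}, InFlight: 2, FreeVRAM: 8 << 30},
		{ID: "w2", Models: []string{"llama"}, InFlight: 0, FreeVRAM: 8 << 30},
		{ID: "w3", FreeVRAM: 24 << 30},
	}
	fmt.Println(pickNode(nodes, "llama", 4<<30).ID) // w2: loaded, least in-flight
	fmt.Println(pickNode(nodes, "qwen", 4<<30).ID)  // w3: truly idle, fits in VRAM
}
```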

Nodes page:

[Screenshot, 2026-03-24: LocalAI Nodes page]

Notes for Reviewers

TODO:

  • Make sure we also sync to nodes any files referenced inside model options (this is a bit more challenging), as well as mmproj files
  • Re-use the VRAM detection logic to route models more efficiently to nodes that have free VRAM, not only based on capacity
  • Add hints in the UI on how to start workers
  • Backend management in distributed mode (should be able to install/delete backends as well)
  • Model management (if a model is deleted from the frontend, it should be removed from the nodes too)
  • Dynamic auth tokens for nodes (currently the user has to specify a registration token manually, and it has to be the same in the frontend and the workers) -> went with approval/auto-approval mode
  • Distributed inferencing
  • Distributed quantization
  • Distributed fine-tuning
  • Agents in distributed mode
  • Skills in distributed mode
  • MCP jobs in distributed mode
  • Memory in distributed mode

Follow-ups:

  • The S3 implementation is provided as-is and I have not tested it (as it requires another layer of machinery to be introduced here); however, the wiring is already in place, and it consists of merely two files implementing the relevant interfaces, so the impact is really minimal. I'll make sure to mark it as experimental and gather feedback once merged into master.

Signed commits

  • Yes, I signed my commits.


netlify bot commented Mar 23, 2026

Deploy Preview for localai failed.

🔨 Latest commit: 318814a
🔍 Latest deploy log: https://app.netlify.com/projects/localai/deploys/69c5d73ce8a1a1000856d101

@mudler mudler force-pushed the feat/distributed-mode branch 3 times, most recently from 23f3831 to 5aa34de on March 24, 2026 22:53
@mudler mudler changed the title from "feat: add distributed mode (experimental)" to "feat: add distributed mode" on Mar 24, 2026
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler mudler force-pushed the feat/distributed-mode branch from 5aa34de to f3db5fd on March 25, 2026 13:50
mudler added 3 commits March 25, 2026 22:53
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler mudler force-pushed the feat/distributed-mode branch 3 times, most recently from 274cbed to 3bc78b0 on March 26, 2026 08:51
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler mudler force-pushed the feat/distributed-mode branch from 3bc78b0 to 715aace on March 26, 2026 09:44
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler mudler force-pushed the feat/distributed-mode branch 6 times, most recently from 935cf29 to e35dbea on March 27, 2026 00:56
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler mudler force-pushed the feat/distributed-mode branch from e35dbea to b5dafe8 on March 27, 2026 01:00
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>