
Providers & Factory Integration

Audience: contributors and operators tweaking the model registry and provider configs.
Nav: Docs index · Quickstart · Safety · AI SDK

Maestro loads model/provider metadata from multiple locations so you can mix built-in configs with Factory CLI settings. This page clarifies the resolution order and how to customize providers.

Config Sources

src/models/registry.ts builds the registry from:

  1. Built-in defaults (shipped with Maestro)
  2. Factory data:
    • ~/.factory/config.json
    • ~/.factory/settings.json
  3. Maestro config:
    • ~/.maestro/models.json (legacy path)
    • ~/.maestro/config.json (via MAESTRO_CONFIG)
  4. Env overrides:
    • MAESTRO_MODELS_FILE=/path/to/custom.json

Sources are read in that order; later entries override earlier ones.
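The layering can be pictured as a fold over sources in resolution order. This is a minimal sketch, not the code in src/models/registry.ts. ProviderEntry, ConfigLayer and mergeLayers are hypothetical names. The sketch also assumes a field-level merge per provider id, with headers merged key by key, rather than whole-file replacement.

```typescript
// Hypothetical, trimmed-down provider shape: only the fields needed to show precedence.
interface ProviderEntry {
  id: string;
  name?: string;
  baseUrl?: string;
  headers?: Record<string, string>;
}

interface ConfigLayer {
  source: string; // label for debugging, e.g. "built-in" or "~/.maestro/models.json"
  providers: ProviderEntry[];
}

// Fold layers in resolution order. For a provider id seen in several layers,
// fields from later layers overwrite earlier ones (assumed field-level merge);
// headers are merged key by key so a later layer can add one without dropping others.
function mergeLayers(layers: ConfigLayer[]): Map<string, ProviderEntry> {
  const merged = new Map<string, ProviderEntry>();
  for (const layer of layers) {
    for (const entry of layer.providers) {
      const prev = merged.get(entry.id);
      merged.set(entry.id, {
        ...prev,
        ...entry,
        headers: { ...prev?.headers, ...entry.headers },
      });
    }
  }
  return merged;
}

const registry = mergeLayers([
  {
    source: "built-in",
    providers: [{
      id: "anthropic",
      name: "Anthropic",
      baseUrl: "https://builtin.example.com/v1/messages", // placeholder, not the real default
      headers: { "anthropic-beta": "prompt-caching-2024-07-31" },
    }],
  },
  {
    source: "~/.maestro/models.json",
    providers: [{
      id: "anthropic",
      baseUrl: "https://proxy.example.com/v1/messages",
      headers: { "X-Proxy-User": "alice" },
    }],
  },
]);

// baseUrl comes from the later layer; name and the beta header survive from built-ins.
console.log(registry.get("anthropic"));
```

If the real loader replaces whole entries instead of merging fields, only the spread lines in mergeLayers would change.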

The Rust control plane keeps the same local fallback, and can also hydrate GET /api/models from the llm-gateway model catalog before applying local Maestro overrides:

  • MAESTRO_LLM_GATEWAY_MODELS_URL points directly at the catalog endpoint.
  • MAESTRO_LLM_GATEWAY_URL derives the catalog URL as <base>/v1/models.
  • MAESTRO_LLM_GATEWAY_TOKEN is sent as a bearer token when set.
  • MAESTRO_LLM_GATEWAY_ORG_ID is sent as X-Organization-ID when set.
  • MAESTRO_LLM_GATEWAY_TIMEOUT_MS defaults to 2500.

If the gateway URL is unset, unavailable, or returns invalid JSON, Maestro falls back to the built-in models and ~/.maestro/models.json.
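The gateway settings above boil down to "pick a catalog URL, attach optional auth headers, apply a timeout". The real logic lives in the Rust control plane. What follows is only a TypeScript sketch of the documented rules, and resolveGatewayRequest is a hypothetical name. Two details are assumptions: MAESTRO_LLM_GATEWAY_MODELS_URL wins when both URL variables are set, and a trailing slash on the base URL is trimmed before /v1/models is appended.

```typescript
type Env = Record<string, string | undefined>;

interface GatewayRequest {
  url: string;
  headers: Record<string, string>;
  timeoutMs: number;
}

// Returns null when no gateway is configured, i.e. "use the local fallback
// (built-in models + ~/.maestro/models.json)".
function resolveGatewayRequest(env: Env): GatewayRequest | null {
  // Assumption: the explicit catalog URL takes precedence over the derived one.
  let url = env.MAESTRO_LLM_GATEWAY_MODELS_URL;
  if (!url && env.MAESTRO_LLM_GATEWAY_URL) {
    // Derive <base>/v1/models, tolerating a trailing slash on the base.
    url = env.MAESTRO_LLM_GATEWAY_URL.replace(/\/+$/, "") + "/v1/models";
  }
  if (!url) return null;

  const headers: Record<string, string> = {};
  if (env.MAESTRO_LLM_GATEWAY_TOKEN) {
    headers["Authorization"] = `Bearer ${env.MAESTRO_LLM_GATEWAY_TOKEN}`;
  }
  if (env.MAESTRO_LLM_GATEWAY_ORG_ID) {
    headers["X-Organization-ID"] = env.MAESTRO_LLM_GATEWAY_ORG_ID;
  }

  // Unset or non-numeric values fall back to the documented 2500 ms default.
  const parsed = Number(env.MAESTRO_LLM_GATEWAY_TIMEOUT_MS);
  const timeoutMs = Number.isFinite(parsed) && parsed > 0 ? parsed : 2500;
  return { url, headers, timeoutMs };
}

const req = resolveGatewayRequest({
  MAESTRO_LLM_GATEWAY_URL: "https://gateway.example.com/",
  MAESTRO_LLM_GATEWAY_TOKEN: "t0ken",
});
console.log(req);
```

A null result, a network error, or an unparseable response body would all lead to the same place: the local fallback.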

Format

Custom config files accept:

{
  "providers": [
    {
      "id": "anthropic",
      "name": "Anthropic",
      "baseUrl": "https://proxy.example.com/v1/messages",
      "headers": { "X-Proxy-User": "alice" }
    },
    {
      "id": "my-provider",
      "name": "My Provider",
      "api": "openai-responses",
      "baseUrl": "https://api.example.com/v1",
      "apiKeyEnv": "MY_PROVIDER_API_KEY",
      "models": [
        {
          "id": "my-model",
          "name": "My Model",
          "reasoning": false,
          "contextWindow": 128000,
          "maxTokens": 4096
        }
      ]
    }
  ],
  "aliases": {
    "fast": "anthropic/claude-haiku"
  }
}
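For tooling that reads or writes these files, the format above can be described with TypeScript types. These interfaces are inferred from the example JSON only. They are not exported by Maestro, and the real schema may accept more fields. The sketch also shows one plausible way an alias such as "fast" resolves to a provider/model pair. resolveModelRef is hypothetical, and splitting on the first "/" is an assumption that keeps IDs like openai/o4 intact under a provider prefix.

```typescript
// Hypothetical types inferred from the example config; not Maestro's exported schema.
interface ModelEntry {
  id: string;
  name: string;
  reasoning?: boolean;
  contextWindow?: number;
  maxTokens?: number;
}

interface ProviderEntry {
  id: string;
  name: string;
  api?: string;                     // e.g. "openai-responses"
  baseUrl?: string;
  headers?: Record<string, string>;
  apiKeyEnv?: string;               // name of the env var that holds the key
  models?: ModelEntry[];            // omitted => override-only entry
}

interface CustomConfig {
  providers?: ProviderEntry[];
  aliases?: Record<string, string>; // alias -> "provider/model"
}

// Resolve an alias (or a plain "provider/model" ref) into its two parts.
// Splits on the FIRST slash so "openrouter/openai/o4" keeps "openai/o4" as the model.
function resolveModelRef(config: CustomConfig, ref: string): { provider: string; model: string } {
  const target = config.aliases?.[ref] ?? ref;
  const slash = target.indexOf("/");
  if (slash <= 0) throw new Error(`expected "provider/model", got "${target}"`);
  return { provider: target.slice(0, slash), model: target.slice(slash + 1) };
}

const config: CustomConfig = JSON.parse(`{
  "providers": [{
    "id": "my-provider", "name": "My Provider", "api": "openai-responses",
    "baseUrl": "https://api.example.com/v1", "apiKeyEnv": "MY_PROVIDER_API_KEY",
    "models": [{ "id": "my-model", "name": "My Model", "contextWindow": 128000, "maxTokens": 4096 }]
  }],
  "aliases": { "fast": "anthropic/claude-haiku" }
}`);

// Resolves to provider "anthropic", model "claude-haiku".
console.log(resolveModelRef(config, "fast"));
```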

Factory files follow their own schema; Maestro maps Factory model IDs to providers internally (see factoryDataCache.modelProviderMap).

OpenAI-compat overrides

Some OpenAI-compatible vendors require small request-shape tweaks (token field, developer role support, tool result quirks, etc.). You can override Maestro’s auto-detection per model via compat:

{
  "providers": [
    {
      "id": "mistral",
      "name": "Mistral",
      "api": "openai-completions",
      "baseUrl": "https://api.mistral.ai/v1",
      "models": [
        {
          "id": "mistral-large",
          "name": "Mistral Large",
          "contextWindow": 128000,
          "maxTokens": 8192,
          "compat": {
            "maxTokensField": "max_tokens",
            "requiresToolResultName": true,
            "requiresThinkingAsText": true,
            "requiresMistralToolIds": true
          }
        }
      ]
    }
  ]
}

Supported compat fields:

  • supportsStore (bool) – whether to send store: false (OpenAI only).
  • supportsDeveloperRole (bool) – if false, Maestro uses system instead.
  • supportsReasoningEffort (bool) – gates reasoning_effort.
  • supportsResponsesApi (bool) – allow openai-responses against this endpoint.
  • maxTokensField (string) – which field carries the output token limit: "max_tokens" or "max_completion_tokens".
  • requiresToolResultName (bool) – include name on tool result messages.
  • requiresAssistantAfterToolResult (bool) – insert a synthetic assistant bridge.
  • requiresThinkingAsText (bool) – wraps thinking blocks into <thinking> text.
  • requiresMistralToolIds (bool) – normalize tool call IDs to Mistral’s 9‑char form.

Common OpenAI-compatible defaults:

  • OpenAI: supportsStore=true, supportsDeveloperRole=true, supportsReasoningEffort=true, maxTokensField="max_completion_tokens"
  • Azure/OpenRouter/Groq/Cerebras: supportsStore=false, supportsDeveloperRole=false, supportsReasoningEffort=false, maxTokensField="max_tokens"
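The sketch below shows how the request-shape flags might change a Chat Completions body. It covers only the four flags that map directly to request fields. OPENAI_DEFAULTS and GENERIC_COMPAT_DEFAULTS mirror the two default lists above, and buildBody is a hypothetical helper, not Maestro's provider code. Per-model compat values are assumed to override provider defaults field by field.

```typescript
interface OpenAICompat {
  supportsStore?: boolean;
  supportsDeveloperRole?: boolean;
  supportsReasoningEffort?: boolean;
  maxTokensField?: "max_tokens" | "max_completion_tokens";
}

// Mirrors the "OpenAI" defaults listed above.
const OPENAI_DEFAULTS: Required<OpenAICompat> = {
  supportsStore: true,
  supportsDeveloperRole: true,
  supportsReasoningEffort: true,
  maxTokensField: "max_completion_tokens",
};

// Mirrors the "Azure/OpenRouter/Groq/Cerebras" defaults listed above.
const GENERIC_COMPAT_DEFAULTS: Required<OpenAICompat> = {
  supportsStore: false,
  supportsDeveloperRole: false,
  supportsReasoningEffort: false,
  maxTokensField: "max_tokens",
};

interface RequestInput {
  model: string;
  instructions: string;
  userText: string;
  maxTokens: number;
  reasoningEffort?: "low" | "medium" | "high";
}

// Build a Chat Completions body; per-model overrides win over provider defaults.
function buildBody(
  input: RequestInput,
  defaults: Required<OpenAICompat>,
  override: OpenAICompat = {},
): Record<string, unknown> {
  const c = { ...defaults, ...override };
  const body: Record<string, unknown> = {
    model: input.model,
    messages: [
      // Fall back to "system" when the endpoint rejects the developer role.
      { role: c.supportsDeveloperRole ? "developer" : "system", content: input.instructions },
      { role: "user", content: input.userText },
    ],
    [c.maxTokensField]: input.maxTokens,
  };
  if (c.supportsStore) body.store = false; // opt out of server-side storage
  if (c.supportsReasoningEffort && input.reasoningEffort) {
    body.reasoning_effort = input.reasoningEffort;
  }
  return body;
}

const mistralBody = buildBody(
  { model: "mistral-large", instructions: "Be concise.", userText: "Hi", maxTokens: 8192 },
  GENERIC_COMPAT_DEFAULTS,
  { maxTokensField: "max_tokens" },
);
const openaiBody = buildBody(
  { model: "my-model", instructions: "Be concise.", userText: "Hi", maxTokens: 8192, reasoningEffort: "low" },
  OPENAI_DEFAULTS,
);
console.log(mistralBody, openaiBody);
```

Flags that rewrite message history, such as requiresToolResultName or requiresMistralToolIds, would need a pass over the messages array. This sketch leaves them out.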

Override-only providers

If a provider entry omits models, it is treated as an override for built-in providers (matched by id). In this mode, baseUrl and headers are applied to all built-in models for that provider. Provider headers are merged with model headers (model-specific headers win).
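The override-only rules can be sketched as a map over built-in models. Matching models take the override baseUrl and a merged header set. Provider headers are spread first, so model-specific headers win on conflict. BuiltInModel, ProviderOverride and applyOverride are hypothetical names used for illustration.

```typescript
interface BuiltInModel {
  id: string;
  provider: string;
  baseUrl: string;
  headers?: Record<string, string>;
}

interface ProviderOverride {
  id: string;                        // matched against BuiltInModel.provider
  baseUrl?: string;
  headers?: Record<string, string>;
  models?: unknown[];                // present => full provider definition, not an override
}

function applyOverride(models: BuiltInModel[], override: ProviderOverride): BuiltInModel[] {
  if (override.models) return models; // not override-only; handled elsewhere
  return models.map((m) =>
    m.provider !== override.id
      ? m
      : {
          ...m,
          baseUrl: override.baseUrl ?? m.baseUrl,
          // Provider headers first, so model-specific headers win on conflict.
          headers: { ...override.headers, ...m.headers },
        },
  );
}

const builtIns: BuiltInModel[] = [
  { id: "kimi-k2.6", provider: "moonshot", baseUrl: "https://api.moonshot.ai/v1", headers: { "X-Trace": "model" } },
  { id: "deepseek-chat", provider: "deepseek", baseUrl: "https://api.deepseek.com/v1" },
];

const patched = applyOverride(builtIns, {
  id: "moonshot",
  baseUrl: "https://api.moonshot.cn/v1",
  headers: { "X-Trace": "provider", "X-Proxy-User": "alice" },
});
console.log(patched);
```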

Provider Loaders

Some providers need runtime detection (API keys, regions). The PROVIDER_LOADERS map injects defaults:

| Provider | Behavior |
| --- | --- |
| anthropic | Adds anthropic-beta: prompt-caching-2024-07-31 header |
| bedrock | Uses AWS_PROFILE / AWS_ACCESS_KEY_ID to toggle enabled |
| vertex-ai | Reads GOOGLE_CLOUD_PROJECT / GCP_PROJECT for base URL |
| groq | Auto-enables when GROQ_API_KEY is present |

See src/models/registry.ts for the full list.
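The loader behaviors above can be pictured as a map from provider id to a function of the environment. This is a hypothetical re-creation, not the real PROVIDER_LOADERS, and its result shape (enabled, baseUrl, headers) is assumed. The vertex-ai entry uses the public Vertex AI regional endpoint pattern with a hard-coded region "us-central1". That region is purely an assumption, since the table does not say how Maestro chooses one.

```typescript
type Env = Record<string, string | undefined>;

interface LoaderResult {
  enabled?: boolean;
  baseUrl?: string;
  headers?: Record<string, string>;
}

// Hypothetical stand-in for PROVIDER_LOADERS; shapes and details are assumptions.
const PROVIDER_LOADERS_SKETCH: Record<string, (env: Env) => LoaderResult> = {
  anthropic: () => ({ headers: { "anthropic-beta": "prompt-caching-2024-07-31" } }),
  bedrock: (env) => ({ enabled: Boolean(env.AWS_PROFILE || env.AWS_ACCESS_KEY_ID) }),
  "vertex-ai": (env) => {
    const project = env.GOOGLE_CLOUD_PROJECT ?? env.GCP_PROJECT;
    const region = "us-central1"; // assumption for illustration only
    return project
      ? { baseUrl: `https://${region}-aiplatform.googleapis.com/v1/projects/${project}/locations/${region}` }
      : {};
  },
  groq: (env) => ({ enabled: Boolean(env.GROQ_API_KEY) }),
};

// Run every loader against one environment snapshot.
function runLoaders(env: Env): Record<string, LoaderResult> {
  const out: Record<string, LoaderResult> = {};
  for (const [id, loader] of Object.entries(PROVIDER_LOADERS_SKETCH)) {
    out[id] = loader(env);
  }
  return out;
}

console.log(runLoaders({ GROQ_API_KEY: "gsk-example", GCP_PROJECT: "my-project" }));
```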

Chinese Model Providers (DeepSeek, Kimi, Qwen, MiniMax, GLM)

Maestro ships built-in support for the major Chinese frontier providers. All of them expose OpenAI-compatible Chat Completions endpoints, so they use api: "openai-completions" and the standard OpenAI request shape (no custom compat flags required). Reasoning models that stream a reasoning_content field (DeepSeek Reasoner, Kimi Thinking, MiniMax M-series, GLM) surface their chain-of-thought automatically.

| Provider | provider id | Default base URL (international) | China-mainland base URL | API key env |
| --- | --- | --- | --- | --- |
| DeepSeek | deepseek | https://api.deepseek.com/v1 | (same) | DEEPSEEK_API_KEY |
| Moonshot (Kimi) | moonshot | https://api.moonshot.ai/v1 | https://api.moonshot.cn/v1 | MOONSHOT_API_KEY (or KIMI_API_KEY) |
| Alibaba Qwen (DashScope) | dashscope | https://dashscope-intl.aliyuncs.com/compatible-mode/v1 | https://dashscope.aliyuncs.com/compatible-mode/v1 | DASHSCOPE_API_KEY (or QWEN_API_KEY) |
| MiniMax | minimax | https://api.minimax.io/v1 | https://api.minimaxi.com/v1 | MINIMAX_API_KEY |
| Z.ai (Zhipu GLM) | zai | https://api.z.ai/api/coding/paas/v4 | https://open.bigmodel.cn/api/paas/v4 | ZAI_API_KEY |
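The provider table above can be encoded as data, which helps scripts pick the regional endpoint and find a key. The URLs and variable names below are copied from the table. ENDPOINTS and resolveEndpoint are hypothetical. The precedence rule is also an assumption: the primary variable (e.g. MOONSHOT_API_KEY) is checked before its alias (KIMI_API_KEY).

```typescript
type Region = "international" | "china";
type Env = Record<string, string | undefined>;

interface ProviderEndpoints {
  baseUrls: Record<Region, string>;
  apiKeyEnvs: string[]; // primary variable first, then the documented alias
}

// Values copied from the provider table above.
const ENDPOINTS: Record<string, ProviderEndpoints> = {
  deepseek: {
    baseUrls: { international: "https://api.deepseek.com/v1", china: "https://api.deepseek.com/v1" },
    apiKeyEnvs: ["DEEPSEEK_API_KEY"],
  },
  moonshot: {
    baseUrls: { international: "https://api.moonshot.ai/v1", china: "https://api.moonshot.cn/v1" },
    apiKeyEnvs: ["MOONSHOT_API_KEY", "KIMI_API_KEY"],
  },
  dashscope: {
    baseUrls: {
      international: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
      china: "https://dashscope.aliyuncs.com/compatible-mode/v1",
    },
    apiKeyEnvs: ["DASHSCOPE_API_KEY", "QWEN_API_KEY"],
  },
  minimax: {
    baseUrls: { international: "https://api.minimax.io/v1", china: "https://api.minimaxi.com/v1" },
    apiKeyEnvs: ["MINIMAX_API_KEY"],
  },
  zai: {
    baseUrls: { international: "https://api.z.ai/api/coding/paas/v4", china: "https://open.bigmodel.cn/api/paas/v4" },
    apiKeyEnvs: ["ZAI_API_KEY"],
  },
};

function resolveEndpoint(
  providerId: string,
  region: Region,
  env: Env,
): { baseUrl: string; apiKey: string | undefined } {
  const info = ENDPOINTS[providerId];
  if (!info) throw new Error(`unknown provider: ${providerId}`);
  // First non-empty variable wins (assumed precedence).
  const keyVar = info.apiKeyEnvs.find((name) => Boolean(env[name]));
  return { baseUrl: info.baseUrls[region], apiKey: keyVar ? env[keyVar] : undefined };
}

console.log(resolveEndpoint("moonshot", "china", { KIMI_API_KEY: "sk-example" }));
```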

Representative built-in models:

  • DeepSeek: deepseek-chat (non-thinking), deepseek-reasoner (thinking), deepseek-v4-flash, deepseek-v4-pro. deepseek-chat / deepseek-reasoner are stable aliases DeepSeek keeps pointed at the latest weights.
  • Moonshot: kimi-k2.6, kimi-k2.5, kimi-k2-thinking, kimi-k2-0905-preview, kimi-k2-turbo-preview, kimi-latest, moonshot-v1-128k.
  • Qwen: qwen3-max, qwen-max, qwen-plus, qwen-turbo, qwen3-coder-plus, qwen3-coder-flash, qwq-32b, qwen-vl-max.
  • MiniMax: MiniMax-M2, MiniMax-M2.5, MiniMax-M2.7, MiniMax-Text-01.
  • GLM: glm-4.6, glm-4.5, glm-4.5-air, glm-4.5v, glm-4.5-flash.

Usage:

export DEEPSEEK_API_KEY=sk-...
maestro --model deepseek/deepseek-reasoner

export MOONSHOT_API_KEY=sk-...      # KIMI_API_KEY also works
maestro --model moonshot/kimi-k2.6

To point a provider at its mainland (or a self-hosted) endpoint without editing the registry, add an override-only entry in ~/.maestro/config.json:

{
  "providers": [
    { "id": "moonshot", "name": "Moonshot", "baseUrl": "https://api.moonshot.cn/v1" }
  ]
}

The override baseUrl is applied to every built-in model for that provider. Moonshot, DeepSeek, MiniMax, and GLM also offer Anthropic-compatible endpoints (/anthropic); to use one, define a custom provider with api: "anthropic-messages" pointed at that base URL.
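The snippet below generates such a custom provider entry as JSON, ready to merge into the providers array of ~/.maestro/config.json. The provider id "deepseek-anthropic" is hypothetical, chosen so it does not collide with the built-in deepseek provider. The base URL https://api.deepseek.com/anthropic is an assumption based on the /anthropic convention described above, so check it against the vendor's documentation. Models may also need fields such as contextWindow and maxTokens. This sketch includes only id and name.

```typescript
interface ModelEntry {
  id: string;
  name: string;
}

interface AnthropicCompatProvider {
  id: string;
  name: string;
  api: "anthropic-messages";
  baseUrl: string;
  apiKeyEnv: string;
  models: ModelEntry[];
}

// Build a custom provider entry that targets a vendor's Anthropic-compatible endpoint.
function anthropicCompatProvider(
  id: string,
  name: string,
  baseUrl: string,
  apiKeyEnv: string,
  modelIds: string[],
): AnthropicCompatProvider {
  return {
    id,
    name,
    api: "anthropic-messages",
    baseUrl: baseUrl.replace(/\/+$/, ""), // drop any trailing slash
    apiKeyEnv,
    models: modelIds.map((m) => ({ id: m, name: m })),
  };
}

const snippet = {
  providers: [
    anthropicCompatProvider(
      "deepseek-anthropic",                 // hypothetical id
      "DeepSeek (Anthropic API)",
      "https://api.deepseek.com/anthropic", // assumed URL; verify with the vendor
      "DEEPSEEK_API_KEY",
      ["deepseek-chat"],
    ),
  ],
};

console.log(JSON.stringify(snippet, null, 2));
```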

Built-in Overlays (Responses API)

Maestro seeds a few Responses-capable models that aren’t yet emitted by the generator, so you can use them out of the box:

  • OpenRouter (Responses API): openai/o4, openai/o4-mini, and their :online variants, all routed to https://openrouter.ai/api/v1/responses.
  • Groq (Responses API): openai/gpt-oss-20b, openai/gpt-oss-120b, routed via Groq’s OpenAI-compatible endpoint https://api.groq.com/openai/v1/responses.
  • OpenAI Codex (Codex app-server + ChatGPT sign-in): gpt-5.1, gpt-5.1-codex-max, gpt-5.1-codex-mini, gpt-5.2, gpt-5.2-codex, gpt-5.3-codex, gpt-5.3-codex-spark, gpt-5.4, gpt-5.4-mini, and gpt-5.5 under the openai-codex provider. These use api: "openai-codex-app-server" and require maestro codex login, which signs in with ChatGPT through Codex app-server. Published Maestro installs use the packaged @openai/codex app-server first, while source checkouts fall back to a codex binary on PATH, so codex login and maestro codex login share the same Codex-owned CODEX_HOME auth state. Use maestro codex status to inspect the current Codex-owned sign-in, maestro codex login --force to refresh it, and maestro codex login --device-auth on remote or headless machines.

To add more Responses-capable models (or override these), drop them into .maestro/config.json with api: "openai-responses"; Maestro will normalize the base URL to /responses automatically.
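The base-URL normalization mentioned above likely amounts to "end the URL with /responses exactly once". The helper below is a hypothetical sketch of that rule. Maestro's actual normalizer may also handle cases this one ignores, such as a base that ends in /chat/completions.

```typescript
// Ensure a Responses API base URL ends with "/responses" exactly once.
function normalizeResponsesUrl(baseUrl: string): string {
  const trimmed = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return trimmed.endsWith("/responses") ? trimmed : `${trimmed}/responses`;
}

for (const url of [
  "https://openrouter.ai/api/v1",
  "https://api.groq.com/openai/v1/responses",
  "https://api.example.com/v1/",
]) {
  console.log(`${url} -> ${normalizeResponsesUrl(url)}`);
}
```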

Codex models are deliberately separated from the regular openai provider. openai uses Platform API keys or OpenAI Platform OAuth exchange, while openai-codex uses Codex app-server account/read, account/login/start, thread/start, and turn/start so Codex owns ChatGPT OAuth refresh and local thread execution. Maestro should not copy Codex ChatGPT tokens into its normal provider key store for app-server runs.

Legacy custom models that explicitly use api: "openai-codex-responses" still need stored ChatGPT OAuth credentials for direct backend Responses calls. Use /login openai-codex:responses for that compatibility path; the default /login openai-codex and maestro codex login continue to use Codex app-server.

Responses API Compatibility Notes (Tools)

When api: "openai-responses" is enabled for a model, Maestro must filter tool definitions to match Responses API schema constraints.

In particular, Maestro filters out any tool whose parameters JSON Schema contains these keywords at the top level:

  • oneOf, anyOf, allOf
  • enum
  • not

This filtering is implemented in filterResponsesApiTools() (src/agent/providers/openai.ts). When tools are filtered, Maestro logs a warning listing the affected tool names (src/agent/providers/openai-responses-sdk.ts).

Background:

  • OpenAI’s Structured Outputs docs describe the supported JSON Schema subset and the requirement that the root schema not be anyOf and that some keywords (including allOf / not) are not supported. See: https://platform.openai.com/docs/guides/structured-outputs/supported-schemas and https://platform.openai.com/docs/guides/structured-outputs/some-type-specific-keywords-are-not-yet-supported

Workaround: wrap constrained values inside an object schema (nest under properties) so the top-level schema remains an object:

// ❌ filtered (top-level enum)
{ "enum": ["a", "b", "c"] }

// ✅ compatible (enum nested under properties)
{
  "type": "object",
  "properties": { "value": { "enum": ["a", "b", "c"] } }
}
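The filtering rule above can be sketched as a check for the listed keywords among the top-level keys of each tool's parameters schema, with a warning that lists the dropped names. This is an illustrative re-implementation and not filterResponsesApiTools itself. ToolDef, filterToolsForResponses and the warning text are assumptions. The three sample tools mirror the filtered and compatible schemas from the workaround.

```typescript
interface ToolDef {
  name: string;
  description?: string;
  parameters: Record<string, unknown>; // JSON Schema object
}

// Keywords that disqualify a tool when they appear at the top level of its schema.
const UNSUPPORTED_TOP_LEVEL = ["oneOf", "anyOf", "allOf", "enum", "not"];

function filterToolsForResponses(tools: ToolDef[]): { kept: ToolDef[]; dropped: string[] } {
  const kept: ToolDef[] = [];
  const dropped: string[] = [];
  for (const tool of tools) {
    // Only top-level keys are inspected; nested keywords under properties are allowed.
    const unsupported = UNSUPPORTED_TOP_LEVEL.some((k) =>
      Object.prototype.hasOwnProperty.call(tool.parameters, k),
    );
    if (unsupported) {
      dropped.push(tool.name);
    } else {
      kept.push(tool);
    }
  }
  if (dropped.length > 0) {
    console.warn(`Responses API: filtered tools with unsupported top-level schema keywords: ${dropped.join(", ")}`);
  }
  return { kept, dropped };
}

const { kept, dropped } = filterToolsForResponses([
  { name: "read_file", parameters: { type: "object", properties: { path: { type: "string" } } } },
  { name: "pick_mode", parameters: { enum: ["a", "b", "c"] } },
  { name: "set_mode", parameters: { type: "object", properties: { value: { enum: ["a", "b", "c"] } } } },
]);
console.log(kept.map((t) => t.name), dropped);
```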

Note: ChatGPT Codex subscription access is for personal subscription use. Production and organization workflows should prefer OpenAI Platform or the EvalOps managed gateway.

For EvalOps managed gateway models, run maestro evalops login locally. The login flow uses the Identity Google callback, stores the returned organization metadata with the local OAuth credential, and then routes models such as evalops/gpt-4o-mini through MAESTRO_LLM_GATEWAY_URL.

Factory Commands

  • /import factory or npm run factory:import – copies ~/.factory config + provider metadata into Maestro’s store. Handy after updating models in Factory CLI.
  • /export factory or npm run factory:export – pushes Maestro’s provider data back to Factory files.

These commands ensure both CLIs stay in sync while still allowing standalone configs.

Tips

  • Use maestro models list (or /models) to inspect the final registry, including custom entries and their providers.
  • Keep secrets out of repo files; rely on MAESTRO_MODELS_FILE plus env vars for headers.
  • A LOG_MAESTRO_MODELS=1 flag that would dump the path resolution order is planned but not implemented yet; until then, add debug logs around getRegisteredModels() when troubleshooting.
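Until that flag exists, a small script can at least show which config sources are present on a machine, in the documented order. The script does not call getRegisteredModels(), whose signature is not documented here. It only checks files. configSourceReport is a hypothetical helper. It also assumes that MAESTRO_CONFIG, when set, replaces the default ~/.maestro/config.json path, which the "(via MAESTRO_CONFIG)" note does not spell out.

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Candidate file sources in the documented resolution order (built-ins are not files).
function configSourceReport(
  env: Record<string, string | undefined> = process.env,
): { path: string; exists: boolean }[] {
  const home = homedir();
  const candidates = [
    join(home, ".factory", "config.json"),
    join(home, ".factory", "settings.json"),
    join(home, ".maestro", "models.json"),
    // Assumption: MAESTRO_CONFIG, when set, replaces the default config.json path.
    env.MAESTRO_CONFIG ?? join(home, ".maestro", "config.json"),
  ];
  if (env.MAESTRO_MODELS_FILE) candidates.push(env.MAESTRO_MODELS_FILE);
  return candidates.map((path) => ({ path, exists: existsSync(path) }));
}

for (const { path, exists } of configSourceReport()) {
  console.debug(`[models] ${exists ? "found  " : "missing"} ${path}`);
}
```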