
feat: secret rotation #909

Open
rohan-chaturvedi wants to merge 133 commits into main from feat--secret-rotation

rohan-chaturvedi commented Jun 7, 2026

Summary

Adds Rotating Secrets: credentials for third-party services that Phase mints, exposes, and revokes automatically on a schedule. The active credential surfaces alongside regular secrets in an Environment — SDKs, CLI, REST clients, and sync integrations pick up rotated values with no application-side changes.

Initial providers: LiteLLM virtual keys and OpenAI project service-account keys. The architecture is provider-agnostic; adding a new provider is a single file + a one-line registry entry.
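
A minimal sketch of what "a single file + a one-line registry entry" could look like. The `RotationProvider` protocol, the `register` decorator, and `ExampleProvider` are hypothetical names for illustration, not the classes in this PR; only `validate_root_credentials` is a name taken from the description.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class MintedCredential:
    provider_id: str   # provider-side identifier, needed later for revoke
    values: dict       # output key -> plaintext value (encrypted before storage)


class RotationProvider(Protocol):
    """Hypothetical provider interface; method names are illustrative."""

    def validate_root_credentials(self, root: dict) -> None: ...
    def mint(self, root: dict, config: dict) -> MintedCredential: ...
    def revoke(self, root: dict, provider_id: str) -> None: ...


PROVIDERS: dict = {}  # slug -> provider class


def register(slug: str):
    """Class decorator acting as the one-line registry entry."""
    def wrap(cls):
        if slug in PROVIDERS:
            raise ValueError(f"provider {slug!r} already registered")
        PROVIDERS[slug] = cls
        return cls
    return wrap


@register("example")
class ExampleProvider:
    """Stand-in provider that mints fake keys in memory (no network calls)."""

    def __init__(self) -> None:
        self._live = set()
        self._counter = 0

    def validate_root_credentials(self, root: dict) -> None:
        if not root.get("api_key"):
            raise ValueError("api_key is required")

    def mint(self, root: dict, config: dict) -> MintedCredential:
        self._counter += 1
        provider_id = f"key-{self._counter}"
        self._live.add(provider_id)
        return MintedCredential(provider_id, {"EXAMPLE_API_KEY": f"secret-{self._counter}"})

    def revoke(self, root: dict, provider_id: str) -> None:
        # Discarding an id that is already gone is a no-op, so revoke is idempotent.
        self._live.discard(provider_id)


def get_provider(slug: str) -> RotationProvider:
    return PROVIDERS[slug]()
```

With this shape, the engine only ever talks to `get_provider(slug)` and never imports a provider module directly.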

Server-side encryption (SSE) must be enabled on the App, because Phase needs to encrypt the minted value with the Environment key on the server.

Motivation

Users currently manage third-party credentials (API keys, gateway keys) as static secrets and rotate them by hand. In practice that means cron jobs on bespoke infrastructure, or no rotation at all. This PR lets the Console handle the full lifecycle: mint → expose → expire → revoke, with audit trails and provider-side cleanup.

The rotated value flows through the same paths a normal secret does, so nothing in your application needs to know it's rotating — just point at the env var and Phase keeps the value fresh.

How it works

Each Rotating Secret is a config row (provider, root credentials, schedule, output key map) plus a stream of minted Credentials:

  1. Mint — at the configured cadence, an RQ job calls the provider to create a fresh credential. Values are encrypted with the Environment key and stored on RotatingSecretCredential.encrypted_values.
  2. Expose — the active credential is merged into the standard env-secret read path at 5 sites (GraphQL env resolver, folder resolver, REST E2EE view, REST public view, sync fetcher). No Secret rows are materialised — synthetic instances are constructed at read time and slot into the existing serializers.
  3. Expire — the previous credential enters its Expiring window. It remains valid at the provider until the revocation delay elapses, giving consumers time to pick up the new value.
  4. Revoke — a scheduled job calls the provider's revoke endpoint. The credential transitions to Revoked. A 404 is treated as success (idempotent — handles out-of-band cleanup).
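
The Expose step can be pictured roughly as follows. This is a simplified, framework-free sketch: `SecretRow`, `CredentialRecord`, and `read_environment` are hypothetical stand-ins for the real models and resolvers, and the sketch assumes values are already decrypted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecretRow:
    key: str
    value: str
    path: str = "/"
    rotating: bool = False   # True for synthetic rows, which are never persisted


@dataclass
class CredentialRecord:
    status: str              # e.g. "active", "expiring", "revoked"
    values: dict             # output key -> decrypted value
    path: str = "/"


def synthetic_rows(credentials, path="/"):
    """Build in-memory rows from the active credential(s) at the given path."""
    return [
        SecretRow(key=k, value=v, path=c.path, rotating=True)
        for c in credentials
        if c.status == "active" and c.path == path
        for k, v in c.values.items()
    ]


def read_environment(stored, credentials, path="/"):
    """Return stored secrets for the path followed by synthetic rotating rows."""
    rows = [s for s in stored if s.path == path]
    return rows + synthetic_rows(credentials, path)
```

Because the synthetic rows share the shape of stored rows, downstream serializers in this sketch do not need a separate code path for them.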

Every state transition writes a RotatingSecretEvent with actor, IP, user agent, and a sanitised provider response excerpt. Reads of the active value flow through the standard SecretEvent audit trail.
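
As a rough illustration of the lifecycle and its event trail, the sketch below models a credential as a small state machine that records one event per transition. The transition table, the `apply_revoke_result` helper, and the mapping of 401/403 to non-retryable auth errors are assumptions for illustration; the real engine also records IP, user agent, and a sanitised provider excerpt.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    EXPIRING = "expiring"
    REVOKED = "revoked"
    REVOKE_FAILED = "revoke_failed"


# Assumed transition table; the PR's actual rules may differ.
ALLOWED = {
    Status.ACTIVE: {Status.EXPIRING},
    Status.EXPIRING: {Status.REVOKED, Status.REVOKE_FAILED},
    Status.REVOKE_FAILED: {Status.REVOKED},
    Status.REVOKED: set(),
}


@dataclass
class Credential:
    provider_id: str
    status: Status = Status.ACTIVE
    events: list = field(default_factory=list)

    def transition(self, new: Status, actor: str) -> None:
        """Move to a new status and append an audit event for the change."""
        if new not in ALLOWED[self.status]:
            raise ValueError(f"illegal transition {self.status.value} -> {new.value}")
        self.events.append({"from": self.status.value, "to": new.value, "actor": actor})
        self.status = new


def apply_revoke_result(cred: Credential, http_status: int, actor: str = "scheduler") -> str:
    """Interpret the provider's revoke response for an expiring credential."""
    if http_status == 404 or 200 <= http_status < 300:
        # 404 means the credential is already gone, so it counts as success.
        cred.transition(Status.REVOKED, actor)
        return "done"
    if http_status in (401, 403):
        cred.transition(Status.REVOKE_FAILED, actor)
        return "failed"
    return "retry"  # treated as transient: no state change, try again later
```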

The engine uses select_for_update() row locks on the parent row inside transaction.atomic() so a manual rotate can't race a scheduled rotation, and a manual revoke can't race the scheduled-revoke job.

Failure handling

  • Initial mint is synchronous inside CreateRotatingSecretMutation. If it fails, the transaction rolls back and the user sees the provider's error — nothing partial is left behind.
  • Scheduled mints retry on transient errors with exponential backoff (60s → 5m → 30m → 2h). After 5 consecutive transient failures, or any auth/config/quota error, the schedule pauses and health is set to FAILED.
  • Revoke retries on transient errors up to 24 hours total; auth/config errors mark the credential REVOKE_FAILED immediately.
  • Partial mints (DB write or encryption fails after the provider returned a credential) trigger a compensating revoke; if that also fails an ORPHANED_CREDENTIAL event is recorded with the provider-side id for manual cleanup.
  • Pre-save credential validation — the create-credentials UI calls validate_root_credentials against the provider before persisting, so bad keys never make it into the rotation config.
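
The scheduled-mint retry policy above could be expressed as a small pure function. This is one reading of the description, not the engine's code: the error-kind strings, the function name, and the choice to index the delay by the failure count are assumptions.

```python
from typing import Optional

BACKOFF_SECONDS = [60, 300, 1800, 7200]      # 60s -> 5m -> 30m -> 2h
MAX_TRANSIENT_FAILURES = 5
NON_RETRYABLE = {"auth", "config", "quota"}  # assumed labels for error classes


def next_retry_delay(error_kind: str, consecutive_failures: int) -> Optional[int]:
    """Return seconds to wait before the next mint attempt, or None to pause.

    consecutive_failures includes the failure just observed (1 for the first).
    """
    if consecutive_failures < 1:
        raise ValueError("consecutive_failures must be >= 1")
    if error_kind in NON_RETRYABLE:
        return None   # pause immediately; retrying will not help
    if consecutive_failures >= MAX_TRANSIENT_FAILURES:
        return None   # fifth transient failure in a row: pause the schedule
    return BACKOFF_SECONDS[consecutive_failures - 1]
```

Under this reading, the four delays cover failures one through four, and the fifth consecutive failure pauses the schedule instead of scheduling another retry.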

Supported providers

LiteLLM

Mints virtual keys via POST /key/generate, revokes via POST /key/delete. Supports the full LiteLLM policy surface (models, budgets, rate limits, metadata, aliases, permissions). The Create dialog has an Import config tab that fetches an existing key's policy via /key/info and shows it as editable JSON so users can mirror an existing key.

Recommends scoped management keys (proxy-admin user with allowed_routes whitelist) over the master key.
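
For orientation, the two LiteLLM calls might be constructed as below. This sketch only builds the requests and sends nothing. The helper names and the example host are hypothetical, and the policy fields shown (`models`, `max_budget`) and the `{"keys": [...]}` delete body follow LiteLLM's public key-management docs as I understand them, so check them against the proxy version in use.

```python
import json
from urllib.request import Request


def _post(base_url: str, route: str, management_key: str, payload: dict) -> Request:
    """Build (but do not send) an authenticated JSON POST to the proxy."""
    return Request(
        base_url.rstrip("/") + route,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {management_key}",
            "Content-Type": "application/json",
        },
    )


def build_generate_request(base_url: str, management_key: str, policy: dict) -> Request:
    """Request that would mint a virtual key with the given policy."""
    return _post(base_url, "/key/generate", management_key, policy)


def build_delete_request(base_url: str, management_key: str, virtual_key: str) -> Request:
    """Request that would revoke a previously minted virtual key."""
    return _post(base_url, "/key/delete", management_key, {"keys": [virtual_key]})
```

Passing either request to `urllib.request.urlopen` would perform the call; with a scoped management key, only these routes need to be allowed.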

OpenAI

Mints project service-account keys via POST /v1/organization/projects/{project_id}/service_accounts — minted keys are scoped to one project, not org-wide. The create dialog calls /organization/projects to render a project picker so users select by name instead of pasting a proj_xxx.

Revoke deletes the service account (no separate key-revoke endpoint exists in OpenAI's API). Rotation creates a new SA each cycle and deletes the prior one — naming uses a deterministic phase-rs-<rs_id>-<epoch> template so orphans are traceable.
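
A small sketch of the `phase-rs-<rs_id>-<epoch>` naming template, with a parser that could help trace an orphaned service account back to its rotating secret. The function names are hypothetical, and treating `<epoch>` as integer Unix seconds is an assumption.

```python
import re
import time
from typing import Optional, Tuple

# Greedy rs_id so ids containing hyphens (e.g. UUIDs) still parse correctly.
NAME_RE = re.compile(r"^phase-rs-(?P<rs_id>.+)-(?P<epoch>\d+)$")


def service_account_name(rs_id: str, now: Optional[float] = None) -> str:
    """Build the deterministic service-account name for one rotation cycle."""
    epoch = int(time.time() if now is None else now)
    return f"phase-rs-{rs_id}-{epoch}"


def parse_service_account_name(name: str) -> Optional[Tuple[str, int]]:
    """Return (rs_id, epoch) for names this template produced, else None."""
    match = NAME_RE.match(name)
    if match is None:
        return None
    return match["rs_id"], int(match["epoch"])
```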

Permissions

Two separate resources, intentionally:

  • Secrets — gates reading the rotated value (the synthetic row goes through the normal secret-read path).
  • RotatingSecrets — gates managing the config: create, edit, pause/resume, delete, manual rotate, revoke credential.

This lets you grant Secrets:read broadly without letting every Developer pause/edit/delete rotation configs (which calls real provider APIs and can incur cost). Defaults: Owner/Admin/Manager full, Developer/Service read-only.
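
The default split can be sketched as a lookup table. The action names (`create`, `read`, `update`, `delete`) and the mapping from UI operations to actions are assumptions for illustration; only the role names and the full versus read-only defaults come from the description above.

```python
FULL = frozenset({"create", "read", "update", "delete"})
READ_ONLY = frozenset({"read"})

# Defaults for the RotatingSecrets resource, per the description above.
ROTATING_SECRETS_DEFAULTS = {
    "Owner": FULL,
    "Admin": FULL,
    "Manager": FULL,
    "Developer": READ_ONLY,
    "Service": READ_ONLY,
}

# Hypothetical mapping from management operations to the action they require.
OPERATION_ACTION = {
    "view": "read",
    "create": "create",
    "edit": "update",
    "pause": "update",
    "resume": "update",
    "rotate_now": "update",
    "revoke_credential": "update",
    "delete": "delete",
}


def can_manage(role: str, operation: str) -> bool:
    """True if the role's default RotatingSecrets permissions allow the operation."""
    needed = OPERATION_ACTION[operation]
    return needed in ROTATING_SECRETS_DEFAULTS.get(role, frozenset())
```

Reading the rotated value is not covered here on purpose: that check belongs to the separate Secrets resource.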

UI

  • Env page: rotating credentials render as normal secret rows inside a subtle group container with the rotating secret's name, provider icon, and a Manage button. Buttons appear on hover. Rows are read-only with full-contrast text (not the disabled dimming).
  • Manage dialog (3 tabs):
    • Status: interval, revocation delay, health, next rotation, Rotate now, Pause/Resume, Edit, Delete.
    • Credentials: every credential with status badge (Active/Expiring in <relative time>/Revoking/Revoked/Mint failed/Revoke failed), provider id, timestamps, and Revoke now for non-terminal non-active credentials. Revoked credentials roll into an accordion.
    • Events: chronological lifecycle log with actor avatar.
  • Edit dialog: name, description, schedule, root credentials. Provider-specific config (e.g. OpenAI project) is not editable in-place — users create a new rotating secret for a different project.
  • Cross-env page: rotating rows show the rotation icon in place of the index, and the secret-type toggle is disabled.
  • Dynamic Secret row: restyled to match the rotating group's pattern (header bar, hover-only actions, indexed key rows). "Configure" button renamed to "Manage" for consistency.
  • Audit Logs: new "Rotating Secrets" tab; events appear under resource_type=rs.

Migrations

  • 0130_rotating_secrets — three new models (RotatingSecret, RotatingSecretCredential, RotatingSecretEvent).
  • 0131_rotation_status_renames — status terminology cleanup.
  • 0132_secret_event_rotating_fk — adds SecretEvent.rotating_secret_credential (FK has db_index=False because >99.99% of rows will be NULL on this column; a partial index can be added later if a per-credential read-history query lands).
  • 0133_auditevent_rotating_secret_resource_type — registers 'rs' as an AuditEvent.resource_type choice.

All are metadata-only on PG 11+ (no table rewrites, no extended locks).

Tests

53 backend unit tests under backend/tests/ee/integrations/secrets/rotation/:

  • Both providers: mint/revoke/validate happy paths, 401/5xx/network error classification, response-shape regression guards, schema-to-impl field-name guards.
  • OpenAI: project-id codec roundtrip, pagination + archived-project filtering, missing api_key object handling.
  • LiteLLM: import-template config-shape handling, phase-managed-field stripping.
  • Sanitiser: provider-response credential redaction.
  • Exposure helpers: synthetic-row construction, env+path scoping, active-credential filtering.
  • Audit-logging routing for synthetic rotating rows.
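
To illustrate the kind of behaviour the sanitiser tests guard, here is a minimal redaction sketch. The key pattern, placeholder string, and truncation length are assumptions; it deliberately over-redacts (any field whose name looks sensitive is replaced wholesale, including nested objects).

```python
import re

SENSITIVE_KEY = re.compile(r"key|token|secret|password", re.IGNORECASE)
REDACTED = "[REDACTED]"


def sanitise(value, max_len: int = 200):
    """Return a copy of a provider response that is safe to store in an event."""
    if isinstance(value, dict):
        return {
            k: REDACTED if SENSITIVE_KEY.search(str(k)) else sanitise(v, max_len)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [sanitise(item, max_len) for item in value]
    if isinstance(value, str) and len(value) > max_len:
        return value[:max_len] + "..."   # keep only an excerpt of long strings
    return value
```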

Preview

[Four screenshots, captured 2026-06-08 between 00:40 and 00:41]

Comment thread backend/ee/integrations/secrets/rotation/engine.py Dismissed
nimish-ks and others added 17 commits June 12, 2026 01:00
…atus on hover and poll for engine-driven updates
…provider match on create, scope actor lookup to the rotating secret org
… pill open when attention needed, label retries plainly

Labels

backend enhancement New feature or request frontend Change in frontend code updates migrations This PR adds new migrations that update the database schema
