Skip to content

SQLite-backed Local Registry foundation with OCI Image / Artifact Manifest coexistence#864

Draft
termoshtt wants to merge 19 commits intomainfrom
codex/artifact-manifest-core
Draft

SQLite-backed Local Registry foundation with OCI Image / Artifact Manifest coexistence#864
termoshtt wants to merge 19 commits intomainfrom
codex/artifact-manifest-core

Conversation

@termoshtt
Copy link
Copy Markdown
Member

@termoshtt termoshtt commented May 8, 2026

Summary

This PR establishes the v3 SQLite-backed Local Registry as the primary write/read store for OMMX artifacts, while preserving full read compatibility with the existing OCI directory layout. It also fixes the OCI identity invariant for the legacy uptake path and localises the ocipkg dependency to two well-defined import modules.

What lands in this PR

New SQLite Local Registry (write + read)

  • IndexStore (SQLite, Mutex<Connection> for Sync) and BlobStore (filesystem CAS, atomic publish via temp file + hard link).
  • LocalArtifactBuilder.build() and Artifact.load(image_name) route through the SQLite registry.
  • LocalArtifact holds Arc<LocalRegistry>; clones share one connection. get_manifest() memoises into an Arc<OnceLock<LocalManifest>> so layers() / annotations() / subject() don't re-read the manifest blob on each call.
  • Artifact.list() (Python) reads refs from the IndexStore.
  • Concurrent build / ref publish primitives (KeepExisting / Replace policies); the public outcome type OciDirImport surfaces the RefUpdate (Inserted / Unchanged / Replaced / Conflicted).

OCI Image Manifest and OCI Artifact Manifest coexist as distinct containers

  • v3 native build path (ArtifactBuilder.new(image)build()) produces an OCI Artifact Manifest (application/vnd.oci.artifact.manifest.v1+json).
  • Legacy OCI dir uptake stores the OCI Image Manifest bytes as-is.
  • Each manifest is identified by its bytes digest and treated as an independent container; both formats can coexist on the same image_name.
  • LocalArtifact::get_manifest() returns LocalManifest::{Image, Artifact} and dispatches on manifests.media_type so reads transparently surface a common layers / annotations / subject / artifact_type view.

Identity-preserving legacy import

  • ommx artifact import (CLI) and the underlying import_legacy_local_registry* API read the legacy index.json + manifest blob, validate the OMMX artifactType, and store the manifest bytes verbatim. The manifest digest is preserved end-to-end.
  • The Image Manifest's config blob is captured as a CAS object (kind=config) so re-export can rebuild a self-contained OCI Image Layout.
  • "import" is the OCI-Distribution-derived registry-side verb (receive a manifest and make it discoverable under a ref); commit / build() is the SDK-side verb. The previous code also used migrate_* for this path; v3 renames it to import_* and reserves convert for the separate operation that produces a new artifact under a new digest by reformatting Image Manifest into Artifact Manifest. This PR does not implement convert.

Module organization: storage vs import, ocipkg seam localised

local_registry:: is split into two clearly-named layers:

local_registry/
  index.rs / blob.rs / types.rs / registry.rs   ← Storage: SQLite + CAS, owns v3 state
  import/                                        ← Boundary: external sources → SQLite
    oci_dir.rs   ← single OCI Image Layout dir (ocipkg-free, oci-spec only)
    legacy.rs    ← v2 root tree, drives oci_dir (ocipkg-free)
    archive.rs   ← .ommx OCI archive (currently ocipkg-dependent stage 1)
    remote.rs    ← remote registry pull (currently ocipkg-dependent stage 1, feature-gated)

The directory format itself is OCI standard and is not legacy; only the v2 path/tag tree above it is. archive and remote are the explicit ocipkg seam — both are two-stage pipelines today (ocipkg → legacy OCI dir → import_oci_dir_as_ref into SQLite), and the follow-up that drops ocipkg is scoped to swapping the body of these two modules. The CLI's Command::Load and the Python Artifact.load(image) now call import_oci_archive / pull_image directly, so when ocipkg is replaced the call sites don't change.

Lazy auto-migration on Artifact.load(image) and ommx load

The v2 ergonomics of "load this artifact and it shows up in ommx list" are preserved without re-introducing a per-call legacy fallback. SQLite is the source of truth; on the first miss for a given image_name the pull_image path probes one path (get_image_dir(image)), pulls from remote into the legacy cache if absent, then runs a one-shot identity-preserving import into SQLite. Subsequent calls hit the SQLite fast path; nothing reads from legacy on the hot path. The probe is one Path::exists() (not a recursive root scan), so listing APIs (Artifact.list) still query SQLite only.

The legacy OciDir write produced by stage 1 of import_oci_archive / pull_image stays in place for now because ommx push / save and the Python archive read path still consume that form. The follow-up PR that ports those readers to the SQLite registry will drop the double-write so ommx load lands directly in v3 (and stage 1 becomes a native streamer).

Out of scope (deferred to follow-up PRs)

  • Replacing the ocipkg-based stage 1 inside import::archive and import::remote with native v3 streaming readers (the seam where this happens is already in place).
  • ocipkg removal from public Rust / Python API surfaces (ommx::ocipkg re-export, Descriptor / Digest / MediaType shapes).
  • Archive build path (ArtifactBuilder.new_archive*, temp(), Artifact.load_archive) is still on the ocipkg-based Image Manifest pipeline.
  • Remote push from the SQLite registry (bail! placeholder).
  • convert (Image Manifest → Artifact Manifest under new ref).
  • Lineage API (parent(), history(), diff()) and OTel trace layer.
  • ommx artifact gc.
  • rust/dataset/{miplib2017,qplib} packaging path (still uses the legacy OciDirBuilder).
  • Removing the legacy double-write from ommx load and Artifact.load(image) (waits for push / save to be ported and the native stage 1 to land).

What works end-to-end after this PR

  • Build → store → read fully on the SQLite Local Registry: ArtifactBuilder.new(image)add_*build()Artifact.load(image) → layer access.
  • ArtifactBuilder.for_github(...) follows the same path.
  • Artifact.list() lists SQLite refs.
  • ommx artifact import brings legacy OCI dir entries into the SQLite registry while preserving their manifest digest, with a reported LegacyImportReport.
  • ommx load <archive> followed by ommx list shows the loaded image (lazy auto-migration via import_oci_archive).
  • Artifact.load(image) with a SQLite miss pulls from remote via pull_image and shows up in subsequent Artifact.list() calls.
  • Loading an artifact whose manifest is stored as Image Manifest goes through LocalManifest::Image and exposes the same view shape as native v3 artifacts.

Validation

  • cargo test -p ommx --lib — 505 tests pass (20 in artifact::*, including OciDirImport.ref_update surfacing, import_legacy_ref_with_policy replace, LocalManifest cache pointer-equality across clones, and subject() round trip).
  • cargo check -p ommx --bin ommx --features remote-artifact,built,cli.
  • cargo clippy -p ommx --lib — 3 pre-existing warnings in mps/convert.rs and decision_variable/parse.rs (no new warnings, no large_enum_variant).
  • task python:test (linting + pyright + pytest across ommx-tests and the four adapters).
  • E2E reproduction of the failing CI load job (./ommx load <archive>./ommx list matches the regex) and the failing Python Artifact.load(image) doctest both pass on a clean local registry root.

Notes for reviewers

  • All v3 public structs that may grow fields (OciDirRef, OciDirImport, LegacyImportReport, BlobRecord, ManifestRecord, RefRecord, LayerRecord) carry #[non_exhaustive]; future column additions don't require a major bump.
  • LocalArtifactBuilder::stage() (was prepare()) returns a StagedArtifactManifest (was PreparedArtifactManifest) — Build / Seal / View vocabulary aligned with StagedArtifactBlob.
  • import_oci_archive and pull_image are the only places where ocipkg is used inside local_registry; the storage layer, import::oci_dir, and import::legacy are already ocipkg-free aside from the shared ImageName / oci_spec types.
  • The new public-surface types (LocalArtifact, LocalArtifactBuilder, LocalManifest, OciDirRef, OciDirImport, LegacyImportReport, RefConflictPolicy, RefUpdate) are deliberately small and compose with the still-present ocipkg types only at boundaries that follow-up PRs will replace.

@termoshtt termoshtt changed the title Add OMMX artifact manifest core Use OCI artifact manifest for local build foundation May 8, 2026
termoshtt and others added 3 commits May 8, 2026 17:34
Codify two design decisions in ARTIFACT_V3.md:

- SQLite Local Registry holds both OCI Image Manifest and OCI Artifact
  Manifest natively; each manifest is identified by its bytes digest as
  an independent container.
- Legacy OCI dir local registry uptake is "import" (manifest bytes and
  digest preserved); reformatting an Image Manifest as an Artifact
  Manifest is the separate explicit "convert" operation that produces a
  new artifact under a new digest / new ref.

Updates section 1 final policy (items 12-14), adds 5.5 manifest format
coexistence and 6.7 format convert, rewrites 6.5 / 6.6 import wording
(no auto-conversion side effect), notes format-crossing subject lineage
in 9.1, and refreshes the section 11 risk list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v3 Local Registry now keeps Image Manifest and Artifact Manifest as
distinct containers identified by their bytes digest, instead of
silently rewriting Image Manifest into Artifact Manifest at import
time. Auto-conversion changed the manifest digest, which broke the
OCI identity invariant (digest is the primary identifier) and would
have orphaned subject pointers from artifacts referencing the legacy
manifest.

Changes:
- LocalArtifact::get_manifest now returns a LocalManifest enum that
  dispatches on manifests.media_type, so reads transparently handle
  Image Manifest (legacy import) and Artifact Manifest (v3 native
  build) with a single layers / annotations / subject view.
- prepare_legacy_oci_dir reads the Image Manifest blob bytes verbatim
  and stores them with media type
  application/vnd.oci.image.manifest.v1+json. The manifest digest is
  preserved end to end. The Image Manifest's config blob is captured
  as a CAS object (kind=config) so re-export can still rebuild a
  self-contained OCI Image Layout, but it is not surfaced as a layer.
- Move OCI media type constants to media_types.rs and re-export them
  through ommx::artifact for downstream callers.
- Rename "migrate" to "import" across the legacy uptake path:
  migrate_legacy_local_registry* -> import_legacy_local_registry*,
  LegacyMigrationReport -> LegacyImportReport,
  LocalRegistry::migrate_legacy_layout* -> import_legacy_layout*,
  CLI subcommand "ommx artifact migrate" -> "ommx artifact import",
  and the load() error message that points users at it.
- Update tests to assert digest preservation and Image Manifest
  media_type after import, and to exercise LocalManifest::Image via
  the LocalArtifact common view.

Format conversion (Image Manifest -> Artifact Manifest) is documented
in ARTIFACT_V3.md 6.7 as a separate explicit operation that produces a
new artifact under a new digest / new ref. It is intentionally not
implemented in this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@termoshtt termoshtt changed the title Use OCI artifact manifest for local build foundation SQLite-backed Local Registry foundation with Image / Artifact Manifest coexistence May 8, 2026
termoshtt and others added 10 commits May 8, 2026 20:07
The previous shape (LocalArtifact owns LocalRegistry, Python wraps it
in Box<Mutex<LocalArtifact>>) had three smells:

- each LocalArtifact opened its own SQLite connection, even when many
  artifacts came from the same registry root
- the outer std::sync::Mutex existed only to satisfy pyclass Sync
  bounds (gil_used = false) since rusqlite::Connection is !Sync; it
  was never contended at runtime
- the variant size diff between Local (~300 bytes inline) and the
  Archive / Dir variants forced Box, plus clippy::large_enum_variant
  fired on LocalManifest because oci_spec's manifest types are large

This commit consolidates the synchronization to its proper place and
lets the registry handle be shared:

- SqliteIndexStore now wraps Connection in std::sync::Mutex<Connection>
  and acquires the lock per call via a small `lock()` helper. That
  makes SqliteIndexStore (and therefore LocalRegistry) Sync.
- LocalArtifact holds Arc<LocalRegistry>, so multiple artifacts opened
  from the same root share one Connection / FileBlobStore. The Arc
  derefs to LocalRegistry, so existing call sites are unchanged.
- LocalArtifactBuilder::build_in_registry takes Arc<LocalRegistry> by
  value (callers clone if they need to keep the Arc) and returns a
  LocalArtifact tied to that same Arc, so build / read share state
  without re-opening.
- PyO3 ArtifactInner::Local becomes a bare LocalArtifact (no Box,
  no Mutex). Comments explain why Archive still needs its Mutex
  (ocipkg readers carry mutable state) and why Dir does not.
- LocalManifest boxes both variants. ArtifactManifest is ~460 bytes
  and ImageManifest ~800 bytes; boxing both keeps the enum at a
  16-byte discriminant + pointer and silences clippy::large_enum_variant.

No public-API surface change for downstream Rust callers beyond the
build_in_registry / open_in_registry signature switch from
&LocalRegistry / LocalRegistry to Arc<LocalRegistry>; all in-tree
callers and tests are updated.

Validation: cargo test -p ommx --lib (501 passed, 16 in artifact::*),
cargo clippy -p ommx --lib (no large_enum_variant warning; remaining
warnings are pre-existing in mps/convert.rs and decision_variable/parse.rs),
cargo clippy -p _ommx_rust (no large_enum_variant warning), task python:sync
plus the previously failing test_descriptor.py / test_doctests.py both pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Seven items flagged during the PR-wide review.

1. Stable annotation JSON in legacy import path

   legacy.rs::annotations_json used raw serde_json::to_string and
   produced a HashMap-iteration-order-dependent encoding, while
   registry.rs::annotations_json went through stable_json_bytes.
   Rows in manifests.annotations_json / manifest_layers.annotations_json
   would not round-trip identically across the two code paths, which
   would bite the moment we query or dedupe on those columns.

   Promoted a single annotations_json helper into local_registry/mod.rs
   that always uses stable_json_bytes, and removed the duplicate
   functions from legacy.rs and registry.rs.

2. Merge LegacyOciDirImport into LegacyOciDirRef

   They had identical fields and identical semantics - the manifest
   digest preserved from the legacy index.json plus the ref-name
   annotation. Kept LegacyOciDirRef as the surviving name with a
   docstring that explains why both lookups and import operations
   return the same shape (v3 import is identity-preserving).

3. import_legacy_ref_with_policy

   import_legacy_layout had both bare and _with_policy forms;
   import_legacy_ref hardcoded KeepExisting. Added the missing
   _with_policy variant on LocalRegistry plus the underlying free
   function import_legacy_local_registry_ref_with_policy.

4. Cache LocalManifest reads in LocalArtifact

   layers() / annotations() / subject() each invoked get_manifest(),
   which read the manifest blob from the BlobStore, queried the
   IndexStore for the manifest record, and parsed the JSON every
   time. A typical caller reading all three triggered three full
   round trips.

   LocalArtifact now memoises the parsed manifest in an
   Arc<OnceLock<LocalManifest>>. The Arc is shared across clones
   (so any clone that warms the cache benefits the rest), the
   OnceLock keeps successful reads while letting failed reads
   retry, and get_manifest now returns &LocalManifest instead of
   an owned value to make the cached form usable without cloning.

5. Tighten LocalManifest::artifact_type

   parse() rejects manifests without an artifactType (see
   ensure_ommx_image_manifest / ensure_ommx_artifact_manifest), so
   the public method now returns &MediaType directly. The Image
   variant uses .expect(...) to spell out the invariant on the
   single line where Option-stripping happens.

6. Symmetric LocalManifest::Artifact dispatch assertion

   The legacy import test asserted that LocalArtifact::get_manifest
   returns LocalManifest::Image for legacy entries. Added the mirror
   assertion plus an artifact_type check on the v3 native build test
   so both dispatch arms are covered.

7. Bytes-identity assertion on legacy import

   imports_legacy_oci_dir_into_sqlite_registry_preserving_image_manifest
   compared digests but did not directly compare the manifest bytes.
   Now reads the original manifest blob out of the legacy
   blobs/<algorithm>/<encoded> path and asserts byte-for-byte equality
   with what BlobStore returns for the imported digest. SHA-256 makes
   this redundant in theory, but the explicit check pins the
   identity-preserving contract for future refactors.

Validation: cargo test -p ommx --lib (501 passed, 16 in artifact::*),
cargo clippy -p ommx --lib (3 pre-existing warnings, none new),
cargo clippy -p _ommx_rust (no large_enum_variant warning), task
python:sync + targeted doctest / test_descriptor pytest both pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rename the in-memory staging types to make their role obvious and
their pair symmetric:

- PendingArtifactBlob   -> StagedArtifactBlob
- PreparedArtifactManifest -> StagedArtifactManifest
- LocalArtifactBuilder::prepare() -> stage()

Both types represent "computed-but-not-yet-published" content sitting
between the Build phase and the registry publish (Seal). The Git
analogue is an entry in the index after `git add` and the constructed-
but-not-yet-written tree / commit object: digest is fixed, bytes are
ready, ref has not moved. "Staged" reads more naturally than the
mismatched Pending / Prepared pair, and aligns the Rust types with the
Build / Seal / View vocabulary already used in ARTIFACT_V3.md 7.4.

Doc-comments on both types now spell out the staging role and point at
LocalRegistry::publish_artifact_manifest as the consumer.

Also document the meaning of "publish" in ARTIFACT_V3.md 6.4 and 7.4:
it is the OCI-Distribution-derived registry-side verb for "receive a
manifest and make it discoverable under a ref", as opposed to the
SDK-side commit / build verb. The Git analogy `git commit` user-side
vs ".git/objects/ + refs/heads/X update" internal-side is called out
explicitly so future code can keep the two layers separate (e.g. when
import / convert / build all funnel into the same publish primitive).

Validation: cargo test -p ommx --lib (501 passed, 16 in artifact::*),
cargo fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous `legacy.rs` mixed two unrelated concerns under a single
"legacy" label:

1. Generic OCI Image Layout directory I/O (read `index.json`, parse
   `ImageManifest`, copy CAS blobs into the v3 registry).
2. v2 OMMX local registry layout helpers (recursive scan of a
   `<root>/<image_name>/<tag>/` tree, path-to-`ImageName` resolution,
   batch import with per-directory aggregation).

(1) is **not legacy**: an OCI Image Layout directory is the OCI
standard, produced by `oras` / `crane` / `skopeo` and used by v3
itself as the import / export interchange. (2) **is** legacy: the
"path/tag tree of OCI dirs" layout was OMMX v2's local registry shape
and v3 explicitly replaces it with the SQLite-backed registry.

Now reflected in the code:

- New module `oci_dir.rs` owns generic OCI Image Layout I/O —
  `import_oci_dir`, `import_oci_dir_with_policy`,
  `import_oci_dir_as_ref(_with_policy)`, `oci_dir_ref`,
  `oci_dir_image_name`, the `OciDirRef` identity tuple, the
  `RefConflictHandling` enum, and the `prepare_oci_dir` /
  `read_oci_dir_blob` / `oci_dir_blob_path` helpers. Module-level
  doc-comment makes the "not legacy" intent explicit.
- `legacy.rs` shrinks to v2-specific code only — `legacy_local_registry_path`,
  `gather_legacy_oci_dirs`, `legacy_import_image_name`,
  `import_legacy_local_registry(_ref)(_with_policy)` and the
  `LegacyImportReport` aggregator. It uses `oci_dir.rs` internally.
- `local_registry/mod.rs` re-exports both modules with a docstring
  that lays out the layering (`index` / `blob` / `types` →
  `registry` → `oci_dir` → `legacy`).
- Function and struct rename consequences from the previous "single
  OCI dir is not legacy" rename pass are now reflected in the file
  layout, not just identifier prefixes.

Validation: cargo test -p ommx --lib (501 passed, 16 in artifact::*),
cargo clippy -p ommx --lib unchanged (3 pre-existing warnings),
cargo fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous layout placed `oci_dir.rs` and `legacy.rs` as siblings of
the storage modules (`index`, `blob`, `types`, `registry`) inside
`local_registry/`. That arrangement read as if the v3 Local Registry
had OCI-Image-Layout-related and v2-shaped storage facets, which is
not the design — the registry stores manifest bytes in a CAS plus
records in SQLite and never touches `oci-layout` / `index.json` /
`blobs/<algorithm>/<encoded>` internally. OCI dir is purely an
*import* boundary.

Move both import modules under `local_registry::import`:

- `local_registry/import/mod.rs` — module-level docstring stating that
  this is where external content gets adapted into the registry.
- `local_registry/import/oci_dir.rs` (was `local_registry/oci_dir.rs`)
  — single OCI Image Layout directory import.
- `local_registry/import/legacy.rs` (was `local_registry/legacy.rs`)
  — v2 OMMX local registry tree import; uses `import::oci_dir`
  internally.

Public re-exports under `local_registry::*` are unchanged so
downstream callers see no surface diff. The `local_registry/mod.rs`
docstring now contrasts the two layers (Storage vs. Import) and
explicitly states that the registry stores nothing in OCI Image
Layout form.

When future import sources land (`.ommx` archive ingest, remote
pull), they get a natural home as `import::archive`, `import::remote`,
etc., without polluting the storage namespace.

Validation: cargo test -p ommx --lib (501 passed, 16 in artifact::*),
cargo clippy -p ommx --lib unchanged (3 pre-existing warnings),
cargo fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's `cli` workflow chains `ommx load <archive>` and `ommx list`, and
the `Python` test workflow runs the doctest of `Artifact.load(image)`
twice (once on the class itself and once on the method). Both paths
relied on the v2 behaviour where loading an artifact made it
immediately discoverable through the same code that lists it; the v3
split introduced in this PR (legacy OciDir path is read-only,
`get_images()` and `Artifact.load(image)` go through the SQLite
IndexStore) broke that chain because nothing was bringing the freshly
loaded image into the SQLite registry.

Two fixes, both narrow:

1. `bin/ommx.rs` `Command::Load` now follows the legacy `Artifact::load()`
   write with `LocalRegistry::import_legacy_ref(&image_name)`, so the
   image is reachable through `ommx list` (which reads the SQLite
   IndexStore via `get_images()`).
2. `python/ommx/src/artifact.rs` `PyArtifact::load(image)` no longer
   bails when the image is only in the legacy OciDir cache. It pulls
   from remote when needed, runs the same `import_legacy_ref` to
   populate the SQLite registry, then opens the artifact through the
   v3 path. Subsequent calls hit the SQLite fast path. The "Run
   `ommx artifact import` once" hint is no longer needed for this
   entry point and is removed.

The legacy OciDir write itself stays in place for now because
`ommx push` / `save` and the Python archive read path still consume
that form. Removing the double-write is the next PR's job once those
callers are ported to read straight from the SQLite registry.

Validation: cargo test -p ommx --lib (501 passed), local reproduction
of the failing CI `load` job (`./ommx load ...; ./ommx list` matches
the expected regex), and the `test_doctests.py` /
`test_descriptor.py` pytest cases that mirror the `Python` CI test
job both pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit's auto-import on `Artifact.load(image)` SQLite
miss is *not* a per-call fallback — it's a one-shot lazy migration
adapter that drains v2 / remote-pulled content into the SQLite Local
Registry on first access and then leaves all subsequent calls on the
SQLite fast path. The probe is one `Path::exists()` for the specific
image, not a recursive scan of the legacy root.

This commit refines the framing without changing behaviour:

- ARTIFACT_V3.md §6.5 now distinguishes the two cases. The "no
  fallback on ref miss" rule still bans (a) recursive root scans and
  (b) read-side fallthrough that serves manifest bytes from legacy on
  the hot path. It explicitly allows lazy auto-migration: a single
  `Path::exists()` probe plus one identity-preserving import per
  image, after which SQLite is the source of truth. The carve-out is
  scoped to single-image lookups; listing APIs (`Artifact.list`)
  still query SQLite only.
- Code comments in `bin/ommx.rs Command::Load` and
  `python/ommx/src/artifact.rs PyArtifact::load(image)` now describe
  the operation as lazy auto-migration, point at §6.5, and call out
  why this is not a fallback (no read on the hot path; identity is
  preserved).

Validation: cargo check on the ommx binary and the _ommx_rust
crate, no behaviour change versus 5ed1313.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ARTIFACT_V3.md is a transitional design doc that gets deleted once
the implementation lands and its content is folded into the regular
Sphinx docs and API reference. Code comments that point at section
numbers there will become dangling links the moment the file goes
away, so inline the relevant explanation instead and rely on the
docstring / module-level prose for context.

Touched:
- `rust/ommx/src/bin/ommx.rs Command::Load`
- `python/ommx/src/artifact.rs PyArtifact::load(image)`
- `rust/ommx/src/artifact/local_registry/mod.rs` module doc
- `rust/ommx/src/artifact/local_registry/import/oci_dir.rs`
  module doc and `import_oci_dir` doc

No behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address the four outstanding items from the round-2 PR review.

A. Surface RefUpdate from import_*

   The public `import_oci_dir*` and `import_legacy_local_registry_ref*`
   functions used to discard the `Option<RefUpdate>` produced by their
   internal worker, leaving callers unable to distinguish "freshly
   inserted" from "idempotent verify of the same digest". A new
   `OciDirImport { manifest_digest, image_name, ref_update }` carries
   the SQLite ref outcome alongside the identity, so callers see all
   four `RefUpdate` cases (`Inserted` / `Unchanged` / `Replaced` /
   `Conflicted`) plus `None` when the directory had no
   `org.opencontainers.image.ref.name` annotation. `OciDirRef` stays
   as the lookup-only identity returned by `oci_dir_ref`.

   `LocalRegistry::import_legacy_ref(_with_policy)` and the standalone
   `import_legacy_local_registry_ref(_with_policy)` re-export the new
   shape; the v2 batch importer (`import_legacy_local_registry`) keeps
   returning the aggregate `LegacyImportReport`.

D. `#[non_exhaustive]` on growing public types

   `OciDirRef`, `OciDirImport`, `LegacyImportReport`, and the SQLite
   row record types (`BlobRecord`, `ManifestRecord`, `RefRecord`,
   `LayerRecord`) are now `#[non_exhaustive]`. v3 is still adding
   columns (typed `oci_spec::image::Digest`, byte counts, etc.); the
   attribute keeps in-crate struct literals working while letting us
   add fields without breaking downstream exhaustive destructuring.

🟨1. Tighten visibility of import helpers

   `ensure_image_ref_update_allowed`, `put_image_ref_with_conflict_handling`,
   and `import_oci_dir_with_policy_inner` are no longer reachable from
   `local_registry::import::legacy`; downgrade them from `pub(super)`
   to private. The remaining `pub(super)` items (`OciDirRef`,
   `OciDirImport`, `RefConflictHandling`,
   `import_oci_dir_as_ref_with_policy_inner`, `oci_dir_ref`,
   `oci_dir_image_name`) really do need cross-module visibility for
   the legacy batch importer.

🟨2. Reorder `Command::Load`

   Read `image_name` *before* `artifact.load()` runs the archive
   through `ocipkg::image::copy`, so the CLI doesn't depend on the
   archive reader still serving `get_name()` after extraction.

Tests added (round-2 review item 🟨4):

- `import_oci_dir_with_policy_surfaces_ref_update` — first import is
  `Inserted`, idempotent re-import is `Unchanged`, both via the
  public `OciDirImport`.
- `local_registry_import_legacy_ref_with_policy_replaces_existing` —
  exercises the newly-added `_with_policy` variant on
  `LocalRegistry::import_legacy_ref`, asserts the `Replaced { previous }`
  outcome and the post-state ref.
- `local_artifact_caches_manifest_across_clones` — pointer-equality
  check that `LocalArtifact::get_manifest()` reuses the
  `Arc<OnceLock<LocalManifest>>` cache across the original and a clone.
- `local_artifact_subject_round_trips` — `LocalArtifact::subject()`
  returns the `Descriptor` set via `LocalArtifactBuilder::set_subject`
  and `None` when no subject was set.

Validation: 505 tests pass (was 501, +4 new); cargo clippy unchanged
(3 pre-existing ommx warnings, 14 pre-existing _ommx_rust warnings,
none new); cargo fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The two-stage pipeline that turns external bytes into v3 SQLite
records used to live inline in the CLI (`bin/ommx.rs Command::Load`)
and the Python entry (`PyArtifact::load`):

  ocipkg::Artifact::from_oci_archive(path).load()
   → legacy OCI dir at get_image_dir(image_name)
   → import_oci_dir_as_ref into SQLite

  ocipkg::Artifact::from_remote(image).pull()
   → legacy OCI dir at get_image_dir(image_name)
   → import_oci_dir_as_ref into SQLite

Pull both bodies into `local_registry::import` as siblings of the
existing `oci_dir` and `legacy` modules:

- `import::archive::import_oci_archive(registry, path) -> OciDirImport`
- `import::remote::pull_image(registry, image_name) -> OciDirImport`
  (feature-gated behind `remote-artifact`)

Re-export both at the `local_registry::*` level for ergonomics. The
CLI's `Command::Load` and `PyArtifact::load` now just call into these
functions.

This makes the ocipkg seam explicit and localised. Removing ocipkg
from the SDK becomes scoped to swapping the body of these two
modules — for example, archive ingest can move to a native tar.gz
reader that streams blobs straight into FileBlobStore + SqliteIndexStore
without ever materialising a legacy OCI dir, and remote pull can move
to `oci-distribution` / `oci-client` without touching call sites.
The `import_oci_archive` / `pull_image` signatures (registry handle
in, OciDirImport out) are the SDK contract that follow-up will keep
stable.

The legacy OCI dir is still produced as an intermediate today because
`ommx push` / `save` and the Python archive read path read from it.
That double-write goes away in the same follow-up that drops the
ocipkg-based stage 1.

Validation: cargo test -p ommx --lib (505 passed, 20 in artifact::*),
local repro of the CI `load` job (`./ommx load` → `./ommx list` matches
the regex), task python:sync + test_doctests.py + test_descriptor.py
both pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@termoshtt termoshtt added python Changes in Python SDK rust Changes in Rust SDK labels May 8, 2026
@termoshtt termoshtt changed the title SQLite-backed Local Registry foundation with Image / Artifact Manifest coexistence SQLite-backed Local Registry foundation with OCI Image / Artifact Manifest coexistence May 8, 2026
termoshtt and others added 4 commits May 8, 2026 22:11
Two documentation fixes uncovered while sweeping ARTIFACT_V3.md
against the implementation:

1. Section 6.5 had two identical paragraphs about ref conflict
   policy on legacy import (a leftover from a previous edit).
   Remove the duplicate.

2. Section 6.6 split "import" (local sources) from "remote pull"
   into two separate bullets, but the v3 implementation treats both
   as identity-preserving inputs through `local_registry::import::*`
   (see `import::oci_dir`, `import::legacy`, `import::archive`,
   `import::remote`). Consolidate them into a single "import"
   bullet that enumerates the four supported sources, and explicitly
   call out the shared rule "manifest format is never converted;
   bytes and digest are preserved verbatim".

Doc-only change. No code touched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six concrete findings from the round-3 Codex audit, fixed in one
commit because they overlap on the same code paths.

HIGH #1 — `import_oci_dir` rejected Artifact Manifest layouts.
  The importer always parsed the manifest as `ImageManifest`, hardcoded
  the stored media type, and required a config blob. A v3-native OCI
  Image Layout export (Artifact Manifest, layers in `blobs[]`, no
  config) would have been silently rejected.
  Fix: dispatch on the `index.json` descriptor's media type. New
  `PreparedManifest { format, layers, annotations, subject, image_config }`
  carries the format tag plus an optional Image config; the import
  flow stages whichever bytes are present and writes the manifest
  record under its true media type. New test
  `imports_oci_dir_with_artifact_manifest_layout` builds an Artifact
  Manifest OCI dir from scratch and asserts identity-preserving
  import + `LocalManifest::Artifact` dispatch on read.

HIGH #2 — Publish/import did not run the IndexStore writes atomically.
  Blob, manifest, and ref inserts were separate transactions, so a
  crash or late conflict between manifest commit and ref commit could
  leave manifest / blob rows pinned to a ref that was never updated.
  Fix: `SqliteIndexStore` exposes connection-scoped helpers
  (`put_blob_in`, `put_manifest_in`, `put_ref_with_policy_in`,
  `replace_ref_in`, `resolve_ref_in`) and a new public
  `publish_artifact_atomic(blobs, manifest, layers, image_name, policy)`
  that wraps the whole IndexStore side of a publish in one
  `BEGIN IMMEDIATE` transaction. CAS bytes are still written before
  the tx (idempotent), but every row that pins them to a ref now
  commits or rolls back together.
  - `LocalRegistry::publish_artifact_manifest` and
    `import_oci_dir_inner` both stage `BlobRecord`s and call the
    new atomic primitive.
  - A `KeepExisting` pre-check still runs before staging CAS to
    avoid wasted writes for the common single-writer case; the
    atomic publish re-validates inside the tx so concurrent racers
    can't slip past the pre-check.
  - Switched the registry's transactions to `BEGIN IMMEDIATE` and
    enabled best-effort WAL mode so two writers can't deadlock by
    each holding a SHARED lock and both trying to upgrade.
  - New test
    `concurrent_publish_different_digests_keeps_one_winner` races
    two builders publishing different digests under the same ref
    and asserts exactly one wins.

MEDIUM #3 — `ommx pull` bypassed the SQLite registry.
  Routed `Command::Pull` through `local_registry::pull_image` so
  remote pulls land in the v3 IndexStore and `ommx list` sees them.

MEDIUM #4 — `Command::Load` advertised "OCI directory" but only
  handled archives. Now branches on `path.is_dir()` and calls
  `import_oci_dir` for directories, `import_oci_archive` otherwise.

LOW #5 — `OciDirImport.image_name` lost the effective ref for
  unannotated dirs imported via `import_oci_dir_as_ref*`.
  Unified the import inner to take `target_image_name: Option<&ImageName>`,
  collapsed the as-ref-specific inner, and now reports the actually-
  written ref (caller-supplied if any, else the dir's annotation,
  else `None`). Trims `import_oci_dir_as_ref_with_policy_inner`,
  `ensure_image_ref_update_allowed`, and
  `put_image_ref_with_conflict_handling` along the way.

LOW #6 — Missing tests:
  Added the two tests called out under HIGH #1 and HIGH #2 above
  (Artifact Manifest layout import, concurrent same-ref different-
  digest publish race).

Validation: cargo test -p ommx --lib (507 passed, 22 in artifact::*),
cargo clippy -p ommx --lib unchanged (3 pre-existing warnings, none
new), cargo fmt clean. Local repro of the CI `load` job
(`./ommx load <archive>` → `./ommx list` matches) passes. CLI
`ommx pull` is feature-gated on remote-artifact and not exercised
in the smoke test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t fields

Two awkward types in `import::oci_dir`:

- `PreparedOciDir` and `PreparedManifest` both lived inside the same
  pipeline; "Prepared" was vague and the doubled prefix (a
  `PreparedOciDir.parsed: PreparedManifest`) read clumsily.
- The build path already uses "Staged" for the same concept
  (`StagedArtifactBlob`, `StagedArtifactManifest`,
  `LocalArtifactBuilder::stage()`), so the import path was using a
  different word for the same idea.

Two changes:

1. Inline `PreparedManifest`'s fields into the umbrella struct so
   there is one type instead of two: `StagedOciDir { manifest_digest,
   image_name, manifest_bytes, manifest_descriptor, format, layers,
   annotations, subject, image_config }`. `manifest_media_type()`
   moves to `StagedOciDir`; the descriptor media-type helper moves to
   `OciManifestFormat`.
2. Rename "Prepared" → "Staged" everywhere in this module, matching
   the build path's vocabulary: `prepare_oci_dir` → `stage_oci_dir`,
   `parse_oci_dir_image_manifest` / `parse_oci_dir_artifact_manifest`
   → `stage_image_manifest` / `stage_artifact_manifest` (they return
   the format-specific fields that get folded into a `StagedOciDir`).
   Local variable `prepared` → `staged`.

The two helpers now return a `StagedManifestFields` type alias for
the format-specific tuple. No type ever has both "Prepared" and
"Staged" in scope; the import side and the build side describe the
same lifecycle phase with the same word.

Validation: cargo test -p ommx --lib (507 passed), task python:sync
+ test_doctests.py / test_descriptor.py both pass, cargo fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two consistency-audit findings from the post-Codex review.

- `image_name_repository` was `pub` and re-exported through
  `local_registry::*`, but only ever used inside `index.rs`. Drop the
  re-export and downgrade the function to private. Public API surface
  shrinks; nothing downstream loses access (it had no callers).
- `import::remote::pull_image` is the one import-source entry point
  whose name doesn't fit the `import_<noun>` pattern of its siblings.
  That divergence is intentional — `pull` is the OCI Distribution
  Spec verb and matches `docker pull` / `oras pull` / `crane pull`,
  while the surrounding `import` namespace already conveys that the
  result lands in the v3 registry. Renaming to `import_remote` would
  lose that domain signal. Document the rationale in the module
  docstring so future readers don't try to "fix" the inconsistency.

Validation: cargo test -p ommx --lib (507 passed), cargo fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

python Changes in Python SDK rust Changes in Rust SDK

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant