SQLite-backed Local Registry foundation with OCI Image / Artifact Manifest coexistence#864
Draft
SQLite-backed Local Registry foundation with OCI Image / Artifact Manifest coexistence#864
Conversation
Codify two design decisions in ARTIFACT_V3.md: - SQLite Local Registry holds both OCI Image Manifest and OCI Artifact Manifest natively; each manifest is identified by its bytes digest as an independent container. - Legacy OCI dir local registry uptake is "import" (manifest bytes and digest preserved); reformatting an Image Manifest as an Artifact Manifest is the separate explicit "convert" operation that produces a new artifact under a new digest / new ref. Updates section 1 final policy (items 12-14), adds 5.5 manifest format coexistence and 6.7 format convert, rewrites 6.5 / 6.6 import wording (no auto-conversion side effect), notes format-crossing subject lineage in 9.1, and refreshes the section 11 risk list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v3 Local Registry now keeps Image Manifest and Artifact Manifest as distinct containers identified by their bytes digest, instead of silently rewriting Image Manifest into Artifact Manifest at import time. Auto-conversion changed the manifest digest, which broke the OCI identity invariant (digest is the primary identifier) and would have orphaned subject pointers from artifacts referencing the legacy manifest. Changes: - LocalArtifact::get_manifest now returns a LocalManifest enum that dispatches on manifests.media_type, so reads transparently handle Image Manifest (legacy import) and Artifact Manifest (v3 native build) with a single layers / annotations / subject view. - prepare_legacy_oci_dir reads the Image Manifest blob bytes verbatim and stores them with media type application/vnd.oci.image.manifest.v1+json. The manifest digest is preserved end to end. The Image Manifest's config blob is captured as a CAS object (kind=config) so re-export can still rebuild a self-contained OCI Image Layout, but it is not surfaced as a layer. - Move OCI media type constants to media_types.rs and re-export them through ommx::artifact for downstream callers. - Rename "migrate" to "import" across the legacy uptake path: migrate_legacy_local_registry* -> import_legacy_local_registry*, LegacyMigrationReport -> LegacyImportReport, LocalRegistry::migrate_legacy_layout* -> import_legacy_layout*, CLI subcommand "ommx artifact migrate" -> "ommx artifact import", and the load() error message that points users at it. - Update tests to assert digest preservation and Image Manifest media_type after import, and to exercise LocalManifest::Image via the LocalArtifact common view. Format conversion (Image Manifest -> Artifact Manifest) is documented in ARTIFACT_V3.md 6.7 as a separate explicit operation that produces a new artifact under a new digest / new ref. It is intentionally not implemented in this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous shape (LocalArtifact owns LocalRegistry, Python wraps it in Box<Mutex<LocalArtifact>>) had three smells: - each LocalArtifact opened its own SQLite connection, even when many artifacts came from the same registry root - the outer std::sync::Mutex existed only to satisfy pyclass Sync bounds (gil_used = false) since rusqlite::Connection is !Sync; it was never contended at runtime - the variant size diff between Local (~300 bytes inline) and the Archive / Dir variants forced Box, plus clippy::large_enum_variant fired on LocalManifest because oci_spec's manifest types are large This commit consolidates the synchronization to its proper place and lets the registry handle be shared: - SqliteIndexStore now wraps Connection in std::sync::Mutex<Connection> and acquires the lock per call via a small `lock()` helper. That makes SqliteIndexStore (and therefore LocalRegistry) Sync. - LocalArtifact holds Arc<LocalRegistry>, so multiple artifacts opened from the same root share one Connection / FileBlobStore. The Arc derefs to LocalRegistry, so existing call sites are unchanged. - LocalArtifactBuilder::build_in_registry takes Arc<LocalRegistry> by value (callers clone if they need to keep the Arc) and returns a LocalArtifact tied to that same Arc, so build / read share state without re-opening. - PyO3 ArtifactInner::Local becomes a bare LocalArtifact (no Box, no Mutex). Comments explain why Archive still needs its Mutex (ocipkg readers carry mutable state) and why Dir does not. - LocalManifest boxes both variants. ArtifactManifest is ~460 bytes and ImageManifest ~800 bytes; boxing both keeps the enum at a 16-byte discriminant + pointer and silences clippy::large_enum_variant. No public-API surface change for downstream Rust callers beyond the build_in_registry / open_in_registry signature switch from &LocalRegistry / LocalRegistry to Arc<LocalRegistry>; all in-tree callers and tests are updated. Validation: cargo test -p ommx --lib (501 passed, 16 in artifact::*), cargo clippy -p ommx --lib (no large_enum_variant warning; remaining warnings are pre-existing in mps/convert.rs and decision_variable/parse.rs), cargo clippy -p _ommx_rust (no large_enum_variant warning), task python:sync plus the previously failing test_descriptor.py / test_doctests.py both pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Seven items flagged during the PR-wide review. 1. Stable annotation JSON in legacy import path legacy.rs::annotations_json used raw serde_json::to_string and produced a HashMap-iteration-order-dependent encoding, while registry.rs::annotations_json went through stable_json_bytes. Rows in manifests.annotations_json / manifest_layers.annotations_json would not round-trip identically across the two code paths, which would bite the moment we query or dedupe on those columns. Promoted a single annotations_json helper into local_registry/mod.rs that always uses stable_json_bytes, and removed the duplicate functions from legacy.rs and registry.rs. 2. Merge LegacyOciDirImport into LegacyOciDirRef They had identical fields and identical semantics - the manifest digest preserved from the legacy index.json plus the ref-name annotation. Kept LegacyOciDirRef as the surviving name with a docstring that explains why both lookups and import operations return the same shape (v3 import is identity-preserving). 3. import_legacy_ref_with_policy import_legacy_layout had both bare and _with_policy forms; import_legacy_ref hardcoded KeepExisting. Added the missing _with_policy variant on LocalRegistry plus the underlying free function import_legacy_local_registry_ref_with_policy. 4. Cache LocalManifest reads in LocalArtifact layers() / annotations() / subject() each invoked get_manifest(), which read the manifest blob from the BlobStore, queried the IndexStore for the manifest record, and parsed the JSON every time. A typical caller reading all three triggered three full round trips. LocalArtifact now memoises the parsed manifest in an Arc<OnceLock<LocalManifest>>. The Arc is shared across clones (so any clone that warms the cache benefits the rest), the OnceLock keeps successful reads while letting failed reads retry, and get_manifest now returns &LocalManifest instead of an owned value to make the cached form usable without cloning. 5. Tighten LocalManifest::artifact_type parse() rejects manifests without an artifactType (see ensure_ommx_image_manifest / ensure_ommx_artifact_manifest), so the public method now returns &MediaType directly. The Image variant uses .expect(...) to spell out the invariant on the single line where Option-stripping happens. 6. Symmetric LocalManifest::Artifact dispatch assertion The legacy import test asserted that LocalArtifact::get_manifest returns LocalManifest::Image for legacy entries. Added the mirror assertion plus an artifact_type check on the v3 native build test so both dispatch arms are covered. 7. Bytes-identity assertion on legacy import imports_legacy_oci_dir_into_sqlite_registry_preserving_image_manifest compared digests but did not directly compare the manifest bytes. Now reads the original manifest blob out of the legacy blobs/<algorithm>/<encoded> path and asserts byte-for-byte equality with what BlobStore returns for the imported digest. SHA-256 makes this redundant in theory, but the explicit check pins the identity-preserving contract for future refactors. Validation: cargo test -p ommx --lib (501 passed, 16 in artifact::*), cargo clippy -p ommx --lib (3 pre-existing warnings, none new), cargo clippy -p _ommx_rust (no large_enum_variant warning), task python:sync + targeted doctest / test_descriptor pytest both pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rename the in-memory staging types to make their role obvious and their pair symmetric: - PendingArtifactBlob -> StagedArtifactBlob - PreparedArtifactManifest -> StagedArtifactManifest - LocalArtifactBuilder::prepare() -> stage() Both types represent "computed-but-not-yet-published" content sitting between the Build phase and the registry publish (Seal). The Git analogue is an entry in the index after `git add` and the constructed- but-not-yet-written tree / commit object: digest is fixed, bytes are ready, ref has not moved. "Staged" reads more naturally than the mismatched Pending / Prepared pair, and aligns the Rust types with the Build / Seal / View vocabulary already used in ARTIFACT_V3.md 7.4. Doc-comments on both types now spell out the staging role and point at LocalRegistry::publish_artifact_manifest as the consumer. Also document the meaning of "publish" in ARTIFACT_V3.md 6.4 and 7.4: it is the OCI-Distribution-derived registry-side verb for "receive a manifest and make it discoverable under a ref", as opposed to the SDK-side commit / build verb. The Git analogy `git commit` user-side vs ".git/objects/ + refs/heads/X update" internal-side is called out explicitly so future code can keep the two layers separate (e.g. when import / convert / build all funnel into the same publish primitive). Validation: cargo test -p ommx --lib (501 passed, 16 in artifact::*), cargo fmt clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous `legacy.rs` mixed two unrelated concerns under a single "legacy" label: 1. Generic OCI Image Layout directory I/O (read `index.json`, parse `ImageManifest`, copy CAS blobs into the v3 registry). 2. v2 OMMX local registry layout helpers (recursive scan of a `<root>/<image_name>/<tag>/` tree, path-to-`ImageName` resolution, batch import with per-directory aggregation). (1) is **not legacy**: an OCI Image Layout directory is the OCI standard, produced by `oras` / `crane` / `skopeo` and used by v3 itself as the import / export interchange. (2) **is** legacy: the "path/tag tree of OCI dirs" layout was OMMX v2's local registry shape and v3 explicitly replaces it with the SQLite-backed registry. Now reflected in the code: - New module `oci_dir.rs` owns generic OCI Image Layout I/O — `import_oci_dir`, `import_oci_dir_with_policy`, `import_oci_dir_as_ref(_with_policy)`, `oci_dir_ref`, `oci_dir_image_name`, the `OciDirRef` identity tuple, the `RefConflictHandling` enum, and the `prepare_oci_dir` / `read_oci_dir_blob` / `oci_dir_blob_path` helpers. Module-level doc-comment makes the "not legacy" intent explicit. - `legacy.rs` shrinks to v2-specific code only — `legacy_local_registry_path`, `gather_legacy_oci_dirs`, `legacy_import_image_name`, `import_legacy_local_registry(_ref)(_with_policy)` and the `LegacyImportReport` aggregator. It uses `oci_dir.rs` internally. - `local_registry/mod.rs` re-exports both modules with a docstring that lays out the layering (`index` / `blob` / `types` → `registry` → `oci_dir` → `legacy`). - Function and struct rename consequences from the previous "single OCI dir is not legacy" rename pass are now reflected in the file layout, not just identifier prefixes. Validation: cargo test -p ommx --lib (501 passed, 16 in artifact::*), cargo clippy -p ommx --lib unchanged (3 pre-existing warnings), cargo fmt clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous layout placed `oci_dir.rs` and `legacy.rs` as siblings of the storage modules (`index`, `blob`, `types`, `registry`) inside `local_registry/`. That arrangement read as if the v3 Local Registry had OCI-Image-Layout-related and v2-shaped storage facets, which is not the design — the registry stores manifest bytes in a CAS plus records in SQLite and never touches `oci-layout` / `index.json` / `blobs/<algorithm>/<encoded>` internally. OCI dir is purely an *import* boundary. Move both import modules under `local_registry::import`: - `local_registry/import/mod.rs` — module-level docstring stating that this is where external content gets adapted into the registry. - `local_registry/import/oci_dir.rs` (was `local_registry/oci_dir.rs`) — single OCI Image Layout directory import. - `local_registry/import/legacy.rs` (was `local_registry/legacy.rs`) — v2 OMMX local registry tree import; uses `import::oci_dir` internally. Public re-exports under `local_registry::*` are unchanged so downstream callers see no surface diff. The `local_registry/mod.rs` docstring now contrasts the two layers (Storage vs. Import) and explicitly states that the registry stores nothing in OCI Image Layout form. When future import sources land (`.ommx` archive ingest, remote pull), they get a natural home as `import::archive`, `import::remote`, etc., without polluting the storage namespace. Validation: cargo test -p ommx --lib (501 passed, 16 in artifact::*), cargo clippy -p ommx --lib unchanged (3 pre-existing warnings), cargo fmt clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's `cli` workflow chains `ommx load <archive>` and `ommx list`, and the `Python` test workflow runs the doctest of `Artifact.load(image)` twice (once on the class itself and once on the method). Both paths relied on the v2 behaviour where loading an artifact made it immediately discoverable through the same code that lists it; the v3 split introduced in this PR (legacy OciDir path is read-only, `get_images()` and `Artifact.load(image)` go through the SQLite IndexStore) broke that chain because nothing was bringing the freshly loaded image into the SQLite registry. Two fixes, both narrow: 1. `bin/ommx.rs` `Command::Load` now follows the legacy `Artifact::load()` write with `LocalRegistry::import_legacy_ref(&image_name)`, so the image is reachable through `ommx list` (which reads the SQLite IndexStore via `get_images()`). 2. `python/ommx/src/artifact.rs` `PyArtifact::load(image)` no longer bails when the image is only in the legacy OciDir cache. It pulls from remote when needed, runs the same `import_legacy_ref` to populate the SQLite registry, then opens the artifact through the v3 path. Subsequent calls hit the SQLite fast path. The "Run `ommx artifact import` once" hint is no longer needed for this entry point and is removed. The legacy OciDir write itself stays in place for now because `ommx push` / `save` and the Python archive read path still consume that form. Removing the double-write is the next PR's job once those callers are ported to read straight from the SQLite registry. Validation: cargo test -p ommx --lib (501 passed), local reproduction of the failing CI `load` job (`./ommx load ...; ./ommx list` matches the expected regex), and the `test_doctests.py` / `test_descriptor.py` pytest cases that mirror the `Python` CI test job both pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit's auto-import on `Artifact.load(image)` SQLite miss is *not* a per-call fallback — it's a one-shot lazy migration adapter that drains v2 / remote-pulled content into the SQLite Local Registry on first access and then leaves all subsequent calls on the SQLite fast path. The probe is one `Path::exists()` for the specific image, not a recursive scan of the legacy root. This commit refines the framing without changing behaviour: - ARTIFACT_V3.md §6.5 now distinguishes the two cases. The "no fallback on ref miss" rule still bans (a) recursive root scans and (b) read-side fallthrough that serves manifest bytes from legacy on the hot path. It explicitly allows lazy auto-migration: a single `Path::exists()` probe plus one identity-preserving import per image, after which SQLite is the source of truth. The carve-out is scoped to single-image lookups; listing APIs (`Artifact.list`) still query SQLite only. - Code comments in `bin/ommx.rs Command::Load` and `python/ommx/src/artifact.rs PyArtifact::load(image)` now describe the operation as lazy auto-migration, point at §6.5, and call out why this is not a fallback (no read on the hot path; identity is preserved). Validation: cargo check on the ommx binary and the _ommx_rust crate, no behaviour change versus 5ed1313. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ARTIFACT_V3.md is a transitional design doc that gets deleted once the implementation lands and its content is folded into the regular Sphinx docs and API reference. Code comments that point at section numbers there will become dangling links the moment the file goes away, so inline the relevant explanation instead and rely on the docstring / module-level prose for context. Touched: - `rust/ommx/src/bin/ommx.rs Command::Load` - `python/ommx/src/artifact.rs PyArtifact::load(image)` - `rust/ommx/src/artifact/local_registry/mod.rs` module doc - `rust/ommx/src/artifact/local_registry/import/oci_dir.rs` module doc and `import_oci_dir` doc No behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address the four outstanding items from the round-2 PR review.
A. Surface RefUpdate from import_*
The public `import_oci_dir*` and `import_legacy_local_registry_ref*`
functions used to discard the `Option<RefUpdate>` produced by their
internal worker, leaving callers unable to distinguish "freshly
inserted" from "idempotent verify of the same digest". A new
`OciDirImport { manifest_digest, image_name, ref_update }` carries
the SQLite ref outcome alongside the identity, so callers see all
four `RefUpdate` cases (`Inserted` / `Unchanged` / `Replaced` /
`Conflicted`) plus `None` when the directory had no
`org.opencontainers.image.ref.name` annotation. `OciDirRef` stays
as the lookup-only identity returned by `oci_dir_ref`.
`LocalRegistry::import_legacy_ref(_with_policy)` and the standalone
`import_legacy_local_registry_ref(_with_policy)` re-export the new
shape; the v2 batch importer (`import_legacy_local_registry`) keeps
returning the aggregate `LegacyImportReport`.
D. `#[non_exhaustive]` on growing public types
`OciDirRef`, `OciDirImport`, `LegacyImportReport`, and the SQLite
row record types (`BlobRecord`, `ManifestRecord`, `RefRecord`,
`LayerRecord`) are now `#[non_exhaustive]`. v3 is still adding
columns (typed `oci_spec::image::Digest`, byte counts, etc.); the
attribute keeps in-crate struct literals working while letting us
add fields without breaking downstream exhaustive destructuring.
🟨1. Tighten visibility of import helpers
`ensure_image_ref_update_allowed`, `put_image_ref_with_conflict_handling`,
and `import_oci_dir_with_policy_inner` are no longer reachable from
`local_registry::import::legacy`; downgrade them from `pub(super)`
to private. The remaining `pub(super)` items (`OciDirRef`,
`OciDirImport`, `RefConflictHandling`,
`import_oci_dir_as_ref_with_policy_inner`, `oci_dir_ref`,
`oci_dir_image_name`) really do need cross-module visibility for
the legacy batch importer.
🟨2. Reorder `Command::Load`
Read `image_name` *before* `artifact.load()` runs the archive
through `ocipkg::image::copy`, so the CLI doesn't depend on the
archive reader still serving `get_name()` after extraction.
Tests added (round-2 review item 🟨4):
- `import_oci_dir_with_policy_surfaces_ref_update` — first import is
`Inserted`, idempotent re-import is `Unchanged`, both via the
public `OciDirImport`.
- `local_registry_import_legacy_ref_with_policy_replaces_existing` —
exercises the newly-added `_with_policy` variant on
`LocalRegistry::import_legacy_ref`, asserts the `Replaced { previous }`
outcome and the post-state ref.
- `local_artifact_caches_manifest_across_clones` — pointer-equality
check that `LocalArtifact::get_manifest()` reuses the
`Arc<OnceLock<LocalManifest>>` cache across the original and a clone.
- `local_artifact_subject_round_trips` — `LocalArtifact::subject()`
returns the `Descriptor` set via `LocalArtifactBuilder::set_subject`
and `None` when no subject was set.
Validation: 505 tests pass (was 501, +4 new); cargo clippy unchanged
(3 pre-existing ommx warnings, 14 pre-existing _ommx_rust warnings,
none new); cargo fmt clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The two-stage pipeline that turns external bytes into v3 SQLite records used to live inline in the CLI (`bin/ommx.rs Command::Load`) and the Python entry (`PyArtifact::load`): ocipkg::Artifact::from_oci_archive(path).load() → legacy OCI dir at get_image_dir(image_name) → import_oci_dir_as_ref into SQLite ocipkg::Artifact::from_remote(image).pull() → legacy OCI dir at get_image_dir(image_name) → import_oci_dir_as_ref into SQLite Pull both bodies into `local_registry::import` as siblings of the existing `oci_dir` and `legacy` modules: - `import::archive::import_oci_archive(registry, path) -> OciDirImport` - `import::remote::pull_image(registry, image_name) -> OciDirImport` (feature-gated behind `remote-artifact`) Re-export both at the `local_registry::*` level for ergonomics. The CLI's `Command::Load` and `PyArtifact::load` now just call into these functions. This makes the ocipkg seam explicit and localised. Removing ocipkg from the SDK becomes scoped to swapping the body of these two modules — for example, archive ingest can move to a native tar.gz reader that streams blobs straight into FileBlobStore + SqliteIndexStore without ever materialising a legacy OCI dir, and remote pull can move to `oci-distribution` / `oci-client` without touching call sites. The `import_oci_archive` / `pull_image` signatures (registry handle in, OciDirImport out) are the SDK contract that follow-up will keep stable. The legacy OCI dir is still produced as an intermediate today because `ommx push` / `save` and the Python archive read path read from it. That double-write goes away in the same follow-up that drops the ocipkg-based stage 1. Validation: cargo test -p ommx --lib (505 passed, 20 in artifact::*), local repro of the CI `load` job (`./ommx load` → `./ommx list` matches the regex), task python:sync + test_doctests.py + test_descriptor.py both pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two documentation fixes uncovered while sweeping ARTIFACT_V3.md against the implementation: 1. Section 6.5 had two identical paragraphs about ref conflict policy on legacy import (a leftover from a previous edit). Remove the duplicate. 2. Section 6.6 split "import" (local sources) from "remote pull" into two separate bullets, but the v3 implementation treats both as identity-preserving inputs through `local_registry::import::*` (see `import::oci_dir`, `import::legacy`, `import::archive`, `import::remote`). Consolidate them into a single "import" bullet that enumerates the four supported sources, and explicitly call out the shared rule "manifest format is never converted; bytes and digest are preserved verbatim". Doc-only change. No code touched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six concrete findings from the round-3 Codex audit, fixed in one commit because they overlap on the same code paths. HIGH #1 — `import_oci_dir` rejected Artifact Manifest layouts. The importer always parsed the manifest as `ImageManifest`, hardcoded the stored media type, and required a config blob. A v3-native OCI Image Layout export (Artifact Manifest, layers in `blobs[]`, no config) would have been silently rejected. Fix: dispatch on the `index.json` descriptor's media type. New `PreparedManifest { format, layers, annotations, subject, image_config }` carries the format tag plus an optional Image config; the import flow stages whichever bytes are present and writes the manifest record under its true media type. New test `imports_oci_dir_with_artifact_manifest_layout` builds an Artifact Manifest OCI dir from scratch and asserts identity-preserving import + `LocalManifest::Artifact` dispatch on read. HIGH #2 — Publish/import did not run the IndexStore writes atomically. Blob, manifest, and ref inserts were separate transactions, so a crash or late conflict between manifest commit and ref commit could leave manifest / blob rows pinned to a ref that was never updated. Fix: `SqliteIndexStore` exposes connection-scoped helpers (`put_blob_in`, `put_manifest_in`, `put_ref_with_policy_in`, `replace_ref_in`, `resolve_ref_in`) and a new public `publish_artifact_atomic(blobs, manifest, layers, image_name, policy)` that wraps the whole IndexStore side of a publish in one `BEGIN IMMEDIATE` transaction. CAS bytes are still written before the tx (idempotent), but every row that pins them to a ref now commits or rolls back together. - `LocalRegistry::publish_artifact_manifest` and `import_oci_dir_inner` both stage `BlobRecord`s and call the new atomic primitive. - A `KeepExisting` pre-check still runs before staging CAS to avoid wasted writes for the common single-writer case; the atomic publish re-validates inside the tx so concurrent racers can't slip past the pre-check. - Switched the registry's transactions to `BEGIN IMMEDIATE` and enabled best-effort WAL mode so two writers can't deadlock by each holding a SHARED lock and both trying to upgrade. - New test `concurrent_publish_different_digests_keeps_one_winner` races two builders publishing different digests under the same ref and asserts exactly one wins. MEDIUM #3 — `ommx pull` bypassed the SQLite registry. Routed `Command::Pull` through `local_registry::pull_image` so remote pulls land in the v3 IndexStore and `ommx list` sees them. MEDIUM #4 — `Command::Load` advertised "OCI directory" but only handled archives. Now branches on `path.is_dir()` and calls `import_oci_dir` for directories, `import_oci_archive` otherwise. LOW #5 — `OciDirImport.image_name` lost the effective ref for unannotated dirs imported via `import_oci_dir_as_ref*`. Unified the import inner to take `target_image_name: Option<&ImageName>`, collapsed the as-ref-specific inner, and now reports the actually- written ref (caller-supplied if any, else the dir's annotation, else `None`). Trims `import_oci_dir_as_ref_with_policy_inner`, `ensure_image_ref_update_allowed`, and `put_image_ref_with_conflict_handling` along the way. LOW #6 — Missing tests: Added the two tests called out under HIGH #1 and HIGH #2 above (Artifact Manifest layout import, concurrent same-ref different- digest publish race). Validation: cargo test -p ommx --lib (507 passed, 22 in artifact::*), cargo clippy -p ommx --lib unchanged (3 pre-existing warnings, none new), cargo fmt clean. Local repro of the CI `load` job (`./ommx load <archive>` → `./ommx list` matches) passes. CLI `ommx pull` is feature-gated on remote-artifact and not exercised in the smoke test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t fields
Two awkward types in `import::oci_dir`:
- `PreparedOciDir` and `PreparedManifest` both lived inside the same
pipeline; "Prepared" was vague and the doubled prefix (a
`PreparedOciDir.parsed: PreparedManifest`) read clumsily.
- The build path already uses "Staged" for the same concept
(`StagedArtifactBlob`, `StagedArtifactManifest`,
`LocalArtifactBuilder::stage()`), so the import path was using a
different word for the same idea.
Two changes:
1. Inline `PreparedManifest`'s fields into the umbrella struct so
there is one type instead of two: `StagedOciDir { manifest_digest,
image_name, manifest_bytes, manifest_descriptor, format, layers,
annotations, subject, image_config }`. `manifest_media_type()`
moves to `StagedOciDir`; the descriptor media-type helper moves to
`OciManifestFormat`.
2. Rename "Prepared" → "Staged" everywhere in this module, matching
the build path's vocabulary: `prepare_oci_dir` → `stage_oci_dir`,
`parse_oci_dir_image_manifest` / `parse_oci_dir_artifact_manifest`
→ `stage_image_manifest` / `stage_artifact_manifest` (they return
the format-specific fields that get folded into a `StagedOciDir`).
Local variable `prepared` → `staged`.
The two helpers now return a `StagedManifestFields` type alias for
the format-specific tuple. No type ever has both "Prepared" and
"Staged" in scope; the import side and the build side describe the
same lifecycle phase with the same word.
Validation: cargo test -p ommx --lib (507 passed), task python:sync
+ test_doctests.py / test_descriptor.py both pass, cargo fmt clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two consistency-audit findings from the post-Codex review. - `image_name_repository` was `pub` and re-exported through `local_registry::*`, but only ever used inside `index.rs`. Drop the re-export and downgrade the function to private. Public API surface shrinks; nothing downstream loses access (it had no callers). - `import::remote::pull_image` is the one import-source entry point whose name doesn't fit the `import_<noun>` pattern of its siblings. That divergence is intentional — `pull` is the OCI Distribution Spec verb and matches `docker pull` / `oras pull` / `crane pull`, while the surrounding `import` namespace already conveys that the result lands in the v3 registry. Renaming to `import_remote` would lose that domain signal. Document the rationale in the module docstring so future readers don't try to "fix" the inconsistency. Validation: cargo test -p ommx --lib (507 passed), cargo fmt clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR establishes the v3 SQLite-backed Local Registry as the primary write/read store for OMMX artifacts, while preserving full read compatibility with the existing OCI directory layout. It also fixes the OCI identity invariant for the legacy uptake path and localises the ocipkg dependency to two well-defined import modules.
What lands in this PR
New SQLite Local Registry (write + read)
IndexStore(SQLite,Mutex<Connection>forSync) andBlobStore(filesystem CAS, atomic publish via temp file + hard link).LocalArtifactBuilder.build()andArtifact.load(image_name)route through the SQLite registry.LocalArtifactholdsArc<LocalRegistry>; clones share one connection.get_manifest()memoises into anArc<OnceLock<LocalManifest>>solayers()/annotations()/subject()don't re-read the manifest blob on each call.Artifact.list()(Python) reads refs from the IndexStore.KeepExisting/Replacepolicies); the public outcome typeOciDirImportsurfaces theRefUpdate(Inserted/Unchanged/Replaced/Conflicted).OCI Image Manifest and OCI Artifact Manifest coexist as distinct containers
ArtifactBuilder.new(image)→build()) produces an OCI Artifact Manifest (application/vnd.oci.artifact.manifest.v1+json).image_name.LocalArtifact::get_manifest()returnsLocalManifest::{Image, Artifact}and dispatches onmanifests.media_typeso reads transparently surface a commonlayers/annotations/subject/artifact_typeview.Identity-preserving legacy import
ommx artifact import(CLI) and the underlyingimport_legacy_local_registry*API read the legacyindex.json+ manifest blob, validate the OMMXartifactType, and store the manifest bytes verbatim. The manifest digest is preserved end-to-end.configblob is captured as a CAS object (kind=config) so re-export can rebuild a self-contained OCI Image Layout.commit/build()is the SDK-side verb. The previous code also usedmigrate_*for this path; v3 renames it toimport_*and reservesconvertfor the separate operation that produces a new artifact under a new digest by reformatting Image Manifest into Artifact Manifest. This PR does not implementconvert.Module organization: storage vs import, ocipkg seam localised
local_registry::is split into two clearly-named layers:The directory format itself is OCI standard and is not legacy; only the v2 path/tag tree above it is.
archiveandremoteare the explicit ocipkg seam — both are two-stage pipelines today (ocipkg → legacy OCI dir → import_oci_dir_as_ref into SQLite), and the follow-up that drops ocipkg is scoped to swapping the body of these two modules. The CLI'sCommand::Loadand the PythonArtifact.load(image)now callimport_oci_archive/pull_imagedirectly, so when ocipkg is replaced the call sites don't change.Lazy auto-migration on
Artifact.load(image)andommx loadThe v2 ergonomics of "load this artifact and it shows up in
ommx list" are preserved without re-introducing a per-call legacy fallback. SQLite is the source of truth; on the first miss for a givenimage_namethepull_imagepath probes one path (get_image_dir(image)), pulls from remote into the legacy cache if absent, then runs a one-shot identity-preserving import into SQLite. Subsequent calls hit the SQLite fast path; nothing reads from legacy on the hot path. The probe is onePath::exists()(not a recursive root scan), so listing APIs (Artifact.list) still query SQLite only.The legacy OciDir write produced by stage 1 of
import_oci_archive/pull_imagestays in place for now becauseommx push/saveand the Python archive read path still consume that form. The follow-up PR that ports those readers to the SQLite registry will drop the double-write soommx loadlands directly in v3 (and stage 1 becomes a native streamer).Out of scope (deferred to follow-up PRs)
import::archiveandimport::remotewith native v3 streaming readers (the seam where this happens is already in place).ocipkgremoval from public Rust / Python API surfaces (ommx::ocipkgre-export,Descriptor/Digest/MediaTypeshapes).ArtifactBuilder.new_archive*,temp(),Artifact.load_archive) is still on the ocipkg-based Image Manifest pipeline.pushfrom the SQLite registry (bail!placeholder).convert(Image Manifest → Artifact Manifest under new ref).parent(),history(),diff()) and OTel trace layer.ommx artifact gc.rust/dataset/{miplib2017,qplib}packaging path (still uses the legacyOciDirBuilder).ommx loadandArtifact.load(image)(waits forpush/saveto be ported and the native stage 1 to land).What works end-to-end after this PR
ArtifactBuilder.new(image)→add_*→build()→Artifact.load(image)→ layer access.ArtifactBuilder.for_github(...)follows the same path.Artifact.list()lists SQLite refs.ommx artifact importbrings legacy OCI dir entries into the SQLite registry while preserving their manifest digest, with a reportedLegacyImportReport.ommx load <archive>followed byommx listshows the loaded image (lazy auto-migration viaimport_oci_archive).Artifact.load(image)with a SQLite miss pulls from remote viapull_imageand shows up in subsequentArtifact.list()calls.LocalManifest::Imageand exposes the same view shape as native v3 artifacts.Validation
cargo test -p ommx --lib— 505 tests pass (20 inartifact::*, includingOciDirImport.ref_updatesurfacing,import_legacy_ref_with_policyreplace,LocalManifestcache pointer-equality across clones, andsubject()round trip).cargo check -p ommx --bin ommx --features remote-artifact,built,cli.cargo clippy -p ommx --lib— 3 pre-existing warnings inmps/convert.rsanddecision_variable/parse.rs(no new warnings, nolarge_enum_variant).task python:test(linting + pyright + pytest acrossommx-testsand the four adapters).loadjob (./ommx load <archive>→./ommx listmatches the regex) and the failing PythonArtifact.load(image)doctest both pass on a clean local registry root.Notes for reviewers
OciDirRef,OciDirImport,LegacyImportReport,BlobRecord,ManifestRecord,RefRecord,LayerRecord) carry#[non_exhaustive]; future column additions don't require a major bump.LocalArtifactBuilder::stage()(wasprepare()) returns aStagedArtifactManifest(wasPreparedArtifactManifest) — Build / Seal / View vocabulary aligned withStagedArtifactBlob.import_oci_archiveandpull_imageare the only places whereocipkgis used insidelocal_registry; the storage layer,import::oci_dir, andimport::legacyare already ocipkg-free aside from the sharedImageName/oci_spectypes.LocalArtifact,LocalArtifactBuilder,LocalManifest,OciDirRef,OciDirImport,LegacyImportReport,RefConflictPolicy,RefUpdate) are deliberately small and compose with the still-presentocipkgtypes only at boundaries that follow-up PRs will replace.