feat: unified chunk grid with rectilinear chunk/shard support #3802
maxrjones wants to merge 96 commits into zarr-developers:main
This reverts commit 9c0f582.
@ilan-gold could you have a look and see how exposed your |
On the one hand, the only change to the codec pipeline is something we haven't implemented anyway: https://github.com/zarrs/zarrs-python/blob/91e36bac3bea7ff455bb4c1019a7fea689547ea9/python/zarrs/pipeline.py#L147-L151 On the other hand, it doesn't appear that zarr-python calls this method anyway. What I'm more worried about is how the individual chunk requests look now. Offhand, everything seems to be encapsulated at the per-request level: https://github.com/zarrs/zarrs-python/blob/91e36bac3bea7ff455bb4c1019a7fea689547ea9/src/lib.rs#L329-L389 i.e., we don't declare the chunk shape up front: https://github.com/zarrs/zarrs-python/blob/91e36bac3bea7ff455bb4c1019a7fea689547ea9/src/lib.rs#L271-L281 So things are probably fine, but that is just a first guess. It's probably worth my trying this out. Is this in a state worth trying out?
yeah, i had claude do a quick search through github and it couldn't find anyone actually using the
Yes, this is absolutely worth trying out. fwiw, I think lachy merged rectilinear chunk grid support in zarrs as well, so hopefully it isn't too much work on your side.
Summary
This PR contains an alternative implementation of the rectilinear chunk grid extension, building on the work in #3534 (RLE helpers, validation logic, and test cases were directly adopted). While the core feature of variable-sized chunks is the same, the internal architecture differs in ways that impact extensibility, performance, and release safety.
I appreciate the patience of those who contributed to #3534, and everyone who's been waiting on this feature. I know it's frustrating to see a new PR after #3534 was so close. That PR provided fundamental components, and I hope people will see the value here. I really believe it is worth the churn for the following reasons:
Key differences from #3534
- `DimensionGrid` protocol (`FixedDimension`, `VaryingDimension`). Adding a new dimension type (e.g. `TiledDimension` for periodic patterns like days-per-month) requires only implementing that protocol, with no changes to indexing, codecs, or the `ChunkGrid` class. A prototype was built to verify this.
- `VaryingDimension` uses precomputed prefix sums for O(log n) lookups via binary search. See https://github.com/maxrjones/zarr-chunk-grid-tests for a performance comparison.
- `zarr.config.set({'array.rectilinear_chunks': True})` (or `ZARR_ARRAY__RECTILINEAR_CHUNKS=True`), disabled by default. This gives downstream libraries time to adapt before the API is finalized, and gives us an opportunity to finalize the API gracefully.

Design document:
`docs/design/chunk-grid.md` covers the full design, rationale, and a suggested PR sequence for splitting this into reviewable increments, if needed.

Downstream POCs (all passing):
TODO:
- `docs/user-guide/*.md`
- `changes/`
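
For reviewers who want a concrete picture of the per-dimension idea described under "Key differences from #3534", here is a minimal sketch of a dimension protocol with a fixed-size and a varying-size implementation. The class names mirror those mentioned in this PR, but the method names (`index_to_chunk`, `chunk_offset`), the constructor arguments, and the field names are assumptions made for illustration and are not the PR's actual API. The varying case shows the prefix-sum plus binary-search lookup, which costs O(log n) in the number of chunks along that dimension.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from itertools import accumulate
from typing import Protocol


class DimensionGrid(Protocol):
    """Hypothetical per-dimension interface (method names are illustrative)."""

    def index_to_chunk(self, index: int) -> int: ...

    def chunk_offset(self, chunk: int) -> int: ...


@dataclass(frozen=True)
class FixedDimension:
    """Regular chunking: every chunk has the same edge length."""

    size: int    # chunk edge length
    extent: int  # array length along this dimension

    def index_to_chunk(self, index: int) -> int:
        if not 0 <= index < self.extent:
            raise IndexError(index)
        return index // self.size  # O(1)

    def chunk_offset(self, chunk: int) -> int:
        return chunk * self.size


@dataclass(frozen=True)
class VaryingDimension:
    """Rectilinear chunking: each chunk has its own edge length."""

    edges: tuple[int, ...]
    offsets: tuple[int, ...] = field(init=False)

    def __post_init__(self) -> None:
        # Prefix sums, computed once: offsets[i] is the start of chunk i,
        # and offsets[-1] is the total extent.
        object.__setattr__(self, "offsets", (0, *accumulate(self.edges)))

    @property
    def extent(self) -> int:
        return self.offsets[-1]

    def index_to_chunk(self, index: int) -> int:
        if not 0 <= index < self.extent:
            raise IndexError(index)
        # Binary search over the prefix sums: O(log n) in the chunk count.
        return bisect_right(self.offsets, index) - 1

    def chunk_offset(self, chunk: int) -> int:
        return self.offsets[chunk]


# Usage: four "month" chunks of 31, 28, 31 and 30 days.
months = VaryingDimension(edges=(31, 28, 31, 30))
regular = FixedDimension(size=10, extent=25)
```

In this sketch, code that only calls `index_to_chunk` and `chunk_offset` does not need to know which kind of dimension it is working with, which is the extensibility property the description claims for new dimension types.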