HTTP server that exposes stores, arrays, groups #3732
d-v-b wants to merge 10 commits into zarr-developers:main
@@ -19,47 +19,27 @@
ZARR_PROJECT_PATH = Path(".").absolute()
changes in this file are simplifications to our examples testing infrastructure. Instead of re-writing the script header, we just override the declared zarr dep in the invocation of uv run ...
@@ -0,0 +1,178 @@
"""Utilities for determining the set of valid store keys for zarr nodes.
we eventually need to find a more natural place for this code. I'm not sure which module it should live in.
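For readers following along, here is a rough sketch of what "the set of valid store keys" could mean for a single array. It is not the code from this PR: the function name `array_keys_v3` is hypothetical, and it assumes a Zarr v3 array with a regular chunk grid and the default chunk key encoding (`c` prefix, `/` separator), with keys relative to the array's own path.

```python
from itertools import product
from math import ceil


def array_keys_v3(shape, chunk_shape):
    """Return the store keys belonging to one Zarr v3 array (hypothetical helper).

    Assumes a regular chunk grid and the default chunk key encoding,
    so chunk (i, j) is stored under the key "c/i/j".
    """
    # Number of chunks along each dimension.
    grid = [ceil(s / c) for s, c in zip(shape, chunk_shape)]
    # The metadata document is always part of the array's key space.
    keys = {"zarr.json"}
    for index in product(*(range(n) for n in grid)):
        # A zero-dimensional array has exactly one chunk, stored under "c".
        keys.add("/".join(("c", *map(str, index))))
    return keys
```

A server that exposes only an array could then reject any requested key outside this set, which is one way to read "only exposing the metadata + chunks".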
Didn't end up doing these things. We can add them later if people are interested.

here's a demo of this functionality:

Does this really need to belong in the core python library? Is there any advantage to experimenting with it in this repository?
IMO yes.

and putting this in

another important use case: if you create a custom store, there is currently no way in zarr-python to expose that store as a writable endpoint to a zarr-aware client. This PR enables this functionality.

Love this, as someone who regularly uses @manzt's simple-zarr-server (https://github.com/manzt/simple-zarr-server)

thanks @psobolewskiPhD, given your experience with other tools let me know if there are any features missing from this PR and I can add them.
I don't think this is a good argument. many zarr datasets are too big to compute on; should we vendor cubed/dask too? That said, i don't plan on helping maintain it so ... no skin off my back hehe
I think visualizing zarr data is far more basic than doing distributed compute on it. It's very common to use data visualization as a basic sanity check when reading or writing data. And the server is less than 500 lines of code. I have not checked but I suspect this is a bit smaller than dask or cubed.

Very cool! Works quite nicely! then this works from my CLI: (napari readers use extensions)
Good idea, I can add this

I did run into one issue, maybe PEBCAK: when the remote file being served was zarr2 and I didn't specify zarr_version=2 in my local (client) zarr.open, things didn't work. I couldn't get the arrays from a group. I happened to know it was zarr2, but in principle that might not be the case?
that's a Zarr Python API thing -- we default to Zarr V3. we should have a less ambiguous / better API for this.

Here's my serve script btw: Also, probably unrelated to this? but I tried, on a lark, to use obstore thinking this is an obstore use case: Builder errors

AH, so I think i conflated things and this http server of course isn't object storage.
#3698 would change that, although there are open discussions about how to map obspec to aiohttp - developmentseed/obspec-utils#65. |
I'm on board with adding this to experimental, especially given the positive responses in this thread/on zulip.
I have a couple questions:
- We'll find out if something's broken via issues, but how will we find out if people love the feature such that it should be moved into the stable API? Should we include an ExperimentalWarning that links to a discussion where people can comment and advocate for the feature's elevation?
- Should we keep the functionality in core.keys inside src/zarr/experimental/ for now, until it's used by part of the stable functionality? I'd want it to be as simple as possible to yank this stuff out if we decide not to keep it, so to me it seems helpful to keep experimental internals isolated.
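
To make the first question concrete, here is a minimal sketch of what such a warning could look like, using only the standard library `warnings` module. Everything named here is an assumption for illustration: `ExperimentalWarning`, `warn_experimental`, and the placeholder feedback URL are hypothetical and not taken from this PR.

```python
import warnings

# Placeholder only: a real implementation would link to an actual discussion thread.
FEEDBACK_URL = "https://example.invalid/experimental-serve-feedback"


class ExperimentalWarning(UserWarning):
    """Hypothetical warning category for features that may change or be removed."""


def warn_experimental(feature: str) -> None:
    """Emit a warning that points users at a place to leave feedback."""
    warnings.warn(
        f"{feature} is experimental and may change or be removed without notice. "
        f"Feedback is welcome at {FEEDBACK_URL}",
        category=ExperimentalWarning,
        # Attribute the warning to the caller of the experimental function.
        stacklevel=2,
    )
```

Using a dedicated category would let users who have opted in silence it with a single `warnings.filterwarnings("ignore", category=ExperimentalWarning)` call.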

This PR adds an experimental http server in experimental.serve. This server can expose stores over http. It can also expose arrays and groups over http. Exposing a store means exposing the entire key: value space of the store. Exposing an array means only exposing the metadata + chunks. Exposing a group means only exposing sub-groups and sub-arrays. See #3731 for more on this distinction.

The server is an optional dependency implemented via starlette. It handles byte-range reads and other http methods. CORS headers and allowed methods can be configured. I'm considering handling prefix requests like foo/bar by returning a simple HTML document that lists the visible keys under foo/bar/, for user-friendliness and to aid httpstore readers that use such responses for listing contents. I'd also like to implement convenience functions for kicking off a server from jupyter and a CLI.

Opening as a draft while I work on this.
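
As an illustration of the byte-range handling mentioned in the description, here is a minimal standard-library sketch of resolving a single-range HTTP `Range` header against a stored value of known size. It is not the starlette-based code in this PR; the name `resolve_byte_range` is hypothetical, and multi-range requests are deliberately treated as unsupported.

```python
import re
from typing import Optional, Tuple

# Matches "bytes=first-last", "bytes=first-" and "bytes=-suffix" (single range only).
_RANGE = re.compile(r"bytes=(\d*)-(\d*)", re.ASCII | re.IGNORECASE)


def resolve_byte_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Return a half-open (start, stop) slice for `header`, or None.

    None means the header is malformed, unsupported, or not satisfiable
    for a value of `size` bytes.
    """
    match = _RANGE.fullmatch(header.strip())
    if match is None or size <= 0:
        return None
    first, last = match.groups()
    if not first:
        # Suffix form: the final `last` bytes of the value.
        if not last or int(last) == 0:
            return None
        return max(size - int(last), 0), size
    start = int(first)
    if start >= size:
        return None
    if not last:
        # Open-ended form: from `start` to the end of the value.
        return start, size
    end = int(last)  # inclusive in HTTP, so add 1 for slicing
    if end < start:
        return None
    return start, min(end, size - 1) + 1
```

With a chunk held as `data: bytes`, `start, stop = resolve_byte_range("bytes=0-99", len(data))` gives a slice `data[start:stop]` that a handler could return with a 206 status.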
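
The prefix-listing idea from the description could look roughly like the sketch below: given the visible keys and a requested prefix such as foo/bar, build a small HTML page that links to the immediate children. The function names `child_names` and `render_listing` are hypothetical, and the HTML layout is an assumption, not something specified in this PR.

```python
from html import escape
from urllib.parse import quote


def child_names(keys, prefix):
    """Return the sorted immediate children of `prefix`; sub-prefixes end in "/"."""
    base = prefix.strip("/")
    base = base + "/" if base else ""
    children = set()
    for key in keys:
        if key.startswith(base) and len(key) > len(base):
            # Keep only the first path segment below the prefix.
            head, sep, _ = key[len(base):].partition("/")
            children.add(head + sep)
    return sorted(children)


def render_listing(keys, prefix):
    """Render a minimal HTML document listing the children of `prefix`."""
    items = "\n".join(
        f'<li><a href="{escape(quote(name), quote=True)}">{escape(name)}</a></li>'
        for name in child_names(keys, prefix)
    )
    title = escape(prefix.strip("/") + "/")
    return (
        "<!DOCTYPE html>\n"
        f"<html><body><h1>{title}</h1>\n<ul>\n{items}\n</ul></body></html>"
    )
```

The links are relative, so they resolve as intended only when the listing is served from a URL ending in a slash (foo/bar/ rather than foo/bar); a server could redirect the former to the latter.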