High level C Scan API by myrrc · Pull Request #32 · vortex-data/rfcs

myrrc · 2026-03-20T17:40:42Z

High level API draft: vortex-data/vortex#7212
Duckdb integration PoC: https://github.com/vortex-data/vortex/tree/myrrc/scan-api-duckdb

connortsui20

Is it possible to follow the template a bit more closely? It's less about following the exact template and more that it's a bit hard to understand what it being proposed and the different tradeoffs (alternatives) without some of the sections in the template. Basically, when reading through this I'm trying to figure out the "why" via the "what", and it should be the other way around.

Additionally, there is quite a lot of code here that is paired with high-level description. I think it would be helpful if some of the boilerplate-y code was removed so we could focus on things that are not obvious. That would help in making this easier to read as well.

Also as an aside: I'm not an FFI expert but I feel like there is a lack of ownership semantics described here. Shouldn't that be one of the main things we have to worry about with interop between Rust and C?

proposed/0030-c-scan-api.md

myrrc · 2026-03-31T11:01:49Z

Is this the way we want to expose a cache?

I've thought about this, and likely we don't want to expose (in-memory) caching as a separate implementation. If host provides us with malloc/free-compatible functions, we can build our cache inside Vortex and host will still have observability.

On persistent caching, this may possibly be useful for remote files, but in this case it should be host's responsibility to save the decoded or original file somewhere on the filesystem and provide us with a reference. This can already be done with file_open (host may get the reference to a cached file).

High level API is missing a way to read a file directly from a buffer, but I didn't add it as I'm not sure this would be used by anyone.

connortsui20

Looks good! I had a few comments related to some missing detail and structure (and there are some grammar errors that I just left out) but this looks like it is on the right track.

I think it is good we are splitting this out into multiple proposals, but do we have a timeline for when we want the other layers? My guess is as we go down it gets even harder than it already is so it might be worth designing those sooner than later.

connortsui20 · 2026-04-01T17:01:26Z

proposed/0030-c-scan-api.md

+Vortex manages the event loop by using a CurrentThreadRuntime runtime handle
+which in turn has a `smol::Executor<'static>` without a background thread pool.
+This means the event loop will run on host's thread pool scheduler (in other
+terms, use the calling thread) implicitly. In case of multiple threads, the
+underlying executor is shared between them, and drives from from all of these
+threads. The executor is initialized once on loading FFI library.


I think we might want more detail on this (unless this is outside of the scope of this RFC and deserves its own document?).

For example, what does "the underlying executor is shared between them, and drives from from all of these threads" actually mean? So would the host thread hold the executor and then spawn more threads as necessary, or is this actually just a single host thread registers(?) the executor and the other host threads somehow know about this execution thread? Or are all of the threads cooperatively synchronizing the driving of execution where thread A calling vx_partition_next can make progress on tasks spawned by thread B while blocked on IO (like how tokio moves threads around)?

Or does this just not matter at all for the "high-level" layer? In that case maybe this should be moved out of this RFC.

connortsui20 · 2026-04-01T17:05:36Z

proposed/0030-c-scan-api.md

+implemented separately. As this is a big change, this PR will from now on focus
+just on the high level scan API. Other levels may be implemented later on


Can you move this to the top so that readers know this is only talking about the high-level layer?

connortsui20 · 2026-04-01T17:08:23Z

proposed/0030-c-scan-api.md

+  Current implementation copy-pastes ArrowSchema and ArrowArrayStream and gates
+  them behind a macro.
+
+## High level API integration example: DuckDB


Now that this RFC just focuses on the top layer, could you move this to the beginning of the RFC? Also, there doesn't actually seem to be that much different before and after, is the only difference rust vtable vs c ffi? And what is the benefit of (what I am assuming is the case right now) dropping the cxx bindings here? Is it just simplicity and maintenance surface area?

connortsui20 · 2026-04-01T17:10:59Z

proposed/0030-c-scan-api.md

+    // special case: row_range_begin == 0 && row_range_end == 0 means "all rows"
+    uint64_t row_range_begin; // Inclusive
+    uint64_t row_range_end; // Exclusive


This seems like a footgun... What about schema-only queries? I think it would be safer to have a flag for all_rows or something

connortsui20 · 2026-04-01T17:14:07Z

proposed/0030-c-scan-api.md

+
+| Level | Allocations | IO | Event loop | Scan planning |
+| ----- | ----------- | -- | ---------- | ------------- |
+| High  | Host/Vortex | Host/Vortex | Vortex | Vortex |


AFAICT there isn't any description of what host vs vortex allocations and IO looks like in this RFC? A brief description somewhere would be nice. For example, what is the API in which the host choose to make allocations themselves vs Vortex? And I guess if the fs callback function pointers are null we know that Vortex has to manage IO?

Edit: I just saw the comment below where it says that custom memory allocators are outside the scope of this RFC, could you move that up here too?

C Scan API: initial

c1c2c5c

connortsui20 requested changes Mar 20, 2026

View reviewed changes

proposed/0030-c-scan-api.md Outdated Show resolved Hide resolved

proposed/0030-c-scan-api.md Outdated Show resolved Hide resolved

proposed/0030-c-scan-api.md Outdated Show resolved Hide resolved

proposed/0030-c-scan-api.md Outdated Show resolved Hide resolved

myrrc requested review from 0ax1, connortsui20, gatesn and joseph-isaacs March 31, 2026 11:02

split API into 3 layers

ae6ccfd

myrrc force-pushed the myrrc/scan-api-c branch from 0796aca to ae6ccfd Compare March 31, 2026 11:26

focus only on high level as for now

3e8a9a4

myrrc changed the title ~~C Scan API~~ High level C Scan API Apr 1, 2026

internal review

23c6fda

myrrc force-pushed the myrrc/scan-api-c branch from 1637b3b to 23c6fda Compare April 1, 2026 16:35

connortsui20 reviewed Apr 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

High level C Scan API#32

High level C Scan API#32
myrrc wants to merge 4 commits intodevelopfrom
myrrc/scan-api-c

myrrc commented Mar 20, 2026 •

edited

Loading

Uh oh!

connortsui20 left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

myrrc commented Mar 31, 2026 •

edited

Loading

Uh oh!

connortsui20 left a comment

Uh oh!

connortsui20 Apr 1, 2026

Uh oh!

connortsui20 Apr 1, 2026

Uh oh!

connortsui20 Apr 1, 2026

Uh oh!

connortsui20 Apr 1, 2026

Uh oh!

connortsui20 Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		implemented separately. As this is a big change, this PR will from now on focus
		just on the high level scan API. Other levels may be implemented later on

Conversation

myrrc commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

connortsui20 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

myrrc commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

connortsui20 left a comment

Choose a reason for hiding this comment

Uh oh!

connortsui20 Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

connortsui20 Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

connortsui20 Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

connortsui20 Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

connortsui20 Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

myrrc commented Mar 20, 2026 •

edited

Loading

myrrc commented Mar 31, 2026 •

edited

Loading