Columnar spilling merge batcher #741
Open
frankmcsherry wants to merge 7 commits into TimelyDataflow:master-next from
A v0 of a spilling merge batcher for columnar data. Nothing about it is specific to columnar, except that columnar data serializes well and this code happens to live off to the side, where we can specialize an implementation. It follows the idioms of timely's pager, with copied traits that abstract the stashing and fetching of data. There is probably some deduplication to do between the two, but we went with a second implementation here to see whether they would end up looking the same, without forcing it. Roughly!
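For readers who haven't looked at timely's pager, here is a minimal sketch of the kind of stash/fetch abstraction described above. The `Stash` trait, `MemoryStash`, and the `encode`/`decode` helpers are hypothetical names for illustration, not the traits copied in this PR. The in-memory map stands in for whatever actually spills (a file, say), and `(u64, i64)` stands in for real update records.

```rust
use std::collections::HashMap;

/// Hypothetical trait (not the one in this PR): hand off serialized bytes,
/// get a handle back, and later fetch the bytes by that handle.
pub trait Stash {
    type Handle;
    /// Move `bytes` out of the hot path and return a handle to them.
    fn stash(&mut self, bytes: Vec<u8>) -> Self::Handle;
    /// Retrieve (and forget) the bytes stashed under `handle`, if any.
    fn fetch(&mut self, handle: Self::Handle) -> Option<Vec<u8>>;
}

/// In-memory stand-in; a spilling version would write to disk instead.
#[derive(Default)]
pub struct MemoryStash {
    next: u64,
    store: HashMap<u64, Vec<u8>>,
}

impl Stash for MemoryStash {
    type Handle = u64;

    fn stash(&mut self, bytes: Vec<u8>) -> u64 {
        let id = self.next;
        self.next += 1;
        self.store.insert(id, bytes);
        id
    }

    fn fetch(&mut self, handle: u64) -> Option<Vec<u8>> {
        self.store.remove(&handle)
    }
}

/// Hypothetical columnar encoding: a length, then all keys, then all diffs,
/// so each column is contiguous and could be compressed independently.
pub fn encode(run: &[(u64, i64)]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + run.len() * 16);
    out.extend_from_slice(&(run.len() as u64).to_le_bytes());
    for &(k, _) in run {
        out.extend_from_slice(&k.to_le_bytes());
    }
    for &(_, d) in run {
        out.extend_from_slice(&d.to_le_bytes());
    }
    out
}

/// Inverse of `encode`: reads the length word, then the two columns.
pub fn decode(bytes: &[u8]) -> Vec<(u64, i64)> {
    // Copy the i-th 8-byte word out of the buffer.
    let word = |i: usize| {
        let mut w = [0u8; 8];
        w.copy_from_slice(&bytes[8 * i..8 * i + 8]);
        w
    };
    let n = u64::from_le_bytes(word(0)) as usize;
    (0..n)
        .map(|i| (u64::from_le_bytes(word(1 + i)), i64::from_le_bytes(word(1 + n + i))))
        .collect()
}
```

The batcher only ever holds handles, so swapping `MemoryStash` for a disk-backed implementation wouldn't change the merging code. Keeping keys and diffs in separate contiguous columns is also what would make per-column compression possible.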
The merging has lower throughput than you might like, because it uses binary merges; it will need to move beyond those. There is also the potential to use compression on the columnar layouts, as, at least in the `columnar_spill` example, two of the columns compress pretty well (and macOS's compressed memory is quite competitive there).

An example that saturates the CPU (rather than disk) on my laptop, moving ~50GB through the system.
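To make the binary-merge cost concrete: merging k sorted runs through a balanced tree of pairwise merges does O(n log k) comparisons in total, but it also writes out and re-reads every record at each of the roughly log2(k) levels, which for spilled data means that many extra passes over it. A heap-driven k-way merge also does O(n log k) comparisons but moves each record once. The sketch below contrasts the two on consolidated `(key, diff)` runs; the function names and record shape are assumptions, and neither function is the merge this PR implements.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical two-way merge of consolidated runs (distinct keys per run),
/// summing diffs of equal keys and dropping those that cancel to zero.
pub fn merge_two(a: &[(u64, i64)], b: &[(u64, i64)]) -> Vec<(u64, i64)> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        if a[i].0 < b[j].0 {
            out.push(a[i]);
            i += 1;
        } else if a[i].0 > b[j].0 {
            out.push(b[j]);
            j += 1;
        } else {
            let d = a[i].1 + b[j].1;
            if d != 0 {
                out.push((a[i].0, d));
            }
            i += 1;
            j += 1;
        }
    }
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}

/// Binary merging: each round halves the number of runs, and every record
/// is copied again in every round.
pub fn pairwise_merge(mut runs: Vec<Vec<(u64, i64)>>) -> Vec<(u64, i64)> {
    while runs.len() > 1 {
        let mut next = Vec::with_capacity((runs.len() + 1) / 2);
        for pair in runs.chunks(2) {
            next.push(match pair {
                [a, b] => merge_two(a, b),
                [a] => a.clone(),
                _ => unreachable!(),
            });
        }
        runs = next;
    }
    runs.pop().unwrap_or_default()
}

/// k-way merge: a min-heap of (key, run, position) picks the next record,
/// so each record is moved into the output exactly once.
pub fn kway_merge(runs: Vec<Vec<(u64, i64)>>) -> Vec<(u64, i64)> {
    let total: usize = runs.iter().map(|r| r.len()).sum();
    let mut out: Vec<(u64, i64)> = Vec::with_capacity(total);
    // `Reverse` turns std's max-heap into a min-heap.
    let mut heap = BinaryHeap::new();
    for (i, run) in runs.iter().enumerate() {
        if let Some(&(key, _)) = run.first() {
            heap.push(Reverse((key, i, 0usize)));
        }
    }
    while let Some(Reverse((key, i, pos))) = heap.pop() {
        let diff = runs[i][pos].1;
        // Equal keys arrive consecutively, so consolidate with the last output.
        match out.last_mut() {
            Some(last) if last.0 == key => last.1 += diff,
            _ => out.push((key, diff)),
        }
        if let Some(&(next_key, _)) = runs[i].get(pos + 1) {
            heap.push(Reverse((next_key, i, pos + 1)));
        }
    }
    // Drop keys whose diffs summed to zero.
    out.retain(|&(_, d)| d != 0);
    out
}
```

Both functions produce the same consolidated output; the difference is how many times each record is copied (and, once spilled, read back) along the way.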