perf: sort-merge join (SMJ) batch deferred filtering and move mark joins to bitwise stream. Near-unique LEFT and FULL SMJ 20-50x faster by mbutrovich · Pull Request #21184 · apache/datafusion

mbutrovich · 2026-03-26T18:00:18Z

Which issue does this PR close?

Partially addresses #20910. Fixes #21197.

Rationale for this change

Sort-merge join with a filter on outer joins (LEFT/RIGHT/FULL) runs process_filtered_batches() on every key transition in the Init state. With near-unique keys (1:1 cardinality), this means running the full deferred filtering pipeline (concat + get_corrected_filter_mask + filter_record_batch_by_join_type) once per row — making filtered LEFT/RIGHT/FULL 55x slower than INNER for 10M unique keys.

Additionally, mark join logic in MaterializingSortMergeJoinStream materializes full (streamed, buffered) pairs only to discard most of them via get_corrected_filter_mask(). Mark joins are structurally identical to semi joins (one output row per outer row with a boolean result) and belong in BitwiseSortMergeJoinStream, which avoids pair materialization entirely using a per-outer-batch bitset.

What changes are included in this PR?

Three areas of improvement, building on the specialized semi/anti stream from #20806:

1. Move mark joins to BitwiseSortMergeJoinStream

Match on join type; emit_outer_batch() emits all rows with the match bitset as a boolean column (vs semi's filter / anti's invert-and-filter)
Route LeftMark/RightMark from SortMergeJoinExec::execute() to the bitwise stream
Remove all mark-specific logic from MaterializingSortMergeJoinStream (mark_row_as_match, is_not_null column generation, mark arms in filter correction)
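
To illustrate why mark joins fit the same stream as semi and anti joins, here is a minimal sketch in plain Rust (no Arrow types). The `Mode` enum, the `OutRow` struct, and this version of `emit_outer_batch` are hypothetical stand-ins, not the code in this PR. The only input is the per-outer-row match bitset, and the three join types differ only in how they read it.

```rust
/// Which bitwise-style join is being emitted (illustrative names).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Semi,
    Anti,
    Mark,
}

/// One emitted outer row; `mark` is only populated for mark joins.
#[derive(Debug, PartialEq)]
struct OutRow {
    outer_idx: usize,
    mark: Option<bool>,
}

/// Hypothetical stand-in for `emit_outer_batch()`: `matched[i]` is the
/// match bit accumulated for outer row `i` while scanning the inner side.
fn emit_outer_batch(matched: &[bool], mode: Mode) -> Vec<OutRow> {
    matched
        .iter()
        .enumerate()
        .filter_map(|(outer_idx, &m)| match mode {
            // Semi: keep only rows whose bit is set.
            Mode::Semi if m => Some(OutRow { outer_idx, mark: None }),
            // Anti: keep only rows whose bit is clear.
            Mode::Anti if !m => Some(OutRow { outer_idx, mark: None }),
            // Mark: keep every row and expose the bit as a boolean column.
            Mode::Mark => Some(OutRow { outer_idx, mark: Some(m) }),
            _ => None,
        })
        .collect()
}
```

No (streamed, buffered) pairs are built at any point in this sketch, which is the property the description relies on.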

2. Batch filter evaluation in freeze_streamed()

Split freeze_streamed() into null-joined classification + freeze_streamed_matched() for batched materialization
Collect indices across chunks, materialize left/right columns once using tiered Arrow kernels (slice → take → interleave)
Single RecordBatch construction and single expression.evaluate() per freeze instead of per chunk
Vectorize append_filter_metadata() using builder extend() instead of per-element loop
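
The tiering (slice → take → interleave) can be pictured as picking the cheapest kernel that can express the collected indices. The sketch below is an assumption about how such a choice could be made, not the selection logic in this PR. `Plan` and `plan_materialization` are hypothetical names, and indices are modeled as `(batch, row)` pairs.

```rust
/// Cheapest-first materialization strategies (illustrative only).
#[derive(Debug, PartialEq)]
enum Plan {
    /// One source batch, consecutive rows: a zero-copy slice is enough.
    Slice { batch: usize, offset: usize, len: usize },
    /// One source batch, arbitrary rows: gather with a take-style kernel.
    Take { batch: usize, rows: Vec<usize> },
    /// Rows drawn from several batches: gather with an interleave-style kernel.
    Interleave(Vec<(usize, usize)>),
}

/// Returns `None` when there is nothing to materialize.
fn plan_materialization(indices: &[(usize, usize)]) -> Option<Plan> {
    let (first_batch, first_row) = *indices.first()?;

    // Any second source batch forces the most general strategy.
    if !indices.iter().all(|&(b, _)| b == first_batch) {
        return Some(Plan::Interleave(indices.to_vec()));
    }

    // Consecutive ascending rows starting at `first_row` can be sliced.
    let contiguous = indices
        .iter()
        .enumerate()
        .all(|(i, &(_, r))| r == first_row + i);

    if contiguous {
        Some(Plan::Slice { batch: first_batch, offset: first_row, len: indices.len() })
    } else {
        Some(Plan::Take {
            batch: first_batch,
            rows: indices.iter().map(|&(_, r)| r).collect(),
        })
    }
}
```

Because the decision is made once per freeze over all collected indices, the column data is gathered once, whichever tier is chosen.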

3. Batch deferred filtering in Init state (this is the big win for Q22 and Q23)

Gate process_filtered_batches() on accumulated rows >= batch_size instead of running on every Init entry
Accumulated data is bounded to ~2×batch_size (one batch from freeze_dequeuing_buffered, one accumulating toward the next freeze), so this does not reintroduce the unbounded buffering fixed by #20482 ("fix: SortMergeJoin don't wait for all input before emitting")
Exhausted state flushes any remainder
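
A rough model of the gating, to show where the win comes from. `DeferredFilter`, `on_init`, `on_exhausted`, and `runs_for_unique_keys` are hypothetical names for illustration. The real stream tracks record batches rather than a row counter, and the deferred pipeline is reduced here to a counter increment.

```rust
/// Hypothetical accumulator standing in for the joined-output buffer.
struct DeferredFilter {
    batch_size: usize,
    pending_rows: usize,
    /// How many times the (expensive) deferred pipeline ran.
    pipeline_runs: usize,
}

impl DeferredFilter {
    fn new(batch_size: usize) -> Self {
        Self { batch_size, pending_rows: 0, pipeline_runs: 0 }
    }

    /// Called on each Init entry after `rows` new rows were frozen.
    fn on_init(&mut self, rows: usize) {
        self.pending_rows += rows;
        // Gate: only pay for the pipeline once a full batch has accumulated.
        if self.pending_rows >= self.batch_size {
            self.run_pipeline();
        }
    }

    /// Called once in the Exhausted state to flush any remainder.
    fn on_exhausted(&mut self) {
        if self.pending_rows > 0 {
            self.run_pipeline();
        }
    }

    fn run_pipeline(&mut self) {
        // Real code would concat, correct the filter mask, and filter here.
        self.pipeline_runs += 1;
        self.pending_rows = 0;
    }
}

/// Drives `keys` unique keys (one row per key transition) through the gate.
fn runs_for_unique_keys(keys: usize, batch_size: usize) -> usize {
    let mut f = DeferredFilter::new(batch_size);
    for _ in 0..keys {
        f.on_init(1);
    }
    f.on_exhausted();
    f.pipeline_runs
}
```

In this model, a `batch_size` of 1 reproduces the old behavior of one pipeline run per key. Assuming a batch size of 8192, 10M unique keys would need on the order of 1,200 runs instead of 10M.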

Cleanup:

Rename SortMergeJoinStream → MaterializingSortMergeJoinStream (materializes explicit row pairs for join output) and SemiAntiMarkSortMergeJoinStream → BitwiseSortMergeJoinStream (tracks matches via boolean bitset)
Consolidate semi_anti_mark_sort_merge_join/ into sort_merge_join/ as bitwise_stream.rs / bitwise_tests.rs; rename stream.rs → materializing_stream.rs and tests.rs → materializing_tests.rs
Consolidate SpillManager construction into SortMergeJoinExec::execute() (shared across both streams); move peak_mem_used gauge into BitwiseSortMergeJoinStream::try_new
MaterializingSortMergeJoinStream now handles only Inner/Left/Right/Full — all semi/anti/mark branching removed
get_corrected_filter_mask(): merge identical Left/Right/Full branches; add null-metadata passthrough for already-null-joined rows
filter_record_batch_by_join_type(): rewrite from filter(true) + filter(false) + concat to zip() for in-place null-joining — preserves row ordering and removes create_null_joined_batch() entirely; add early return for empty batches
filter_record_batch_by_join_type(): use compute::filter() directly on BooleanArray instead of wrapping in temporary RecordBatch
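
For readers unfamiliar with the two filter helpers, here is a simplified sketch of the idea in plain Rust. It assumes a three-state mask encoding (`Some(true)` keeps the joined pair, `None` keeps the outer row but null-joins it, `Some(false)` drops the pair) and that pairs from the same outer row are adjacent. The real functions operate on Arrow arrays, and `corrected_mask` and `null_join` are hypothetical names.

```rust
/// `row_ids[i]` is the outer row that pair `i` came from and `filter[i]` is
/// the join filter result for that pair. Both slices must have equal length.
fn corrected_mask(row_ids: &[usize], filter: &[bool]) -> Vec<Option<bool>> {
    let mut out = Vec::with_capacity(filter.len());
    let mut seen_true = false;
    for i in 0..filter.len() {
        let last_of_row = i + 1 == filter.len() || row_ids[i + 1] != row_ids[i];
        if filter[i] {
            seen_true = true;
            out.push(Some(true));
        } else if last_of_row && !seen_true {
            // No pair passed for this outer row: emit it once, null-joined.
            out.push(None);
        } else {
            out.push(Some(false));
        }
        if last_of_row {
            seen_true = false;
        }
    }
    out
}

/// zip-style null-joining: surviving rows stay in their original order and
/// the inner value is replaced by null where the mask is `None`.
fn null_join(pairs: &[(i32, i32)], mask: &[Option<bool>]) -> Vec<(i32, Option<i32>)> {
    pairs
        .iter()
        .zip(mask)
        .filter(|(_, m)| **m != Some(false))
        .map(|(&(outer, inner), m)| {
            (outer, if *m == Some(true) { Some(inner) } else { None })
        })
        .collect()
}
```

Because the null-joined rows are produced in place rather than filtered out and concatenated at the end, the output keeps the input row order, which is the property the zip() rewrite is described as preserving.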

Benchmarks

cargo run --release --bin dfbench -- smj

Query	Join Type	Rows	Keys	Filter	Main (ms)	PR (ms)	Speedup
Q1	INNER	1M×1M	1:1	—	16.3	14.4	1.1x
Q2	INNER	1M×10M	1:10	—	117.4	120.1	1.0x
Q3	INNER	1M×1M	1:100	—	74.2	66.6	1.1x
Q4	INNER	1M×10M	1:10	1%	17.1	15.1	1.1x
Q5	INNER	1M×1M	1:100	10%	18.4	14.4	1.3x
Q6	LEFT	1M×10M	1:10	—	129.3	122.7	1.1x
Q7	LEFT	1M×10M	1:10	50%	150.2	142.2	1.1x
Q8	FULL	1M×1M	1:10	—	16.6	16.7	1.0x
Q9	FULL	1M×10M	1:10	10%	153.5	136.2	1.1x
Q10	LEFT SEMI	1M×10M	1:10	—	53.1	53.1	1.0x
Q11	LEFT SEMI	1M×10M	1:10	1%	15.5	14.7	1.1x
Q12	LEFT SEMI	1M×10M	1:10	50%	65.0	67.3	1.0x
Q13	LEFT SEMI	1M×10M	1:10	90%	105.7	109.8	1.0x
Q14	LEFT ANTI	1M×10M	1:10	—	54.3	53.9	1.0x
Q15	LEFT ANTI	1M×10M	1:10	partial	51.5	50.5	1.0x
Q16	LEFT ANTI	1M×1M	1:1	—	10.3	11.3	0.9x
Q17	INNER	1M×50M	1:50	5%	75.9	79.0	1.0x
Q18	LEFT SEMI	1M×50M	1:50	2%	50.2	49.0	1.0x
Q19	LEFT ANTI	1M×50M	1:50	partial	336.4	344.2	1.0x
Q20	INNER	1M×10M	1:100	GROUP BY	763.7	803.9	1.0x
Q21	INNER	10M×10M	1:1	50%	186.1	187.8	1.0x
Q22	LEFT	10M×10M	1:1	50%	10,193.8	185.8	54.9x
Q23	FULL	10M×10M	1:1	50%	10,194.7	233.6	43.6x
Q24	LEFT MARK	1M×10M	1:10	1%	FAILS	15.1	—
Q25	LEFT MARK	1M×10M	1:10	50%	FAILS	67.3	—
Q26	LEFT MARK	1M×10M	1:10	90%	FAILS	110.0	—

General workload (Q1-Q20, various join types/cardinalities/selectivities): no regressions.
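
The Speedup column is Main (ms) divided by PR (ms), rounded to one decimal place. A minimal sketch of that arithmetic follows. The `speedup` helper is hypothetical and not part of dfbench.

```rust
/// Speedup of the PR relative to main, rounded to one decimal place.
/// Values above 1.0 mean the PR is faster.
fn speedup(main_ms: f64, pr_ms: f64) -> f64 {
    (main_ms / pr_ms * 10.0).round() / 10.0
}
```

For Q22, `speedup(10193.8, 185.8)` gives 54.9, and for Q23, `speedup(10194.7, 233.6)` gives 43.6, matching the table. Dividing Q22 on main by Q21 (INNER) on main, 10193.8 / 186.1, gives roughly 55, consistent with the "55x slower than INNER" figure in the rationale.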

Are these changes tested?

In addition to existing unit and sqllogictests:

I ran 50 iterations of the fuzz tests (modified to test only against hash join as the baseline, because nested loop join takes too long): cargo test -p datafusion --features extended_tests --test fuzz -- join_fuzz
One new sqllogictest for #21197 ("bug: sort-merge join (SMJ) LeftMark join with join filter crashes on non-nullable columns"), which fails on main
Four new unit tests: three for full join with filter that spills
One new fuzz test to exercise full join with filter that spills
New benchmark queries Q21-Q23: 10M×10M unique keys with 50% join filter for INNER/LEFT/FULL — exercises the degenerate case this PR fixes
New benchmark queries Q24-Q26 duplicate Q11-Q13 but for Mark joins, showing that they have the same performance as the other joins (LeftSemi) that use this stream

Are there any user-facing changes?

No.

mbutrovich · 2026-03-26T18:26:47Z

Tagging folks who had feedback on recent SMJ changes @comphead @rluvaton @stuhood. Thank you!

rluvaton · 2026-03-26T18:27:26Z

run benchmarks sort_merge_join

mbutrovich · 2026-03-26T18:29:37Z

run benchmarks sort_merge_join

Note that the 2 queries I expect a speedup on in the smj suite are new in this PR, so I don't think we'll see their performance against main. I had to hoist the benchmark to main and run it locally for the comparison in the PR description.

adriangbot · 2026-03-26T18:31:10Z

🤖 Criterion benchmark running (GKE) | trigger
Linux bench-c4137272954-565-stzm5 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing simplify_smj_full_opt (1c1bec5) to ba399a8 (merge-base) diff
BENCH_NAME=sort_merge_join
BENCH_COMMAND=cargo bench --features=parquet --bench sort_merge_join
BENCH_FILTER=
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-03-26T18:31:16Z

Benchmark for this request failed.

Last 20 lines of output:

Cloning into '/workspace/datafusion-branch'...
simplify_smj_full_opt
From https://github.com/apache/datafusion
 * [new ref]         refs/pull/21184/head -> simplify_smj_full_opt
 * branch            main                 -> FETCH_HEAD
Switched to branch 'simplify_smj_full_opt'
ba399a80f9ffcb0563adf2b67add13d0476f6291
Cloning into '/workspace/datafusion-base'...
HEAD is now at ba399a8 docs: add KalamDB to known users (#21181)
rustc 1.94.0 (4a4ef493e 2026-03-02)
1c1bec5e7a217c366e704d1fd5bf8594a9e9540e
ba399a80f9ffcb0563adf2b67add13d0476f6291
    Blocking waiting for file lock on package cache
    Blocking waiting for file lock on package cache
    Blocking waiting for file lock on package cache
error: target `sort_merge_join` in package `datafusion-physical-plan` requires the features: `test_utils`
Consider enabling them by passing, e.g., `--features="test_utils"`

mbutrovich · 2026-03-26T18:35:19Z

Also, I'm now confused about where I should add benchmarks. #20464 added Criterion benchmarks for sort-merge join, but they are missing scenarios from dfbench's smj benchmarks, which I further extend here. Any help?

Dandandan · 2026-03-26T18:52:35Z

adriangb/datafusion-benchmarking#2

datafusion/physical-plan/src/joins/sort_merge_join/filter.rs

mbutrovich · 2026-03-27T15:29:26Z

Updated performance numbers in the description to use the scaled benchmark from #21200, and added 3 Mark join benchmarks that match Q11-Q13 LeftSemi behavior. These crash on main (#21197, which this PR fixes), so there is no main baseline to compare against. However, their performance on this branch matches Q11-Q13.

## Which issue does this PR close?

- Closes #.

## Rationale for this change

Our SMJ benchmark queries finish too quickly to demonstrate improvements that aren't massive. For example, I am working on an optimization that introduces `DynComparator` (part of #20910) and it's about a 10% improvement, but only when you actually make the queries run long enough. The new queries for #21184 are scaled enough to see improvements, but we need to scale the older queries. I am also continuing to see SMJ issues with Comet when running joins with billions (sometimes trillions) of rows. We can't do that for microbenchmarks, but we can at least start hitting millions of rows to look at more than a handful of batches.

## What changes are included in this PR?

Bring our SMJ queries into alignment with some of the newer ones (Q21-23) to demonstrate further performance wins.

## Are these changes tested?

I ran the benchmark. On my M3 Max, here's how long it takes:

| Query | Join Type | Rows | Keys | Filter | Median (ms) |
|-------|-----------|------|------|--------|-------------|
| Q1 | INNER | 1M×1M | 1:1 | — | 16.3 |
| Q2 | INNER | 1M×10M | 1:10 | — | 117.4 |
| Q3 | INNER | 1M×1M | 1:100 | — | 74.2 |
| Q4 | INNER | 1M×10M | 1:10 | 1% | 17.1 |
| Q5 | INNER | 1M×1M | 1:100 | 10% | 18.4 |
| Q6 | LEFT | 1M×10M | 1:10 | — | 129.3 |
| Q7 | LEFT | 1M×10M | 1:10 | 50% | 150.2 |
| Q8 | FULL | 1M×1M | 1:10 | — | 16.6 |
| Q9 | FULL | 1M×10M | 1:10 | 10% | 153.5 |
| Q10 | LEFT SEMI | 1M×10M | 1:10 | — | 53.1 |
| Q11 | LEFT SEMI | 1M×10M | 1:10 | 1% | 15.5 |
| Q12 | LEFT SEMI | 1M×10M | 1:10 | 50% | 65.0 |
| Q13 | LEFT SEMI | 1M×10M | 1:10 | 90% | 105.7 |
| Q14 | LEFT ANTI | 1M×10M | 1:10 | — | 54.3 |
| Q15 | LEFT ANTI | 1M×10M | 1:10 | partial | 51.5 |
| Q16 | LEFT ANTI | 1M×1M | 1:1 | — | 10.3 |
| Q17 | INNER | 1M×50M | 1:50 | 5% | 75.9 |
| Q18 | LEFT SEMI | 1M×50M | 1:50 | 2% | 50.2 |
| Q19 | LEFT ANTI | 1M×50M | 1:50 | partial | 336.4 |
| Q20 | INNER | 1M×10M | 1:100 | GROUP BY | 763.7 |
| Q21 | INNER | 10M×10M | 1:1 | 50% | 186.1 |
| Q22 | LEFT | 10M×10M | 1:1 | 50% | 10,193.8 |
| Q23 | FULL | 10M×10M | 1:1 | 50% | 10,194.7 |

Note that Q22 and Q23 will be about 20x faster when #21184 merges, so taking 10 seconds to run is just a short-term issue.

## Are there any user-facing changes?

No.

mbutrovich · 2026-03-27T16:47:18Z

run benchmark smj

adriangbot · 2026-03-27T16:50:04Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4143917115-582-xvlqq 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing simplify_smj_full_opt (768529e) to 37c1b75 (merge-base) diff using: smj
Results will be posted here when complete

Dandandan · 2026-03-27T17:05:40Z

run benchmarks

env:
   DATAFUSION_OPTIMIZER_PREFER_HASH_JOIN: false

adriangbot · 2026-03-27T17:05:51Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and simplify_smj_full_opt
--------------------
Benchmark smj.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃                 simplify_smj_full_opt ┃         Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ QQuery 1  │           32.59 / 32.77 ±0.10 / 32.90 ms │        32.26 / 32.59 ±0.19 / 32.75 ms │      no change │
│ QQuery 2  │        180.91 / 185.16 ±3.12 / 190.56 ms │     184.79 / 190.88 ±5.61 / 201.26 ms │      no change │
│ QQuery 3  │        104.32 / 105.31 ±0.65 / 106.09 ms │     105.96 / 107.17 ±1.60 / 110.31 ms │      no change │
│ QQuery 4  │           29.97 / 30.34 ±0.36 / 30.97 ms │        29.06 / 30.03 ±0.99 / 31.81 ms │      no change │
│ QQuery 5  │           22.90 / 23.15 ±0.27 / 23.65 ms │        22.73 / 23.10 ±0.25 / 23.37 ms │      no change │
│ QQuery 6  │        189.25 / 192.76 ±3.19 / 198.37 ms │     182.94 / 188.79 ±5.99 / 200.34 ms │      no change │
│ QQuery 7  │        217.62 / 221.37 ±2.51 / 224.66 ms │    223.26 / 234.59 ±12.17 / 250.29 ms │   1.06x slower │
│ QQuery 8  │           22.78 / 23.48 ±0.45 / 24.15 ms │        23.31 / 24.39 ±1.25 / 26.83 ms │      no change │
│ QQuery 9  │        224.60 / 226.72 ±1.63 / 229.07 ms │     227.75 / 234.75 ±4.99 / 240.75 ms │      no change │
│ QQuery 10 │           79.90 / 83.68 ±4.14 / 89.42 ms │        81.92 / 86.07 ±3.68 / 90.20 ms │      no change │
│ QQuery 11 │           28.21 / 28.72 ±0.31 / 29.07 ms │        28.23 / 28.97 ±0.39 / 29.35 ms │      no change │
│ QQuery 12 │           74.40 / 77.34 ±2.28 / 81.41 ms │        75.76 / 77.42 ±1.90 / 81.07 ms │      no change │
│ QQuery 13 │        109.59 / 116.02 ±3.93 / 121.71 ms │     111.00 / 114.95 ±3.46 / 120.16 ms │      no change │
│ QQuery 14 │           77.87 / 82.03 ±3.45 / 87.40 ms │        81.34 / 85.03 ±3.22 / 89.32 ms │      no change │
│ QQuery 15 │           79.63 / 83.29 ±3.22 / 88.35 ms │        82.58 / 83.99 ±1.31 / 86.44 ms │      no change │
│ QQuery 16 │           15.35 / 16.03 ±0.40 / 16.50 ms │        15.87 / 16.14 ±0.20 / 16.50 ms │      no change │
│ QQuery 17 │        149.19 / 151.42 ±1.31 / 152.57 ms │     150.62 / 152.39 ±1.71 / 155.34 ms │      no change │
│ QQuery 18 │        110.17 / 111.40 ±0.88 / 112.70 ms │     110.76 / 111.87 ±0.84 / 112.83 ms │      no change │
│ QQuery 19 │        572.67 / 586.96 ±7.59 / 593.63 ms │    406.08 / 562.77 ±78.47 / 608.12 ms │      no change │
│ QQuery 20 │     1274.34 / 1280.20 ±4.73 / 1287.64 ms │ 1263.37 / 1280.64 ±11.12 / 1294.29 ms │      no change │
│ QQuery 21 │       374.91 / 382.80 ±11.23 / 405.01 ms │     376.03 / 389.35 ±9.87 / 400.23 ms │      no change │
│ QQuery 22 │ 6755.42 / 8393.91 ±1340.16 / 10340.80 ms │     383.48 / 389.55 ±6.18 / 401.24 ms │ +21.55x faster │
│ QQuery 23 │   7590.91 / 8646.70 ±672.32 / 9681.19 ms │     433.13 / 436.60 ±2.34 / 439.23 ms │ +19.80x faster │
└───────────┴──────────────────────────────────────────┴───────────────────────────────────────┴────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                    ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                    │ 21081.57ms │
│ Total Time (simplify_smj_full_opt)   │  4882.03ms │
│ Average Time (HEAD)                  │   916.59ms │
│ Average Time (simplify_smj_full_opt) │   187.77ms │
│ Queries Faster                       │          2 │
│ Queries Slower                       │          1 │
│ Queries with No Change               │         20 │
│ Queries with Failure                 │          0 │
└──────────────────────────────────────┴────────────┘
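
A quick arithmetic check on the summary above, as a hedged sketch. The helpers `mean_ms` and `round2` are hypothetical and not part of the benchmark runner.

```rust
/// Mean time per query in milliseconds.
fn mean_ms(total_ms: f64, queries: u32) -> f64 {
    total_ms / f64::from(queries)
}

/// Round to two decimal places, as the summary table does.
fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}
```

`round2(mean_ms(21081.57, 23))` is 916.59, matching the HEAD average over the 23 listed queries. For the branch, the 23 listed means sum to the reported 4882.03 ms, but `round2(mean_ms(4882.03, 23))` is 212.26, whereas the reported 187.77 equals `round2(mean_ms(4882.03, 26))`. One possible explanation is that the branch average divides by all 26 queries, including Q24-Q26, which have no HEAD counterpart. That is an inference from the numbers, not something the bot output states, so the two averages may not be directly comparable.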

Resource Usage

smj — base (merge-base)

Metric	Value
Wall time	105.7s
Peak memory	3.5 GiB
Avg memory	3.3 GiB
CPU user	1163.2s
CPU sys	7.3s
Disk read	0 B
Disk write	184.0 MiB

smj — branch

Metric	Value
Wall time	25.8s
Peak memory	3.5 GiB
Avg memory	3.2 GiB
CPU user	212.2s
CPU sys	5.5s
Disk read	0 B
Disk write	736.0 KiB

adriangbot · 2026-03-27T17:06:17Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4144026714-583-n9c4k 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing simplify_smj_full_opt (768529e) to 37c1b75 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

adriangbot · 2026-03-27T17:08:12Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4144026714-584-4zlzv 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing simplify_smj_full_opt (768529e) to 37c1b75 (merge-base) diff using: tpcds
Results will be posted here when complete

adriangbot · 2026-03-27T17:08:21Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4144026714-585-dr5c5 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing simplify_smj_full_opt (768529e) to 37c1b75 (merge-base) diff using: tpch
Results will be posted here when complete

adriangbot · 2026-03-27T17:17:03Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and simplify_smj_full_opt
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃          simplify_smj_full_opt ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ 46.43 / 47.63 ±0.80 / 48.75 ms │ 45.50 / 46.06 ±0.66 / 47.34 ms │    no change │
│ QQuery 2  │ 21.74 / 22.91 ±1.16 / 24.56 ms │ 21.27 / 21.96 ±0.68 / 23.27 ms │    no change │
│ QQuery 3  │ 32.18 / 32.36 ±0.19 / 32.60 ms │ 32.19 / 34.64 ±2.82 / 38.42 ms │ 1.07x slower │
│ QQuery 4  │ 20.71 / 22.20 ±0.83 / 23.01 ms │ 21.17 / 22.58 ±1.09 / 24.20 ms │    no change │
│ QQuery 5  │ 51.16 / 52.12 ±0.70 / 53.16 ms │ 49.29 / 51.54 ±1.69 / 54.53 ms │    no change │
│ QQuery 6  │ 17.28 / 17.49 ±0.20 / 17.84 ms │ 17.28 / 18.00 ±1.01 / 19.99 ms │    no change │
│ QQuery 7  │ 54.63 / 58.26 ±2.07 / 60.98 ms │ 54.05 / 55.75 ±1.60 / 58.70 ms │    no change │
│ QQuery 8  │ 48.62 / 49.80 ±1.85 / 53.49 ms │ 48.60 / 49.01 ±0.23 / 49.24 ms │    no change │
│ QQuery 9  │ 54.23 / 55.00 ±0.59 / 55.81 ms │ 53.84 / 55.69 ±1.29 / 57.74 ms │    no change │
│ QQuery 10 │ 70.56 / 72.89 ±1.84 / 75.33 ms │ 68.98 / 71.54 ±2.18 / 75.33 ms │    no change │
│ QQuery 11 │ 14.26 / 14.45 ±0.21 / 14.85 ms │ 14.25 / 14.46 ±0.23 / 14.87 ms │    no change │
│ QQuery 12 │ 28.25 / 29.77 ±1.49 / 32.08 ms │ 28.19 / 28.90 ±0.44 / 29.39 ms │    no change │
│ QQuery 13 │ 39.08 / 39.80 ±0.85 / 41.40 ms │ 38.64 / 39.15 ±0.52 / 40.09 ms │    no change │
│ QQuery 14 │ 28.49 / 28.82 ±0.37 / 29.53 ms │ 28.39 / 28.66 ±0.21 / 28.98 ms │    no change │
│ QQuery 15 │ 33.77 / 34.24 ±0.67 / 35.54 ms │ 33.31 / 34.20 ±0.72 / 35.19 ms │    no change │
│ QQuery 16 │ 16.04 / 16.44 ±0.29 / 16.82 ms │ 15.99 / 16.38 ±0.56 / 17.48 ms │    no change │
│ QQuery 17 │ 73.01 / 73.83 ±1.25 / 76.30 ms │ 72.71 / 73.45 ±0.60 / 74.24 ms │    no change │
│ QQuery 18 │ 78.18 / 79.49 ±0.75 / 80.13 ms │ 77.74 / 78.94 ±0.68 / 79.75 ms │    no change │
│ QQuery 19 │ 37.36 / 38.32 ±1.04 / 40.07 ms │ 37.58 / 37.87 ±0.17 / 38.03 ms │    no change │
│ QQuery 20 │ 39.85 / 40.47 ±0.45 / 41.19 ms │ 40.48 / 41.25 ±0.64 / 42.41 ms │    no change │
│ QQuery 21 │ 64.29 / 65.53 ±1.08 / 67.05 ms │ 64.75 / 66.16 ±0.88 / 67.14 ms │    no change │
│ QQuery 22 │ 17.88 / 18.18 ±0.25 / 18.57 ms │ 17.81 / 18.22 ±0.34 / 18.63 ms │    no change │
└───────────┴────────────────────────────────┴────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                    ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                    │ 910.01ms │
│ Total Time (simplify_smj_full_opt)   │ 904.38ms │
│ Average Time (HEAD)                  │  41.36ms │
│ Average Time (simplify_smj_full_opt) │  41.11ms │
│ Queries Faster                       │        0 │
│ Queries Slower                       │        1 │
│ Queries with No Change               │       21 │
│ Queries with Failure                 │        0 │
└──────────────────────────────────────┴──────────┘
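
The "Change" column can be reproduced from the two mean values in each row. The sketch below is one reading of the tables, not the benchmark runner's actual code: the function name `classify_change` is hypothetical, and the 5% noise band is an assumption inferred from the rows shown in these comments.

```rust
/// Classify a benchmark delta from two mean times (milliseconds).
/// ASSUMPTION: the 5% noise band is inferred from the posted tables,
/// not taken from the benchmark runner's source.
fn classify_change(base_mean_ms: f64, branch_mean_ms: f64) -> String {
    const NOISE_THRESHOLD: f64 = 0.05; // assumed value
    // Ratio above 1.0 means the branch took longer than the base.
    let ratio = branch_mean_ms / base_mean_ms;
    if (ratio - 1.0).abs() <= NOISE_THRESHOLD {
        "no change".to_string()
    } else if ratio < 1.0 {
        // Report speedups as the inverse ratio, matching the "+Nx faster" style.
        format!("+{:.2}x faster", 1.0 / ratio)
    } else {
        format!("{:.2}x slower", ratio)
    }
}
```

For example, TPC-H query 3 has means of 32.36 ms (HEAD) and 34.64 ms (branch), a ratio of about 1.07, which falls outside the assumed band and is reported as slower. Its branch standard deviation (±2.82 ms) is larger than the difference between the means, so the flag may well be noise.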

Resource Usage

tpch — base (merge-base)

Metric	Value
Wall time	4.8s
Peak memory	4.1 GiB
Avg memory	3.6 GiB
CPU user	33.8s
CPU sys	2.5s
Disk read	0 B
Disk write	140.0 KiB

tpch — branch

Metric	Value
Wall time	4.7s
Peak memory	4.0 GiB
Avg memory	3.6 GiB
CPU user	33.6s
CPU sys	2.7s
Disk read	0 B
Disk write	60.0 KiB

adriangbot · 2026-03-27T17:22:05Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Details

Comparing HEAD and simplify_smj_full_opt
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃                 simplify_smj_full_opt ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.37 / 4.68 ±6.47 / 17.62 ms │          1.37 / 4.71 ±6.51 / 17.73 ms │     no change │
│ QQuery 1  │        14.46 / 14.82 ±0.22 / 15.12 ms │        14.54 / 15.04 ±0.26 / 15.24 ms │     no change │
│ QQuery 2  │        44.06 / 44.57 ±0.28 / 44.92 ms │        44.71 / 45.26 ±0.41 / 45.81 ms │     no change │
│ QQuery 3  │        43.58 / 45.41 ±1.42 / 47.65 ms │        46.29 / 47.58 ±0.89 / 49.07 ms │     no change │
│ QQuery 4  │     297.04 / 303.80 ±3.71 / 307.71 ms │     308.77 / 315.44 ±6.07 / 326.86 ms │     no change │
│ QQuery 5  │     355.06 / 361.97 ±3.81 / 366.13 ms │     360.02 / 365.33 ±4.32 / 372.43 ms │     no change │
│ QQuery 6  │           5.12 / 5.79 ±0.49 / 6.44 ms │           5.44 / 6.39 ±1.04 / 8.40 ms │  1.10x slower │
│ QQuery 7  │        16.92 / 17.30 ±0.30 / 17.70 ms │        17.26 / 18.13 ±0.81 / 19.56 ms │     no change │
│ QQuery 8  │     425.62 / 436.22 ±5.45 / 440.41 ms │     436.00 / 443.50 ±4.33 / 447.69 ms │     no change │
│ QQuery 9  │     666.36 / 675.90 ±7.28 / 686.69 ms │    672.64 / 685.72 ±10.78 / 698.50 ms │     no change │
│ QQuery 10 │        91.66 / 94.70 ±2.42 / 99.04 ms │        92.22 / 96.43 ±2.52 / 99.71 ms │     no change │
│ QQuery 11 │     104.43 / 106.46 ±1.72 / 109.49 ms │     104.68 / 106.92 ±1.86 / 110.03 ms │     no change │
│ QQuery 12 │     349.52 / 353.84 ±3.43 / 359.26 ms │     357.11 / 364.30 ±4.73 / 370.40 ms │     no change │
│ QQuery 13 │    469.31 / 481.19 ±11.39 / 494.90 ms │    478.91 / 494.34 ±11.28 / 512.37 ms │     no change │
│ QQuery 14 │     358.15 / 367.33 ±6.49 / 374.31 ms │     362.39 / 367.80 ±5.05 / 377.07 ms │     no change │
│ QQuery 15 │    378.11 / 391.41 ±18.47 / 427.72 ms │    388.26 / 399.30 ±11.11 / 418.27 ms │     no change │
│ QQuery 16 │    735.69 / 765.57 ±31.82 / 823.68 ms │    742.59 / 760.02 ±17.37 / 792.02 ms │     no change │
│ QQuery 17 │     724.98 / 738.23 ±9.95 / 755.32 ms │    736.51 / 756.12 ±18.38 / 790.17 ms │     no change │
│ QQuery 18 │ 1470.47 / 1499.13 ±24.79 / 1544.34 ms │ 1449.00 / 1512.55 ±32.47 / 1537.84 ms │     no change │
│ QQuery 19 │       37.14 / 45.60 ±14.04 / 73.54 ms │        35.61 / 38.02 ±2.15 / 41.55 ms │ +1.20x faster │
│ QQuery 20 │    713.58 / 731.51 ±16.09 / 750.94 ms │    720.95 / 734.47 ±17.58 / 768.74 ms │     no change │
│ QQuery 21 │     765.35 / 772.53 ±5.35 / 781.13 ms │     760.25 / 771.89 ±6.40 / 778.88 ms │     no change │
│ QQuery 22 │ 1141.15 / 1152.69 ±12.07 / 1173.88 ms │  1140.13 / 1150.65 ±6.17 / 1158.27 ms │     no change │
│ QQuery 23 │ 3152.92 / 3198.70 ±28.78 / 3236.71 ms │ 3148.95 / 3163.55 ±11.51 / 3180.91 ms │     no change │
│ QQuery 24 │     104.76 / 106.43 ±2.76 / 111.93 ms │     102.40 / 105.75 ±3.36 / 111.97 ms │     no change │
│ QQuery 25 │     138.86 / 141.84 ±2.33 / 145.04 ms │     139.60 / 141.31 ±1.14 / 143.09 ms │     no change │
│ QQuery 26 │     100.03 / 104.32 ±4.12 / 110.68 ms │     102.11 / 105.02 ±2.68 / 109.06 ms │     no change │
│ QQuery 27 │    857.04 / 867.36 ±10.81 / 887.04 ms │     852.41 / 857.31 ±6.55 / 870.11 ms │     no change │
│ QQuery 28 │ 7774.93 / 7836.34 ±33.07 / 7872.38 ms │ 7771.57 / 7841.38 ±37.56 / 7880.82 ms │     no change │
│ QQuery 29 │        51.94 / 55.76 ±6.11 / 67.94 ms │        51.64 / 56.04 ±5.34 / 65.69 ms │     no change │
│ QQuery 30 │     384.85 / 392.60 ±5.26 / 399.33 ms │     382.52 / 388.17 ±4.34 / 394.46 ms │     no change │
│ QQuery 31 │     383.16 / 390.55 ±6.20 / 399.49 ms │     373.55 / 381.82 ±6.70 / 390.21 ms │     no change │
│ QQuery 32 │ 1104.99 / 1121.20 ±18.56 / 1156.20 ms │ 1065.87 / 1104.56 ±40.43 / 1171.28 ms │     no change │
│ QQuery 33 │ 1530.94 / 1584.28 ±31.65 / 1627.70 ms │  1507.50 / 1516.82 ±6.00 / 1524.86 ms │     no change │
│ QQuery 34 │  1521.11 / 1533.87 ±9.72 / 1548.12 ms │ 1490.69 / 1518.31 ±23.30 / 1550.85 ms │     no change │
│ QQuery 35 │     418.48 / 428.28 ±5.61 / 434.49 ms │     410.54 / 416.99 ±5.23 / 424.27 ms │     no change │
│ QQuery 36 │     116.07 / 123.76 ±5.19 / 130.52 ms │     114.36 / 121.70 ±5.85 / 129.37 ms │     no change │
│ QQuery 37 │        49.05 / 52.34 ±3.44 / 58.62 ms │        49.05 / 50.50 ±1.25 / 52.00 ms │     no change │
│ QQuery 38 │        76.42 / 77.90 ±1.56 / 80.42 ms │        75.93 / 77.58 ±1.35 / 79.41 ms │     no change │
│ QQuery 39 │     216.43 / 227.58 ±8.14 / 237.38 ms │     221.97 / 228.94 ±5.74 / 236.60 ms │     no change │
│ QQuery 40 │        25.20 / 27.26 ±1.96 / 29.69 ms │        24.58 / 26.30 ±1.60 / 29.09 ms │     no change │
│ QQuery 41 │        21.08 / 22.39 ±0.96 / 23.87 ms │        21.61 / 22.32 ±0.76 / 23.43 ms │     no change │
│ QQuery 42 │        20.23 / 20.88 ±0.40 / 21.44 ms │        20.06 / 20.81 ±0.59 / 21.66 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                    ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                    │ 27724.27ms │
│ Total Time (simplify_smj_full_opt)   │ 27645.09ms │
│ Average Time (HEAD)                  │   644.75ms │
│ Average Time (simplify_smj_full_opt) │   642.91ms │
│ Queries Faster                       │          1 │
│ Queries Slower                       │          1 │
│ Queries with No Change               │         41 │
│ Queries with Failure                 │          0 │
└──────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	139.8s
Peak memory	40.5 GiB
Avg memory	31.8 GiB
CPU user	1319.2s
CPU sys	90.8s
Disk read	0 B
Disk write	3.1 GiB

clickbench_partitioned — branch

Metric	Value
Wall time	139.4s
Peak memory	42.1 GiB
Avg memory	30.9 GiB
CPU user	1313.0s
CPU sys	91.8s
Disk read	0 B
Disk write	96.0 KiB

adriangbot · 2026-03-27T17:22:40Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Details

Comparing HEAD and simplify_smj_full_opt
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃                    simplify_smj_full_opt ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           44.56 / 45.32 ±0.66 / 46.48 ms │           43.87 / 44.65 ±0.86 / 45.88 ms │     no change │
│ QQuery 2  │        147.74 / 149.10 ±0.77 / 149.87 ms │        148.29 / 148.94 ±0.65 / 150.09 ms │     no change │
│ QQuery 3  │        114.96 / 115.68 ±0.77 / 116.99 ms │        116.46 / 117.40 ±0.92 / 119.08 ms │     no change │
│ QQuery 4  │    1351.57 / 1367.68 ±21.86 / 1410.58 ms │    1346.06 / 1396.59 ±38.81 / 1461.75 ms │     no change │
│ QQuery 5  │        174.68 / 175.56 ±0.78 / 176.73 ms │        174.41 / 175.49 ±0.76 / 176.64 ms │     no change │
│ QQuery 6  │     995.39 / 1020.82 ±23.13 / 1063.28 ms │    1033.38 / 1059.37 ±18.42 / 1085.18 ms │     no change │
│ QQuery 7  │        361.72 / 366.39 ±4.17 / 373.08 ms │        354.16 / 359.59 ±5.03 / 368.33 ms │     no change │
│ QQuery 8  │        117.49 / 119.28 ±1.64 / 121.71 ms │        115.88 / 119.05 ±2.30 / 121.73 ms │     no change │
│ QQuery 9  │        102.68 / 109.67 ±7.54 / 123.82 ms │        102.87 / 106.06 ±2.31 / 109.12 ms │     no change │
│ QQuery 10 │        109.81 / 111.15 ±0.81 / 112.16 ms │        109.08 / 110.03 ±0.88 / 111.51 ms │     no change │
│ QQuery 11 │    1020.26 / 1054.46 ±22.84 / 1085.64 ms │      934.30 / 996.42 ±48.35 / 1046.88 ms │ +1.06x faster │
│ QQuery 12 │           47.31 / 49.60 ±1.21 / 50.89 ms │           44.64 / 45.75 ±0.90 / 47.10 ms │ +1.08x faster │
│ QQuery 13 │        407.50 / 420.92 ±7.05 / 428.23 ms │        414.15 / 415.65 ±1.44 / 418.08 ms │     no change │
│ QQuery 14 │     1035.34 / 1044.79 ±6.18 / 1051.61 ms │     1038.64 / 1043.83 ±3.32 / 1048.51 ms │     no change │
│ QQuery 15 │           17.34 / 18.99 ±1.09 / 20.57 ms │           17.13 / 17.81 ±0.64 / 18.70 ms │ +1.07x faster │
│ QQuery 16 │           41.73 / 42.94 ±0.98 / 44.45 ms │           42.44 / 42.96 ±0.29 / 43.30 ms │     no change │
│ QQuery 17 │        251.98 / 253.83 ±1.42 / 255.86 ms │        249.10 / 251.00 ±1.46 / 252.85 ms │     no change │
│ QQuery 18 │        131.64 / 134.58 ±1.75 / 136.97 ms │        129.63 / 132.56 ±2.13 / 135.38 ms │     no change │
│ QQuery 19 │        155.56 / 159.47 ±2.90 / 162.96 ms │        156.28 / 157.49 ±0.93 / 159.08 ms │     no change │
│ QQuery 20 │           13.88 / 14.53 ±0.43 / 15.08 ms │           14.10 / 14.82 ±0.40 / 15.13 ms │     no change │
│ QQuery 21 │           19.42 / 20.21 ±0.42 / 20.56 ms │           20.36 / 20.69 ±0.34 / 21.27 ms │     no change │
│ QQuery 22 │        495.26 / 503.23 ±5.14 / 511.24 ms │        491.84 / 495.49 ±2.70 / 499.52 ms │     no change │
│ QQuery 23 │       893.41 / 904.68 ±11.43 / 923.53 ms │       920.37 / 949.68 ±19.36 / 980.20 ms │     no change │
│ QQuery 24 │        419.62 / 429.87 ±6.66 / 437.25 ms │        418.06 / 422.66 ±3.89 / 428.43 ms │     no change │
│ QQuery 25 │        366.56 / 369.05 ±2.04 / 372.27 ms │        355.40 / 360.14 ±2.47 / 362.17 ms │     no change │
│ QQuery 26 │           82.39 / 85.16 ±1.57 / 87.19 ms │           82.84 / 84.35 ±1.01 / 85.65 ms │     no change │
│ QQuery 27 │        349.36 / 353.90 ±3.17 / 359.09 ms │        348.28 / 353.62 ±3.69 / 359.57 ms │     no change │
│ QQuery 28 │        152.67 / 153.97 ±0.90 / 155.23 ms │        150.11 / 152.78 ±1.65 / 154.58 ms │     no change │
│ QQuery 29 │        307.17 / 313.18 ±3.71 / 317.84 ms │        300.18 / 303.71 ±2.62 / 308.21 ms │     no change │
│ QQuery 30 │           44.83 / 47.53 ±1.79 / 49.76 ms │           45.18 / 45.95 ±0.61 / 46.85 ms │     no change │
│ QQuery 31 │        177.29 / 178.74 ±1.12 / 180.31 ms │        176.25 / 179.02 ±2.23 / 182.18 ms │     no change │
│ QQuery 32 │           59.22 / 60.65 ±1.04 / 61.91 ms │           57.56 / 58.67 ±0.85 / 59.68 ms │     no change │
│ QQuery 33 │        143.19 / 146.79 ±2.48 / 150.92 ms │        143.06 / 145.05 ±1.54 / 146.72 ms │     no change │
│ QQuery 34 │        107.44 / 108.51 ±0.70 / 109.53 ms │        112.20 / 113.91 ±1.83 / 117.14 ms │     no change │
│ QQuery 35 │        109.79 / 112.48 ±2.66 / 117.30 ms │        109.40 / 111.60 ±1.48 / 113.94 ms │     no change │
│ QQuery 36 │        214.26 / 222.30 ±4.70 / 228.24 ms │        213.61 / 220.43 ±3.75 / 223.74 ms │     no change │
│ QQuery 37 │        180.02 / 182.13 ±1.66 / 185.00 ms │        179.76 / 182.50 ±2.15 / 185.04 ms │     no change │
│ QQuery 38 │           84.14 / 89.20 ±4.50 / 97.57 ms │           90.77 / 93.47 ±2.19 / 97.25 ms │     no change │
│ QQuery 39 │        126.21 / 131.37 ±3.33 / 135.23 ms │        133.62 / 137.01 ±2.10 / 139.57 ms │     no change │
│ QQuery 40 │        110.50 / 119.54 ±5.51 / 125.67 ms │        118.41 / 124.07 ±5.90 / 135.41 ms │     no change │
│ QQuery 41 │           14.22 / 14.97 ±0.59 / 16.03 ms │           15.32 / 16.05 ±0.67 / 17.25 ms │  1.07x slower │
│ QQuery 42 │        106.28 / 107.49 ±1.00 / 108.95 ms │        109.46 / 111.27 ±1.42 / 113.03 ms │     no change │
│ QQuery 43 │           85.18 / 86.01 ±1.20 / 88.36 ms │           86.04 / 87.01 ±0.82 / 88.16 ms │     no change │
│ QQuery 44 │           11.39 / 11.90 ±0.27 / 12.13 ms │           12.23 / 13.34 ±0.97 / 14.55 ms │  1.12x slower │
│ QQuery 45 │           52.92 / 53.81 ±0.66 / 54.63 ms │           55.12 / 56.08 ±0.75 / 57.18 ms │     no change │
│ QQuery 46 │        250.01 / 253.81 ±3.49 / 259.97 ms │        234.63 / 239.81 ±5.90 / 247.40 ms │ +1.06x faster │
│ QQuery 47 │       713.49 / 749.82 ±34.00 / 813.44 ms │       697.25 / 737.13 ±19.96 / 747.78 ms │     no change │
│ QQuery 48 │        299.74 / 302.70 ±2.87 / 307.27 ms │        288.65 / 298.20 ±6.54 / 307.41 ms │     no change │
│ QQuery 49 │        256.85 / 260.54 ±3.38 / 265.61 ms │        257.15 / 260.16 ±2.89 / 265.46 ms │     no change │
│ QQuery 50 │        230.75 / 238.63 ±8.63 / 254.26 ms │       230.97 / 242.24 ±12.42 / 264.12 ms │     no change │
│ QQuery 51 │        181.33 / 185.32 ±3.00 / 190.46 ms │        184.75 / 190.00 ±3.14 / 193.53 ms │     no change │
│ QQuery 52 │        108.02 / 109.07 ±0.61 / 109.70 ms │        109.70 / 110.83 ±0.93 / 112.49 ms │     no change │
│ QQuery 53 │        104.77 / 106.78 ±1.53 / 109.34 ms │        104.42 / 105.26 ±1.03 / 107.20 ms │     no change │
│ QQuery 54 │        149.74 / 152.58 ±2.14 / 155.31 ms │        149.02 / 152.15 ±2.74 / 156.93 ms │     no change │
│ QQuery 55 │        107.15 / 108.59 ±0.93 / 110.01 ms │        109.24 / 110.30 ±0.84 / 111.26 ms │     no change │
│ QQuery 56 │        144.27 / 145.94 ±0.93 / 147.04 ms │        141.69 / 144.44 ±1.92 / 147.60 ms │     no change │
│ QQuery 57 │        176.13 / 177.26 ±1.21 / 179.55 ms │        171.33 / 175.96 ±2.55 / 178.15 ms │     no change │
│ QQuery 58 │        297.35 / 305.73 ±6.59 / 312.91 ms │        301.00 / 307.69 ±4.14 / 311.48 ms │     no change │
│ QQuery 59 │        202.73 / 207.50 ±2.59 / 210.04 ms │        200.44 / 203.38 ±2.25 / 205.90 ms │     no change │
│ QQuery 60 │        148.18 / 149.44 ±0.97 / 150.79 ms │        144.94 / 147.44 ±1.37 / 149.08 ms │     no change │
│ QQuery 61 │        174.38 / 175.28 ±0.97 / 177.01 ms │        172.94 / 174.88 ±2.01 / 178.74 ms │     no change │
│ QQuery 62 │     942.74 / 1002.79 ±46.80 / 1065.36 ms │       918.41 / 950.93 ±27.70 / 989.31 ms │ +1.05x faster │
│ QQuery 63 │        106.95 / 111.21 ±3.03 / 116.01 ms │        106.70 / 111.65 ±4.35 / 117.93 ms │     no change │
│ QQuery 64 │        721.95 / 738.71 ±9.61 / 749.89 ms │        722.65 / 735.30 ±7.48 / 743.74 ms │     no change │
│ QQuery 65 │        259.05 / 264.95 ±4.19 / 271.90 ms │        271.66 / 276.88 ±3.63 / 282.44 ms │     no change │
│ QQuery 66 │        260.03 / 263.54 ±2.42 / 266.78 ms │        250.68 / 263.76 ±8.17 / 273.06 ms │     no change │
│ QQuery 67 │        322.26 / 326.28 ±3.35 / 330.55 ms │        334.80 / 342.43 ±7.28 / 354.38 ms │     no change │
│ QQuery 68 │        283.51 / 288.20 ±2.71 / 291.27 ms │        299.80 / 306.61 ±5.84 / 314.98 ms │  1.06x slower │
│ QQuery 69 │        104.38 / 106.19 ±1.32 / 108.35 ms │        107.13 / 107.97 ±0.72 / 109.07 ms │     no change │
│ QQuery 70 │        330.70 / 340.83 ±7.12 / 348.89 ms │        337.95 / 347.93 ±7.28 / 357.66 ms │     no change │
│ QQuery 71 │        133.36 / 135.65 ±1.83 / 138.94 ms │        136.45 / 137.62 ±1.86 / 141.32 ms │     no change │
│ QQuery 72 │       743.22 / 757.71 ±12.07 / 777.02 ms │        718.40 / 730.07 ±9.18 / 744.97 ms │     no change │
│ QQuery 73 │        108.33 / 109.43 ±0.86 / 110.82 ms │        107.11 / 109.96 ±1.97 / 112.76 ms │     no change │
│ QQuery 74 │       602.75 / 610.62 ±10.22 / 630.40 ms │       644.41 / 663.39 ±16.69 / 685.46 ms │  1.09x slower │
│ QQuery 75 │        272.68 / 280.46 ±4.68 / 286.93 ms │        277.88 / 281.94 ±2.86 / 285.28 ms │     no change │
│ QQuery 76 │        134.14 / 136.52 ±1.69 / 138.79 ms │        135.98 / 136.63 ±0.89 / 138.38 ms │     no change │
│ QQuery 77 │        186.91 / 190.65 ±2.01 / 192.90 ms │        190.52 / 192.27 ±1.12 / 193.79 ms │     no change │
│ QQuery 78 │        350.30 / 356.46 ±3.49 / 360.54 ms │        354.15 / 361.91 ±5.03 / 369.24 ms │     no change │
│ QQuery 79 │        233.64 / 240.09 ±8.84 / 257.27 ms │        240.29 / 246.00 ±9.04 / 263.98 ms │     no change │
│ QQuery 80 │        334.18 / 338.73 ±2.93 / 343.00 ms │        332.16 / 336.29 ±2.22 / 338.81 ms │     no change │
│ QQuery 81 │           28.24 / 29.11 ±0.60 / 29.84 ms │           27.34 / 28.52 ±0.95 / 29.99 ms │     no change │
│ QQuery 82 │        200.51 / 205.40 ±2.67 / 208.65 ms │        203.23 / 206.43 ±2.17 / 209.81 ms │     no change │
│ QQuery 83 │           41.52 / 43.39 ±1.52 / 45.99 ms │           40.55 / 42.22 ±1.01 / 43.33 ms │     no change │
│ QQuery 84 │           50.04 / 51.25 ±0.92 / 52.80 ms │           50.16 / 52.69 ±1.62 / 54.61 ms │     no change │
│ QQuery 85 │        152.03 / 153.56 ±0.99 / 154.75 ms │        152.28 / 154.98 ±2.61 / 158.92 ms │     no change │
│ QQuery 86 │           39.27 / 41.70 ±1.53 / 44.09 ms │           40.43 / 41.12 ±0.59 / 41.91 ms │     no change │
│ QQuery 87 │           88.27 / 91.77 ±3.68 / 98.84 ms │          93.27 / 98.25 ±2.90 / 101.90 ms │  1.07x slower │
│ QQuery 88 │        103.04 / 104.42 ±1.09 / 105.51 ms │        100.76 / 104.08 ±3.14 / 108.63 ms │     no change │
│ QQuery 89 │        119.82 / 122.12 ±1.35 / 123.45 ms │        118.80 / 120.69 ±0.99 / 121.55 ms │     no change │
│ QQuery 90 │           23.81 / 24.90 ±1.28 / 27.37 ms │           23.80 / 24.38 ±0.48 / 25.16 ms │     no change │
│ QQuery 91 │           66.37 / 68.51 ±1.59 / 70.75 ms │           65.52 / 67.07 ±1.22 / 68.59 ms │     no change │
│ QQuery 92 │           60.58 / 61.21 ±0.75 / 62.49 ms │           59.23 / 59.95 ±0.39 / 60.35 ms │     no change │
│ QQuery 93 │        201.86 / 205.80 ±2.40 / 209.41 ms │        199.63 / 201.77 ±2.30 / 206.14 ms │     no change │
│ QQuery 94 │           63.14 / 64.30 ±0.92 / 65.59 ms │           62.12 / 63.81 ±1.13 / 65.21 ms │     no change │
│ QQuery 95 │        138.32 / 140.31 ±1.40 / 142.61 ms │        138.62 / 139.48 ±0.87 / 141.16 ms │     no change │
│ QQuery 96 │           75.70 / 76.57 ±0.81 / 77.79 ms │           72.07 / 75.80 ±1.94 / 77.72 ms │     no change │
│ QQuery 97 │        133.84 / 135.63 ±2.00 / 139.30 ms │        136.41 / 139.88 ±1.91 / 141.54 ms │     no change │
│ QQuery 98 │        152.06 / 155.96 ±3.73 / 163.01 ms │        162.31 / 164.82 ±1.53 / 167.06 ms │  1.06x slower │
│ QQuery 99 │ 10848.57 / 10907.39 ±64.30 / 11030.00 ms │ 10855.52 / 10916.59 ±50.79 / 10993.97 ms │     no change │
└───────────┴──────────────────────────────────────────┴──────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                    ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                    │ 34492.80ms │
│ Total Time (simplify_smj_full_opt)   │ 34535.90ms │
│ Average Time (HEAD)                  │   348.41ms │
│ Average Time (simplify_smj_full_opt) │   348.85ms │
│ Queries Faster                       │          5 │
│ Queries Slower                       │          6 │
│ Queries with No Change               │         88 │
│ Queries with Failure                 │          0 │
└──────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric	Value
Wall time	172.8s
Peak memory	5.9 GiB
Avg memory	4.7 GiB
CPU user	276.7s
CPU sys	19.7s
Disk read	0 B
Disk write	702.9 MiB

tpcds — branch

Metric	Value
Wall time	173.0s
Peak memory	5.8 GiB
Avg memory	4.7 GiB
CPU user	277.8s
CPU sys	19.2s
Disk read	0 B
Disk write	148.0 KiB

comphead · 2026-03-27T17:29:56Z

datafusion/core/tests/fuzz_cases/join_fuzz.rs

+                ),
+            ];
+
+            for batch_size in [2, 49, 100] {


Can we use the already existing set of batch sizes, as in other tests?

batch_sizes: &[1, 2, 7, 49, 50, 51, 100],

The spill test is already a bit slow: it added about 20 seconds to what was already a ~90-second test, because it runs with a memory limit and disk spilling across multiple join types and extra-column combos. The smaller set was chosen intentionally to keep the test run time reasonable. The spill test combinatorics (3 join types * 3 extra-column combos * N batch sizes) make the full set expensive, and these three sizes cover the small, medium, and boundary cases. I can add the rest if we don't think these fuzz tests will become too expensive.
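
The combinatorics in the reply above can be made concrete with a small count. This is only an illustration: `spill_case_count` is a hypothetical helper, not part of `join_fuzz.rs`, and it counts configurations without saying anything about how long each one runs.

```rust
/// Number of spill-test configurations:
/// join types x extra-column combos x batch sizes.
/// Hypothetical helper for illustration only.
fn spill_case_count(
    join_types: usize,
    extra_column_combos: usize,
    batch_sizes: &[usize],
) -> usize {
    join_types * extra_column_combos * batch_sizes.len()
}

/// Batch sizes used by the new spill test in this PR.
const REDUCED_BATCH_SIZES: [usize; 3] = [2, 49, 100];
/// Batch sizes used by the other fuzz tests, as quoted by the reviewer.
const FULL_BATCH_SIZES: [usize; 7] = [1, 2, 7, 49, 50, 51, 100];
```

With 3 join types and 3 extra-column combos, the reduced set gives 27 configurations and the full set gives 63, so adopting the full set would a little more than double the number of spill runs.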

datafusion/physical-plan/src/joins/semi_anti_mark_sort_merge_join/stream.rs

datafusion/physical-plan/src/joins/sort_merge_join/filter.rs

datafusion/physical-plan/src/joins/sort_merge_join/stream.rs

…emiAntiMark) into the same SMJ folder.

mbutrovich · 2026-03-27T19:16:51Z

datafusion/physical-plan/src/joins/sort_merge_join/metrics.rs

    /// Calculated as sum of peak memory values across partitions
    peak_mem_used: Gauge,
-    /// Metrics related to spilling
-    spill_metrics: SpillMetrics,


Moved SpillMetrics construction from SortMergeJoinMetrics into exec.rs where the SpillManager is now built (shared across both streams). The metrics are still registered into the same ExecutionPlanMetricsSet and reported via metrics() — just constructed in a different place.
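
The ownership change described here can be sketched with plain standard-library types. Everything in the block is a hypothetical stand-in (`SpillCounters`, `SpillManagerSketch`, the two `*StreamSketch` structs, and `build_streams` are not DataFusion APIs). It only illustrates the idea of creating the metrics handle once at the operator level and handing the same handle to whichever stream runs. The real operator presumably creates one stream per partition; the sketch builds both only to show that they report into the same counters.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a spill metrics handle (NOT DataFusion's `SpillMetrics`).
#[derive(Clone, Default)]
struct SpillCounters {
    spill_count: Arc<AtomicUsize>,
}

/// Hypothetical stand-in for a spill manager that owns a metrics handle.
#[derive(Clone)]
struct SpillManagerSketch {
    counters: SpillCounters,
}

impl SpillManagerSketch {
    /// Record one spill event against the shared counters.
    fn record_spill(&self) {
        self.counters.spill_count.fetch_add(1, Ordering::Relaxed);
    }
}

/// Hypothetical stand-ins for the two stream kinds.
struct MaterializingStreamSketch {
    spill: SpillManagerSketch,
}
struct BitwiseStreamSketch {
    spill: SpillManagerSketch,
}

/// Operator-level construction: the counters are created once here and the
/// same handle is cloned into the manager passed to each stream.
fn build_streams() -> (SpillCounters, MaterializingStreamSketch, BitwiseStreamSketch) {
    let counters = SpillCounters::default();
    let manager = SpillManagerSketch { counters: counters.clone() };
    (
        counters,
        MaterializingStreamSketch { spill: manager.clone() },
        BitwiseStreamSketch { spill: manager },
    )
}
```

Because the handle is reference-counted, a spill recorded by either stream shows up in the counters the operator kept, which mirrors the point that the metrics are still reported from the same place even though they are constructed elsewhere.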

comphead

Thanks @mbutrovich, the numbers look great. Let's wait a while in case anyone else wants to take a look as well.

mbutrovich added 11 commits March 25, 2026 16:43

Remove dead code.

974d305

More cleanup.

3de5c0c

move mark logic

a570c6c

move mark logic

e4fcdd1

add benchmark, optimize remaining smj stream

e66a44b

clean up, debug_asserts

640dddc

add a new test

f71e161

scale benchmark

cd799a6

Batch deferred filtering for outer joins with unique keys

d922b9b

add comments

14d9653

Merge branch 'main' into simplify_smj_full_opt

5779054

github-actions bot added the physical-plan Changes to the physical-plan crate label Mar 26, 2026

clippy fix.

1c1bec5

mbutrovich requested review from comphead and rluvaton March 26, 2026 18:26

mbutrovich marked this pull request as ready for review March 26, 2026 18:29

mbutrovich mentioned this pull request Mar 26, 2026

[EPIC] Benchmark improvements #21165

Open

rluvaton reviewed Mar 26, 2026

datafusion/physical-plan/src/joins/sort_merge_join/filter.rs (outdated)

rluvaton reviewed Mar 26, 2026

datafusion/physical-plan/src/joins/sort_merge_join/filter.rs

mbutrovich added 4 commits March 26, 2026 15:19

add clarifying comment

9fc21a0

remove booleans is_semi is_mark and just use JoinType enum.

66632ef

clean up redundant comment next to already-verbose unreachable! macro.

60127a7

clearer debug_assert messages

481753a

mbutrovich changed the title from "perf: sort-merge join (SMJ) batch deferred filtering and move mark joins to specialized stream" to "perf: sort-merge join (SMJ) batch deferred filtering and move mark joins to specialized stream. Near-unique LEFT and FULL SMJ 20-50x faster" on Mar 27, 2026

Merge branch 'main' into simplify_smj_full_opt

768529e