[spherical] Compression tweak and fix threadpool in benchmark#1137
[spherical] Compression tweak and fix threadpool in benchmark#1137arkrishn94 wants to merge 1 commit into
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1137 +/- ##
==========================================
+ Coverage 89.36% 89.39% +0.02%
==========================================
Files 484 485 +1
Lines 91384 92102 +718
==========================================
+ Hits 81669 82335 +666
- Misses 9715 9767 +52
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR improves spherical quantization compression performance by optimizing the maximize_cosine_similarity sweep (precomputing and sorting critical values once, and avoiding per-step sqrt) and fixes the exhaustive spherical benchmark so compression uses a rayon threadpool sized by compression_threads instead of silently using the global pool.
Changes:
- Reworks
maximize_cosine_similarityto precompute all critical values into a flat buffer, sort once, then iterate linearly; also compares cosine similarity using squared quantities to avoidsqrt. - Updates exhaustive spherical benchmark compression to run
Store::newinside a dedicated Rayon threadpool sized bycompression_threads.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| diskann-quantization/src/spherical/quantizer.rs | Performance-focused rewrite of the critical-value sweep in maximize_cosine_similarity, including removing heap usage and avoiding sqrt. |
| diskann-benchmark/src/backend/exhaustive/spherical.rs | Ensures compression uses a Rayon threadpool sized by compression_threads (mirrors the search harness pattern). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let stop: usize = 1usize << (num_bits.get() - 1); | ||
| let visits_per_dim: usize = stop.max(2) - 1; | ||
| let total = v.len() * visits_per_dim; |
| // Sort critical values in ascending order so that walking the slice corresponds to | ||
| // sweeping `s` from `0` to `+inf`. `Pair`'s `Ord` impl is reversed so it's | ||
| // in ascending order. | ||
| crits.sort_unstable_by(|a, b| { | ||
| a.value | ||
| .partial_cmp(&b.value) | ||
| .unwrap_or(std::cmp::Ordering::Equal) | ||
| }); |
Some performance changes to
maximize_cosine_similaritysubroutine for spherical quantization (plus fix to the threadpool setup for exhasutive benchmark).maximize_cosine_similarity: replace the min-heap with a flat buffer of all critical values that is sorted once and walked linearly. The visit order is identical to before. Also drop thesqrtcomputation using some simple algebra.The recall is identical to
mainfor real datasets - 384 and 896 dimensional dense text embeddings.Store::newin arayon::ThreadPoolBuildersized by the configuredcompression_threads, mirroring the exhaustive-product harness. Without this the benchmark always silently used the global pool.Benchmarks
query_layouts = ["same_as_data"],transform_kind = "null",pre_scale = "none", recall@10/10.