Conversation
Lossy quantization for vector data (e.g., embeddings) based on TurboQuant (https://arxiv.org/abs/2504.19874). Supports both MSE-optimal and inner-product-optimal (Prod with QJL correction) variants at 1-8 bits per coordinate.

Key components:
- Single TurboQuant array encoding with optional QJL correction fields, storing quantized codes, norms, centroids, and rotation signs as children.
- Structured Random Hadamard Transform (SRHT) for O(d log d) rotation, fully self-contained with no external linear algebra library.
- Max-Lloyd centroid computation on the Beta(d/2, d/2) distribution.
- Approximate cosine similarity and dot product computed directly on quantized arrays without full decompression.
- Pluggable TurboQuantScheme for BtrBlocks, exposed via WriteStrategyBuilder::with_vector_quantization().
- Benchmarks covering common embedding dimensions (128, 768, 1024, 1536).

Also refactors CompressingStrategy to a single constructor, and adds vortex_tensor::initialize() for session registration of tensor types, encodings, and scalar functions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Will Manning <will@willmanning.io>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
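The SRHT rotation mentioned above can be sketched in a few lines: flip each coordinate's sign with a fixed random pattern (the "rotation signs" stored as a child array), then apply the fast Walsh-Hadamard transform in O(d log d) with no linear-algebra dependency. This is an illustrative, self-contained sketch; the function names are hypothetical and this is not the actual Vortex implementation.

```rust
/// In-place fast Walsh-Hadamard transform; `v.len()` must be a power of two.
fn fwht(v: &mut [f32]) {
    let n = v.len();
    assert!(n.is_power_of_two());
    let mut h = 1;
    while h < n {
        for chunk in v.chunks_mut(2 * h) {
            for i in 0..h {
                let (a, b) = (chunk[i], chunk[i + h]);
                chunk[i] = a + b;
                chunk[i + h] = a - b;
            }
        }
        h *= 2;
    }
}

/// Apply the randomized rotation: a diagonal of random signs, then the
/// Hadamard transform, scaled so the overall transform is orthonormal
/// (and therefore norm- and inner-product-preserving).
fn srht(v: &mut [f32], signs: &[bool]) {
    for (x, &flip) in v.iter_mut().zip(signs) {
        if flip {
            *x = -*x;
        }
    }
    fwht(v);
    let scale = 1.0 / (v.len() as f32).sqrt();
    for x in v.iter_mut() {
        *x *= scale;
    }
}

fn main() {
    let mut v = vec![1.0f32, 2.0, 3.0, 4.0];
    let signs = vec![false, true, false, true];
    let norm_before: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    srht(&mut v, &signs);
    let norm_after: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    // The rotation preserves the Euclidean norm.
    println!("{:.4} {:.4}", norm_before, norm_after);
}
```

Because the transform is orthonormal, per-coordinate quantization error after rotation translates directly into bounded error on norms and inner products of the original vectors.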
We are going to implement this later as a separate encoding, if we decide to implement it at all; word on the street is that MSE + QJL is not actually better than MSE on its own.

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
It doesn't really make a lot of sense for us to define this as an encoding for `FixedSizeList`.

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
- Use ExecutionCtx in the TurboQuant compress path and import ExecutionCtx
- Extend dtype imports with Nullability and PType to support extension types
- Wire in extension utilities extension_element_ptype and extension_list_size for vector extensions
- Remove dimension and bit_width from slice/take compute calls to rely on metadata
- Update the TurboQuant mod docs to mention VortexSessionExecute
- Change scheme.compress to use the provided compressor argument (not _compressor)
- Add an extensive TurboQuant test suite (roundtrip, MSE bounds, edge cases, f64 input, serde roundtrip, and dtype checks)
- Align vtable imports to the new metadata handling (remove unused DeserializeMetadata/SerializeMetadata references)

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
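The kind of roundtrip MSE bound the new test suite asserts can be illustrated with a toy b-bit uniform scalar quantizer standing in for the real encoding. Everything here is a hypothetical stand-in, not the actual TurboQuant quantizer or its test code; it only shows the shape of a "quantize, dequantize, check the error bound" test.

```rust
/// Toy b-bit uniform quantizer over [-1, 1] (stand-in for the real encoding).
fn quantize(v: &[f32], bits: u32) -> Vec<u8> {
    let levels = (1u32 << bits) as f32;
    v.iter()
        .map(|x| {
            // Map [-1, 1] onto [0, levels) and clamp to valid codes.
            let q = ((x + 1.0) / 2.0 * levels).floor();
            q.clamp(0.0, levels - 1.0) as u8
        })
        .collect()
}

/// Reconstruct each code as the midpoint of its quantization cell.
fn dequantize(codes: &[u8], bits: u32) -> Vec<f32> {
    let levels = (1u32 << bits) as f32;
    codes
        .iter()
        .map(|&c| (c as f32 + 0.5) / levels * 2.0 - 1.0)
        .collect()
}

fn main() {
    let bits = 4u32;
    let v: Vec<f32> = (0..128).map(|i| (i as f32 * 0.37).sin()).collect();
    let back = dequantize(&quantize(&v, bits), bits);
    let mse: f32 = v
        .iter()
        .zip(&back)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        / v.len() as f32;
    // With step = 2 / 2^bits, midpoint reconstruction keeps every
    // per-coordinate error at most step / 2, so MSE <= step^2 / 4.
    let step = 2.0 / (1u32 << bits) as f32;
    assert!(mse <= step * step / 4.0);
    println!("mse = {:.6}, bound = {:.6}", mse, step * step / 4.0);
}
```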
Merging this PR will degrade performance by 29.05%
I would say this is ready for review now, but only with respect to the structure. I still need to go through the implementation and make sure everything makes sense, but the structure is sound and we correctly handle the different floating-point input types as well as null vectors. It would be good to get a review now; maybe we should just merge this and iterate later.
I'll rebase and fix the errors tomorrow.
Continuation of #7167, authored by @lwwmanning
Summary
Lossy quantization for vector data (e.g., embeddings) based on TurboQuant (https://arxiv.org/abs/2504.19874). Supports both MSE-optimal and inner-product-optimal (Prod with QJL correction) variants at 1-8 bits per coordinate.
Key components:
- Single TurboQuant array encoding with optional QJL correction fields, storing quantized codes, norms, centroids, and rotation signs as children.
- Structured Random Hadamard Transform (SRHT) for O(d log d) rotation, fully self-contained with no external linear algebra library.
- Max-Lloyd centroid computation on the Beta(d/2, d/2) distribution.
- Approximate cosine similarity and dot product computed directly on quantized arrays without full decompression.
- Pluggable TurboQuantScheme for BtrBlocks, exposed via WriteStrategyBuilder::with_vector_quantization().
- Benchmarks covering common embedding dimensions (128, 768, 1024, 1536).
Also refactors CompressingStrategy to a single constructor, and adds vortex_tensor::initialize() for session registration of tensor types, encodings, and scalar functions.
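The "compute directly on quantized arrays" idea above can be sketched as follows: each vector stores its original norm plus one small code per rotated, unit-normalized coordinate, and the dot product is recovered by accumulating centroid products and rescaling by the norms. The names, the 2-bit codebook values, and the function signature are all hypothetical, chosen for illustration; this is not the actual Vortex compute kernel.

```rust
/// Approximate dot product straight from quantized codes, with no full
/// decompression. `centroids` maps each code back to its representative
/// value for the rotated, unit-normalized coordinates.
fn approx_dot(
    codes_a: &[u8],
    norm_a: f32,
    codes_b: &[u8],
    norm_b: f32,
    centroids: &[f32],
) -> f32 {
    // The SRHT rotation is orthonormal, so inner products of the rotated
    // unit vectors approximate the cosine similarity of the originals;
    // rescaling by the stored norms recovers the dot product.
    let unit_dot: f32 = codes_a
        .iter()
        .zip(codes_b)
        .map(|(&a, &b)| centroids[a as usize] * centroids[b as usize])
        .sum();
    norm_a * norm_b * unit_dot
}

fn main() {
    // Toy 2-bit codebook (hypothetical values) and two 4-d code vectors.
    let centroids = [-0.75f32, -0.25, 0.25, 0.75];
    let a = [0u8, 3, 1, 2];
    let b = [0u8, 3, 2, 1];
    let dot = approx_dot(&a, 2.0, &b, 3.0, &centroids);
    println!("{:.4}", dot);
}
```

Dropping the `norm_a * norm_b` rescaling yields the approximate cosine similarity directly, which is why both operations can share one pass over the codes.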
API Changes
Adds a new `TurboQuant` encoding + some other things. TODO

Testing

TODO