Q2_0 group 64: CUDA backend by khosravipasha · Pull Request #43 · PrismML-Eng/llama.cpp

khosravipasha · 2026-06-10T23:04:15Z

DRAFT PR for testing and review

Copilot

Pull request overview

Adds CUDA backend support for GGML_TYPE_Q2_0 (group size 64) across the main CUDA execution paths (MMVQ mat-vec, MMQ mat-mat, row extraction, and dequantization/conversion), plus build/template plumbing to instantiate the needed kernels.

Changes:

- Implement Q2_0×Q8_1 CUDA dot product and wire it into the MMVQ (mul_mat_vec_q) dispatch path.
- Add MMQ (mul_mat_q) support for Q2_0 via new tile loader, type traits, and template instantiation generation.
- Enable Q2_0 for CUDA get_rows + conversion/dequantization utilities, and mark relevant ops as supported by the CUDA backend.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Show a summary per file

| File | Description |
| --- | --- |
| ggml/src/ggml-cuda/vecdotq.cuh | Adds `vec_dot_q2_0_q8_1` and VDR macros for Q2_0. |
| ggml/src/ggml-cuda/mmvq.cu | Routes Q2_0 through MMVQ vec-dot dispatch and type switch. |
| ggml/src/ggml-cuda/mmq.cuh | Adds Q2_0 MMQ tile loading and type trait wiring; updates q8_1 ds layout selection. |
| ggml/src/ggml-cuda/mmq.cu | Enables Q2_0 in MMQ type dispatch and MMQ usage heuristic. |
| ggml/src/ggml-cuda/template-instances/mmq-instance-q2_0.cu | New generated MMQ instantiation TU for Q2_0. |
| ggml/src/ggml-cuda/template-instances/generate_cu_files.py | Includes Q2_0 in the MMQ instantiation generation list. |
| ggml/src/ggml-cuda/ggml-cuda.cu | Marks Q2_0 as supported for relevant CUDA ops in capability checks. |
| ggml/src/ggml-cuda/getrows.cu | Adds Q2_0 case to CUDA get_rows dispatch using `dequantize_q2_0`. |
| ggml/src/ggml-cuda/dequantize.cuh | Introduces `dequantize_q2_0` for CUDA dequantization kernels. |
| ggml/src/ggml-cuda/convert.cu | Enables Q2_0 conversions to fp16/fp32 (contiguous + non-contiguous) via new dequantizer. |
| ggml/src/ggml-cuda/common.cuh | Adds CUDA type traits (qk/qr/qi) for GGML_TYPE_Q2_0. |
| ggml/src/ggml-cpu/arch-fallback.h | Adds arch-fallback alias for `ggml_vec_dot_q2_0_q8_0_generic` on x86. |


+    // Q2_0: 128 elements with ONE scale, 2 bits per element (4 elements per byte)
+    // Q8_1: 32 elements per block with individual scales
+    // iqs selects which of the 4 chunks of 32 elements to process (0-3)
+
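To make the quoted comment concrete, here is a minimal host-side C++ sketch of the arithmetic it describes: 2-bit codes packed four to a byte, with `iqs` picking one 32-element chunk to pair with a single Q8_1 block. This is not the PR's `vec_dot_q2_0_q8_1`. The constant names, the least-significant-bits-first packing order, and the `zero_point` mapping from codes to signed values are all assumptions made for illustration; the real layout is defined by the Q2_0 block struct in ggml.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants, taken from the quoted comment rather than ggml headers.
constexpr int kElemsPerScale = 128;  // elements sharing one scale (per the comment)
constexpr int kElemsPerByte  = 4;    // 2 bits per element
constexpr int kQ8ChunkElems  = 32;   // elements in one Q8_1 block

// Extract the 2-bit code of element i.
// ASSUMPTION: element 0 lives in the two least-significant bits of byte 0.
inline int unpack_2bit(const uint8_t* qs, int i) {
    assert(i >= 0 && i < kElemsPerScale);
    return (qs[i / kElemsPerByte] >> (2 * (i % kElemsPerByte))) & 0x3;
}

// Integer dot product of the chunk selected by iqs (0-3) with 32 int8 values.
// ASSUMPTION: codes become signed values by subtracting zero_point; the
// actual Q2_0 code-to-value mapping may differ.
inline int32_t chunk_dot(const uint8_t* qs, const int8_t* q8, int iqs, int zero_point) {
    assert(iqs >= 0 && iqs < kElemsPerScale / kQ8ChunkElems);
    int32_t sum = 0;
    for (int j = 0; j < kQ8ChunkElems; ++j) {
        const int code = unpack_2bit(qs, iqs * kQ8ChunkElems + j);
        sum += (code - zero_point) * q8[j];
    }
    return sum;
}
```

Under these assumptions, the floating-point contribution of a chunk would be the integer sum multiplied by the Q2_0 scale and the Q8_1 block scale. A CUDA kernel would typically process several packed bytes per thread instead of looping element by element.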


khosravipasha requested a review from Copilot June 10, 2026 23:04

Copilot started reviewing on behalf of khosravipasha June 10, 2026 23:04

Copilot AI reviewed Jun 10, 2026


Comment thread ggml/src/ggml-cuda/vecdotq.cuh

Comment on lines +728 to +731


khosravipasha force-pushed the pr/q2_0-cpu branch from 7c6c628 to 0f07ba4 Compare June 11, 2026 00:08

khosravipasha force-pushed the pr/q2_0-cuda branch from 81997c2 to 500613a Compare June 11, 2026 00:08

khosravipasha force-pushed the pr/q2_0-cpu branch from 0f07ba4 to a69cff5 Compare June 11, 2026 00:28

khosravipasha force-pushed the pr/q2_0-cuda branch from 500613a to 126d285 Compare June 11, 2026 00:28

Q2_0 group 64: CUDA backend

5a300e4

khosravipasha force-pushed the pr/q2_0-cpu branch from a69cff5 to dc7c932 Compare June 11, 2026 00:37

khosravipasha force-pushed the pr/q2_0-cuda branch from 126d285 to 5a300e4 Compare June 11, 2026 00:37

khosravipasha wants to merge 1 commit into pr/q2_0-cpu from pr/q2_0-cuda
