Domain-specific language designed to streamline the development of high-performance GPU, CPU, and accelerator kernels
Updated Mar 9, 2026 - Python
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashInfer: Kernel Library for LLM Serving
Accelerating MoE with IO and Tile-aware Optimizations
FlashMLA: Efficient Multi-head Latent Attention Kernels
DeepEP: an efficient expert-parallel communication library