DAQIRI - Data Acquisition for Integrated Real-time Instruments

Send and receive Ethernet packets into CPU and GPU memory at hundreds of Gbps per GPU with a simple API.

DAQIRI (Data Acquisition for Integrated Real-time Instruments) connects data acquisition systems to NVIDIA GPUs for real-time processing and AI, paving the way for autonomy of the next generation of scientific and industrial instruments.

DAQIRI provides direct NIC hardware access in userspace, bypassing the Linux kernel network stack to achieve the highest possible throughput and lowest latency for Ethernet frame transmission and reception. It targets NVIDIA ConnectX-6 Dx and later NICs and supports GPU direct memory access (GPUDirect) for zero-copy data paths between the NIC and GPU.

  • 📖 Docs & Website: nvidia.github.io/daqiri
  • Peak performance requires an NVIDIA SmartNIC (ConnectX-6 Dx or later) and a GPUDirect-capable NVIDIA GPU
  • 🖥️ Supported hardware: NVIDIA DGX Spark, NVIDIA IGX, and NVIDIA RTX Pro Servers
  • 🔌 Works with any NIC and NVIDIA GPU via DAQIRI's built-in Linux Sockets engine
  • 🚀 Getting Started: nvidia.github.io/daqiri/getting-started

Features

  • High Throughput — Sustained line rate with proper hardware and tuning.
  • Low Latency — Direct access to NIC ring buffers, so the remaining latency is mostly PCIe transit.
  • GPUDirect — Receive data directly into GPU memory via two modes:
    • Header-Data Split: Headers to CPU, payload to GPU (recommended for most workloads).
    • Batched GPU: Entire packets to GPU memory (maximum bandwidth, GPU-side parsing required).
  • Burst file writes — Write received bursts as raw packet files or appendable PCAP captures. Host-backed buffers use POSIX writes; CUDA device-backed buffers can use cuFile/GDS.
  • S3 raw object writes — Optionally upload raw burst packets to Amazon S3 or an S3-compatible object store through the AWS SDK for C++.
  • Flow Steering — Configure the NIC's hardware flow engine to route packets by UDP source/destination port or flex-item payload fields. Raw RX flows can be configured statically in YAML or added/deleted dynamically after daqiri_init(). Per RX interface, use standard UDP/IP flows or flex-item flows, not both. Raw DPDK and raw ibverbs flows can also use hardware-only VLAN push/pop and VXLAN, GRE, or NVGRE encap/decap actions; socket/RDMA streams reject those tunnel actions.
  • RDMA — RDMA verbs (READ, WRITE, SEND) over RoCE on Ethernet NICs or InfiniBand.
  • Linux socket control — TCP/UDP socket streams expose connection IDs and socket_setsockopt() for native Linux setsockopt tuning without YAML option name mappings.
  • Optional OpenTelemetry metrics — Expose per-interface or per-queue packet, byte, and drop counters when built with DAQIRI_ENABLE_OTEL_METRICS=ON.
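To make the Header-Data Split mode above more concrete, here is a minimal software sketch of the idea: find where the Ethernet/IPv4/UDP headers end and treat everything after that point as payload. This is only an illustration. In DAQIRI the split is done by the NIC hardware, with headers landing in CPU memory and payload in GPU memory, and none of the names below (`split_frame`, `build_demo_frame`) are part of the DAQIRI API. The sketch assumes untagged Ethernet II frames carrying IPv4 and UDP.

```python
import struct

ETH_LEN = 14  # Ethernet II header without a VLAN tag (assumption for this sketch)
UDP_LEN = 8   # fixed UDP header size


def split_frame(frame: bytes):
    """Return (headers, payload) views for an untagged Ethernet/IPv4/UDP frame."""
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    if ethertype != 0x0800:
        raise ValueError("not an IPv4 frame")
    ihl = (frame[ETH_LEN] & 0x0F) * 4  # IPv4 header length in bytes
    if frame[ETH_LEN + 9] != 17:       # IPv4 protocol field, 17 = UDP
        raise ValueError("not a UDP datagram")
    split = ETH_LEN + ihl + UDP_LEN
    view = memoryview(frame)           # slicing a memoryview does not copy
    return view[:split], view[split:]


def build_demo_frame(payload: bytes, dst_port: int = 4096) -> bytes:
    """Hypothetical helper: build a frame with zeroed MACs and checksums."""
    eth = bytes(6) + bytes(6) + struct.pack("!H", 0x0800)
    ip = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + UDP_LEN + len(payload), 0, 0, 64, 17, 0,
        bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]),
    )
    udp = struct.pack("!HHHH", 5000, dst_port, UDP_LEN + len(payload), 0)
    return eth + ip + udp + payload


frame = build_demo_frame(b"sensor-samples")
headers, payload = split_frame(frame)
# The UDP destination port sits 2 bytes into the UDP header.
dst_port = struct.unpack_from("!H", headers, ETH_LEN + 20 + 2)[0]
print(len(headers), dst_port, bytes(payload))
```

The header part stays small and fixed in size (42 bytes here), which is why it is cheap to inspect on the CPU while the bulk payload goes elsewhere. The UDP destination port read at the end is the same kind of field that the Flow Steering feature matches on.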

Benchmarking

Consult the Benchmarking overview to learn more about running and optimizing benchmarks on NVIDIA platforms.

DGX Spark Result Summary

| Stream / Protocol | Best case | Throughput | Drops | Notes |
|---|---|---|---|---|
| Raw Ethernet / GPUDirect | 4 KB packet | 105.5 ±0.9 Gb/s | 0 | 98.5 Gb/s single-queue at the 8 KB native shape |
| Socket / RoCE (SEND) | 8 MB message | 102.2 ±0.3 Gb/s | 0 | Single QP, batch 1 |
| Socket / TCP | 8 KB × 4 pairs | 97.2 ±2.8 Gb/s | ~0 | Flow-controlled (App TX = App RX) |
| Socket / UDP | 8 KB × 4 pairs | 29.8 ±0.2 Gb/s | ~51% loss | Receiver goodput; unpaced sender |

Each transport was measured at its best-case operation size on a single DGX Spark (GB10), driven over a physical cabled loopback on one ConnectX-7. The tests used a 200G cable, which let transfers reach the PCIe limit at slightly over 100 Gb/s. Full methodology and per-transport breakdowns are available at Performance: DGX Spark.
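As a rough way to read the summary numbers, the short calculation below converts the raw Ethernet throughput into a packet rate and estimates the rate the unpaced UDP sender was offering. It is back-of-the-envelope arithmetic, not part of the benchmark tooling, and it rests on assumptions: "4 KB" is taken as 4096 bytes per packet with header overhead ignored, and "~51% loss" is taken as the fraction of sent traffic that never reached the receiver.

```python
def packets_per_second(gbps: float, packet_bytes: int) -> float:
    """Packet rate implied by a throughput in Gb/s at a fixed packet size."""
    return gbps * 1e9 / (packet_bytes * 8)


def offered_rate(goodput_gbps: float, loss_fraction: float) -> float:
    """Sender rate implied by receiver goodput and the fraction of traffic lost."""
    return goodput_gbps / (1.0 - loss_fraction)


# Raw Ethernet / GPUDirect row: 105.5 Gb/s at 4 KB packets (assumed 4096 bytes).
raw_pps = packets_per_second(105.5, 4096)

# Socket / UDP row: 29.8 Gb/s receiver goodput at roughly 51% loss.
udp_offered = offered_rate(29.8, 0.51)

print(f"{raw_pps / 1e6:.2f} Mpps")    # about 3.22 Mpps
print(f"{udp_offered:.1f} Gb/s")      # about 60.8 Gb/s
```

Under these assumptions the raw path handles a little over three million packets per second, and the UDP sender was pushing roughly twice what the receiver could absorb, which is consistent with the "unpaced sender" note in the table.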

Documentation

Reference material for the DAQIRI codebase:

  • Getting Started — System requirements, build/install instructions, and CMake options
  • Concepts — Glossary of DAQIRI terminology (kernel bypass, GPUDirect, packet/burst/segment, flow/queue, memory region, zero-copy ownership, RX reorder). Meant to be opened in parallel with the rest of the docs.
  • API Guide — Six-step DAQIRI application lifecycle and configuration-first model
  • Configuration YAML Reference — Full YAML config reference for all engines
  • C++ API Usage — C++ RX/TX workflows, buffer lifecycle, file writing, utilities, and status codes
  • Python API Usage — Python bindings, workflow examples, enums, config classes, and helper functions
  • Performance: DGX Spark — Per-platform throughput, drop, and utilization numbers for stream/protocol combinations on DGX Spark
  • Contributing — Contribution guidelines, coding standards, DCO sign-off

Tutorials

Step-by-step walkthroughs to get hands-on.

License

Apache 2.0 — see LICENSE for details.