Skip to content

virtio-blk: add direct write mode #5910

Open
bacarrdy wants to merge 2 commits into firecracker-microvm:main from bacarrdy:virtio-blk-direct-write

Conversation

bacarrdy commented May 22, 2026

Changes

  • Add an optional direct_write setting to the virtio-block drive configuration.
  • Open a second host file descriptor with O_DIRECT when direct_write is enabled.
  • Route aligned guest writes through the direct descriptor while keeping reads buffered.
  • Fall back to buffered writes when the offset, length, or guest buffer address is unaligned.
  • Persist the setting across snapshot state and reject it for vhost-user block configs.
  • Update API schema, block I/O documentation, changelog, and tests.
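The write-routing rule in the bullets above can be sketched as a small predicate. This is a minimal illustration, not the PR's code: the names is_direct_eligible, select_fd, RequestKind, and HostFd are hypothetical, and the 512-byte alignment is an assumption (the real eligibility constants live in io/mod.rs and may be tied to the logical sector size or the page size).

```rust
/// Assumed O_DIRECT alignment; the PR's actual constant may differ.
const DIRECT_IO_ALIGN: u64 = 512;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum RequestKind {
    Read,
    Write,
}

/// Which host descriptor a request is submitted on (hypothetical names).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum HostFd {
    Buffered,
    Direct,
}

/// A request may use the O_DIRECT descriptor only if the file offset,
/// the length, and the guest buffer address are all aligned.
/// Zero-length requests are treated as ineligible in this sketch.
fn is_direct_eligible(offset: u64, len: u64, buf_addr: u64) -> bool {
    len > 0
        && offset % DIRECT_IO_ALIGN == 0
        && len % DIRECT_IO_ALIGN == 0
        && buf_addr % DIRECT_IO_ALIGN == 0
}

/// Reads always stay buffered; writes go to the direct descriptor only when
/// direct_write is enabled and the request passes the alignment checks.
fn select_fd(kind: RequestKind, direct_write: bool, offset: u64, len: u64, buf_addr: u64) -> HostFd {
    match kind {
        RequestKind::Write if direct_write && is_direct_eligible(offset, len, buf_addr) => {
            HostFd::Direct
        }
        // Every read, and every ineligible write, falls back to the buffered path.
        _ => HostFd::Buffered,
    }
}
```

The catch-all arm is what makes the fallback automatic: a write that fails any of the three alignment checks, and every read, goes to the buffered descriptor, so enabling the flag never causes a request to be rejected.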

Reason

Some deployments use local NVMe-backed storage, including file-backed or LVM-thin layouts, where the regular buffered write path can cap write throughput well below the backing device capability.

Local testing showed sequential guest writes around 500-600 MB/s on such a setup, while direct host writes reached the expected NVMe-backed throughput. This keeps the behavior opt-in: reads remain buffered, and unaligned writes automatically use the existing buffered path.

Validation

  • Ran tools/devtool checkstyle locally: passed.
  • Ran tools/devtool checkbuild --all locally: passed for x86_64 and aarch64.
  • Tested the behavior in a local Firecracker deployment with a write-heavy fio workload on local NVMe-backed storage.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following the Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@bacarrdy bacarrdy requested a review from Manciukic as a code owner May 22, 2026 18:43
Copilot AI review requested due to automatic review settings May 22, 2026 18:43
@bacarrdy bacarrdy requested a review from micz010 as a code owner May 22, 2026 18:43

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR introduces an optional direct_write mode for virtio-block devices, enabling host direct I/O for aligned guest write requests while keeping reads buffered.

Changes:

  • Added direct_write to block device configs, API schema, docs, and changelog.
  • Implemented dual file descriptor support in virtio-block file engines (buffered + O_DIRECT) with alignment gating.
  • Extended snapshot/persisted state and updated tests to cover new config wiring.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Summary per file:

  • src/vmm/tests/integration_tests.rs: Updates test config literals to include direct_write.
  • src/vmm/src/vmm_config/drive.rs: Adds direct_write to BlockDeviceConfig and updates config-related tests.
  • src/vmm/src/resources.rs: Updates resource tests to include direct_write.
  • src/vmm/src/devices/virtio/block/virtio/test_utils.rs: Sets direct_write default in block test helper config.
  • src/vmm/src/devices/virtio/block/virtio/persist.rs: Persists direct_write with backward-compatible serde default.
  • src/vmm/src/devices/virtio/block/virtio/io/sync_io.rs: Adds optional direct FD selection logic for eligible writes in sync engine.
  • src/vmm/src/devices/virtio/block/virtio/io/mod.rs: Defines eligibility rules/constants, plumbs optional direct FD through engines, adds unit tests.
  • src/vmm/src/devices/virtio/block/virtio/io/async_io.rs: Registers optional direct FD in io_uring and selects it for eligible writes.
  • src/vmm/src/devices/virtio/block/virtio/device.rs: Wires direct_write config into disk properties and file opening (O_DIRECT).
  • src/vmm/src/devices/virtio/block/vhost_user/device.rs: Ensures direct_write is omitted for vhost-user-block configs and updates tests.
  • src/vmm/src/device_manager/mod.rs: Updates device manager tests to include direct_write.
  • src/vmm/src/builder.rs: Updates builder tests to include direct_write.
  • src/firecracker/swagger/firecracker.yaml: Documents new direct_write field in API schema with default false.
  • src/firecracker/src/api_server/request/drive.rs: Extends API request tests to include direct_write.
  • src/firecracker/src/api_server/parsed_request.rs: Extends parsed request test payload to include direct_write.
  • docs/api_requests/block-io-engine.md: Documents direct_write usage and behavior.
  • CHANGELOG.md: Notes new direct_write support in Added section.
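Two of the per-file changes above, the backward-compatible serde default in persist.rs and the vhost-user handling, amount to one rule: a missing direct_write means off, and turning it on only makes sense for a file-backed drive. A minimal sketch with a hypothetical, heavily simplified config struct (DriveConfigSketch, DirectWriteError, and effective_direct_write are illustrative names, not Firecracker's; the real BlockDeviceConfig has more fields and uses serde). Rejecting only an explicit true for vhost-user drives is an assumption of this sketch.

```rust
/// Hypothetical, simplified drive config: path_on_host for file-backed
/// virtio-blk, socket for vhost-user-block.
#[derive(Debug, Default)]
struct DriveConfigSketch {
    path_on_host: Option<String>,
    socket: Option<String>,
    direct_write: Option<bool>,
}

#[derive(Debug, PartialEq, Eq)]
enum DirectWriteError {
    VhostUserUnsupported,
}

fn effective_direct_write(cfg: &DriveConfigSketch) -> Result<bool, DirectWriteError> {
    // A missing field (older API clients, older snapshot state) means "off",
    // which keeps the existing buffered path as the default.
    let requested = cfg.direct_write.unwrap_or(false);
    // Assumption: a vhost-user drive is served by an external backend process,
    // so there is no host file for the VMM to reopen with O_DIRECT.
    if requested && cfg.socket.is_some() {
        return Err(DirectWriteError::VhostUserUnsupported);
    }
    Ok(requested)
}
```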


Comment thread src/vmm/src/devices/virtio/block/virtio/io/async_io.rs Outdated
Comment thread docs/api_requests/block-io-engine.md Outdated
Comment thread src/vmm/src/devices/virtio/block/virtio/io/sync_io.rs
Comment thread src/vmm/src/devices/virtio/block/virtio/device.rs Outdated
@JackThomson2

Contributor

Hi @bacarrdy, can we please get a description of your changes and why you are making them? Currently your use case is unclear.

@bacarrdy bacarrdy force-pushed the virtio-blk-direct-write branch from e6bb465 to 056ccae Compare May 27, 2026 16:17
@bacarrdy

Author

Hi @JackThomson2, updated the PR description with the changes, motivation, validation, and checklist.

The use case is local NVMe-backed deployments where the regular buffered write path can be significantly slower than what the backing device can sustain. In our setup with local NVMe-backed storage and LVM-thin/file-backed images, guest sequential writes were around 500-600 MB/s through the regular path, while direct host writes reached the expected NVMe-backed throughput.

This PR keeps the behavior opt-in. Reads remain buffered, and writes only use the direct path when the request is aligned; otherwise they fall back to the existing buffered path.

I also rebased the branch on current main and reran local validation:

  • tools/devtool checkstyle: passed
  • tools/devtool checkbuild --all: passed for x86_64 and aarch64

@bacarrdy bacarrdy force-pushed the virtio-blk-direct-write branch from 056ccae to 10b5ed1 Compare May 27, 2026 16:59
@JackThomson2

Contributor

Hi,

Thanks for opening the PR. I've been talking with the team, and while opening the FD twice is an interesting solution, it's not something we'd want to have at the moment. We were wondering whether VFIO may be an option for you once that lands, if block throughput is key for you. Another option is full read and write O_DIRECT; that is something we would consider, but I understand it will heavily affect read performance.

bacarrdy added 2 commits June 7, 2026 22:37
Add an optional direct_write mode for block devices.

Aligned guest writes use an O_DIRECT host file descriptor.

Reads stay buffered and unaligned writes use the regular path.

Keep direct writes opt-in for selected storage backends.

This avoids changing the default buffered path.

Signed-off-by: Jonas Savulionis <jonas@esnet.lt>
Signed-off-by: Jonas Savulionis <jonas@esnet.lt>
@bacarrdy bacarrdy force-pushed the virtio-blk-direct-write branch from 73f2b99 to 3ea3dcd Compare June 7, 2026 19:41
bacarrdy commented Jun 8, 2026

Author

Hi,

Thanks for discussing this with the team.

For my use case, the current approach is intentional. I rely on a file/LVM-thin backed storage model where the host still owns the block-device lifecycle: fast cloning from templates, host-side snapshots/backups, restore, resize/rewrite/delete, discard/fstrim, and Firecracker virtio-blk rate limiting.

The reason I opened the backing file twice is that I need direct I/O for aligned writes, but I also need reads to stay buffered. If reads also use O_DIRECT, I lose the benefit of the host page cache for boot, repeated reads, package installs, and other read-heavy paths. Full read/write O_DIRECT is therefore not equivalent for this workload.

VFIO may be interesting in the future, and I can evaluate it separately in a development environment once it is usable for this path. But right now it does not look like a direct replacement for this PR, because it moves the storage path toward device assignment while this use case depends on keeping the existing file/LVM-thin backed lifecycle and Firecracker virtio-blk behavior.

Opening the file twice lets aligned writes avoid the page cache, while reads keep the current buffered behavior. Unaligned writes fall back to the normal path, and the feature is opt-in, so existing behavior remains unchanged unless explicitly enabled.
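Opening the backing file twice could look roughly like the sketch below, which uses the standard library's OpenOptionsExt::custom_flags to add O_DIRECT on the second descriptor only. The helper open_backing_files and the BackingFiles struct are hypothetical; O_DIRECT is hard-coded from the Linux UAPI values for x86_64 and aarch64 (Firecracker's two supported architectures) to avoid a libc dependency, and skipping the direct descriptor for read-only drives is this sketch's choice, not necessarily the PR's.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

// Linux O_DIRECT is architecture-specific (asm-generic vs. arm64 fcntl.h).
#[cfg(target_arch = "x86_64")]
const O_DIRECT: i32 = 0o40000;
#[cfg(target_arch = "aarch64")]
const O_DIRECT: i32 = 0o200000;

/// Hypothetical pair of host descriptors for one drive.
struct BackingFiles {
    /// Serves all reads and any write that is not eligible for direct I/O.
    buffered: File,
    /// Present only when direct_write is enabled; used for aligned writes.
    direct: Option<File>,
}

fn open_backing_files(path: &Path, read_only: bool, direct_write: bool) -> io::Result<BackingFiles> {
    // The regular descriptor keeps using the host page cache.
    let buffered = OpenOptions::new().read(true).write(!read_only).open(path)?;
    // The second descriptor bypasses the page cache; it is write-only because
    // reads never go through it.
    let direct = if direct_write && !read_only {
        Some(OpenOptions::new().write(true).custom_flags(O_DIRECT).open(path)?)
    } else {
        None
    };
    Ok(BackingFiles { buffered, direct })
}
```

The maintainers' full read/write O_DIRECT alternative would instead be a single descriptor opened with read(true), write(true), and custom_flags(O_DIRECT), which subjects every read to the same alignment rules and takes reads off the host page cache as well.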

If this approach is not something Firecracker wants upstream at the moment, I understand. In that case I will have to keep carrying it as an out-of-tree patch, because without this behavior I lose properties that are required for this workload.

If useful, I can also benchmark and share numbers comparing:

  1. current buffered I/O,
  2. this direct-write-only approach,
  3. full read/write O_DIRECT,
  4. VFIO, once the implementation is ready enough to test for this use case.

bacarrdy commented Jun 8, 2026

Author

A small follow-up after looking more closely at the current VFIO work in #5870.

VFIO is definitely interesting, but from what I can see it is not an equivalent replacement for this use case. The current VFIO path assigns a physical PCI device to the guest. That is a good fit for a dedicated, performance-oriented device setup, but it gives up, or makes much harder, several properties I need from the virtio-blk/file-backed path:

  • host-side snapshots/backups/restore of the backing storage;
  • fast thin clones from templates;
  • simple host-side resize/rewrite/delete lifecycle;
  • keeping Firecracker virtio-blk rate limiting semantics;
  • using discard/fstrim with the same file/LVM-thin storage model;
  • keeping memory-management features such as ballooning/memory hotplug, which the current VFIO implementation treats as incompatible.

Those are not all strictly required for a single dedicated high-throughput device, but they are important properties for this storage model. VFIO would move the design toward device assignment, while this PR keeps the existing virtio-blk lifecycle and only changes the write path when the request is eligible.

So after looking at VFIO more carefully, I still do not think it replaces this PR. Full read/write O_DIRECT is also not equivalent because it gives up buffered reads. The direct-write-only approach is the narrow behavior I am trying to add: aligned writes can bypass the host page cache, while reads and ineligible writes keep the existing behavior.

Labels

None yet

Projects

None yet

3 participants