virtio-blk: add direct write mode#5910
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR introduces an optional direct_write mode for virtio-block devices, enabling host direct I/O for aligned guest write requests while keeping reads buffered.
Changes:
- Added
direct_writeto block device configs, API schema, docs, and changelog. - Implemented dual file descriptor support in virtio-block file engines (buffered + O_DIRECT) with alignment gating.
- Extended snapshot/persisted state and updated tests to cover new config wiring.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/vmm/tests/integration_tests.rs | Updates test config literals to include direct_write. |
| src/vmm/src/vmm_config/drive.rs | Adds direct_write to BlockDeviceConfig and updates config-related tests. |
| src/vmm/src/resources.rs | Updates resource tests to include direct_write. |
| src/vmm/src/devices/virtio/block/virtio/test_utils.rs | Sets direct_write default in block test helper config. |
| src/vmm/src/devices/virtio/block/virtio/persist.rs | Persists direct_write with backward-compatible serde default. |
| src/vmm/src/devices/virtio/block/virtio/io/sync_io.rs | Adds optional direct FD selection logic for eligible writes in sync engine. |
| src/vmm/src/devices/virtio/block/virtio/io/mod.rs | Defines eligibility rules/constants, plumbs optional direct FD through engines, adds unit tests. |
| src/vmm/src/devices/virtio/block/virtio/io/async_io.rs | Registers optional direct FD in io_uring and selects it for eligible writes. |
| src/vmm/src/devices/virtio/block/virtio/device.rs | Wires direct_write config into disk properties and file opening (O_DIRECT). |
| src/vmm/src/devices/virtio/block/vhost_user/device.rs | Ensures direct_write is omitted for vhost-user-block configs and updates tests. |
| src/vmm/src/device_manager/mod.rs | Updates device manager tests to include direct_write. |
| src/vmm/src/builder.rs | Updates builder tests to include direct_write. |
| src/firecracker/swagger/firecracker.yaml | Documents new direct_write field in API schema with default false. |
| src/firecracker/src/api_server/request/drive.rs | Extends API request tests to include direct_write. |
| src/firecracker/src/api_server/parsed_request.rs | Extends parsed request test payload to include direct_write. |
| docs/api_requests/block-io-engine.md | Documents direct_write usage and behavior. |
| CHANGELOG.md | Notes new direct_write support in Added section. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Hi @bacarrdy can we please get a description of your changes and why you are making them? Currently it's unclear your use-case. |
e6bb465 to
056ccae
Compare
|
Hi @JackThomson2, updated the PR description with the changes, motivation, validation, and checklist. The use-case is local NVMe-backed deployments where the regular buffered write path can be significantly slower than the backing device capability. In our setup with local NVMe-backed storage and LVM-thin/file- This PR keeps the behavior opt-in. Reads remain buffered, and writes only use the direct path when the request is aligned; otherwise they fall back to the existing buffered path. I also rebased the branch on current
|
056ccae to
10b5ed1
Compare
|
Hi, Thanks for opening the PR, I've been talking with the team and the idea of opening the FD twice is an interesting solution it's not something we'd want to have at the moment. We were wondering if VFIO may be an option for you once that lands if block throughput is key for you. Another option is full Read and Write O_DIRECT, that is something we would consider but I understand this will heavily affect read performance. |
Add an optional direct_write mode for block devices. Aligned guest writes use an O_DIRECT host file descriptor. Reads stay buffered and unaligned writes use the regular path. Keep direct writes opt-in for selected storage backends. This avoids changing the default buffered path. Signed-off-by: Jonas Savulionis <jonas@esnet.lt>
Signed-off-by: Jonas Savulionis <jonas@esnet.lt>
73f2b99 to
3ea3dcd
Compare
|
Hi, Thanks for discussing this with the team. For my use case, the current approach is intentional. I rely on a file/LVM-thin backed storage model where the host still owns the block-device lifecycle: fast cloning from templates, host-side snapshots/backups, restore, resize/rewrite/delete, discard/fstrim, and Firecracker virtio-blk rate limiting. The reason I opened the backing file twice is that I need direct I/O for aligned writes, but I also need reads to stay buffered. If reads also use O_DIRECT, I lose the benefit of the host page cache for boot, repeated reads, package installs, and other read-heavy paths. Full read/write O_DIRECT is therefore not equivalent for this workload. VFIO may be interesting in the future, and I can evaluate it separately in a development environment once it is usable for this path. But right now it does not look like a direct replacement for this PR, because it moves the storage path toward device assignment while this use case depends on keeping the existing file/LVM-thin backed lifecycle and Firecracker virtio-blk behavior. Opening the file twice lets aligned writes avoid the page cache, while reads keep the current buffered behavior. Unaligned writes fall back to the normal path, and the feature is opt-in, so existing behavior remains unchanged unless explicitly enabled. If this approach is not something Firecracker wants upstream at the moment, I understand. In that case I will have to keep carrying it as an out-of-tree patch, because without this behavior I lose properties that are required for this workload. If useful, I can also benchmark and share numbers comparing:
|
|
A small follow-up after looking more closely at the current VFIO work in #5870. VFIO is definitely interesting, but from what I can see it is not an equivalent replacement for this use case. The current VFIO path assigns a physical PCI device to the guest. That is a good fit for a dedicated device/performance-oriented setup, but it loses or makes much harder several properties I need from the virtio-blk/file-backed path:
Those are not all strictly required for a single dedicated high-throughput device, but they are important properties for this storage model. VFIO would move the design toward device assignment, while this PR keeps the existing virtio-blk lifecycle and only changes the write path when the request is eligible. So after looking at VFIO more carefully, I still do not think it replaces this PR. Full read/write |
Changes
direct_writeto virtio-block drive configuration.O_DIRECTwhendirect_writeis enabled.Reason
Some deployments use local NVMe-backed storage, including file-backed or LVM-thin layouts, where the regular buffered write path can cap write throughput well below the backing device capability.
Local testing showed sequential guest writes around 500-600 MB/s on such a setup, while direct host writes reached the expected NVMe-backed throughput. This keeps the behavior opt-in: reads remain buffered, and unaligned writes automatically use the existing buffered path.
Validation
tools/devtool checkstylelocally: passed.tools/devtool checkbuild --alllocally: passed for x86_64 and aarch64.fioworkload on local NVMe-backed storage.License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.PR Checklist
tools/devtool checkbuild --allto verify that the PR passesbuild checks on all supported architectures.
tools/devtool checkstyleto verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md.Runbook for Firecracker API changes.
integration tests.
TODO.rust-vmm.