#207: Fix TX mempool exhaustion deadlock in tx_core_worker#208
#207: Fix TX mempool exhaustion deadlock in tx_core_worker#208lobolanja wants to merge 3 commits into
Conversation
Poll NIC TX completions via rte_eth_tx_descriptor_status() in the tx_core_worker idle path (empty ring). This triggers mlx5_tx_handle_completion() as a side effect, reclaiming mbufs so is_tx_burst_available() unblocks. Signed-off-by: Juanca-Lobolanja <lobolanja@gmail.com>
Add mlx5_tx_done_cleanup() to the mlx5 PMD (DPDK 25.11) as a thin wrapper around mlx5_tx_handle_completion(). Registers the callback in mlx5_dev_ops and mlx5_dev_ops_isolate so rte_eth_tx_done_cleanup() no longer returns -ENOTSUP. Signed-off-by: Juanca-Lobolanja <lobolanja@gmail.com>
Replace the rte_eth_tx_descriptor_status() side-effect workaround with the proper rte_eth_tx_done_cleanup() API, now that the mlx5 patch provides the callback. Signed-off-by: Juanca-Lobolanja <lobolanja@gmail.com>
|
| Filename | Overview |
|---|---|
| src/engines/dpdk/daqiri_dpdk_engine.cpp | Adds rte_eth_tx_done_cleanup() call in the tx_core_worker idle path to reclaim TX mbufs back to the pool when no packets are queued; fix is correctly scoped to the DpdkEngine implementation with no API or vtable changes. |
Reviews (1): Last reviewed commit: "#207 - Use rte_eth_tx_done_cleanup for i..." | Re-trigger Greptile
|
Hi @lobolanja, this appears to be a patch for DPDK. Should this be submitted to that project? |
|
Hi @cliffburdick, Yes, the DPDK patch (mlx5_tx_done_cleanup) is intended for upstream submission as well. I started here because I wanted to validate the approach with the DAQIRI team first: I wasn't fully confident this is a genuine gap rather than a misuse on mi side (e.g., maybe there's a different intended pattern for idle-path mbuf reclaim that I'm missing). I'm also happy to submit the mlx5 patch upstream to DPDK via their mailing list process, though that's a separate and longer cycle. In the meantime, carrying the patch in daqiri's dpdk_patches/ keeps things working without waiting on upstream. If the fix looks correct to you, I'll go ahead and open a patch on dpdk.org in parallel. |
|
Hi @lobolanja, we're trying to migrate most users to our ibverbs backend since it performs better than the DPDK backend. This means we're also trying to deprecate the patches. I don't know enough about the mlx5 tx freeing to know whether this is needed, so perhaps you could submit it there and see if it gets accepted. In my experience things get reviewed within a few days of you submitting the patches, and it would at least be on the dev branch of DPDK. |
|
That makes sense, thanks for the context on the ibverbs migration. In that case, would you be okay with keeping just commit 9f62fe2 (the rte_eth_tx_descriptor_status workaround)? It fixes the deadlock without adding any DPDK patch to maintain. I can rebase the PR to drop commits 2 and 3. In parallel, I'll submit the tx_done_cleanup patch upstream to DPDK. Once it lands there, the workaround here can be swapped for the proper API in a follow-up, or it may become a non-issue if the engine moves to ibverbs by then. |
|
Yes, I think that's reasonable |
Fixes #207
Problem
tx_core_workernever polls NIC TX completions while idling (empty ring), so mbufs are never reclaimed andis_tx_burst_available()blocks the producer permanently. Throughput collapses from >10 Mpps to ~2 Kpps.Fix
Poll completions in the
tx_core_workeridle path so mbufs are returned to the pool even when no new work is queued.Why the DPDK patch is needed
DPDK provides
rte_eth_tx_done_cleanup()as the standard API for explicitly reclaiming completed TX mbufs without sending new packets. However, the mlx5 PMD in DPDK 25.11 does not implement thetx_done_cleanupcallback — calling it returns-ENOTSUP.The only ways to trigger mlx5 completion processing (
mlx5_tx_handle_completion) are:rte_eth_tx_burst()rte_eth_tx_descriptor_status()mlx5_tx_handle_completion()as a side effect, but that's not the function's contract — it is meant to report descriptor state, not reclaim mbufs. Future DPDK versions could change this behavior.rte_eth_tx_done_cleanup()-ENOTSUPon mlx5The patch adds a trivial
mlx5_tx_done_cleanup()(~20 lines) that wraps the existingmlx5_tx_handle_completion()— no new completion logic is introduced. It simply exposes the internal function through the official DPDK callback sorte_eth_tx_done_cleanup()works as intended.Without this patch, the only alternative is the
rte_eth_tx_descriptor_statusside-effect workaround (commit 1), which is fragile and semantically incorrect.Commits
9f62fe2rte_eth_tx_descriptor_status(port, queue, 0)in idle path. On mlx5 this triggersmlx5_tx_handle_completion()as a side effect.19f5329mlx5_tx_done_cleanup()to mlx5 PMD as a wrapper aroundmlx5_tx_handle_completion. Registered inmlx5_dev_opsandmlx5_dev_ops_isolate.a7354dcrte_eth_tx_done_cleanup(port, queue, 0).Reproduction
See the self-contained test script in issue #207.
Testing