Track superseded mempool errors separately#4385
Conversation
a798fc3 to
9905e6e
Compare
This comment was marked as outdated.
This comment was marked as outdated.
When `Mempools::execute()` runs mempools in parallel, errors from mempools
whose results were discarded after another mempool succeeded were still
recorded against `driver_mempool_submission`, biasing the per-mempool
success ratio with timing-dependent shadowed failures.
Replace `select_ok` with `FuturesUnordered` + manual loop so observation
runs in the consuming context. Errors that occur before another mempool
succeeds are now recorded under a new `Superseded` label via
`observe::mempool_superseded`, which also records the winning mempool in
the trace fields. Errors in the all-failed case keep their existing
labels (Revert / Expired / Other / Disabled).
Alert query update needed when deploying:
sum by (network) (increase(driver_mempool_submission{cow_fi_environment="prod",result="Success"}[2h]))
/
sum by (network) (increase(driver_mempool_submission{cow_fi_environment="prod",result!~"Disabled|Superseded"}[2h])) < 0.6
`mempool_executed` took a `Result<&SubmissionSuccess, &mempools::Error>` and re-matched the same discriminant several times to pick the log level, metric label, and block-passed labels. Replace it with two functions, `mempool_succeeded(&SubmissionSuccess)` and `mempool_failed(&mempools::Error)`, so each branch is straight-line and call sites pick the correct observer directly. Behavior and emitted metrics are unchanged.
9905e6e to
d9fb0cb
Compare
fleupold
left a comment
There was a problem hiding this comment.
Is there a reason you are not using the PR template for the description?
I agree with the change, however I'd like to suggest that we interpret "superseeded" events as success wrt. how you envision to change the metric. A superseeded submission should be considered a successful one.
This way we receive N (# of mempool) events in the happy case, and N events in the failure case allowing us to keep our alert metric as a ratio of successful to failed ones (otherwise failed events would be weighted N times more than successful ones).
Every loser in a mempool race is now marked Superseded, whether it failed before the winner finished or was still in flight when the winner landed. The old code only labelled already-failed losers as superseded and quietly dropped ones still in flight; the shadowed_errors accumulator that carried their errors across is gone. Minor cleanup: - Error::blocks_passed on the domain type returns the block delta from submission to the terminal event for variants that carry block-level timing. This replaces the inline match in mempool_failed. - error_label is shared between mempool_failed and the per-attempt counter so the Prometheus labels stay in sync. The all-failed path also swaps the expect for an explicit Error::Other fallback instead of panicking on the (currently unreachable) empty-errors case.
Apologies, I was in a rush and didn't account for that. I've updated the description to match the template.
Agree. I've adjusted the suggested metric. |
There was a problem hiding this comment.
Code Review
This pull request refactors the mempool execution logic to use FuturesUnordered, enabling more detailed tracking of success, failure, and superseded states. It also adds a blocks_passed method to the Error enum for improved block-level timing metrics. A high-severity logic error was identified where disabled mempools are incorrectly reported as 'Superseded' if another mempool wins the race, which would artificially inflate success rate metrics. A correction was suggested to preserve the 'Disabled' status during the racing process.
| .partition(|mempool| !self.is_disabled(mempool, settlement)); | ||
|
|
||
| for mempool in &disabled { | ||
| observe::mempool_failed(mempool, settlement, &Error::Disabled); |
There was a problem hiding this comment.
this + the required changes; IMO disabled is not failed
| observe::mempool_failed(mempool, settlement, &Error::Disabled); | |
| observe::mempool_disabled(mempool, settlement); |
Disabled is a configuration skip, not a submission failure. Split it into its own observer so failure-rate metrics aren't polluted.
Co-authored-by: José Duarte <15343819+jmg-duarte@users.noreply.github.com>
This reverts commit 013e884.
Destructure the inner `&Mempool` in the future builder so pending futures no longer borrow `enabled`, freeing it for mutation. Pop the winner with `swap_remove(idx)` and iterate the remainder to record superseded observations.
| let label = err.metric_label(); | ||
| metrics::get() | ||
| .mempool_submission | ||
| .with_label_values(&[mempool.to_string().as_str(), result]) | ||
| .with_label_values(&[mempool.to_string().as_str(), label]) |
There was a problem hiding this comment.
nit
| let label = err.metric_label(); | |
| metrics::get() | |
| .mempool_submission | |
| .with_label_values(&[mempool.to_string().as_str(), result]) | |
| .with_label_values(&[mempool.to_string().as_str(), label]) | |
| metrics::get() | |
| .mempool_submission | |
| .with_label_values(&[mempool.to_string().as_str(), err.metric_label()]) |
| mempools::Error::Disabled => { | ||
| tracing::debug!( | ||
| %mempool, | ||
| "sending transaction via mempool disabled", |
There was a problem hiding this comment.
this log doesnt make a lot of sense
| "sending transaction via mempool disabled", | |
| "mempool disabled, not sending transaction", |
MartinquaXD
left a comment
There was a problem hiding this comment.
The idea to fix the metrics makes sense to me but the logic seems more complicated than necessary.
Wouldn't it also work to construct a hashmap mapping all mempools to a submission outcome?
- initialize all mempools as superseeded
- run new
FuturesUnorderedlogic - match on each
submitresult and update the hashmap accordingly
At the end all submission futures that finished will have updated their own entry in the mapping and all futures that didn't finish were cancelled because some other pool was successful first so the original superseeded label is correct.
|
I agree this got more complex than it should. (some of which came from requirements shifting along the way). I'll try to propose a simpler alternative.
If I'm reading this right, I think this arrangement still leaves us with a possible final state of The wrinkle is that we have to account for the |
Description
Mempools::execute()runs all configured mempools concurrently and returns the first one that succeeds. Previously, errors from mempools that lost the race were counted as real failures, even though the overall submission was successful. Dropped mempools were never recorded.This skewed mempool counts by both:
This PR keeps the racing behavior but changes how observation works.
Changes
Behavior
mempool_succeededfor the winner andmempool_supersededfor every other configured mempool.mempool_failedfor each one of them.Code specific
select_okwithFuturesUnorderedinMempools::executeso the consumer can observe each completion (crates/driver/src/domain/mempools.rs).observe::mempool_executedintomempool_succeeded(&SubmissionSuccess)andmempool_failed(&mempools::Error), dropping theResult<&S, &E>indirection now that each call site already knows which branch it is on. Behavior and emitted metrics are unchanged by the split.mempool_superseded(&Mempool, winner: &Mempool, &Settlement)which incrementsdriver_mempool_submissionwithresult="Superseded".How to test
Existing driver unit tests cover the race semantics; this PR does not change the externally observable submission outcome, only how observation is sequenced and labeled. To verify manually:
result="Success"increment for the winner and oneresult="Superseded"increment for the loser; noRevert/Expired/Otherfrom the loser.Supersededfailure label.Alert query update needed when deploying
Per-mempool success counts both wins and races-lost (so happy and failure paths both emit N events for N configured mempools, keeping the ratio symmetric).
Supersededstays as a separate label so dashboards can still distinguish wins from race-losses per mempool.