Track superseded mempool errors separately by tilacog · Pull Request #4385 · cowprotocol/services

tilacog · 2026-05-05T19:37:57Z

Description

Mempools::execute() runs all configured mempools concurrently and returns the first one that succeeds. Previously, errors from mempools that lost the race were counted as real failures, even though the overall submission was successful. Dropped mempools were never recorded.

This skewed mempool counts by both:

counting errors alongside a successful submission, and
omitting counts for dropped mempools.

This PR keeps the racing behavior but changes how observation works.

Changes

Behavior

On first success, emit mempool_succeeded for the winner and mempool_superseded for every other configured mempool.
On all-failed, emit mempool_failed for each one of them.
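
The labeling rule above can be sketched as a pure function over completions in arrival order. This is a minimal illustration, not the driver's code: `Label`, `SubmitError`, and `label_outcomes` are hypothetical names, and the real implementation consumes futures rather than a pre-recorded slice.

```rust
/// Metric label recorded for one mempool (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Label {
    Success,
    Superseded,
    Revert,
    Expired,
    Other,
}

/// Simplified submission error (hypothetical; the real type is `mempools::Error`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SubmitError {
    Revert,
    Expired,
    Other,
}

impl From<SubmitError> for Label {
    fn from(err: SubmitError) -> Self {
        match err {
            SubmitError::Revert => Label::Revert,
            SubmitError::Expired => Label::Expired,
            SubmitError::Other => Label::Other,
        }
    }
}

/// `completions` lists `(mempool index, result)` in the order results arrived.
fn label_outcomes(
    mempools: &[&str],
    completions: &[(usize, Result<(), SubmitError>)],
) -> Vec<(String, Label)> {
    // First success wins: the winner is `Success` and every other configured
    // mempool is `Superseded`, whether it already failed or was still pending.
    if let Some(&(winner, _)) = completions.iter().find(|(_, result)| result.is_ok()) {
        return mempools
            .iter()
            .enumerate()
            .map(|(idx, name)| {
                let label = if idx == winner { Label::Success } else { Label::Superseded };
                (name.to_string(), label)
            })
            .collect();
    }
    // No success: each completed mempool keeps its own failure label.
    completions
        .iter()
        .filter_map(|&(idx, result)| {
            result.err().map(|err| (mempools[idx].to_string(), Label::from(err)))
        })
        .collect()
}
```

With three mempools where one fails first and another then succeeds, the function yields one `Success` and two `Superseded` entries, so both paths produce one event per mempool that took part.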

Code specific

Replace select_ok with FuturesUnordered in Mempools::execute so the consumer can observe each completion (crates/driver/src/domain/mempools.rs).
Split observe::mempool_executed into mempool_succeeded(&SubmissionSuccess) and mempool_failed(&mempools::Error), dropping the Result<&S, &E> indirection now that each call site already knows which branch it is on. Behavior and emitted metrics are unchanged by the split.
Add mempool_superseded(&Mempool, winner: &Mempool, &Settlement) which increments driver_mempool_submission with result="Superseded".
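
As a rough sketch of the observer split described above, the three functions can each increment one labeled counter. The signatures here are simplified assumptions (the real observers take `&Mempool`, `&Settlement`, `&SubmissionSuccess`, and `&mempools::Error`), and `SubmissionCounter` is a hypothetical in-memory stand-in for the Prometheus counter `driver_mempool_submission`.

```rust
use std::collections::HashMap;

/// In-memory stand-in for `driver_mempool_submission`, keyed by
/// (mempool, result). Hypothetical; the driver uses Prometheus.
#[derive(Default)]
struct SubmissionCounter {
    counts: HashMap<(String, &'static str), u64>,
}

impl SubmissionCounter {
    fn inc(&mut self, mempool: &str, result: &'static str) {
        *self.counts.entry((mempool.to_string(), result)).or_insert(0) += 1;
    }

    fn get(&self, mempool: &str, result: &'static str) -> u64 {
        self.counts
            .get(&(mempool.to_string(), result))
            .copied()
            .unwrap_or(0)
    }
}

/// Simplified failure type (hypothetical; the real one is `mempools::Error`).
enum Error {
    Revert,
    Expired,
    Other,
}

/// Called by the success branch only, so no `Result` needs to be re-matched.
fn mempool_succeeded(counter: &mut SubmissionCounter, mempool: &str) {
    counter.inc(mempool, "Success");
}

/// Called by the failure branch only; the label comes from the error variant.
fn mempool_failed(counter: &mut SubmissionCounter, mempool: &str, err: &Error) {
    let label = match err {
        Error::Revert => "Revert",
        Error::Expired => "Expired",
        Error::Other => "Other",
    };
    counter.inc(mempool, label);
}

/// Called for every mempool that lost the race to `winner`.
fn mempool_superseded(counter: &mut SubmissionCounter, mempool: &str, winner: &str) {
    // The real observer also records the winner in its trace fields; this
    // sketch only checks that a mempool is never superseded by itself.
    debug_assert_ne!(mempool, winner);
    counter.inc(mempool, "Superseded");
}
```

Because each call site already knows which branch it is on, it calls the matching observer directly instead of passing a `Result` that the observer would have to match again.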

How to test

Existing driver unit tests cover the race semantics; this PR does not change the externally observable submission outcome, only how observation is sequenced and labeled. To verify manually:

Run the driver against a config with at least two mempools.
Trigger a settlement that succeeds via the public mempool.
Confirm Prometheus shows one result="Success" increment for the winner and one result="Superseded" increment for the loser; no Revert/Expired/Other from the loser.
Trigger a settlement that fails on every mempool and confirm each mempool gets its own non-Superseded failure label.

Alert query update needed when deploying

The per-mempool success count now includes both wins and lost races, so the happy path and the failure path each emit N events for N configured mempools, which keeps the ratio symmetric. Superseded stays a separate label so dashboards can still distinguish wins from race losses per mempool.

sum by (network) (increase(driver_mempool_submission{cow_fi_environment="prod",result=~"Success|Superseded"}[2h]))
/
sum by (network) (increase(driver_mempool_submission{cow_fi_environment="prod",result!="Disabled"}[2h])) < 0.6
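
To make the arithmetic of the alert expression concrete, here is a small sketch that computes the same ratio from per-label event counts. The function names and the sample counts are invented for illustration; only the label names and the 0.6 threshold come from the query above.

```rust
use std::collections::HashMap;

/// Ratio of `Success` + `Superseded` events to all events except `Disabled`.
/// Returns `None` when nothing was counted, so there is no division by zero.
fn success_ratio(counts: &HashMap<&str, u64>) -> Option<f64> {
    let mut good = 0u64;
    let mut total = 0u64;
    for (label, n) in counts {
        if *label == "Disabled" {
            continue; // configuration skips are excluded from the denominator
        }
        total += n;
        if matches!(*label, "Success" | "Superseded") {
            good += n;
        }
    }
    if total == 0 {
        None
    } else {
        Some(good as f64 / total as f64)
    }
}

/// Mirrors the `< 0.6` threshold of the alert expression.
fn should_alert(counts: &HashMap<&str, u64>) -> bool {
    success_ratio(counts).map_or(false, |ratio| ratio < 0.6)
}
```

For example, with three mempools, two settlements that land (2 `Success`, 4 `Superseded`) and one that reverts everywhere (3 `Revert`), the ratio is 6/9, which stays above the threshold.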

When `Mempools::execute()` runs mempools in parallel, errors from mempools whose results were discarded after another mempool succeeded were still recorded against `driver_mempool_submission`, biasing the per-mempool success ratio with timing-dependent shadowed failures.

Replace `select_ok` with `FuturesUnordered` + manual loop so observation runs in the consuming context. Errors that occur before another mempool succeeds are now recorded under a new `Superseded` label via `observe::mempool_superseded`, which also records the winning mempool in the trace fields. Errors in the all-failed case keep their existing labels (Revert / Expired / Other / Disabled).

Alert query update needed when deploying:

sum by (network) (increase(driver_mempool_submission{cow_fi_environment="prod",result="Success"}[2h]))
/
sum by (network) (increase(driver_mempool_submission{cow_fi_environment="prod",result!~"Disabled|Superseded"}[2h])) < 0.6

`mempool_executed` took a `Result<&SubmissionSuccess, &mempools::Error>` and re-matched the same discriminant several times to pick the log level, metric label, and block-passed labels. Replace it with two functions, `mempool_succeeded(&SubmissionSuccess)` and `mempool_failed(&mempools::Error)`, so each branch is straight-line and call sites pick the correct observer directly. Behavior and emitted metrics are unchanged.

fleupold

Is there a reason you are not using the PR template for the description?

I agree with the change; however, I'd like to suggest that we interpret "superseded" events as success with respect to how you envision changing the metric. A superseded submission should be considered a successful one.

This way we receive N (# of mempool) events in the happy case, and N events in the failure case allowing us to keep our alert metric as a ratio of successful to failed ones (otherwise failed events would be weighted N times more than successful ones).
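
The weighting argument can be written out as a short worked equation. It assumes N configured mempools, s settlements that land, f settlements that fail on every mempool, and that a landed settlement contributes a single Success event when Superseded events are excluded from the ratio.

```latex
% N: configured mempools, s: settlements that land, f: settlements that fail everywhere
\begin{align*}
r_{\text{Superseded excluded}} &= \frac{s}{s + N f} \\
r_{\text{Superseded counted as success}} &= \frac{N s}{N s + N f} = \frac{s}{s + f}
\end{align*}
% Example: N = 3, s = 1, f = 1 gives 1/4 in the first case and 1/2 in the second.
```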

Every loser in a mempool race is now marked Superseded, whether it failed before the winner finished or was still in flight when the winner landed. The old code only labelled already-failed losers as superseded and quietly dropped ones still in flight; the shadowed_errors accumulator that carried their errors across is gone.

Minor cleanup:
- Error::blocks_passed on the domain type returns the block delta from submission to the terminal event for variants that carry block-level timing. This replaces the inline match in mempool_failed.
- error_label is shared between mempool_failed and the per-attempt counter so the Prometheus labels stay in sync.

The all-failed path also swaps the expect for an explicit Error::Other fallback instead of panicking on the (currently unreachable) empty-errors case.
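
A minimal sketch of the cleanup described in this commit message follows. The variant fields (`submitted_at_block`, `reverted_at_block`, `submission_deadline`), the choice of which error the all-failed path returns, and the fallback message are assumptions made for illustration; only the method names, variant names, and label strings come from the thread.

```rust
/// Simplified version of `mempools::Error`; the field names are hypothetical.
#[derive(Debug)]
enum Error {
    Revert { submitted_at_block: u64, reverted_at_block: u64 },
    Expired { submitted_at_block: u64, submission_deadline: u64 },
    Disabled,
    Other(String),
}

impl Error {
    /// Block delta from submission to the terminal event, for variants that
    /// carry block-level timing; `None` for the others.
    fn blocks_passed(&self) -> Option<u64> {
        match self {
            Error::Revert { submitted_at_block, reverted_at_block } => {
                Some(reverted_at_block.saturating_sub(*submitted_at_block))
            }
            Error::Expired { submitted_at_block, submission_deadline } => {
                Some(submission_deadline.saturating_sub(*submitted_at_block))
            }
            Error::Disabled | Error::Other(_) => None,
        }
    }

    /// Single source for the Prometheus `result` label of a failure.
    fn metric_label(&self) -> &'static str {
        match self {
            Error::Revert { .. } => "Revert",
            Error::Expired { .. } => "Expired",
            Error::Disabled => "Disabled",
            Error::Other(_) => "Other",
        }
    }
}

/// All-failed path: return one of the collected errors (here the first, an
/// assumption) and fall back to `Error::Other` instead of panicking when
/// the list is empty.
fn all_failed_error(errors: Vec<Error>) -> Error {
    errors
        .into_iter()
        .next()
        .unwrap_or_else(|| Error::Other("no mempool returned a result".to_string()))
}
```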

tilacog · 2026-05-08T19:33:23Z

> Is there a reason you are not using the PR template for the description?

Apologies, I was in a rush and didn't account for that. I've updated the description to match the template.

> I agree with the change; however, I'd like to suggest that we interpret "superseded" events as success with respect to how you envision changing the metric. A superseded submission should be considered a successful one.

> This way we receive N (# of mempool) events in the happy case, and N events in the failure case allowing us to keep our alert metric as a ratio of successful to failed ones (otherwise failed events would be weighted N times more than successful ones).

Agree. I've adjusted the suggested metric.

gemini-code-assist

Code Review

This pull request refactors the mempool execution logic to use FuturesUnordered, enabling more detailed tracking of success, failure, and superseded states. It also adds a blocks_passed method to the Error enum for improved block-level timing metrics. A high-severity logic error was identified where disabled mempools are incorrectly reported as 'Superseded' if another mempool wins the race, which would artificially inflate success rate metrics. A correction was suggested to preserve the 'Disabled' status during the racing process.

jmg-duarte · 2026-05-11T12:35:12Z

+            .partition(|mempool| !self.is_disabled(mempool, settlement));
+
+        for mempool in &disabled {
+            observe::mempool_failed(mempool, settlement, &Error::Disabled);


this + the required changes; IMO disabled is not failed

Suggested change

-            observe::mempool_failed(mempool, settlement, &Error::Disabled);
+            observe::mempool_disabled(mempool, settlement);

Handled in d61152e

Disabled is a configuration skip, not a submission failure. Split it into its own observer so failure-rate metrics aren't polluted.
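
The idea of skipping disabled mempools before the race can be sketched with `Iterator::partition`, which the diff above also uses. The `split_enabled` helper and its closure parameter are hypothetical; what makes a mempool disabled for a given settlement is left to the caller because the thread does not spell it out.

```rust
/// Splits the configured mempools into `(enabled, disabled)`.
/// `is_disabled` is supplied by the caller (hypothetical predicate).
fn split_enabled<T>(mempools: Vec<T>, is_disabled: impl Fn(&T) -> bool) -> (Vec<T>, Vec<T>) {
    // `partition` puts items for which the closure returns true in the first
    // collection, so the predicate is negated to get enabled mempools first.
    mempools
        .into_iter()
        .partition(|mempool| !is_disabled(mempool))
}
```

Only the first vector would enter the race; each entry of the second would be reported through a dedicated observer such as `mempool_disabled` rather than as a failure.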

Co-authored-by: José Duarte <15343819+jmg-duarte@users.noreply.github.com>

This reverts commit 013e884.

Destructure the inner `&Mempool` in the future builder so pending futures no longer borrow `enabled`, freeing it for mutation. Pop the winner with `swap_remove(idx)` and iterate the remainder to record superseded observations.
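
For readers unfamiliar with `Vec::swap_remove`, here is a small sketch of popping the winner by index and keeping the rest for the superseded observations. `split_winner` is a hypothetical helper, not a function from the PR.

```rust
/// Removes the winner at `winner_idx` and returns it together with the
/// remaining mempools, which are the ones to record as superseded.
fn split_winner<T>(mut enabled: Vec<T>, winner_idx: usize) -> (T, Vec<T>) {
    // `swap_remove` is O(1): it moves the last element into the freed slot,
    // so the order of the remainder is not preserved. That is fine here
    // because the losers are only iterated, never indexed again.
    let winner = enabled.swap_remove(winner_idx);
    (winner, enabled)
}
```

Note that `swap_remove` panics if the index is out of bounds, so the index must refer to the same vector the futures were built from.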

jmg-duarte · 2026-05-12T12:55:21Z

+    let label = err.metric_label();
    metrics::get()
        .mempool_submission
-        .with_label_values(&[mempool.to_string().as_str(), result])
+        .with_label_values(&[mempool.to_string().as_str(), label])


nit

Suggested change

-    let label = err.metric_label();
-    metrics::get()
-        .mempool_submission
-        .with_label_values(&[mempool.to_string().as_str(), label])
+    metrics::get()
+        .mempool_submission
+        .with_label_values(&[mempool.to_string().as_str(), err.metric_label()])

Fixed in 501bff3

jmg-duarte · 2026-05-12T12:56:07Z

+        mempools::Error::Disabled => {
            tracing::debug!(
                %mempool,
                "sending transaction via mempool disabled",


this log doesn't make a lot of sense

Suggested change

-                "sending transaction via mempool disabled",
+                "mempool disabled, not sending transaction",

Reworded in 7af8c8b

MartinquaXD

The idea to fix the metrics makes sense to me but the logic seems more complicated than necessary.

Wouldn't it also work to construct a hashmap mapping all mempools to a submission outcome?

initialize all mempools as superseded
run new FuturesUnordered logic
match on each submit result and update the hashmap accordingly

At the end, all submission futures that finished will have updated their own entry in the mapping, and all futures that didn't finish were cancelled because some other pool was successful first, so the original superseded label is correct.
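
The proposal can be sketched roughly as follows, using a pre-recorded list of completions in place of the `FuturesUnordered` loop. `Outcome` and `track_outcomes` are hypothetical names for illustration.

```rust
use std::collections::HashMap;

/// Final state recorded per mempool (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Superseded,
    Success,
    Revert,
    Expired,
    Other,
}

/// `completions` lists `(mempool, outcome)` in the order results arrived.
fn track_outcomes(mempools: &[&str], completions: &[(&str, Outcome)]) -> HashMap<String, Outcome> {
    // Step 1: every mempool starts out as superseded.
    let mut outcomes: HashMap<String, Outcome> = mempools
        .iter()
        .map(|mempool| (mempool.to_string(), Outcome::Superseded))
        .collect();
    // Steps 2 and 3: each finished submission overwrites its own entry.
    for (mempool, outcome) in completions {
        outcomes.insert(mempool.to_string(), *outcome);
        if *outcome == Outcome::Success {
            // The remaining futures are dropped and keep their initial label.
            break;
        }
    }
    outcomes
}
```

Under this scheme a mempool that failed before the winner finished keeps its failure label, so the final map can contain a winner, errored entries, and superseded entries at the same time.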

tilacog · 2026-05-12T13:55:31Z

I agree this got more complex than it should have (some of that came from requirements shifting along the way).

I'll try to propose a simpler alternative.

> At the end, all submission futures that finished will have updated their own entry in the mapping, and all futures that didn't finish were cancelled because some other pool was successful first, so the original superseded label is correct.

If I'm reading this right, I think this arrangement still leaves us with a possible final state of [winner(1), errored(N), superseded(M)] (out of [1+N+M] mempools).

The wrinkle is that we have to account for the Disabled state as a "distinct form" from its sibling variants in mempools::Error, so we don't end up with skewed metrics.

tilacog force-pushed the mempool-metric-superseded branch 2 times, most recently from a798fc3 to 9905e6e May 6, 2026 15:39

tilacog added 7 commits May 6, 2026 12:54

Add comments to mempool race logic

fe5207d

minor adjustments

844a156

minor adjustments

d4d060d

fmt

0ed1753

Document dropped futures in mempool race

d9fb0cb

tilacog force-pushed the mempool-metric-superseded branch from 9905e6e to d9fb0cb May 6, 2026 15:55

cowprotocol deleted a comment from github-actions Bot May 6, 2026

fleupold reviewed May 7, 2026

tilacog marked this pull request as ready for review May 8, 2026 19:33

tilacog requested a review from a team as a code owner May 8, 2026 19:33

gemini-code-assist Bot reviewed May 8, 2026

Comment thread crates/driver/src/domain/mempools.rs Outdated

tilacog added 3 commits May 8, 2026 16:56

extract is_disabled as a function

4a27081

Skip disabled mempools before race

80e7539

Merge branch 'main' into mempool-metric-superseded

7d2b845

tilacog enabled auto-merge May 11, 2026 11:45

jmg-duarte reviewed May 11, 2026

tilacog and others added 7 commits May 11, 2026 09:59

Filter disabled mempools inline

bc8b4c2

Clarify doc comments on mempools::is_disabled

605f650

Tag skipped mempools as Disabled, not Failed

d61152e

Disabled is a configuration skip, not a submission failure. Split it into its own observer so failure-rate metrics aren't polluted.

Add is_enabled helper for readability

da45a12

Rename variable: other -> mempool

013e884

Co-authored-by: José Duarte <15343819+jmg-duarte@users.noreply.github.com>

Fix stale variable names in mempool superseded loop

06eaad1

Reword doc-comment

5fbfefc

tilacog added 2 commits May 11, 2026 10:38

Rename variable: futures -> submission_futures

4108da9

This reverts commit 013e884.

tilacog requested a review from jmg-duarte May 11, 2026 13:51

Document swap_remove in mempool race success arm

5c3aac5

jmg-duarte reviewed May 11, 2026

tilacog and others added 5 commits May 11, 2026 13:20

Qualify mempools::Error variants instead of star import

d8c6d67

Move error_label to Error::metric_label method

14466fc

Route mempool_submission labels through enum + named constants

17685bd

Merge branch 'main' into mempool-metric-superseded

da82b09

re-generate contracts

f8c5d0a

tilacog requested review from MartinquaXD and jmg-duarte May 11, 2026 17:06

Merge branch 'main' into mempool-metric-superseded

0f34ec9

jmg-duarte approved these changes May 12, 2026

tilacog added this pull request to the merge queue May 12, 2026

jmg-duarte removed this pull request from the merge queue due to a manual request May 12, 2026

MartinquaXD reviewed May 12, 2026

tilacog added 2 commits May 12, 2026 10:28

Inline err.metric_label() calls at submission metric sites

501bff3

Reword mempool-disabled log to clarify no submission

7af8c8b
