consensus: rejoin a lagging archive by syncing proposals from peers by dazthecorgi · Pull Request #570 · QuilibriumNetwork/monorepo

dazthecorgi · 2026-06-17T16:32:19Z

A global-consensus participant that falls behind — e.g. an archive isolated by a network partition — could not rejoin consensus. Orphan resolution and finalized-rank advancement happen only on the consensus message path (on_receive_proposal -> forks.add_validated_state -> drain orphans); a plain frame-store write never triggers them, and freshness gates drop a lagging node's traffic until its finalized rank advances. With no path to backfill the missed proposals, the node would orphan every later proposal forever — its frame store could catch up (via the read-only poller) while its consensus engine stayed permanently stuck.
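As a rough illustration of why a missing parent stalls everything behind it, here is a toy model of an orphan cache that is drained only on the proposal-receive path. The `Engine` and `Proposal` types and the rank-as-identifier simplification are assumptions made for this sketch and are not the engine's real types; only the shape of "park on missing parent, drain when the parent arrives" is taken from the description above.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified proposal: identified by its rank and its parent's rank.
#[derive(Clone, Debug, PartialEq)]
pub struct Proposal {
    pub rank: u64,
    pub parent_rank: u64,
}

/// Toy stand-in for the fork tree plus orphan cache (not the real engine).
pub struct Engine {
    /// Ranks that have been attached to the tree. Rank 0 plays the role of genesis.
    known: HashSet<u64>,
    /// Parked proposals, keyed by the parent rank they are waiting for.
    orphans: HashMap<u64, Vec<Proposal>>,
}

impl Engine {
    pub fn new() -> Self {
        let mut known = HashSet::new();
        known.insert(0);
        Engine { known, orphans: HashMap::new() }
    }

    /// Consensus message path: attach the proposal if its parent is known,
    /// then drain any orphans that were waiting on it. Otherwise park it.
    /// Returns the ranks attached by this call, in ascending order.
    pub fn on_receive_proposal(&mut self, p: Proposal) -> Vec<u64> {
        if !self.known.contains(&p.parent_rank) {
            self.orphans.entry(p.parent_rank).or_default().push(p);
            return Vec::new();
        }
        let mut attached = Vec::new();
        let mut queue = vec![p];
        while let Some(next) = queue.pop() {
            if self.known.insert(next.rank) {
                attached.push(next.rank);
            }
            // Draining happens here and nowhere else in this model.
            if let Some(children) = self.orphans.remove(&next.rank) {
                queue.extend(children);
            }
        }
        attached.sort();
        attached
    }

    /// Number of proposals currently parked in the orphan cache.
    pub fn orphan_count(&self) -> usize {
        self.orphans.values().map(Vec::len).sum()
    }
}
```

In this model, proposals for ranks 2 and 3 stay parked until the proposal for rank 1 is submitted through `on_receive_proposal`; writing rank 1 into some other store would leave `orphans` untouched.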

This ports the Go node's catch-up path (SyncProvider / GetGlobalProposal -> AddProposal): when the engine orphans a proposal for a missing parent, pull the missing proposals from a peer and submit them into the consensus loop so the node finalizes them and resumes voting — rather than merely mirroring frames.

Changes:

- Serve full proposals: implement GlobalService.GetGlobalProposal, assembling state + parent QC + prior TC + proposer vote from the clock store (FrameLookup::get_global_proposal). Previously stubbed to return nothing for non-genesis frames.
- Persist the proposer vote at proposal ingest (keyed by filter, rank, selector) so it can be served back; the store trait had the accessor but no producer wrote it. Add ProposalVote/TimeoutCertificate proto<->wire conversions and proto_proposal_to_signed (only QC had one).
- Trigger: add Consumer::on_missing_parent, fired at the orphan-cache site, surfaced through GlobalConsumer and a required SyncTriggerHook on ConsensusActivationParams. The hook is required (not optional) so the recovery path cannot be silently left unwired.
- Catch-up task (node): a Notify-driven task pulls proposals via ArchiveClient::get_global_proposal ascending from the engine's finalized frame (tracked separately from the poller-shared head) and submits them (submit_quorum_certificate / submit_timeout_certificate / submit_proposal).
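The trigger and catch-up loop listed above can be sketched roughly as follows. This is a synchronous, std-only approximation: the real task is described as Notify-driven, and every trait, struct, and field here (`ProposalSource`, `ConsensusSink`, `ActivationParams`, `FullProposal`, the string-typed certificates) is a hypothetical stand-in. Starting at `finalized + 1`, submitting in the order QC, TC, proposal, and treating a submitted proposal as immediately finalized are all assumptions of the sketch.

```rust
use std::sync::mpsc;

/// Hypothetical bundle a peer returns for one frame.
#[derive(Clone, Debug)]
pub struct FullProposal {
    pub frame: u64,
    pub parent_qc: Option<String>,
    pub prior_tc: Option<String>,
}

/// Hypothetical peer client, standing in for ArchiveClient::get_global_proposal.
pub trait ProposalSource {
    fn get_global_proposal(&self, frame: u64) -> Option<FullProposal>;
}

/// Hypothetical engine handle mirroring the three submit_* calls.
pub trait ConsensusSink {
    fn finalized_frame(&self) -> u64;
    fn submit_quorum_certificate(&mut self, qc: &str);
    fn submit_timeout_certificate(&mut self, tc: &str);
    fn submit_proposal(&mut self, p: &FullProposal);
}

/// The hook is a plain field, not an Option, so callers must wire it.
pub struct ActivationParams {
    pub sync_trigger: Box<dyn Fn()>,
}

/// Pull proposals in ascending order and feed them into the consensus loop.
/// Stops at the first frame the peer cannot serve. Returns the number submitted.
pub fn catch_up<S: ProposalSource, E: ConsensusSink>(src: &S, engine: &mut E, max: u64) -> u64 {
    let start = engine.finalized_frame() + 1;
    let mut submitted = 0;
    for frame in start..start + max {
        let Some(p) = src.get_global_proposal(frame) else { break };
        if let Some(qc) = &p.parent_qc {
            engine.submit_quorum_certificate(qc);
        }
        if let Some(tc) = &p.prior_tc {
            engine.submit_timeout_certificate(tc);
        }
        engine.submit_proposal(&p);
        submitted += 1;
    }
    submitted
}

struct MemPeer(Vec<FullProposal>);

impl ProposalSource for MemPeer {
    fn get_global_proposal(&self, frame: u64) -> Option<FullProposal> {
        self.0.iter().find(|p| p.frame == frame).cloned()
    }
}

#[derive(Default)]
struct ToyEngine {
    finalized: u64,
    log: Vec<String>,
}

impl ConsensusSink for ToyEngine {
    fn finalized_frame(&self) -> u64 {
        self.finalized
    }
    fn submit_quorum_certificate(&mut self, qc: &str) {
        self.log.push(format!("qc {qc}"));
    }
    fn submit_timeout_certificate(&mut self, tc: &str) {
        self.log.push(format!("tc {tc}"));
    }
    fn submit_proposal(&mut self, p: &FullProposal) {
        self.log.push(format!("proposal {}", p.frame));
        // Toy simplification: a submitted proposal counts as finalized at once.
        self.finalized = p.frame;
    }
}

/// Wires the hook to a channel, fires it once, and runs one catch-up pass.
pub fn demo() -> (u64, u64, Vec<String>) {
    let (tx, rx) = mpsc::channel::<()>();
    let params = ActivationParams {
        sync_trigger: Box::new(move || {
            let _ = tx.send(());
        }),
    };
    // Stands in for the engine orphaning a proposal with a missing parent.
    (params.sync_trigger)();

    let peer = MemPeer(vec![
        FullProposal { frame: 1, parent_qc: None, prior_tc: None },
        FullProposal { frame: 2, parent_qc: Some("qc-1".into()), prior_tc: None },
    ]);
    let mut engine = ToyEngine::default();
    let mut submitted = 0;
    if rx.try_recv().is_ok() {
        submitted = catch_up(&peer, &mut engine, 16);
    }
    (submitted, engine.finalized, engine.log)
}
```

Making the hook a non-optional field means a caller that forgets to supply it fails to compile, which is one way to get the "cannot be silently left unwired" property the change list asks for.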

Verification: the devnet rank-1 partition scenario (archive-4 isolated) now passes with the victim genuinely rejoining consensus — it orphans during the partition, then on heal the catch-up supplies the one missing parent frame and the engine finalizes frames 1..4 (state finalized, persisted candidate frames), orphaning stops, and 4/4 nodes reach the stop frame with the chain-safety check clean. Before this change the victim stayed at finalized=0, orphaning every rank.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* current state of v2.1.0.24
* add more test coverage, fix patch number sync with config, fix race where initial failout of sync dooms workers to idle forever (until reboot)
* fix tests (#566)
* fix test compilation
* fix keypair_from_protobuf_encoding test
* re-enable ed448-rust tests
* enable pacemaker_advances_on_qc test
* fix transaction safety for hypergraph store writes (#567)
* fix vertex data roundtrip test and add new one
* fix: use tx when saving vertex data

  Step 1: Failing regression test (TDD)
  Added get_vertex_data_not_visible_when_commit_txn_aborted to crates/quil-rpc/tests/vertex_data_end_to_end.rs. It drives the real lazy-commit path into a transaction, aborts it, and asserts GetVertexData returns nothing. Confirmed it failed on current code before any fix.

  Step 2: save_vertex_underlying is now transaction-aware
  - crates/quil-types/src/store.rs: added txn: &dyn Transaction to the trait method.
  - crates/quil-store/src/hypergraph.rs: added inherent save_vertex_underlying_txn / save_tree_blob_txn using the existing with_rocks_batch pattern (stage into the batch, fall back to a direct write for unrecognized txns); the trait impl delegates to the txn variant. Kept the original inherent direct-write methods intact so the ~25 concrete-typed callers are untouched.

  Step 3: Threaded txn through the lazy-tree commit
  crates/quil-tries/src/lazy_tree.rs: walk_leaves_persist now takes txn, passes it recursively and into save_vertex_underlying; the call site passes the txn that commit already receives. Leaf writes now join the same batch as the tree nodes and shard commit.

  Step 4: Sync-apply path made atomic
  crates/quil-rpc/src/hypergraph_sync_probe.rs: extracted a persist_shard_refresh helper that wraps the tree-blob write and the per-vertex loop in a single transaction (new_transaction → writes → commit); on any error the txn is dropped, so partial syncs don't persist. Used in both the fresh and incremental paths.

  Step 5: Implementors updated & verified
  Added the _txn param to the four other HypergraphStore impls (BTreeStore, MemStore, E2EHgStore, the quil-execution test store).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: make LazyVectorCommitmentTree::commit retry-safe

  Make LazyVectorCommitmentTree::commit retry-safe by deferring dirty-state clearing until the surrounding transaction is durably committed.

  Problem: commit drained its dirty-node map and pending_deletions and cleared dirty_flag before returning, i.e. before the caller's txn.commit(). If the txn was aborted/failed/never committed, the store stayed unchanged while the in-memory tree (which is reused across frames) believed it had persisted, so a retry silently skipped re-staging the missing writes.

  Fix:
  - crates/quil-tries/src/lazy_tree.rs: commit now takes a non-draining snapshot of the dirty set and pending_deletions to stage into the txn, and no longer clears dirty_flag. Removed the dead latest map. Added mark_persisted(), the sole place dirty bookkeeping is cleared; it must be called only after txn.commit() succeeds. Idempotent.
  - crates/quil-hypergraph/src/crdt.rs: HypergraphCRDT::commit collects the trees it staged and calls mark_persisted() on each only after txn.commit()? succeeds. A failed commit returns early, leaving every tree dirty for a safe retry.
  - Side benefit: compute_shard_root (which commits under a NoopTxn) no longer clears dirty state, so the subsequent real commit re-stages those nodes transactionally.

  Tests:
  - crates/quil-tries/tests/golden_lazy_tree.rs: added commit_defers_dirty_clear_until_mark_persisted, which proves the tree stays dirty after commit, that a retry after a discarded txn re-stages node writes, and that a clean tree (post mark_persisted) stages nothing. Added kv_len/clear_kv test helpers.
  - crates/quil-rpc/tests/vertex_data_end_to_end.rs: folded the is_dirty() lifecycle into the existing real-RocksDB commit-path tests:
    - success path now asserts dirty after txn.commit(), clean after mark_persisted();
    - abort path now asserts the tree stays dirty (retry-safe).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: make compute_shard_root read-only

  compute_shard_root claimed to read "without writing anything to the store" but computed its root via tree.commit(&NoopTxn, ...). Since NoopTxn isn't a RocksTxn, every write inside commit (tree nodes, deletions, and vertex underlying blobs) fell through to a direct db.put, persisting data outside any frame transaction for an uncommitted frame. This runs on the hot master-stream receive path (maybe_sync_before_global_frame), so a routine root comparison leaked writes to disk.

  - quil-tries: extract LazyVectorCommitmentTree::compute_root(prover), the read-only half of commit: it recomputes commitments in memory, returns the root, writes nothing and touches no dirty/deletion/dirty-flag state. Refactor commit to delegate to it.
  - quil-hypergraph: compute_shard_root now calls tree.compute_root(); deleted the inline NoopTxn shim and the doubled doc comment.
  - Add quil-hypergraph test compute_shard_root_is_read_only (with MemStore node_count/per_vertex_count accessors and a dirty-state helper): asserts a real root is returned, nothing is persisted, the tree stays dirty, and the root matches what a real commit then produces.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor: require RocksTxn for hypergraph store writes

  With compute_shard_root no longer passing a NoopTxn, no caller ever hands a non-RocksTxn to a RocksHypergraphStore write, so the silent direct-write fallback only served to mask bugs like the one compute_shard_root hit. Remove the with_rocks_batch helper. Replace it with RocksTxn::from_dyn(&dyn Transaction) -> Result<&RocksTxn>, which downcasts once and errors loudly on an unrecognized txn instead of writing outside the transaction. Rewrite all writers (save_tree_blob_txn, save_vertex_underlying_txn, insert_node, save_root, delete_node, set_shard_commit, set_alt_shard_commit, track_change, untrack_change) to operate on the concrete RocksTxn batch directly. The HypergraphStore trait signatures stay &dyn Transaction for object-safety (five implementors, used as Arc<dyn HypergraphStore>).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* handle leaving scenario with store wipe
* reduce score differential basis for flagging leave-to-join opportunities, extend scoring-based leave window to be a full cycle
* adjust margins on decisions
* adjust threshold for decides and joins
* adjust snapshotting to use actual rocksdb snapshots
* resolve unsynced leave issuance condition
* reapply docker build optimizations to Dockerfile.source (#568)

  Commit 1bdc940 ("current state of v2.1.0.24") accidentally reverted the build optimizations introduced in PR #551 while landing unrelated changes (ubuntu:22.04 base, upstream protoc, the build-migrate-tool RocksDB stage, qns-api removal). This restores the optimizations on top of those changes:

  - Consolidate the seven parallel gen-* stages (each running generate.sh) into a single gen-rust stage: one cargo build for all leaf crates so shared workspace deps compile once instead of seven times. Artifacts are stashed in /out/ since target/ is a cache mount, and build-context now COPYs --from=gen-rust accordingly.
  - Restore cargo cache mounts (cargo-registry, cargo-git, cargo-target-node) on build-node.
  - Restore Go cache mounts (go-build, go-mod) on build-qclient so a build-context invalidation no longer forces a from-scratch Go rebuild.
  - Extend the same Go cache mounts to the new build-migrate-tool stage for consistency.
  - Restore the explanatory comments on COPY crates and build-qclient.

  This brings Dockerfile.source back in line with Dockerfile.sourceavx512, which was untouched by the revert and shares the cargo/go cache ids.

  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* consensus: rejoin a lagging archive by syncing proposals from peers (#570)

  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Daz <daz_the_corgi@proton.me>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
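The retry-safe commit described in the LazyVectorCommitmentTree entry of the log above follows a general pattern: stage a snapshot of the dirty set into the transaction, and clear the dirty bookkeeping only after the transaction has committed. A minimal sketch of that pattern, using hypothetical `Tree`, `Txn`, and `Store` types that are far simpler than the real lazy tree and RocksDB transaction:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical write batch: nothing reaches the store until `commit`.
#[derive(Default)]
pub struct Txn {
    staged: Vec<(u64, String)>,
}

/// Hypothetical durable store.
#[derive(Default)]
pub struct Store {
    pub nodes: BTreeMap<u64, String>,
}

impl Txn {
    /// Apply every staged write to the store.
    pub fn commit(self, store: &mut Store) {
        for (key, value) in self.staged {
            store.nodes.insert(key, value);
        }
    }

    pub fn staged_len(&self) -> usize {
        self.staged.len()
    }
}

/// Hypothetical in-memory tree that is reused across commits.
#[derive(Default)]
pub struct Tree {
    nodes: BTreeMap<u64, String>,
    dirty: BTreeSet<u64>,
}

impl Tree {
    pub fn insert(&mut self, key: u64, value: &str) {
        self.nodes.insert(key, value.to_string());
        self.dirty.insert(key);
    }

    /// Stage a non-draining snapshot of the dirty nodes into the txn.
    /// Takes `&self`, so it cannot clear any dirty state.
    pub fn commit(&self, txn: &mut Txn) {
        for key in &self.dirty {
            if let Some(value) = self.nodes.get(key) {
                txn.staged.push((*key, value.clone()));
            }
        }
    }

    /// The only place dirty bookkeeping is cleared. Call after the txn commits.
    /// Calling it twice is harmless.
    pub fn mark_persisted(&mut self) {
        self.dirty.clear();
    }

    pub fn is_dirty(&self) -> bool {
        !self.dirty.is_empty()
    }
}
```

Because `commit` borrows the tree immutably in this sketch, a dropped or failed transaction leaves the tree exactly as dirty as before, and the next attempt stages the same writes again.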

Re-derives the devnet integration-test harness (crates/devnet orchestrator + proxy) and its additive hooks onto the v2.1.0.25 base (709a20b), replacing the prior long-lived devnet fork that re-implemented v2.1.0.24 independently. The base already contains the equivalent of all the fork's pre-devnet commits (v2.1.0.24/25 + memory + OOM work, the #56x/QuilibriumNetwork#570 PRs, and the leave/join tuning), so only the genuinely-new "add devnet" delta is carried forward.

Additive hooks (re-applied onto the base's versions):

- quil-p2p behaviour.rs/node.rs: per-(source,target) gossip forward_filter + P2PCommand::SetForwardFilter / P2PHandle::set_forward_filter.
- quil-rpc frame_sync.rs: parameterize ArchiveEndpointPool blacklist TTL (Duration::ZERO disables banning for instant partition recovery).
- quil-config engine.rs/lib.rs: archiveBlacklistTtl engine field (-1 disables).
- quil-node master_node/mod.rs: wire the configured TTL into the pool.
- docker/Dockerfile.source: build-devnet-proxy / build-p2p-ping / devnet-proxy stages + p2p-ping in the node image.
- proxy consensus_events.rs test: ProposalVote gained the base's `openings` field.

Verified: full workspace builds; unit tests green (quil-config TTL, quil-rpc pool_tests incl. zero-TTL, quil-p2p forward_filter_tests, devnet + devnet-proxy); fmt (devnet crates only) + clippy --no-deps clean.

KNOWN FOLLOW-UP: the e2e docker partition test does not yet pass on this base. v2.1.0.25 moved global consensus from blossomsub gossip to direct point-to-point gRPC (direct_global_consensus_publisher / submit_global_consensus), so the proxy's gossip-snooping no longer observes consensus frames and times out. The proxy must be reworked to detect the stop frame (and impose global-consensus partitions) on the gRPC submit_global_consensus path instead of gossip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
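The blacklist-TTL hook mentioned in the commit message above might look roughly like this. The `EndpointPool` type, its methods, and `ttl_from_config` are hypothetical names for the sketch, not the real ArchiveEndpointPool API. The source only says that -1 disables the config field and that Duration::ZERO disables banning; treating the field as whole seconds and mapping every negative value to zero are assumptions.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Map the raw config value to a TTL.
/// Assumption: the value is in seconds and any negative value disables banning.
pub fn ttl_from_config(raw_secs: i64) -> Duration {
    if raw_secs < 0 {
        Duration::ZERO
    } else {
        Duration::from_secs(raw_secs as u64)
    }
}

/// Hypothetical endpoint pool with a configurable ban duration.
pub struct EndpointPool {
    ttl: Duration,
    banned_at: HashMap<String, Instant>,
}

impl EndpointPool {
    pub fn new(ttl: Duration) -> Self {
        EndpointPool { ttl, banned_at: HashMap::new() }
    }

    /// Record a failure. With a zero TTL nothing is recorded, so the
    /// endpoint can be retried immediately after a partition heals.
    pub fn report_failure(&mut self, endpoint: &str, now: Instant) {
        if self.ttl.is_zero() {
            return;
        }
        self.banned_at.insert(endpoint.to_string(), now);
    }

    /// An endpoint is usable if it was never banned or its ban has expired.
    pub fn is_usable(&self, endpoint: &str, now: Instant) -> bool {
        match self.banned_at.get(endpoint) {
            Some(at) => now.duration_since(*at) >= self.ttl,
            None => true,
        }
    }
}
```

Passing `now` explicitly instead of calling `Instant::now()` inside the pool keeps the expiry logic testable without sleeping.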

CassOnMars merged commit f7753c3 into QuilibriumNetwork:v2.1.0.24 Jun 18, 2026
5 checks passed

CassOnMars merged 1 commit into QuilibriumNetwork:v2.1.0.24 from dazthecorgi:fix-archive-consensus-catch-up
