Skip to content

[feature not live] docs(merge-queue): add Testing Duration chart to metrics page#647

Draft
samgutentag wants to merge 3 commits into
mainfrom
sam-gutentag/mq-testing-duration-chart
Draft

[feature not live] docs(merge-queue): add Testing Duration chart to metrics page#647
samgutentag wants to merge 3 commits into
mainfrom
sam-gutentag/mq-testing-duration-chart

Conversation

@samgutentag
Copy link
Copy Markdown
Member

Summary

  • Documents the new Testing Duration chart in the Merge Queue Health tab
  • Covers the Outcome and Cycle Ended In filter dropdowns
  • Notes separate bucketing behavior (no cross-chart hover sync)
  • Adds stat measures table (Avg, Min, Max, Sum, P50, P95, P99)

Source

Test plan

  • Preview in GitBook

Generated by Claude Code

Documents the new Testing Duration chart in the Merge Queue Health tab,
including the available stat measures and the Outcome / Cycle Ended In
filter dropdowns. Notes the separate bucketing behavior from the other
health charts.

Source: trunk-io/trunk2#3919

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@trunk-staging-io
Copy link
Copy Markdown

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 9, 2026

Claude finished @samgutentag's task in 2m 18s —— View job


Review summary

Reviewed merge-queue/administration/metrics.md (+42/-0). The new Testing duration section is well-structured, accurate to the trunk2 PR, and slots in cleanly between Time in queue and Drill down into metrics. No blocking issues.

Inline comments posted (all non-blocking):

  • L116–124 — Stat-measures table style differs from the existing Time in queue table (column padding, trailing periods). The new wording is actually more grammatical; suggested either matching format or harmonizing both.
  • L138Cancelled vs Canceled spelling: the page is already mixed (older Conclusion-count content uses American, newer drill-down/Prometheus content uses British). Worth being deliberate; if the UI label is Cancelled, keep as-is.
  • L110 — "the cycle" is referenced before being introduced. Suggested "the testing cycle" for clarity. Also flagged that this is the only major section on the page without a figure.
  • L132–138 — Reader-facing: Interrupted vs Cancelled descriptions read similarly; consider a one-line distinction so readers know which filter to pick.

Strengths

  • The cross-chart bucketing caveat at L110 is exactly the kind of footgun-prevention worth calling out.
  • L112 ("a single PR can contribute multiple data points if its testing cycle restarted") is a useful clarification — easy to miss otherwise.
  • Filter tables clearly distinguish Outcome (per testing cycle) from Cycle ended in (per overall PR cycle).

Tasks

  • Read diff and surrounding sections
  • Cross-check style/consistency
  • Post inline comments
  • Post review summary
    • Branch: sam-gutentag/mq-testing-duration-chart

Comment thread merge-queue/administration/metrics.md Outdated
Comment thread merge-queue/administration/metrics.md Outdated
Comment thread merge-queue/administration/metrics.md Outdated
Comment thread merge-queue/administration/metrics.md Outdated
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-approved: Claude code review passed.

@samgutentag
Copy link
Copy Markdown
Member Author

Verification status (2026-05-12): pending

Eng merged but flag off in prod. Hold off.

  • Eng PR: trunk-io/trunk2#3919 (merged 2026-05-08, merge commit intact on main)
  • Flag: displayMergeHealthTestDuration
  • Signals:
    • 0 follow-up PRs in trunk-io/trunk2 reference the flag (exact-phrase search)
    • ts/apps/e2e/flags.json has no entry for this flag (defaults to off in e2e)
    • LaunchDarkly bot posts in #staging-notifications and #production-notifications are dated 2026-05-07, before the eng PR merged on 2026-05-08. Per the recency rule these count as background only, not rollout confirmation.
    • No post-merge Slack messages confirming staging or prod rollout
    • Eng PR body explicitly states "This is behind a feature flag" and lists "allow drilling down" as the next phase, suggesting rollout is not imminent

Wait for the flag to be enabled in prod, then re-run /verify-docs-pr 647.

@samgutentag samgutentag changed the title docs(merge-queue): add Testing Duration chart to metrics page [feature not live] docs(merge-queue): add Testing Duration chart to metrics page May 12, 2026
Folds in the strongest pieces from PR #649 so it can be closed without
losing work:
- Adds the CI-vs-queue-wait-time framing line
- Wraps the restart caveat in a {% hint style="info" %} block (matches
  the page's existing pattern)
- Adds a worked filter example combining Outcome and Cycle ended in
- Cross-links to the Time in queue section
- Promotes the stat measures table under its own #### subheading
- Drops em dashes per house style

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 12, 2026

Claude finished @samgutentag's task in 1m 30s —— View job


Review summary

Reviewed merge-queue/administration/metrics.md (+50/-0). The second commit (ca90308) cleanly folds the narrative bits from #649 — CI-vs-wait-time framing, the worked filter example, the hint block wrapping the restart caveat, and the promoted Statistical measures subheading. No blocking issues; status remains in line with the verification gate (displayMergeHealthTestDuration still off in prod).

A few non-blocking nits, mostly carryovers from the 2026-05-09 review plus one introduced by the new commit:

Carried over (still unaddressed):

  • L148–156 — stat measures table formatting still differs from the Time in queue table at L98–106 (no column padding, no trailing periods). The new wording is grammatically tighter ("…below which 95% of testing durations fall" vs the older "…below 95% of the time in queue falls."), so harmonizing up to the new wording is a good follow-up.
  • L110 — "when the cycle reaches its final state" still introduces "the cycle" before it has been defined. Suggest "when the testing cycle reaches its final state" so the next paragraph ("its testing cycle restarted") reads consistently.
  • L129–130 — Interrupted vs Cancelled descriptions are still close enough that a reader has to guess which filter to pick. A one-line distinction (what triggers each) would help.
  • No figure — every other major H3 on this page has a screenshot. Worth adding one once the feature is live in prod and you can capture it.

New in ca90308:

  • L142 — the example sentence uses American "canceled or failed runs", but the filter table four lines up at L130 uses British "Cancelled". Same section disagreeing with itself reads jarringly even given the pre-existing page-wide mix. Match whichever spelling the UI label uses.

Other small things:

  • L110"spends in the TESTING state": uppercase TESTING is fine if it matches the UI label / Prometheus label, but it's stylistically heavier than the lowercase state names used elsewhere on the page (e.g. "merged, failing, or being canceled" at L92). Worth a quick sanity check against the UI string.
  • L130"Cycles cancelled before tests ran to completion" reads awkwardly (if they were cancelled before completion, they didn't "run to completion"). Consider "Cycles cancelled before tests completed" or "Cycles cancelled before tests finished".
  • L126 vs L136"All Outcomes" in the Outcome filter but just "All" in the Cycle ended in filter. If the UI labels actually differ, fine; otherwise consider matching.

Strengths

  • The CI-vs-queue-wait-time framing at L112 (added in ca90308) is exactly the right context — readers will immediately understand why this chart exists separately from Time in queue.
  • The worked filter example at L142 (Passed + Merged → clean CI baseline) turns an abstract filter combo into something a reader can act on.
  • Promoting the stat measures into a Statistical measures subheading is a nice symmetry with Filters.
  • The hint block at L114–116 ({% hint style="info" %}) matches the page's existing pattern and surfaces the multi-data-point caveat in a way that's hard to miss.

Tasks

  • Read current state of merge-queue/administration/metrics.md
  • Diff against origin/main and inspect both commits
  • Cross-check carried-over issues from prior review
  • Review new commit (ca90308) for new issues
  • Post review summary
    • Branch: sam-gutentag/mq-testing-duration-chart

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-approved: Claude code review passed.

- Clarify "the cycle" as "the testing cycle" on first reference
- Rewrite Interrupted vs Cancelled filter rows with proto-defined
  distinctions (Interrupted: run cut short, cycle may continue;
  Cancelled: cycle ends mid-test)
- Harmonize the Time in queue stat-measures table with the new
  Testing duration table (cleaner column layout, no trailing periods,
  "value below which" phrasing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 12, 2026

Claude finished @samgutentag's task in 1m 20s —— View job


Re-review after d1080f4

Reviewed merge-queue/administration/metrics.md against origin/main (now +59/-9). The third commit, d1080f4, cleanly resolves the substantive items from the prior two rounds. No blocking issues; PR remains gated on the prod rollout of displayMergeHealthTestDuration.

Carried-over items — status

# Item Status
1 Stat-measures table style — harmonize Time in queue (L98–106) with Testing duration (L148–156) Fixed. Both tables now use the un-padded layout, no trailing periods, and the "value below which N% … fall" phrasing.
2 "the cycle" introduced before being defined (L110) Fixed. Now reads "when the testing cycle reaches its final state".
3 Interrupted vs Cancelled indistinguishable (L129–130) Fixed with proto-grounded wording. The "cycle may continue" vs "cycle ends" distinction is exactly what a reader needs to pick the right filter.
4 No figure for the new chart Deferred — appropriate; capture once displayMergeHealthTestDuration is on in prod.
5 "Cycles cancelled before tests ran to completion" reads awkwardly (L130) Fixed as part of #3 — now "Cycles cancelled mid-test".

Still-open nits (non-blocking)

  • L142 — American "canceled" inside a section that uses British "Cancelled". "…without noise from canceled or failed runs." sits four lines below the Cancelled filter row at L139. Sam clarified earlier that the British spelling matches the proto enum / UI label, so the filter rows are correct — but the prose at L142 isn't referring to a UI label, just describing the data, and would read more consistently as "cancelled or failed runs". Same-paragraph spelling mix is the kind of thing that catches a reader's eye even when the page-wide mix doesn't.
  • L110 — uppercase TESTING. Stylistically heavier than the lowercase state names used at L92 ("merged, failing, or being canceled"). Fine if it's matching a specific UI string / Prometheus label; worth a quick sanity check otherwise.
  • L126 vs L136 — "All Outcomes" vs "All". If the UI labels genuinely differ between the two dropdowns, leave as-is; otherwise consider matching for symmetry.

Strengths preserved from prior commits

  • Cross-chart bucketing footgun called out at L110 + the hint block at L114–116.
  • CI-vs-queue-wait-time framing at L112.
  • Worked filter example at L142 (Passed + Merged → clean CI baseline).
  • Proto-grounded Interrupted/Cancelled distinction is genuinely useful — readers won't have to guess.

Tasks

  • Read current state of merge-queue/administration/metrics.md
  • Diff against origin/main and review d1080f4
  • Verify carried-over issues from prior reviews are addressed
  • Check for any new issues introduced
  • Post review summary
    • Branch: sam-gutentag/mq-testing-duration-chart

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-approved: Claude code review passed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants