[feature not live] docs(merge-queue): document Testing Duration metrics chart #649

Closed
samgutentag wants to merge 1 commit into main from sam-gutentag/merge-queue-testing-duration-metrics
Conversation

@samgutentag
Member

Summary

  • Adds a Testing Duration section to merge-queue/administration/metrics.md
  • Explains that testing duration measures time in the TESTING phase only, distinct from Time in Queue (total entry-to-exit time)
  • Documents the Outcome and Cycle Ended In filters, and the statistical measures (avg, min, max, sum, p50, p95, p99)
  • Notes that a single PR can appear multiple times if its cycle restarted

Context

Shipped in trunk-io/trunk2#3919 (v173). The chart is behind a feature flag and will be rolling out.

Test plan

  • Verify the Testing Duration section renders correctly in GitBook preview
  • Confirm the section appears between "Time in queue" and "Drill down into metrics"
  • Check that filter descriptions match the actual UI dropdowns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


Generated by Claude Code

Add a Testing Duration section to the Merge Queue metrics page covering
the new chart that shows time PRs spend in the TESTING phase. Explains
the distinction from Time in Queue, the Outcome and Cycle Ended In filters,
and the available statistical measures.

Shipped in trunk-io/trunk2#3919 (v173).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@trunk-staging-io

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.

@claude
Contributor

claude Bot commented May 12, 2026

Claude finished @samgutentag's task in 1m 37s


Review complete

  • Read the changed file and surrounding sections for context
  • Compare against existing Time in queue section for style consistency
  • Check for typos, grammar, formatting
  • Post inline review comments

Summary

Content is accurate and well-organized — the section sits in the right place, the relationship to Time in queue is explained clearly, and the hint about cycle restarts is a useful catch. Issues are all style/consistency-level, no blockers:

  1. Em-dashes: the new section uses -- while the rest of the page uses real em-dashes. GitBook won't auto-convert, so they'll render as literal double-hyphens. (inline)
  2. Statistical-measures table — trailing punctuation and "Average" wording diverge from the existing Time in queue table on the same page; suggested aligned version inline. Worth noting your new percentile phrasing is actually clearer than the existing one. (inline)
  3. Vague "and others" in the Outcome filter description — enumerate or qualify. (inline)
  4. Testing Metrics is title-cased; sibling section names on the page are sentence case. (inline)
  5. Final-state terminology mixes "merged" with the page's existing Pass/Failure/Cancel vocabulary; pick one. (inline)

Comment on lines +110 to +125
Testing duration shows how long each PR spends in the **TESTING** phase of the merge queue -- from when testing begins to when the test cycle reaches a final state (merged, failed, or canceled).

This is distinct from [Time in queue](#time-in-queue), which measures total time from queue entry to exit. A PR that waits before testing starts will have a longer time in queue but the same testing duration. Use this chart to understand CI performance specifically, separate from queue wait time.

{% hint style="info" %}
Each data point represents one TESTING-to-final-state transition. A PR that is kicked back to PENDING and re-enters testing (for example, due to a queue restart) can appear more than once.
{% endhint %}

The chart appears in a **Testing Metrics** section below the queue metrics charts and uses the same time range and granularity controls.

#### Filters

Two filters let you narrow the data:

* **Outcome** -- Filter by how the test cycle ended. Options include Passed, Failed, and others. Select **All Outcomes** to see the full distribution.
* **Cycle ended in** -- Filter by the final disposition of the PR. Select **Merged**, **Failed**, or **All Cycle Ended In** to see the full set.

Style: em-dash consistency. The rest of this file uses real em-dashes for parenthetical dashes (see lines 149, 151–154, 161, 173–175), but this new section uses double-hyphens (--) in lines 110, 115, 124, and 125. GitBook does not auto-convert -- to an em-dash, so these will render as literal double-hyphens, which looks off next to the rest of the page.

Recommend swapping each -- for a real em-dash. E.g.:

Suggested change
Testing duration shows how long each PR spends in the **TESTING** phase of the merge queue -- from when testing begins to when the test cycle reaches a final state (merged, failed, or canceled).
This is distinct from [Time in queue](#time-in-queue), which measures total time from queue entry to exit. A PR that waits before testing starts will have a longer time in queue but the same testing duration. Use this chart to understand CI performance specifically, separate from queue wait time.
{% hint style="info" %}
Each data point represents one TESTING-to-final-state transition. A PR that is kicked back to PENDING and re-enters testing (for example, due to a queue restart) can appear more than once.
{% endhint %}
The chart appears in a **Testing Metrics** section below the queue metrics charts and uses the same time range and granularity controls.
#### Filters
Two filters let you narrow the data:
* **Outcome** -- Filter by how the test cycle ended. Options include Passed, Failed, and others. Select **All Outcomes** to see the full distribution.
* **Cycle ended in** -- Filter by the final disposition of the PR. Select **Merged**, **Failed**, or **All Cycle Ended In** to see the full set.
Testing duration shows how long each PR spends in the **TESTING** phase of the merge queue from when testing begins to when the test cycle reaches a final state (merged, failed, or canceled).
This is distinct from [Time in queue](#time-in-queue), which measures total time from queue entry to exit. A PR that waits before testing starts will have a longer time in queue but the same testing duration. Use this chart to understand CI performance specifically, separate from queue wait time.
{% hint style="info" %}
Each data point represents one TESTING-to-final-state transition. A PR that is kicked back to PENDING and re-enters testing (for example, due to a queue restart) can appear more than once.
{% endhint %}
The chart appears in a **Testing Metrics** section below the queue metrics charts and uses the same time range and granularity controls.
#### Filters
Two filters let you narrow the data:
* **Outcome** Filter by how the test cycle ended. Options include Passed, Failed, and others. Select **All Outcomes** to see the full distribution.
* **Cycle ended in** Filter by the final disposition of the PR. Select **Merged**, **Failed**, or **All Cycle Ended In** to see the full set.
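
To locate any remaining double-hyphens before merging, a small check along these lines could help. It is a sketch only: the function name and regex are illustrative assumptions, and the pattern deliberately matches only a `--` with whitespace on both sides so that table rules and command-line flags are skipped.

```python
import re

# Matches a double hyphen used as a dash between words, e.g. "queue -- from".
# Whitespace is required on both sides, so table rules (| ---- |) and
# flags such as "--flag" are not reported.
DOUBLE_HYPHEN = re.compile(r"(?<=\s)--(?=\s)")


def find_double_hyphens(markdown_text):
    """Return (line_number, line) pairs for lines containing a spaced double hyphen."""
    hits = []
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if DOUBLE_HYPHEN.search(line):
            hits.append((number, line))
    return hits


sample = (
    "* **Outcome** -- Filter by how the test cycle ended.\n"
    "| ------- | ---- |\n"
    "Run the tool with --flag to see more.\n"
)
for number, line in find_double_hyphens(sample):
    print(f"line {number}: {line}")
```

Only the first sample line is reported; the table rule and the flag are left alone.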

Comment on lines +133 to +141
| Measure | Explanation |
| ------- | ----------------------------------------------------------- |
| Average | Average testing duration during the time bucket |
| Minimum | The shortest testing duration in the time bucket |
| Maximum | The longest testing duration in the time bucket |
| Sum | The total of all testing durations added together |
| P50 | The value below which 50% of testing durations fall |
| P95 | The value below which 95% of testing durations fall |
| P99 | The value below which 99% of testing durations fall |

Consistency with the Time in queue table. Compared to the existing table at lines 98–106, this new table differs in two small ways:

  1. Trailing punctuation. The Time in queue rows end with periods; these rows do not.
  2. "Average" row wording. The Time in queue version reads "Average of all time in queue during the time bucket", but here it's "Average testing duration during the time bucket". The parallel phrasing reads better when the two tables sit on the same page.

Note the new percentile rows actually read more clearly than the existing ones ("The value below which 50% of testing durations fall" vs. the older "The value below 50% of the time in queue falls.", which is grammatically awkward). Up to you whether to (a) align to the existing style for now or (b) fix the older table to match the new, clearer wording in a follow-up.

Minimal alignment to existing style:

Suggested change
| Measure | Explanation |
| ------- | ----------------------------------------------------------- |
| Average | Average testing duration during the time bucket |
| Minimum | The shortest testing duration in the time bucket |
| Maximum | The longest testing duration in the time bucket |
| Sum | The total of all testing durations added together |
| P50 | The value below which 50% of testing durations fall |
| P95 | The value below which 95% of testing durations fall |
| P99 | The value below which 99% of testing durations fall |
| Measure | Explanation |
| ------- | --------------------------------------------------------- |
| Average | Average of all testing durations during the time bucket. |
| Minimum | The shortest testing duration in the time bucket. |
| Maximum | The longest testing duration in the time bucket. |
| Sum | The total of all testing durations added together. |
| P50 | The value below which 50% of testing durations fall. |
| P95 | The value below which 95% of testing durations fall. |
| P99 | The value below which 99% of testing durations fall. |
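
For readers who want to see what the seven measures mean in practice, here is a minimal sketch that computes them for one time bucket. The function names are hypothetical, and the nearest-rank percentile method is an assumption: the source does not say which percentile method the chart uses.

```python
import math
from statistics import mean


def nearest_rank_percentile(sorted_values, pct):
    """Smallest value such that at least pct% of the values are at or below it."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_bucket(durations_minutes):
    """Compute the table's measures for the testing durations in one time bucket."""
    if not durations_minutes:
        raise ValueError("a bucket needs at least one testing duration")
    values = sorted(durations_minutes)
    return {
        "avg": mean(values),
        "min": values[0],
        "max": values[-1],
        "sum": sum(values),
        "p50": nearest_rank_percentile(values, 50),
        "p95": nearest_rank_percentile(values, 95),
        "p99": nearest_rank_percentile(values, 99),
    }


# Hypothetical bucket: five testing durations in minutes, one slow outlier.
summary = summarize_bucket([8, 10, 12, 15, 40])
print(summary)
```

With one slow outlier in the bucket, the average (17) sits well above the P50 (12), which is why the percentile rows are worth reading alongside the average.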


Two filters let you narrow the data:

* **Outcome** -- Filter by how the test cycle ended. Options include Passed, Failed, and others. Select **All Outcomes** to see the full distribution.

Vague "and others". The phrase "Options include Passed, Failed, and others." leaves the reader wondering what "others" are. If the dropdown has a fixed, enumerable set of options (e.g., Passed / Failed / Canceled / Timed out), list them. If the set is open-ended or evolving, say so explicitly (e.g., "and additional outcomes such as Canceled and Timed out") so the reader knows whether to expect to discover more.

Each data point represents one TESTING-to-final-state transition. A PR that is kicked back to PENDING and re-enters testing (for example, due to a queue restart) can appear more than once.
{% endhint %}

The chart appears in a **Testing Metrics** section below the queue metrics charts and uses the same time range and granularity controls.

Casing inconsistency. Testing Metrics is title-cased here, but the surrounding section names on this page are sentence case (e.g., Time in queue, Conclusion count, Drill down into metrics). If Testing Metrics is a literal UI label, consider quoting or pointing that out so it doesn't read like a style slip; otherwise lowercase to Testing metrics for consistency.


### Testing duration

Testing duration shows how long each PR spends in the **TESTING** phase of the merge queue -- from when testing begins to when the test cycle reaches a final state (merged, failed, or canceled).

Terminology clarification. The phrase "the test cycle reaches a final state (merged, failed, or canceled)" mixes terminology: elsewhere on this page the Conclusion count table (line 88) uses Pass / Failure / Cancel categories, where "Merged by Trunk" is a reason within Pass. Consider matching that vocabulary, e.g. "…reaches a final state (passed, failed, or canceled)", or align with the Cycle ended in filter values you describe on line 125 (Merged / Failed). Whichever you pick, the same three terms ideally show up in both places.


@github-actions github-actions Bot left a comment


Auto-approved: Claude code review passed.

@samgutentag
Member Author

Verification status (2026-05-12): pending

Eng merged but flag off in prod. Hold off.

  • Eng PR: trunk-io/trunk2#3919 (merged 2026-05-08, merge commit intact on main)
  • Flag: displayMergeHealthTestDuration
  • Signals:
    • 0 follow-up PRs in trunk-io/trunk2 reference the flag (exact-phrase search)
    • ts/apps/e2e/flags.json has no entry for this flag (defaults to off in e2e)
    • LaunchDarkly bot posts in #staging-notifications and #production-notifications are dated 2026-05-07, before the eng PR merged on 2026-05-08. Per the recency rule these count as background only, not rollout confirmation.
    • No post-merge Slack messages confirming staging or prod rollout
    • Eng PR body explicitly states "This is behind a feature flag" and lists "allow drilling down" as the next phase, suggesting rollout is not imminent

Note: PR #647 documents the same feature and received the same verdict. Coordinate which docs PR to land before merging.

Wait for the flag to be enabled in prod, then re-run /verify-docs-pr 649.

@samgutentag changed the title from "docs(merge-queue): document Testing Duration metrics chart" to "[feature not live] docs(merge-queue): document Testing Duration metrics chart" on May 12, 2026
samgutentag added a commit that referenced this pull request May 12, 2026
Folds in the strongest pieces from PR #649 so it can be closed without
losing work:
- Adds the CI-vs-queue-wait-time framing line
- Wraps the restart caveat in a {% hint style="info" %} block (matches
  the page's existing pattern)
- Adds a worked filter example combining Outcome and Cycle ended in
- Cross-links to the Time in queue section
- Promotes the stat measures table under its own #### subheading
- Drops em dashes per house style

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@samgutentag
Member Author

Closing this in favor of #647, which documents the same feature (Testing Duration chart from trunk-io/trunk2#3919).

The following pieces of this PR have been carried forward into #647 (commit ca90308):

  • The CI-vs-queue-wait-time framing ("Use this chart to understand CI performance specifically, separate from queue wait time")
  • The worked filter example (Outcome: Passed + Cycle ended in: Merged for a clean CI baseline)
  • The [Time in queue](#time-in-queue) cross-link
  • The restart caveat wrapped in a {% hint style="info" %} block (matches the page's existing pattern)
  • A dedicated #### Statistical measures subheading

What was kept from #647 instead of this PR:

  • Complete enumeration of Outcome filter values (All Outcomes / Passed / Failed / Interrupted / Cancelled)
  • Complete enumeration of Cycle ended in filter values (All / Merged / Failed / Cancelled / In Flight)
  • The cross-chart bucketing caveat (no hover sync across charts)
  • Real em dashes for now, normalized to periods in the merged section
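
As a rough illustration of how the two filters combine, the worked example (Outcome: Passed plus Cycle ended in: Merged) can be modeled as below. The `DataPoint` record, its field names, the `filter_points` helper, and the sample rows are all hypothetical; only the filter values come from the enumerations above.

```python
from dataclasses import dataclass


@dataclass
class DataPoint:
    """Hypothetical record for one test cycle shown on the chart."""
    pr_number: int
    outcome: str          # Passed, Failed, Interrupted, or Cancelled
    cycle_ended_in: str   # Merged, Failed, Cancelled, or In Flight
    testing_minutes: float


def filter_points(points, outcome=None, cycle_ended_in=None):
    """Keep points matching both filters; None means no filtering on that field."""
    return [
        p for p in points
        if (outcome is None or p.outcome == outcome)
        and (cycle_ended_in is None or p.cycle_ended_in == cycle_ended_in)
    ]


# Made-up sample data; PR 101 appears twice because its cycle restarted.
points = [
    DataPoint(101, "Interrupted", "In Flight", 3.0),
    DataPoint(101, "Passed", "Merged", 12.0),
    DataPoint(102, "Failed", "Failed", 7.5),
    DataPoint(103, "Passed", "Merged", 11.0),
]

# Clean CI baseline: cycles that passed and ended with the PR merged.
baseline = filter_points(points, outcome="Passed", cycle_ended_in="Merged")
print([p.testing_minutes for p in baseline])
```

Leaving either argument as `None` plays the role of the "All" option for that filter.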

No work is lost. See #647 for the consolidated PR.
