[feature not live] docs(merge-queue): document Testing Duration metrics chart #649
samgutentag wants to merge 1 commit into
Add a Testing Duration section to the Merge Queue metrics page covering the new chart that shows time PRs spend in the TESTING phase. Explains the distinction from Time in Queue, the Outcome and Cycle Ended In filters, and the available statistical measures. Shipped in trunk-io/trunk2#3919 (v173).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merging to
After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.
Claude finished @samgutentag's task in 1m 37s. Review complete.
Summary
Content is accurate and well-organized: the section sits in the right place, the relationship to Time in queue is explained clearly, and the hint about cycle restarts is a useful catch. Issues are all style/consistency-level, no blockers:
Testing duration shows how long each PR spends in the **TESTING** phase of the merge queue -- from when testing begins to when the test cycle reaches a final state (merged, failed, or canceled).

This is distinct from [Time in queue](#time-in-queue), which measures total time from queue entry to exit. A PR that waits before testing starts will have a longer time in queue but the same testing duration. Use this chart to understand CI performance specifically, separate from queue wait time.

{% hint style="info" %}
Each data point represents one TESTING-to-final-state transition. A PR that is kicked back to PENDING and re-enters testing (for example, due to a queue restart) can appear more than once.
{% endhint %}

The chart appears in a **Testing Metrics** section below the queue metrics charts and uses the same time range and granularity controls.

#### Filters

Two filters let you narrow the data:

* **Outcome** -- Filter by how the test cycle ended. Options include Passed, Failed, and others. Select **All Outcomes** to see the full distribution.
* **Cycle ended in** -- Filter by the final disposition of the PR. Select **Merged**, **Failed**, or **All Cycle Ended In** to see the full set.
Style: em-dash consistency. The rest of this file uses real em-dashes (—) for parenthetical dashes (see lines 149, 151–154, 161, 173–175), but this new section uses double-hyphens (--) in lines 110, 115, 124, and 125. GitBook does not auto-convert -- to —, so these will render as literal double-hyphens, which looks off next to the rest of the page.
Recommend swapping each -- for — (em-dash). E.g.:
Before:

Testing duration shows how long each PR spends in the **TESTING** phase of the merge queue -- from when testing begins to when the test cycle reaches a final state (merged, failed, or canceled).

This is distinct from [Time in queue](#time-in-queue), which measures total time from queue entry to exit. A PR that waits before testing starts will have a longer time in queue but the same testing duration. Use this chart to understand CI performance specifically, separate from queue wait time.

{% hint style="info" %}
Each data point represents one TESTING-to-final-state transition. A PR that is kicked back to PENDING and re-enters testing (for example, due to a queue restart) can appear more than once.
{% endhint %}

The chart appears in a **Testing Metrics** section below the queue metrics charts and uses the same time range and granularity controls.

#### Filters

Two filters let you narrow the data:

* **Outcome** -- Filter by how the test cycle ended. Options include Passed, Failed, and others. Select **All Outcomes** to see the full distribution.
* **Cycle ended in** -- Filter by the final disposition of the PR. Select **Merged**, **Failed**, or **All Cycle Ended In** to see the full set.

After:

Testing duration shows how long each PR spends in the **TESTING** phase of the merge queue — from when testing begins to when the test cycle reaches a final state (merged, failed, or canceled).

This is distinct from [Time in queue](#time-in-queue), which measures total time from queue entry to exit. A PR that waits before testing starts will have a longer time in queue but the same testing duration. Use this chart to understand CI performance specifically, separate from queue wait time.

{% hint style="info" %}
Each data point represents one TESTING-to-final-state transition. A PR that is kicked back to PENDING and re-enters testing (for example, due to a queue restart) can appear more than once.
{% endhint %}

The chart appears in a **Testing Metrics** section below the queue metrics charts and uses the same time range and granularity controls.

#### Filters

Two filters let you narrow the data:

* **Outcome** — Filter by how the test cycle ended. Options include Passed, Failed, and others. Select **All Outcomes** to see the full distribution.
* **Cycle ended in** — Filter by the final disposition of the PR. Select **Merged**, **Failed**, or **All Cycle Ended In** to see the full set.
| Measure | Explanation                                         |
| ------- | --------------------------------------------------- |
| Average | Average testing duration during the time bucket     |
| Minimum | The shortest testing duration in the time bucket    |
| Maximum | The longest testing duration in the time bucket     |
| Sum     | The total of all testing durations added together   |
| P50     | The value below which 50% of testing durations fall |
| P95     | The value below which 95% of testing durations fall |
| P99     | The value below which 99% of testing durations fall |
Consistency with the Time in queue table. Compared to the existing table at lines 98–106, this new table differs in two small ways:
- Trailing punctuation. The Time in queue rows end with periods; these rows do not.
- "Average" row wording. The Time in queue version reads
"Average of all time in queue during the time bucket", but here it's "Average testing duration during the time bucket". The parallel phrasing reads better when the two tables sit on the same page.
Note the new percentile rows actually read more clearly than the existing ones ("The value below which 50% of testing durations fall" vs. the older "The value below 50% of the time in queue falls.", which is grammatically awkward). Up to you whether to (a) align to the existing style for now or (b) fix the older table to match the new, clearer wording in a follow-up.
Minimal alignment to existing style:
Before:

| Measure | Explanation                                         |
| ------- | --------------------------------------------------- |
| Average | Average testing duration during the time bucket     |
| Minimum | The shortest testing duration in the time bucket    |
| Maximum | The longest testing duration in the time bucket     |
| Sum     | The total of all testing durations added together   |
| P50     | The value below which 50% of testing durations fall |
| P95     | The value below which 95% of testing durations fall |
| P99     | The value below which 99% of testing durations fall |

After:

| Measure | Explanation                                              |
| ------- | -------------------------------------------------------- |
| Average | Average of all testing durations during the time bucket. |
| Minimum | The shortest testing duration in the time bucket.        |
| Maximum | The longest testing duration in the time bucket.         |
| Sum     | The total of all testing durations added together.       |
| P50     | The value below which 50% of testing durations fall.     |
| P95     | The value below which 95% of testing durations fall.     |
| P99     | The value below which 99% of testing durations fall.     |
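For reviewers checking the wording of the table rows, here is a minimal sketch of how the seven measures could be computed for one time bucket. The PR does not say which percentile method the chart uses; the nearest-rank method below is an assumption, and `summarize_bucket`, `percentile`, and the sample durations are hypothetical names and values chosen for illustration.

```python
import math
from statistics import mean


def percentile(sorted_values, pct):
    """Nearest-rank percentile (an assumed method, not confirmed by the PR):
    the smallest value with at least pct% of samples at or below it."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_bucket(durations_sec):
    """Compute the table's measures for one time bucket of testing durations."""
    ordered = sorted(durations_sec)
    return {
        "Average": mean(ordered),
        "Minimum": ordered[0],
        "Maximum": ordered[-1],
        "Sum": sum(ordered),
        "P50": percentile(ordered, 50),
        "P95": percentile(ordered, 95),
        "P99": percentile(ordered, 99),
    }


# Hypothetical testing durations (in seconds) for a single bucket.
bucket = [300, 420, 480, 600, 1500]
summary = summarize_bucket(bucket)
```

With a single slow cycle in the bucket, the Average is pulled well above P50, which is one reason the percentile rows are worth describing precisely.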
Two filters let you narrow the data:

* **Outcome** -- Filter by how the test cycle ended. Options include Passed, Failed, and others. Select **All Outcomes** to see the full distribution.
Vague "and others". Options include Passed, Failed, and others. leaves the reader wondering what "others" are. If the dropdown has a fixed, enumerable set of options (e.g., Passed / Failed / Canceled / Timed out), list them. If the set is open-ended or evolving, say so explicitly (e.g., "and additional outcomes such as Canceled and Timed out") so the reader knows whether to expect to discover more.
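To make the two filters concrete, here is a rough sketch of how they might combine, assuming each data point carries one Outcome value and one Cycle ended in value. The `CycleRecord` type, its field names, and the sample rows are hypothetical; only the option names Passed, Failed, and Merged come from the text under review, and the full set of Outcome options is still unspecified.

```python
from dataclasses import dataclass


@dataclass
class CycleRecord:
    """One hypothetical data point on the Testing duration chart."""
    pr_number: int
    outcome: str          # e.g. "Passed" or "Failed"; other values are not listed in the PR
    cycle_ended_in: str   # e.g. "Merged" or "Failed"
    duration_sec: int


def filter_cycles(cycles, outcome=None, cycle_ended_in=None):
    """Keep records matching both filters.

    None stands in for the All Outcomes / All Cycle Ended In selections.
    """
    return [
        c for c in cycles
        if (outcome is None or c.outcome == outcome)
        and (cycle_ended_in is None or c.cycle_ended_in == cycle_ended_in)
    ]


# Illustrative data only.
cycles = [
    CycleRecord(101, "Passed", "Merged", 420),
    CycleRecord(102, "Failed", "Failed", 900),
    CycleRecord(103, "Passed", "Merged", 380),
]

passed_and_merged = filter_cycles(cycles, outcome="Passed", cycle_ended_in="Merged")
everything = filter_cycles(cycles)
```

Treating the filters as a logical AND is itself an assumption; the docs should say so if that is how the chart behaves.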
Each data point represents one TESTING-to-final-state transition. A PR that is kicked back to PENDING and re-enters testing (for example, due to a queue restart) can appear more than once.
{% endhint %}

The chart appears in a **Testing Metrics** section below the queue metrics charts and uses the same time range and granularity controls.
Casing inconsistency. Testing Metrics is title-cased here, but the surrounding section names on this page are sentence case (e.g., Time in queue, Conclusion count, Drill down into metrics). If Testing Metrics is a literal UI label, consider quoting or pointing that out so it doesn't read like a style slip; otherwise lowercase to Testing metrics for consistency.
### Testing duration

Testing duration shows how long each PR spends in the **TESTING** phase of the merge queue -- from when testing begins to when the test cycle reaches a final state (merged, failed, or canceled).
Terminology clarification. The phrase the test cycle reaches a final state (merged, failed, or canceled) mixes terminology: elsewhere on this page the Conclusion count table (line 88) uses Pass / Failure / Cancel categories, where "Merged by Trunk" is a reason within Pass. Consider matching that vocabulary, e.g. …reaches a final state (passed, failed, or canceled), or align with the Cycle ended in filter values you describe on line 125 (Merged / Failed). Whichever you pick, the same three terms ideally show up in both places.
Verification status (2026-05-12): Eng merged but flag off in prod. Hold off.
Note: PR #647 documents the same feature and received the same verdict. Coordinate which docs PR to land before merging. Wait for the flag to be enabled in prod, then re-run
Folds in the strongest pieces from PR #649 so it can be closed without losing work:
- Adds the CI-vs-queue-wait-time framing line
- Wraps the restart caveat in a {% hint style="info" %} block (matches the page's existing pattern)
- Adds a worked filter example combining Outcome and Cycle ended in
- Cross-links to the Time in queue section
- Promotes the stat measures table under its own #### subheading
- Drops em dashes per house style

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Closing this in favor of #647, which documents the same feature (Testing Duration chart from trunk-io/trunk2#3919). The following pieces of this PR have been carried forward into #647 (commit
What was kept from #647 instead of this PR:
No work is lost. See #647 for the consolidated PR.
Summary
merge-queue/administration/metrics.md
Context
Shipped in trunk-io/trunk2#3919 (v173). The chart is behind a feature flag and will be rolling out.
Test plan
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Generated by Claude Code