[feature not live] docs(merge-queue): document Testing Duration metrics chart by samgutentag · Pull Request #649 · trunk-io/docs

samgutentag · 2026-05-12T18:08:33Z

Summary

Adds a Testing Duration section to merge-queue/administration/metrics.md
Explains that testing duration measures time in the TESTING phase only, distinct from Time in Queue (total entry-to-exit time)
Documents the Outcome and Cycle Ended In filters, and the statistical measures (avg, min, max, sum, p50, p95, p99)
Notes that a single PR can appear multiple times if its cycle restarted

Context

Shipped in trunk-io/trunk2#3919 (v173). The chart is behind a feature flag and will be rolling out.

Test plan

Verify the Testing Duration section renders correctly in GitBook preview
Confirm the section appears between "Time in queue" and "Drill down into metrics"
Check that filter descriptions match the actual UI dropdowns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Generated by Claude Code

Add a Testing Duration section to the Merge Queue metrics page covering the new chart that shows time PRs spend in the TESTING phase. Explains the distinction from Time in Queue, the Outcome and Cycle Ended In filters, and the available statistical measures.

Shipped in trunk-io/trunk2#3919 (v173).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

trunk-staging-io · 2026-05-12T18:08:37Z

Merging to main in this repository is managed by Trunk.

To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.

claude · 2026-05-12T18:09:03Z

Claude finished @samgutentag's task in 1m 37s

Review complete

Read the changed file and surrounding sections for context
Compare against existing Time in queue section for style consistency
Check for typos, grammar, formatting
Post inline review comments

Summary

Content is accurate and well-organized — the section sits in the right place, the relationship to Time in queue is explained clearly, and the hint about cycle restarts is a useful catch. Issues are all style/consistency-level, no blockers:

Em-dashes — new section uses -- while the rest of the page uses —. GitBook won't auto-convert, so they'll render as literal double-hyphens. (inline)
Statistical-measures table — trailing punctuation and "Average" wording diverge from the existing Time in queue table on the same page; suggested aligned version inline. Worth noting your new percentile phrasing is actually clearer than the existing one. (inline)
Vague "and others" in the Outcome filter description — enumerate or qualify. (inline)
Testing Metrics is title-cased; sibling section names on the page are sentence case. (inline)
Final-state terminology mixes "merged" with the page's existing Pass/Failure/Cancel vocabulary; pick one. (inline)

claude · 2026-05-12T18:10:26Z

+Testing duration shows how long each PR spends in the **TESTING** phase of the merge queue -- from when testing begins to when the test cycle reaches a final state (merged, failed, or canceled).
+
+This is distinct from [Time in queue](#time-in-queue), which measures total time from queue entry to exit. A PR that waits before testing starts will have a longer time in queue but the same testing duration. Use this chart to understand CI performance specifically, separate from queue wait time.
+
+{% hint style="info" %}
+Each data point represents one TESTING-to-final-state transition. A PR that is kicked back to PENDING and re-enters testing (for example, due to a queue restart) can appear more than once.
+{% endhint %}
+
+The chart appears in a **Testing Metrics** section below the queue metrics charts and uses the same time range and granularity controls.
+
+#### Filters
+
+Two filters let you narrow the data:
+
+* **Outcome** -- Filter by how the test cycle ended. Options include Passed, Failed, and others. Select **All Outcomes** to see the full distribution.
+* **Cycle ended in** -- Filter by the final disposition of the PR. Select **Merged**, **Failed**, or **All Cycle Ended In** to see the full set.


Style: em-dash consistency. The rest of this file uses real em-dashes (—) for parenthetical dashes (see lines 149, 151–154, 161, 173–175), but this new section uses double-hyphens (--) in lines 110, 115, 124, and 125. GitBook does not auto-convert -- to —, so these will render as literal double-hyphens, which looks off next to the rest of the page.

Recommend swapping each -- for — (em-dash). E.g.:

Suggested change

Testing duration shows how long each PR spends in the **TESTING** phase of the merge queue — from when testing begins to when the test cycle reaches a final state (merged, failed, or canceled).

This is distinct from [Time in queue](#time-in-queue), which measures total time from queue entry to exit. A PR that waits before testing starts will have a longer time in queue but the same testing duration. Use this chart to understand CI performance specifically, separate from queue wait time.

{% hint style="info" %}
Each data point represents one TESTING-to-final-state transition. A PR that is kicked back to PENDING and re-enters testing (for example, due to a queue restart) can appear more than once.
{% endhint %}

The chart appears in a **Testing Metrics** section below the queue metrics charts and uses the same time range and granularity controls.

#### Filters

Two filters let you narrow the data:

* **Outcome** — Filter by how the test cycle ended. Options include Passed, Failed, and others. Select **All Outcomes** to see the full distribution.
* **Cycle ended in** — Filter by the final disposition of the PR. Select **Merged**, **Failed**, or **All Cycle Ended In** to see the full set.

claude · 2026-05-12T18:10:33Z

+| Measure | Explanation |
+| ------- | ----------------------------------------------------------- |
+| Average | Average testing duration during the time bucket |
+| Minimum | The shortest testing duration in the time bucket |
+| Maximum | The longest testing duration in the time bucket |
+| Sum | The total of all testing durations added together |
+| P50 | The value below which 50% of testing durations fall |
+| P95 | The value below which 95% of testing durations fall |
+| P99 | The value below which 99% of testing durations fall |


Consistency with the Time in queue table. Compared to the existing table at lines 98–106, this new table differs in two small ways:

Trailing punctuation. The Time in queue rows end with periods; these rows do not.

"Average" row wording. The Time in queue version reads "Average of all time in queue during the time bucket", but here it's "Average testing duration during the time bucket". The parallel phrasing reads better when the two tables sit on the same page.

Note the new percentile rows actually read more clearly than the existing ones ("The value below which 50% of testing durations fall" vs. the older "The value below 50% of the time in queue falls.", which is grammatically awkward). Up to you whether to (a) align to the existing style for now or (b) fix the older table to match the new, clearer wording in a follow-up.

Minimal alignment to existing style:

Suggested change

| Measure | Explanation                                              |
| ------- | -------------------------------------------------------- |
| Average | Average of all testing durations during the time bucket. |
| Minimum | The shortest testing duration in the time bucket.        |
| Maximum | The longest testing duration in the time bucket.         |
| Sum     | The total of all testing durations added together.       |
| P50     | The value below which 50% of testing durations fall.     |
| P95     | The value below which 95% of testing durations fall.     |
| P99     | The value below which 99% of testing durations fall.     |
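For readers who want to see what the seven measures in this table mean numerically, the following sketch computes them for one time bucket. The nearest-rank percentile method, the function names, and the sample durations are assumptions chosen for illustration; the PR does not say which percentile method the dashboard uses, and other methods (for example, interpolation) can give slightly different values.

```python
import math
from statistics import mean


def percentile(sorted_values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize_bucket(durations_seconds):
    """Compute avg, min, max, sum, p50, p95, p99 for one bucket of testing durations."""
    if not durations_seconds:
        raise ValueError("bucket has no testing durations")
    ordered = sorted(durations_seconds)
    return {
        "avg": mean(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "sum": sum(ordered),
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
    }


# Made-up testing durations in seconds for a single time bucket.
bucket = [300, 420, 360, 900, 480]
summary = summarize_bucket(bucket)
```

With only five samples, P95 and P99 both land on the slowest cycle (900 seconds), which illustrates why high percentiles are most informative in buckets with many test cycles.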

claude · 2026-05-12T18:10:35Z

+
+Two filters let you narrow the data:
+
+* **Outcome** -- Filter by how the test cycle ended. Options include Passed, Failed, and others. Select **All Outcomes** to see the full distribution.


Vague "and others". The sentence "Options include Passed, Failed, and others." leaves the reader wondering what "others" are. If the dropdown has a fixed, enumerable set of options (e.g., Passed / Failed / Canceled / Timed out), list them. If the set is open-ended or evolving, say so explicitly (e.g., "and additional outcomes such as Canceled and Timed out") so the reader knows whether to expect to discover more.

claude · 2026-05-12T18:10:37Z

+Each data point represents one TESTING-to-final-state transition. A PR that is kicked back to PENDING and re-enters testing (for example, due to a queue restart) can appear more than once.
+{% endhint %}
+
+The chart appears in a **Testing Metrics** section below the queue metrics charts and uses the same time range and granularity controls.


Casing inconsistency. Testing Metrics is title-cased here, but the surrounding section names on this page are sentence case (e.g., Time in queue, Conclusion count, Drill down into metrics). If Testing Metrics is a literal UI label, consider quoting or pointing that out so it doesn't read like a style slip; otherwise lowercase to Testing metrics for consistency.

claude · 2026-05-12T18:10:40Z


+### Testing duration
+
+Testing duration shows how long each PR spends in the **TESTING** phase of the merge queue -- from when testing begins to when the test cycle reaches a final state (merged, failed, or canceled).


Terminology clarification. The phrase "the test cycle reaches a final state (merged, failed, or canceled)" mixes terminology: elsewhere on this page the Conclusion count table (line 88) uses Pass / Failure / Cancel categories, where "Merged by Trunk" is a reason within Pass. Consider matching that vocabulary, e.g. "…reaches a final state (passed, failed, or canceled)", or align with the Cycle ended in filter values you describe on line 125 (Merged / Failed). Whichever you pick, the same three terms ideally show up in both places.

github-actions

Auto-approved: Claude code review passed.

samgutentag · 2026-05-12T19:31:50Z

Verification status (2026-05-12): pending

Eng merged but flag off in prod. Hold off.

Eng PR: trunk-io/trunk2#3919 (merged 2026-05-08, merge commit intact on main)
Flag: displayMergeHealthTestDuration
Signals:
- 0 follow-up PRs in trunk-io/trunk2 reference the flag (exact-phrase search)
- ts/apps/e2e/flags.json has no entry for this flag (defaults to off in e2e)
- LaunchDarkly bot posts in #staging-notifications and #production-notifications are dated 2026-05-07, before the eng PR merged on 2026-05-08. Per the recency rule these count as background only, not rollout confirmation.
- No post-merge Slack messages confirming staging or prod rollout
- Eng PR body explicitly states "This is behind a feature flag" and lists "allow drilling down" as the next phase, suggesting rollout is not imminent

Note: PR #647 documents the same feature and received the same verdict. Coordinate which docs PR to land before merging.

Wait for the flag to be enabled in prod, then re-run /verify-docs-pr 649.

Folds in the strongest pieces from PR #649 so it can be closed without losing work:
- Adds the CI-vs-queue-wait-time framing line
- Wraps the restart caveat in a {% hint style="info" %} block (matches the page's existing pattern)
- Adds a worked filter example combining Outcome and Cycle ended in
- Cross-links to the Time in queue section
- Promotes the stat measures table under its own #### subheading
- Drops em dashes per house style

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

samgutentag · 2026-05-12T19:41:31Z

Closing this in favor of #647, which documents the same feature (Testing Duration chart from trunk-io/trunk2#3919).

The following pieces of this PR have been carried forward into #647 (commit ca90308):

The CI-vs-queue-wait-time framing ("Use this chart to understand CI performance specifically, separate from queue wait time")
The worked filter example (Outcome: Passed + Cycle ended in: Merged for a clean CI baseline)
The [Time in queue](#time-in-queue) cross-link
The restart caveat wrapped in a {% hint style="info" %} block (matches the page's existing pattern)
A dedicated #### Statistical measures subheading

What was kept from #647 instead of this PR:

Complete enumeration of Outcome filter values (All Outcomes / Passed / Failed / Interrupted / Cancelled)
Complete enumeration of Cycle ended in filter values (All / Merged / Failed / Cancelled / In Flight)
The cross-chart bucketing caveat (no hover sync across charts)
Real em dashes for now, normalized to periods in the merged section

No work is lost. See #647 for the consolidated PR.
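As an illustration of the worked filter example mentioned above (Outcome: Passed combined with Cycle ended in: Merged), here is a small sketch of how the two filters narrow a set of data points. The record fields (`pr`, `outcome`, `cycle_ended_in`, `duration_s`), the function name, and the sample rows are hypothetical; only the filter values themselves come from this thread.

```python
def filter_points(points, outcome=None, cycle_ended_in=None):
    """Keep points that match both filters; None stands in for the "All" option."""
    return [
        p
        for p in points
        if (outcome is None or p["outcome"] == outcome)
        and (cycle_ended_in is None or p["cycle_ended_in"] == cycle_ended_in)
    ]


# Made-up data points; field names are assumptions, not the product's schema.
points = [
    {"pr": 101, "outcome": "Passed", "cycle_ended_in": "Merged", "duration_s": 420},
    {"pr": 102, "outcome": "Failed", "cycle_ended_in": "Failed", "duration_s": 310},
    {"pr": 103, "outcome": "Passed", "cycle_ended_in": "Merged", "duration_s": 480},
    {"pr": 104, "outcome": "Interrupted", "cycle_ended_in": "Merged", "duration_s": 95},
]

# Clean CI baseline: cycles that passed and whose PR merged.
baseline = filter_points(points, outcome="Passed", cycle_ended_in="Merged")
baseline_prs = [p["pr"] for p in baseline]
```

Leaving either argument unset behaves like the corresponding "All" option, so `filter_points(points)` returns every point.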

claude Bot reviewed May 12, 2026

github-actions Bot approved these changes May 12, 2026

samgutentag changed the title ~~docs(merge-queue): document Testing Duration metrics chart~~ [feature not live] docs(merge-queue): document Testing Duration metrics chart May 12, 2026

samgutentag closed this May 12, 2026

claude Bot mentioned this pull request May 12, 2026

[feature not live] docs(merge-queue): add Testing Duration chart to metrics page #647 (Draft, 1 task)

samgutentag wants to merge 1 commit into main from sam-gutentag/merge-queue-testing-duration-metrics
