feat: add Browser RUM dashboard template #2413
Adds a "Browser RUM" template to the dashboards gallery for browser sessions instrumented with the HyperDX Browser SDK (or any OTel browser instrumentation emitting a rum.sessionId resource attribute):
- Performance Overview: page-view/session/error KPIs, Core Web Vitals (LCP/INP/CLS) p75, median/p75/p90 page-load percentiles, long tasks
- Page Views Breakdown: traffic by URL, browser, country, device size (derived from screen.xy)
- Errors section with tabs (overview, JS exceptions by message and by page, failing API calls)
- Six dashboard filters: Service, Environment, Service Version, Page URL, Browser, Country

Top Browsers / Top Countries tiles and the Browser/Country filters populate when the collector's useragent and geoip processors are on.
🦋 Changeset detected
Latest commit: d742182
The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
🔴 Tier 4: Critical
Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.
Why this tier:
Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
Stats
Deep Review
✅ No critical issues found. This PR is a declarative dashboard template (
🟡 P2 -- recommended
🔵 P3 nitpicks (5)
Reviewers (6): correctness, maintainability, testing, project-standards, performance, learnings-researcher.
Testing gaps:
E2E Test Results
✅ All tests passed • 197 passed • 3 skipped • 1364s
Tests ran across 4 shards in parallel.
Trim the dashboard description to a single sentence to match the length and style of the existing runtime-metrics templates.
The browser is already captured out of the box: the OTel document-load instrumentation sets http.user_agent (navigator.userAgent) on documentLoad spans. The template was instead grouping on user_agent.name / user_agent.original, which require collector-side enrichment that isn't present by default, so Top Browsers came up empty against real data.
- Top Browsers now parses the browser name from SpanAttributes['http.user_agent'] in SQL (Edge/Opera/Firefox/Chrome/Safari/Other), scoped to spans carrying the UA. Works with no SDK or collector change.
- Removed the dashboard-level Browser filter: http.user_agent only exists on documentLoad spans, so a cross-tile filter keyed on it would zero out every non-documentLoad tile. It can return once the UA is promoted to a resource attribute (present on every span).

Country tile/filter still depend on the collector geoip processor, since the browser cannot determine the user's country.
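The order-sensitive browser classification described above can be sketched in TypeScript. This is only an illustration of the idea: the template does the parsing in SQL, and the match tokens (`Edg/`, `OPR/`, and so on) and the `parseBrowserName` helper are assumptions, not the template's actual expression.

```typescript
type BrowserName = "Edge" | "Opera" | "Firefox" | "Chrome" | "Safari" | "Other";

// Order matters: Edge and Opera user agents also contain "Chrome/" and "Safari/",
// and Chrome user agents also contain "Safari/", so the most specific tokens go first.
// These tokens are illustrative assumptions, not the template's SQL.
const BROWSER_TOKENS: ReadonlyArray<[string, BrowserName]> = [
  ["Edg/", "Edge"],
  ["OPR/", "Opera"],
  ["Firefox/", "Firefox"],
  ["Chrome/", "Chrome"],
  ["Safari/", "Safari"],
];

// Returns the first browser whose token appears in the user agent, else "Other".
function parseBrowserName(userAgent: string): BrowserName {
  for (const [token, name] of BROWSER_TOKENS) {
    if (userAgent.indexOf(token) !== -1) return name;
  }
  return "Other";
}
```

Checking the generic tokens last is what keeps a Chromium-based Edge session from being counted as Chrome or Safari.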
The chart builder editor only renders a WHERE input bound to the per-series aggCondition (ChartSeriesEditor); the top-level `where` input renders solely for Search-type tiles (ChartEditorControls.tsx:148 vs :334). So builder tiles that stored their filter in top-level `where` showed an empty WHERE box even though the filter applied correctly in SQL (renderChartConfig reads config.where directly). This affected nearly every tile, not just Page Views; the earlier OR-vs-AND theory was a red herring.

Move each tile's filter from top-level `where` into the aggCondition of every select (clearing `where`). renderChartConfig promotes an all-selects aggCondition back into a real WHERE clause (renderChartConfig.ts:944,1019), so for a single shared condition the rendered query is result-identical (count() WHERE c == countIf(c) WHERE c, etc.) while the condition now shows in the editor.

Left unchanged: Errors over Time and Top Errored Sessions, which already use per-series aggConditions (their meaningful conditions already display; their top-level where is only the broad rum.sessionId scope).

Verified: dashboardTemplates schema test + app ci:lint pass; SQL result-equivalence confirmed by reading renderChartConfig's aggCondition promotion. Live editor click-through deferred (dev stack down).
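The result-equivalence claim (count() WHERE c versus countIf(c) WHERE c) can be checked on a toy dataset. The sketch below models the two query shapes with array filters; the `Span` shape and both function names are hypothetical and only stand in for what the rendered SQL would compute.

```typescript
interface Span {
  spanName: string;
}
type Condition = (span: Span) => boolean;

// Before: the filter lives in the tile's top-level `where`; the series is a plain count().
function countWithWhere(rows: Span[], where: Condition): number {
  return rows.filter(where).length;
}

// After: the filter lives in the series' aggCondition. Because every select shares it,
// it is also promoted into WHERE, giving countIf(c) ... WHERE c.
function countIfWithPromotedWhere(rows: Span[], aggCondition: Condition): number {
  const promoted = rows.filter(aggCondition); // the promoted WHERE clause
  return promoted.filter(aggCondition).length; // countIf(c) over the surviving rows
}

const sampleSpans: Span[] = [
  { spanName: "documentLoad" },
  { spanName: "fetch" },
  { spanName: "documentLoad" },
  { spanName: "longtask" },
];
const isDocumentLoad: Condition = (span) => span.spanName === "documentLoad";
```

Applying the same condition twice is idempotent, which is why moving a single shared filter into aggCondition does not change the result.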
Wire up the table onClick row-action (SavedChartConfig.onClick, type
'search') on the tables whose grouped value reverses cleanly into a
search filter:
- Top Errored Sessions -> opens the session's spans
(rum.sessionId:"{{Session}}") — the client-side tracing drilldown
- Top URLs / Slowest Pages -> page views / doc loads for that URL
- Errors per Page -> errors for that URL
- Top JS Errors -> spans for that exception message
Each targets the Traces source by name ({ mode: 'id', id: 'Traces' });
the import flow auto-matches that to the user's mapped source and
rewrites it to the concrete ID (DBDashboardImportPage onClick mapping +
convertToDashboardDocument), so it stays portable. whereTemplate uses
Handlebars row-column variables. Skipped tiles whose group key can't be
reversed (Top Failing API Calls concat, Top Browsers/Countries/Device
derived buckets).
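How a clicked row could turn into a search filter can be sketched as follows. This is a simplified stand-in for Handlebars that only handles plain `{{Column}}` placeholders; `renderWhereTemplate` and the quote-escaping rule are assumptions for illustration, not the app's implementation.

```typescript
type Row = Record<string, string>;

// Simplified stand-in for Handlebars: replaces each {{Column}} with the row's value.
// Real Handlebars also supports helpers and blocks, which this sketch ignores.
function renderWhereTemplate(template: string, row: Row): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_match, key: string) => {
    const value = row[key];
    if (value === undefined) {
      throw new Error('Row has no column "' + key + '"');
    }
    // Assumed escaping: keep backslashes and double quotes inside the quoted term.
    return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  });
}

// Clicking a "Top Errored Sessions" row whose Session column is "abc123":
const sessionWhere = renderWhereTemplate('rum.sessionId:"{{Session}}"', {
  Session: "abc123",
});
```

The placeholder name has to match the table's column alias exactly, which is why tiles whose group key is a derived bucket were skipped here.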
Builder tables without an onClick fall back to buildTableRowSearchUrl, which derives the drilldown from config.where, which is now empty (filters moved to aggCondition), so those drilldowns lost their scope. And the derived group keys (browser/device/concat) don't reverse into a filter. There's no template-level way to disable a builder-table row action, so give the remaining tables a correct onClick instead:
- Top JS Errors: match the coalesced group value across exception.message / message / SpanName (it previously only matched exception.message, so e.g. an "unhandledrejection" row returned nothing).
- Top Browsers: substring-match the parsed name against http.user_agent.
- Top Countries: exact geo.country.name match.
- Top Failing API Calls: regroup by http.url so the row reverses; drill into fetch/xhr calls to that endpoint.
- Top Device Sizes: regroup by raw screen.xy so the row reverses; drill into documentLoad spans at that resolution.

Every table now has a working, scoped row action; the scope-less legacy fallback no longer fires.
Greptile Summary
This PR adds a new Browser RUM dashboard template, a purely declarative JSON file registering 18 tiles across three sections (Performance Overview, Page Views Breakdown, and Errors) plus five dashboard-level filters scoped to
Confidence Score: 5/5
Safe to merge: all changes are additive, declarative JSON with no runtime logic. The change is purely a new JSON template and a one-line import registration. No existing behavior is altered, and the schema validation test already guards against malformed templates. The one inconsistency found (Top Countries onClick not covering the iso_code fallback) is a display-level drilldown edge case that does not affect correctness of the main dashboard tiles. No files require special attention; the minor drilldown inconsistency in browser-rum.json is self-contained.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
T["Browser RUM Template"]
T --> F["5 Filters\n(Service / Env / Version / Page URL / Country)"]
T --> C1["Performance Overview\n(rum-perf)"]
T --> C2["Page Views Breakdown\n(rum-breakdown)"]
T --> C3["Errors\n(rum-errors)"]
C1 --> P1["Page Views KPI (rum-006)"]
C1 --> P2["Median Page Load (rum-017)"]
C1 --> P3["p90 Page Load (rum-018)"]
C1 --> P4["LCP / INP / CLS p75 (rum-001/002/003)"]
C1 --> P5["Active Sessions (rum-005)"]
C1 --> P6["Sessions w/ Errors (rum-019)"]
C1 --> P7["Page Load time series (rum-020)"]
C1 --> P8["Page Views & Long Tasks (rum-021/024)"]
C2 --> B1["Top URLs (rum-014)"]
C2 --> B2["Top Browsers (rum-026)"]
C2 --> B3["Top Countries (rum-027)"]
C2 --> B4["Top Device Sizes (rum-028)"]
C2 --> B5["Slowest Pages p90 (rum-011)"]
C2 --> B6["Top Errored Sessions (rum-016)"]
C3 --> TAB1["Overview Tab\nJS Errors + AJAX Errors KPIs + Errors over Time\n(rum-007/008/010)"]
C3 --> TAB2["JS Exceptions Tab\nBy message + by page\n(rum-012/015)"]
C3 --> TAB3["API Failures Tab\nTop Failing API Calls\n(rum-013)"]
Reviews (5): Last reviewed commit: "Merge branch 'main' into teeohhem/hackat..."
…ition
Code-review fixes for the Errors section:
1. AJAX Errors KPI (rum-008) and Top Failing API Calls (rum-013) had no rum.sessionId guard, so server-side fetch/xhr spans could inflate the counts relative to the rest of the dashboard. Add the SQL equivalent of the lucene rum.sessionId:* guard the sibling tiles use (ResourceAttributes['rum.sessionId'] != '').
2. The AJAX Errors KPI counted status>=400 OR error span status, while the "Errors over Time" AJAX series only counted error span status, so a 404 with no error status hit the KPI but not the chart. Align the chart's AJAX series to the same (more complete) definition so the KPI total and the chart line measure the identical event set.
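The aligned AJAX error definition can be expressed as one predicate shared by the KPI and the chart series. The sketch below is a TypeScript illustration only; the `RumSpan` field names are hypothetical and the real tiles express this condition in SQL.

```typescript
// Hypothetical, flattened view of a span; field names are assumptions for illustration.
interface RumSpan {
  sessionId: string; // ResourceAttributes['rum.sessionId']; empty for non-browser spans
  httpStatusCode?: number; // HTTP response status, if the span recorded one
  spanStatus: "Unset" | "Ok" | "Error";
}

// A span counts as an AJAX error when it comes from a browser session AND
// either the HTTP status is >= 400 or the span status is an error.
function isAjaxError(span: RumSpan): boolean {
  const fromBrowser = span.sessionId !== "";
  const httpFailure = span.httpStatusCode !== undefined && span.httpStatusCode >= 400;
  const spanFailure = span.spanStatus === "Error";
  return fromBrowser && (httpFailure || spanFailure);
}

// Both the KPI total and each chart bucket would use the same predicate.
function countAjaxErrors(spans: RumSpan[]): number {
  return spans.filter(isAjaxError).length;
}
```

With a single predicate, a 404 that carries no error span status is counted in both places, and a server-side span without a session ID is counted in neither.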
pulpdrew
left a comment
Cool stuff, nice to see some newer dashboard features being exercised! Couple of suggestions
    "config": {
      "name": "Median Page Load (ms)",
      "source": "Traces",
      "displayType": "number",
      "granularity": "auto",
      "alignDateRangeToGranularity": true,
      "select": [
        {
          "aggFn": "quantile",
          "level": 0.5,
          "valueExpression": "Duration / 1000000",
          "aggCondition": "SpanName:\"documentLoad\"",
          "aggConditionLanguage": "lucene"
        }
      ],
      "where": "",
      "whereLanguage": "lucene",
      "numberFormat": {
        "output": "number",
        "mantissa": 0,
        "thousandSeparated": true
      }
Is there a chance this will be the wrong precision for duration? Same thing elsewhere.
Instead, we could remove the divisor and numberFormat, so that duration format with the correct precision will be inferred:
    "config": {
      "name": "Median Page Load",
      "source": "Traces",
      "displayType": "number",
      "granularity": "auto",
      "alignDateRangeToGranularity": true,
      "select": [
        {
          "aggFn": "quantile",
          "level": 0.5,
          "valueExpression": "Duration",
          "aggCondition": "SpanName:\"documentLoad\"",
          "aggConditionLanguage": "lucene"
        }
      ],
      "where": "",
      "whereLanguage": "lucene"
Good call — done in d3a5b0f. The page-load duration tiles (`rum-017` median, `rum-018` p90, the page-load line `rum-020`, and Slowest Pages `rum-011`) now select the raw `Duration` expression with no divisor and no manual `numberFormat`, so the duration format is inferred at the correct precision.
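The precision concern can be illustrated with a small sketch. It assumes `Duration` is stored in nanoseconds (suggested by the original `/ 1000000` divisor labelled as ms); `formatFixedMs` mimics the old divide-and-round approach, and `formatDuration` is a hypothetical unit-picking formatter, not the app's inferred duration format.

```typescript
// Old approach: convert nanoseconds to milliseconds and round to 0 decimals
// (the effect of "mantissa": 0). Sub-millisecond values collapse to "0".
function formatFixedMs(durationNs: number): string {
  return Math.round(durationNs / 1e6).toString();
}

// Hypothetical formatter that picks a unit from the magnitude, standing in for
// whatever duration formatting the builder infers from a raw Duration column.
function formatDuration(durationNs: number): string {
  if (durationNs < 1e3) return durationNs + "ns";
  if (durationNs < 1e6) return (durationNs / 1e3).toFixed(1) + "µs";
  if (durationNs < 1e9) return (durationNs / 1e6).toFixed(1) + "ms";
  return (durationNs / 1e9).toFixed(2) + "s";
}
```

For a 400,000 ns span, the fixed-millisecond version shows "0" while the unit-picking version still shows a meaningful value, which is the kind of precision loss the review comment points at.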
    "config": {
      "name": "LCP p75 (ms)",
      "source": "Traces",
      "displayType": "number",
      "granularity": "auto",
      "alignDateRangeToGranularity": true,
      "select": [
        {
          "aggFn": "quantile",
          "level": 0.75,
This is interesting, we don't support p75 through the app, so this renders as an empty aggFn.
Do we need p75 or could we do p95? If p75, maybe we should try a custom aggregation to populate the dropdown correctly.
Sidenote, we should probably add a validation so we don't accept this during import, or expand support to custom quantile levels.
Switched the Core Web Vitals p75 KPIs (LCP/INP/CLS) to a custom aggregation (`aggFn: "none"`, `quantile(0.75)(...)`) in d3a5b0f so they render correctly instead of an empty aggFn. We do want p75 specifically here — it's the standard Core Web Vitals reporting percentile. The page-load duration tiles use the supported p50/p90/p99 levels. Agree that an import-time validation rejecting unsupported quantile levels would be a good app-side follow-up.
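The import-time validation proposed as a follow-up could look roughly like the sketch below. The supported level set is an assumption pieced together from this thread (p50/p90/p99, plus the p95 the reviewer mentions), and `findUnsupportedQuantiles` is a hypothetical helper, not an existing app function.

```typescript
// Assumption: the quantile levels the app's percentile dropdown can represent.
const SUPPORTED_QUANTILE_LEVELS: ReadonlyArray<number> = [0.5, 0.9, 0.95, 0.99];

// Minimal shape of a select item, using only fields that appear in the template.
interface SelectItem {
  aggFn: string;
  level?: number;
  valueExpression: string;
}

// Returns the select items that use aggFn "quantile" with a missing or unsupported level.
// Custom aggregations (aggFn "none") are left alone, since they carry their own expression.
function findUnsupportedQuantiles(select: SelectItem[]): SelectItem[] {
  return select.filter(
    (item) =>
      item.aggFn === "quantile" &&
      (item.level === undefined || SUPPORTED_QUANTILE_LEVELS.indexOf(item.level) === -1),
  );
}
```

An importer could reject the template, or rewrite the offending series into a custom aggregation, whenever this returns a non-empty list.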
    "groupBy": [
      {
        "valueExpression": "coalesce(nullif(SpanAttributes['http.url'], ''), nullif(SpanAttributes['page.url'], ''), nullif(SpanAttributes['location.href'], ''))",
        "alias": "URL"
      }
    ],
Converted every table's `groupBy` to string form (`E AS "Alias"`) in d3a5b0f, which renders in the builder. Keeping the `AS "Alias"` preserves both the column header and the row-data key that each tile's `onClick.whereTemplate` Handlebars vars depend on. Agreed an import-time transform/validation for array groupBy would be a worthwhile app-side follow-up.
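The array-to-string conversion can be sketched as a small transform. `groupByToString` is a hypothetical helper for illustration, and it assumes aliases contain no double quotes; it is not the app's import logic.

```typescript
// Array form of groupBy, as it appeared in the original template.
interface GroupByItem {
  valueExpression: string;
  alias?: string;
}

// Joins the array into the string form: each expression followed by AS "Alias"
// when an alias is present. Assumes aliases contain no double quotes.
function groupByToString(groupBy: GroupByItem[]): string {
  return groupBy
    .map(({ valueExpression, alias }) =>
      alias ? valueExpression + ' AS "' + alias + '"' : valueExpression,
    )
    .join(", ");
}

const urlGroupBy = groupByToString([
  { valueExpression: "SpanAttributes['http.url']", alias: "URL" },
]);
```

Keeping the alias in the output matters because the row-click template looks up the row value by that exact column name.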
    "w": 24,
    "h": 8,
    "config": {
      "name": "Top Errored Sessions",
A few of these tables could probably benefit from setting groupByColumnsOnLeft so they read more naturally
Done — added `groupByColumnsOnLeft: true` to all the breakdown and error tables in d3a5b0f, so the grouped column reads on the left.
      "mode": "id",
      "id": "Traces"
    },
    "whereTemplate": "ResourceAttributes.rum.sessionId:\"{{Session}}\"",
Nice to see this getting used!
    {
      "aggFn": "quantile",
      "level": 0.75,
      "valueExpression": "Duration / 1000000",
      "aggCondition": "SpanName:\"documentLoad\"",
      "aggConditionLanguage": "lucene",
      "alias": "Page Load p75 (ms)"
    },
    {
      "aggFn": "count",
      "valueExpression": "",
      "alias": "Views",
      "aggCondition": "SpanName:\"documentLoad\"",
      "aggConditionLanguage": "lucene"
    }
We could add per-series numberFormats here to render the p75 as a duration and the count as a number
Addressed via the duration-inference change in d3a5b0f — the Slowest Pages table now selects raw `Duration` for the percentile column (auto-formatted as a duration) while the `Views` count stays a plain number, so we get the per-column formatting without a manual `numberFormat`.
- Page-load duration tiles (median/p90 KPIs, line, Slowest Pages) now select raw Duration with supported percentile levels (p50/p90/p99) instead of Duration/1000000 + manual numberFormat, so the builder infers human-readable duration formatting and the editor shows real percentile options.
- Core Web Vitals p75 KPIs (LCP/INP/CLS) use a custom aggregation (quantile(0.75)(...)) so the unsupported 0.75 level renders correctly.
- Table groupBys converted from array to string form (E AS "Alias"), preserving column headers and row-click drilldown keys, and grouped columns now render on the left (groupByColumnsOnLeft).
Thanks for the thorough review @pulpdrew! Pushed
The two greptile P1 AJAX items were already fixed in
A couple of suggestions point at app-side follow-ups rather than this template (import-time validation/transformation for unsupported quantile levels and array groupBy); happy to file those separately.
pulpdrew
left a comment
A couple of suggestions point at app-side follow-ups rather than this template (import-time validation/transformation for unsupported quantile levels and array groupBy) — happy to file those separately.
Agreed, those make sense as separate improvements
LGTM!

Summary
Adds a Browser RUM template to the dashboards gallery (Dashboards → Templates) for browser sessions instrumented with the HyperDX Browser SDK, or any OpenTelemetry browser instrumentation that emits a `rum.sessionId` resource attribute. It fills a gap: HyperDX ships a browser SDK but had no out-of-the-box RUM dashboard. The template is purely declarative JSON validated by the existing `dashboardTemplates` schema test; the only code change is registering it in `dashboardTemplates/index.ts`, plus a changeset.

The dashboard is organized into three sections: Performance Overview (page-view/session/error KPIs, Core Web Vitals LCP/INP/CLS p75, median/p90/p99 page-load percentiles, long tasks), Page Views Breakdown (traffic by URL, browser, country, and device size derived from `screen.xy`), and a tabbed Errors section (overview, JS exceptions by message and by page, failing API calls). It also defines five dashboard-level filters: Service, Environment, Service Version, Page URL, and Country.

Screenshots or video
Tab-1780519086079.webm
How to test locally or on Vercel
Dashboards → Templates → Browser RUM → Import, then map each tile/filter to your Traces source (auto-maps if a source named "Traces" exists).
`@hyperdx/browser` at your collector (or seed `webvitals` / `documentLoad` / fetch+xhr / exception spans carrying `rum.sessionId`). Verify the KPIs, Core Web Vitals, breakdown tables, and Errors tabs populate, and that the five filters apply.
`useragent` and `geoip` processors are enabled (noted in the tile titles + dashboard description).
References