Add CI Visibility instrumentation for JMH benchmarks by robertpi · Pull Request #11498 · DataDog/dd-trace-java

robertpi · 2026-05-29T08:37:28Z

Summary

Adds CI Visibility support for JMH (Java Microbenchmark Harness, org.openjdk.jmh), the dominant Java microbenchmarking framework. Benchmark runs are now reported as test spans in the Datadog test explorer with performance metrics attached.

Closes SDTEST-930.

How it works

JMH's OutputFormat interface receives lifecycle callbacks exactly once per benchmark method. We instrument BaseRunner.<init> with bytecode advice to wrap the user's OutputFormat with our DDOutputFormat decorator — this is the only hook needed, with zero overhead on the benchmark hot path.
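
The wrapping described above is a plain decorator. A minimal sketch of the idea follows, assuming a deliberately simplified stand-in interface: `BenchmarkListener` and `ReportingListener` are hypothetical names, not JMH or dd-trace-java types, and the real `OutputFormat` has more callbacks and takes JMH parameter and result objects rather than strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified stand-in for JMH's OutputFormat.
interface BenchmarkListener {
  void startBenchmark(String benchmarkName);

  void endBenchmark(String benchmarkName, double score);
}

// Decorator: forwards every callback to the user's listener and records
// one report entry per benchmark, only when that benchmark has finished.
final class ReportingListener implements BenchmarkListener {
  private final BenchmarkListener delegate;
  private final List<String> reported = new ArrayList<>();

  ReportingListener(BenchmarkListener delegate) {
    this.delegate = delegate;
  }

  @Override
  public void startBenchmark(String benchmarkName) {
    delegate.startBenchmark(benchmarkName);
  }

  @Override
  public void endBenchmark(String benchmarkName, double score) {
    try {
      // Stand-in for emitting a test span with benchmark tags.
      reported.add(benchmarkName + "=" + score);
    } finally {
      // The user's listener is always called, even if reporting fails.
      delegate.endBenchmark(benchmarkName, score);
    }
  }

  List<String> reported() {
    return reported;
  }
}
```

Because the decorator only does work in the per-benchmark callbacks, nothing runs inside the measured benchmark loop itself.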

Each benchmark method produces:

A suite span (test_suite_end) for the benchmark class
A test span (test) for the benchmark method

With benchmark-specific metric tags on the test span:

Tag	Source
`benchmark.value`	Aggregated primary score
`benchmark.error`	99.9% CI half-width
`benchmark.unit`	e.g. `"ns/op"`, `"ops/ms"`
`benchmark.run.mode`	e.g. `"avgt"`, `"thrpt"`
`benchmark.run.iterations`	Measurement iteration count
`benchmark.run.warmup_iterations`	Warmup iteration count
`benchmark.run.forks`	Fork count
`benchmark.run.threads`	Thread count
`benchmark.run.time_unit`	e.g. `"NANOSECONDS"`
`benchmark.p50` / `benchmark.p90` / `benchmark.p95` / `benchmark.p99`	Percentiles (when N > 1)
`benchmark.min` / `benchmark.max`	Bounds
`benchmark.sample_count`	Total samples
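
To make the sample-derived rows of the table concrete, here is a rough sketch of how such tags could be computed from raw samples. `BenchmarkTagsSketch` is a hypothetical helper, and the nearest-rank percentile is an assumption chosen for brevity; JMH's own statistics may interpolate differently, and the PR may simply read these values from the JMH result object.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class BenchmarkTagsSketch {
  // Nearest-rank percentile over an ascending-sorted array (assumed method).
  static double percentile(double[] sorted, double p) {
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }

  // Builds the sample-derived benchmark.* tags from raw measurements.
  static Map<String, Object> tags(double[] samples, String unit) {
    if (samples.length == 0) {
      throw new IllegalArgumentException("at least one sample is required");
    }
    double[] sorted = samples.clone();
    Arrays.sort(sorted);

    Map<String, Object> tags = new LinkedHashMap<>();
    tags.put("benchmark.unit", unit);
    tags.put("benchmark.sample_count", (long) sorted.length);
    tags.put("benchmark.min", sorted[0]);
    tags.put("benchmark.max", sorted[sorted.length - 1]);
    // Percentiles are only attached when there is more than one sample.
    if (sorted.length > 1) {
      tags.put("benchmark.p50", percentile(sorted, 50));
      tags.put("benchmark.p90", percentile(sorted, 90));
      tags.put("benchmark.p95", percentile(sorted, 95));
      tags.put("benchmark.p99", percentile(sorted, 99));
    }
    return tags;
  }
}
```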

@Param-parameterised benchmarks follow the same test.parameters convention as JUnit 5 parameterized tests: {"metadata":{"test_name":"myMethod:size=1000"}}.
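
As an illustration of that convention, the sketch below builds the `test.parameters` value from a method name and its `@Param` values. `TestParametersSketch` is a hypothetical helper; joining several parameters with a comma and returning `null` for unparameterised benchmarks are assumptions, since the description only shows the single-parameter case.

```java
import java.util.Map;
import java.util.StringJoiner;

final class TestParametersSketch {
  // Returns the JSON string for test.parameters, or null when the benchmark
  // has no @Param values (assumed to mean "do not set the tag").
  static String testParameters(String method, Map<String, String> params) {
    if (params.isEmpty()) {
      return null;
    }
    // Assumption: multiple parameters are joined with a comma.
    StringJoiner joined = new StringJoiner(",");
    params.forEach((name, value) -> joined.add(name + "=" + value));
    String testName = method + ":" + joined;
    return "{\"metadata\":{\"test_name\":\"" + escape(testName) + "\"}}";
  }

  // Minimal JSON string escaping for backslashes and double quotes.
  static String escape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }
}
```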

Changes

New module dd-java-agent/instrumentation/jmh/jmh-1.0/ — instrumentation + integration tests + fixture templates
New module dd-smoke-tests/jmh/ — end-to-end smoke test with real agent
Tags.java — 17 new benchmark.* tag constants
TestFrameworkInstrumentation — new JMH enum value
TestDecorator — new TEST_TYPE_BENCHMARK constant (for future use)
supported-configurations.json — DD_TRACE_JMH_ENABLED registered

Note on integration test style

JmhInstrumentationTest is a Groovy/Spock test extending CiVisibilityInstrumentationTest. This is an intentional exception to the JUnit 5 convention: CiVisibilityInstrumentationTest is a Spock Specification subclass whose setup()/cleanup() lifecycle cannot be triggered by the JUnit 5 runner. All other CI Visibility instrumentation tests in the codebase follow this same pattern.

Test plan

./gradlew :dd-java-agent:instrumentation:jmh:jmh-1.0:test — unit tests (JmhUtilsTest) + integration tests (simple benchmark, parameterised benchmark via fixture comparison)
./gradlew :dd-smoke-tests:jmh:test — end-to-end smoke test: forks a JVM with the agent, runs a JMH benchmark, asserts span tags and benchmark.value > 0

🤖 Generated with Claude Code

Instruments JMH's Runner constructor to wrap its OutputFormat with a DDOutputFormat decorator. The decorator fires once per benchmark method (after all forks and iterations complete) to emit CI Visibility test spans, with zero overhead on the benchmark hot path.

Each benchmark method produces a suite span + test span with benchmark metrics (score, error, unit, percentiles, run config) attached as tags. Parameterised @Param benchmarks follow the same test.parameters convention as JUnit 5 parameterized tests.

Changes:
- New module: dd-java-agent/instrumentation/jmh/jmh-1.0
- Tags.java: add benchmark.* tag constants
- TestFrameworkInstrumentation: add JMH enum value
- TestDecorator: add TEST_TYPE_BENCHMARK constant
- Design spec: docs/design/jmh-ci-visibility.md

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Groovy/Spock integration tests extending CiVisibilityInstrumentationTest that run JMH benchmarks in-process (forks=0) and verify the emitted CI Visibility spans against FTL fixture templates.

Covers:
- Simple (unparameterized) benchmark: suite + test spans with benchmark run config metrics (mode, unit, iterations, forks, threads, time_unit)
- Parameterised benchmark (@Param): two test spans with test.parameters set following the JUnit 5 convention

Also fixes:
- BaseRunner instrumented instead of Runner (JDK 17+ rejects PUTFIELD on a final field of a superclass from advice injected into the subclass)
- JMH annotation processor added to testAnnotationProcessor so that META-INF/BenchmarkList is generated at test compile time
- DD_TRACE_JMH_ENABLED registered in supported-configurations.json

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Java JUnit 5 smoke test that forks a real JVM subprocess with the dd-java-agent attached, runs a JMH benchmark in-process (forks=0) against a MockBackend, and verifies that the expected CI Visibility spans arrive with correct tags:
- test.framework = "jmh"
- test.name, test.suite, test.status
- benchmark.run.mode, benchmark.unit
- benchmark.value > 0 (measured score actually present)

The benchmark class (SmokeTestBenchmark) lives in src/main/java so the JMH annotation processor can generate META-INF/BenchmarkList at compile time, making it available on the classpath that is passed to the subprocess.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

1. splitBenchmarkName returned the full parameterised suffix as the test name (e.g. "myMethod:size=1000") instead of just the method name ("myMethod"). Fix: use baseName (param-stripped) for the method slice.
2. endBenchmark had no null guard: if called without a prior startBenchmark, the handler would receive null keys. Fix: early-return when suiteKey/testKey are null.
3. handler.close() in endRun was not in a finally block, so a crash in close() would swallow delegate.endRun(), and an exception in endBenchmark could bypass close() entirely. Fix: try/finally in both endBenchmark and endRun.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
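
Fixes 1 and 3 can be sketched as follows. This is an illustration, not the PR's code: `BenchmarkNameSketch`, the assumed input shape (`package.Class.method` with an optional `:name=value` suffix), and the `Runnable` parameters standing in for `handler.close()` and `delegate.endRun()` are all hypothetical.

```java
final class BenchmarkNameSketch {
  // Splits a benchmark name into {suite, method}. The parameter suffix is
  // stripped first, so dots inside parameter values cannot shift the split.
  static String[] split(String benchmarkName) {
    int colon = benchmarkName.indexOf(':');
    String baseName = colon >= 0 ? benchmarkName.substring(0, colon) : benchmarkName;
    int lastDot = baseName.lastIndexOf('.');
    if (lastDot < 0) {
      return new String[] {"", baseName};
    }
    return new String[] {baseName.substring(0, lastDot), baseName.substring(lastDot + 1)};
  }

  // One possible try/finally arrangement: the delegate callback still runs
  // when closing the handler throws, and the exception then propagates.
  static void endRun(Runnable closeHandler, Runnable delegateEndRun) {
    try {
      closeHandler.run();
    } finally {
      delegateEndRun.run();
    }
  }
}
```

Stripping the suffix before looking for the last dot matters for values such as `size=1.5`, where splitting the raw name would cut inside the parameter value.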

datadog-datadog-prod-us1-2 · 2026-05-29T08:37:37Z


⚠️ Warnings

🚦 6 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-java | check_inst 4/4

🔧 Fix in code (Fix with Cursor).
CodeNarc rule violations found. Exceeded maximum number of priority 3 violations, see details in report.

Run system tests | main / End-to-end #5 / play 5

🔧 Fix in code (Fix with Cursor).
1 failed test. ValueError: No span validates this test at utils/interfaces/_library/core.py:460.

DataDog/apm-reliability/dd-trace-java | test_inst: [25, 5/8]

🔄 Retry job. This looks flaky and may succeed on retry.
Job failed: execution took longer than 1h0m0s seconds due to timeout.

View all 6 failed jobs.


This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 942e468

robertpi · 2026-05-29T09:05:34Z

JmhInstrumentationTest.groovy is a new Groovy file — tag: override-groovy-enforcement added as a justified exception.

CiVisibilityInstrumentationTest is a Spock Specification subclass. Its setup()/cleanup() lifecycle hooks are driven by the Spock runner and cannot be triggered by the JUnit 5 engine. A Java subclass was attempted and confirmed non-functional for this reason. All other CI Visibility instrumentation tests in the repo follow this same Groovy pattern for the same reason.

cit-pr-commenter-54b7da · 2026-06-01T10:20:45Z

Test Environment - sbt-scalatest

Job Status: success

Scenario	Overhead (%)
agent	55.04
agentEvpProxy	54.98

cit-pr-commenter-54b7da · 2026-06-01T10:23:33Z

Test Environment - nebula-release-plugin

Job Status: success

Scenario	Overhead (%)
agent	36.71
agentless	36.91
agentlessCodeCoverage	44.32
agentlessLineCoverage	73.57

cit-pr-commenter-54b7da · 2026-06-01T10:25:33Z

Test Environment - pass4s

Job Status: success

Scenario	Overhead (%)
agent	15.66
agentless	15.88
agentlessCodeCoverage	20.02

cit-pr-commenter-54b7da · 2026-06-01T10:26:22Z

Test Environment - reactive-streams-jvm

Job Status: success

Scenario	Overhead (%)
agent	21.68
agentless	18.59
agentlessCodeCoverage	21.13
agentlessLineCoverage	29.96

dd-octo-sts · 2026-06-01T10:27:25Z

🟢 Java Benchmark SLOs — All performance SLOs passed

Suite	Status
Startup	🟢 pass

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results

Scenario	Candidate	master	Δ (95% CI of mean)
startup:insecure-bank:iast:Agent	13.93 s	13.91 s	[-0.9%; +1.2%] (no difference)
startup:insecure-bank:tracing:Agent	12.84 s	13.00 s	[-2.6%; +0.0%] (no difference)
startup:petclinic:appsec:Agent	16.70 s	16.44 s	[+0.4%; +2.8%] (maybe worse)
startup:petclinic:iast:Agent	16.58 s	16.69 s	[-2.1%; +0.8%] (no difference)
startup:petclinic:profiling:Agent	16.44 s	16.63 s	[-2.4%; +0.2%] (no difference)
startup:petclinic:tracing:Agent	15.89 s	15.99 s	[-1.9%; +0.7%] (no difference)

Commit: 942e4683 · CI Pipeline · Benchmarking Platform UI

Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

cit-pr-commenter-54b7da · 2026-06-01T10:32:20Z

Test Environment - sonar-kotlin

Job Status: success

Scenario	Overhead (%)
agent	12.93
agentless	11.93
agentlessCodeCoverage	15.67
agentlessLineCoverage	18.82

cit-pr-commenter-54b7da · 2026-06-01T10:34:47Z

Test Environment - jolokia

Job Status: success

Scenario	Overhead (%)
agent	94.41
agentless	91.65
agentlessCodeCoverage	100.11
agentlessLineCoverage	101.37

cit-pr-commenter-54b7da · 2026-06-01T10:41:03Z

Test Environment - okhttp

Job Status: success

Scenario	Overhead (%)
agent	19.36
agentless	19.68
agentlessCodeCoverage	22.52
agentlessLineCoverage	44.49

cit-pr-commenter-54b7da · 2026-06-01T10:48:20Z

Test Environment - spring_boot

Job Status: success

Scenario	Overhead (%)
agent	16.11
agentless	10.25
agentlessCodeCoverage	13.45
agentlessLineCoverage	33.00

cit-pr-commenter-54b7da · 2026-06-01T11:04:14Z

Test Environment - sonar-java

Job Status: success

Scenario	Overhead (%)
agent	8.92
agentless	14.81
agentlessCodeCoverage	110.02
agentlessLineCoverage	153.86

robertpi and others added 4 commits May 28, 2026 15:26

robertpi added the type: feature request, comp: ci visibility (Continuous Integration Visibility), and tag: ai generated (Largely based on code generated by an AI or LLM) labels May 29, 2026

robertpi added the tag: override-groovy-enforcement (Override the "Enforce Groovy Migration" check) label May 29, 2026

Merge branch 'master' into sdtest-930-jmh-ci-visibility

942e468

robertpi wants to merge 5 commits into master from sdtest-930-jmh-ci-visibility


2 participants
