Add CI Visibility instrumentation for JMH benchmarks #11498

Draft
robertpi wants to merge 5 commits into master from sdtest-930-jmh-ci-visibility

@robertpi
Member

Summary

Adds CI Visibility support for JMH (Java Microbenchmark Harness, org.openjdk.jmh), the dominant Java microbenchmarking framework. Benchmark runs are now reported as test spans in the Datadog test explorer with performance metrics attached.

Closes SDTEST-930.

How it works

JMH's OutputFormat interface receives benchmark-level lifecycle callbacks, and our decorator acts on them exactly once per benchmark method, after all forks and iterations complete. We instrument BaseRunner.<init> with bytecode advice to wrap the user's OutputFormat in our DDOutputFormat decorator. This is the only hook needed, and it adds zero overhead on the benchmark hot path.
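The wrapping step can be illustrated with a minimal decorator sketch. It does not reproduce DDOutputFormat: `BenchmarkOutput`, `RecordingOutput`, and `DecoratorDemo` are hypothetical names, and the two-method interface is a trimmed-down stand-in for JMH's OutputFormat, which has more callbacks and different parameter types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for JMH's OutputFormat (assumption: the real
// interface has more callbacks and takes JMH parameter/result objects).
interface BenchmarkOutput {
    void startBenchmark(String benchmarkName);
    void endBenchmark(String benchmarkName, double score);
}

// Decorator: records one event per callback, then forwards to the delegate
// so the user's own output keeps working unchanged.
final class RecordingOutput implements BenchmarkOutput {
    private final BenchmarkOutput delegate;
    final List<String> events = new ArrayList<>();

    RecordingOutput(BenchmarkOutput delegate) {
        this.delegate = delegate;
    }

    @Override
    public void startBenchmark(String benchmarkName) {
        events.add("start " + benchmarkName);
        delegate.startBenchmark(benchmarkName);
    }

    @Override
    public void endBenchmark(String benchmarkName, double score) {
        events.add("end " + benchmarkName + " score=" + score);
        delegate.endBenchmark(benchmarkName, score);
    }
}

final class DecoratorDemo {
    // Roughly what constructor advice would do: swap the user's output
    // for a wrapper before the runner stores it.
    static RecordingOutput wrap(BenchmarkOutput userOutput) {
        return new RecordingOutput(userOutput);
    }
}
```

Because the wrapper only sees start/end callbacks, nothing in this sketch runs inside the measured benchmark loop.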

Each benchmark method produces:

  • A suite span (test_suite_end) for the benchmark class
  • A test span (test) for the benchmark method

With benchmark-specific metric tags on the test span:

| Tag | Source |
| --- | --- |
| benchmark.value | Aggregated primary score |
| benchmark.error | 99.9% CI half-width |
| benchmark.unit | e.g. "ns/op", "ops/ms" |
| benchmark.run.mode | e.g. "avgt", "thrpt" |
| benchmark.run.iterations | Measurement iteration count |
| benchmark.run.warmup_iterations | Warmup iteration count |
| benchmark.run.forks | Fork count |
| benchmark.run.threads | Thread count |
| benchmark.run.time_unit | e.g. "NANOSECONDS" |
| benchmark.p50/p90/p95/p99 | Percentiles (when N > 1) |
| benchmark.min / benchmark.max | Bounds |
| benchmark.sample_count | Total samples |
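As a rough illustration of how a few of these tags could be assembled, here is a sketch that builds a tag map from raw sample values. `BenchmarkTags` and its methods are hypothetical, and the nearest-rank percentile is an assumption made for this example; the PR presumably reads these values from JMH's own result statistics, which may compute percentiles differently.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not taken from the PR.
final class BenchmarkTags {

    // Nearest-rank percentile over an already sorted array (assumption:
    // JMH's statistics may interpolate instead).
    static double percentile(double[] sorted, double rank) {
        int idx = (int) Math.ceil(rank / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(idx, sorted.length - 1))];
    }

    static Map<String, Object> build(
            double score, double error, String unit, String mode, double[] samples) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);

        Map<String, Object> tags = new LinkedHashMap<>();
        tags.put("benchmark.value", score);
        tags.put("benchmark.error", error);
        tags.put("benchmark.unit", unit);
        tags.put("benchmark.run.mode", mode);
        tags.put("benchmark.sample_count", sorted.length);
        if (sorted.length > 0) {
            tags.put("benchmark.min", sorted[0]);
            tags.put("benchmark.max", sorted[sorted.length - 1]);
        }
        // Percentiles are only attached when there is more than one sample.
        if (sorted.length > 1) {
            tags.put("benchmark.p50", percentile(sorted, 50));
            tags.put("benchmark.p90", percentile(sorted, 90));
            tags.put("benchmark.p95", percentile(sorted, 95));
            tags.put("benchmark.p99", percentile(sorted, 99));
        }
        return tags;
    }
}
```

With a single sample the map carries bounds and a sample count but no percentile entries, mirroring the "when N > 1" condition in the table.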

@Param-parameterised benchmarks follow the same test.parameters convention as JUnit 5 parameterized tests: {"metadata":{"test_name":"myMethod:size=1000"}}.
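A minimal sketch of producing that test.parameters payload, assuming the test name is the method name followed by `:` and `key=value` pairs. `BenchmarkParameters` is a hypothetical helper, and both the comma separator for multiple parameters and the handling of an empty parameter map are assumptions; the description above only shows the single-parameter case.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not taken from the PR.
final class BenchmarkParameters {

    // "myMethod" + {size=1000} -> "myMethod:size=1000"
    static String testName(String method, Map<String, String> params) {
        if (params.isEmpty()) {
            return method; // assumption: no suffix without @Param values
        }
        StringJoiner joined = new StringJoiner(","); // separator is an assumption
        params.forEach((key, value) -> joined.add(key + "=" + value));
        return method + ":" + joined;
    }

    // Wraps the name in the {"metadata":{"test_name":...}} envelope.
    static String toJson(String method, Map<String, String> params) {
        return "{\"metadata\":{\"test_name\":\""
                + escape(testName(method, params))
                + "\"}}";
    }

    // Escapes backslashes and double quotes for embedding in a JSON string.
    static String escape(String raw) {
        return raw.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

For example, `BenchmarkParameters.toJson("myMethod", Map.of("size", "1000"))` yields the JSON shown above.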

Changes

  • New module dd-java-agent/instrumentation/jmh/jmh-1.0/ — instrumentation + integration tests + fixture templates
  • New module dd-smoke-tests/jmh/ — end-to-end smoke test with real agent
  • Tags.java — 17 new benchmark.* tag constants
  • TestFrameworkInstrumentation — new JMH enum value
  • TestDecorator — new TEST_TYPE_BENCHMARK constant (for future use)
  • supported-configurations.json: DD_TRACE_JMH_ENABLED registered

Note on integration test style

JmhInstrumentationTest is a Groovy/Spock test extending CiVisibilityInstrumentationTest. This is an intentional exception to the JUnit 5 convention: CiVisibilityInstrumentationTest is a Spock Specification subclass whose setup()/cleanup() lifecycle cannot be triggered by the JUnit 5 runner. All other CI Visibility instrumentation tests in the codebase follow this same pattern.

Test plan

  • ./gradlew :dd-java-agent:instrumentation:jmh:jmh-1.0:test — unit tests (JmhUtilsTest) + integration tests (simple benchmark, parameterised benchmark via fixture comparison)
  • ./gradlew :dd-smoke-tests:jmh:test — end-to-end smoke test: forks a JVM with the agent, runs a JMH benchmark, asserts span tags and benchmark.value > 0

🤖 Generated with Claude Code

robertpi and others added 4 commits May 28, 2026 15:26
Instruments JMH's Runner constructor to wrap its OutputFormat with a
DDOutputFormat decorator. The decorator fires once per benchmark method
(after all forks and iterations complete) to emit CI Visibility test
spans — zero overhead on the benchmark hot path.

Each benchmark method produces a suite span + test span with benchmark
metrics (score, error, unit, percentiles, run config) attached as tags.
Parameterised @Param benchmarks follow the same test.parameters convention
as JUnit 5 parameterized tests.

Changes:
- New module: dd-java-agent/instrumentation/jmh/jmh-1.0
- Tags.java: add benchmark.* tag constants
- TestFrameworkInstrumentation: add JMH enum value
- TestDecorator: add TEST_TYPE_BENCHMARK constant
- Design spec: docs/design/jmh-ci-visibility.md

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Groovy/Spock integration tests extending CiVisibilityInstrumentationTest
that run JMH benchmarks in-process (forks=0) and verify the emitted CI
Visibility spans against FTL fixture templates.

Covers:
- Simple (unparameterized) benchmark: suite + test spans with benchmark
  run config metrics (mode, unit, iterations, forks, threads, time_unit)
- Parameterised benchmark (@Param): two test spans with test.parameters
  set following the JUnit 5 convention

Also fixes:
- BaseRunner instrumented instead of Runner (JDK 17+ rejects PUTFIELD on
  a final field of a superclass from advice injected into the subclass)
- JMH annotation processor added to testAnnotationProcessor so that
  META-INF/BenchmarkList is generated at test compile time
- DD_TRACE_JMH_ENABLED registered in supported-configurations.json

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Java JUnit 5 smoke test that forks a real JVM subprocess with the dd-java-agent
attached, runs a JMH benchmark in-process (forks=0) against a MockBackend,
and verifies that the expected CI Visibility spans arrive with correct tags:
- test.framework = "jmh"
- test.name, test.suite, test.status
- benchmark.run.mode, benchmark.unit
- benchmark.value > 0 (measured score actually present)

The benchmark class (SmokeTestBenchmark) lives in src/main/java so the JMH
annotation processor can generate META-INF/BenchmarkList at compile time,
making it available on the classpath that is passed to the subprocess.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

1. splitBenchmarkName returned the full parameterised suffix as the test
   name (e.g. "myMethod:size=1000") instead of just the method name
   ("myMethod"). Fix: use baseName (param-stripped) for the method slice.

2. endBenchmark had no null guard — if called without a prior
   startBenchmark the handler would receive null keys. Fix: early-return
   when suiteKey/testKey are null.

3. handler.close() in endRun was not in a finally block, so an exception
   thrown by close() would skip delegate.endRun(), and an exception in
   endBenchmark could bypass close() entirely. Fix: try/finally in both
   endBenchmark and endRun.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
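
The null guard and try/finally fixes described in items 2 and 3 could be arranged roughly as follows. This is one possible sketch, not the PR's code: `GuardedOutput`, `Delegate`, `Handler`, and `onTestFinish` are hypothetical names, and the real DDOutputFormat works with JMH's parameter and result types.

```java
// Hypothetical sketch of a decorator that always forwards to its delegate.
final class GuardedOutput {

    // Stand-ins for the wrapped OutputFormat and the CI Visibility handler.
    interface Delegate {
        void endBenchmark();
        void endRun();
    }

    interface Handler {
        void onTestFinish(String suiteKey, String testKey);
        void close();
    }

    private final Delegate delegate;
    private final Handler handler;
    private String suiteKey;
    private String testKey;

    GuardedOutput(Delegate delegate, Handler handler) {
        this.delegate = delegate;
        this.handler = handler;
    }

    void startBenchmark(String suite, String test) {
        this.suiteKey = suite;
        this.testKey = test;
    }

    void endBenchmark() {
        try {
            if (suiteKey == null || testKey == null) {
                return; // no prior startBenchmark: skip reporting
            }
            handler.onTestFinish(suiteKey, testKey);
        } finally {
            suiteKey = null;
            testKey = null;
            delegate.endBenchmark(); // runs even if reporting throws
        }
    }

    void endRun() {
        try {
            handler.close();
        } finally {
            delegate.endRun(); // runs even if close() throws
        }
    }
}
```

The design choice here is that the user's own output is never starved of callbacks: reporting failures propagate, but only after the delegate has been called.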
robertpi added the labels type: feature request, comp: ci visibility (Continuous Integration Visibility), and tag: ai generated (Largely based on code generated by an AI or LLM) on May 29, 2026
@datadog-datadog-prod-us1-2
Contributor

datadog-datadog-prod-us1-2 Bot commented May 29, 2026

Pipelines

⚠️ Warnings

🚦 6 Pipeline jobs failed

  • DataDog/apm-reliability/dd-trace-java | check_inst 4/4 (GitLab): CodeNarc rule violations found. Exceeded maximum number of priority 3 violations, see details in report.
  • Run system tests | main / End-to-end #5 / play 5 (GitHub Actions): 1 failed test. ValueError: No span validates this test at utils/interfaces/_library/core.py:460.
  • DataDog/apm-reliability/dd-trace-java | test_inst: [25, 5/8] (GitLab): This looks flaky and may succeed on retry. Job failed: execution took longer than 1h0m0s seconds due to timeout.

The remaining 3 failed jobs are not listed here.

🔗 Commit SHA: 942e468

robertpi added the tag: override-groovy-enforcement label (Override the "Enforce Groovy Migration" check) on May 29, 2026
@robertpi
Member Author

JmhInstrumentationTest.groovy is a new Groovy file, so the tag: override-groovy-enforcement label was added as a justified exception.

CiVisibilityInstrumentationTest is a Spock Specification subclass. Its setup()/cleanup() lifecycle hooks are driven by the Spock runner and cannot be triggered by the JUnit 5 engine. A Java subclass was attempted and confirmed non-functional for this reason. All other CI Visibility instrumentation tests in the repo follow this same Groovy pattern for the same reason.

@cit-pr-commenter-54b7da

Test Environment - sbt-scalatest

Job Status: success

| Scenario | Overhead (%) |
| --- | --- |
| agent | 55.04 |
| agentEvpProxy | 54.98 |

@cit-pr-commenter-54b7da

Test Environment - nebula-release-plugin

Job Status: success

| Scenario | Overhead (%) |
| --- | --- |
| agent | 36.71 |
| agentless | 36.91 |
| agentlessCodeCoverage | 44.32 |
| agentlessLineCoverage | 73.57 |

@cit-pr-commenter-54b7da

Test Environment - pass4s

Job Status: success

| Scenario | Overhead (%) |
| --- | --- |
| agent | 15.66 |
| agentless | 15.88 |
| agentlessCodeCoverage | 20.02 |

@cit-pr-commenter-54b7da

Test Environment - reactive-streams-jvm

Job Status: success

| Scenario | Overhead (%) |
| --- | --- |
| agent | 21.68 |
| agentless | 18.59 |
| agentlessCodeCoverage | 21.13 |
| agentlessLineCoverage | 29.96 |

@dd-octo-sts
Contributor

dd-octo-sts Bot commented Jun 1, 2026

🟢 Java Benchmark SLOs — All performance SLOs passed

| Suite | Status |
| --- | --- |
| Startup | 🟢 pass |

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results
| Scenario | Candidate | master | Δ (95% CI of mean) |
| --- | --- | --- | --- |
| startup:insecure-bank:iast:Agent | 13.93 s | 13.91 s | [-0.9%; +1.2%] (no difference) |
| startup:insecure-bank:tracing:Agent | 12.84 s | 13.00 s | [-2.6%; +0.0%] (no difference) |
| startup:petclinic:appsec:Agent | 16.70 s | 16.44 s | [+0.4%; +2.8%] (maybe worse) |
| startup:petclinic:iast:Agent | 16.58 s | 16.69 s | [-2.1%; +0.8%] (no difference) |
| startup:petclinic:profiling:Agent | 16.44 s | 16.63 s | [-2.4%; +0.2%] (no difference) |
| startup:petclinic:tracing:Agent | 15.89 s | 15.99 s | [-1.9%; +0.7%] (no difference) |

Commit: 942e4683 · CI Pipeline · Benchmarking Platform UI


Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

@cit-pr-commenter-54b7da

Test Environment - sonar-kotlin

Job Status: success

| Scenario | Overhead (%) |
| --- | --- |
| agent | 12.93 |
| agentless | 11.93 |
| agentlessCodeCoverage | 15.67 |
| agentlessLineCoverage | 18.82 |

@cit-pr-commenter-54b7da

Test Environment - jolokia

Job Status: success

| Scenario | Overhead (%) |
| --- | --- |
| agent | 94.41 |
| agentless | 91.65 |
| agentlessCodeCoverage | 100.11 |
| agentlessLineCoverage | 101.37 |

@cit-pr-commenter-54b7da

Test Environment - okhttp

Job Status: success

| Scenario | Overhead (%) |
| --- | --- |
| agent | 19.36 |
| agentless | 19.68 |
| agentlessCodeCoverage | 22.52 |
| agentlessLineCoverage | 44.49 |

@cit-pr-commenter-54b7da

Test Environment - spring_boot

Job Status: success

| Scenario | Overhead (%) |
| --- | --- |
| agent | 16.11 |
| agentless | 10.25 |
| agentlessCodeCoverage | 13.45 |
| agentlessLineCoverage | 33.00 |

@cit-pr-commenter-54b7da

Test Environment - sonar-java

Job Status: success

| Scenario | Overhead (%) |
| --- | --- |
| agent | 8.92 |
| agentless | 14.81 |
| agentlessCodeCoverage | 110.02 |
| agentlessLineCoverage | 153.86 |

Labels

comp: ci visibility (Continuous Integration Visibility), tag: ai generated (Largely based on code generated by an AI or LLM), tag: override-groovy-enforcement (Override the "Enforce Groovy Migration" check), type: feature request

2 participants