Validate agent-based LLMObs meta_struct export [python@0829a33d95872059d52363b6472a5f7735c6f30d] by mabdinur · Pull Request #7047 · DataDog/system-tests

mabdinur · 2026-05-29T19:44:46Z

Summary

Validation-only draft PR, not for merge. Its sole purpose is to run the full system-tests CI and confirm that it passes against:

dd-trace-py PR #18254 (munir/agentbased-llmo, commit 0829a33): agent-based LLMObs export, in which LLMObs payloads are carried on APM traces in meta_struct["_llmobs"] for kept traces.
dd-apm-test-agent PR #370: synthesizes EVP LLMObs requests from meta_struct["_llmobs"]; published as the dev image ddapm-test-agent:dev-llmobs-meta-struct.
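To make the meta_struct transport concrete, here is a minimal Python sketch of the extraction side. It is not the dd-apm-test-agent implementation: it assumes traces have already been decoded into lists of span dicts and that meta_struct values are already deserialized. The function name and span shape are hypothetical.

```python
from typing import Any, Iterable


def extract_llmobs_payloads(traces: Iterable[list[dict[str, Any]]]) -> list[Any]:
    """Collect meta_struct["_llmobs"] from every span in the given traces.

    Hypothetical shape: each trace is a list of decoded span dicts, and
    meta_struct values are already deserialized. Per the PR, only kept
    traces carry the payload, so callers pass the traces the agent received.
    """
    payloads = []
    for trace in traces:
        for span in trace:
            # Spans without meta_struct, or without an _llmobs entry, carry no LLMObs data.
            llmobs = (span.get("meta_struct") or {}).get("_llmobs")
            if llmobs is not None:
                payloads.append(llmobs)
    return payloads
```

Each collected payload would then be turned into the EVP LLMObs request that the LLMObs tests already assert on.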

Changes

Bump the test agent to the dev-llmobs-meta-struct dev image on all LLM-adjacent surfaces (other scenarios stay on the released image):

INTEGRATION_FRAMEWORKS: anthropic / openai / google_genai LLMObs tests.
PARAMETRIC: covers tests/parametric/test_llm_observability/.
VCRCassettesContainer: backs INTEGRATION_FRAMEWORKS, AI_GUARD, and AI_GUARD_TELEMETRY.
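The selection rule boils down to a scenario allow-list. The sketch below is a hypothetical illustration, not the code changed in integration_frameworks.py, parametric.py, or containers.py; only the image tag comes from this PR, while the constant names, the function name, and the released_image parameter are assumptions.

```python
# Dev image tag taken from this PR; everything else here is illustrative.
DEV_TEST_AGENT_IMAGE = "ddapm-test-agent:dev-llmobs-meta-struct"

# Hypothetical allow-list: scenarios that reach the dev image either directly
# (INTEGRATION_FRAMEWORKS, PARAMETRIC) or through VCRCassettesContainer
# (INTEGRATION_FRAMEWORKS, AI_GUARD, AI_GUARD_TELEMETRY).
LLM_ADJACENT_SCENARIOS = frozenset(
    {"INTEGRATION_FRAMEWORKS", "PARAMETRIC", "AI_GUARD", "AI_GUARD_TELEMETRY"}
)


def select_test_agent_image(scenario: str, released_image: str) -> str:
    """Return the dev image for LLM-adjacent scenarios, else the released image."""
    if scenario in LLM_ADJACENT_SCENARIOS:
        return DEV_TEST_AGENT_IMAGE
    return released_image
```

Keeping the allow-list explicit means non-LLM consumers such as DOCKER_SSI fall through to the released image by default.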

How the tracer build is selected

The [python@0829a33d95872059d52363b6472a5f7735c6f30d] marker in the PR title puts Python into system-tests dev mode. dd-trace-py no longer publishes wheels keyed by dev-branch name, but per-commit wheels are still available in S3, so the marker pins the exact commit SHA of the PR head instead of the branch name. The wheel is resolved as load-binary.sh python -> python-load-from-s3 -> prebuilt wheel from dd-trace-py-builds.
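A minimal sketch of how such a title marker could be parsed, distinguishing a pinned commit SHA from a branch name. This is not the actual system-tests parsing code; the regexes, the DevMarker type, and parse_dev_marker are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for a "[lang@ref]" dev-mode marker in a PR title.
_MARKER_RE = re.compile(r"\[(?P<lang>[a-z_]+)@(?P<ref>[^\]\s]+)\]")
# A full git commit SHA-1 is 40 lowercase hex characters.
_FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")


class DevMarker(NamedTuple):
    lang: str
    ref: str
    is_commit_sha: bool  # True if ref is a full commit SHA (per-commit wheel lookup)


def parse_dev_marker(title: str) -> Optional[DevMarker]:
    """Return the first [lang@ref] marker in a PR title, or None if absent."""
    match = _MARKER_RE.search(title)
    if match is None:
        return None
    ref = match.group("ref")
    return DevMarker(match.group("lang"), ref, bool(_FULL_SHA_RE.fullmatch(ref)))
```

Applied to the earlier title of this PR, [python@munir/agentbased-llmo], the sketch reports a branch ref (is_commit_sha is False), which is exactly the case that no longer has a published wheel.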

Note: S3 artifacts age out after roughly two weeks. If the build later fails to find the wheel, re-run the dd-trace-py GitLab CI for the commit, or update the marker to a more recent commit SHA.
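As a quick heuristic for whether a pinned wheel has probably aged out, the roughly two-week window can be checked against the wheel's build time. This is a hypothetical helper, assuming the build time is known (or approximated by the commit time); the real S3 lifecycle policy may use a different window.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Approximate retention from the PR description ("~2 weeks"); the real policy may differ.
APPROX_WHEEL_RETENTION = timedelta(days=14)


def wheel_may_have_expired(built_at: datetime, now: Optional[datetime] = None) -> bool:
    """Heuristic: True if a per-commit wheel built at `built_at` has likely aged out.

    Both datetimes must be timezone-aware.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - built_at > APPROX_WHEEL_RETENTION
```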

Expected CI behavior

The "Fail if target branch is specified" job is expected to fail by design. It is a merge guard that fires whenever a [lang@...] marker is present in the PR title, so its failure is not a test failure.
All actual system-tests jobs (LLMObs integration + parametric + AI Guard) should pass.
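As a rough illustration of why the guard trips on this PR, its condition can be approximated by a title regex. This is a hypothetical sketch, not the job's actual implementation; the pattern and the title_is_mergeable name are assumptions.

```python
import re

# Hypothetical approximation of the merge guard: any "[lang@ref]" marker blocks merging.
_TARGET_MARKER_RE = re.compile(r"\[[a-z_]+@[^\]\s]+\]")


def title_is_mergeable(title: str) -> bool:
    """Return False when the PR title pins a target branch or commit."""
    return _TARGET_MARKER_RE.search(title) is None
```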

Test plan

INTEGRATION_FRAMEWORKS LLMObs suites pass in CI against the pinned tracer commit and the dev test agent
PARAMETRIC LLMObs tests pass
AI_GUARD / AI_GUARD_TELEMETRY pass
No unexpected regressions in other scenarios

Point the integration_frameworks scenario at the dev-tagged dd-apm-test-agent image (dev-llmobs-meta-struct) that synthesizes EVP LLMObs requests from APM trace meta_struct["_llmobs"], so the suite can validate dd-trace-py's agent-based LLMObs export (PR #18254).

Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions · 2026-05-29T19:45:19Z

CODEOWNERS have been resolved as:

utils/_context/_scenarios/integration_frameworks.py                     @DataDog/system-tests-core
utils/_context/_scenarios/parametric.py                                 @DataDog/system-tests-core
utils/_context/containers.py                                            @DataDog/system-tests-core

datadog-datadog-prod-us1 · 2026-05-29T19:46:10Z

Tests


⚠️ Warnings

🚦 15 Pipeline jobs failed

Testing the test | System Tests (golang, dev) / parametric / parametric (1)

2 failed assertions in test_distinct_aggregationkeys_TS003: expected 1 bucket containing stats but found 2 buckets instead.
🧪 1 Test failed
tests.parametric.test_library_tracestats.Test_Library_Tracestats.test_distinct_aggregationkeys_TS003[agent_env0-library_env0, parametric-golang] from system_tests_suite
AssertionError: There should be one bucket containing the stats
assert 2 == 1
 +  where 2 = len([{'Duration': 10000000000, 'Start': 1780085010000000000, 'Stats': [{'Duration': 3851002, 'ErrorSummary': store: {}}, m...key:0, offset:0, zero_count: 0.0, count: 0.0, sum: 0.0, min: inf, max: -inf, 'Errors': 0, 'GRPCStatusCode': '', ...}]}])

self = <tests.parametric.test_library_tracestats.Test_Library_Tracestats object at 0x7f8b0cb13440>
test_agent = <utils.docker_fixtures._test_agent.TestAgentAPI object at 0x7f8b0c547c20>
test_library = <utils.docker_fixtures._test_clients._test_client_parametric.ParametricTestClientApi object at 0x7f8adaa6b3b0>

    @enable_tracestats()
    @enable_agent_version()
...
DataDog/system-tests | Ubuntu_20_amd64.HOS: [test-app-php]

🔄 This looks flaky and may succeed on retry.
Exception launching AWS provision step remote command. AssertionError: Previous errors in the virtual machine provisioning steps.

Testing the test | Fail if target branch is specified

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration.
This PR can't be merged due to the title specifying a target branch.

ℹ️ Info

No other issues found.

❄️ No new flaky tests detected


This comment will be updated automatically if new data arrives.

Commit SHA: fd32dcb

Extend the dev-tagged dd-apm-test-agent (dev-llmobs-meta-struct) to the remaining LLM-adjacent surfaces so the draft can confirm there are no regressions:

- PARAMETRIC: exercises tests/parametric/test_llm_observability.
- VCRCassettesContainer: backs INTEGRATION_FRAMEWORKS plus the AI_GUARD and AI_GUARD_TELEMETRY scenarios.

Non-LLM references (APMTestAgentContainer used by DOCKER_SSI, and the k8s lib-injection test agent) are intentionally left on their pinned versions.

Co-authored-by: Cursor <cursoragent@cursor.com>

mabdinur changed the title from "Validate agent-based LLMObs meta_struct export [python@munir/agentbased-llmo]" to "Validate agent-based LLMObs meta_struct export [python@0829a33d95872059d52363b6472a5f7735c6f30d]" on May 29, 2026.

mabdinur closed this May 29, 2026

mabdinur reopened this May 29, 2026

mabdinur wants to merge 2 commits into main from munir/test-llmo-system-test-changes.