Skip to content

Add FFE evaluation metrics#3911

Draft
leoromanovsky wants to merge 4 commits into
leo.romanovsky/m2-m3-evaluation-completed-basefrom
leo.romanovsky/m3-ffe-evaluation-metrics
Draft

Add FFE evaluation metrics#3911
leoromanovsky wants to merge 4 commits into
leo.romanovsky/m2-m3-evaluation-completed-basefrom
leo.romanovsky/m3-ffe-evaluation-metrics

Conversation

@leoromanovsky
Copy link
Copy Markdown

@leoromanovsky leoromanovsky commented May 22, 2026

Motivation

Milestone 3: PHP emits feature_flag.evaluations counters with attributes matching the Go/JS/Java/.NET reference SDKs. Covers both PHP 7 (DD Client hook) and PHP 8 (OpenFeature SDK hook), with delivery routed through the libdatadog sidecar per the architectural rule that the tracer extension performs no I/O outside the sidecar.

Planning/reference doc: https://docs.google.com/document/d/1NvMfTpZWLBlFmEFNjdnlMyeVpy5l7KD8qujGFco6w2w/edit?tab=t.0

Where this PR fits in the stack

This is the OTLP metrics layer. Sibling of #3910 (EVP exposures) under the hook PR #3909.

PHP FFE 4-PR stack — this PR highlighted

Where this PR fits in the target system

This PR contributes the EvaluationMetricWriter (in-PHP OTLP protobuf encoding via existing OtlpMetricEncoder), the OpenFeature SDK Hook layer (EvalMetricsHook on DataDogProvider), the ddog_sidecar_send_ffe_metrics FFI call path, and the OTLP HTTP intake leg (highlighted). EVP/Agent path is future for #3910.

PHP FFE system architecture — this PR's scope highlighted

Changes

PHP side:

  • Add Internal/Metric/EvaluationMetricWriter with bounded aggregation by attribute set, drop accounting, and shutdown flush.
  • Add Internal/Metric/EvaluationMetricHook (DD Client hook) and OtlpMetricEncoder (PHP 7-safe OTLP/protobuf encoder).
  • Add Internal/Metric/SidecarOtlpMetricsTransport that JSON-encodes nothing — it passes the OTLP/protobuf bytes from OtlpMetricEncoder to the native \DDTrace\send_ffe_metrics() FFI. Endpoint resolution: OTEL_EXPORTER_OTLP_METRICS_ENDPOINT, falling back to OTEL_EXPORTER_OTLP_ENDPOINT + /v1/metrics, default http://localhost:4318/v1/metrics.
  • Add DDTrace\OpenFeature\EvalMetricsHook implementing OpenFeature\interfaces\hooks\Hook (after/error stages) to match the architectural pattern used by Go/JS/Java/.NET. Registered on DataDogProvider via setHooks().
  • DataDogProvider constructs its internal DD Client with DefaultEvaluationCompletedHook::createWithoutMetric() so the OpenFeature path records the metric via the OpenFeature hook (NOT via the DD Client hook), preventing double-counting. PHP 7 path keeps recording through the DD Client hook.
  • Add Internal/CompositeEvaluationCompletedHook and Internal/DefaultEvaluationCompletedHook returning [EvaluationMetricHook]. This file conflicts with Add FFE exposure emission #3910's [ExposureHook] composite at merge time — resolution combines both hooks. Intentional cost of independent siblings.
  • Update Client::create() to call DefaultEvaluationCompletedHook::create().
  • Unit coverage for success/error metric attributes (variant, reason, allocation_key, error.type), aggregation, drop behavior, no-double-count between the DD Client hook and the OpenFeature SDK hook on the PHP 8 path.

C/Rust bridge: the \DDTrace\send_ffe_metrics() native function and supporting C+Rust wiring were added in #3910's commit (because both transports share the bridge surface). This branch picks them up via the same libdatadog submodule pin to the FFE branch tip.

Decisions

  • Transport via sidecar (FFI), not PHP-side HTTP. Per Bob's 2026-05-22 PR Add FFE exposure emission #3910 review: tracer extension performs no I/O outside the sidecar. Verified 2026-05-23: dd-trace-php has no other OTLP-metric-export-from-PHP path; every other tracer→Agent egress already uses the sidecar (DogStatsD, trace stats, telemetry self-metrics). FFE metrics follow the same pattern via ddog_sidecar_send_ffe_metrics in feat(sidecar): forward FFE exposures to EVP proxy libdatadog#2026.
  • PHP encodes OTLP/protobuf; sidecar just POSTs. The existing OtlpMetricEncoder is PHP 7-safe; rewriting the encoder in Rust would be larger scope without functional benefit. Sidecar is a dumb HTTP relay for this path.
  • OpenFeature after+error (not finally). PHP OpenFeature SDK's Hook::finally signature does NOT carry ResolutionDetails (unlike Go/JS/Java/.NET), so a true finally-style port loses variant/reason info on the error path. The after+error split preserves attribute coverage.
  • No double-counting on PHP 8 path. DataDogProvider uses createWithoutMetric() so only the OpenFeature hook records on that path. PHP 7 path uses the DD Client hook.

Dependencies

History

This PR was originally stacked linearly on #3910 (which itself was the M2 PR with the exposures + hook seam folded together). Per the 2026-05-23 restructure, the branch was rebased onto #3909 (the standalone hook PR), the M2 exposure commit was dropped from history, the OTLP transport was refactored from PHP-side raw socket to the libdatadog sidecar, and the branch was force-pushed (user-authorized one-time exception to the no-force-push rule).

@datadog-official
Copy link
Copy Markdown

datadog-official Bot commented May 22, 2026

Pipelines  Tests

Fix all issues with BitsAI

⚠️ Warnings

🚦 5 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-php | check-big-regressions   View in Datadog   GitLab

🔧 Fix in code (Fix with Cursor). Regression Check failed: Metric MetricName.EXECUTION_TIME regression of +10.47% exceeds threshold of 10.00%.

DataDog/apm-reliability/dd-trace-php | test_extension_ci: [7.4]   View in Datadog   GitLab

🔄 Retry job. This looks flaky and may succeed on retry. 2 failed tests. Error: process timed out during execution of 'dynamic_config_multiconfig' and 'client_side_stats' tests.

DataDog/apm-reliability/dd-trace-php | test_extension_ci: [8.0]   View in Datadog   GitLab

🔄 Retry job. This looks flaky and may succeed on retry. Job failed due to scheduling issues: 0/240 nodes available, insufficient resources and untolerated taints.

View all 5 failed jobs.

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 60.70% (+0.00%)

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 2a48c49 | Docs | Datadog PR Page | Give us feedback!

@pr-commenter
Copy link
Copy Markdown

pr-commenter Bot commented May 22, 2026

Benchmarks [ tracer ]

Benchmark execution time: 2026-05-23 19:50:04

Comparing candidate commit 2a48c49 in PR branch leo.romanovsky/m3-ffe-evaluation-metrics with baseline commit 49d53ef in branch leo.romanovsky/m2-m3-evaluation-completed-base.

Found 0 performance improvements and 2 performance regressions! Performance is the same for 192 metrics, 0 unstable metrics.

scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache

  • 🟥 execution_time [+10.531µs; +13.029µs] or [+10.475%; +12.959%]

scenario:TraceSerializationBench/benchSerializeTrace-opcache

  • 🟥 execution_time [+7.624µs; +15.276µs] or [+2.228%; +4.463%]

@leoromanovsky leoromanovsky marked this pull request as ready for review May 23, 2026 16:01
@leoromanovsky leoromanovsky requested review from a team as code owners May 23, 2026 16:01
@leoromanovsky leoromanovsky requested review from dd-oleksii and typotter and removed request for a team May 23, 2026 16:01
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f021234c39

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +63 to +64
if (!is_array($parts) || !isset($parts['scheme'], $parts['host']) || strtolower($parts['scheme']) !== 'http') {
return false;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Support HTTPS OTLP metric endpoints

When OTEL_EXPORTER_OTLP_METRICS_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT is configured with an https:// collector URL, resolveEndpoint() passes that value through but post() rejects every scheme except http, causing flush() to fail and drop all buffered feature_flag.evaluations for that common remote-collector setup. Either support TLS sockets here or avoid accepting HTTPS endpoints as if they were usable.

Useful? React with 👍 / 👎.

@leoromanovsky leoromanovsky marked this pull request as draft May 23, 2026 16:10
leoromanovsky added a commit to DataDog/libdatadog that referenced this pull request May 23, 2026
Adds a parallel pathway for PHP feature-flag evaluation metrics
mirroring the FfeExposures forwarder. dd-trace-php encodes
`feature_flag.evaluations` counters as OTLP/protobuf in PHP
(via its existing PHP 7-safe `OtlpMetricEncoder`) and ships the
encoded bytes to the sidecar, which POSTs them to the user-configured
OTLP HTTP metrics intake.

Why a sibling action instead of reusing FfeExposures:

- The OTLP collector is not the Datadog Agent. It's user-configurable
  via OTEL_EXPORTER_OTLP_METRICS_ENDPOINT (default
  http://localhost:4318/v1/metrics), so the endpoint travels with the
  payload rather than being derived from the sidecar session's agent
  base URL.
- Content type differs (application/x-protobuf vs application/json).
- No EVP subdomain header.
- The payload is binary protobuf, not a JSON string.

dd-trace-php side (PR DataDog/dd-trace-php#3911) will refactor its
existing `OtlpHttpMetricTransport` (which currently does PHP-side
HTTP I/O, violating the architectural rule "no I/O outside the
sidecar") to call this new FFI.

Validation:

- `cargo test -p datadog-sidecar ffe` passes 7 tests
  (3 exposures + 4 metrics).
- `cargo check -p datadog-sidecar-ffi` clean.
leoromanovsky added a commit that referenced this pull request May 23, 2026
Per Bob's PR review (2026-05-22), the tracer extension must perform no
I/O outside the sidecar. Replaces the raw-socket `AgentExposureTransport`
with `SidecarExposureTransport`, which forwards exposure batches to the
libdatadog sidecar via a new native PHP function `\DDTrace\send_ffe_exposures`
that calls the `ddog_sidecar_send_ffe_exposures` FFI added in
DataDog/libdatadog#2026.

PHP side:

- Delete `Internal/Exposure/AgentExposureTransport.php` (raw socket
  POST to the Agent EVP proxy).
- Add `Internal/Exposure/SidecarExposureTransport.php` that JSON-encodes
  the batch and calls `\DDTrace\send_ffe_exposures()`. Fire-and-forget;
  the sidecar handles retries.
- Update `ExposureWriter::createDefault()` to instantiate the sidecar
  transport.
- Drop the obsolete `testAgentTransportBuildsAgentEvpRequest` PHPUnit
  test (HTTP construction now lives in libdatadog, covered by
  `cargo test -p datadog-sidecar ffe_flusher`).
- Add `Internal/DefaultEvaluationCompletedHook` and
  `Internal/CompositeEvaluationCompletedHook` so production callers go
  through a composite hook factory. In this PR the composite contains
  only `ExposureHook`; the metrics PR (#3911) contributes
  `EvaluationMetricHook` and the file conflict at merge resolves by
  combining both. Update `Client::create()` to call
  `DefaultEvaluationCompletedHook::create()`.

C/Rust bridge:

- Declare `ddog_ByteSlice` (and underlying `ddog_Slice_U8`) in
  `components-rs/common.h` for the metrics path; declare both
  `ddog_sidecar_send_ffe_exposures` and `ddog_sidecar_send_ffe_metrics`
  in `components-rs/sidecar.h`.
- Add C wrappers `ddtrace_sidecar_send_ffe_exposures(zend_string *)`
  and `ddtrace_sidecar_send_ffe_metrics(zend_string *endpoint,
  zend_string *payload_bytes)` in `ext/sidecar.{h,c}` that call the FFI
  with the current sidecar transport + instance id + queue id.
- Declare native PHP functions `\DDTrace\send_ffe_exposures(string): bool`
  and `\DDTrace\send_ffe_metrics(string, string): bool` in
  `ext/ddtrace.stub.php`; add corresponding arginfo entries and
  `ZEND_FUNCTION` registrations in `ext/ddtrace_arginfo.h`; implement
  `PHP_FUNCTION(DDTrace_send_ffe_exposures)` and
  `PHP_FUNCTION(DDTrace_send_ffe_metrics)` in `ext/ddtrace.c`.
- Bump `libdatadog` submodule to FFE branch tip `29762335c` (which
  provides both FFIs). The submodule will be bumped to the libdatadog
  main commit once #2026 merges.

Docs:

- Add `docs/php-ffe-stack/{stack,system}-pr3910.{mmd,png}` for this PR.

Validation:

- `php vendor/bin/phpunit --config phpunit.xml tests/api/Unit/FeatureFlags`
  → 41 tests, 174 assertions, OK.
- libdatadog sidecar tests (`cargo test -p datadog-sidecar ffe_flusher`)
  → 3 passed, on the pinned submodule commit.
- Mermaid PNGs regenerate via `npx @mermaid-js/mermaid-cli`.

`make test_featureflags` and `make test_c TESTS=tests/ext/ffe/...` will
run in CI; running them locally requires rebuilding the extension which
is gated behind libdatadog #2026 merging.
Adds the M3 evaluation-metrics layer on top of the hook PR (#3909) as a
sibling of the EVP exposures PR (#3910). Records `feature_flag.evaluations`
for both PHP 7 (DD Client hook) and PHP 8 (OpenFeature SDK hook); both
paths share `EvaluationMetricHook::sharedWriter()` for unified
aggregation. OTLP/protobuf payloads are encoded in PHP via the existing
`OtlpMetricEncoder` and delivered to the user-configured OTLP HTTP
metrics intake through the libdatadog sidecar (`ddog_sidecar_send_ffe_metrics`
FFI added in DataDog/libdatadog#2026).

This branch is force-pushed (user-authorized one-time exception to the
no-force-push rule, 2026-05-23) to restructure history away from being
linearly stacked on the M2 exposures PR (#3910). The PR now stacks
directly on the hook PR (#3909) as a sibling of the EVP PR.

PHP side:

- Add `Internal/Metric/EvaluationMetricWriter` with bounded series
  aggregation, drop accounting, and shutdown flush.
- Add `Internal/Metric/EvaluationMetricHook` (DD Client hook) and
  `OtlpMetricEncoder` (PHP 7-safe protobuf encoding).
- Add `Internal/Metric/SidecarOtlpMetricsTransport` that calls
  `\DDTrace\send_ffe_metrics()` (FFI declared in #3910). Endpoint
  resolution: `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT`, falling back to
  `OTEL_EXPORTER_OTLP_ENDPOINT + /v1/metrics`, default
  `http://localhost:4318/v1/metrics`.
- Add `DDTrace\OpenFeature\EvalMetricsHook` implementing
  `OpenFeature\interfaces\hooks\Hook` (after + error stages), registered
  on `DataDogProvider` via `setHooks()`.
- `DataDogProvider` constructs its internal DD `Client` with
  `DefaultEvaluationCompletedHook::createWithoutMetric()` so the
  OpenFeature path records the metric via the OpenFeature hook (PR 3911
  scope) and NOT via the DD Client hook — preventing double-counting.
  PHP 7 path keeps recording via the DD Client hook.
- Add `Internal/CompositeEvaluationCompletedHook` and
  `Internal/DefaultEvaluationCompletedHook` (metric-only composite).
  This is the merge-conflict point with PR #3910's `[ExposureHook]`
  composite — second merge resolves by combining both hooks.
- Update `Client::create()` to call `DefaultEvaluationCompletedHook::create()`.
- Drop the obsolete `testOtlpTransportBuildsHttpProtobufRequest` PHPUnit
  test (HTTP construction now lives in libdatadog, covered by
  `cargo test -p datadog-sidecar ffe_metrics_flusher`).
- Add `_files_openfeature.php` entry for `EvalMetricsHook.php`.

C/Rust bridge: the `\DDTrace\send_ffe_metrics()` native function, its C
wrapper `ddtrace_sidecar_send_ffe_metrics()`, and the
`ddog_sidecar_send_ffe_metrics` FFI declaration in `components-rs/sidecar.h`
were already added in #3910. This PR's branch picks up those changes
once #3910 merges (or via the same libdatadog submodule pin during
review). For development locally the libdatadog submodule is pinned to
the FFE branch tip (`29762335c`).

Docs:

- Add `docs/php-ffe-stack/{stack,system}-pr3911.{mmd,png}` per the
  4-PR documentation convention.

Validation:

- `php vendor/bin/phpunit --config phpunit.xml tests/api/Unit/FeatureFlags`
  → 40 tests, 160 assertions, OK.
- Mermaid PNGs regenerate via `npx @mermaid-js/mermaid-cli`.

`make test_featureflags`, OpenFeature PHPUnit, and ffe-dogfooding
end-to-end validation will run in CI / are validated separately by
FOLLOW-05 Steps 4–5.
@leoromanovsky leoromanovsky force-pushed the leo.romanovsky/m3-ffe-evaluation-metrics branch from f021234 to e74b050 Compare May 23, 2026 17:44
@leoromanovsky leoromanovsky changed the base branch from leo.romanovsky/m2-ffe-exposures to leo.romanovsky/m2-m3-evaluation-completed-base May 23, 2026 17:44
@leoromanovsky
Copy link
Copy Markdown
Author

Force-pushed per the 2026-05-23 restructure.

The branch was rebased: dropped the M2 exposure commit from history and reapplied just the M3-specific changes on top of #3909 (hook PR). The OTLP transport was also refactored from a PHP-side raw socket (the prior `OtlpHttpMetricTransport`) to a libdatadog sidecar call (`SidecarOtlpMetricsTransport` → `\DDTrace\send_ffe_metrics()` → `ddog_sidecar_send_ffe_metrics` FFI). This matches the same architectural rule Bob applied to #3910: tracer extension performs no I/O outside the sidecar.

This is the one-time force-push authorized for the 4-PR restructure (no other branches gain force-push permission). `--force-with-lease` was used to fail safely if anything else had updated the remote.

The branch now stacks directly on #3909 instead of on #3910 — sibling structure (M3 sibling of M2 under hook).

CI will be red on the Rust compile job until DataDog/libdatadog#2026 merges and the submodule is bumped to a libdatadog/main commit containing both FFE FFIs. Locally the submodule is pinned to the FFE branch tip `29762335c` for development; `tests/api/Unit/FeatureFlags` 40 tests / 160 assertions pass against that pin.

The SidecarOtlpMetricsTransport::resolveEndpoint() default ("http://localhost:4318/v1/metrics")
doesn't match the system-tests parametric setup, where the PHP test
client receives DD_AGENT_HOST=<test_agent_container> but no
OTEL_EXPORTER_OTLP_METRICS_ENDPOINT. The previous OtlpHttpMetricTransport
(replaced by this transport) derived the OTLP endpoint from DD_AGENT_HOST
+ port 4318. Restore that fallback so system-tests
test_php_ffe_evaluation_metric finds the test-agent OTLP intake.

Resolution order (now matches the old transport):
1. OTEL_EXPORTER_OTLP_METRICS_ENDPOINT (explicit)
2. OTEL_EXPORTER_OTLP_ENDPOINT + /v1/metrics
3. DD_AGENT_HOST + :4318/v1/metrics
4. localhost:4318/v1/metrics
The C/Rust bridge for the new native PHP functions
\DDTrace\send_ffe_exposures() and \DDTrace\send_ffe_metrics() lives on
the M2 EVP exposures PR (#3910) because that PR introduced the bridge
when refactoring the exposure transport. PR #3911 (this PR) needs the
same bridge for its OTLP metrics transport — without it the
SidecarOtlpMetricsTransport silently drops batches because
function_exists('\\DDTrace\\send_ffe_metrics') is false.

Adds the same bridge files here so the M3 branch is independently
compilable. At merge time the two PRs will conflict at the file level on
these bridge files; resolution is deduplication (the bridge is identical
in both PRs by design).

Files added/modified:
- components-rs/sidecar.h: declares ddog_sidecar_send_ffe_exposures and
  ddog_sidecar_send_ffe_metrics FFIs.
- components-rs/common.h: declares ddog_ByteSlice typedef for the
  metrics payload.
- ext/sidecar.h, ext/sidecar.c: C wrappers
  ddtrace_sidecar_send_ffe_exposures() and ddtrace_sidecar_send_ffe_metrics().
- ext/ddtrace.stub.php, ext/ddtrace_arginfo.h, ext/ddtrace.c: declares
  the native PHP functions and the PHP_FUNCTION implementations.
Pulls in libdatadog commit `875ec8f0e` ("fix(sidecar): dispatch FFE
actions before application-entry check"). Without this fix, the
`SidecarOtlpMetricsTransport::send()` call from PHP would silently
no-op for short-lived processes: the sidecar received the
`FfeMetrics` action but dropped it because the `Entry::Occupied`
gate on the application metadata had not yet fired.

This unblocks the parametric system-test
`Test_Feature_Flag_Parametric_Evaluation_Metrics::test_php_ffe_evaluation_metric`
which exercises the full PHP -> sidecar -> OTLP-HTTP-intake path
end-to-end. Local result: 26/27 FFE-scoped parametric tests pass
(remaining failure is the EVP exposure test, which lives on the
M2 PR #3910 branch).
leoromanovsky added a commit that referenced this pull request May 23, 2026
Pulls in libdatadog commit `875ec8f0e` ("fix(sidecar): dispatch FFE
actions before application-entry check") so the EVP exposure batch sent
via `ddog_sidecar_send_ffe_exposures` is no longer silently dropped
when the PHP runtime hasn't yet registered the application against the
sidecar's `QueueId`. Same fix is on the sibling PR #3911 (`2a48c4987")
for the OTLP metric path.

Without this submodule bump, `Test_Feature_Flag_Parametric_Exposures::test_php_ffe_exposure_event`
sees zero EVP POSTs at the test-agent because the sidecar's
`enqueue_actions` handler discards the `FfeExposures` action under
the `Entry::Occupied` gate.
leoromanovsky added a commit that referenced this pull request May 23, 2026
…macOS+colima

`build-debug-artifact` produces a `/output` bind mount via `-v
${TMP_OUT}:/output`. On macOS+colima only paths under $HOME are
mounted into the Linux VM; the macOS default `/var/folders/...` temp
dir is not, so writes from inside the container to `/output` land in
the VM and never propagate back to the host. The script exits without
error and the user's binaries dir is left with whatever stale artifact
was there before.

Pin TMP_OUT/TMP_PKG under $OUTPUT_DIR (which the user just passed as
an absolute, usable destination) so the bind mount is always on a
host-visible path. Inherits cleanup via the existing EXIT trap.

The same fix is independently on the M3 branch (#3911) as part of
e74b050; bringing it to the M2 branch (#3910) so both branches are
buildable on macOS without a `TMPDIR=$HOME/...` override.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant