Skip to content

Refactor jsRpcSession span parenting and ownership#6704

Closed
Ankcorn wants to merge 2 commits intomainfrom
tankcorn/jsrpc-span-parenting
Closed

Refactor jsRpcSession span parenting and ownership#6704
Ankcorn wants to merge 2 commits intomainfrom
tankcorn/jsrpc-span-parenting

Conversation

@Ankcorn
Copy link
Copy Markdown
Member

@Ankcorn Ankcorn commented Apr 30, 2026

Prep for native jsRpcSession span enrichment.

Background

jsRpcSession is the user span for one RPC session. Today it's created by Fetcher::getClientForOneCall (HTTP) and kept alive via .attach() on the
WorkerInterface. AIG enrichment needs to write to this span from callImpl's response handler — but .attach() makes it unreachable, and Fetcher constructing an
RPC-layer event blurs HTTP/RPC layering.

Commit 1 — Test: jsRpcSession parented to enclosing enterSpan

RPC analog of fetchInsideEnterSpan. Locks in the contract before moving the span.

Commit 2 — Move span ownership into JsRpcSessionCustomEvent

  1. Lifetime alignment. The event's lifetime is the session. Was correct via .attach() ordering (side effect); now correct by construction.
  2. Reachability. .attach() hides the object; AIG needs a named path. event->jsRpcSessionSpan is that path.
  3. Layering (partial). "jsRpcSession" string moves out of generic HTTP plumbing into Fetcher::getJsRpcClient returning {worker, span}. OutgoingFactory
    variants pass SpanBuilder(nullptr) — they already make a durable_object_subrequest span.

Commit 3 — Move event construction into callImpl

  1. Layering (completion). HTTP no longer instantiates RPC events. Fetcher produces {worker, span}; callImpl names and uses them.
  2. No cross-boundary pointer. AIG needs &event->jsRpcSessionSpan in callImpl's response lambda; constructing the event there keeps it in-scope. Otherwise
    the AIG PR would need a ClientForOneCall { client, SpanBuilder* } struct on the provider API.
  3. Honest interface. JsRpcClientProvider had one virtual hiding two shapes. Now two, each defaulted:
    • getClientForOneCall — cap-holders (JsRpcStub, JsRpcPromise).
    • tryGetJsRpcSessionClient — session-creators (Fetcher).

Note. Both virtuals append to a shared path vector. callImpl gives the session attempt a scratch vector and only commits on success, so forwarders can
append unconditionally. (Caught early — 4 pipelining tests failed before the fix.)

Net effect

Before After
Span owner .attach()-ed to WorkerInterface Member of JsRpcSessionCustomEvent
Event constructor Fetcher (HTTP) callImpl (RPC)
JsRpcClientProvider One virtual, two shapes Two virtuals, one each
AIG coupling Needs struct + raw pointer Reads event->jsRpcSessionSpan

@Ankcorn Ankcorn requested review from a team as code owners April 30, 2026 15:15
@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk Bot commented Apr 30, 2026

UnknownError: ProviderInitError

github run

@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk Bot commented Apr 30, 2026

@Ankcorn Bonk workflow failed. Check the logs for details.

View workflow run · To retry, trigger Bonk again.

Copy link
Copy Markdown
Contributor

@ask-bonk ask-bonk Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm Bonk, and I've done a quick review of your PR.

This lifts the jsRpcSession span out of TraceContext and onto JsRpcSessionCustomEvent, so the span is reachable from the event and available for enrichment in the followup PR. The span lifetime and ownership model looks correct.

One minor note (low severity):

Comment thread src/workerd/api/http.c++ Outdated
@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk Bot commented Apr 30, 2026

Review posted successfully. Here's a summary of my analysis:

Overall assessment: The PR is well-structured. The span lifetime model is correct — SpanBuilder is move-only with RAII semantics, and storing it on JsRpcSessionCustomEvent gives the span the right lifetime (session duration). The SpanParent::addRef() usage correctly creates independent ref-counted references. The test coverage is good, adding a new jsRpcInsideEnterSpan test case that validates the span parenting via the tail worker's hierarchy-aware collector.

One minor issue flagged: The new getJsRpcClient method constructs SubrequestMetadata without featureFlagsForFl, which the old code path through getSubrequestChannelImpl did set. This is unlikely to matter in practice for RPC sessions but is a behavioral difference worth noting. I posted a suggestion comment with the fix.

github run

@Ankcorn Ankcorn force-pushed the tankcorn/jsrpc-span-parenting branch from 04e17f6 to b28449b Compare April 30, 2026 16:48
@Ankcorn
Copy link
Copy Markdown
Member Author

Ankcorn commented Apr 30, 2026

@ask-bonk review this PR

Copy link
Copy Markdown
Contributor

@ask-bonk ask-bonk Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The featureFlagsForFl issue from my previous review has been addressed — the uint channel case now populates it via the metadataExtra callback.

On re-review, the span ownership model is sound and the code is correct. One minor nit below on a comment that looks like it has placeholder text left over from editing.

This review was generated by an AI assistant (Bonk) and may contain mistakes.

Comment thread src/workerd/api/http.c++ Outdated
@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk Bot commented Apr 30, 2026

Review posted successfully. Here's a summary of the re-review:

Previous feedback status: The featureFlagsForFl issue I flagged in the first review has been addressed — the uint channel case now populates it via the metadataExtra callback at src/workerd/api/http.c++:2440-2444.

Re-review findings: The code is correct. The span ownership model is sound:

  • SpanBuilder is move-only with RAII end() in its destructor — storing it on JsRpcSessionCustomEvent gives the correct session-scoped lifetime.
  • SpanParent::addRef() creates independent ref-counted references via kj::Refcounted, so the SubrequestMetadata gets its own reference to the observer.
  • No redundant span is created because getSubrequest is called without operationName, skipping the makeUserTraceSpan path in getSubrequestNoChecks.
  • All lambdas capture by reference ([&]) and are called synchronously, so no dangling references.

One minor nit posted: A comment at line 2414 in http.c++ has placeholder text that appears to have been swallowed (angle brackets rendered as empty HTML tags).

github run

@Ankcorn Ankcorn force-pushed the tankcorn/jsrpc-span-parenting branch 2 times, most recently from fdc5681 to 04aa1ca Compare April 30, 2026 21:19
@Ankcorn
Copy link
Copy Markdown
Member Author

Ankcorn commented Apr 30, 2026

@ask-bonk review this PR

Copy link
Copy Markdown
Contributor

@ask-bonk ask-bonk Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both issues from prior reviews (missing featureFlagsForFl, truncated comment) are addressed. On re-review the refactoring is correct — no new issues found.

Summary of verification:

  • Span lifetime. SpanBuilder is move-only with RAII end() in its destructor. Storing it on JsRpcSessionCustomEvent gives it session-scoped lifetime, matching the logical span duration. The event is kept alive by .attach() on the customEvent() promise. Correct.
  • SpanParent ref-counting. SpanParent(SpanBuilder&) calls mapAddRef on the builder's observer (a kj::Refcounted). addRef() does the same. Both the sessionSpan builder and the SubrequestMetadata::userSpanParent hold independent kj::Own references to the same observer. Correct.
  • Capture-by-reference safety. All [&] lambdas in withSessionSpan / getSubrequest are called synchronously (kj::FunctionParam is non-owning by design). No dangling references.
  • Scratch path vector. callImpl uses a separate sessionPath vector for tryGetJsRpcSessionClient and only kj::mvs it into path on success. JsRpcProperty appending unconditionally to the scratch vector is harmless since it's discarded on the kj::none fallback. The StringPtrs point to JsRpcProperty::name members that outlive the synchronous methodPath construction. Correct.
  • getCap() before event move. getCap() moves the promise-side cap out of the event; the cap then lives independently via capnp's refcounting. Moving the event into customEvent() afterward is safe — same pattern as the original code.
  • OutgoingFactory variants. Returning SpanBuilder(nullptr) for DO/cross-process actor factories correctly skips the jsRpcSession span. These factories produce their own outer span (e.g. durable_object_subrequest), so no redundant span is created. Matches prior behavior.
  • Test. The new jsRpcInsideEnterSpan test validates that getCurrentUserTraceSpan().newChild() picks up the enterSpan-pushed AsyncContextFrame, mirroring the existing fetchInsideEnterSpan case for HTTP. The .wd-test correctly adds the rpc compat flag and the entrypoint binding.

This review was generated by an AI assistant (Bonk) and may contain mistakes.

@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk Bot commented Apr 30, 2026

Review posted successfully. Here's a summary:

Both issues from my prior reviews have been addressed (featureFlagsForFl now populated in the uint channel path, comment text restored). On this re-review, the refactoring is correct with no new issues found.

I verified seven aspects of correctness:

  • Span lifetime — RAII SpanBuilder on JsRpcSessionCustomEvent gives session-scoped lifetime
  • SpanParent ref-countingaddRef() creates independent refcounted references, no use-after-free
  • Capture-by-reference safety — all [&] lambdas in withSessionSpan/getSubrequest are called synchronously via kj::FunctionParam
  • Scratch path vector — discarded on kj::none fallback, StringPtrs outlive their use
  • getCap() before event move — cap lives independently via capnp refcounting
  • OutgoingFactory variants — correctly skip jsRpcSession span (matching prior behavior)
  • Test coverage — new jsRpcInsideEnterSpan properly validates span parenting through AsyncContextFrame

github run

@Ankcorn Ankcorn changed the title Tankcorn/jsrpc span parenting Refactor jsRpcSession span parenting and ownership Apr 30, 2026
@Ankcorn Ankcorn force-pushed the tankcorn/jsrpc-span-parenting branch 2 times, most recently from 685b7fa to b367ed8 Compare April 30, 2026 23:28
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 1, 2026

Merging this PR will not alter performance

✅ 72 untouched benchmarks
⏩ 129 skipped benchmarks1


Comparing tankcorn/jsrpc-span-parenting (63ac6bd) with main (7d0a721)

Open in CodSpeed

Footnotes

  1. 129 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 1, 2026

Codecov Report

❌ Patch coverage is 79.48718% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.61%. Comparing base (7d0a721) to head (63ac6bd).

Files with missing lines Patch % Lines
src/workerd/api/http.c++ 75.75% 3 Missing and 5 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6704   +/-   ##
=======================================
  Coverage   66.61%   66.61%           
=======================================
  Files         402      402           
  Lines      115803   115841   +38     
  Branches    19395    19401    +6     
=======================================
+ Hits        77139    77166   +27     
- Misses      27094    27099    +5     
- Partials    11570    11576    +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@kentonv kentonv self-requested a review May 4, 2026 13:12
Comment thread src/workerd/api/worker-rpc.h Outdated
// If this isn't the root object (i.e. this is a JsRpcProperty), the property path starting from
// the root object will be appended to `path`.
//
// For cap-holding providers (JsRpcStub, JsRpcPromise), this returns the live cap. Session-
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be breaking the abstraction that this interface is meant to provide. What's the point of having a shared interface between Fetcher and JsRpcStub if they implement different methods and have to be called differently?

The right answer here is to push the Fetcher-specific logic into getClientForOneCall. It's not immediately obvious to me why this can't be done here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the point here is that #6695 needs access to the SpanBuilder for the session, we can extend getClientForOneCall() with an extra output parameter to let it fill that in when it's a top-level request.

@kentonv
Copy link
Copy Markdown
Member

kentonv commented May 4, 2026

I wonder if this should even be JSRPC-specific at all, or if it should apply to all event types (e.g. including fetch events)? Seems like you could benefit from this applying to all cross-worker subrequests.

Perhaps, then, it makes more sense to extend IoChannelFactory::SubrequestMetadata with a callback for enriching the parent span. When dispatched over Cap'n Proto RPC, we should convert that to an RPC callback object. In the internal codebase this would end up being another parameter to the Pipeline.startEvent() RPC.

Ankcorn added 2 commits May 5, 2026 13:09
…sionCustomEvent

Per Dan's spec, the user-visible jsRpcSession span belongs in
JsRpcSessionCustomEvent (the membrane event itself), not at the
session-dispatch site in Fetcher::getClient. Move it accordingly:

- Server side, in run(): wraps the membrane lifetime, from delivered()
  through to all caps being dropped. Parented to the caller's enclosing
  user span via USER_SPAN_CONTEXT_PROPAGATION (the metadata.userSpanParent
  set on the client side flows here as the IoContext's current user span).
- Client side, in sendRpc(): only fires for cross-process dispatch
  (in-process service bindings reach the callee via WorkerEntrypoint::
  customEvent -> event->run() and never enter sendRpc). Wraps the wire
  round-trip. Parented to the caller's enclosing user span when an
  IoContext is in scope; constructs as unobserved otherwise so capnp
  dispatch contexts that lack an IoContext don't crash.

Fetcher::getClientForOneCall stops going through getClient(... "jsRpcSession")
and instead inlines the channel switch to:
  - Construct an internal trace span (still needed for SubrequestMetadata
    .parentSpan ID propagation to the callee's nested internal subrequests).
  - Pass getCurrentUserTraceSpan() as metadata.userSpanParent so the new
    server-side jsRpcSession span lands as a direct child of the caller's
    enclosing user span.
  - Skip user-span synthesis at the dispatch site -- the user-visible span
    is now emitted by JsRpcSessionCustomEvent itself.

Internal trace span layout is unchanged: it stays at the dispatch site
because its ID has to be available when SubrequestMetadata is assembled,
which happens before the event runs. Internal-trace asymmetry vs the user
trace is a deliberate trade-off (see worker-rpc.c++ comment).

Tail-worker-test fixtures updated:
  - jsrpcDoSubrequest (caller): loses the client-side jsRpcSession user
    spans; only durable_object_subrequest remains as a client-side user
    span.
  - Callee fixtures (myActorJsrpc, jsrpcNonFunction, jsrpcGetCounter):
    wrapped with spanOpen/spanClose for the new server-side jsRpcSession.
  - expectedWithPropagation: callees no longer attach as direct children
    of jsrpcDoSubrequest. The buildTree heuristic uses sequential
    per-invocation span IDs that collide across invocations in the same
    trace, so it can attach a callee under whichever invocation it last
    indexed at the colliding ID. The shape reflects the heuristic's
    limitation, not actual misparenting -- the underlying trigger
    contexts and traceIds remain correct.
Adds Case 7 to tracing-hierarchy-instrumentation-test: an RPC call made
inside enterSpan() must produce a jsRpcSession user span whose parent
context is the enterSpan, verifying the USER_SPAN_CONTEXT_PROPAGATION
flow across the membrane after moving jsRpcSession to the server-side
JsRpcSessionCustomEvent::run.

The server-side jsRpcSession lives on the *callee*'s invocation, so
direct parentSpanId equality with the caller's enterSpan doesn't work:
each invocation's SequentialSpanSubmitter mints span IDs starting at 1,
while cross-invocation references in the streaming-tail (the trigger
context on the callee's onset) carry the real 64-bit spanId. Instead
this test asserts the structural propagation we actually care about:
  - jsRpcSession lives on a separate callee invocation,
  - and, when USER_SPAN_CONTEXT_PROPAGATION is enabled (@all-autogates),
    the callee shares the caller's traceId and onset's trigger context
    references a span on the caller's invocation.

The instrumentation-test-helper collector is extended to capture
per-invocation traceId and triggerSpanId from onset events, accessible
to assertions via state.invocations.

The wd-test mock service now also streams tails so callee-side spans
are observable in the same tail stream the test inspects.
@Ankcorn Ankcorn force-pushed the tankcorn/jsrpc-span-parenting branch from a1dafc5 to 63ac6bd Compare May 5, 2026 13:09
@Ankcorn
Copy link
Copy Markdown
Member Author

Ankcorn commented May 6, 2026

Thanks for the review - I was working through some stuff with Dan and this PR was kinda dealing with 2 slightly different intentions at the same time so I got mega confused. I'm trying to break down more so closing this in favour of #6734

@Ankcorn Ankcorn closed this May 6, 2026
@danlapid
Copy link
Copy Markdown
Collaborator

danlapid commented May 7, 2026

I wonder if this should even be JSRPC-specific at all, or if it should apply to all event types (e.g. including fetch events)? Seems like you could benefit from this applying to all cross-worker subrequests.

Perhaps, then, it makes more sense to extend IoChannelFactory::SubrequestMetadata with a callback for enriching the parent span. When dispatched over Cap'n Proto RPC, we should convert that to an RPC callback object. In the internal codebase this would end up being another parameter to the Pipeline.startEvent() RPC.

That would be really nice but not all of our subrequests are over http, for example, some subrequests happen to be transported over http, imagine a worker-to-worker cached subrequest that runs over channel tokens over http over cache.

How would we then extend this implementation idea to also work for fetch entrypoints?

Building this into the jsrpc session mechanism is clear as we must have bidirectional RPC for jsrpc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants