EW-10817 Add HibernationManager unit tests by harrishancock · Pull Request #6732 · cloudflare/workerd

harrishancock · 2026-05-06T10:43:31Z

Adds a kj_test suite covering normal HibernationManager behavior (basic comm, close, binary, auto-response, multi-IR, multi-WS, hibernation flows, output gate interactions) along with regression tests for EW-10817 that use KJ_EXPECT_LOG to capture the production "another message send is already in progress" assertion. The regression tests pass while the bug exists and fail loudly when the fix lands. Code comments document each test's purpose, lifecycle, and known workarounds.

Supporting TestFixture additions: SetupParams::actorLoopback so the actor and the HibernationManager share one Loopback; newIoContext() and newIncomingRequest(IoContext&) for multiple IRs per IoContext; drainAndDestroy() and pollEventLoop() helpers; getActor() and getTimerChannel() accessors so tests can construct the HM outside any IoContext; resetActor() to simulate eviction by reconstructing the Worker::Actor with the same id and Loopback.

Two workarounds for known bugs are noted at their use sites: a leaked api::WebSocket ref to dodge the AsyncObject destructor issue, and an explicit end1->receive() drain in some EW-10817 repros to consume the orphaned BlockedSend at teardown. Both go away once EW-10817 is fixed.

Assisted-by: OpenCode:claude-opus-4.7

ask-bonk · 2026-05-06T10:45:27Z

That's correct. The alloc<T>(args...) passes env.js plus the websocket as constructor arguments. The WebSocket constructor takes (jsg::Lock&, kj::Own<kj::WebSocket>). This is the standard pattern.

After careful review, I believe this is a high-quality PR. The tests are well-structured, well-commented, and the TestFixture additions are properly designed. The intentional leaks and workarounds are clearly documented with TODO references to the bug being tracked. The BUILD.bazel changes are correct.

The only potential issue I see is minor: the enterContext method constructs an Environment without initializing the features field, but this is a pre-existing pattern (also present in runInIoContext) and the HM tests don't use features, so it's not actionable for this PR.

LGTM

github run

Adds a kj_test suite covering normal HibernationManager behavior (basic comm, close, binary, auto-response, multi-IR, multi-WS, hibernation flows, output gate interactions) along with regression tests for EW-10817 that use KJ_EXPECT_LOG to capture the production "another message send is already in progress" assertion. The regression tests pass while the bug exists and fail loudly when the fix lands. Code comments document each test's purpose, lifecycle, and known workarounds. Supporting TestFixture additions: SetupParams::actorLoopback so the actor and the HibernationManager share one Loopback; newIoContext() and newIncomingRequest(IoContext&) for multiple IRs per IoContext; drainAndDestroy() and pollEventLoop() helpers; getActor() and getTimerChannel() accessors so tests can construct the HM outside any IoContext; resetActor() to simulate eviction by reconstructing the Worker::Actor with the same id and Loopback. Two workarounds for known bugs are noted at their use sites: a leaked api::WebSocket ref to dodge the AsyncObject destructor issue, and an explicit end1->receive() drain in some EW-10817 repros to consume the orphaned BlockedSend at teardown. Both go away once EW-10817 is fixed. Assisted-by: OpenCode:claude-opus-4.7

jclee · 2026-05-08T02:21:46Z

+  // Promise — the Worker::Lock is only valid for the synchronous duration of the
+  // callback). The context is NOT destroyed afterwards — the caller still owns the
+  // IncomingRequest.
+  template <typename CallBack>


Nit: "CallBack" is consistent with the pre-existing template var for "runInIoContext()" above, but I'd kind of expect it to be "Callback" instead (here and elsewhere in the file), since it's referring to the single-word noun, not the verb.

jclee · 2026-05-08T02:27:28Z

+  // Drain an IncomingRequest (waiting on its waitUntil tasks) and then destroy it. Tests should
+  // use this rather than letting the IncomingRequest's Own go out of scope, otherwise the
+  // IncomingRequest destructor logs a warning about un-drained waitUntil tasks. Production code
+  // paths always drain.


If it's important to drain, I suppose it might be possible to just have the fixture track the outstanding requests and drain them automatically in its destructor.

But having the explicit drain is fine, too... and maybe offers finer-grained control, or is needed for specific use cases?

Ah, I'm seeing the tests are using this to selectively drain during the test rather than just at the end... and in some of the repros, selectively not draining the request, presumably because it's unnecessary in those cases.

jclee · 2026-05-08T02:29:24Z

+    auto* leaked = new jsg::Ref<api::WebSocket>(apiWs.addRef());
+#if KJ_HAS_COMPILER_FEATURE(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
+    __lsan_ignore_object(leaked);
+#else
+    (void)leaked;
+#endif


Heh... I didn't actually realize we had leak detection actively running in asan tests.

jclee · 2026-05-08T02:33:04Z

+  KJ_ASSERT(stats.customEventCalls >= 1, "expected at least one customEvent dispatch",
+      stats.customEventCalls);


I was curious... Is there a reason we aren't asserting for exactly one event? I suppose part of it is that it would be difficult to introspect the custom event type?

jclee · 2026-05-08T02:49:50Z

+  // The KJ_EXPECT_LOG below captures the bug's symptom (the assertion's ERROR log) so the
+  // test passes while EW-10817 is open. When the bug is fixed, the log won't fire and the
+  // KJ_EXPECT_LOG will fail — that's the signal to update this test (delete the EXPECT_LOG,
+  // delete the workaround end1->receive() drain at the end, and add positive assertions about
+  // what should happen instead).
+  KJ_EXPECT_LOG(ERROR, "another message send is already in progress");


I'm guessing the log is actually emitted either during the fixture.enterContext() or fixture.pollEventLoop() calls? In which case, might not hurt to group the triggering call in a block with the log expectation. But not a big deal, especially considering it's temporary.

Oh... Now I'm seeing this is probably to be consistent with the other tests having their log expectations upfront. That's fine, too.

jclee

Looks good...

It's great to have unit test coverage for this, thanks! I guess we'd been relying mostly on integration tests for coverage, but of course that makes it very hard to test precise event orderings like this.

harrishancock requested review from a team as code owners May 6, 2026 10:43

harrishancock force-pushed the harris/EW-10817-hibernation-manager-tests branch from a8642ec to a07592b Compare May 6, 2026 11:23

harrishancock force-pushed the harris/EW-10817-hibernation-manager-tests branch from a07592b to fd5b913 Compare May 6, 2026 14:20

connyay mentioned this pull request May 7, 2026

Add binary support to setWebSocketAutoResponse #6738

Draft

jclee reviewed May 8, 2026

View reviewed changes

jclee approved these changes May 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EW-10817 Add HibernationManager unit tests#6732

EW-10817 Add HibernationManager unit tests#6732
harrishancock wants to merge 1 commit intomainfrom
harris/EW-10817-hibernation-manager-tests

harrishancock commented May 6, 2026

Uh oh!

ask-bonk Bot commented May 6, 2026

Uh oh!

jclee May 8, 2026 •

edited

Loading

Uh oh!

jclee May 8, 2026

Uh oh!

jclee May 8, 2026

Uh oh!

jclee May 8, 2026

Uh oh!

jclee May 8, 2026

Uh oh!

jclee May 8, 2026 •

edited

Loading

Uh oh!

jclee left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		KJ_ASSERT(stats.customEventCalls >= 1, "expected at least one customEvent dispatch",
		stats.customEventCalls);

Conversation

harrishancock commented May 6, 2026

Uh oh!

ask-bonk Bot commented May 6, 2026

Uh oh!

jclee May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jclee May 8, 2026

Choose a reason for hiding this comment

Uh oh!

jclee May 8, 2026

Choose a reason for hiding this comment

Uh oh!

jclee May 8, 2026

Choose a reason for hiding this comment

Uh oh!

jclee May 8, 2026

Choose a reason for hiding this comment

Uh oh!

jclee May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jclee left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jclee May 8, 2026 •

edited

Loading

jclee May 8, 2026 •

edited

Loading