Fix WorkQueue destructor deadlock by merging into AppRuntime #147
Open
bghgary wants to merge 3 commits into BabylonJS:main
Pull request overview
This PR fixes a shutdown deadlock in the JavaScript worker queue by inlining the former WorkQueue implementation into Babylon::AppRuntime and changing cancellation to be dispatched as a queued work item, then adds a deterministic regression test using arcana testing hooks to reproduce the old race.
Changes:
- Merge WorkQueue into AppRuntime (thread, dispatcher, cancellation, suspend/resume) and cancel via queued work item.
- Add a deterministic unit test (DestroyDoesNotDeadlock) guarded by ARCANA_TESTING_HOOKS.
- Update test harness / build configuration (Win32 gtest argv init; arcana fork + ARCANA_TESTING_HOOKS definition).
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| Tests/UnitTests/Win32/App.cpp | Initializes gtest with argc/argv and runs tests directly. |
| Tests/UnitTests/Shared/Shared.cpp | Adds deterministic deadlock regression test using arcana hooks. |
| Core/AppRuntime/Source/WorkQueue.h | Removes standalone WorkQueue (merged into AppRuntime). |
| Core/AppRuntime/Source/WorkQueue.cpp | Removes standalone WorkQueue implementation. |
| Core/AppRuntime/Source/AppRuntime.cpp | Moves queue/thread/cancel/suspend logic into AppRuntime and cancels via queued work item. |
| Core/AppRuntime/Source/AppRuntime_JSI.cpp | Switches V8 JSI task runner adapter to post tasks via AppRuntime::Dispatch. |
| Core/AppRuntime/Include/Babylon/AppRuntime.h | Adds dispatcher/thread/cancellation members and in-class Append helper. |
| Core/AppRuntime/CMakeLists.txt | Removes WorkQueue sources from build. |
| CMakeLists.txt | Switches arcana dependency to a fork and defines ARCANA_TESTING_HOOKS when tests are enabled. |
WorkQueue::~WorkQueue() had a race condition where cancel() + notify_all() fired without the queue mutex, so the signal could be lost if the worker thread hadn't entered condition_variable::wait() yet, causing join() to hang forever.

This change merges WorkQueue into AppRuntime, eliminating split-lifetime issues, and dispatches cancellation as a work item via Append(). Since push() acquires the queue mutex, it blocks until the worker enters wait(), guaranteeing the notification is delivered.

Changes:
- Merge WorkQueue members into AppRuntime (thread, dispatcher, cancel source, env, suspension lock)
- Remove WorkQueue.h and WorkQueue.cpp
- Update AppRuntime_JSI.cpp TaskRunnerAdapter to use AppRuntime::Dispatch
- Add deterministic regression test using arcana testing hooks
- Fix member declaration order so m_options outlives worker thread

The regression test uses arcana::set_before_wait_callback() to sleep while holding the queue mutex before wait(), deterministically triggering the race. See BabylonJS#146 for the test running against the old broken code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
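
The idea of dispatching cancellation as a work item can be illustrated with a minimal sketch. This is not the code from the PR: MiniRuntime, Run, m_cancelled, and RunTasksThenDestroy are hypothetical names, and the queue is a plain std::queue instead of the arcana dispatcher. Only Append is borrowed from the description above. The sketch assumes nothing else calls Append while the destructor is running.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for the merged runtime: one worker thread draining a
// mutex-protected queue. These names are illustrative, not taken from the PR.
class MiniRuntime
{
public:
    MiniRuntime() : m_thread{[this] { Run(); }} {}

    ~MiniRuntime()
    {
        // Cancellation is itself a work item, so it goes through the same
        // mutex-protected push as every other task and cannot be missed.
        Append([this] { m_cancelled = true; });
        m_thread.join();
        // The cancel item was the last one queued, so nothing is left behind.
        assert(m_queue.empty());
    }

    void Append(std::function<void()> work)
    {
        {
            std::lock_guard<std::mutex> lock{m_mutex};
            m_queue.push(std::move(work));
        }
        m_ready.notify_one();
    }

private:
    void Run()
    {
        while (!m_cancelled)
        {
            std::function<void()> work;
            {
                std::unique_lock<std::mutex> lock{m_mutex};
                // The predicate is checked under the mutex, so a push that
                // happens before the wait is seen instead of being lost.
                m_ready.wait(lock, [this] { return !m_queue.empty(); });
                work = std::move(m_queue.front());
                m_queue.pop();
            }
            work();
        }
    }

    bool m_cancelled{false}; // read and written only on the worker thread
    std::mutex m_mutex;
    std::condition_variable m_ready;
    std::queue<std::function<void()>> m_queue;
    std::thread m_thread; // declared last so it starts after the other members
};

// Queues `count` tasks, destroys the runtime, and reports how many tasks ran.
int RunTasksThenDestroy(int count)
{
    std::atomic<int> ran{0};
    {
        MiniRuntime runtime;
        for (int i = 0; i < count; ++i)
        {
            runtime.Append([&ran] { ++ran; });
        }
    } // destructor queues the cancel item behind the tasks, then joins
    return ran.load();
}
```

Because the queue is FIFO, every task appended before destruction runs before the cancel item, so in this sketch RunTasksThenDestroy(100) returns 100. Declaring the thread member last mirrors the declaration-order point in the change list: the worker should not start before the state it uses exists.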
Fix the WorkQueue destructor deadlock by merging WorkQueue into AppRuntime and dispatching cancellation as a work item.
The Bug
WorkQueue::~WorkQueue() cancelled from the main thread via cancel() + notify_all(). The notify fired without holding the queue mutex, so if the worker thread hadn't entered condition_variable::wait() yet, the signal was lost and join() hung forever. See #146 for a deterministic repro of the deadlock against the old code.
The Fix
Regression Test
Includes a deterministic test using arcana testing hooks (microsoft/arcana.cpp#59) that sleeps while holding the queue mutex before wait(). This guarantees the worker is in the vulnerable window when destruction fires. The test passes with this fix and deadlocks with the old code (#146).
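
The shape of such a test can be sketched without arcana. Everything below is an assumption made for illustration: g_beforeWait is a hypothetical hook invented here (it is not the arcana API and its signature is not taken from arcana), and HookedWorker is a toy queue that already cancels via a queued work item. The hook runs on the worker while it holds the queue mutex, just before it waits, and the test destroys the worker only once the hook has been entered.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical test hook, invoked with the queue mutex held right before the
// worker waits. Invented for this sketch; not the arcana hook.
std::function<void()> g_beforeWait;

class HookedWorker
{
public:
    HookedWorker() : m_thread{[this] { Run(); }} {}

    ~HookedWorker()
    {
        // Cancel via a queued item: the push blocks on the mutex until the
        // worker has released it by entering wait().
        Append([this] { m_cancelled = true; });
        m_thread.join();
        assert(m_queue.empty());
    }

    void Append(std::function<void()> work)
    {
        {
            std::lock_guard<std::mutex> lock{m_mutex};
            m_queue.push(std::move(work));
        }
        m_ready.notify_one();
    }

private:
    void Run()
    {
        while (!m_cancelled)
        {
            std::function<void()> work;
            {
                std::unique_lock<std::mutex> lock{m_mutex};
                while (m_queue.empty())
                {
                    if (g_beforeWait) g_beforeWait(); // widen the window
                    m_ready.wait(lock);
                }
                work = std::move(m_queue.front());
                m_queue.pop();
            }
            work();
        }
    }

    bool m_cancelled{false}; // worker thread only
    std::mutex m_mutex;
    std::condition_variable m_ready;
    std::queue<std::function<void()>> m_queue;
    std::thread m_thread;
};

// Destroys the worker while it sits in the hook, just before wait().
// Returns how many times the hook ran; it only returns if destruction
// does not deadlock.
int DestroyWhileWorkerIsBeforeWait(std::chrono::milliseconds delay)
{
    std::atomic<int> hookCalls{0};
    std::promise<void> entered;
    std::future<void> enteredFuture = entered.get_future();
    g_beforeWait = [&] {
        if (hookCalls++ == 0) entered.set_value();
        std::this_thread::sleep_for(delay);
    };
    {
        HookedWorker worker;
        enteredFuture.wait(); // worker now holds the mutex, about to wait
    } // destructor runs inside the vulnerable window
    g_beforeWait = nullptr;
    return hookCalls.load();
}
```

The function returning at all is the property under test. The hook count is at least 1 because the test waits for the first call; it is not asserted to be exactly 1, since a spurious wakeup could in principle trigger the hook again.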
Dependencies