fix(walkvm): unwind through virtual thread continuation boundaries #450
Fixes walkVM to correctly traverse JVM virtual thread (Project Loom) continuation boundaries, exposing carrier thread frames in wall-clock profiles. Two unwind paths are implemented:

- Path A (enterSpecial): CPU-bound VTs that never yield. All frames are thawed; the profiler recognizes the enterSpecial nmethod by identity and reaches the carrier frames via ContinuationEntry.
- Path B (cont_returnBarrier): blocking VTs that park/unpark. When such a VT is remounted with frozen frames still in its StackChunk, cont_returnBarrier is the return PC of the bottommost thawed frame. It is checked before CodeHeap::findNMethod() because it is a JVM stub, not an nmethod.

By default a synthetic "JVM Continuation" root frame (BCI_NATIVE_FRAME) is inserted at the boundary so the sample is not marked truncated. With wextend=vt_carrier the profiler walks through to the carrier frames; failures emit BCI_ERROR (truthful truncation). The wextend argument is string-parsed and extensible for future flags.

Additional changes:

- Add carrier_frames bit to StackWalkFeatures (uses one padding bit)
- Use FRAME_PC_SLOT for architecture-portable carrier frame extraction
- Split VMContinuationEntry into DECLARE_V21_TYPES_DO to prevent assert(type_size() > 0) on JDK <21 debug builds; expand at all four declare/init/read/verify sites
- Three new counters: WALKVM_CONT_BARRIER_HIT, WALKVM_ENTER_SPECIAL_HIT, WALKVM_CONT_ENTRY_NULL
- isValidFP / isValidSP helpers with unit tests in stackWalker_ut.cpp
- DDPROF_DISABLE_CONT_UNWIND env var for negative testing (DEBUG only)
- Integration tests: VirtualThreadWallClockTest covers both paths on JDK 21+ with wextend=vt_carrier

Resolves SCP-1110

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
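The ordering constraint above (check the return barrier before any nmethod lookup, then match enterSpecial by identity) and the default-vs-`vt_carrier` policy can be sketched as follows. This is a minimal illustration, not the profiler's code: `NMethod`, `FindNMethodFn`, `classifyPC`, `parseWextend`, and `onBoundary` are hypothetical names, and the comma-separated `wextend` syntax with unknown tokens ignored is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for an nmethod; the real profiler uses its own VM structs.
struct NMethod {
    int id;
};

// Hypothetical lookup signature standing in for CodeHeap::findNMethod().
using FindNMethodFn = const NMethod* (*)(uintptr_t pc);

enum class Boundary { None, ReturnBarrier /* Path B */, EnterSpecial /* Path A */ };

// Classifies a return PC seen during the walk. The return-barrier check comes
// first: cont_returnBarrier is a stub, so the nmethod lookup would never match it.
Boundary classifyPC(uintptr_t pc, uintptr_t cont_return_barrier,
                    const NMethod* enter_special, FindNMethodFn find_nmethod) {
    if (cont_return_barrier != 0 && pc == cont_return_barrier) {
        return Boundary::ReturnBarrier;
    }
    const NMethod* nm = find_nmethod(pc);
    if (nm != nullptr && nm == enter_special) {  // pointer identity, not a name match
        return Boundary::EnterSpecial;
    }
    return Boundary::None;
}

struct WalkExtensions {
    bool vt_carrier = false;
};

// Assumed syntax: comma-separated tokens. Unknown tokens are ignored so that
// new flags can be added later without rejecting older arguments.
WalkExtensions parseWextend(const std::string& arg) {
    WalkExtensions ext;
    std::stringstream ss(arg);
    std::string token;
    while (std::getline(ss, token, ',')) {
        if (token == "vt_carrier") {
            ext.vt_carrier = true;
        }
    }
    return ext;
}

enum class BoundaryOutcome {
    SyntheticRoot,  // "JVM Continuation" frame (BCI_NATIVE_FRAME); not truncated
    CarrierFrames,  // walked through to the carrier thread
    ErrorFrame      // BCI_ERROR frame; sample marked truncated
};

// What happens once a continuation boundary is reached.
BoundaryOutcome onBoundary(const WalkExtensions& ext, bool carrier_walk_ok) {
    if (!ext.vt_carrier) {
        return BoundaryOutcome::SyntheticRoot;
    }
    return carrier_walk_ok ? BoundaryOutcome::CarrierFrames : BoundaryOutcome::ErrorFrame;
}
```

Because the barrier test runs first, a PC equal to `cont_returnBarrier` is classified as Path B even if a lookup function would (incorrectly) return something for it.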
CI Test Results
Run: #23807620068 | Commit:

Failed Tests (no detailed failure information available; check the job logs):

- musl-aarch64/debug / 25-librca
- musl-aarch64/debug / 21-librca
- glibc-aarch64/debug / 21
- glibc-aarch64/debug / 25-graal
- musl-amd64/debug / 25-librca
- glibc-aarch64/debug / 25
- glibc-aarch64/debug / 21-graal
- glibc-amd64/debug / 21-graal
- glibc-amd64/debug / 21

Summary: Total: 32 | Passed: 23 | Failed: 9
Updated: 2026-03-31 16:45:45 UTC
ddprof-test/src/test/java/com/datadoghq/profiler/wallclock/VirtualThreadWallClockTest.java: 4 annotations, all marked Fixed
Move JDK 21+ virtual-thread fields (_cont_entry_offset, _cont_return_barrier_addr, _cont_entry_return_pc_addr, _cont_entry_parent_offset) from DECLARE_TYPE_FIELD_DO to a new DECLARE_V21_TYPE_FIELD_DO macro that is excluded from the verify_offsets() assertions. These fields are absent from gHotSpotVMStructs in many JDK 21-26 distributions, which caused SIGABRT in debug builds.

Add a C++ symbol-lookup fallback in resolveOffsets() for StubRoutines::_cont_returnBarrier and ContinuationEntry::_return_pc so that Path A (enterSpecial detection) activates on JDK 21-26 even when vmStructs does not export them.

Guard VMJavaThread::contEntry() against type_size() == 0, and change walkThroughContinuation to accept a path_a flag: on JDK 21-26 the enterSpecial frame FP is derived directly from the current fp rather than via ContinuationEntry::entryFP(), avoiding the assert in that method. Nested continuation tracking is silently skipped on JDK 21-26.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
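Once the enterSpecial frame FP is known (from the current fp under `path_a`, otherwise from `ContinuationEntry::entryFP()`), the carrier frame is read from the `[carrier_fp][carrier_pc][carrier_sp...]` layout described in the PR notes, indexing the PC with `FRAME_PC_SLOT`. The sketch below is illustrative only: `CarrierFrame`, `looksLikeValidFP`, `enterSpecialFP`, and `readCarrierFrame` are hypothetical names, `FRAME_PC_SLOT = 1` for non-ppc64 targets is an assumption, and so is placing the carrier SP in the slot right after the PC slot.

```cpp
#include <cassert>
#include <cstdint>

// Per the PR notes, ppc64 uses FRAME_PC_SLOT = 2; the value 1 used for other
// architectures is an assumption made for this sketch.
#if defined(__powerpc64__)
constexpr int FRAME_PC_SLOT = 2;
#else
constexpr int FRAME_PC_SLOT = 1;
#endif

struct CarrierFrame {
    uintptr_t fp;
    uintptr_t pc;
    uintptr_t sp;
};

// Simplified stand-in for isValidFP: non-null and word aligned only.
// Not the profiler's actual checks (which would likely also test stack bounds).
bool looksLikeValidFP(const uintptr_t* fp) {
    return fp != nullptr && reinterpret_cast<uintptr_t>(fp) % sizeof(uintptr_t) == 0;
}

// Hypothetical: with path_a the walker is already at the enterSpecial frame,
// so its FP is used directly instead of calling ContinuationEntry::entryFP().
const uintptr_t* enterSpecialFP(bool path_a, const uintptr_t* current_fp,
                                const uintptr_t* entry_fp_from_cont_entry) {
    return path_a ? current_fp : entry_fp_from_cont_entry;
}

// Reads [carrier_fp][carrier_pc][carrier_sp...] starting at the enterSpecial FP.
// Assumes the carrier SP begins at the slot after the PC slot.
bool readCarrierFrame(const uintptr_t* enter_fp, CarrierFrame* out) {
    if (!looksLikeValidFP(enter_fp)) {
        return false;  // caller would emit BCI_ERROR when wextend=vt_carrier
    }
    out->fp = enter_fp[0];
    out->pc = enter_fp[FRAME_PC_SLOT];
    out->sp = reinterpret_cast<uintptr_t>(enter_fp + FRAME_PC_SLOT + 1);
    return true;
}
```

Indexing through `FRAME_PC_SLOT` rather than a hard-coded `1` is what keeps the same read correct on an architecture such as ppc64, where the saved return address sits further from the frame pointer.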
    @TestTemplate
    @ValueSource(strings = {"vm", "vmx", "fp", "dwarf"})
    @SuppressWarnings("unused") // cstack is injected by @CStack test-template extension
    public void samplesCarrierFramesFromBlockingVT(@CStack String cstack) throws Exception {

    @TestTemplate
    @ValueSource(strings = {"vm", "vmx", "fp", "dwarf"})
    @SuppressWarnings("unused") // cstack is injected by @CStack test-template extension
    public void samplesCarrierFramesFromCpuBoundVT(@CStack String cstack) throws Exception {
What does this PR do?:
Fixes `walkVM` to correctly unwind through JVM virtual thread (Project Loom) continuation boundaries, exposing carrier thread frames in wall-clock profiles. Implements two unwind paths:

- Path A (`enterSpecial`): CPU-bound VTs that never yield. All frames are thawed, so the profiler recognizes the `enterSpecial` nmethod by identity and reaches the carrier frames via `ContinuationEntry`.
- Path B (`cont_returnBarrier`): blocking VTs that park/unpark. When such a VT is remounted with frozen frames still in its `StackChunk`, `cont_returnBarrier` is the return PC of the bottommost thawed frame. This is a JVM stub (not an nmethod), so it must be checked before `CodeHeap::findNMethod()`.

By default, when a continuation boundary is reached, the profiler inserts a synthetic `"JVM Continuation"` root frame (`BCI_NATIVE_FRAME`) and stops. The VT's own frames are complete and carrier internals are not relevant to most users' profiling, so the sample is not marked truncated.

`wextend=vt_carrier` opt-in: power users can enable carrier frame walking via the new `wextend` argument. With `wextend=vt_carrier` the profiler walks through the continuation boundary to the carrier thread; any failure to do so emits a `BCI_ERROR` frame so the sample is truthfully marked truncated. Additional tokens may be added to `wextend` in the future for other stack-walk extensions.

Parent-chain walking via `ContinuationEntry::_parent` is supported for nested continuations (e.g. a `Continuation.run()` inside a VT, as used by Kotlin coroutines). This is not triggered by standard single-level VTs today, but it will be required once any JVM language runtime layers continuations on top of VTs.

Motivation:
Before this fix, wall-clock profiles of applications using Java 21+ virtual threads showed truncated stack traces: carrier thread frames (`ForkJoinWorkerThread`) were never visible. The root causes were:

- `cont_returnBarrier` is a JVM stub, not in the nmethod table, so the check for it was dead code placed after `findNMethod()`.
- `enterSpecial` detection was missing entirely.

Additional Notes:

- The `DDPROF_DISABLE_CONT_UNWIND=1` env var disables both unwind paths at runtime (DEBUG builds only) for negative testing.
- Three new counters: `WALKVM_CONT_BARRIER_HIT`, `WALKVM_ENTER_SPECIAL_HIT`, `WALKVM_CONT_ENTRY_NULL`.
- The `isValidFP` / `isValidSP` helpers guard all dereferences; any remaining SIGSEGV from stale pointers is caught by the existing setjmp crash protection in `walkVM`.
- `ContinuationEntry::entryFP()` layout: `[ContinuationEntry bytes][carrier_fp][carrier_pc][carrier_sp...]`, confirmed against the OpenJDK source. Uses `FRAME_PC_SLOT` for architecture portability (ppc64 has `FRAME_PC_SLOT=2`).
- `VMContinuationEntry` is split into `DECLARE_V21_TYPES_DO` (separate from `DECLARE_TYPES_DO`) so that `verify_offsets()` in DEBUG builds does not assert `type_size() > 0` on JDK < 21.
- A `carrier_frames` feature bit is added to `StackWalkFeatures`.

How to test the change?:
Integration tests:

`VirtualThreadWallClockTest` covers both paths on JDK 21+ with `wextend=vt_carrier`:

- `samplesCarrierFramesFromCpuBoundVT` verifies Path A (`enterSpecial`)
- `samplesCarrierFramesFromBlockingVT` verifies Path B (`cont_returnBarrier`)

Both tests assert that `ForkJoinWorkerThread` carrier frames appear in at least one wall-clock sample from the virtual thread. They are skipped on JDK < 21.

Unit tests:
`stackWalker_ut.cpp` covers the boundary conditions of `StackWalkValidation::isValidFP`, `isValidSP`, and `dropUnknownLeaf`.

Run with: `./gradlew ddprof-test:testRelease` (or `testDebug` to also exercise the negative-test path).

For Datadog employees: