Skip to content

PyCode_Addr2Line shouldn't lock in free threaded Python #148037

@colesbury

Description

@colesbury

Bug report

In #138664 we added a critical section on the code object in PyCode_Addr2Line to fix crashes involving profiling.

There are two problems with using a lock here:

  1. Code objects are shared across threads and PyCode_Addr2Line is a frequently function, leading to contention during profiling.

  2. This introduces a stop-the-world safe point in profiling code, which can cause bugs if the profiling code isn't expecting it. For example, PyTorch's profiler isn't robust to a stop-the-world event (or GIL release) happening during profiling, and this leads to crashes. The problem is essentially:

  • PyTorch assumes that after PyEval_SetProfileAllThreads(NULL, NULL); no threads are still in the PyTorch profile code. This is incorrect due to the PyCode_Addr2Line() critical section suspending the thread state in the middle of a profile
  • PyTorch frees a bunch of profiler state, but thread suspended on PyCode_Addr2Line() resumes and tries reading that state leading to a crash.

This is a bug in PyTorch, so I'm going to fix it there as well, but the critical section in PyCode_Addr2Line() makes it much more likely to encounter the bug.

Proposed fix to CPython

  1. No critical section in PyCode_Addr2Line. Use atomic reads on co->_co_monitoring and co->_co_monitoring->lines
  2. Use atomic stores to initialize co->_co_monitoring and co->_co_monitoring->lines and reorder the stores so that the contents are initialized before the pointer is stored.

cc @kumaraditya303

Metadata

Metadata

Assignees

No one assigned

    Labels

    3.14bugs and security fixes3.15new features, bugs and security fixesinterpreter-core(Objects, Python, Grammar, and Parser dirs)topic-free-threadingtype-bugAn unexpected behavior, bug, or error

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions