Skip to content

fix(api): prevent background task garbage collection in lifespan startup#6551

Open
alexzhu0 wants to merge 1 commit into
keephq:mainfrom
alexzhu0:fix/api-background-task-gc
Open

fix(api): prevent background task garbage collection in lifespan startup#6551
alexzhu0 wants to merge 1 commit into
keephq:mainfrom
alexzhu0:fix/api-background-task-gc

Conversation

@alexzhu0
Copy link
Copy Markdown

@alexzhu0 alexzhu0 commented May 31, 2026

Closes #6552

Problem

Two asyncio.create_task(...) calls in keep/api/api.py create background tasks without holding a strong reference to the returned Task object. The Python asyncio docs warn that the event loop only keeps a weak reference to a task, so a fire-and-forget task can be garbage-collected mid-execution and silently cancelled — no traceback, just missing work.

Important: Save a reference to the result of [create_task()], to avoid a task disappearing mid-execution. The event loop only keeps weak references to tasks. A task that isn't referenced elsewhere may be garbage collected at any time, even before it's done.
https://docs.python.org/3/library/asyncio-task.html#asyncio.create_task

Affected sites

  1. startup() line 183 — the maintenance-window watcher:

    asyncio.create_task(process_watcher_task.async_process_watcher())

    Zero strong references. The Task can be collected before it ever runs async_process_watcher, and when it does the watcher just disappears.

  2. lifespan() line 234 — the debug pending-task logger:

    asyncio.create_task(check_pending_tasks(background_tasks))

    check_pending_tasks is passed the background_tasks set as an argument, but the Task wrapping the coroutine is never added to that set. So this task also lacks a strong reference and is GC-eligible.

Root cause

CPython garbage-collects coroutine Tasks the same way it collects any other object. The asyncio event loop only keeps weak references to running tasks (asyncio.all_tasks() is implemented via a WeakSet). If no other code holds a strong reference, the GC is free to reclaim the Task at any GC pass, which cancels it.

Changes

keep/api/api.py — one file, +10 / -5 lines:

  • startup() now accepts an optional background_tasks: set | None = None parameter; when present the watcher task is added to it and registers a done-callback to discard itself on completion (task.add_done_callback(background_tasks.discard)).
  • lifespan() passes its existing background_tasks set into startup(), and now also adds the debug check_pending_tasks task to that same set with the same done-callback. The set was already being created here for request-context use; this just makes the lifespan tasks share it.
  • The set type annotation is tightened to set[asyncio.Task] for clarity.

Backward compatible: any other caller of startup() continues to work because the new parameter defaults to None.

Testing

Manual reasoning + asyncio docs reference; the GC-cancellation behaviour is established Python asyncio semantics and the fix matches the pattern recommended in the stdlib docs (hold strong reference in a set, discard via done-callback).

I considered adding a unit test but a test that reliably triggers gc.collect() mid-task in pytest would be fragile and the bug class is well-documented. Happy to add a regression test that simply asserts the watcher Task ends up in app.state if maintainers prefer.

The two `asyncio.create_task(...)` calls in `keep/api/api.py` had no strong
reference held to the returned Task object. Per the Python asyncio docs
(https://docs.python.org/3/library/asyncio-task.html#asyncio.create_task),
the event loop only keeps a weak reference to a task, so a fire-and-forget
task can be garbage-collected mid-execution. When that happens the task is
silently cancelled and the user sees no traceback, just missing work.

Affected sites:
- `startup()` line 183: `asyncio.create_task(process_watcher_task.async_process_watcher())`
  — the maintenance-window watcher had zero strong references.
- `lifespan()` line 234: `asyncio.create_task(check_pending_tasks(background_tasks))`
  — the debug pending-task logger was passed `background_tasks` as an argument
  but never added itself to that set, so it had no strong reference either.

Fix: hold a strong reference by adding each task to the existing
`background_tasks` set that `lifespan()` already maintains for request-context
use, and register a done-callback so the task removes itself on completion.
`startup()` now accepts an optional `background_tasks` set; when present it is
used to track the watcher task. Passing `None` keeps backward compatibility
for any other caller.
@dosubot dosubot Bot added size:S This PR changes 10-29 lines, ignoring generated files. Bug Something isn't working labels May 31, 2026
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented May 31, 2026

CLA assistant check
All committers have signed the CLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Something isn't working size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: background tasks created in api.py lifespan can be garbage-collected mid-execution

2 participants