feat: background workers = non-HTTP workers with shared state#2287
feat: background workers = non-HTTP workers with shared state#2287nicolas-grekas wants to merge 8 commits intophp:mainfrom
Conversation
e1655ab to
867e9b3
Compare
|
Interesting approach to parallelism, what would be a concrete use case for only letting information flow one way from the sidekick to the http workers? Usually the flow would be inverted, where a http worker offloads work to a pool of 'sidekick' workers and can optionally wait for a task to complete. |
da54ab8 to
a06ba36
Compare
|
Thank you for the contribution. Interesting idea, but I'm thinking we should merge the approach with #1883. The kind of worker is the same, how they are started is but a detail. @nicolas-grekas the Caddyfile setting should likely be per |
ad71bfe to
05e9702
Compare
|
@AlliBalliBaba The use case isn't task offloading (HTTP->worker), but out-of-band reconfigurability (environment->worker->HTTP). Sidekicks observe external systems (Redis Sentinel failover, secret rotation, feature flag changes, etc.) and publish updated configuration that HTTP workers pick up on their next request; with per-request consistency guaranteed via Task offloading (what you describe) is a valid and complementary pattern, but it solves a different problem. The non-HTTP worker foundation here could support both. @henderkes Agreed that the underlying non-HTTP worker type overlaps with #1883. The foundation (skip HTTP startup/shutdown, immediate readiness, cooperative shutdown) is the same. The difference is the API layer and the DX goals:
Happy to follow up with your proposals now that this is hopefully clarified. |
05e9702 to
8a56d4c
Compare
|
Great PR! Couldn't we create a single API that covers both use case? We try to keep the number of public symbols and config option as small as possible! |
Yes, that's why I'd like to unify the two API's and background implementations into one. Unfortunately the first task worker attempt didn't make it into |
|
The PHP-side API has been significantly reworked since the initial iteration: I replaced The old design used
Key improvements:
Other changes:
|
cb65f46 to
4dda455
Compare
|
Thanks @dunglas and @henderkes for the feedback. I share the goal of keeping the API surface minimal. Thinking about it more, the current API is actually quite small and already general:
The name "sidekick" works as a generic concept: a helper running alongside. The current set_vars/get_vars protocol covers the config-publishing use case. For task offloading (HTTP->worker) later, the same sidekick infrastructure could support:
Same worker type, same So the path would be:
The foundation (non-HTTP threads, cooperative shutdown, crash recovery, per-php_server scoping) is shared. Only the communication primitives differ. WDYT? |
b3734f5 to
ed79f46
Compare
|
|
|
Hmm, it seems they are on some versions, for example here: https://github.com/php/frankenphp/actions/runs/23192689128/job/67392820942?pr=2287#step:10:3614 For the cache, I'm not aware of a Github feature that allow to clear everything unfortunately 🙁 |
|
@AlliBalliBaba pointed me to this PR: #2306 (comment) Reading the discussion it seems there is agreement that the shared state layer is essentially a process-wide key-value store. Took the time over the weekend to build one as a proof of concept, which I think offers some interesting benefits over the approach proposed in this PR. You can find the code here: https://github.com/johanjanssens/frankenstate. It implements a cross-thread shared state as a standalone FrankenPHP extension, a $state = new FrankenPHP\SharedArray();
$state['feature_flags'] = ['new_ui' => true];
$flags = $state['feature_flags'];Key differences from the
With this approach the "config watcher" flow changes to: The store is a KV with a standard protocol. The sync logic lives wherever it makes sense. A cron job, a deploy script, a Redis subscriber, whatever pushes data in over the Redis protocol, or whatever Go code uses the API. FrankenPHP doesn't need to know or care where the data comes from. Might be worth considering as an alternative approach to the shared state part of this PR. With a standalone shared state store, the background worker and signaling infrastructure in this PR may not be needed, or could be developed as a more generic solution decoupled from state management. Happy to develop it further if there is interest. Do not want to hijack this PR. Created a discussion thread here: #2324. |
|
To truly benefit from ZTS, you'll have to do what's done in this PR and leave the copy mechanism on the C side. Couple that with potential 0-copy if the value hasn't changed and performance is potentially orders of magnitude better. I still think we need a fast built-in cross-thread copy mechanism like was done in this PR, just think that the API could be more generic. |
Essentially my only remaining issue here.
I feel like it would be great if background workers could use this more general KV store as a backend to push/pull values back to http threads that started the background workers. I'm really sorry for being so nitpicky here because it's a great addition, I'm just focussed on making sure the decisions made here won't bite us later. I'll give the code a proper review in a few days again. |
@nicolas-grekas i thought you widened the argument to also accept single string or other scalar value. or have you reverted that? if its reverted, i retire my suggestion :-) regarding the general KV store vs workers: from my understanding, the intention from nikolas was that workers could be automatically launched on demand when a namespace is requested. maybe an approach could be:
setting values in the worker could use the generic set KV store method. except if the worker get method needs to be aware when the value is set, maybe we'd need a specific worker set value method to unlock the blocking call. but this would separate the handling of on-demand workers from the underlying KV mechanism. alternatively, we could have only the worker part and later refactor to a generic KV mechanism... |
Then we have an extra function for essentially the same thing, which is what I want to avoid. I'm closely aligned with @AlliBalliBaba's vision here. Similar to what he proposed: // In library
$redisEndpoints = frankenphp_get_vars('redis:redis-host');
if(!$redisEndpoints){
frankenphp_start_background_worker('/path/to/background-worker.php', [
'host' => 'redis-host'
]);
}
$redisEndpoints = frankenphp_get_vars('redis');Except that I like @nicolas-grekas concept of at-most-one worker, which would simplify this to: # in library
frankenphp_start_background_worker("projectDir/path/to/redis-updater.php", scopes: ['redis-host'], args: []);
$redisEndpoints = frankenphp_get_vars('redis-host');
# in redis-updater
frankenphp_set_vars($scopes, ['host' => 'new-redis-host']); |
|
@dbu Scalar widening was reverted; see #2287 (comment) About the explicit Entrypoint from PHP = security surface. The Caddyfile is the trust boundary today. The sysadmin declares which scripts can run as background workers, and PHP code references them by name. If PHP can pass arbitrary script paths to Generic Lazy-start is the feature, not a side effect. The whole point of I therefor reverted back to As a reminder, my goal here is to push the boundaries of what can be done in a PHP app. A generic KV store is not such a boundary. APCu, Redis, etc all exist to tackle this. FrankenPHP can certainly have its say on the matter, but that's a different problem area than this PR. Background workers give PHP something it never had: long-running processes that share state with HTTP workers in the same process, with a powerful and simple design. That's the feature this PR delivers. Let's land it :) |
It's the same security-wise as a library being able to call Having this as an explicit function just clarifies what actually happens. I still think it would be a lot simpler to not allow lazy starts and just have people configure workers directly. If we do allow starting background workers at runtime, it should be very explicit and not be hidden behind a getter function. |
|
With the current implementation, it just becomes very hard to track down potential issues. An example: I see that With Without lazy start it's even simpler, the worker needs to call |
I disagree here. I never advocated for arbitrary paths, only files within the current root. It's the same attack vector as someone on the internet visiting that file.
There's one-writer, many-readers concurrency either way. If you don't want conflicts, don't use the same scope (worker) names. The API I proposed already passed the scopes a background worker has access to in the explicit
No need to check if (!$redisEndpoints). The boilerplate here is one call to
We absolutely and whole-heartedly agree! We only disagree on the magic, lazy starting, specific API that would be better solved by integrating with a more general KV store backend. If we merge this with your proposed API in a year we will have:
|
|
I've been thinking about this while working on other things, https://github.com/symfony/php-ext-deepclone to name one of them which is of direct interest to this PR. TL;DR: I'd like to keep the current API shape ( Here is why, listing your arguments: " " "It's no worse than " There's also a design-level cost: taking a path at the API surface leaks the implementation. Right now, a name is an opaque handle for "some process that publishes this data". The mechanism is up to the infra running the app: on FrankenPHP it starts a local PHP script, on a polyfill we could dispatch to an external process. The moment "one-writer-many-readers concurrency either way": this is a deep design choice, not a cosmetic one. The current model enforces a single writer per scope at the type level: only the background worker holding "FrankenPHP should have a generic KV store": I'd argue that's outside FrankenPHP's scope. APCu, Redis, Memcached, and plenty of extensions already do this well. FrankenPHP's unique value is thread and worker management: that's what no PHP extension can replace, because it requires ownership of the process model. The API surface we should expose is the one that couldn't be done without FrankenPHP. That's how we keep it minimal. "start_worker is just more explicit about what the get function does": this is probably the root of the disagreement. Lazy-start from On API bloat: if we don't add Catch-all: It's what allows a library to ship a background worker as a self-contained package without requiring the user to know its internal worker names. That's a real library-ergonomics win. The failure mode (timeout with unclear cause) is solvable with the better error messages mentioned above, not by removing the feature. To recap what this PR delivers:
That's the feature. I'd love to land it ASAP :) |
I think we all would, really, but we definitely need to agree on the API for it. I think we need some outside opinions from @dunglas @alexandre-daubois and @withinboredom at this point to see where we're most likely to reach sufficient consensus. I will try to address your points later when I've got time, I unfortunately still don't agree on all of them. |
|
Thanks for the ping, it got out of my mind. I'll catch up with the conversation this week! |
|
I'm in Berlin until Friday BTW! Let's meet at SymfonyLive or around? @AlliBalliBaba also? Please join me on https://symfony.com/slack |
|
After re-reading the thread, here's my position. The target use case feels right, and I don't have a concrete need today that would justify use cases beyond the background-to-HTTP flow, like HTTP workers publishing data between themselves. I also have no objection in principle to lazy-starting from PHP: the DX it provides is valuable, and starting a worker can remain conditional on an app-level trigger. Feels a bit like goroutines, I like it. That said, two points tip me toward an API that cleanly separates the lifecycle from the data store:
Concretely, I'd support an API where the store is exposed in a generic form, and where starting a background worker is a named, explicit operation. Nothing prevents lazy-start via Caddyfile catch-all from still kicking in on a On the other points, I'm aligned with what already seems to be converging in the thread. Sorry if I forgot something, there are quite a few comments 🙂 |
|
Thanks for taking the time @alexandre-daubois. Two things I'd like to push back on, plus one structural argument that I think hasn't been engaged with yet. On "freezes the On the diagnostic chain concern: I offered some improvements in my previous reply, I think we can push this further until errors from workers are surfaced better to callers, all compatible with the API I'm proposing. That's the one aspect I think we can improve further. On the explicit
The current design sidesteps both: one The structural argument I'd like your read on: the current model is one-writer-many-readers by construction. Only the background worker owning |
Not necessarily the public webroot, but a root defined by the Caddyfile for sure. The problem with your approach is that it limits to a single background worker script, which is likely in a framework like Symfony with a central kernel and container, but otherwise not.
It's a fair point, but again, I'm not concerned with security when it's explicitly configurable through some background-worker-directory. We're not giving anyone a gun here, they're stealing it, pointing it at their foot and fire repeatedly when someone actually manages to run into a security issue with it. And they could do the same with a single entrypoint, too.
While adding API surface is true, but it's unified api surface that we'd most likely add at some point anyways. Then it's better to have an explicit API than magic behaviour on one, but not the other.
That's a very fair point that I don't have a perfect solution to. It's actually one where I'm going back and fourth between even using names (and just using anonymous lists) in another project I'm working on.
See point 1, because at that point it would just be confusing about what issues a lazy start and what doesn't. (And I'm honestly not even sure how useful a lazy start really is, what problem does that solve? The library will have a dependency on the Caddyfile configuration at that point and the worker existing if it's hit once, and if it is, it would never shut down again)
These are all more or less the same point of disagreement which is: this PR is locking that decision in "forever". No generic KV store, no matter if it would ever make sense (I'd argue it would, how else would you share vars within the same application on different threads, but guard it from being accessed by other, unrelated applications? Using apcu for this is very dirty and will suffer from heavy fragmentation for a runtime concern. I think @alexandre-daubois essentially has the same considerations that Alex and I do too.
It would obviously be blocking until started, but it would a generic API surface that could be reused for task workers, that we're still intending to add. And it would be explicit. And it would solve the inability to reason about what a unified
I just think the actual issue with it is the same as before: worker string names lead to poor reasoning. If library A uses 'redis' and library B uses the same, but both expect different worker scripts, we have the exact same issue that the many-writers, many-readers has. If we don't have conflicting worker names, there's no issue with many-writers-many-readers either. |
|
Marc has already articulated most of where I land, so I'll stay short and add a few angles I don't think have come up yet. On the DNS / Redis / Symfony service name analogy, I think the comparison doesn't hold. Those APIs deliberately separate concerns: DNS has About testability: a global function whose call can spawn a worker process is hostile to unit tests. Libraries adopting this will either need to wrap it in their own abstraction or give up on isolation in tests. A store-shaped API is materially more mockable. Also, about the principle of least surprise: Finally, genuine question about the API: is it possible to unset a key? |
|
Edited: I missed last response by @alexandre-daubois and I agree with him. API updated. Thanks, everyone, for the depth of this one! @nicolas-grekas for the huge amount of work, and @henderkes, @AlliBalliBaba, @alexandre-daubois, @dbu for the careful pushback. I've read through the whole thread, and I think we're close to merging it. Most of the back-and-forth is really Here's my opinion on this: Caddyfile and the whole Go/C runtime stay as Nicolas designed them, but we make small changes to the PHP API:
On unsetting a key: with snapshot semantics it's just set_vars a new array without the key — no dedicated primitive needed. If we ever add per-key writes we'd add a matching unset. We can apply the same logic for #2319, drop the
The fact that a worker picks up the task is an implementation detail. WDYT? |
|
Looks like the best of both worlds @dunglas. Dropping Sorry if this was answered somewhere in the comments: what's the defined behavior when the caller has no worker scope, e.g. called from an HTTP request context, a CLI script, or any non-worker code path? Should it be no-op or throw a |
|
I would throw too |
Why do we need a timeout for the get_vars? Shouldn't that just return immediately since the prior
Perhaps this should return an object on which php can call
You're talking about I'm generally happy with that direction, but I'd still want to argue the case for being able to define multiple background worker scripts. We went out of our way to support non-framework code all the way up until this point, for the gain I see (for a single script would already mostly disappear with an explicit |
|
Thanks @dunglas for the proposal, I think we're very close. Let me suggest a small refinement that I think fully addresses the debuggability objection without giving up anything structural. Proposal (noted about #2319 also)frankenphp_require_background_worker(string $name, float $timeout = 30.0): void
frankenphp_set_vars(array $vars): void
frankenphp_get_vars(string|array $name): array
frankenphp_get_worker_handle(): resourceFour functions, same count as your proposal. Two differences:
|
Renames PHP API for forward compat (the API can later serve non-worker use cases): frankenphp_set_worker_vars -> frankenphp_set_vars frankenphp_get_worker_vars -> frankenphp_get_vars Also enriches get_vars timeout errors when the worker fails before reaching set_vars: the exception now includes the worker name, resolved entrypoint path, exit status, number of attempts, and the last PHP error (message, file, line) captured from PG(last_error_*).
Makes lifecycle explicit and decouples it from data access: - Add frankenphp_require_background_worker(string $name, float $timeout = 30.0): lazy-starts the worker and blocks until it has called set_vars once (ready) or the timeout expires. Throws on boot failure with the same rich details. - frankenphp_get_vars(string|array $name): array is now a pure read. It no longer starts workers or waits for readiness. Throws if the target worker is not running or has not called set_vars yet. The $timeout argument is removed (no blocking). This eliminates the "sometimes starts, sometimes doesn't" side effect on reads and makes traceability linear: grep for require_background_worker to find where a dependency is declared. All PHP test scripts are updated to call require_background_worker before get_vars. The Go runtime now exposes two separate exports (go_frankenphp_require_background_worker, go_frankenphp_get_vars) with focused responsibilities.
frankenphp_require_background_worker now behaves according to the caller's context, which makes the call site carry meaningful intent: - HTTP worker BEFORE frankenphp_handle_request (bootstrap): lazy-start and fail-fast. As soon as a boot attempt fails, the rich error is thrown without waiting for the retry backoff cycle. This turns the bootstrap phase into a strict dependency declaration: broken deps = worker boot fails visibly, not serves traffic degraded. - HTTP worker INSIDE frankenphp_handle_request (runtime): assert-only. The worker must already be running (num 1 in Caddyfile, or previously required during bootstrap). Throws immediately if not. Never lazy-starts. Runtime require becomes a clean assertion, not a side-effectful call. - Non-worker mode (classic request, CLI): lazy-start with tolerance. Waits up to timeout, letting the restart/backoff loop recover from transient failures. Matches the existing behavior every request does anyway. Mode detection uses workerThread.isBootingScript, which is already tracked. Test PHP files are restructured to call require_background_worker at bootstrap (before frankenphp_handle_request). New tests cover: - Runtime require asserting on unknown worker - get_vars throwing on not-running worker The boot-failure test is moved to non-worker mode so the tolerant path can exercise the rich error reporting. Docs describe the three modes with examples.
In CLI mode (frankenphp php-cli), there is no worker pool and no SAPI request cycle; the background-worker API is meaningless. Rather than throwing at call time, the four functions are now unregistered from the function table in MINIT when the SAPI is "cli": frankenphp_require_background_worker frankenphp_set_vars frankenphp_get_vars frankenphp_get_worker_handle This lets library code detect the mode cleanly via function_exists() and fall back to alternative config sources without try/catch. HTTP-mode set_vars cannot be hidden the same way because the context (HTTP worker vs background worker) is per-thread, not per-process; threads share the PHP module and function table. Runtime throw stays the approach there, matching PHP's convention for pcntl_*/posix_*.
Background workers inherited the HTTP-worker wording which mentioned frankenphp_handle_request — that function doesn't apply to bg workers. The ready signal for a bg worker is frankenphp_set_vars. - startupFailChan error: "background worker %s has not reached frankenphp_set_vars()" - Warn log (watcher): "(watcher enabled) background worker has not reached frankenphp_set_vars()" - Warn log (normal): "background worker boot failed, restarting" The log now also carries exit_status and, when captured, the last PHP error message from bootFailureInfo. Tailing the FrankenPHP log at Warn level is enough to see what went wrong without reading the PHP error log separately.
In CLI mode the bg-worker functions are no longer exposed, so listing CLI as a "non-worker mode" case is misleading. Non-worker mode now means classic request-per-process (HTTP requests where no worker is configured), which is the only case where require still runs with tolerant lazy-start semantics.
Was: scope passed around as string (formatted "php_server_N") with "" as the "no scope" sentinel. Every consumer had to agree on the string shape, and changing the representation would be a BC break on the public API (NextBackgroundWorkerScope, WithWorkerBackgroundScope, WithRequestBackgroundScope). Now: type BackgroundScope int. Opaque wrapper; callers can only obtain one via NextBackgroundWorkerScope(). The zero value (BackgroundScope(0)) keeps the "no scope" semantics, so the embed/non-caddy path is unchanged. - Type safety: can't accidentally pass an env var or worker name as a scope. - Forward BC: internal representation can change (int -> struct, different hashing, whatever) without touching consumer code. - No overhead: int map keys vs string map keys at startup/per-request is noise.
Summary
Background workers are long-running PHP workers that run outside the HTTP cycle. They observe their environment (Redis, DB, filesystem, etc.) and publish variables that HTTP workers read per-request - enabling real-time reconfiguration without restarts or polling.
PHP API
frankenphp_set_worker_vars(array $vars): void- publishes vars from a background worker (persistent memory, cross-thread). Skips all work when data is unchanged (===check).frankenphp_get_worker_vars(string|array $name, float $timeout = 30.0): array- reads vars from any worker context (blocks until first publish, generational cache)frankenphp_get_worker_handle(): resource- returns a pipe-based stream forstream_select()integration. Closed on shutdown (EOF = stop).Caddyfile configuration
backgroundmarks a worker as non-HTTPnamespecifies an exact worker name; workers withoutnameare catch-all for lazy-started namesmax_threadson catch-all sets a safety cap for lazy-started instances (defaults to 16)max_consecutive_failuresdefaults to 6 (same as HTTP workers)max_execution_timeautomatically disabled for background workersphp_serverblock has its own isolated scope (managed byNextBackgroundWorkerScope())Shutdown
On restart/shutdown, the signaling stream is closed. Workers detect this via
fgets()returningfalse(EOF). Workers have a 5-second grace period.After the grace period, a best-effort force-kill is attempted:
max_execution_timetimer cross-thread viatimer_settime(EG(max_execution_timer_timer))CancelSynchronousIo+QueueUserAPCinterrupts blocking I/O and alertable waitsDuring the restart window,
get_worker_varsreturns the last published data (stale but available). A warning is logged on crash.Forward compatibility
The signaling stream is forward-compatible with the PHP 8.6 poll API RFC.
Poll::addReadableaccepts stream resources directly - code written today withstream_selectwill work on 8.6 withPoll, no API change needed.Architecture
php_serverscope isolation with internal registry (unexported types, minimal public API viaNextBackgroundWorkerScope())backgroundWorkerThreadhandler implementingthreadHandlerinterface - decoupled from HTTP worker code pathsdrain()closes the signaling stream (EOF) for clean shutdown signalingpemalloc) withRWMutexfor safe cross-thread sharingset_worker_varsskip: uses PHP's===(zend_is_identical) to detect unchanged data - skips validation, persistent copy, write lock, and version bumpget_worker_varscalls return the same array instance (===is O(1))IS_ARRAY_IMMUTABLE)ZSTR_IS_INTERNED) - skip copy/free for shared memory strings$_SERVER['FRANKENPHP_WORKER_NAME']set for background workers$_SERVER['FRANKENPHP_WORKER_BACKGROUND']set for all workers (true/false)Example
Test coverage
17 unit tests + 1 Caddy integration test covering: basic vars, at-most-once start, validation, type support (enums, binary-safe strings), multiple background workers, multiple entrypoints, crash restart, signaling stream, worker restart lifecycle, non-background-worker error handling, identity detection, generational cache, named auto-start with m# prefix.
All tests pass on PHP 8.2, 8.3, 8.4, and 8.5 with
-race. Zero memory leaks on PHP debug builds.Documentation
Full docs at
docs/background-workers.md.