fix(demos): cap build parallelism to avoid gateway OOM #62
Merged
The gateway has a few heavily templated TUs (~1.6 GB RSS each); compiling them all in parallel (one per core) OOM-killed cc1plus on many-core or RAM-limited Docker hosts. Use a sequential executor + MAKEFLAGS='-j 2' (peak ~3.2 GB, verified building under a 6 GB cap).
bburda approved these changes on Jun 5, 2026.
Problem
Docker builds for the demos fail intermittently with:
The build always dies in ros2_medkit_gateway. It is not a compile error and not architecture-specific (diagnostic_bridge and the other packages build cleanly). It is an out-of-memory kill of the compiler.
Root cause
The gateway has a few heavily templated translation units (HTTP handlers) that each peak at ~1.6 GB RSS while compiling. colcon/make compiled them in parallel, one job per CPU core, so total peak memory scaled with core count. On many-core or RAM-limited Docker hosts (e.g. Docker Desktop), several heavy units overlap and exceed the memory limit, so cc1plus is OOM-killed. Whether a given machine succeeded depended on its core count and on scheduling timing, so the failure was flaky (it passed on an 8-core/8 GB host but failed on a higher-core host even at 12 GB).
Fix
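A rough model of the numbers above, written as a shell sketch: it assumes the worst case in which every concurrent compiler is working on one of the heavy ~1.6 GB units and ignores all other memory use, so it gives an upper bound rather than a measurement. The names HEAVY_TU_MB and worst_case_peak_mb are illustrative only and do not appear in the PR.

```shell
#!/bin/sh
# Worst-case compiler memory as a function of the number of concurrent jobs.
# Assumption: every concurrent job compiles a heavy gateway TU at ~1.6 GB RSS;
# lighter TUs, the linker and the rest of the container are ignored.
HEAVY_TU_MB=1600  # illustrative constant taken from the ~1.6 GB measurement

# Print the upper bound (in MB) for a given number of concurrent jobs.
worst_case_peak_mb() {
  jobs=$1
  echo $((jobs * HEAVY_TU_MB))
}

# Compare the capped setting (2 jobs) with typical "one job per core" values.
for jobs in 2 8 16; do
  printf 'jobs=%s -> up to %s MB\n' "$jobs" "$(worst_case_peak_mb "$jobs")"
done
```

The 2-job bound (3200 MB) matches the ~3.2 GB peak reported for the fix, and it no longer depends on the host's core count. For uncapped builds the per-core bound is rarely reached, because only a few units are heavy and they do not always overlap; the effective bound is closer to min(jobs, number of heavy TUs) × 1.6 GB, which is why the failure showed up only on some hosts.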
Limit build parallelism in the four demos that build the gateway from source:
--executor sequential (one package at a time) + MAKEFLAGS='-j 2' (at most two compilers at once). Peak drops to ~3.2 GB (2 × ~1.6 GB) regardless of core count, making the build deterministic across hosts. manymove_industrial already has its own parallelism knob (MANYMOVE_COLCON_WORKERS) and is left unchanged.
Verification
Built the gateway from main inside a container hard-capped at --memory=6g --memory-swap=6g:
Passes at 6 GB with margin; the previous unbounded build was OOM-killed under the same cap.
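A minimal sketch of how a capped build like this could be reproduced locally. The image name is a hypothetical placeholder; the image is assumed to contain ROS 2 and colcon with the environment already set up and the workspace sources in its working directory. Restricting the build with --packages-up-to is an illustrative choice, not something taken from the PR's Dockerfiles.

```shell
#!/bin/sh
# Hypothetical reproduction of the 6 GB verification run.
# Assumes the image already sources ROS 2, has colcon installed and holds the
# workspace sources in its working directory.
verify_under_cap() {
  image=$1
  # Setting --memory-swap equal to --memory gives the container no swap on top
  # of the 6 GB RAM limit, so the cap is hard.
  # Inside: one package at a time, at most two compiler processes per make.
  docker run --rm --memory=6g --memory-swap=6g "$image" \
    sh -c "MAKEFLAGS='-j 2' colcon build --executor sequential --packages-up-to ros2_medkit_gateway"
}
```

Usage, with a placeholder image: `verify_under_cap my-demo-image:latest`. Dropping the MAKEFLAGS prefix and --executor sequential from the inner command gives the unbounded baseline for comparison under the same cap.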