feat: speed up `first_fit` in sequence packing with a segment tree by fangwei123456 · Pull Request #15563 · NVIDIA-NeMo/NeMo

fangwei123456 · 2026-03-30T09:56:58Z

Speed up `first_fit` in sequence packing with a segment tree

What does this PR do?

Replaces the O(n) linear scan in find_first_bin_that_fits with an O(log n) segment tree, reducing the overall first_fit packing complexity from O(n^2) to O(n log n). This significantly speeds up sequence packing for large datasets.
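As a rough sanity check on the complexity claim, here is a back-of-the-envelope bound. It assumes at most one bin is opened per sequence, so at most $i-1$ bins exist when the $i$-th of $n$ sequences is placed, and that the tree has at most $n$ leaves:

$$
\underbrace{\sum_{i=1}^{n} (i-1)}_{\text{linear scan}} = \frac{n(n-1)}{2} = O(n^2),
\qquad
\underbrace{\sum_{i=1}^{n} O(\log n)}_{\text{segment tree}} = O(n \log n).
$$

The quadratic worst case is reached, for example, when every sequence is longer than half of the pack size, so no two sequences share a bin and every scan visits all open bins.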

A backend parameter is added to first_fit with two options:

"segment_tree" (default) — uses a segment tree for O(log n) per-query lookup
"naive" — uses the original O(n) linear scan

The function signature and return type remain backward-compatible. Downstream callers (first_fit_decreasing, first_fit_shuffle, create_packing_strategy, fill_packing_strategy) require no changes.
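For reference, the behavior of the `"naive"` backend can be sketched as follows. This is a minimal illustration rather than the code in this PR: the name `first_fit_naive_sketch`, the `remaining` bookkeeping list, and the `(seqlens, pack_size)` signature are assumptions made for the example.

```python
from typing import List


def first_fit_naive_sketch(seqlens: List[int], pack_size: int) -> List[List[int]]:
    """First-fit packing with a linear scan over open bins (hypothetical sketch)."""
    bins: List[List[int]] = []   # bins[i] holds the sequence lengths packed into bin i
    remaining: List[int] = []    # remaining[i] is the free capacity left in bin i
    for s in seqlens:
        # O(number of open bins): take the first bin with enough room.
        for i, free in enumerate(remaining):
            if free >= s:
                bins[i].append(s)
                remaining[i] -= s
                break
        else:
            # No existing bin fits, so open a new one.
            bins.append([s])
            remaining.append(pack_size - s)
    return bins


print(first_fit_naive_sketch([5, 4, 3, 2, 6, 1], 8))  # [[5, 3], [4, 2, 1], [6]]
```

The inner loop is the part a segment tree replaces: it is the only step whose cost grows with the number of open bins.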

This modification is particularly crucial for processing large datasets: in our own experiments, the time required to process 50GB of data was reduced from two and a half hours to just one minute.

Changes

`nemo/utils/sequence_packing_utils.py`
- Added `_SegmentTree` class: a 1-indexed flat-array segment tree that stores per-bin remaining capacity, with internal nodes tracking the max of their children. Supports `open_bin`, `query` (leftmost bin with capacity >= s), and `update` in O(log n).
- Added `backend` parameter (`"segment_tree"` | `"naive"`) to `first_fit`.
- Extracted `_first_fit_naive` and `_first_fit_segment_tree` as the two backend implementations.
- Kept `find_first_bin_that_fits` with a deprecation note for backward compatibility.

`tests/utils/test_first_fit_backends.py` (new)
- 13 parametrized cases + 1 large random test (5000 sequences) verifying both backends produce identical output.
- 1 test for invalid backend error.
- 1 performance benchmark asserting segment tree is faster than naive.
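The segment-tree idea described above can be sketched roughly as below. This is an illustrative reconstruction from the description, not the `_SegmentTree` class from the diff: the names `SegmentTreeSketch` and `first_fit_segment_tree_sketch`, the `-1` sentinel for unopened bins, and the choice to pre-size the tree to one leaf per sequence are all assumptions.

```python
from typing import List


class SegmentTreeSketch:
    """Max segment tree over per-bin remaining capacity (hypothetical sketch)."""

    def __init__(self, max_bins: int) -> None:
        self.size = 1
        while self.size < max(1, max_bins):
            self.size *= 2
        # 1-indexed flat array: tree[1] is the root, leaves are tree[size:2*size].
        # Unopened bins hold -1 so they never satisfy a query for s >= 0.
        self.tree = [-1] * (2 * self.size)
        self.num_bins = 0

    def update(self, bin_idx: int, capacity: int) -> None:
        """Set one bin's remaining capacity and refresh the maxima above it."""
        pos = self.size + bin_idx
        self.tree[pos] = capacity
        pos //= 2
        while pos >= 1:
            self.tree[pos] = max(self.tree[2 * pos], self.tree[2 * pos + 1])
            pos //= 2

    def open_bin(self, capacity: int) -> int:
        """Activate the next unused leaf and return its bin index."""
        idx = self.num_bins
        self.num_bins += 1
        self.update(idx, capacity)
        return idx

    def query(self, s: int) -> int:
        """Return the leftmost bin with capacity >= s, or -1 if none exists."""
        if self.tree[1] < s:
            return -1
        pos = 1
        while pos < self.size:
            pos *= 2                  # try the left child first
            if self.tree[pos] < s:
                pos += 1              # otherwise the right child must fit
        return pos - self.size


def first_fit_segment_tree_sketch(seqlens: List[int], pack_size: int) -> List[List[int]]:
    tree = SegmentTreeSketch(len(seqlens))  # at most one bin per sequence
    bins: List[List[int]] = []
    for s in seqlens:
        idx = tree.query(s)
        if idx == -1:
            tree.open_bin(pack_size - s)
            bins.append([s])
        else:
            tree.update(idx, tree.tree[tree.size + idx] - s)
            bins[idx].append(s)
    return bins


print(first_fit_segment_tree_sketch([5, 4, 3, 2, 6, 1], 8))  # [[5, 3], [4, 2, 1], [6]]
```

Because each internal node stores the maximum capacity in its subtree, descending left whenever the left child can fit `s` always lands on the lowest-indexed bin that fits, which is the same bin a linear scan would pick.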

Performance

Benchmarked on 10,000 random sequences (lengths 1–500, pack_size=1024):

| Backend | Time |
|---|---|
| `naive` | 0.612s |
| `segment_tree` | 0.024s |
| Speedup | 25x |

Tests

pytest tests/utils/test_first_fit_backends.py -v --noconftest

All 16 tests pass, including correctness (both backends match) and performance (segment tree > 2x faster).

Signed-off-by: Wei Fang <wei.fang@miromind.ai>

Signed-off-by: wei.fang <wei.fang@miromind.ai>

pzelasko · 2026-04-15T17:22:13Z

Which collection is this used in?

fangwei123456 · 2026-04-16T09:43:50Z

@pzelasko This module (nemo/utils/sequence_packing_utils.py) is a standalone utility for sequence packing, located under nemo/utils/ rather than tied to a specific collection. It is not currently imported by other modules in the repo — it's designed to be called as a preprocessing step (e.g., via script) for packing tokenized sequences before training.

pzelasko · 2026-04-16T13:59:25Z

Are you using this for speech / speech LLM models or something else? The repo was recently split and many collections (llm, vlm, diffusion) moved to their own repos in https://github.com/NVIDIA-NeMo org.

Since these utilities are not used for any speech collection, my first thought is that it was an oversight to keep them when purging deprecated collection code. I don't think any of the speech models currently supports packed sequences (although we might add that support later).

fangwei123456 · 2026-04-17T02:55:34Z

I use this packing technique to concatenate samples of different text types into long sequences for SFT training of the LLM. Thank you for your reply. If this method is not useful for speech processing, we can close this PR.

speed up first_fit using a segment tree

39dcd90

Signed-off-by: wei.fang <wei.fang@miromind.ai>

fangwei123456 changed the title ~~Speed up first_fit in sequence packing with a segment tree~~ feat: speed up first_fit in sequence packing with a segment tree Mar 30, 2026

fangwei123456 added 2 commits March 30, 2026 22:19

Merge branch 'main' into feat/find_first_bin_that_fits_segment_tree

1ba314c

Merge branch 'main' into feat/find_first_bin_that_fits_segment_tree

5677028

github-actions bot added the community-request label Apr 15, 2026

fangwei123456 wants to merge 3 commits into NVIDIA-NeMo:main from fangwei123456:feat/find_first_bin_that_fits_segment_tree
