feat: add --check-env preflight flag for OOM risk detection before quantization #14
Open
sotanengel wants to merge 7 commits into
Add QuantizationProgressTracker and wire it through calibration, chunked calibration, multi-GPU phase 2, QEP general and arch-aware paths. Runner gains quantization_progress flag (default on). Includes unit tests for ETA formatting and thread-safe stepping.
Co-authored-by: Cursor <cursoragent@cursor.com>
Author
This is not urgent at all, but I noticed something while using the tool myself, so I am submitting this PR.
Raise clear error for unsupported QEP quantizers
See merge request onecomp/onecomp-lab!71
…ation-progress-eta feat: quantization progress logs with ETA
* refactoring: QuantizationProgressTracker
* update CHANGELOG.md
---------
Co-authored-by: FKKimura <50981196+FKKimura@users.noreply.github.com>
Adds a --check-env CLI flag that collects physical hardware characteristics (GPU VRAM, CPU RAM, disk space) and model memory estimates before quantization starts, then classifies OOM risk as safe/warning/danger. Exits with code 1 on danger; otherwise prints a report and proceeds with quantization.
- onecomp/utils/vram_estimator.py: new EnvironmentSnapshot, ModelMemoryProfile, EnvCheckResult dataclasses; check_environment() and print_env_report() functions reusing existing weight_memory_gb() and estimate_target_bitwidth()
- onecomp/utils/__init__.py: export 5 new public symbols
- onecomp/cli.py: --check-env argparse flag with preflight invocation
- onecomp/runner.py: check_env=False kwarg in auto_run() for library API use
- pyproject.toml: optional extras [check-env] = ["psutil>=5.9"]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
29a83f5 to 1eab710
Background
While quantizing a large-scale LLM (70B+ parameters), the process crashed midway through with an out-of-memory error after running for several hours. There was no way to know in advance whether the available GPU VRAM was sufficient — the failure only surfaced deep into the quantization loop, wasting significant compute time.
This PR introduces a --check-env preflight flag that detects OOM risk before quantization starts, based on the physical characteristics of the execution environment.
Summary
- --check-env CLI flag (also available as Runner.auto_run(check_env=True) in the Python API)
- Uses the meta device (zero GPU/CPU memory) to count parameters
- Collects GPU VRAM, CPU RAM (via psutil), and disk space
- Estimates model memory with the existing weight_memory_gb(), plus a calibration overhead factor
- Classifies OOM risk as safe / warning / danger and prints a human-readable report
- danger: exits with code 1 (CLI) or raises RuntimeError (library API), stopping quantization before it wastes GPU time
- safe / warning: prints the report and continues with quantization as normal (preflight behavior)
Example output
Risk thresholds
- safe: free_vram ≥ fp16_size × 1.2
- warning: free_vram ≥ 4-bit size + calib overhead
- danger
Changed files
- onecomp/utils/vram_estimator.py: new dataclasses (EnvironmentSnapshot, ModelMemoryProfile, EnvCheckResult) + check_environment() + print_env_report()
- onecomp/utils/__init__.py: export 5 new public symbols
- onecomp/cli.py: --check-env argparse flag + preflight invocation
- onecomp/runner.py: check_env: bool = False kwarg in auto_run()
- pyproject.toml: pip install ".[check-env]" adds psutil for CPU RAM info
Test plan
- onecomp <model_id> --check-env --no-eval: verify that the report is printed and quantization continues
- onecomp <model_id> --check-env --total-vram-gb 1.0: verify that danger triggers exit code 1
- onecomp <model_id> --check-env --total-vram-gb 6.0 --no-eval: verify that warning continues
- --total-vram-gb: verify graceful handling without CUDA
- Without psutil installed: verify that the n/a fallback message appears
- Runner.auto_run(..., check_env=True): verify that RuntimeError is raised on danger
🤖 Generated with Claude Code
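For reviewers, the risk thresholds listed above can be read as a small decision function. The following is a minimal sketch only, not the code in onecomp/utils/vram_estimator.py: the names RiskInputs, classify_risk, and preflight are hypothetical, the inputs are plain numbers in GB, and treating everything below the warning threshold as danger is an assumption.

```python
from dataclasses import dataclass

# Multiplier on the fp16 size, taken from the "safe" threshold in this PR.
SAFE_HEADROOM = 1.2


@dataclass
class RiskInputs:
    """Hypothetical input container (not the PR's EnvCheckResult)."""
    free_vram_gb: float       # free GPU memory
    fp16_size_gb: float       # model weights at fp16
    four_bit_size_gb: float   # model weights at 4-bit
    calib_overhead_gb: float  # extra memory needed during calibration


def classify_risk(x: RiskInputs) -> str:
    """Return 'safe', 'warning', or 'danger' from the stated thresholds."""
    if x.free_vram_gb >= x.fp16_size_gb * SAFE_HEADROOM:
        return "safe"
    if x.free_vram_gb >= x.four_bit_size_gb + x.calib_overhead_gb:
        return "warning"
    # Assumption: anything below the warning threshold counts as danger.
    return "danger"


def preflight(x: RiskInputs) -> str:
    """Library-style behavior: raise on danger, otherwise return the level."""
    risk = classify_risk(x)
    if risk == "danger":
        raise RuntimeError(
            f"OOM risk is 'danger': {x.free_vram_gb:.1f} GB free VRAM is below "
            f"the estimated {x.four_bit_size_gb + x.calib_overhead_gb:.1f} GB"
        )
    return risk
```

With illustrative numbers for a 70B model (140 GB at fp16, 35 GB at 4-bit, 8 GB of assumed calibration overhead), 200 GB of free VRAM would classify as safe, 80 GB as warning, and 24 GB as danger. A CLI wrapper could catch the RuntimeError and exit with code 1, which matches the behavior described in the summary.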