feat: add --check-env preflight flag for OOM risk detection before quantization #14

Open
sotanengel wants to merge 7 commits into FujitsuResearch:main from sotanengel:feature/check-env-preflight

@sotanengel

Background

While quantizing a large-scale LLM (70B+ parameters), the process crashed midway through with an out-of-memory error after running for several hours. There was no way to know in advance whether the available GPU VRAM was sufficient — the failure only surfaced deep into the quantization loop, wasting significant compute time.

This PR introduces a --check-env preflight flag that detects OOM risk before quantization starts, based on the physical characteristics of the execution environment.

Summary

  • Adds --check-env CLI flag (also available as Runner.auto_run(check_env=True) in the Python API)
  • Loads the model architecture on a meta device (zero GPU/CPU memory) to count parameters
  • Collects hardware info: GPU VRAM (total & free), CPU RAM (via optional psutil), disk space
  • Estimates memory requirements at 2-bit, 4-bit, and 8-bit quantization using existing weight_memory_gb(), plus a calibration overhead factor
  • Classifies OOM risk as safe / warning / danger and prints a human-readable report
  • On danger: exits with code 1 (CLI) or raises RuntimeError (library API), stopping quantization before it wastes GPU time
  • On safe / warning: prints the report and continues with quantization as normal (preflight behavior)
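
The hardware-collection step with an optional psutil dependency can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `HostSnapshot`, `collect_host_snapshot`, and `fmt_gb` are hypothetical names, and GPU VRAM is left out so the sketch needs only the standard library (with PyTorch installed, `torch.cuda.mem_get_info()` would supply free and total VRAM).

```python
import shutil
from dataclasses import dataclass
from typing import Optional

GIB = 1024 ** 3


@dataclass
class HostSnapshot:
    """Hypothetical stand-in for the PR's EnvironmentSnapshot (fields assumed)."""
    disk_free_gb: float
    cpu_ram_total_gb: Optional[float]  # None when psutil is not installed
    cpu_ram_avail_gb: Optional[float]


def collect_host_snapshot(path: str = ".") -> HostSnapshot:
    """Collect disk space and, if psutil is available, CPU RAM figures."""
    disk_free_gb = shutil.disk_usage(path).free / GIB
    try:
        import psutil  # optional dependency
    except ImportError:
        # Degrade gracefully: RAM is reported as unknown instead of failing.
        return HostSnapshot(disk_free_gb, None, None)
    vm = psutil.virtual_memory()
    return HostSnapshot(disk_free_gb, vm.total / GIB, vm.available / GIB)


def fmt_gb(value: Optional[float]) -> str:
    """Format a size for the report, falling back to 'n/a' when unknown."""
    return "n/a" if value is None else f"{value:.1f} GB"


if __name__ == "__main__":
    snap = collect_host_snapshot(".")
    print("CPU RAM (total) :", fmt_gb(snap.cpu_ram_total_gb))
    print("CPU RAM (avail) :", fmt_gb(snap.cpu_ram_avail_gb))
    print("Disk (avail)    :", fmt_gb(snap.disk_free_gb))
```

Importing psutil inside the function keeps it a soft dependency, which matches the "n/a fallback" case listed in the test plan.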

Example output

============================================================
  OneComp Environment Check
============================================================

Hardware
  GPU count              : 1
  GPU name               : NVIDIA A100 80GB PCIe
  GPU VRAM (total)       : 80.0 GB
  GPU VRAM (free)        : 78.3 GB
  CPU RAM (total)        : 251.6 GB
  CPU RAM (avail)        : 230.1 GB
  Disk (avail)           : 320.4 GB  [/home/user/output]

Model: meta-llama/Llama-2-7b-hf
  Parameters             : 6,738,415,616
  FP16 footprint         : 12.54 GB

Memory Estimates
  2-bit quantized        :  1.96 GB
  4-bit quantized        :  3.77 GB
  8-bit quantized        :  7.28 GB
  Calib. overhead        :  1.88 GB  (15% of FP16)
  4-bit + overhead       :  5.65 GB

OOM Risk Assessment
  Risk level             : WARNING
  Detail                 : Free VRAM (78.3 GB) fits 4-bit quantized
                           weights but is tight (calibration overhead included).

  Recommended wbits      :  3.84  (VRAM-estimated)
============================================================

Risk thresholds

| Level | Condition | Action |
|---|---|---|
| safe | free_vram ≥ fp16_size × 1.2 | Report + continue |
| warning | free_vram ≥ 4-bit size + calib overhead | Report + continue |
| danger | otherwise | Report + stop (exit 1 / RuntimeError) |
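
Read top to bottom, the thresholds amount to a two-step comparison. The sketch below is one way to express them; `classify_oom_risk` and its parameter names are hypothetical and are not taken from `vram_estimator.py`. The sample sizes are the Llama-2-7b figures from the example report (FP16 12.54 GB, 4-bit 3.77 GB, calibration overhead 1.88 GB).

```python
def classify_oom_risk(
    free_vram_gb: float,
    fp16_gb: float,
    q4_gb: float,
    calib_overhead_gb: float,
) -> str:
    """Return 'safe', 'warning', or 'danger' following the thresholds table."""
    # safe: free VRAM covers the FP16 footprint with a 20% margin
    if free_vram_gb >= fp16_gb * 1.2:
        return "safe"
    # warning: free VRAM covers 4-bit weights plus calibration overhead
    if free_vram_gb >= q4_gb + calib_overhead_gb:
        return "warning"
    # danger: neither condition holds
    return "danger"


if __name__ == "__main__":
    sizes = dict(fp16_gb=12.54, q4_gb=3.77, calib_overhead_gb=1.88)
    for free in (16.0, 6.0, 1.0):
        print(f"{free:5.1f} GB free -> {classify_oom_risk(free, **sizes)}")
```

With these sizes the safe threshold is 12.54 × 1.2 ≈ 15.05 GB and the warning threshold is 3.77 + 1.88 = 5.65 GB, so 6.0 GB of free VRAM lands in warning and 1.0 GB in danger, in line with the values used in the test plan.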

Changed files

| File | Change |
|---|---|
| onecomp/utils/vram_estimator.py | New dataclasses (EnvironmentSnapshot, ModelMemoryProfile, EnvCheckResult) + check_environment() + print_env_report() |
| onecomp/utils/__init__.py | Export 5 new public symbols |
| onecomp/cli.py | --check-env argparse flag + preflight invocation |
| onecomp/runner.py | check_env: bool = False kwarg in auto_run() |
| pyproject.toml | Optional extras: pip install ".[check-env]" adds psutil for CPU RAM info |

Test plan

  • onecomp <model_id> --check-env --no-eval — verify report is printed and quantization continues
  • onecomp <model_id> --check-env --total-vram-gb 1.0 — verify danger triggers exit code 1
  • onecomp <model_id> --check-env --total-vram-gb 6.0 --no-eval — verify warning continues
  • CPU-only environment with --total-vram-gb — verify graceful handling without CUDA
  • Without psutil installed — verify n/a fallback message appears
  • Runner.auto_run(..., check_env=True) — verify RuntimeError raised on danger
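
The two failure modes exercised above (exit code 1 from the CLI, RuntimeError from the library API) can be illustrated with a small gate function. This is a sketch under assumptions: `enforce_preflight` and its `from_cli` switch are hypothetical and only mirror the behavior described in this PR, not its actual code paths.

```python
import sys


def enforce_preflight(risk_level: str, *, from_cli: bool) -> None:
    """Stop on 'danger'; let 'safe' and 'warning' continue to quantization."""
    if risk_level != "danger":
        return  # the report has been printed; quantization proceeds
    if from_cli:
        sys.exit(1)  # CLI path: non-zero exit code for shell scripts and CI
    # Library path: raise so the caller can catch the error and react.
    raise RuntimeError("OOM risk is 'danger'; aborting before quantization")


if __name__ == "__main__":
    enforce_preflight("warning", from_cli=True)  # returns normally
    try:
        enforce_preflight("danger", from_cli=False)
    except RuntimeError as exc:
        print(f"library caller caught: {exc}")
```

Raising in the library path rather than exiting leaves the decision to the caller, for example to retry with a lower bit width.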

🤖 Generated with Claude Code

FKKimura and others added 2 commits May 7, 2026 21:20
Add QuantizationProgressTracker and wire it through calibration,
chunked calibration, multi-GPU phase 2, QEP general and arch-aware
paths. Runner gains quantization_progress flag (default on).

Includes unit tests for ETA formatting and thread-safe stepping.

Co-authored-by: Cursor <cursoragent@cursor.com>
@sotanengel
Author

This is not urgent at all, but I opened this PR because of something that bothered me while using the tool myself.
If it isn't needed, please feel free to close it 🙇

FKKimura and others added 5 commits May 18, 2026 13:31
Raise clear error for unsupported QEP quantizers 

See merge request onecomp/onecomp-lab!71
…ation-progress-eta

feat: quantization progress logs with ETA
* refactoring : QuantizationProgressTracker

* update CHANGELOG.md

---------

Co-authored-by: FKKimura <50981196+FKKimura@users.noreply.github.com>
Adds a --check-env CLI flag that collects physical hardware characteristics
(GPU VRAM, CPU RAM, disk space) and model memory estimates before quantization
starts, then classifies OOM risk as safe/warning/danger. Exits with code 1 on
danger; otherwise prints a report and proceeds with quantization.

- onecomp/utils/vram_estimator.py: new EnvironmentSnapshot, ModelMemoryProfile,
  EnvCheckResult dataclasses; check_environment() and print_env_report() functions
  reusing existing weight_memory_gb() and estimate_target_bitwidth()
- onecomp/utils/__init__.py: export 5 new public symbols
- onecomp/cli.py: --check-env argparse flag with preflight invocation
- onecomp/runner.py: check_env=False kwarg in auto_run() for library API use
- pyproject.toml: optional extras [check-env] = ["psutil>=5.9"]

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sotanengel sotanengel force-pushed the feature/check-env-preflight branch from 29a83f5 to 1eab710 Compare May 19, 2026 22:15
3 participants