feat(dynamo_mocker): add GPU-free LLM inference simulation workload#895
feat(dynamo_mocker): add GPU-free LLM inference simulation workload#895saivishal1999 wants to merge 8 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a DynamoMocker workload: configs, Pydantic command models and test definition, shell orchestration for NATS/frontend/mockers, genai-perf/aiperf wrappers, a standalone command-gen strategy that builds wrapper invocations, report parsing gating, registry wiring, and tests for command generation. ChangesDynamoMocker Workload
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/workloads/dynamo_mocker/aiperf.sh`:
- Around line 115-129: Before launching aiperf, ensure any stale artifacts are
removed so process_results cannot pick up old *aiperf*.csv files: locate the
artifact_dir variable and the aiperf invocation (the block using "${run_cmd[@]}"
with --artifact-dir "$artifact_dir") and add a cleanup step that creates/clears
that directory (e.g., mkdir -p and remove existing aiperf artifacts or delete
and recreate artifact_dir) so it’s empty prior to running; this prevents
process_results from reading stale CSVs produced by prior runs.
In `@src/cloudai/workloads/dynamo_mocker/dynamo_mocker.sh`:
- Around line 280-285: In resolve_python(), currently when venv_python is set
but not executable the script silently falls back to system python; change it to
fail fast: if venv_python is non-empty and not -x, print a clear error to stderr
(including the value of venv_python) and exit with a non-zero status; otherwise
keep the existing branch that sets PYTHON and prepends dirname(venv_python) to
PATH. Reference symbols: resolve_python, venv_python, PYTHON, PATH.
- Around line 774-778: The current conditional uses [[ "$benchmark_tool" ==
"aiperf" ]] and falls through to launch_genai_perf for any other value; change
it to explicitly handle valid options and reject unknown values: check
benchmark_tool for "aiperf" to call launch_aiperf, check for the expected genai
option (e.g., "genai" or the exact value used elsewhere) to call
launch_genai_perf, and in an else branch print a clear error (including the
invalid value) to stderr and exit with a non-zero status; refer to the
benchmark_tool variable and the launch_aiperf / launch_genai_perf functions to
locate and update the logic.
- Around line 183-273: The parse_args function reads $2 directly in many case
branches (e.g., --venv-python, --engine-*, --mocker-*, --prefill-extra-args,
--prefill-args-*, --aiperf-*, --genai-perf-*) which breaks under set -u when an
option value is missing; add a small validator (e.g., require_arg or
check_next_arg) invoked before assigning each "$2" that ensures a next argument
exists and is not another flag (starts with "-" ), prints a clear error/usage
message, and returns non-zero; update all branches that currently do direct
assignments to call this validator first and only then assign and shift 2 so
missing values are handled gracefully.
In `@src/cloudai/workloads/dynamo_mocker/genai_perf.sh`:
- Around line 113-123: Before launching genai-perf, remove or empty any prior
artifacts in the artifact directory to avoid copying stale CSVs: ensure the
created local variable artifact_dir (used when calling "${run_cmd[@]}"
--artifact-dir "$artifact_dir") is cleaned (rm -rf "$artifact_dir"/* or recreate
the directory) and recreated as an empty directory prior to invoking the
command, and keep process_results unchanged so it consumes only the fresh
artifacts.
- Around line 54-66: The process_args function currently takes flag values (for
--result-dir, --model, --port, --report-name, --cmd) blindly from $2 which lets
malformed invocations shift parsing; update process_args to validate each flag's
argument before shifting: for each case (--result-dir, --model, --report-name,
--cmd) check that "$2" exists and does not start with '-' (e.g. [[ -n "$2" &&
${2:0:1} != "-" ]]) and on failure print a clear error to stderr and exit
non‑zero; for --port additionally validate numeric (e.g. a regex or [[ "$2" =~
^[0-9]+$ ]]) before assigning and shifting; preserve the existing --) handling
and ensure you only call shift 2 when the validation passes.
In `@src/cloudai/workloads/dynamo_mocker/standalone_command_gen_strategy.py`:
- Around line 171-175: The wrapper command currently interpolates unquoted paths
and filenames, which can break when self._script_path, stdout_file, or
stderr_file contain spaces or shell metacharacters; update the construction of
wrapper_lines (and keep quoted_args as is) so that self._script_path,
stdout_file, and stderr_file are each safely shell-quoted (e.g., via
shlex.quote) before interpolation into the f-string that builds the bash
invocation line in wrapper_lines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 07cd659f-a1a4-41e4-bcde-6e4ce91ff3d1
📒 Files selected for processing (13)
conf/experimental/dynamo_mocker/system/standalone_system.tomlconf/experimental/dynamo_mocker/test/dynamo_mocker.tomlconf/experimental/dynamo_mocker/test_scenario/dynamo_mocker.tomlsrc/cloudai/registration.pysrc/cloudai/workloads/dynamo_mocker/__init__.pysrc/cloudai/workloads/dynamo_mocker/aiperf.shsrc/cloudai/workloads/dynamo_mocker/dynamo_mocker.pysrc/cloudai/workloads/dynamo_mocker/dynamo_mocker.shsrc/cloudai/workloads/dynamo_mocker/genai_perf.shsrc/cloudai/workloads/dynamo_mocker/report_generation_strategy.pysrc/cloudai/workloads/dynamo_mocker/standalone_command_gen_strategy.pytests/workloads/dynamo_mocker/__init__.pytests/workloads/dynamo_mocker/test_command_gen_strategy_standalone.py
9654f3a to
19a2f75
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/workloads/dynamo_mocker/aiperf.sh`:
- Line 64: The --port case currently assigns any non-empty token into port after
calling _require_value; add a numeric validation step in the --port) branch (the
case handling the --port option) to check $2 matches a whole-number regex (e.g.
^[0-9]+$) before assigning port="$2", and on failure print an error to stderr
and exit with non-zero status (or call the script's existing error helper) so
only valid numeric ports are accepted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: bf45ad1e-9704-4a2e-b0ce-e337b811aa0f
📒 Files selected for processing (13)
conf/experimental/dynamo_mocker/system/standalone_system.tomlconf/experimental/dynamo_mocker/test/dynamo_mocker.tomlconf/experimental/dynamo_mocker/test_scenario/dynamo_mocker.tomlsrc/cloudai/registration.pysrc/cloudai/workloads/dynamo_mocker/__init__.pysrc/cloudai/workloads/dynamo_mocker/aiperf.shsrc/cloudai/workloads/dynamo_mocker/dynamo_mocker.pysrc/cloudai/workloads/dynamo_mocker/dynamo_mocker.shsrc/cloudai/workloads/dynamo_mocker/genai_perf.shsrc/cloudai/workloads/dynamo_mocker/report_generation_strategy.pysrc/cloudai/workloads/dynamo_mocker/standalone_command_gen_strategy.pytests/workloads/dynamo_mocker/__init__.pytests/workloads/dynamo_mocker/test_command_gen_strategy_standalone.py
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (3)
src/cloudai/workloads/dynamo_mocker/genai_perf.sh (1)
64-66:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winValidate
--portas numeric before building the URL.Right now any non-empty token becomes
localhost:${port}, so a typo is deferred into a less actionable genai-perf failure instead of being rejected here.🔧 Suggested fix
- --port) _require_value "$1" "${2-}"; port="$2"; shift 2 ;; + --port) + _require_value "$1" "${2-}" + [[ "$2" =~ ^[0-9]+$ ]] || { log "ERROR: --port must be numeric (got: '$2')" >&2; exit 1; } + port="$2" + shift 2 + ;;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/workloads/dynamo_mocker/genai_perf.sh` around lines 64 - 66, The --port parsing currently accepts any non-empty token into variable port, so add a numeric validation step right after the --port assignment (the parsing branch that calls _require_value and sets port="$2") to reject non-numeric or out-of-range values before the script uses "localhost:${port}"; implement a check (e.g., regex to ensure all digits and optionally enforce 1-65535) and if invalid print a clear error and exit non-zero, referencing the --port option and the port variable so the parsing block will fail fast on typos.src/cloudai/workloads/dynamo_mocker/aiperf.sh (1)
62-65:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReject non-numeric
--portvalues early.
localhost:${port}is assembled unconditionally, so bad input only fails later inside aiperf with a less direct error.🔧 Suggested fix
- --port) _require_value "$1" "${2-}"; port="$2"; shift 2 ;; + --port) + _require_value "$1" "${2-}" + [[ "$2" =~ ^[0-9]+$ ]] || { log "ERROR: --port must be numeric (got: '$2')" >&2; exit 1; } + port="$2" + shift 2 + ;;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/workloads/dynamo_mocker/aiperf.sh` around lines 62 - 65, The --port parsing branch currently assigns port="$2" without validation, so non-numeric values later break when assembling localhost:${port}; after the existing _require_value call in the --port case (the branch that sets port="$2" and shift 2), validate that port contains only digits (e.g., use a regex like ^[0-9]+$ or an integer check) and if it fails, print a clear error and exit non‑zero; update the error path near the --port handling (not the aiperf invocation) so invalid ports are rejected early.src/cloudai/workloads/dynamo_mocker/dynamo_mocker.sh (1)
219-273:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard the remaining passthrough branches before reading
$2.The fixed-value branches are protected now, but these wildcard/raw passthrough cases still consume
$2blindly. A malformed invocation like--engine-foo --frontend-bar bazwill still shift parsing or die underset -ubefore you emit a controlled error.🔧 Suggested fix pattern
--engine-*) + _require_value "$1" "${2-}" engine_extra_args["--${1#--engine-}"]="$2" shift 2 ;; --mocker-*) + _require_value "$1" "${2-}" mocker_extra_args["--${1#--mocker-}"]="$2" shift 2 ;;Apply the same guard to the other
"$2"consumers in this block as well.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/workloads/dynamo_mocker/dynamo_mocker.sh` around lines 219 - 273, The wildcard/raw passthrough branches (e.g., the lines that set engine_extra_args["--${1#--engine-}"]="$2"], mocker_extra_args["--${1#--mocker-}"]="$2"], frontend_extra_args["--${1#--frontend-}"]="$2"], prefill_extra_args_raw="$2", prefill_args_extra["--${1#--prefill-args-}"]="$2"], decode_extra_args_raw="$2", decode_args_extra["--${1#--decode-args-}"]="$2"], aiperf_cmd="$2", aiperf_extra_args_raw="$2", aiperf_extra_args["--${1#--aiperf-}"]="$2"], genai_perf_cmd="$2", genai_perf_extra_args_raw="$2", genai_perf_extra_args["--${1#--genai-perf-}"]="$2"]) read $2 unguarded and will break under malformed input; add the same guard used in the fixed-value branches before consuming $2 (check that $# -ge 2 and/or that $2 is not another flag) and emit a clear error and exit when the next argument is missing or looks like a flag, then only assign and shift 2 when the guard passes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/workloads/dynamo_mocker/aiperf.sh`:
- Around line 106-113: The override branch that builds run_cmd from the provided
cmd variable doesn't include the required "profile" subcommand, causing custom
--cmd to run without "profile"; update the code that processes cmd (variable cmd
and run_cmd) to ensure the "profile" subcommand is present—parse read -ra
run_cmd <<< "$cmd", then if the first positional element is not "profile" (or if
"profile" is not present in run_cmd), append "profile" to run_cmd so both the
default path (run_cmd=("$AIPERF" profile)) and the custom --cmd path always
execute the "profile" subcommand when invoking AIPERF.
In `@src/cloudai/workloads/dynamo_mocker/dynamo_mocker.sh`:
- Around line 397-405: The write_node_roles() block currently hardcodes
"genai_perf" into the roles file; change it to record the chosen benchmark tool
instead by replacing the fixed echo "$(hostname),genai_perf" with an echo that
uses the benchmark selection variable (e.g., echo
"$(hostname),${benchmark_tool}" or the script's variable that holds the
--benchmark-tool value). Keep the rest of the function (roles_file,
disaggregation_mode branches, and other roles) unchanged so the roles file
reflects the actual selected benchmark (e.g., "aiperf") rather than always
"genai_perf".
In `@src/cloudai/workloads/dynamo_mocker/genai_perf.sh`:
- Around line 111-118: When a custom --cmd is provided the script currently sets
run_cmd from cmd but omits the required "profile" subcommand; update the cmd
branch in genai_perf.sh so after read -ra run_cmd <<< "$cmd" you check whether
run_cmd already contains "profile" and if not insert "profile" immediately after
the executable (i.e. at index 1) so run_cmd becomes ("<cmd-exe>" "profile" ...).
Use the existing variables run_cmd, cmd and GENAI_PERF to locate the logic to
modify.
In `@src/cloudai/workloads/dynamo_mocker/standalone_command_gen_strategy.py`:
- Around line 1-2: Update the copyright header year from 2025 to 2026 in this
file; specifically change the two header comment lines containing "2025 NVIDIA
CORPORATION & AFFILIATES" to "2026 NVIDIA CORPORATION & AFFILIATES" so the
top-of-file SPDX and copyright declarations match CI expectations.
In `@tests/test_test_scenario.py`:
- Line 627: The test currently only asserts len(Registry().reports_map) == 21
which can miss wrong or shifted registrations; update the test to also assert
that the specific DynamoMocker reporter key exists in Registry().reports_map and
that its value is the expected reporter (e.g., instance/class name or type) —
locate the Registry() usage and reports_map access in the test and add a
key-level assertion like checking "DynamoMocker" (or the exact key used during
registration) in Registry().reports_map and assert the mapped object is of the
expected class/type.
---
Duplicate comments:
In `@src/cloudai/workloads/dynamo_mocker/aiperf.sh`:
- Around line 62-65: The --port parsing branch currently assigns port="$2"
without validation, so non-numeric values later break when assembling
localhost:${port}; after the existing _require_value call in the --port case
(the branch that sets port="$2" and shift 2), validate that port contains only
digits (e.g., use a regex like ^[0-9]+$ or an integer check) and if it fails,
print a clear error and exit non‑zero; update the error path near the --port
handling (not the aiperf invocation) so invalid ports are rejected early.
In `@src/cloudai/workloads/dynamo_mocker/dynamo_mocker.sh`:
- Around line 219-273: The wildcard/raw passthrough branches (e.g., the lines
that set engine_extra_args["--${1#--engine-}"]="$2"],
mocker_extra_args["--${1#--mocker-}"]="$2"],
frontend_extra_args["--${1#--frontend-}"]="$2"], prefill_extra_args_raw="$2",
prefill_args_extra["--${1#--prefill-args-}"]="$2"], decode_extra_args_raw="$2",
decode_args_extra["--${1#--decode-args-}"]="$2"], aiperf_cmd="$2",
aiperf_extra_args_raw="$2", aiperf_extra_args["--${1#--aiperf-}"]="$2"],
genai_perf_cmd="$2", genai_perf_extra_args_raw="$2",
genai_perf_extra_args["--${1#--genai-perf-}"]="$2"]) read $2 unguarded and will
break under malformed input; add the same guard used in the fixed-value branches
before consuming $2 (check that $# -ge 2 and/or that $2 is not another flag) and
emit a clear error and exit when the next argument is missing or looks like a
flag, then only assign and shift 2 when the guard passes.
In `@src/cloudai/workloads/dynamo_mocker/genai_perf.sh`:
- Around line 64-66: The --port parsing currently accepts any non-empty token
into variable port, so add a numeric validation step right after the --port
assignment (the parsing branch that calls _require_value and sets port="$2") to
reject non-numeric or out-of-range values before the script uses
"localhost:${port}"; implement a check (e.g., regex to ensure all digits and
optionally enforce 1-65535) and if invalid print a clear error and exit
non-zero, referencing the --port option and the port variable so the parsing
block will fail fast on typos.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 6cafdb9c-7b84-4efb-8b55-362180f9fd39
📒 Files selected for processing (6)
src/cloudai/workloads/dynamo_mocker/aiperf.shsrc/cloudai/workloads/dynamo_mocker/dynamo_mocker.shsrc/cloudai/workloads/dynamo_mocker/genai_perf.shsrc/cloudai/workloads/dynamo_mocker/standalone_command_gen_strategy.pytests/test_init.pytests/test_test_scenario.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/workloads/dynamo_mocker/test_command_gen_strategy_standalone.py`:
- Around line 418-501: The tests in test_command_gen_strategy_standalone.py rely
on forwarded --prefill-*/--decode-* flags but do not explicitly set
disaggregation_mode, making behavior ambiguous; update each affected
DynamoMockerCmdArgs construction in the listed tests (e.g.,
test_prefill_args_extra_forwarded, test_decode_args_extra_forwarded,
test_prefill_and_decode_different_values, test_prefill_args_not_in_decode_flags,
test_decode_args_not_in_prefill_flags, test_prefill_args_bool_lowercased and the
ones around lines 533-554) to include disaggregation_mode="prefill_decode" on
the top-level DynamoMockerCmdArgs so the tests explicitly exercise the
disaggregated prefill/decode forwarding path when building args via
strategy._build_script_args.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: daab1138-36f1-4549-99b9-15b8f1f241ec
📒 Files selected for processing (9)
conf/experimental/dynamo_mocker/system/standalone_system.tomlconf/experimental/dynamo_mocker/test/dynamo_mocker.tomlconf/experimental/dynamo_mocker/test_scenario/dynamo_mocker.tomlsrc/cloudai/workloads/dynamo_mocker/__init__.pysrc/cloudai/workloads/dynamo_mocker/dynamo_mocker.pysrc/cloudai/workloads/dynamo_mocker/report_generation_strategy.pysrc/cloudai/workloads/dynamo_mocker/standalone_command_gen_strategy.pytests/workloads/dynamo_mocker/__init__.pytests/workloads/dynamo_mocker/test_command_gen_strategy_standalone.py
|
@saivishal1999 please resolve coderabbit comments and ping me directly again |
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
src/cloudai/workloads/dynamo_mocker/dynamo_mocker.sh (1)
2-2:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUpdate the copyright header year to 2026.
This file is new but still carries
2024in the copyright line.🔧 Proposed fix
-# Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES All rights reserved. +# Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/workloads/dynamo_mocker/dynamo_mocker.sh` at line 2, Update the copyright header year from 2024 to 2026 by editing the top-of-file copyright line (the single comment that currently reads "Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES All rights reserved.") so it reads 2026 instead of 2024; ensure the exact punctuation/capitalization is preserved except for the year change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/workloads/dynamo_mocker/aiperf.sh`:
- Line 2: Update the copyright header in the aiperf.sh file from "2024" to
"2026" so it matches repo header expectations; locate the top-of-file header
comment in src/cloudai/workloads/dynamo_mocker/aiperf.sh and replace the year
token "2024" with "2026" in that copyright line.
In `@src/cloudai/workloads/dynamo_mocker/genai_perf.sh`:
- Line 2: Update the copyright header string "Copyright (c) 2024 NVIDIA
CORPORATION & AFFILIATES All rights reserved." in genai_perf.sh to use the year
2026 instead of 2024 so the file header matches current source header checks.
In `@src/cloudai/workloads/dynamo_mocker/standalone_command_gen_strategy.py`:
- Around line 199-200: test_cmd is built with naive double-quote wrapping which
breaks when args contain quotes/backslashes; update the assembly in
StandaloneCommandGenStrategy to use shell-safe quoting (e.g., shlex.quote) for
self._script_path and each item in self._script_args and similarly quote
self._wrapper_path when building full_cmd so the persisted test_cmd/ full_cmd is
a valid, safe shell command even when args contain special characters.
---
Duplicate comments:
In `@src/cloudai/workloads/dynamo_mocker/dynamo_mocker.sh`:
- Line 2: Update the copyright header year from 2024 to 2026 by editing the
top-of-file copyright line (the single comment that currently reads "Copyright
(c) 2024 NVIDIA CORPORATION & AFFILIATES All rights reserved.") so it reads 2026
instead of 2024; ensure the exact punctuation/capitalization is preserved except
for the year change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: e8fedb74-ba19-434f-84a0-c77564d66165
📒 Files selected for processing (6)
conf/experimental/dynamo_mocker/test/dynamo_mocker.tomlsrc/cloudai/workloads/dynamo_mocker/aiperf.shsrc/cloudai/workloads/dynamo_mocker/dynamo_mocker.pysrc/cloudai/workloads/dynamo_mocker/dynamo_mocker.shsrc/cloudai/workloads/dynamo_mocker/genai_perf.shsrc/cloudai/workloads/dynamo_mocker/standalone_command_gen_strategy.py
Adds the DynamoMocker workload to cloudai — a lightweight GPU-free LLM inference simulator using dynamo.mocker + dynamo.frontend, benchmarked with genai-perf or aiperf. - Pydantic models for engine, worker, frontend, and benchmark config - StandaloneCommandGenStrategy that invokes dynamo_mocker.sh via an isolated uv-managed venv (PythonEnvironment installable) - ReportGenerationStrategy reusing AIDynamoReportGenerationStrategy for benchmark_report.csv parsing - Shell scripts: dynamo_mocker.sh, genai_perf.sh, aiperf.sh - Supports combined and disaggregated (prefill/decode) topologies - Supports round_robin and kv_router frontend modes - Staging configs under conf/experimental/dynamo_mocker/ - 55 unit tests covering all CLI arg paths and flag passthrough
- tests/test_init.py: add DynamoMocker to CMD_GEN_STRATEGIES and test_definitions list; update expected counts (25→26, +1 strategy) - tests/test_test_scenario.py: update reports_map expected count (20→21) - dynamo_mocker.sh: add _require_value() validation for all value-bearing flags; fail-fast on invalid --venv-python; explicit error for unknown benchmark_tool - genai_perf.sh: add _require_value() validation; remove stale artifact dir before each run - aiperf.sh: remove stale artifact dir before each run - standalone_command_gen_strategy.py: apply shlex.quote() to script_path, stdout_file, and stderr_file in wrapper script generation
…L files All dynamo_mocker Python files had 2025 in their copyright headers, but they were first committed to cloudai in 2026 — the test derives the expected year from git log in this repo, so 2026 is correct. The three TOML conf files were missing the Apache-2.0 copyright header entirely; add it with year 2026.
…ommand nats-server is not bundled with ai-dynamo so users must supply it. Mirrors ai_dynamo's nats_cmd pattern: add nats_cmd field to DynamoMockerCmdArgs (default "nats-server -js") so users can supply a full path when nats-server is not on PATH. Wire --nats-cmd through standalone_command_gen_strategy.py and dynamo_mocker.sh. Also append 'profile' subcommand after parsing a custom --cmd override in aiperf.sh and genai_perf.sh, making --cmd a binary-path override consistent with the default path. Update field descriptions accordingly.
…test_run quoting Update copyright year from 2024 to 2026 in dynamo_mocker.sh, aiperf.sh, and genai_perf.sh. Replace naive double-quote wrapping in store_test_run() with shlex.quote() for both script_path and all args, matching the shell-safe quoting used in gen_exec_command().
…ine wrap Add regex validation (^[0-9]+$) for --port in aiperf.sh and genai_perf.sh, and for --http-port in dynamo_mocker.sh, so invalid values fail early with a clear message rather than at tool invocation. Reformat store_test_run() shlex.quote expression to satisfy ruff E501.
…orter test; explicit disaggregation_mode in tests
dynamo_mocker.sh: replace hardcoded 'genai_perf' in write_node_roles()
with \${benchmark_tool} so the roles file reflects the actual selected
benchmark tool (aiperf or genai_perf).
test_test_scenario.py: add (DynamoMockerTestDefinition,
{DynamoMockerReportGenerationStrategy}) to test_custom_reporters
parametrize list, providing key-level registration verification.
test_command_gen_strategy_standalone.py: add
disaggregation_mode="prefill_decode" to the 8 tests in
TestPrefillDecodeExtraArgs that exercise --prefill-args-* /
--decode-args-* / --prefill-extra-args / --decode-extra-args
forwarding, making the intended disaggregated-mode contract explicit.
Move dynamo_mocker import to correct alphabetical position (after deepep, before jax_toolbox) to satisfy ruff I001 import-order lint rule.
66c68d5 to
fe16067
Compare
Adds the DynamoMocker workload to cloudai — a lightweight GPU-free LLM inference simulator using dynamo.mocker + dynamo.frontend, benchmarked with genai-perf or aiperf.
Summary
Provide a concise summary of the changes introduced by this pull request. Detail the purpose and scope of the changes, referencing any relevant issues or discussions. Explain how these changes address the problem or improve the project.
Test Plan
In this section, describe the testing you have performed to verify the changes. Include:
This information is crucial for reviewers to understand how the changes have been validated.
Additional Notes
Include any other notes or comments about the pull request here. This can include challenges faced, future considerations, or context that reviewers might find helpful.