test(onboard): add strict tool-call probe e2e by cv · Pull Request #4541 · NVIDIA/NemoClaw

cv · 2026-05-29T18:24:29Z

Summary

Adds a hermetic regression E2E for the strict Chat Completions tool-call validation path introduced around Local Ollama onboarding. The test uses a local OpenAI-compatible mock to assert the bounded payload shape, transient retry behavior, and fail-closed behavior when no structured tool call is returned.

Related Issue

Fixes #4537

Changes

Add test/e2e/test-strict-tool-call-probe.sh to exercise strict tool-call validation against a mock OpenAI-compatible endpoint.
Wire strict-tool-call-probe-e2e into .github/workflows/regression-e2e.yaml for workflow dispatch.
Document the thinking-model assumption near the strict probe max_tokens cap in src/lib/inference/onboard-probes.ts.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

Tests
- Added an end-to-end test harness validating strict tool-call probe behavior: success, transient-error recovery, and plain-text fallback; captures logs and reports pass/fail.
Chores
- Updated regression workflow to include and gate the strict probe job, run it in CI (no-sleep mode), and upload probe logs on failure.
Documentation
- Clarified an inline comment about the strict tool-call probe token cap and its current scope.

coderabbitai · 2026-05-29T18:24:41Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0bff4170-9443-498f-9fa9-3d659e32c914

📥 Commits

Reviewing files that changed from the base of the PR and between e55d82f and 44b00b9.

📒 Files selected for processing (1)

src/lib/inference/onboard-probes.ts

✅ Files skipped from review due to trivial changes (1)

src/lib/inference/onboard-probes.ts

📝 Walkthrough

Walkthrough

Adds a hermetic Bash+Node E2E that validates strict chat-completions tool-call payloads and retry/recovery behavior against a local mock server, and wires the test into the regression-e2e CI with selector gating and failure-only log upload.

Changes

Strict Tool-Call Probe E2E Coverage

Layer / File(s)	Summary
E2E test harness and mock server `test/e2e/test-strict-tool-call-probe.sh`, `src/lib/inference/onboard-probes.ts`	Bash harness builds the CLI and runs an embedded Node mock server in three modes (success, transient-502, plain-text). Records requests to JSONL and asserts outbound payload shape (`sessions_send` presence, `tool_choice`, `max_tokens`, `stream`, `temperature`), validates retry counts and recovery hook invocation, and updates the onboard-probes comment about Local Ollama `max_tokens`.
Regression E2E workflow integration `.github/workflows/regression-e2e.yaml`	Adds `strict-tool-call-probe-e2e` to valid dispatch names, exposes `strict_tool_call_probe` from the selector, sets that output true/false based on requested jobs, gates the new job on that output, runs the test script with `NEMOCLAW_TEST_NO_SLEEP=1`, and configures failure-only artifact upload for logs.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

NVIDIA/NemoClaw#4532: Bounds the strict chat tool-call probe with max_tokens: 256 and stream: false, which this PR validates end-to-end.
NVIDIA/NemoClaw#4250: Related E2E changes around the Local Ollama/no-tools behavior that affect strict probe execution.

Suggested labels

CI/CD, E2E, NemoClaw CLI

Suggested reviewers

ericksoa

Poem

🐰 A tiny test hops out to see,
A mock, some retries, and payloads agree.
Logs tucked neatly, failures shown clear,
The probe now checks what we hold dear.
Hooray — the rabbit gives a joyful cheer!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'test(onboard): add strict tool-call probe e2e' directly and concisely describes the main change: adding an end-to-end test for the strict tool-call probe in onboarding.
Linked Issues check	✅ Passed	The PR comprehensively addresses all coding requirements from issue `#4537`: hermetic E2E test added to test/e2e/, integrated into regression-e2e.yaml workflow, validates payload shape (tool_choice/max_tokens/stream), verifies success/retry behavior, and documents thinking-model assumption in onboard-probes.ts.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to issue `#4537` requirements: new E2E test script, workflow integration, and documentation comment. No extraneous modifications detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch test/strict-tool-call-probe-e2e

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-29T18:25:34Z

E2E Advisor Recommendation

Required E2E: strict-tool-call-probe-e2e
Optional E2E: onboard-inference-smoke-e2e

Dispatch hint: strict-tool-call-probe-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

strict-tool-call-probe-e2e (low; hermetic mock endpoint, no GPU/Ollama infrastructure or external inference secret required): This PR adds the regression job and its test script for strict Chat Completions tool-call probing. Run it to verify the new workflow selection, CLI build, mock endpoint harness, bounded payload assertions, transient retry, and fail-closed behavior all work in CI.

Optional E2E

onboard-inference-smoke-e2e (low to medium; hermetic/CI local mock-style smoke without external model-router secret): Adjacent confidence check for onboard inference validation behavior. The runtime source change is comment-only, so this is not merge-blocking, but it covers the broader onboarding path that reports success only after a real chat completion.

New E2E recommendations

None.

Dispatch hint

Workflow: regression-e2e.yaml
jobs input: strict-tool-call-probe-e2e

github-actions · 2026-05-29T18:25:35Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

None. No scenario E2E is recommended. The scenario workflows/catalog/runtime under test/e2e-scenario/ were not changed; the regression workflow and test/e2e script are non-scenario legacy E2E surfaces, and the src/lib/inference/onboard-probes.ts diff is comment-only with no behavioral effect on scenario E2E.

Optional scenario E2E

None.

Relevant changed files

None.

github-actions · 2026-05-29T18:28:07Z

PR Review Advisor

Findings: 1 needs attention, 2 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 2 still apply, 0 new items found

Review findings

🛠️ Needs attention

E2E still bypasses the Local Ollama onboarding caller (test/e2e/test-strict-tool-call-probe.sh:93): Issue Add hermetic E2E coverage for strict chat-completions tool-call validation #4537 asks for PR-safe hermetic E2E coverage that validates strict Chat Completions tool-call behavior through the onboarding flow and says the test should "onboards against an OpenAI-compatible mock endpoint". This harness verifies the validation helper and strict payload shape, but it constructs validation helpers and calls `validateOpenAiLikeSelection` directly. It would still pass if the production Local Ollama branch in `src/lib/onboard.ts` stopped invoking validation, passed the wrong selected model or validation base URL, or threaded the wrong `allowToolsIncompatible` value into `buildOllamaProbeOptions`.
- Recommendation: Exercise the production Local Ollama onboarding selection/caller path against the mock endpoint, or add a focused integration/contract test at the real caller boundary in `src/lib/onboard.ts` that proves the selected Local Ollama model and validation URL are passed through `localInference.buildOllamaProbeOptions(false)` into strict Chat Completions validation. Keep the existing mock assertions for payload shape, retry, and fail-closed behavior.
- Evidence: `test/e2e/test-strict-tool-call-probe.sh` defines `validate(endpoint, recoveryCalls = [])`, creates `createInferenceSelectionValidationHelpers(...)`, and calls `helpers.validateOpenAiLikeSelection("Local Ollama", endpoint, "mock-tool-model", ..., strictOllamaProbeOptions())` directly. The production Local Ollama path remains in `src/lib/onboard.ts`, where validation is called after `prepareOllamaModel(...)`, `getLocalProviderValidationBaseUrl(provider)`, and `localInference.buildOllamaProbeOptions(allowToolsIncompatible)` are combined.

🔎 Worth checking

Source-of-truth review needed: test/e2e/test-strict-tool-call-probe.sh strict Local Ollama validation coverage: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `validate(endpoint, recoveryCalls = [])` calls `helpers.validateOpenAiLikeSelection(...)` directly with `strictOllamaProbeOptions()`, while the real Local Ollama caller remains in `src/lib/onboard.ts`.
Security regression guard stops at the helper boundary (test/e2e/test-strict-tool-call-probe.sh:93): The strict tool-call gate is a fail-closed validation path for Local Ollama onboarding, but the new E2E only tests the lower validation helper. That weakens the security regression guard and can create a false sense of coverage for the production onboarding path.
- Recommendation: Add caller-boundary coverage for the real Local Ollama onboarding path so the security-relevant strict validation requirement cannot be bypassed by future caller changes.
- Evidence: The test validates success, transient retry, and plain-text fail-closed behavior through `validateOpenAiLikeSelection`, while the real onboarding caller that decides whether strict validation is required remains outside the exercised path.

🌱 Nice ideas

None.

Since last review details

Current findings:

Source-of-truth review needed: test/e2e/test-strict-tool-call-probe.sh strict Local Ollama validation coverage: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `validate(endpoint, recoveryCalls = [])` calls `helpers.validateOpenAiLikeSelection(...)` directly with `strictOllamaProbeOptions()`, while the real Local Ollama caller remains in `src/lib/onboard.ts`.
E2E still bypasses the Local Ollama onboarding caller (test/e2e/test-strict-tool-call-probe.sh:93): Issue Add hermetic E2E coverage for strict chat-completions tool-call validation #4537 asks for PR-safe hermetic E2E coverage that validates strict Chat Completions tool-call behavior through the onboarding flow and says the test should "onboards against an OpenAI-compatible mock endpoint". This harness verifies the validation helper and strict payload shape, but it constructs validation helpers and calls `validateOpenAiLikeSelection` directly. It would still pass if the production Local Ollama branch in `src/lib/onboard.ts` stopped invoking validation, passed the wrong selected model or validation base URL, or threaded the wrong `allowToolsIncompatible` value into `buildOllamaProbeOptions`.
- Recommendation: Exercise the production Local Ollama onboarding selection/caller path against the mock endpoint, or add a focused integration/contract test at the real caller boundary in `src/lib/onboard.ts` that proves the selected Local Ollama model and validation URL are passed through `localInference.buildOllamaProbeOptions(false)` into strict Chat Completions validation. Keep the existing mock assertions for payload shape, retry, and fail-closed behavior.
- Evidence: `test/e2e/test-strict-tool-call-probe.sh` defines `validate(endpoint, recoveryCalls = [])`, creates `createInferenceSelectionValidationHelpers(...)`, and calls `helpers.validateOpenAiLikeSelection("Local Ollama", endpoint, "mock-tool-model", ..., strictOllamaProbeOptions())` directly. The production Local Ollama path remains in `src/lib/onboard.ts`, where validation is called after `prepareOllamaModel(...)`, `getLocalProviderValidationBaseUrl(provider)`, and `localInference.buildOllamaProbeOptions(allowToolsIncompatible)` are combined.
Security regression guard stops at the helper boundary (test/e2e/test-strict-tool-call-probe.sh:93): The strict tool-call gate is a fail-closed validation path for Local Ollama onboarding, but the new E2E only tests the lower validation helper. That weakens the security regression guard and can create a false sense of coverage for the production onboarding path.
- Recommendation: Add caller-boundary coverage for the real Local Ollama onboarding path so the security-relevant strict validation requirement cannot be bypassed by future caller changes.
- Evidence: The test validates success, transient retry, and plain-text fail-closed behavior through `validateOpenAiLikeSelection`, while the real onboarding caller that decides whether strict validation is required remains outside the exercised path.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/regression-e2e.yaml:
- Around line 270-276: Update the GitHub Actions E2E job to harden the Checkout
and Node setup steps: in the checkout step (the actions/checkout usage) add
persist-credentials: false to prevent credential persistence, and replace the
floating actions/setup-node@v6 reference with a pinned full commit SHA for
setup-node to avoid unexpected changes; locate the lines using the
actions/checkout and actions/setup-node symbols and apply those two changes.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3a6bca40-67ad-44ed-a275-361bc2ad75e4

📥 Commits

Reviewing files that changed from the base of the PR and between 6fb96bd and 6fd193f.

📒 Files selected for processing (3)

.github/workflows/regression-e2e.yaml
src/lib/inference/onboard-probes.ts
test/e2e/test-strict-tool-call-probe.sh

wscurran · 2026-05-29T21:36:01Z

✨
Related open issues:

#4537 Add hermetic E2E coverage for strict chat-completions tool-call validation

test(onboard): add strict tool-call probe e2e

6fd193f

cv self-assigned this May 29, 2026

coderabbitai Bot reviewed May 29, 2026

View reviewed changes

Comment thread .github/workflows/regression-e2e.yaml

cv added 2 commits May 29, 2026 11:41

Merge branch 'main' into test/strict-tool-call-probe-e2e

f99a43b

test(onboard): harden strict probe e2e wiring

e55d82f

cv added the v0.0.55 Release target label May 29, 2026

jyaunches added R3 v0.0.56 Release target and removed v0.0.55 Release target R3 labels May 29, 2026

Merge branch 'main' into test/strict-tool-call-probe-e2e

44b00b9

wscurran added the E2E End-to-end testing — Brev infrastructure, test cases, nightly failures, and coverage gaps label May 29, 2026

wscurran added the fix label May 29, 2026

Conversation

cv commented May 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

wscurran commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cv commented May 29, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 29, 2026 •

edited

Loading

github-actions Bot commented May 29, 2026 •

edited

Loading

github-actions Bot commented May 29, 2026 •

edited

Loading

github-actions Bot commented May 29, 2026 •

edited

Loading