Add testbot: AI-powered test generation pipeline #743
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough

This pull request introduces a comprehensive AI-driven testbot system for automated test generation and PR review. It includes two GitHub Actions workflows, a testbot package with LangGraph-based pipeline stages (analyze coverage, generate tests, validate, review, create PR), LLM provider plugins, prompt templates, review response automation, and extensive test coverage for all components.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant GitHub
    participant Testbot as Testbot Pipeline
    participant Codecov
    participant LLM
    participant BuildSystem as Build/Test System
    participant Git
    User->>GitHub: Trigger testbot (schedule/manual)
    GitHub->>Testbot: Start workflow
    Testbot->>Codecov: fetch_coverage(token)
    Codecov-->>Testbot: Coverage report
    Testbot->>Testbot: analyze_coverage()<br/>select_targets()
    loop For each target
        Testbot->>Testbot: write_test()<br/>(get_llm().generate_test())
        LLM-->>Testbot: GeneratedTest
        Testbot->>BuildSystem: validate_test()<br/>(run_test)
        BuildSystem-->>Testbot: ValidationResult
        alt Validation Failed
            Testbot->>LLM: generate_test()<br/>(with retry_context)
            LLM-->>Testbot: Regenerated test
        end
        Testbot->>LLM: review_test()
        LLM-->>Testbot: Review verdict
    end
    Testbot->>Git: create_pr()<br/>(commit & push)
    Git-->>Testbot: Branch pushed
    Testbot->>GitHub: gh pr create<br/>(ai-generated label)
    GitHub-->>Testbot: PR URL
    Testbot-->>User: Complete with PR link
```
```mermaid
sequenceDiagram
    participant GitHub
    participant Testbot as Testbot Respond
    participant GraphQL as GitHub GraphQL
    participant TestFile as Generated Test File
    participant LLM
    participant BuildSystem as Build/Test System
    GitHub->>Testbot: Review comment created
    Testbot->>GraphQL: Query review threads<br/>(unresolved, PR context)
    GraphQL-->>Testbot: Thread list
    Testbot->>Testbot: Filter actionable threads<br/>(not testbot source, not resolved)
    loop Per file with comments
        Testbot->>TestFile: Read current content
        TestFile-->>Testbot: File content
        Testbot->>LLM: build_batch_respond_prompt()<br/>(all file comments + context)
        LLM-->>Testbot: JSON(fix, replies[])
        alt Fix provided
            Testbot->>TestFile: Apply fix
            Testbot->>BuildSystem: validate_test()<br/>(run_test)
            BuildSystem-->>Testbot: Pass/Fail
            alt Validation passes
                Testbot->>GitHub: Commit & push fix
            else Validation fails (retries remain)
                Testbot->>LLM: Retry with error context
            end
        end
    end
    loop For each reply
        Testbot->>GitHub: Post comment reply
        Testbot->>GraphQL: Resolve thread<br/>(if indicated)
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 11
Note: Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
🟡 Minor comments (20)
src/scripts/coverage_agent/__init__.py lines 1-2 (1)

1-2: ⚠️ Potential issue | 🟡 Minor: Add `# pylint: disable=line-too-long` for the copyright header.

The copyright line is 103 characters, exceeding the 100-character limit and causing the pylint failure. As per coding guidelines, add the pylint disable comment instead of breaking the line.

Proposed fix

```diff
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/__init__.py` around lines 1-2, add a pylint disable for line-too-long to the module so the 103-character SPDX copyright header doesn't fail linting; insert the comment "# pylint: disable=line-too-long" at the top of the file near the SPDX header lines (the two lines starting with "# SPDX-FileCopyrightText" and "# SPDX-License-Identifier") so the long copyright line is ignored by pylint.

src/scripts/coverage_agent/nodes/__init__.py lines 1-2 (1)
1-2: ⚠️ Potential issue | 🟡 Minor: Add `# pylint: disable=line-too-long` for the copyright header.

Same pylint C0301 failure as reported. Add the disable comment per coding guidelines.

Proposed fix

```diff
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/nodes/__init__.py` around lines 1-2, add a module-level pylint disable for the long-line rule to silence C0301 for the SPDX copyright header: insert the comment "# pylint: disable=line-too-long" at the top of the file (immediately after or in the same block as the existing SPDX header lines) so the copyright header no longer triggers the lint failure.

src/scripts/coverage_agent/tools/file_ops.py lines 20-21 (1)
20-21: ⚠️ Potential issue | 🟡 Minor: Handle edge case when path has no directory component.

`os.path.dirname("filename.txt")` returns an empty string `""`, and `os.makedirs("", exist_ok=True)` raises `FileNotFoundError`. This will happen when writing to a file in the current directory without a path prefix.

Proposed fix

```diff
     try:
-        os.makedirs(os.path.dirname(path), exist_ok=True)
+        dir_name = os.path.dirname(path)
+        if dir_name:
+            os.makedirs(dir_name, exist_ok=True)
         with open(path, "w") as file:
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/tools/file_ops.py` around lines 20-21, when creating parent directories for a file path, guard the os.makedirs call so it doesn't get passed an empty string; check the directory component returned by os.path.dirname(path) and only call os.makedirs(dirpath, exist_ok=True) if dirpath is non-empty (or otherwise treat empty as current directory) to avoid FileNotFoundError when path has no directory component.
src/scripts/coverage_agent/prompts/go_test.py line 30 (1)
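The dirname edge case is easy to demonstrate in isolation. The sketch below uses a simplified stand-in for the `file_ops` writer (the function body here is illustrative, not the actual module code):

```python
import os
import tempfile

def write_file(path: str, content: str) -> None:
    """Guarded version: only create parent directories when the path has any."""
    dir_name = os.path.dirname(path)
    # os.path.dirname("filename.txt") == "", and os.makedirs("") raises FileNotFoundError
    if dir_name:
        os.makedirs(dir_name, exist_ok=True)
    with open(path, "w", encoding="utf-8") as file:
        file.write(content)

# A bare filename has an empty directory component:
assert os.path.dirname("filename.txt") == ""

# Works both with and without a directory prefix:
base = tempfile.mkdtemp()
write_file(os.path.join(base, "sub", "dir", "a.txt"), "nested")
os.chdir(base)
write_file("bare.txt", "flat")
```

The unguarded version would raise `FileNotFoundError` on the second `write_file` call, since the empty dirname reaches `os.makedirs`.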
30-30: ⚠️ Potential issue | 🟡 Minor: Remove extraneous f-string prefix.

Line 30 uses an f-string but contains no placeholders. This was flagged by Ruff (F541).

Proposed fix

```diff
-    prompt = f"Generate Go unit tests for the following source file.\n\n"
+    prompt = "Generate Go unit tests for the following source file.\n\n"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/prompts/go_test.py` at line 30, the assignment to the variable prompt uses an f-string with no placeholders; change it to a plain string literal (remove the "f" prefix) so the extraneous f-string flagged by Ruff (F541) is resolved.

src/scripts/coverage_agent/prompts/ui_test.py line 31 (1)
31-31: ⚠️ Potential issue | 🟡 Minor: Remove extraneous f-string prefix.

Line 31 uses an f-string but contains no placeholders. This was flagged by Ruff (F541).

Proposed fix

```diff
-    prompt = f"Generate Vitest tests for the following TypeScript source file.\n\n"
+    prompt = "Generate Vitest tests for the following TypeScript source file.\n\n"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/prompts/ui_test.py` at line 31, the prompt assignment uses an unnecessary f-string with no placeholders; replace the f-string literal with a regular string literal (remove the leading f) to resolve the Ruff F541 warning.

src/scripts/coverage_agent/prompts/python_test.py line 45 (1)
45-45: ⚠️ Potential issue | 🟡 Minor: Remove extraneous f-string prefix.

Line 45 uses an f-string but contains no placeholders. This was flagged by Ruff (F541).

Proposed fix

```diff
-    prompt = f"Generate unit tests for the following Python source file.\n\n"
+    prompt = "Generate unit tests for the following Python source file.\n\n"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/prompts/python_test.py` at line 45, the assignment to the variable prompt uses an unnecessary f-string prefix even though there are no interpolation placeholders; change it to a regular string literal (remove the leading "f") so the value is identical but avoids Ruff F541.

src/scripts/coverage_agent/prompts/go_test.py line 4 (1)
4-4: ⚠️ Potential issue | 🟡 Minor: Remove unnecessary f-string prefix at line 30.

Line 30 uses an f-string without any placeholders. Change `f"Generate Go unit tests for the following source file.\n\n"` to a regular string literal: `prompt = "Generate Go unit tests for the following source file.\n\n"`

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/prompts/go_test.py` at line 4, change the unnecessary f-string assignment to a normal string: locate the prompt variable (the line setting prompt = f"Generate Go unit tests for the following source file.\n\n") and remove the leading f so it becomes a plain string literal; no other logic changes are needed.

src/scripts/coverage_agent/plugins/openai_writer.py lines 1-3 (1)
1-3: ⚠️ Potential issue | 🟡 Minor: Add module docstring to fix pylint C0114.

Pylint flagged the missing module docstring.

Proposed fix

```diff
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
+"""OpenAI-based test writer plugin using ChatOpenAI from langchain-openai."""
 import logging
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/plugins/openai_writer.py` around lines 1-3, add a module-level docstring to resolve pylint C0114 by placing a short descriptive string at the top of the module (after the SPDX headers, before any code), briefly stating the module's purpose and separated by a blank line from the following imports so pylint recognizes it.

src/scripts/coverage_agent/main.py lines 1-3 (1)
1-3: ⚠️ Potential issue | 🟡 Minor: Add module docstring to fix pylint C0114.

The pipeline failure indicates a missing module docstring.

Proposed fix

```diff
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
+"""CLI entrypoint for the AI Coverage Agent pipeline."""
 import argparse
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/main.py` around lines 1-3, add a module-level docstring to satisfy pylint C0114: include a short description of the module's responsibility and place the docstring as the first statement after the SPDX headers so pylint recognizes it.

src/scripts/coverage_agent/main.py lines 18-21 (1)
18-21: ⚠️ Potential issue | 🟡 Minor: Fix line-length violations flagged by pylint (C0301).

Lines exceed 100 characters. Per coding guidelines, add `# pylint: disable=line-too-long` or shorten the lines.

Proposed fix: shorten the help strings

```diff
-    parser.add_argument("--lcov-path", default="bazel-out/_coverage/_coverage_report.dat",
-                        help="Path to backend LCOV coverage file (Python/Go)")
-    parser.add_argument("--ui-lcov-path", default="src/ui/coverage/lcov.info",
-                        help="Path to UI LCOV coverage file (Vitest/TypeScript)")
+    parser.add_argument(
+        "--lcov-path",
+        default="bazel-out/_coverage/_coverage_report.dat",
+        help="Path to backend LCOV file (Python/Go)",
+    )
+    parser.add_argument(
+        "--ui-lcov-path",
+        default="src/ui/coverage/lcov.info",
+        help="Path to UI LCOV file (Vitest/TypeScript)",
+    )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/main.py` around lines 18-21, the two parser.add_argument definitions (--lcov-path and --ui-lcov-path) have help strings that push the source lines past the 100-character limit; shorten the help text, move the long text to a nearby constant, or as a last resort add a per-line # pylint: disable=line-too-long next to the offending calls (prefer shortening first), ensuring each resulting source line is <= 100 chars.

.github/workflows/coverage-agent-respond.yaml lines 89-92 (1)
89-92: ⚠️ Potential issue | 🟡 Minor: Missing `OPENAI_API_KEY` environment variable.

The main `coverage-agent.yaml` workflow passes `OPENAI_API_KEY` (line 148), but this workflow omits it. If the respond handler supports the `openai` provider, this secret should be included for consistency.

Proposed fix

```diff
         env:
           NVIDIA_API_KEY: ${{ secrets.NVIDIA_NIM_KEY }}
           ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In .github/workflows/coverage-agent-respond.yaml around lines 89-92, add the missing OPENAI_API_KEY env entry so the respond handler has parity with the main coverage-agent.yaml; specifically, update the env list that currently contains NVIDIA_API_KEY, ANTHROPIC_API_KEY, and GH_TOKEN to also include OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} so any code paths that use the openai provider receive the secret.

src/scripts/coverage_agent/lcov_parser.py lines 1-6 (1)
1-6: ⚠️ Potential issue | 🟡 Minor: Add pylint disable for long copyright line and module docstring.

The pipeline reports `Line too long (103/100)` and `Missing module docstring`. Per coding guidelines, add `# pylint: disable=line-too-long` for the copyright line and add a module docstring.

Proposed fix

```diff
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
+# pylint: enable=line-too-long
+"""LCOV coverage file parser for the coverage agent pipeline."""
 import dataclasses
 import fnmatch
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/lcov_parser.py` around lines 1-6, add a pylint disable comment for the long copyright header line, and add a short module-level docstring (a one- or two-sentence triple-quoted string) directly after the SPDX header and before the imports (i.e., before the existing import dataclasses / fnmatch) that briefly describes the purpose of the lcov_parser module.

src/scripts/coverage_agent/tools/test_runner.py lines 14-30 (1)
14-30: ⚠️ Potential issue | 🟡 Minor: Return type annotation is misleading; `_file_path_to_bazel_target` always returns a string.

The function signature declares `Optional[str]`, but the fallback on lines 28-30 guarantees a non-`None` return. This causes the mypy error at line 54, where `shlex.quote(target)` receives `str | None`. Either:

- Remove `Optional`, since the function always returns a string (preferred), or
- Add a `None` check before calling `shlex.quote`.

Proposed fix

```diff
-def _file_path_to_bazel_target(file_path: str) -> Optional[str]:
+def _file_path_to_bazel_target(file_path: str) -> str:
```

And remove the unused `Optional` import if no longer needed.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/tools/test_runner.py` around lines 14-30, the function _file_path_to_bazel_target always returns a string (either a discovered target or the fallback), so change its return type annotation from Optional[str] to str and remove the now-unused Optional import; this fixes the mypy error when passing its result to shlex.quote (no None check needed). Run type checks to ensure no other callers relied on Optional.
src/scripts/coverage_agent/tests/test_lcov_parser.py lines 224-229 (1)
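To illustrate why the plain `str` annotation is the right fix, here is a hypothetical function with the same shape (the target-resolution logic below is invented for illustration and is not the real `test_runner` code; only the fallback pattern matters): once the fallback guarantees a string, `shlex.quote` type-checks with no `None` guard.

```python
import shlex

def file_path_to_bazel_target(file_path: str) -> str:
    """Hypothetical mapping with a guaranteed fallback, so the annotation is
    plain str and callers like shlex.quote(target) type-check under mypy."""
    if "/" in file_path:
        package = file_path.rsplit("/", 1)[0]
        return f"//{package}:all"
    # Fallback guarantees a non-None result, so Optional[str] would be wrong.
    return "//:all"

target = file_path_to_bazel_target("src/app/util.py")
command = f"bazel test {shlex.quote(target)}"  # no None check required
```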
224-229: ⚠️ Potential issue | 🟡 Minor: Move import to module top level.

The import inside `test_ui_files_detected_as_ui_type` violates coding guidelines, which state: "All imports must be at the top level of the module after the module docstring. No exceptions."

Proposed fix

Move the import to the top of the file with the other imports:

```diff
 from coverage_agent.lcov_parser import CoverageEntry, parse_lcov
+from coverage_agent.plugins.base import TestType, detect_test_type
```

Then remove line 224 and update the test:

```diff
     def test_ui_files_detected_as_ui_type(self):
         """Verifies that UI files from Vitest LCOV get detect_test_type == UI.

         Note: generated.ts IS a valid UI file by path — it's filtered out by the
         LCOV parser's _is_ignored(), not by detect_test_type(). This test
         verifies both layers work correctly together.
         """
-        from coverage_agent.plugins.base import TestType, detect_test_type
-
         self.assertEqual(detect_test_type("src/ui/src/lib/utils.ts"), TestType.UI)
```

As per coding guidelines: "All imports must be at the top level of the module after the module docstring. No exceptions: imports inside functions are not allowed."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/tests/test_lcov_parser.py` around lines 224-229, the inline import of TestType and detect_test_type inside test_ui_files_detected_as_ui_type violates the rule requiring top-level imports; move "from coverage_agent.plugins.base import TestType, detect_test_type" to the module import section with the other imports, remove the in-function import, and ensure the test still references detect_test_type and TestType unchanged.

src/scripts/coverage_agent/plugins/__init__.py lines 1-3 (1)
1-3: ⚠️ Potential issue | 🟡 Minor: Add pylint disable for long copyright line and module docstring.

The pipeline reports `Line too long (103/100)` and `Missing module docstring`.

Proposed fix

```diff
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
+# pylint: enable=line-too-long
+"""Plugin registry for coverage agent writer implementations."""
 from coverage_agent.plugins.base import (
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/plugins/__init__.py` around lines 1-3, add a short module docstring or a pylint disable directive at the top of the file to silence the "Missing module docstring" and "Line too long (103/100)" warnings; for example, add a one-line docstring or prepend "# pylint: disable=line-too-long,missing-module-docstring" immediately after the SPDX header so the linter no longer raises those two errors for this module.

src/scripts/coverage_agent/nodes/create_pr.py lines 86-95 (1)
86-95: ⚠️ Potential issue | 🟡 Minor: PR title/body with special characters may break the shell command.

The PR title and body are embedded directly in a shell command string. If file paths contain quotes or other shell metacharacters, this could fail or cause injection. Consider using `gh pr create` with `--body-file`, or passing arguments as a list.

♻️ Proposed fix using --body-file

```diff
+    import tempfile
+    with tempfile.NamedTemporaryFile(mode="w", suffix=".md", delete=False) as f:
+        f.write(pr_body)
+        body_file = f.name
+
     result = run_shell(
-        f'gh pr create --title "{pr_title}" --body "{pr_body}" --label ai-generated'
+        f'gh pr create --title {shlex.quote(pr_title)} --body-file {body_file} --label ai-generated'
     )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/nodes/create_pr.py` around lines 86-95, the current run_shell invocation embeds pr_title/pr_body into a shell string and can break or allow injection; write pr_body to a temporary file and call run_shell with --body-file pointing to that temp file (or switch run_shell to accept an args list and invoke gh directly), ensure pr_title is passed safely rather than interpolated into the shell string, and keep the existing label-creation retry flow by re-running the same safe invocation after creating the ai-generated label.
src/scripts/coverage_agent/nodes/analyze.py lines 99-100 (1)
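An alternative sketch of the argv-list approach (the helper name and argument layout are assumptions, not the actual `create_pr` code): passing `gh` its arguments as a list removes the shell from the picture entirely, so quoting in the title or body can never break parsing.

```python
import subprocess
import tempfile

def build_pr_create_argv(pr_title: str, pr_body: str) -> list[str]:
    """Hypothetical helper: write the body to a temp file and build an argv
    list, so quotes and metacharacters in the title/body never reach a shell."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".md",
                                     delete=False, encoding="utf-8") as f:
        f.write(pr_body)
        body_file = f.name
    return [
        "gh", "pr", "create",
        "--title", pr_title,        # a discrete argv element, no quoting needed
        "--body-file", body_file,
        "--label", "ai-generated",
    ]

argv = build_pr_create_argv('Add "quoted" tests', "Body with `backticks` and $VARS")
# subprocess.run(argv, check=True)  # would invoke gh directly, no shell involved
```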
99-100: ⚠️ Potential issue | 🟡 Minor: Line exceeds 100-character limit.

Per the pipeline failure, this line is 119 characters. Break it up for compliance.

♻️ Proposed fix

```diff
     for target in targets:
-        logger.info("  %.1f%% %s (existing_test=%s)", target.coverage_pct, target.file_path, target.existing_test_path)
+        logger.info(
+            "  %.1f%% %s (existing_test=%s)",
+            target.coverage_pct, target.file_path, target.existing_test_path,
+        )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/nodes/analyze.py` around lines 99-100, the logger.info call inside the for-target loop exceeds 100 characters; refactor the long log statement by constructing a shorter message or splitting the arguments across multiple lines so the final logger.info invocation is under 100 characters.

src/scripts/coverage_agent/nodes/analyze.py lines 122-127 (1)
122-127: ⚠️ Potential issue | 🟡 Minor: Mutation of input data may cause side effects.

Directly modifying `entry.file_path` mutates the original `CoverageEntry` objects, which could cause unexpected behavior if the same entries are used elsewhere. Consider creating a copy, or normalizing during target creation.

♻️ Proposed fix: normalize during target creation

```diff
     for entry in ui_entries:
-        if not entry.file_path.startswith("src/ui/"):
-            entry.file_path = f"src/ui/{entry.file_path}"
+        pass  # Normalization moved to select_targets
     logger.info("UI: %d coverage entries", len(ui_entries))
     all_entries.extend(ui_entries)
```

Then in `select_targets`, normalize when creating the target:

```python
file_path = entry.file_path
if not file_path.startswith("src/ui/") and test_type == TestType.UI:
    file_path = f"src/ui/{file_path}"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/nodes/analyze.py` around lines 122-127, the loop mutates CoverageEntry.file_path (ui_entries loop), which can cause side effects; stop changing entry.file_path and instead normalize the path when constructing targets (e.g., inside select_targets) using a local variable, or return a copied/normalized CoverageEntry, so the original CoverageEntry objects remain unchanged.
src/scripts/coverage_agent/respond.py lines 128-132 (1)
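The copy-instead-of-mutate pattern can be shown with `dataclasses.replace` (the `CoverageEntry` below is a simplified stand-in for the real class, with invented fields):

```python
import dataclasses

@dataclasses.dataclass(frozen=True)
class CoverageEntry:
    """Simplified stand-in for the real CoverageEntry (illustrative fields only)."""
    file_path: str
    coverage_pct: float

def normalize_ui_path(entry: CoverageEntry) -> CoverageEntry:
    """Return a normalized copy; the caller's original entry is never mutated."""
    if entry.file_path.startswith("src/ui/"):
        return entry
    return dataclasses.replace(entry, file_path=f"src/ui/{entry.file_path}")

original = CoverageEntry("lib/utils.ts", 42.0)
normalized = normalize_ui_path(original)
```

Marking the dataclass `frozen=True` makes accidental mutation a `FrozenInstanceError` at the call site, which turns this class of bug into an immediate, visible failure.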
128-132: ⚠️ Potential issue | 🟡 Minor: Fragile error detection via string prefix.

Checking `file_content.startswith("Error")` is unreliable: a legitimate file could start with "Error". Consider returning a tuple `(success, content)` or raising an exception from `read_file`.

♻️ Proposed fix

```diff
-    file_content = read_file(comment.file_path)
-    if file_content.startswith("Error"):
+    try:
+        file_content = read_file(comment.file_path)
+    except (FileNotFoundError, OSError) as e:
         _reply_to_comment(pr_number, comment_id, f"Unable to read file: {comment.file_path}")
         return
```

This requires `read_file` to raise exceptions on failure instead of returning error strings.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/respond.py` around lines 128-132, the code treats read_file returning an "Error" string as a failure signal, which is fragile; update read_file to either raise an exception on failure or return a (success, content) tuple, and change the caller to handle that explicit contract: if read_file raises, catch the exception around the call (keeping the _reply_to_comment and pr_number/comment_id usage), reply with the error message and return; or if you adopt (success, content), destructure and branch on success instead of a startswith check. Modify read_file's signature and all its call sites consistently.
src/scripts/coverage_agent/plugins/base.py lines 1-8 (1)
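A minimal sketch of the exception-based contract (the function names mirror the review suggestion but are simplified stand-ins, not the actual `respond.py` code): a file that legitimately starts with "Error" is returned verbatim, while real I/O failures surface as `OSError`.

```python
import os
import tempfile

def read_file(path: str) -> str:
    """Raise on failure instead of returning sentinel strings; OSError covers
    FileNotFoundError, PermissionError, IsADirectoryError, etc."""
    with open(path, encoding="utf-8") as f:
        return f.read()

def load_comment_file(path: str) -> str:
    """Caller side of the contract: branch on the exception, not on content."""
    try:
        return read_file(path)
    except OSError:
        return f"Unable to read file: {path}"

# A file whose contents happen to start with "Error" is no longer misclassified:
tmp = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(tmp, "w", encoding="utf-8") as f:
    f.write("Error handling notes")
```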
1-8: ⚠️ Potential issue | 🟡 Minor: Fix pylint failures: add line-length disable and module docstring.

The pipeline reports two pylint violations:

- Line 1 exceeds 100 characters (103/100)
- Missing module docstring (C0114)

Per coding guidelines, add the pylint disable comment for the copyright line rather than breaking it.

Proposed fix

```diff
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
+# pylint: enable=line-too-long
+"""Base plugin abstractions and shared type utilities for the coverage agent pipeline."""
 import dataclasses
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/plugins/base.py` around lines 1-8, append " # pylint: disable=line-too-long" to the long SPDX copyright line so you don't wrap it, and add a short module docstring immediately after the SPDX/header lines (for example, "Base classes and types for coverage agent plugins") to satisfy C0114; keep the rest of the imports and code unchanged.
🧹 Nitpick comments (21)
src/scripts/coverage_agent/tools/file_ops.py (2)

1-2: Add `# pylint: disable=line-too-long` for the copyright header.

Same 103-character copyright line as other files in this package.

Proposed fix

```diff
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/tools/file_ops.py` around lines 1-2, the copyright header line exceeds pylint's default line length; add a module-level "# pylint: disable=line-too-long" near the top of the file (alongside the SPDX header) so pylint ignores the long copyright line while leaving other checks intact.
10-11: Specify explicit encoding for file operations.

Both `open()` calls rely on the platform default encoding, which can vary. Explicitly specify `encoding="utf-8"` for consistent behavior across environments, especially since this handles source code files.

Proposed fix

```diff
-    with open(path) as file:
+    with open(path, encoding="utf-8") as file:
         return file.read()
```

```diff
-    with open(path, "w") as file:
+    with open(path, "w", encoding="utf-8") as file:
         file.write(content)
```

Also applies to: 22-23

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/tools/file_ops.py` around lines 10-11, the open() calls use the platform default encoding; update both occurrences (the read routine here and the write routine at lines ~22-23) to pass encoding="utf-8" so source files are read and written consistently across environments.
src/scripts/coverage_agent/tests/__init__.py (1)
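The difference is easy to see with a round-trip through a non-ASCII file (a self-contained sketch, not the actual `file_ops` code):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "snippet.py")

# Write with an explicit encoding so non-ASCII survives regardless of the
# platform's locale default (which a bare open() would otherwise use).
with open(path, "w", encoding="utf-8") as f:
    f.write("greeting = 'café'\n")

with open(path, encoding="utf-8") as f:
    content = f.read()
```

On a machine whose locale encoding is, say, cp1252 or ASCII, the bare `open(path)` read could decode the bytes differently or raise `UnicodeDecodeError`; pinning UTF-8 on both sides makes the round-trip deterministic.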
1-2: Add `# pylint: disable=line-too-long` for consistency.

Same 103-character copyright line. Add the pylint disable comment for consistency with the coding guidelines.

Proposed fix

```diff
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/tests/__init__.py` around lines 1-2, add "# pylint: disable=line-too-long" immediately after the SPDX header so pylint won't flag the 103-character copyright line.

src/scripts/coverage_agent/tools/__init__.py (1)
1-2: Add `# pylint: disable=line-too-long` for consistency.

Same 103-character copyright line as other `__init__.py` files. Add the pylint disable comment for consistency and to prevent future linting failures. As per coding guidelines, copyright lines exceeding 100 characters should include this disable comment.

Proposed fix

```diff
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/tools/__init__.py` around lines 1-2, add a pylint disable for line-too-long to match other modules: insert "# pylint: disable=line-too-long" immediately after the SPDX copyright lines so the long copyright string is exempt from the 100-character lint rule.

src/scripts/coverage_agent/prompts/__init__.py (1)
1-2: Add `# pylint: disable=line-too-long` for consistency.

Same 103-character copyright line. Add the pylint disable comment for consistency with the coding guidelines.

Proposed fix

```diff
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/scripts/coverage_agent/prompts/__init__.py` around lines 1-2, insert the comment "# pylint: disable=line-too-long" immediately after the SPDX header lines so the long copyright line won't trigger pylint; keep the formatting consistent.

src/scripts/coverage_agent/prompts/quality_rules.py (1)
1-2: Add `# pylint: disable=line-too-long` for the copyright header.

Same 103-character copyright line as other files in this package.
Proposed fix
```diff
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/prompts/quality_rules.py` around lines 1-2: the file-level copyright header exceeds the linter line length; add a module-level `# pylint: disable=line-too-long` placed before or immediately after the SPDX lines so the directive applies to the whole file and the 103-character copyright line is accepted by pylint.

src/scripts/coverage_agent/plugins/openai_writer.py (1)
10-14: Remove unused imports.

Lines 10-14 import modules that are not used in this file: `go_test`, `python_test`, `ui_test`, `file_ops`, and `shell`. These are inherited from `NemotronWriter` and don't need to be re-imported here.

Proposed fix
```diff
 from coverage_agent.plugins.base import GeneratedTest, ValidationResult, WriterPlugin
 from coverage_agent.plugins.nemotron import NemotronWriter
-from coverage_agent.prompts.go_test import GO_TEST_SYSTEM_PROMPT, build_go_prompt
-from coverage_agent.prompts.python_test import PYTHON_TEST_SYSTEM_PROMPT, build_python_prompt
-from coverage_agent.prompts.ui_test import UI_TEST_SYSTEM_PROMPT, build_ui_prompt
-from coverage_agent.tools.file_ops import read_file, write_file
-from coverage_agent.tools.shell import run_shell
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/plugins/openai_writer.py` around lines 10-14: remove the unused imports (GO_TEST_SYSTEM_PROMPT/build_go_prompt, PYTHON_TEST_SYSTEM_PROMPT/build_python_prompt, UI_TEST_SYSTEM_PROMPT/build_ui_prompt, read_file/write_file, and run_shell) since those utilities are provided by or inherited from NemotronWriter and are not referenced in this module; keep only imports actually used in this file to eliminate lint warnings and unused-dependency clutter.

src/scripts/coverage_agent/BUILD (1)
9-9: Glob pattern includes test files in the binary.

`glob(["**/*.py"])` will include all Python files, including `tests/*.py`. Test files typically shouldn't be part of the production binary.

Proposed fix: Exclude test files
```diff
 srcs = glob(
-    ["**/*.py"],
+    ["**/*.py"],
+    exclude = ["tests/**"],
 ),
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/BUILD` at line 9: the srcs glob currently uses glob(["**/*.py"]), which pulls test files into the binary; add an exclude list (e.g., exclude = ["tests/**"], or patterns like "**/*_test.py" and "**/test_*.py") so srcs no longer includes test modules.

src/scripts/coverage_agent/state.py (1)
16-16: Consider using a `Literal` type for `test_type`.

The comment indicates valid values are `"python" | "go" | "ui"`. Using `Literal` would provide type safety and better IDE support.

Proposed enhancement
```diff
 import dataclasses
-from typing import Optional, TypedDict
+from typing import Literal, Optional, TypedDict

 from coverage_agent.plugins.base import GeneratedTest

+TestType = Literal["python", "go", "ui"]
+
+
 @dataclasses.dataclass
 class CoverageTarget:
     file_path: str
     uncovered_ranges: list[tuple[int, int]]
     coverage_pct: float
     existing_test_path: Optional[str]
-    test_type: str  # "python" | "go" | "ui"
+    test_type: TestType
     build_package: str  # Bazel package path, e.g. "//src/utils/job"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/state.py` at line 16: change the test_type annotation from a plain str to Literal["python", "go", "ui"] (importing Literal from typing, or typing_extensions as appropriate) so IDEs and type checkers validate the three allowed options.

src/scripts/coverage_agent/tests/test_graph.py (1)
10-33: Consider adding `lcov_path` to the defaults for consistency.

The `_make_state` helper includes `ui_lcov_path` but omits `lcov_path`, which is present in `main.py`'s `initial_state`. While this may not affect the routing functions under test, adding it ensures the helper produces a complete `CoverageState` for future test scenarios.

Suggested addition
"dry_run": False, + "lcov_path": "bazel-out/_coverage/_coverage_report.dat", "ui_lcov_path": "src/ui/coverage/lcov.info", "errors": [],🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/tests/test_graph.py` around lines 10 - 33, The test helper _make_state returns a CoverageState lacking the lcov_path key present in the application's initial_state, so add a sensible default "lcov_path" (e.g., same value as "ui_lcov_path" or the app's expected lcov path) to the defaults dict in _make_state to keep the test CoverageState consistent with main.py's initial_state; update the defaults variable inside the _make_state function (which constructs CoverageState) to include "lcov_path": "<appropriate default>".src/scripts/coverage_agent/tests/test_analyze.py (1)
38-39: Consider using `Path.touch()` for cleaner file creation.

The `open(...).close()` pattern works, but `pathlib.Path.touch()` is more idiomatic for creating empty files.

Suggested refactor
```diff
+from pathlib import Path

 # In test methods:
-open(source, "w").close()
-open(test_file, "w").close()
+Path(source).touch()
+Path(test_file).touch()
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/tests/test_analyze.py` around lines 38-39: replace the open(...).close() calls with pathlib.Path.touch() for idiomatic empty-file creation (e.g., Path(source).touch() and Path(test_file).touch()), add "from pathlib import Path" if missing, and ensure parent directories exist beforehand (Path(...).parent.mkdir(parents=True, exist_ok=True)) when needed.

.github/workflows/coverage-agent.yaml (1)
121-127: Redundant error suppression.

Line 126 has `|| true` in the command, and line 127 has `continue-on-error: true`. Only one is needed; the step-level `continue-on-error` is cleaner as it preserves the actual exit code in logs for debugging.

Proposed fix
```diff
   run: |
     pnpm install --frozen-lockfile
     # Ensure lcov reporter is included (produces coverage/lcov.info for the agent)
-    npx vitest run --coverage --coverage.reporter=lcov --coverage.reporter=text || true
+    npx vitest run --coverage --coverage.reporter=lcov --coverage.reporter=text
   continue-on-error: true
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.github/workflows/coverage-agent.yaml around lines 121-127: remove the redundant command-level error suppression by deleting the trailing "|| true" from the vitest run line in the "Run UI coverage" step; keep the step-level "continue-on-error: true" so the workflow preserves the real exit code in logs while still allowing the job to continue.

src/scripts/coverage_agent/lcov_parser.py (1)
69-69: Specify explicit encoding when opening files.

`open(lcov_path)` should specify `encoding="utf-8"` for consistent cross-platform behavior.

Proposed fix
```diff
-    with open(lcov_path) as file:
+    with open(lcov_path, encoding="utf-8") as file:
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/lcov_parser.py` at line 69: update the open call that reads the LCOV file to pass an explicit encoding (e.g., encoding="utf-8", optionally errors="replace") so the read is handled consistently across platforms.

src/scripts/coverage_agent/tools/test_runner.py (1)
65-68: Specify explicit encoding when opening files.

`open(part)` uses the system default encoding, which can vary. Specify `encoding="utf-8"` for consistent behavior.

Proposed fix
```diff
-    with open(part) as log_file:
+    with open(part, encoding="utf-8") as log_file:
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/tools/test_runner.py` around lines 65-68: the block that reads test log parts opens files without an explicit encoding; change the open call to specify encoding="utf-8" so file reads are consistent across environments.

src/scripts/coverage_agent/tests/test_tools.py (2)
29-49: Temp directories created by `mkdtemp()` are not cleaned up.

`tempfile.mkdtemp()` creates directories that persist after tests. Consider using `tempfile.TemporaryDirectory()` as a context manager or adding cleanup in `finally` blocks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/tests/test_tools.py` around lines 29 - 49, The tests in TestWriteFile (test_write_file_creates_file and test_write_file_creates_parent_dirs) use tempfile.mkdtemp() which leaves temp directories behind; update these tests to use tempfile.TemporaryDirectory() as a context manager (or explicitly remove the created temp dir in the finally block) so the temporary directories are removed after the test, and ensure the existing cleanup that unlinks the file still runs; modify usages of mkdtemp() to TemporaryDirectory() around the path construction for write_file to guarantee automatic cleanup.
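For reference, a minimal sketch of the context-manager approach (paths and names here are illustrative, not taken from the PR):

```python
import os
import tempfile

# TemporaryDirectory removes the directory and everything in it on exit,
# unlike mkdtemp(), which leaves the tree behind for the caller to delete.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "nested", "generated_test.py")
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w", encoding="utf-8") as handle:
        handle.write("# generated\n")
    assert os.path.exists(target)

# After the with-block, the whole tree is gone -- no finally needed.
print(os.path.exists(tmp_dir))  # → False
```

In unittest, `self.addCleanup` (or pytest's `tmp_path` fixture) achieves the same effect without manual context management.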
4-10: Unused import: `signal`.

The `signal` module is imported but never used in this file.

Proposed fix
```diff
 import os
-import signal
 import tempfile
 import unittest
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/tests/test_tools.py` around lines 4-10: remove the unused "import signal" line from the top of the test file so only required modules are imported; run the linter to confirm no unused-import warnings remain.

src/scripts/coverage_agent/plugins/claude_code.py (1)
227-239: Specify explicit encoding when opening files.

The `open()` calls at lines 228 and 235 don't specify encoding, which can cause issues on systems with non-UTF-8 default encodings.

♻️ Proposed fix
```diff
 try:
-    with open(test_file_path) as file:
+    with open(test_file_path, encoding="utf-8") as file:
         test_content = file.read()

 build_dir = os.path.dirname(test_file_path)
 build_path = os.path.join(build_dir, "BUILD")
 build_entry = None
 try:
-    with open(build_path) as file:
+    with open(build_path, encoding="utf-8") as file:
         build_entry = file.read()
 except FileNotFoundError:
     pass
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/plugins/claude_code.py` around lines 227-239: the open() calls reading test_file_path and build_path do not specify an encoding; update both reads to open with an explicit encoding (e.g., encoding="utf-8") to avoid platform-dependent defaults, keeping the existing FileNotFoundError handling for build_entry.

src/scripts/coverage_agent/nodes/quality_gate.py (3)
62-72: Regex won't match async test methods.

The pattern `def (test_\w+)` won't capture `async def test_*` methods, which are common in pytest-asyncio codebases. Consider supporting both sync and async patterns if you expect to generate async tests.

♻️ Optional fix to support async test methods
```diff
-    method_pattern = r"def (test_\w+)\(self.*?\):(.*?)(?=\n    def |\nclass |\Z)"
+    method_pattern = r"(?:async\s+)?def (test_\w+)\(self.*?\):(.*?)(?=\n\s+(?:async\s+)?def |\nclass |\Z)"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/nodes/quality_gate.py` around lines 62-72: the existing method_pattern only matches "def (test_...)" and thus misses async test methods; update the regex used in the test_type == "python" branch to accept an optional "async" prefix (e.g., allow "(?:async\s+)?" before "def") while keeping DOTALL and the same lookahead, so re.findall still returns (name, body) pairs and the assertion check against PYTHON_ASSERTION_PATTERNS continues to append issues for methods that lack assertions.
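The broadened pattern can be sanity-checked in isolation; the sample class below is illustrative:

```python
import re

# Broadened pattern: optional "async " prefix, whitespace-aware lookahead.
method_pattern = (
    r"(?:async\s+)?def (test_\w+)\(self.*?\):(.*?)"
    r"(?=\n\s+(?:async\s+)?def |\nclass |\Z)"
)

sample = '''class TestExample:
    def test_sync(self):
        assert 1 + 1 == 2

    async def test_async(self):
        assert True
'''

# findall yields (method name, body) tuples for both sync and async tests.
names = [name for name, _body in re.findall(method_pattern, sample, re.DOTALL)]
print(names)  # → ['test_sync', 'test_async']
```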
208-211: Warnings appended to errors list creates semantic confusion.

Quality warnings are being appended to the `errors` list (line 211), mixing different severity levels. Consider using a separate `warnings` field in `CoverageState` or prefixing with a clear marker.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/nodes/quality_gate.py` around lines 208 - 211, The code is appending result.warnings into the errors list (errors.append(...)) which mixes severities; change handling to record warnings separately: add or use a warnings collection on CoverageState (e.g., CoverageState.warnings) and append f"Quality warnings for {file_path}: {'; '.join(result.warnings)}" to that list instead of errors, keeping the existing logger.info(" Warning: %s", warning) loop unchanged; if you cannot add a field, alternatively prefix the message (e.g., "WARNING: ...") before appending to errors so severity is explicit.
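As a sketch of the separate-field option (the TypedDict below is a hypothetical stand-in, not the PR's actual CoverageState):

```python
from typing import TypedDict


class PipelineState(TypedDict):
    """Illustrative state: warnings and errors tracked separately."""
    errors: list[str]
    warnings: list[str]


state: PipelineState = {"errors": [], "warnings": []}

# A quality-gate issue that shouldn't block the pipeline goes to warnings...
state["warnings"].append("Quality warnings for foo_test.py: no assertions in test_bar")
# ...while a hard failure goes to errors.
state["errors"].append("Validation failed for baz_test.py")

print(len(state["errors"]), len(state["warnings"]))  # → 1 1
```

Downstream gates can then fail the run only on `errors` while still surfacing `warnings` in the PR body.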
192-198: Specify explicit encoding when opening files.

The `open()` call doesn't specify encoding, which can cause issues on systems with non-UTF-8 default encodings.

♻️ Proposed fix
```diff
-    with open(file_path) as file:
+    with open(file_path, encoding="utf-8") as file:
         content = file.read()
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/nodes/quality_gate.py` around lines 192-198: the open(file_path) call does not specify an encoding, which can lead to platform-dependent decoding errors; open the file with an explicit encoding (e.g., encoding="utf-8") while keeping the existing FileNotFoundError handling and logger.warning / errors.append behavior intact.

src/scripts/coverage_agent/tests/test_integration.py (1)
7-7: Remove unused imports.

`MagicMock` and `patch` are imported but not used in this test file.

♻️ Proposed fix
```diff
-from unittest.mock import MagicMock, patch
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/tests/test_integration.py` at line 7, Remove the unused imports MagicMock and patch from the top-level import statement in this test file (the "from unittest.mock import MagicMock, patch" line in test_integration.py); simply delete the unused names so the file only imports what it actually uses to avoid lint errors and clutter.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9550ee2d-3490-4186-9fdd-919a505db35b
📒 Files selected for processing (41)
- .github/workflows/coverage-agent-respond.yaml
- .github/workflows/coverage-agent.yaml
- src/agent_requirements.txt
- src/scripts/coverage_agent/BUILD
- src/scripts/coverage_agent/__init__.py
- src/scripts/coverage_agent/graph.py
- src/scripts/coverage_agent/lcov_parser.py
- src/scripts/coverage_agent/main.py
- src/scripts/coverage_agent/nodes/__init__.py
- src/scripts/coverage_agent/nodes/analyze.py
- src/scripts/coverage_agent/nodes/create_pr.py
- src/scripts/coverage_agent/nodes/quality_gate.py
- src/scripts/coverage_agent/nodes/validate.py
- src/scripts/coverage_agent/nodes/write.py
- src/scripts/coverage_agent/plugins/__init__.py
- src/scripts/coverage_agent/plugins/base.py
- src/scripts/coverage_agent/plugins/claude_code.py
- src/scripts/coverage_agent/plugins/nemotron.py
- src/scripts/coverage_agent/plugins/openai_writer.py
- src/scripts/coverage_agent/prompts/__init__.py
- src/scripts/coverage_agent/prompts/go_test.py
- src/scripts/coverage_agent/prompts/python_test.py
- src/scripts/coverage_agent/prompts/quality_rules.py
- src/scripts/coverage_agent/prompts/ui_test.py
- src/scripts/coverage_agent/respond.py
- src/scripts/coverage_agent/state.py
- src/scripts/coverage_agent/tests/__init__.py
- src/scripts/coverage_agent/tests/test_analyze.py
- src/scripts/coverage_agent/tests/test_graph.py
- src/scripts/coverage_agent/tests/test_integration.py
- src/scripts/coverage_agent/tests/test_lcov_parser.py
- src/scripts/coverage_agent/tests/test_plugins_base.py
- src/scripts/coverage_agent/tests/test_quality_gate.py
- src/scripts/coverage_agent/tests/test_respond.py
- src/scripts/coverage_agent/tests/test_state.py
- src/scripts/coverage_agent/tests/test_tools.py
- src/scripts/coverage_agent/tools/__init__.py
- src/scripts/coverage_agent/tools/bazel_query.py
- src/scripts/coverage_agent/tools/file_ops.py
- src/scripts/coverage_agent/tools/shell.py
- src/scripts/coverage_agent/tools/test_runner.py
Actionable comments posted: 6
♻️ Duplicate comments (1)
src/scripts/coverage_agent/respond.py (1)
95-101: ⚠️ Potential issue | 🟠 Major

Command injection risk persists: escaping is inadequate.

The escaping on line 97 (`message.replace('"', '\\"')`) only handles double quotes. Shell metacharacters like backticks, `$()`, newlines, and others remain unescaped, allowing command injection via malicious comment bodies.

🔒 Proposed fix using subprocess directly
```diff
+import subprocess
+
 def _reply_to_comment(pr_number: int, comment_id: int, message: str) -> None:
     """Post a reply to a review comment."""
-    escaped_message = message.replace('"', '\\"')
-    run_shell(
-        f'gh api repos/{{owner}}/{{repo}}/pulls/{pr_number}/comments/{comment_id}/replies '
-        f'-f body="{escaped_message}"'
-    )
+    subprocess.run(
+        [
+            "gh", "api",
+            f"repos/{{owner}}/{{repo}}/pulls/{pr_number}/comments/{comment_id}/replies",
+            "-f", f"body={message}",
+        ],
+        check=False,
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/respond.py` around lines 95 - 101, _reply_to_comment currently sanitizes only double quotes and builds a shell string passed to run_shell, leaving it vulnerable to shell metacharacters and command injection; change it to avoid shell interpolation by calling the GitHub API with a safe subprocess call (or the existing run_shell abstraction) that accepts an argument list or uses stdin/JSON payloads rather than a formatted shell string: stop building f'... -f body="{escaped_message}"', instead pass the body as a separate argument (or send JSON via stdin with subprocess.run([...], input=body_bytes, check=True)) so the message content is never executed by a shell; update references to _reply_to_comment and the call site of run_shell to use the safe-argument approach.
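The difference is easy to demonstrate with `shlex.quote`, the other safe option when a shell string is unavoidable (the hostile payload below is illustrative; a POSIX shell is assumed):

```python
import shlex
import subprocess

# A hostile comment body full of shell metacharacters.
message = 'thanks! `id` $(echo pwned) "quoted"'

# shlex.quote wraps the value so the shell sees one literal word;
# backticks and $() are NOT expanded.
quoted = shlex.quote(message)
result = subprocess.run(
    f"echo {quoted}", shell=True, capture_output=True, text=True
)
print(result.stdout.strip() == message)  # → True
```

Passing an argument list to `subprocess.run` (as in the proposed fix) avoids the shell entirely, which is stronger still since there is then nothing to quote.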
🧹 Nitpick comments (6)
src/scripts/coverage_agent/prompts/python_test.py (1)
46-46: Remove extraneous `f` prefix from string literal.

Same issue as `go_test.py`: line 46 uses an f-string without placeholders.

Proposed fix
- prompt = f"Generate unit tests for the following Python source file.\n\n" + prompt = "Generate unit tests for the following Python source file.\n\n"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/prompts/python_test.py` at line 46, The string assigned to variable prompt in python_test.py is using an unnecessary f-string with no placeholders; change the assignment of prompt to use a regular string literal (remove the leading f) so it's a normal string ("Generate unit tests for the following Python source file.\n\n"), ensuring the prompt variable remains unchanged otherwise.src/scripts/coverage_agent/tools/test_runner.py (1)
20-36: Return type `Optional[str]` is misleading; function always returns a string.

The function `_file_path_to_bazel_target` declares return type `Optional[str]`, but it always returns a string: either the discovered target (line 32) or the fallback (line 36). The fallback path is unconditional, so `None` is never returned.

Proposed fix
```diff
-def _file_path_to_bazel_target(file_path: str) -> Optional[str]:
+def _file_path_to_bazel_target(file_path: str) -> str:
```

And remove the unused `Optional` import if no longer needed elsewhere.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/tools/test_runner.py` around lines 20-36: the function _file_path_to_bazel_target annotates its return type as Optional[str] but always returns a string (either the discovered target or the fallback), so change the return annotation to str; also remove the Optional import from the module if it's no longer used elsewhere, and leave the implementation (package, basename, run_shell call, fallback) unchanged.

src/scripts/coverage_agent/prompts/go_test.py (1)
31-31: Remove extraneous `f` prefix from string literal.

Line 31 uses an f-string without any placeholders. This is flagged by Ruff (F541).

Proposed fix
- prompt = f"Generate Go unit tests for the following source file.\n\n" + prompt = "Generate Go unit tests for the following source file.\n\n"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/prompts/go_test.py` at line 31, The string assigned to variable prompt in src/scripts/coverage_agent/prompts/go_test.py is using an unnecessary f-string (no placeholders); remove the leading f so the assignment uses a plain string literal (e.g., change prompt = f"Generate Go unit tests..." to prompt = "Generate Go unit tests...") to satisfy the Ruff F541 warning and keep the rest of the logic in the same function/module unchanged.src/scripts/coverage_agent/prompts/ui_test.py (1)
32-32: Remove extraneous `f` prefix from string literal.

Same issue as the other prompt modules: line 32 uses an f-string without placeholders.

Proposed fix
- prompt = f"Generate Vitest tests for the following TypeScript source file.\n\n" + prompt = "Generate Vitest tests for the following TypeScript source file.\n\n"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/prompts/ui_test.py` at line 32, The prompt string assigned to the variable prompt in ui_test.py is using an unnecessary f-string (f"Generate Vitest tests for the following TypeScript source file.\n\n") even though there are no placeholders; remove the extraneous f prefix and make it a normal string (e.g., "Generate Vitest tests for the following TypeScript source file.\n\n") so the prompt assignment in that module/function (prompt variable) uses a plain string literal.src/scripts/coverage_agent/graph.py (1)
7-11: Add type annotations for fallback assignments.

The fallback assignments for `END` and `StateGraph` should include type-ignore comments to satisfy type checkers. This was partially addressed from the previous review, but the type annotations are missing.

♻️ Proposed fix
```diff
 try:
     from langgraph.graph import END, StateGraph
 except ImportError:
-    END = None
-    StateGraph = None
+    END = None  # type: ignore[misc,assignment]
+    StateGraph = None  # type: ignore[misc,assignment]
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/graph.py` around lines 7-11: annotate the fallback assignments for END and StateGraph in the except ImportError block so the type checker is satisfied: either import typing.Any and declare END: Any = None and StateGraph: Any = None, or append "# type: ignore" comments to the assignments to silence type errors when the import fails.

src/scripts/coverage_agent/plugins/nemotron.py (1)
10-13: Add type annotation for `ChatNVIDIA` fallback assignment.

The fallback `ChatNVIDIA = None` on line 13 lacks a type annotation, which may cause mypy issues. Add a type-ignore comment.

♻️ Proposed fix
```diff
 try:
     from langchain_nvidia_ai_endpoints import ChatNVIDIA
 except ImportError:
-    ChatNVIDIA = None
+    ChatNVIDIA = None  # type: ignore[misc,assignment]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/plugins/nemotron.py` around lines 10 - 13, The fallback assignment for ChatNVIDIA lacks a type annotation and can trigger mypy errors; update the except block to annotate the fallback (e.g., change "ChatNVIDIA = None" to "ChatNVIDIA = None # type: ignore" or "ChatNVIDIA: Any = None # type: ignore" and add an appropriate import for Any if used) so the ChatNVIDIA symbol has a mypy-safe fallback.
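A self-contained sketch of the optional-dependency fallback pattern (assuming the SDK may be absent at runtime):

```python
from typing import Any

# Optional-dependency fallback: if the provider SDK is missing, expose a
# None sentinel so the rest of the module can feature-detect it. The Any
# annotation (or a "# type: ignore" comment) keeps mypy happy about the
# rebinding in the except branch.
try:
    from langchain_nvidia_ai_endpoints import ChatNVIDIA  # type: ignore[import-not-found]
except ImportError:
    ChatNVIDIA: Any = None


def llm_available() -> bool:
    """True when the real ChatNVIDIA class was importable."""
    return ChatNVIDIA is not None
```

Call sites can then raise a clear error (`if ChatNVIDIA is None: raise RuntimeError(...)`) instead of failing with an AttributeError deep in the pipeline.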
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/scripts/coverage_agent/nodes/create_pr.py`:
- Around line 87-96: The gh CLI invocation embeds pr_title and pr_body in double
quotes which breaks for quotes, backticks, or $ chars; fix by passing a safely
quoted title (use shlex.quote on pr_title before building the command) and
writing pr_body to a temporary file then using gh's --body-file flag (or,
alternately, extend run_shell to accept an input parameter and pipe the body via
stdin). Update both the initial call and the retry after label creation in
create_pr.py (references: run_shell, pr_title, pr_body) to use the quoted title
and --body-file with proper tempfile creation and cleanup, and leave
label-creation logic unchanged except to re-run the new command form.
- Around line 38-66: Sanitize all interpolated shell arguments before calling
run_shell to prevent shell injection: use shlex.quote on branch_name wherever
used (the git checkout and git push calls) and on each entry of
state["generated_files"] when building the files string for git add (quote every
filename individually); also pass a safely quoted commit_message to git commit
by shlex.quote(commit_message) instead of embedding raw double quotes. Locate
uses of run_shell, branch_name, commit_message, and state["generated_files"] in
create_pr.py and replace direct f-string interpolation with shlex.quote-wrapped
values.
In `@src/scripts/coverage_agent/nodes/quality_gate.py`:
- Around line 63-72: The current method_pattern assumes exactly four spaces
("\n    def ") which breaks on tabs or different indentation; update method_pattern to
use a whitespace-aware delimiter (e.g. use "\n\s+def " and "\n\s*class " in the
lookahead) and enable re.MULTILINE when calling re.findall so boundaries are
detected regardless of indentation. Specifically, change the method_pattern
variable and the re.findall call that uses it (referencing method_pattern and
the re.findall(..., re.DOTALL) invocation) so the lookahead uses "\n\s+def " and
"\n\s*class " (or equivalent \s patterns) and add re.MULTILINE to the flags.
In `@src/scripts/coverage_agent/respond.py`:
- Around line 133-136: The current check using file_content.startswith("Error")
is fragile; change read_file to use an explicit error signaling mechanism (e.g.,
raise an exception on failure or return a (success, content) tuple/None) and
then update respond.py to handle that explicitly: call read_file inside a
try/except (or inspect the success flag/None), and on failure call
_reply_to_comment(pr_number, comment_id, f"Unable to read file:
{comment.file_path}") and return; otherwise proceed with the valid file_content.
Ensure you update all usages of read_file accordingly and remove the
startswith("Error") check.
- Around line 178-181: The shell commands use unsanitized comment.file_path when
building git commands in run_shell, which can be unsafe; update the calls that
build commands (the three run_shell invocations around git add, git commit, and
git push in respond.py) to quote the file path using
shlex.quote(comment.file_path) before interpolation so paths are safely escaped,
and ensure the quoted value is used in the commit message and git add invocation
while leaving git push unchanged.
In `@src/scripts/coverage_agent/tools/shell.py`:
- Around line 18-41: The shell injection risk is that callers pass unescaped
strings into run_shell(...) which uses shell=True; update all call sites in
respond.py to consistently escape shell-injected variables using shlex.quote():
replace the ad-hoc message.replace('"', '\\"') in _reply_to_comment with either
JSON-safe invocation (use gh api -f with a JSON payload) or wrap the message
argument with shlex.quote(message) when invoking the shell, and for git
operations wrap comment.file_path with shlex.quote(comment.file_path) (used in
the git add / git commit calls). Use the same pattern as test_runner.py
(shlex.quote(...)) so all strings passed into shell commands are properly
escaped.
---
Duplicate comments:
In `@src/scripts/coverage_agent/respond.py`:
- Around line 95-101: _reply_to_comment currently sanitizes only double quotes
and builds a shell string passed to run_shell, leaving it vulnerable to shell
metacharacters and command injection; change it to avoid shell interpolation by
calling the GitHub API with a safe subprocess call (or the existing run_shell
abstraction) that accepts an argument list or uses stdin/JSON payloads rather
than a formatted shell string: stop building f'... -f body="{escaped_message}"',
instead pass the body as a separate argument (or send JSON via stdin with
subprocess.run([...], input=body_bytes, check=True)) so the message content is
never executed by a shell; update references to _reply_to_comment and the call
site of run_shell to use the safe-argument approach.
---
Nitpick comments:
In `@src/scripts/coverage_agent/graph.py`:
- Around line 7-11: In the try/except fallback in graph.py, annotate the
fallback assignments for END and StateGraph so the type checker is satisfied:
either import typing.Any and declare END: Any = None and StateGraph: Any = None,
or append explicit assignment comments like "END = None # type: ignore" and
"StateGraph = None # type: ignore" inside the except ImportError block to
silence type errors when the import fails.
In `@src/scripts/coverage_agent/plugins/nemotron.py`:
- Around line 10-13: The fallback assignment for ChatNVIDIA lacks a type
annotation and can trigger mypy errors; update the except block to annotate the
fallback (e.g., change "ChatNVIDIA = None" to "ChatNVIDIA = None # type:
ignore" or "ChatNVIDIA: Any = None # type: ignore" and add an appropriate
import for Any if used) so the ChatNVIDIA symbol has a mypy-safe fallback.
In `@src/scripts/coverage_agent/prompts/go_test.py`:
- Line 31: The string assigned to variable prompt in
src/scripts/coverage_agent/prompts/go_test.py is using an unnecessary f-string
(no placeholders); remove the leading f so the assignment uses a plain string
literal (e.g., change prompt = f"Generate Go unit tests..." to prompt =
"Generate Go unit tests...") to satisfy the Ruff F541 warning and keep the rest
of the logic in the same function/module unchanged.
In `@src/scripts/coverage_agent/prompts/python_test.py`:
- Line 46: The string assigned to variable prompt in python_test.py is using an
unnecessary f-string with no placeholders; change the assignment of prompt to
use a regular string literal (remove the leading f) so it's a normal string
("Generate unit tests for the following Python source file.\n\n"), ensuring the
prompt variable remains unchanged otherwise.
In `@src/scripts/coverage_agent/prompts/ui_test.py`:
- Line 32: The prompt string assigned to the variable prompt in ui_test.py is
using an unnecessary f-string (f"Generate Vitest tests for the following
TypeScript source file.\n\n") even though there are no placeholders; remove the
extraneous f prefix and make it a normal string (e.g., "Generate Vitest tests
for the following TypeScript source file.\n\n") so the prompt assignment in that
module/function (prompt variable) uses a plain string literal.
In `@src/scripts/coverage_agent/tools/test_runner.py`:
- Around line 20-36: The function _file_path_to_bazel_target currently annotates
its return type as Optional[str] but always returns a string (either the
discovered target or the fallback), so change the return annotation to str; also
remove the Optional import from the module if it's no longer used elsewhere, and
leave the implementation of _file_path_to_bazel_target (package, basename,
run_shell call, fallback) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1449cf70-001d-4ee6-82f7-f7baebe2b92a
📒 Files selected for processing (37)
- src/scripts/coverage_agent/__init__.py
- src/scripts/coverage_agent/graph.py
- src/scripts/coverage_agent/lcov_parser.py
- src/scripts/coverage_agent/main.py
- src/scripts/coverage_agent/nodes/__init__.py
- src/scripts/coverage_agent/nodes/analyze.py
- src/scripts/coverage_agent/nodes/create_pr.py
- src/scripts/coverage_agent/nodes/quality_gate.py
- src/scripts/coverage_agent/nodes/validate.py
- src/scripts/coverage_agent/nodes/write.py
- src/scripts/coverage_agent/plugins/__init__.py
- src/scripts/coverage_agent/plugins/base.py
- src/scripts/coverage_agent/plugins/claude_code.py
- src/scripts/coverage_agent/plugins/nemotron.py
- src/scripts/coverage_agent/plugins/openai_writer.py
- src/scripts/coverage_agent/prompts/__init__.py
- src/scripts/coverage_agent/prompts/go_test.py
- src/scripts/coverage_agent/prompts/python_test.py
- src/scripts/coverage_agent/prompts/quality_rules.py
- src/scripts/coverage_agent/prompts/ui_test.py
- src/scripts/coverage_agent/respond.py
- src/scripts/coverage_agent/state.py
- src/scripts/coverage_agent/tests/__init__.py
- src/scripts/coverage_agent/tests/test_analyze.py
- src/scripts/coverage_agent/tests/test_graph.py
- src/scripts/coverage_agent/tests/test_integration.py
- src/scripts/coverage_agent/tests/test_lcov_parser.py
- src/scripts/coverage_agent/tests/test_plugins_base.py
- src/scripts/coverage_agent/tests/test_quality_gate.py
- src/scripts/coverage_agent/tests/test_respond.py
- src/scripts/coverage_agent/tests/test_state.py
- src/scripts/coverage_agent/tests/test_tools.py
- src/scripts/coverage_agent/tools/__init__.py
- src/scripts/coverage_agent/tools/bazel_query.py
- src/scripts/coverage_agent/tools/file_ops.py
- src/scripts/coverage_agent/tools/shell.py
- src/scripts/coverage_agent/tools/test_runner.py
✅ Files skipped from review due to trivial changes (15)
- src/scripts/coverage_agent/tests/__init__.py
- src/scripts/coverage_agent/prompts/__init__.py
- src/scripts/coverage_agent/nodes/__init__.py
- src/scripts/coverage_agent/__init__.py
- src/scripts/coverage_agent/tools/__init__.py
- src/scripts/coverage_agent/tools/file_ops.py
- src/scripts/coverage_agent/tools/bazel_query.py
- src/scripts/coverage_agent/plugins/openai_writer.py
- src/scripts/coverage_agent/tests/test_analyze.py
- src/scripts/coverage_agent/prompts/quality_rules.py
- src/scripts/coverage_agent/tests/test_plugins_base.py
- src/scripts/coverage_agent/tests/test_state.py
- src/scripts/coverage_agent/tests/test_graph.py
- src/scripts/coverage_agent/state.py
- src/scripts/coverage_agent/plugins/base.py
🚧 Files skipped from review as they are similar to previous changes (10)
- src/scripts/coverage_agent/main.py
- src/scripts/coverage_agent/nodes/write.py
- src/scripts/coverage_agent/nodes/validate.py
- src/scripts/coverage_agent/tests/test_tools.py
- src/scripts/coverage_agent/lcov_parser.py
- src/scripts/coverage_agent/tests/test_quality_gate.py
- src/scripts/coverage_agent/tests/test_respond.py
- src/scripts/coverage_agent/tests/test_lcov_parser.py
- src/scripts/coverage_agent/tests/test_integration.py
- src/scripts/coverage_agent/nodes/analyze.py
Actionable comments posted: 5
♻️ Duplicate comments (1)
src/scripts/coverage_agent/respond.py (1)
95-100: ⚠️ Potential issue | 🔴 Critical: Stop interpolating GitHub-controlled text into shell commands.
`message` and `comment.file_path` both come from the GitHub API, and this file already relies on shell parsing (`2>/dev/null`, `|| true`). Replacing `"` alone is not enough here; a crafted review body or path can still alter the executed command.
Also applies to: 179-180
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/coverage_agent/respond.py` around lines 95 - 100, The code in _reply_to_comment (and the other comment-posting call around lines 179-180) dangerously interpolates GitHub-controlled strings into a shell f-string; stop building a single shell command with message or comment.file_path inserted. Instead invoke the GitHub API without shell interpolation by using subprocess.run (or subprocess.check_call) with a list of arguments (e.g., ["gh","api","repos/{owner}/{repo}/pulls/{pr_number}/comments/{comment_id}/replies","-f","body="+message]) or by passing the body via stdin/--input file (gh supports -f body=@- or reading from a temporary file) so that message and file_path are never parsed by the shell; update both _reply_to_comment and the other comment-posting site to construct the command as an argument list or feed the message via stdin to prevent command injection.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/scripts/coverage_agent/plugins/claude_code.py`:
- Line 195: The parameter named `_build_package` should be renamed to
`build_package` in the function/method signature (e.g., the function that
currently declares `_build_package`) and then explicitly marked unused inside
the function body with a `del build_package` statement to satisfy pylint's
invalid-name rule; apply the same pattern for other underscore-prefixed
parameters in that signature (rename to no-leading-underscore and add
corresponding `del <name>` lines inside the function).
In `@src/scripts/coverage_agent/prompts/python_test.py`:
- Around line 47-59: The prompt builder inserts raw content into triple-backtick
fences, so any backticks inside source_content, existing_test_content, or
reference_test_content can close the fence and corrupt the prompt; add a helper
function _format_code_block(content: str, language: str = "python") that picks a
backtick fence long enough to not appear in content (e.g., start with "```" and
append backticks while the fence exists in content), then return the fenced
block with the language tag and content, and replace direct string interpolation
in build_python_prompt (references: build_python_prompt, source_content,
existing_test_content, reference_test_content) to use _format_code_block(...)
for each inserted block.
In `@src/scripts/coverage_agent/respond.py`:
- Around line 83-92: The mutating shell calls in _update_response_count (and the
similar block around the other label/edit operations) are treated as
fire-and-forget via run_shell, so failures are ignored; change them to check the
command result and abort/raise on the first non-zero exit so the bot doesn't
pretend a change landed. Modify run_shell usage (or update run_shell itself) to
return exit status/raise on failure, then in _update_response_count call the
commands sequentially (remove-label, label create, add-label) and if any returns
non-zero propagate an exception or return an error so the caller can stop
processing and log/surface the failure; also apply the same pattern to the other
mutating GH/GIT command sequence referenced in the diff. Ensure error messages
include the failing command and its output.
- Around line 121-122: The current quota check (using _get_response_count and
should_skip_comment) races because it is only read up front and only updated on
the final success path; to fix, implement an atomic "reserve slot" step that
increments the stored agent-responses count before any automated reply
(including error and quality-gate replies) or serialize per-PR operations; add a
new helper (e.g., reserve_response_slot(pr_number) or acquire_response_quota)
that attempts to increment/persist the counter and returns success/failure, call
it at the top of every reply path that currently reads
_get_response_count/should_skip_comment (including the branches referenced
around the current call and at the other spots you noted) and only proceed to
send any comment if the reservation succeeded; ensure the final success path
keeps the counter consistent (or releases on early failures) so concurrent jobs
cannot both proceed from a stale count.
- Around line 150-156: The call to writer.generate_test is passing reviewer
guidance via retry_context (retry_context=fix_prompt), which causes the writer
to treat it as prior-failure context; instead, remove fix_prompt from
retry_context for the initial generation (set retry_context to None/empty) and
pass the reviewer guidance through a dedicated field (e.g.,
reviewer_instructions, initial_prompt, or review_guidance) supported by
writer.generate_test; update the call site that constructs generated to use the
new parameter and ensure retry_context is only used for true retry/failure
state.
---
Duplicate comments:
In `@src/scripts/coverage_agent/respond.py`:
- Around line 95-100: The code in _reply_to_comment (and the other
comment-posting call around lines 179-180) dangerously interpolates
GitHub-controlled strings into a shell f-string; stop building a single shell
command with message or comment.file_path inserted. Instead invoke the GitHub
API without shell interpolation by using subprocess.run (or
subprocess.check_call) with a list of arguments (e.g.,
["gh","api","repos/{owner}/{repo}/pulls/{pr_number}/comments/{comment_id}/replies","-f","body="+message])
or by passing the body via stdin/--input file (gh supports -f body=@- or reading
from a temporary file) so that message and file_path are never parsed by the
shell; update both _reply_to_comment and the other comment-posting site to
construct the command as an argument list or feed the message via stdin to
prevent command injection.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d04f7157-cc65-43ae-ab7e-459c0adccae2
📒 Files selected for processing (7)
- src/scripts/coverage_agent/plugins/base.py
- src/scripts/coverage_agent/plugins/claude_code.py
- src/scripts/coverage_agent/plugins/openai_writer.py
- src/scripts/coverage_agent/prompts/go_test.py
- src/scripts/coverage_agent/prompts/python_test.py
- src/scripts/coverage_agent/prompts/ui_test.py
- src/scripts/coverage_agent/respond.py
✅ Files skipped from review due to trivial changes (2)
- src/scripts/coverage_agent/prompts/ui_test.py
- src/scripts/coverage_agent/plugins/base.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/scripts/coverage_agent/plugins/openai_writer.py
- src/scripts/coverage_agent/prompts/go_test.py
…t escaping

Fixes 7 valid findings from CodeRabbit review on PR #743:

Shell injection hardening (create_pr.py, respond.py):
- All file paths now wrapped with shlex.quote() in git/gh commands
- Commit messages written to temp files, passed via git commit -F (avoids shell metacharacter issues with inline -m)
- PR body written to temp file, passed via gh pr create --body-file
- Reply message written to temp file, passed via gh api -F body=@file
- Temp files cleaned up in finally blocks

Prompt fence escaping (prompts/):
- Add escape_fenced_content() in prompts/__init__.py that inserts zero-width space in ``` sequences to prevent source code with triple backticks from closing the prompt's markdown fences
- Applied to all three prompt builders (python, go, ui)

Documentation:
- Add coverage_agent to AGENTS.md Scripts section per project convention: "If adding a service, module, or major component, update the Codebase Structure section"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
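The zero-width-space technique named in the commit message above (escape_fenced_content) could look roughly like this. This is a sketch; the actual implementation in prompts/__init__.py may differ in detail.

```python
ZWSP = "\u200b"  # zero-width space: invisible when rendered, but breaks the ``` token

def escape_fenced_content(content: str) -> str:
    """Insert a zero-width space inside ``` runs.

    An embedded triple-backtick sequence can no longer terminate the
    surrounding markdown fence, because the three backticks are no
    longer contiguous.
    """
    return content.replace("```", "`" + ZWSP + "``")
```

The trade-off versus growing the fence length is that the source text itself is mutated, which is acceptable in a prompt but would not be for content that must round-trip byte-for-byte.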
dad89f5 to b4e3b3f (Compare)
47b7f09 to 2fe512f (Compare)
bcf2fb6 to 508af79 (Compare)
Testbot analyzes coverage gaps, generates tests via LLM, validates them with bazel, and opens PRs for human review.

Pipeline: analyze → write → validate → review → create_pr
- Analyze: fetches coverage from Codecov API, selects low-coverage targets, caps uncovered lines per target (default 200)
- Write: calls LLM to generate tests with quality rules in the prompt
- Validate: syntax check then bazel test (Python/Go) or vitest (UI)
- Review: LLM-based quality review (PASS/FAIL with feedback)
- Create PR: branches from current HEAD, targets main

Respond module: batch-processes PR review comments per file
- Fetches review threads via GraphQL (includes resolution status)
- Groups unresolved threads by file, one LLM call per file
- Includes source file context for informed responses
- LLM returns JSON with fix, replies, and resolve verdict per comment
- Validates fix with retries, commits+pushes if tests pass
- Resolves threads the LLM marks as addressed

Key design decisions:
- Codecov API for coverage data (seconds vs 15+ min bazel coverage)
- LLM review as sole quality gate (no brittle regex static checks)
- Retry loop: validation/review failures feed back to LLM (max 3)
- BUILD entries auto-generated from existing deps (not LLM-generated)

Plugin architecture:
- LLMProvider (ABC) with LLMClient implementation
- Two providers: claude (default), nemotron (via NVIDIA inference API)
- Single env var: LLM_API_KEY

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
testbot.yaml:
- Triggers: weekday schedule (6AM UTC), manual dispatch
- Fetches coverage from Codecov API (no bazel coverage needed)
- Sets up Node.js/pnpm for UI test validation, Bazel for Python/Go
- Creates PR against main with ai-generated label
- Uses SVC_OSMO_CI_TOKEN so created PRs trigger CI

testbot-respond.yaml:
- Triggers on pull_request_review_comment for ai-generated PRs
- Sets up pnpm + Bazel for test validation
- Batch processes all unresolved threads grouped by file
- Resolves threads after LLM addresses them

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai resolve
✅ Actions performed: Comments resolved. Approval is disabled; enable
@coderabbitai full review
✅ Actions performed: Full review triggered.
Actionable comments posted: 18
♻️ Duplicate comments (4)
src/scripts/testbot/tools/bazel_query.py (1)
8-10: ⚠️ Potential issue | 🔴 Critical: Validate `package` before building the shell command.
`package` is directly interpolated into a shell command. Without strict validation, crafted input can inject additional commands.
Suggested patch
```diff
+import re
+
 from testbot.tools.shell import run_shell

+_VALID_BAZEL_PACKAGE_PATTERN = re.compile(r"^//[A-Za-z0-9/_\-\.:]+$")
+
+
 def list_build_targets(package: str) -> str:
     """List all build targets in a Bazel package."""
+    if not _VALID_BAZEL_PACKAGE_PATTERN.fullmatch(package):
+        return f"Error: Invalid Bazel package: {package}"
     result = run_shell(f"bazel query 'kind(rule, {package}/...)'")
     if result.returncode != 0:
         return f"Error querying {package}: {result.stderr}"
     return result.stdout
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/tools/bazel_query.py` around lines 8 - 10, The function list_build_targets interpolates package into a shell command, allowing command injection; update list_build_targets to validate and escape package before calling run_shell: first reject any package that doesn't match a safe pattern (e.g. regex like r'^[A-Za-z0-9_\-./]+$' or whatever Bazel package grammar you accept) in list_build_targets, then use shlex.quote(package) (or pass an argv-style command to a subprocess wrapper if run_shell supports it) when constructing the bazel query string; reference function name list_build_targets and the call to run_shell to locate and fix the interpolation.
src/scripts/testbot/respond.py (3)
220-223: ⚠️ Potential issue | 🟡 Minor: Fragile error detection: `startswith("Error")` could match legitimate content.
This pattern was flagged in a previous review and remains unaddressed. If a file legitimately starts with "Error" (e.g., an error-handling module docstring), this check incorrectly treats it as a failure. The `read_file` function should signal errors explicitly (e.g., raise an exception or return a result type).
🔧 Suggested approach
Have `read_file` raise `FileNotFoundError` or a custom exception on failure, then use try/except:
```diff
-    file_content = read_file(file_path)
-    if file_content.startswith("Error"):
-        logger.error("Cannot read %s: %s", file_path, file_content)
-        return 0
+    try:
+        file_content = read_file(file_path)
+    except FileNotFoundError as exc:
+        logger.error("Cannot read %s: %s", file_path, exc)
+        return 0
```
This requires updating `read_file` to raise instead of returning an error string.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/respond.py` around lines 220 - 223, The current fragile check using file_content.startswith("Error") should be replaced by making read_file raise a clear exception on failure (e.g., FileNotFoundError or a custom ReadFileError) and updating the caller in respond.py to wrap the call in try/except; call read_file(file_path) inside a try block, catch the specific exception(s), log the error via logger.error including the caught exception details, and return 0 on exception—remove the startswith check and any reliance on special error strings.
233-236: ⚠️ Potential issue | 🟡 Minor: Same fragile error check repeated for source content.
The `startswith("Error")` pattern is repeated here for source file reading. Apply the same fix as above.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/respond.py` around lines 233 - 236, Replace the fragile startswith("Error") check on source_content with the same robust pattern used elsewhere: have read_file(signal) return None on failure (or raise) and update this caller to check for None (or catch the exception) instead of using source_content.startswith("Error"); update the block that references source_path, source_content and read_file to treat a None return as a failed read (source_content = None) or catch and log the exception accordingly so you no longer rely on the "Error" string sentinel.
368-374: ⚠️ Potential issue | 🟠 Major: The MAX_AUTO_RESPONSES quota check has a race condition.
The quota is checked once per file iteration, but concurrent workflow runs on the same PR can both proceed from the same count. This was noted in a previous review. Consider implementing an atomic reservation mechanism or serializing per-PR operations to prevent exceeding the quota.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/respond.py` around lines 368 - 374, The loop using total_replied and MAX_AUTO_RESPONSES has a race condition; replace the non-atomic check with an atomic reservation or per-PR serialization so concurrent runs cannot exceed the quota. Implement a reservation API (e.g., reserve_responses(pr_number, count=1) backed by a small persistent/atomic store or a per-PR lock) and call it before processing each file/thread batch in the loop that invokes _process_file_threads(file_path, file_threads, provider); only proceed if the reservation grants space and decrement/track based on the granted amount, updating total_replied from the actual reserved count, or alternatively acquire a per-PR mutex around the entire by_file iteration to serialize operations for pr_number. Ensure MAX_AUTO_RESPONSES, total_replied, _process_file_threads, by_file and pr_number are used consistently with the new reservation/lock mechanism.
🧹 Nitpick comments (7)
src/scripts/testbot/prompts/quality_rules.py (1)
5-5: Add an explicit type annotation for the exported preamble constant.
This improves readability and static checking for downstream prompt/template consumers.
♻️ Suggested change
```diff
-QUALITY_RULES_PREAMBLE = """\
+QUALITY_RULES_PREAMBLE: str = """\
```
As per coding guidelines, "Add type annotations where they improve code clarity and catch errors in Python code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/prompts/quality_rules.py` at line 5, The exported constant QUALITY_RULES_PREAMBLE lacks an explicit type annotation; add a clear type hint (e.g., declare QUALITY_RULES_PREAMBLE as type str) to the constant definition so static checkers and readers know it's a string, updating the declaration of QUALITY_RULES_PREAMBLE in src/scripts/testbot/prompts/quality_rules.py accordingly.
src/scripts/testbot/tests/test_create_pr.py (1)
11-17: Type-annotate `_make_state` to make the test fixture contract explicit.
This helper is reused across multiple tests and benefits from a typed return.
♻️ Suggested change
```diff
+from typing import Sequence
+
 from testbot.nodes.create_pr import build_pr_body, build_pr_title
-from testbot.state import TestTarget
+from testbot.state import TestTarget, TestbotState
@@
-def _make_state(targets, generated_files):
+def _make_state(targets: Sequence[TestTarget], generated_files: Sequence[str]) -> TestbotState:
     """Create minimal state for PR building."""
```
As per coding guidelines, "Add type annotations where they improve code clarity and catch errors in Python code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/tests/test_create_pr.py` around lines 11 - 17, The helper function _make_state lacks type annotations; add parameter and return type hints to make the fixture contract explicit — annotate targets and generated_files (e.g., List[str] or Iterable[str] as appropriate) and the return type as a mapping (e.g., Dict[str, Any] or a specific TypedDict/PRState dataclass) so callers and static checkers know the expected shape; update the signature of _make_state and import any typing symbols (List, Dict, Any, TypedDict) needed to reflect that the returned dict contains "targets", "generated_files", and "provider".
src/scripts/testbot/plugins/llm_client.py (1)
247-260: Edge case: empty lines treated as code.
`_looks_like_code` returns `True` for empty lines (stripped == ""), which means if the response starts with blank lines, `_find_code_start` will return 0 and skip no prose. While likely benign, this could retain prose if the LLM starts with blank lines followed by explanatory text.
♻️ Consider starting search from first non-empty line
```diff
 def _find_code_start(lines: list[str]) -> int:
     """Find the line index where actual code begins after prose preamble.

     Skips blank lines between prose and code — requires a non-empty code line.
     """
     for i, line in enumerate(lines):
-        if line.strip() and _looks_like_code(line):
+        stripped = line.strip()
+        if stripped and _looks_like_code(line):
             return i
     return 0
```
And adjust `_looks_like_code` to not return `True` for empty:
```diff
 def _looks_like_code(line: str) -> bool:
     """Check if a line looks like the start of source code."""
     stripped = line.strip()
+    if not stripped:
+        return False
     return (
         stripped.startswith("#")
         or stripped.startswith("//")
         or stripped.startswith("import ")
         or stripped.startswith("from ")
         or stripped.startswith("package ")
         or stripped.startswith("def ")
         or stripped.startswith("class ")
         or stripped.startswith("func ")
-        or stripped == ""
     )
```
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/plugins/llm_client.py` around lines 247 - 260, The helper _looks_like_code currently treats empty lines as code (stripped == ""), causing _find_code_start to misidentify code start when responses begin with blank lines; update _looks_like_code to remove the empty-line check (do not return True for stripped == "") and modify _find_code_start to begin its search from the first non-empty line (skip leading blank lines) before calling _looks_like_code on subsequent lines so prose preceded by blank lines isn't mistaken for code.
src/scripts/testbot/graph.py (1)
119-123: Consider defining `done_check` before node registration for consistency.
The `done_check` function is defined in the middle of graph construction (after adding edges for `write_test`). While functional, defining it alongside other transition helpers at lines 90-104 or before the graph construction block would improve readability.
♻️ Suggested refactor
```diff
+    def done_check(state: TestbotState) -> TestbotState:
+        """No-op node — just a routing point for done/abort decisions."""
+        return state
+
     graph = StateGraph(TestbotState)
     graph.add_node("analyze", analyze_coverage)
     graph.add_node("write_test", write_test)
     graph.add_node("validate", validate_with_transition)
     graph.add_node("review", review_with_transition)
     graph.add_node("create_pr", create_pr)
+    graph.add_node("done_check", done_check)

     graph.set_entry_point("analyze")
     graph.add_conditional_edges("analyze", route_analyze, {
         "write_test": "write_test",
         "done": END,
     })
     graph.add_edge("write_test", "validate")

-    def done_check(state: TestbotState) -> TestbotState:
-        """No-op node — just a routing point for done/abort decisions."""
-        return state
-
-    graph.add_node("done_check", done_check)
```
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/graph.py` around lines 119 - 123, The done_check no-op routing function is defined immediately before graph.add_node("done_check") which breaks consistency with other transition helpers; move the def done_check(state: TestbotState) -> TestbotState: ... implementation to the same helper section where other transition helper functions are declared (near the other helper defs) and then keep the existing graph.add_node("done_check", done_check) registration in the graph construction block so the function definitions are grouped and registrations remain in the graph build.
src/scripts/testbot/nodes/create_pr.py (1)
90-93: Long PR titles may occur with many targets.
If `state["targets"]` contains many files, the title could become excessively long. Consider truncating with an ellipsis or count.
♻️ Suggested truncation
```diff
 def build_pr_title(state: TestbotState) -> str:
     """Build PR title from target file paths."""
     target_files = [t.file_path for t in state["targets"]]
-    return f"[testbot] Add tests for {', '.join(target_files)}"
+    if len(target_files) <= 2:
+        return f"[testbot] Add tests for {', '.join(target_files)}"
+    return f"[testbot] Add tests for {target_files[0]} and {len(target_files) - 1} more"
```
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/nodes/create_pr.py` around lines 90 - 93, build_pr_title currently joins all file paths from state["targets"] which can create excessively long PR titles; update build_pr_title to limit items included (e.g., take first 3 file names), then append an ellipsis or a suffix like " (+N more)" when there are additional targets, and ensure the final string remains within a reasonable length; locate the build_pr_title function and change the join logic to slice target_files and compute remaining_count to append the truncated indicator.
src/scripts/testbot/nodes/write.py (1)
69-96: Consider atomic file operations for BUILD updates.
The function reads the BUILD file (lines 86-87), then appends in a separate operation (lines 94-95). If the process is interrupted between read and write, or if concurrent runs target the same BUILD file, inconsistencies could occur.
♻️ Suggested atomic approach
```diff
     with open(build_path, encoding="utf-8") as build_file:
         build_content = build_file.read()

     if test_basename in build_content:
         logger.info("BUILD already contains entry for %s", test_basename)
         return

     entry = _generate_default_build_entry(test_name, test_basename, source_path, build_content)
     logger.info("BUILD entry to append for %s:\n%s", test_name, entry)

-    with open(build_path, "a", encoding="utf-8") as build_file:
-        build_file.write("\n" + entry + "\n")
+    # Write atomically to avoid partial writes
+    new_content = build_content.rstrip() + "\n\n" + entry + "\n"
+    with open(build_path, "w", encoding="utf-8") as build_file:
+        build_file.write(new_content)

     logger.info("Appended BUILD entry for %s to %s", test_name, build_path)
```
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/nodes/write.py` around lines 69 - 96, The _apply_build_entry function currently reads BUILD into build_content and later appends, which risks race conditions or partial updates; change it to perform an atomic update: after computing entry with _generate_default_build_entry (using test_name, test_basename, source_path, build_content), open a temporary file in the same directory (e.g., BUILD.temp or use tempfile.NamedTemporaryFile with dir=os.path.dirname(build_path)), write the original build_content plus the new entry into the temp file, flush and fsync, then atomically replace the original BUILD with os.replace(temp_path, build_path); ensure proper cleanup on error and consider using a file lock (e.g., fcntl or portalocker) around the read/generate/write/replace sequence to prevent concurrent writers.
src/scripts/testbot/respond.py (1)
193-208: JSON parsing could be more robust.
The slice `text[text.find("{"):text.rfind("}") + 1]` can produce malformed JSON if there are nested braces or if `{` appears in a string value before the actual JSON object. Consider using a more robust extraction pattern.
♻️ More robust JSON extraction
def parse_llm_json_response(response_text: str) -> dict: """Parse JSON LLM response. Returns dict with 'fix' and 'replies' keys.""" text = response_text.strip() - for attempt_text in [text, text[text.find("{"):text.rfind("}") + 1] if "{" in text else ""]: + # Try parsing the full text first + try: + data = json.loads(text) + if isinstance(data, dict): + return data + except (json.JSONDecodeError, TypeError): + pass + + # Try to extract JSON by finding balanced braces + if "{" in text: + start = text.find("{") + depth = 0 + for i, char in enumerate(text[start:], start): + if char == "{": + depth += 1 + elif char == "}": + depth -= 1 + if depth == 0: + try: + data = json.loads(text[start:i + 1]) + if isinstance(data, dict): + return data + except (json.JSONDecodeError, TypeError): + pass + break + + logger.warning("Failed to parse LLM JSON response") + return {}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/respond.py` around lines 193 - 208, The current parse_llm_json_response() uses text[text.find("{"):text.rfind("}")+1] which breaks on nested braces or braces inside strings; replace that naive slice with a robust extractor that scans from the first '{' and finds its matching closing '}' using a brace-counter that ignores braces inside quoted strings and handles escaped quotes, then json.loads that balanced substring (falling back to trying the whole text), returning the parsed dict if successful; update references in parse_llm_json_response to use this balanced-brace extraction routine before attempting json.loads.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/testbot-respond.yaml:
- Around line 36-46: The checkout step that uses ref:
github.event.pull_request.head.ref ("Checkout PR branch") is pulling and running
untrusted PR head code while exposing secrets SVC_OSMO_CI_TOKEN and
NVIDIA_NIM_KEY; change the workflow to avoid checking out or executing untrusted
PR code with repository secrets by gating that step: replace using the PR head
ref with a safe source (e.g., use the merge commit or github.sha /
pull_request.target ref) or only use the service secrets when a trusted
label/actor is present, and for untrusted PRs use the default GITHUB_TOKEN
without extra secrets and set persist-credentials: false; ensure both
occurrences (the checkout step at the top and the similar one later) are updated
accordingly.
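Under the assumptions in the prompt above (same step and secret names; the service token reserved for same-repo PRs), the gated checkout might be sketched as:

```yaml
# Hypothetical gating sketch — adapt step names to the real workflow.
- name: Checkout PR branch (same-repo PRs only)
  if: github.event.pull_request.head.repo.full_name == github.repository
  uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
  with:
    ref: ${{ github.event.pull_request.head.ref }}
    fetch-depth: 0
    token: ${{ secrets.SVC_OSMO_CI_TOKEN }}

- name: Checkout merge commit (forked PRs, no credentials)
  if: github.event.pull_request.head.repo.full_name != github.repository
  uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
  with:
    ref: ${{ github.sha }}
    persist-credentials: false
```

The fork branch checks out the already-merged commit with `persist-credentials: false`, so untrusted code never sees the PAT.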
In @.github/workflows/testbot.yaml:
- Around line 57-63: The workflow currently checks out the repo with a
credentialed checkout (actions/checkout@...) and exposes service tokens (token:
${{ secrets.SVC_OSMO_CI_TOKEN }}) to the same job that runs LLM-generated test
validation; change the flow so the validation runs in an unprivileged
job/container by setting persist-credentials: false on the checkout used for
test validation (or use a separate job that performs the checkout with
persist-credentials: false) and remove any secret/token injection (LLM_API_KEY,
CODECOV_TOKEN, GH_TOKEN, SVC_OSMO_CI_TOKEN) from that job; then add a separate,
later job/step that performs a credentialed checkout or uses token: ${{
secrets.SVC_OSMO_CI_TOKEN }} only when creating the PR or pushing results (the
PR-creation step), ensuring secrets are injected only into that write-only job.
In `@src/scripts/testbot/BUILD`:
- Around line 11-12: Update the stale comment referencing
src/agent_requirements.txt to point to the new requirements file added by this
PR (src/scripts/testbot/requirements.txt); locate the comment block that
mentions "langchain/langgraph deps are installed via pip in CI (not
Bazel-managed for v1)." and replace the path so it accurately documents where
those deps are declared, keeping the rest of the comment intact.
In `@src/scripts/testbot/main.py`:
- Around line 24-35: The CLI numeric flags defined via parser.add_argument
(--max-targets, --max-lines, --max-retries) currently allow invalid values; add
explicit validation after args = parser.parse_args() to enforce sensible bounds
(e.g., args.max_targets > 0, args.max_lines > 0, args.max_retries >= 0) and call
parser.error(...) with a clear message if any check fails; alternatively,
enforce constraints directly in the add_argument calls (use a custom
type/validator or choices) so that parser rejects negatives/zeros for
parser.add_argument("--max-targets"), parser.add_argument("--max-lines"), and
parser.add_argument("--max-retries") before proceeding.
In `@src/scripts/testbot/nodes/validate.py`:
- Around line 42-44: The call to llm.validate_test (from
get_llm(state["provider"])) can throw and currently will abort the node; wrap
the llm.validate_test(last_generated) invocation in a try/except that catches
all exceptions, log the exception, and convert it into a standardized failed
validation result (including the error message and a flag/field indicating it
should be retried) instead of letting the exception propagate; ensure the
returned/assigned variable used by this node follows the same validation-result
shape the pipeline expects so downstream retry handling can operate (refer to
get_llm, llm.validate_test, state["provider"], and last_generated to locate and
update the code).
In `@src/scripts/testbot/nodes/write.py`:
- Line 15: The current _DEPS_RE only captures the first quoted dep; update it to
capture the entire deps bracket (e.g. match from "deps" through the closing ]
with a multiline/ DOTALL-aware pattern) and then extract individual entries by
running a second re.findall for quoted strings (or split on commas) on the
captured block; modify uses of _DEPS_RE in this file (e.g., where deps are
parsed in write.py) to iterate over all extracted dep strings instead of taking
a single group.
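The two-pass pattern described above can look like this (regex names are illustrative, not the ones in write.py):

```python
import re

# First pass: capture everything inside deps = [ ... ]; DOTALL lets it span lines.
_DEPS_BLOCK_RE = re.compile(r'deps\s*=\s*\[(.*?)\]', re.DOTALL)
# Second pass: pull each quoted entry out of the captured block.
_DEP_ENTRY_RE = re.compile(r'"([^"]+)"')

def parse_deps(build_text: str) -> list[str]:
    """Return every quoted dep in the first deps = [...] block, or [] if none."""
    match = _DEPS_BLOCK_RE.search(build_text)
    if not match:
        return []
    return _DEP_ENTRY_RE.findall(match.group(1))
```

Callers then iterate over the full list rather than taking a single captured group.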
In `@src/scripts/testbot/plugins/__init__.py`:
- Around line 25-34: The get_llm function can raise KeyError because
_register_defaults() is never invoked; update get_llm to call
_register_defaults() at the start (before checking PLUGINS) so default providers
are registered, then proceed to check PLUGINS and use _instances as before; make
the call idempotent (since _register_defaults likely already guards against
re-registration). Apply the same change to the other provider-retrieval function
in this module (the one around the other provider-check block referencing
PLUGINS/_instances) so both entry points initialize defaults before validating
providers.
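An idempotent-registration sketch of that fix, using the `PLUGINS`/`_instances`/`_register_defaults` names from the prompt (the provider class here is a hypothetical stand-in):

```python
PLUGINS: dict = {}
_instances: dict = {}
_defaults_registered = False

class _FakeProvider:
    """Hypothetical stand-in for the real LLM provider classes."""
    def __init__(self, name: str):
        self.name = name

def _register_defaults() -> None:
    """Idempotent: safe to call from every retrieval entry point."""
    global _defaults_registered
    if _defaults_registered:
        return
    PLUGINS["claude"] = _FakeProvider
    PLUGINS["nemotron"] = _FakeProvider
    _defaults_registered = True

def get_llm(provider: str):
    _register_defaults()  # ensure defaults exist before the membership check
    if provider not in PLUGINS:
        raise ValueError(f"Unknown provider: {provider!r}. Known: {sorted(PLUGINS)}")
    if provider not in _instances:
        _instances[provider] = PLUGINS[provider](provider)
    return _instances[provider]
```

Raising ValueError for unknown providers also replaces the opaque KeyError with an actionable message.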
In `@src/scripts/testbot/plugins/llm_client.py`:
- Around line 134-140: Add a defensive guard around accessing response.choices
in llm_client.py: check that response and response.choices exist and that
len(response.choices) > 0 before using response.choices[0]; if empty, log a
clear message via logger (including any available response metadata), set
content/finish_reason to safe defaults (e.g., "" and None) or return/raise a
well-defined error, and ensure downstream code that uses content/finish_reason
handles this case; update the block that assigns choice, content, and
finish_reason to use this guard and handle the empty-choices path gracefully.
- Around line 92-93: The if-check that raises RuntimeError("OpenAI client not
available. pip install openai") is unreachable because self.client is
unconditionally set in __init__ (OpenAI(...) at initialization); remove this
dead branch from the method containing that check (the runtime validation
referencing self.client) so the code no longer contains the impossible if
self.client is None path — or if deferred initialization was intended, replace
with a proper lazy-init pattern, but by default delete the unreachable check and
its raise to clean up the code.
In `@src/scripts/testbot/prompts/respond.py`:
- Around line 33-49: Replace the untyped list[dict] review payload with a
dataclass to improve type-safety: define a ReviewComment dataclass (fields
comment_id, author, line, body) and import it into this module, change the
function signature from comments: list[dict] to comments: list[ReviewComment]
(or List[ReviewComment]) and update the comprehension that builds comments_text
to access attributes (e.g., c.comment_id, c.author, c.line, c.body) instead of
dictionary keys; also update the docstring to describe the ReviewComment type
and add the necessary dataclasses/typing imports.
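A minimal sketch of that dataclass and the attribute-based formatting (the exact field set mirrors the prompt; the formatter name is illustrative):

```python
from dataclasses import dataclass

@dataclass
class ReviewComment:
    """One unresolved review comment, replacing the untyped dict payload."""
    comment_id: str
    author: str
    line: int
    body: str

def format_comments(comments: list[ReviewComment]) -> str:
    """Render comments for the LLM prompt using attribute access, not dict keys."""
    return "\n".join(
        f"[{c.comment_id}] @{c.author} (line {c.line}): {c.body}" for c in comments
    )
```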
In `@src/scripts/testbot/prompts/review.py`:
- Around line 49-55: The prompt builds a markdown-wrapped string using
source_content and test_content raw, which can break outer fences if those
strings contain ```; modify the return in review.py to first produce
escaped_source_content and escaped_test_content (e.g., replace all occurrences
of "```" with "`\\`\\`\\`" or otherwise escape triple backticks/newlines) and
then use those escaped variables in the f-string in place of source_content and
test_content so the outer ``` fences are preserved; update the return expression
to reference escaped_source_content and escaped_test_content and ensure escaping
logic runs before the f-string is constructed.
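The escaping step can be sketched as follows; function names are illustrative and the prompt layout is a simplified stand-in for the real template in review.py:

```python
def escape_fences(text: str) -> str:
    """Escape triple backticks so embedded content cannot close the outer fence."""
    return text.replace("```", "\\`\\`\\`")

def build_review_prompt(source_content: str, test_content: str) -> str:
    escaped_source_content = escape_fences(source_content)
    escaped_test_content = escape_fences(test_content)
    return (
        f"Source file:\n```\n{escaped_source_content}\n```\n\n"
        f"Generated test:\n```\n{escaped_test_content}\n```"
    )
```

An alternative with the same effect is to use longer outer fences (e.g. four backticks), which CommonMark treats as closable only by an equally long fence.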
In `@src/scripts/testbot/README.md`:
- Around line 9-13: Add language identifiers to the three unlabeled fenced code
blocks in the README so markdownlint MD040 is satisfied: change the fences for
the block containing "Codecov API → analyze → write → validate → review →
create_pr" to use ```text, change the fence for the block that begins "PR
comment → GraphQL fetch threads → filter unresolved" to ```text, and change the
fence for the block showing the "src/scripts/testbot/ ├── main.py ... └──
tests/" tree to ```text; ensure each opening ``` has the corresponding ```text
label and leave block contents unchanged.
In `@src/scripts/testbot/respond.py`:
- Around line 56-61: The helper _get_repo_nwo currently swallows failures from
run_shell("gh repo view...") and returns the hardcoded "NVIDIA/OSMO", which can
cause actions against the wrong repo; update _get_repo_nwo to surface failures
instead of silently defaulting: when result.returncode != 0 or stdout is empty,
either log a warning with the captured stdout/stderr and the failing command
(use the module's logger if available) and then raise a RuntimeError (or a
custom exception) explaining the gh CLI failure and suggesting
authentication/repo context, or at minimum log the full error and re-raise;
reference _get_repo_nwo and run_shell to locate and change the return/default
behavior.
In `@src/scripts/testbot/tests/test_integration.py`:
- Around line 5-14: The tests call review_test() which calls
testbot.nodes.review.get_llm() and can hit real LLM services; instead stub
get_llm to a deterministic fake LLM client in these tests. Use
unittest.mock.patch to patch testbot.nodes.review.get_llm (or patch.object on
the review module) in the test module setup or per-test, returning a lightweight
fake object implementing the same interface used by review_test (e.g., a
generate/respond method that returns fixed outputs), apply the same patch for
the other tests around lines 91-117, and ensure the fake is used before calling
review_test() so no real network/credentials are accessed.
In `@src/scripts/testbot/tests/test_tools.py`:
- Around line 94-96: The test currently swallows PermissionError in the
kill-check except block (the "except PermissionError: pass" after the kill
signal attempt), which can mask a still-running process; remove that bare pass
so PermissionError is not treated as success — either let PermissionError
propagate (so the test fails) or include PermissionError in the outer except
tuple (e.g., except (FileNotFoundError, ValueError, PermissionError):) and
handle it as a failure case; update the exception handling around the kill/send
signal logic (the except blocks shown in test_tools.py) accordingly.
In `@src/scripts/testbot/tools/file_ops.py`:
- Around line 22-25: The write_file function fails when path is filename-only
and returns an incorrect byte count; fix it in write_file by computing dir =
os.path.dirname(path) and only calling os.makedirs(dir, exist_ok=True) if dir is
non-empty (or use dir or "."), then write the content as before but report the
actual number of bytes by computing byte_count = len(content.encode("utf-8"))
(or open in binary and write content.encode("utf-8") and use the written byte
length); update the return message to use byte_count and reference the
write_file function and the variables path and content so the change is easy to
locate.
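Both fixes together might look like this sketch (the return-message format is assumed from the prompt, not copied from file_ops.py):

```python
import os

def write_file(path: str, content: str) -> str:
    """Write content to path, creating parent dirs, and report actual UTF-8 bytes."""
    directory = os.path.dirname(path)
    if directory:  # filename-only paths have no directory component to create
        os.makedirs(directory, exist_ok=True)
    data = content.encode("utf-8")
    with open(path, "wb") as handle:
        handle.write(data)
    return f"Wrote {len(data)} bytes to {path}"
```

Encoding before writing means the reported count matches bytes on disk even for multi-byte characters.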
In `@src/scripts/testbot/tools/test_runner.py`:
- Around line 125-129: The retry_hint currently uses only result.stderr which
omits important Vitest details printed to stdout; update the return value in
test_runner.py (the ValidationResult construction) so that retry_hint uses the
combined output (result.stdout + result.stderr) when result.returncode != 0,
while keeping passed as result.returncode == 0 and output as the combined
output; locate the ValidationResult instantiation in the function that returns
it and replace the retry_hint expression to reference the same combined output
string.
---
Duplicate comments:
In `@src/scripts/testbot/respond.py`:
- Around line 220-223: The current fragile check using
file_content.startswith("Error") should be replaced by making read_file raise a
clear exception on failure (e.g., FileNotFoundError or a custom ReadFileError)
and updating the caller in respond.py to wrap the call in try/except; call
read_file(file_path) inside a try block, catch the specific exception(s), log
the error via logger.error including the caught exception details, and return 0
on exception—remove the startswith check and any reliance on special error
strings.
- Around line 233-236: Replace the fragile startswith("Error") check on
source_content with the same robust pattern used elsewhere: have
read_file(signal) return None on failure (or raise) and update this caller to
check for None (or catch the exception) instead of using
source_content.startswith("Error"); update the block that references
source_path, source_content and read_file to treat a None return as a failed
read (source_content = None) or catch and log the exception accordingly so you
no longer rely on the "Error" string sentinel.
- Around line 368-374: The loop using total_replied and MAX_AUTO_RESPONSES has a
race condition; replace the non-atomic check with an atomic reservation or
per-PR serialization so concurrent runs cannot exceed the quota. Implement a
reservation API (e.g., reserve_responses(pr_number, count=1) backed by a small
persistent/atomic store or a per-PR lock) and call it before processing each
file/thread batch in the loop that invokes _process_file_threads(file_path,
file_threads, provider); only proceed if the reservation grants space and
decrement/track based on the granted amount, updating total_replied from the
actual reserved count, or alternatively acquire a per-PR mutex around the entire
by_file iteration to serialize operations for pr_number. Ensure
MAX_AUTO_RESPONSES, total_replied, _process_file_threads, by_file and pr_number
are used consistently with the new reservation/lock mechanism.
In `@src/scripts/testbot/tools/bazel_query.py`:
- Around line 8-10: The function list_build_targets interpolates package into a
shell command, allowing command injection; update list_build_targets to validate
and escape package before calling run_shell: first reject any package that
doesn't match a safe pattern (e.g. regex like r'^[A-Za-z0-9_\-./]+$' or whatever
Bazel package grammar you accept) in list_build_targets, then use
shlex.quote(package) (or pass an argv-style command to a subprocess wrapper if
run_shell supports it) when constructing the bazel query string; reference
function name list_build_targets and the call to run_shell to locate and fix the
interpolation.
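The validate-then-quote sequence can be sketched like so (the accepted package grammar is an assumption; tighten or loosen the regex to match what Bazel packages the repo actually uses):

```python
import re
import shlex

# Conservative subset of legal Bazel package names — an assumption, adjust as needed.
_PACKAGE_RE = re.compile(r"^[A-Za-z0-9_\-./]+$")

def build_query_command(package: str) -> str:
    """Build a bazel query command, rejecting and quoting untrusted package names."""
    if not _PACKAGE_RE.match(package):
        raise ValueError(f"Unsafe Bazel package name: {package!r}")
    return f"bazel query {shlex.quote(f'//{package}:all')}"
```

Passing an argv-style list to subprocess (avoiding the shell entirely) is the stronger fix if run_shell supports it.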
---
Nitpick comments:
In `@src/scripts/testbot/graph.py`:
- Around line 119-123: The done_check no-op routing function is defined
immediately before graph.add_node("done_check") which breaks consistency with
other transition helpers; move the def done_check(state: TestbotState) ->
TestbotState: ... implementation to the same helper section where other
transition helper functions are declared (near the other helper defs) and then
keep the existing graph.add_node("done_check", done_check) registration in the
graph construction block so the function definitions are grouped and
registrations remain in the graph build.
In `@src/scripts/testbot/nodes/create_pr.py`:
- Around line 90-93: build_pr_title currently joins all file paths from
state["targets"] which can create excessively long PR titles; update
build_pr_title to limit items included (e.g., take first 3 file names), then
append an ellipsis or a suffix like " (+N more)" when there are additional
targets, and ensure the final string remains within a reasonable length; locate
the build_pr_title function and change the join logic to slice target_files and
compute remaining_count to append the truncated indicator.
In `@src/scripts/testbot/nodes/write.py`:
- Around line 69-96: The _apply_build_entry function currently reads BUILD into
build_content and later appends, which risks race conditions or partial updates;
change it to perform an atomic update: after computing entry with
_generate_default_build_entry (using test_name, test_basename, source_path,
build_content), open a temporary file in the same directory (e.g., BUILD.temp or
use tempfile.NamedTemporaryFile with dir=os.path.dirname(build_path)), write the
original build_content plus the new entry into the temp file, flush and fsync,
then atomically replace the original BUILD with os.replace(temp_path,
build_path); ensure proper cleanup on error and consider using a file lock
(e.g., fcntl or portalocker) around the read/generate/write/replace sequence to
prevent concurrent writers.
In `@src/scripts/testbot/plugins/llm_client.py`:
- Around line 247-260: The helper _looks_like_code currently treats empty lines
as code (stripped == ""), causing _find_code_start to misidentify code start
when responses begin with blank lines; update _looks_like_code to remove the
empty-line check (do not return True for stripped == "") and modify
_find_code_start to begin its search from the first non-empty line (skip leading
blank lines) before calling _looks_like_code on subsequent lines so prose
preceded by blank lines isn't mistaken for code.
In `@src/scripts/testbot/prompts/quality_rules.py`:
- Line 5: The exported constant QUALITY_RULES_PREAMBLE lacks an explicit type
annotation; add a clear type hint (e.g., declare QUALITY_RULES_PREAMBLE as type
str) to the constant definition so static checkers and readers know it's a
string, updating the declaration of QUALITY_RULES_PREAMBLE in
src/scripts/testbot/prompts/quality_rules.py accordingly.
In `@src/scripts/testbot/respond.py`:
- Around line 193-208: The current parse_llm_json_response() uses
text[text.find("{"):text.rfind("}")+1] which breaks on nested braces or braces
inside strings; replace that naive slice with a robust extractor that scans from
the first '{' and finds its matching closing '}' using a brace-counter that
ignores braces inside quoted strings and handles escaped quotes, then json.loads
that balanced substring (falling back to trying the whole text), returning the
parsed dict if successful; update references in parse_llm_json_response to use
this balanced-brace extraction routine before attempting json.loads.
In `@src/scripts/testbot/tests/test_create_pr.py`:
- Around line 11-17: The helper function _make_state lacks type annotations; add
parameter and return type hints to make the fixture contract explicit — annotate
targets and generated_files (e.g., List[str] or Iterable[str] as appropriate)
and the return type as a mapping (e.g., Dict[str, Any] or a specific
TypedDict/PRState dataclass) so callers and static checkers know the expected
shape; update the signature of _make_state and import any typing symbols (List,
Dict, Any, TypedDict) needed to reflect that the returned dict contains
"targets", "generated_files", and "provider".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 647c4d70-c896-4bf0-a289-69a591406c76
📒 Files selected for processing (47)
.github/workflows/testbot-respond.yaml
.github/workflows/testbot.yaml
AGENTS.md
src/scripts/testbot/BUILD
src/scripts/testbot/README.md
src/scripts/testbot/__init__.py
src/scripts/testbot/codecov_client.py
src/scripts/testbot/graph.py
src/scripts/testbot/lcov_parser.py
src/scripts/testbot/main.py
src/scripts/testbot/nodes/__init__.py
src/scripts/testbot/nodes/analyze.py
src/scripts/testbot/nodes/create_pr.py
src/scripts/testbot/nodes/review.py
src/scripts/testbot/nodes/validate.py
src/scripts/testbot/nodes/write.py
src/scripts/testbot/plugins/__init__.py
src/scripts/testbot/plugins/base.py
src/scripts/testbot/plugins/llm_client.py
src/scripts/testbot/prompts/__init__.py
src/scripts/testbot/prompts/go_test.py
src/scripts/testbot/prompts/python_test.py
src/scripts/testbot/prompts/quality_rules.py
src/scripts/testbot/prompts/respond.py
src/scripts/testbot/prompts/review.py
src/scripts/testbot/prompts/ui_test.py
src/scripts/testbot/requirements.txt
src/scripts/testbot/respond.py
src/scripts/testbot/state.py
src/scripts/testbot/tests/__init__.py
src/scripts/testbot/tests/test_analyze.py
src/scripts/testbot/tests/test_codecov_client.py
src/scripts/testbot/tests/test_create_pr.py
src/scripts/testbot/tests/test_graph.py
src/scripts/testbot/tests/test_integration.py
src/scripts/testbot/tests/test_lcov_parser.py
src/scripts/testbot/tests/test_llm_client.py
src/scripts/testbot/tests/test_plugins_base.py
src/scripts/testbot/tests/test_respond.py
src/scripts/testbot/tests/test_state.py
src/scripts/testbot/tests/test_tools.py
src/scripts/testbot/tests/test_write_build_entry.py
src/scripts/testbot/tools/__init__.py
src/scripts/testbot/tools/bazel_query.py
src/scripts/testbot/tools/file_ops.py
src/scripts/testbot/tools/shell.py
src/scripts/testbot/tools/test_runner.py
```yaml
    if: contains(github.event.pull_request.labels.*.name, 'ai-generated')
    timeout-minutes: 15
    runs-on: ubuntu-latest
    steps:
      - name: Checkout PR branch
        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
        with:
          ref: ${{ github.event.pull_request.head.ref }}
          fetch-depth: 0
          token: ${{ secrets.SVC_OSMO_CI_TOKEN }}
```
Harden trust boundary before running PR-head code with secrets.
At Lines 40-46 and Lines 82-89, the job executes code from github.event.pull_request.head.ref while exposing SVC_OSMO_CI_TOKEN and NVIDIA_NIM_KEY. If an untrusted PR receives the ai-generated label, this can become a secret-exfiltration path.
🔒 Suggested guardrail
```diff
-    if: contains(github.event.pull_request.labels.*.name, 'ai-generated')
+    if: >
+      contains(github.event.pull_request.labels.*.name, 'ai-generated') &&
+      github.event.pull_request.head.repo.full_name == github.repository &&
+      github.event.pull_request.user.login == 'testbot[bot]'
```

Also applies to: 82-89
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/testbot-respond.yaml around lines 36 - 46, The checkout
step that uses ref: github.event.pull_request.head.ref ("Checkout PR branch") is
pulling and running untrusted PR head code while exposing secrets
SVC_OSMO_CI_TOKEN and NVIDIA_NIM_KEY; change the workflow to avoid checking out
or executing untrusted PR code with repository secrets by gating that step:
replace using the PR head ref with a safe source (e.g., use the merge commit or
github.sha / pull_request.target ref) or only use the service secrets when a
trusted label/actor is present, and for untrusted PRs use the default
GITHUB_TOKEN without extra secrets and set persist-credentials: false; ensure
both occurrences (the checkout step at the top and the similar one later) are
updated accordingly.
```yaml
      - name: Checkout
        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
        with:
          lfs: true
          fetch-depth: 0
          # Use service account PAT so git push and PR creation trigger CI workflows
          token: ${{ secrets.SVC_OSMO_CI_TOKEN }}
```
Don't validate generated tests in the same credentialed job.
This workflow eventually executes LLM-generated test code, but the job also exposes LLM_API_KEY, CODECOV_TOKEN, GH_TOKEN, and a persisted service-account checkout credential to that same process tree. A prompt-injected or malicious test can read/exfiltrate those secrets during validation. Please run validation in an unprivileged job/container with persist-credentials: false, and only inject the write token in the later PR-creation step.
Also applies to: 111-122
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/testbot.yaml around lines 57 - 63, The workflow currently
checks out the repo with a credentialed checkout (actions/checkout@...) and
exposes service tokens (token: ${{ secrets.SVC_OSMO_CI_TOKEN }}) to the same job
that runs LLM-generated test validation; change the flow so the validation runs
in an unprivileged job/container by setting persist-credentials: false on the
checkout used for test validation (or use a separate job that performs the
checkout with persist-credentials: false) and remove any secret/token injection
(LLM_API_KEY, CODECOV_TOKEN, GH_TOKEN, SVC_OSMO_CI_TOKEN) from that job; then
add a separate, later job/step that performs a credentialed checkout or uses
token: ${{ secrets.SVC_OSMO_CI_TOKEN }} only when creating the PR or pushing
results (the PR-creation step), ensuring secrets are injected only into that
write-only job.
```
# langchain/langgraph deps are installed via pip in CI (not Bazel-managed for v1).
# See src/agent_requirements.txt.
```
Fix stale requirements path in comment.
The comment points to src/agent_requirements.txt, but this PR adds src/scripts/testbot/requirements.txt. This is easy to misread during maintenance.
Suggested patch

```diff
- # See src/agent_requirements.txt.
+ # See src/scripts/testbot/requirements.txt.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/scripts/testbot/BUILD` around lines 11 - 12, Update the stale comment
referencing src/agent_requirements.txt to point to the new requirements file
added by this PR (src/scripts/testbot/requirements.txt); locate the comment
block that mentions "langchain/langgraph deps are installed via pip in CI (not
Bazel-managed for v1)." and replace the path so it accurately documents where
those deps are declared, keeping the rest of the comment intact.
```python
    parser.add_argument("--max-targets", type=int, default=1,
                        help="Maximum number of files to target per run")
    parser.add_argument("--max-lines", type=int, default=200,
                        help="Maximum uncovered lines per target (larger files are partially covered)")
    parser.add_argument("--max-retries", type=int, default=3,
                        help="Maximum retries per target on test failure")
    parser.add_argument("--provider", default="claude",
                        choices=["nemotron", "claude"],
                        help="LLM provider for test generation")
    parser.add_argument("--dry-run", action="store_true",
                        help="Generate tests but don't create a PR")
    args = parser.parse_args()
```
Validate CLI numeric bounds early.
--max-targets, --max-lines, and --max-retries currently accept invalid negatives/zero values, which can put the pipeline in an invalid state.
✅ Suggested guard
```diff
     args = parser.parse_args()
+    if args.max_targets < 1:
+        parser.error("--max-targets must be >= 1")
+    if args.max_lines < 1:
+        parser.error("--max-lines must be >= 1")
+    if args.max_retries < 0:
+        parser.error("--max-retries must be >= 0")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/scripts/testbot/main.py` around lines 24 - 35, The CLI numeric flags
defined via parser.add_argument (--max-targets, --max-lines, --max-retries)
currently allow invalid values; add explicit validation after args =
parser.parse_args() to enforce sensible bounds (e.g., args.max_targets > 0,
args.max_lines > 0, args.max_retries >= 0) and call parser.error(...) with a
clear message if any check fails; alternatively, enforce constraints directly in
the add_argument calls (use a custom type/validator or choices) so that parser
rejects negatives/zeros for parser.add_argument("--max-targets"),
parser.add_argument("--max-lines"), and parser.add_argument("--max-retries")
before proceeding.
```python
    if source_content.startswith("Error"):
        logger.warning("Cannot read source file for review: %s", target.file_path)
        return True, ""
    # ...
    llm = get_llm(state["provider"])
    if llm.client is None:
        logger.warning("LLM client not available, skipping LLM review")
        return True, ""
```
Fail the review stage closed when the reviewer can't run.
Unreadable source files, missing clients, API exceptions, and unparsable replies all currently return (True, ""). That means the quality gate silently degrades to PASS exactly when it is least trustworthy, and the graph continues toward PR creation.
Return a failed review and surface the reason to the retry path
if source_content.startswith("Error"):
logger.warning("Cannot read source file for review: %s", target.file_path)
- return True, ""
+ return False, f"Cannot read source file for review: {target.file_path}"
@@
llm = get_llm(state["provider"])
if llm.client is None:
logger.warning("LLM client not available, skipping LLM review")
- return True, ""
+ return False, "LLM client not available for review"
@@
except Exception as exc: # pylint: disable=broad-except
- logger.warning("LLM review failed: %s. Skipping.", exc)
- return True, ""
+ logger.warning("LLM review failed: %s", exc)
+ return False, f"LLM review failed: {exc}"
@@
verdict_match = re.search(r"VERDICT:\s*(PASS|FAIL)", review_text, re.IGNORECASE)
if not verdict_match:
- logger.warning("Could not parse VERDICT from LLM review. Assuming PASS.")
- return True, ""
+ logger.warning("Could not parse VERDICT from LLM review")
+ return False, "LLM review returned an unparseable verdict."

Also applies to: 81-101
| def _get_repo_nwo() -> str: | ||
| """Get the owner/repo from gh CLI.""" | ||
| result = run_shell("gh repo view --json nameWithOwner --jq .nameWithOwner") | ||
| if result.returncode == 0 and result.stdout.strip(): | ||
| return result.stdout.strip() | ||
| return "NVIDIA/OSMO" |
_get_repo_nwo silently falls back to hardcoded default.
If gh repo view fails (e.g., not in a git repo or gh not authenticated), this silently returns "NVIDIA/OSMO". This could cause the workflow to operate on the wrong repository. Consider logging a warning or raising an error.
🛡️ Suggested fix
def _get_repo_nwo() -> str:
"""Get the owner/repo from gh CLI."""
result = run_shell("gh repo view --json nameWithOwner --jq .nameWithOwner")
if result.returncode == 0 and result.stdout.strip():
return result.stdout.strip()
+ logger.warning("Failed to detect repo from gh CLI, falling back to NVIDIA/OSMO")
 return "NVIDIA/OSMO"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/scripts/testbot/respond.py` around lines 56 - 61, The helper
_get_repo_nwo currently swallows failures from run_shell("gh repo view...") and
returns the hardcoded "NVIDIA/OSMO", which can cause actions against the wrong
repo; update _get_repo_nwo to surface failures instead of silently defaulting:
when result.returncode != 0 or stdout is empty, either log a warning with the
captured stdout/stderr and the failing command (use the module's logger if
available) and then raise a RuntimeError (or a custom exception) explaining the
gh CLI failure and suggesting authentication/repo context, or at minimum log the
full error and re-raise; reference _get_repo_nwo and run_shell to locate and
change the return/default behavior.
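The raise-instead-of-default variant the prompt describes can be sketched as follows. The `command` parameter is introduced here purely so the failure path is exercisable without an authenticated gh CLI; the real helper would keep the fixed command string:

```python
import subprocess

DEFAULT_COMMAND = "gh repo view --json nameWithOwner --jq .nameWithOwner"


def get_repo_nwo(command: str = DEFAULT_COMMAND) -> str:
    """Detect owner/repo via the gh CLI, failing loudly instead of
    silently falling back to a hardcoded default."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    nwo = result.stdout.strip()
    if result.returncode == 0 and nwo:
        return nwo
    raise RuntimeError(
        f"gh CLI repo detection failed (exit {result.returncode}, "
        f"stderr={result.stderr.strip()!r}); check `gh auth status` and "
        "that the working directory is inside a git checkout."
    )
```

Failing closed here trades convenience for safety: a scheduled workflow that cannot determine its repository should stop rather than push branches to a guessed one.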
| import os | ||
| import tempfile | ||
| import unittest | ||
| from testbot.graph import route_done, route_review, route_validation | ||
| from testbot.lcov_parser import CoverageEntry | ||
| from testbot.nodes.analyze import select_targets | ||
| from testbot.nodes.review import review_test | ||
| from testbot.plugins.base import GeneratedTest | ||
| from testbot.state import TestbotState, TestTarget |
Stub get_llm() in these tests instead of relying on ambient env state.
review_test() calls testbot.nodes.review.get_llm() directly. If CI or a developer shell has real LLM credentials configured, these tests stop being deterministic integration tests and start making live network calls with provider-dependent outcomes.
Patch the review client to a deterministic fake
import os
import tempfile
import unittest
+from unittest import mock
@@
- result = review_test(state)
+ with mock.patch(
+ "testbot.nodes.review.get_llm",
+ return_value=mock.Mock(client=None, model="nemotron"),
+ ):
+ result = review_test(state)
@@
- result = review_test(state)
+ with mock.patch(
+ "testbot.nodes.review.get_llm",
+ return_value=mock.Mock(client=None, model="nemotron"),
+ ):
+ result = review_test(state)

Also applies to: 91-117
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/scripts/testbot/tests/test_integration.py` around lines 5 - 14, The tests
call review_test() which calls testbot.nodes.review.get_llm() and can hit real
LLM services; instead stub get_llm to a deterministic fake LLM client in these
tests. Use unittest.mock.patch to patch testbot.nodes.review.get_llm (or
patch.object on the review module) in the test module setup or per-test,
returning a lightweight fake object implementing the same interface used by
review_test (e.g., a generate/respond method that returns fixed outputs), apply
the same patch for the other tests around lines 91-117, and ensure the fake is
used before calling review_test() so no real network/credentials are accessed.
| except PermissionError: | ||
| pass # also acceptable: process exists but we can't signal it | ||
| except (FileNotFoundError, ValueError): |
Don’t treat PermissionError as a successful kill check.
This can hide a still-running process and make the test pass incorrectly.
Suggested fix
- except PermissionError:
- pass # also acceptable: process exists but we can't signal it
+ except PermissionError as permission_error:
+ self.fail(f"Unable to verify process termination: {permission_error}")

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| except PermissionError: | |
| pass # also acceptable: process exists but we can't signal it | |
| except (FileNotFoundError, ValueError): | |
| except PermissionError as permission_error: | |
| self.fail(f"Unable to verify process termination: {permission_error}") | |
| except (FileNotFoundError, ValueError): |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/scripts/testbot/tests/test_tools.py` around lines 94 - 96, The test
currently swallows PermissionError in the kill-check except block (the "except
PermissionError: pass" after the kill signal attempt), which can mask a
still-running process; remove that bare pass so PermissionError is not treated
as success — either let PermissionError propagate (so the test fails) or include
PermissionError in the outer except tuple (e.g., except (FileNotFoundError,
ValueError, PermissionError):) and handle it as a failure case; update the
exception handling around the kill/send signal logic (the except blocks shown in
test_tools.py) accordingly.
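For context, the check under discussion is the classic signal-0 liveness probe. A minimal sketch (the sleeping child is illustrative) shows why `PermissionError` must not be read as "process is gone":

```python
import os
import subprocess
import sys

# Spawn a child, terminate it, and reap it so the PID is released.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
child.terminate()
child.wait(timeout=5)

# Signal 0 performs existence and permission checks without delivering
# any signal to the target process.
try:
    os.kill(child.pid, 0)
    alive = True            # a process with this PID still exists
except ProcessLookupError:
    alive = False           # definitively gone
except PermissionError:
    alive = True            # the PID exists but is unsignalable by us,
                            # which is evidence the process survived,
                            # not evidence that it terminated
```

Since `PermissionError` means the kernel found a live process it refused to let us signal, swallowing it in a termination test inverts the meaning of the check.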
| os.makedirs(os.path.dirname(path), exist_ok=True) | ||
| with open(path, "w", encoding="utf-8") as file: | ||
| file.write(content) | ||
| return f"Written {len(content)} bytes to {path}" |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
python - <<'PY'
import os
print("dirname:", repr(os.path.dirname("output.txt")))
try:
os.makedirs(os.path.dirname("output.txt"), exist_ok=True)
print("makedirs succeeded")
except Exception as e:
print("makedirs error:", type(e).__name__, str(e))
PY

Repository: NVIDIA/OSMO
Length of output: 140
🏁 Script executed:
git ls-files src/scripts/testbot/tools/

Repository: NVIDIA/OSMO
Length of output: 247
🏁 Script executed:
cat -n src/scripts/testbot/tools/file_ops.py

Repository: NVIDIA/OSMO
Length of output: 1279
🏁 Script executed:
python3 << 'EOF'
# Test 1: ASCII string
ascii_content = "hello world"
print(f"ASCII string '{ascii_content}':")
print(f" len(content): {len(ascii_content)}")
print(f" len(content.encode('utf-8')): {len(ascii_content.encode('utf-8'))}")
# Test 2: Non-ASCII string
non_ascii_content = "hello 世界" # Contains Chinese characters (3 bytes each in UTF-8)
print(f"\nNon-ASCII string '{non_ascii_content}':")
print(f" len(content): {len(non_ascii_content)}")
print(f" len(content.encode('utf-8')): {len(non_ascii_content.encode('utf-8'))}")
# Test 3: Emoji
emoji_content = "Hello 👋" # Emoji is 4 bytes in UTF-8
print(f"\nEmoji string '{emoji_content}':")
print(f" len(content): {len(emoji_content)}")
print(f" len(content.encode('utf-8')): {len(emoji_content.encode('utf-8'))}")
EOF

Repository: NVIDIA/OSMO
Length of output: 298
Fix filename-only path handling and byte count accuracy in write_file.
The function fails on filename-only paths (e.g., "output.txt") because os.path.dirname() returns an empty string, causing os.makedirs("") to raise FileNotFoundError. Additionally, the return message reports bytes but counts characters using len(content), which is inaccurate for non-ASCII content—UTF-8 encoded text may be longer in bytes than in characters.
Suggested patch
def write_file(path: str, content: str) -> str:
"""Write content to a file, creating parent directories if needed."""
try:
- os.makedirs(os.path.dirname(path), exist_ok=True)
+ parent_directory = os.path.dirname(path)
+ if parent_directory:
+ os.makedirs(parent_directory, exist_ok=True)
with open(path, "w", encoding="utf-8") as file:
file.write(content)
- return f"Written {len(content)} bytes to {path}"
+ return f"Written {len(content.encode('utf-8'))} bytes to {path}"
except OSError as exception:
 return f"Error writing {path}: {exception}"

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| os.makedirs(os.path.dirname(path), exist_ok=True) | |
| with open(path, "w", encoding="utf-8") as file: | |
| file.write(content) | |
| return f"Written {len(content)} bytes to {path}" | |
| parent_directory = os.path.dirname(path) | |
| if parent_directory: | |
| os.makedirs(parent_directory, exist_ok=True) | |
| with open(path, "w", encoding="utf-8") as file: | |
| file.write(content) | |
| return f"Written {len(content.encode('utf-8'))} bytes to {path}" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/scripts/testbot/tools/file_ops.py` around lines 22 - 25, The write_file
function fails when path is filename-only and returns an incorrect byte count;
fix it in write_file by computing dir = os.path.dirname(path) and only calling
os.makedirs(dir, exist_ok=True) if dir is non-empty (or use dir or "."), then
write the content as before but report the actual number of bytes by computing
byte_count = len(content.encode("utf-8")) (or open in binary and write
content.encode("utf-8") and use the written byte length); update the return
message to use byte_count and reference the write_file function and the
variables path and content so the change is easy to locate.
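A pathlib variant sidesteps the empty-dirname case entirely, because `Path("out.txt").parent` is `Path(".")` and `mkdir(parents=True, exist_ok=True)` on an existing directory is a no-op. This is a sketch of that alternative, not the committed fix:

```python
import os
import tempfile
from pathlib import Path


def write_file(path: str, content: str) -> str:
    """Write content to a file, creating parent directories if needed,
    and report the number of bytes actually written."""
    try:
        target = Path(path)
        target.parent.mkdir(parents=True, exist_ok=True)  # "." always exists
        data = content.encode("utf-8")
        target.write_bytes(data)  # byte count matches what hit the disk
        return f"Written {len(data)} bytes to {path}"
    except OSError as exception:
        return f"Error writing {path}: {exception}"


os.chdir(tempfile.mkdtemp())
message = write_file("out.txt", "héllo")    # 5 characters, 6 UTF-8 bytes
nested = write_file("a/b/out.txt", "hi")    # parents created on demand
```

Writing the encoded bytes directly also keeps the reported count honest for multi-byte characters, which `len(content)` on the str undercounts.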
| return ValidationResult( | ||
| passed=result.returncode == 0, | ||
| output=result.stdout + result.stderr, | ||
| retry_hint=result.stderr if result.returncode != 0 else None, | ||
| ) |
Send the full Vitest failure output back to the retry loop.
Vitest often prints assertion diffs and compiler diagnostics to stdout, so using only stderr for retry_hint can leave the next LLM attempt blind even though output has the useful details.
Reuse the combined output for `retry_hint`
- return ValidationResult(
- passed=result.returncode == 0,
- output=result.stdout + result.stderr,
- retry_hint=result.stderr if result.returncode != 0 else None,
- )
+ combined_output = result.stdout + result.stderr
+ return ValidationResult(
+ passed=result.returncode == 0,
+ output=combined_output,
+ retry_hint=combined_output if result.returncode != 0 else None,
+ )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/scripts/testbot/tools/test_runner.py` around lines 125 - 129, The
retry_hint currently uses only result.stderr which omits important Vitest
details printed to stdout; update the return value in test_runner.py (the
ValidationResult construction) so that retry_hint uses the combined output
(result.stdout + result.stderr) when result.returncode != 0, while keeping
passed as result.returncode == 0 and output as the combined output; locate the
ValidationResult instantiation in the function that returns it and replace the
retry_hint expression to reference the same combined output string.
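The difference the suggestion targets can be demonstrated with a stand-in `ValidationResult`; the dataclass shape here is assumed from the diff above, not copied from the real module:

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationResult:
    passed: bool
    output: str
    retry_hint: Optional[str]


# Mimic a test runner whose assertion diff goes to stdout (as Vitest's
# often do) while stderr stays empty.
result = subprocess.run(
    [sys.executable, "-c", "print('expected 2, received 3'); raise SystemExit(1)"],
    capture_output=True,
    text=True,
)
combined_output = result.stdout + result.stderr
validation = ValidationResult(
    passed=result.returncode == 0,
    output=combined_output,
    retry_hint=combined_output if result.returncode != 0 else None,
)
```

With the stderr-only version, `retry_hint` for this run would be an empty string and the regeneration prompt would carry no diagnostic signal at all.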
Closed in favor of #788
Summary
Add testbot, an AI-powered pipeline that automatically generates tests for uncovered code, validates them, and opens PRs for human review. Includes a respond module that addresses inline PR review comments.
Issue #760
Architecture
Test Generation Pipeline (testbot.yaml)
Validates generated tests with bazel test (Python/Go) or pnpm test (UI/TypeScript). Retry loop feeds validation/review errors back to the LLM (max 3 attempts).
Review Response (testbot-respond.yaml)
Triggered on pull_request_review_comment for PRs with the ai-generated label.
Plugin Architecture
LLMProvider (ABC) and LLMClient; providers claude (default) and nemotron, configurable via --provider; LLM_API_KEY (single key for all providers).
Key Design Decisions
Settings
Testing
Files
Checklist
Summary by CodeRabbit
New Features
Documentation