Add testbot: AI-powered test generation pipeline#743

Closed
jiaenren wants to merge 2 commits into main from jiaenr/ai-agent-pipeline-design

Conversation


@jiaenren jiaenren commented Mar 25, 2026

Summary

Add testbot, an AI-powered pipeline that automatically generates tests for uncovered code, validates them, and opens PRs for human review. Includes a respond module that addresses inline PR review comments.

Issue #760

Architecture

Test Generation Pipeline (testbot.yaml)

Codecov API → analyze → write → validate → review → create_pr
                         ↑         |          |
                         └─────────┴──────────┘  (retry with feedback)
| Stage | Description |
| --- | --- |
| Analyze | Fetches coverage from the Codecov API, selects the lowest-coverage file, caps uncovered lines (default 100) |
| Write | Calls the LLM (Claude via the NVIDIA inference API) to generate tests with quality rules |
| Validate | Python syntax check → bazel test (Python/Go) or pnpm test (UI/TypeScript) |
| Review | LLM quality review (PASS/FAIL verdict with feedback) |
| Create PR | Branches from HEAD, commits tests + BUILD entries, opens a PR against main |
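As a rough sketch of the Analyze stage's target selection (helper and field names are hypothetical, and "lowest coverage" is approximated here by the most uncovered lines):

```python
import dataclasses

@dataclasses.dataclass
class CoverageEntry:
    """One file's coverage data, as produced by the Codecov client."""
    path: str
    uncovered_lines: list[int]

def select_target(entries: list[CoverageEntry], max_uncovered: int = 100):
    """Pick the file with the most uncovered lines, capping the line list."""
    candidates = [e for e in entries if e.uncovered_lines]
    if not candidates:
        return None  # nothing to generate tests for this run
    worst = max(candidates, key=lambda e: len(e.uncovered_lines))
    return CoverageEntry(worst.path, worst.uncovered_lines[:max_uncovered])
```

Capping the uncovered-line list keeps the prompt to the LLM bounded regardless of how undertested the worst file is.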

Retry loop feeds validation/review errors back to the LLM (max 3 attempts).
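The retry loop could be expressed as LangGraph conditional-edge routing functions along these lines (a sketch; the state keys and node names are assumptions, not the PR's actual identifiers):

```python
MAX_ATTEMPTS = 3  # matches the "max 3 attempts" cap above

def route_after_validate(state: dict) -> str:
    """After validation: proceed to review, retry the write node, or give up."""
    if state["validation_passed"]:
        return "review"
    # Validation errors stay in the state so the next write call sees them.
    return "write" if state["attempt"] < MAX_ATTEMPTS else "end"

def route_after_review(state: dict) -> str:
    """After LLM review: create the PR on PASS, otherwise retry or give up."""
    if state["review_verdict"] == "PASS":
        return "create_pr"
    return "write" if state["attempt"] < MAX_ATTEMPTS else "end"
```

Functions like these would be wired in with `StateGraph.add_conditional_edges`, so both failure paths funnel back into the same write node with feedback attached.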

Review Response (testbot-respond.yaml)

PR comment → GraphQL fetch threads → filter unresolved
  → group by file → LLM (with source context)
  → parse JSON {fix, replies, resolve}
  → validate fix → commit+push (if tests pass)
  → reply to each thread → resolve addressed threads
  • Triggers on pull_request_review_comment for PRs with ai-generated label
  • Fetches review threads via GraphQL (includes resolution status)
  • Groups unresolved threads by file, one LLM call per file
  • Includes source file context (inferred from test path) for informed responses
  • LLM returns JSON: fix + individual replies with resolve verdict per comment
  • Validates fix with retries (max 2), commits + pushes if tests pass
  • Resolves threads the LLM marks as addressed
  • Clear failure messaging when fix can't be applied
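The GraphQL fetch and per-file grouping described above might look roughly like this (a sketch; the query shape follows GitHub's public GraphQL schema, while the helper name and grouping rule are assumptions):

```python
# REST has no thread-resolution data; the GraphQL reviewThreads connection
# exposes isResolved, which is what enables the skip/resolve workflow.
THREADS_QUERY = """
query($owner: String!, $repo: String!, $pr: Int!) {
  repository(owner: $owner, name: $repo) {
    pullRequest(number: $pr) {
      reviewThreads(first: 100) {
        nodes {
          id
          isResolved
          comments(first: 50) { nodes { path body author { login } } }
        }
      }
    }
  }
}
"""

def group_unresolved_by_file(threads: list) -> dict:
    """Keep unresolved threads and bucket them by the file they comment on."""
    groups: dict = {}
    for thread in threads:
        if thread["isResolved"]:
            continue
        path = thread["comments"]["nodes"][0]["path"]
        groups.setdefault(path, []).append(thread)
    return groups
```

Grouping by file is what allows one LLM call to address every open comment on a test file at once.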

Plugin Architecture

| Component | Description |
| --- | --- |
| LLMProvider (ABC) | Abstract interface for LLM backends |
| LLMClient | OpenAI-compatible implementation for the NVIDIA inference API |
| Providers | claude (default), nemotron; configurable via --provider |
| Env vars | LLM_API_KEY (single key for all providers) |
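A minimal sketch of what such a provider abstraction could look like (all names are illustrative; the stub provider stands in for the real OpenAI-compatible client):

```python
import abc

class LLMProvider(abc.ABC):
    """Abstract interface every LLM backend must implement."""

    @abc.abstractmethod
    def generate_test(self, source: str, retry_context: str = "") -> str:
        """Return generated test code for the given source file."""

_REGISTRY: dict = {}

def register(name: str):
    """Class decorator that adds a provider to the registry."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator

@register("claude")
class ClaudeProvider(LLMProvider):
    def generate_test(self, source: str, retry_context: str = "") -> str:
        return f"# tests for {len(source)} bytes of source"  # stub, no API call

def get_llm(name: str = "claude") -> LLMProvider:
    """Look up and instantiate the named provider."""
    return _REGISTRY[name]()
```

A registry like this keeps the `--provider` flag a pure lookup, so adding nemotron or another backend touches only plugin code.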

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| Codecov API for coverage | Seconds vs. 15+ minutes for bazel coverage; better data from self-hosted CI |
| LLM review as sole quality gate | Regex static checks were brittle, with high false-positive rates |
| BUILD entries auto-generated | LLM-generated BUILD entries had wrong dep targets |
| GraphQL for thread resolution | The REST API has no resolution support; enables the skip/resolve workflow |
| JSON output for respond module | Reliable parsing vs. fragile text/regex extraction |
| Batch comments per file | One LLM call handles all of a file's comments; cancel-in-progress handles new arrivals |
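The "JSON output for respond module" decision might translate to a parser along these lines (a sketch: the {fix, replies, resolve} payload shape comes from the description above, while the fence-stripping and defaulting are assumed details):

```python
import json

def parse_llm_response(raw: str) -> dict:
    """Parse the structured respond payload: a fix plus per-comment replies."""
    # Strip an optional markdown fence the model may wrap around the JSON.
    text = raw.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    data = json.loads(text)
    if not isinstance(data.get("replies"), list):
        raise ValueError("LLM response missing 'replies' list")
    for reply in data["replies"]:
        reply.setdefault("resolve", False)  # default: leave the thread open
    return data
```

Defaulting `resolve` to false errs on the safe side: a thread is only resolved when the model explicitly says the comment was addressed.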

Settings

| Setting | Default |
| --- | --- |
| Schedule | Weekdays 6 AM UTC |
| Max targets per run | 1 |
| Max uncovered lines per target | 200 |
| Max retries per target | 3 |
| Max fix retries (respond) | 3 |
| LLM provider | claude (aws/anthropic/claude-opus-4-5) |
| Coverage source | Codecov API |
| Max auto-responses per trigger | 20 |

Testing

Files

src/scripts/testbot/           # Python package
├── main.py                    # CLI entry point
├── graph.py                   # LangGraph pipeline (StateGraph + routing)
├── state.py                   # TestbotState TypedDict + TestTarget dataclass
├── codecov_client.py          # Codecov API client → CoverageEntry
├── lcov_parser.py             # CoverageEntry dataclass + path filtering
├── respond.py                 # PR review comment responder (GraphQL + LLM)
├── nodes/                     # Pipeline nodes
│   ├── analyze.py             # Coverage analysis + target selection
│   ├── write.py               # LLM test generation + BUILD entry
│   ├── validate.py            # Bazel/vitest test runner
│   ├── review.py              # LLM quality review
│   └── create_pr.py           # Git branch + PR creation
├── plugins/                   # LLM provider abstraction
│   ├── base.py                # LLMProvider ABC + TestType enum
│   ├── llm_client.py          # OpenAI-compatible LLM client
│   └── __init__.py            # Plugin registry
├── prompts/                   # LLM prompt templates
│   ├── python_test.py         # Python test generation
│   ├── go_test.py             # Go test generation
│   ├── ui_test.py             # TypeScript/Vitest test generation
│   ├── review.py              # Quality review prompt
│   ├── respond.py             # Review comment response prompt
│   └── quality_rules.py       # Shared SWE best practices rules
├── tools/                     # Shell/file/test utilities
│   ├── test_runner.py         # Bazel + vitest test execution
│   ├── file_ops.py            # File read/write
│   └── shell.py               # Shell command execution
├── requirements.txt           # pip dependencies (openai, langgraph, requests)
└── tests/                     # 129 unit tests

.github/workflows/
├── testbot.yaml               # Test generation workflow
└── testbot-respond.yaml       # Review comment response workflow
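A minimal sketch of what the shell utility listed above (tools/shell.py) could look like (the signature is an assumption; the real module also does signal handling):

```python
import subprocess

def run_shell(command: str, timeout: int = 600) -> tuple[int, str]:
    """Run a shell command, returning (exit_code, combined_output)."""
    try:
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return 124, f"timed out after {timeout}s"  # 124 mirrors coreutils timeout
    return result.returncode, result.stdout + result.stderr
```

Returning the combined output is what lets the validate node feed test failures back to the LLM verbatim on retry.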

Checklist

  • I am familiar with the Contributing Guidelines
  • New or existing tests cover these changes
  • The documentation is up to date with these changes

Summary by CodeRabbit

  • New Features

    • Added AI-powered automated test generation targeting uncovered code, integrated with Codecov coverage analysis and supporting Python, Go, and TypeScript tests.
    • Added automated PR review response workflow using AI to generate suggested fixes and replies to review comments.
    • Introduced GitHub Actions workflows for scheduled weekday test generation and on-demand PR review responses.
  • Documentation

    • Added comprehensive testbot documentation covering end-to-end test generation and review workflows.

@jiaenren jiaenren requested a review from a team as a code owner March 25, 2026 06:07

coderabbitai Bot commented Mar 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

This pull request introduces a comprehensive AI-driven testbot system for automated test generation and PR review. It includes two GitHub Actions workflows, a testbot package with LangGraph-based pipeline stages (analyze coverage, generate tests, validate, review, create PR), LLM provider plugins, prompt templates, review response automation, and extensive test coverage for all components.

Changes

- **GitHub Actions Workflows** (.github/workflows/testbot.yaml, .github/workflows/testbot-respond.yaml): testbot.yaml runs on a weekday schedule or manual dispatch to execute the test generation pipeline with configurable targets/lines/provider; testbot-respond.yaml responds to PR review comments on labeled PRs.
- **Testbot Package Core** (src/scripts/testbot/__init__.py, main.py, state.py, graph.py): Package initialization, CLI entry point with argument parsing, state structures (TestTarget, TestbotState), and a LangGraph workflow builder with routing logic for the analyze→write→validate→review→create-pr pipeline.
- **Coverage Analysis** (src/scripts/testbot/codecov_client.py, lcov_parser.py, nodes/analyze.py): Codecov API client for fetching coverage reports, LCOV parser dataclasses, and an analyze node that selects coverage targets with configurable limits and test-type detection.
- **Test Generation & Validation** (src/scripts/testbot/nodes/write.py, nodes/validate.py): The write node delegates to the LLM provider and registers BUILD entries for Python tests; the validate node executes generated tests and reports pass/fail status with retry hints.
- **Quality Review & PR Creation** (src/scripts/testbot/nodes/review.py, nodes/create_pr.py): The review node invokes LLM-based quality assessment; the create-pr node commits generated files, pushes the branch, and opens a GitHub PR with the ai-generated label.
- **LLM Provider Architecture** (src/scripts/testbot/plugins/base.py, plugins/llm_client.py, plugins/__init__.py): Base classes for LLM providers with test generation/validation contracts, an LLMClient implementing OpenAI-compatible inference for Claude/Nemotron, and a provider registry with instance caching.
- **Prompt Templates** (src/scripts/testbot/prompts/__init__.py, quality_rules.py, python_test.py, go_test.py, ui_test.py, review.py): System and user prompt definitions for test generation across Python/Go/UI languages, a quality-rules preamble, and an LLM review prompt builder; all templates include escape utilities for fenced content.
- **PR Review Response** (src/scripts/testbot/respond.py): CLI script that fetches PR review threads, filters actionable ones, batches LLM calls per file with thread context, parses JSON responses for fixes/replies, validates and commits changes conditionally, and posts replies/resolves threads.
- **Shell & File Tools** (src/scripts/testbot/tools/shell.py, tools/file_ops.py, tools/bazel_query.py, tools/__init__.py): Shell execution with timeout/signal handling, file read/write utilities with error messaging, and a Bazel query builder for discovering build targets.
- **Test Suite** (src/scripts/testbot/tests/*): Comprehensive unit and integration tests covering coverage entry parsing, analyzer selection/skipping, graph routing transitions, prompt building, LLM response parsing, PR title/body generation, file operations, shell execution, Bazel integration, and review thread filtering.
- **Documentation** (AGENTS.md, src/scripts/testbot/README.md, requirements.txt, BUILD): Updated codebase structure doc, a detailed testbot README with pipeline flow diagrams and configuration knobs, pip dependency specification, and a public Bazel target for the testbot entry point.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant GitHub
    participant Testbot as Testbot Pipeline
    participant Codecov
    participant LLM
    participant BuildSystem as Build/Test System
    participant Git

    User->>GitHub: Trigger testbot (schedule/manual)
    GitHub->>Testbot: Start workflow
    Testbot->>Codecov: fetch_coverage(token)
    Codecov-->>Testbot: Coverage report
    Testbot->>Testbot: analyze_coverage()<br/>select_targets()
    loop For each target
        Testbot->>Testbot: write_test()<br/>(get_llm().generate_test())
        LLM-->>Testbot: GeneratedTest
        Testbot->>BuildSystem: validate_test()<br/>(run_test)
        BuildSystem-->>Testbot: ValidationResult
        alt Validation Failed
            Testbot->>LLM: generate_test()<br/>(with retry_context)
            LLM-->>Testbot: Regenerated test
        end
        Testbot->>LLM: review_test()
        LLM-->>Testbot: Review verdict
    end
    Testbot->>Git: create_pr()<br/>(commit & push)
    Git-->>Testbot: Branch pushed
    Testbot->>GitHub: gh pr create<br/>(ai-generated label)
    GitHub-->>Testbot: PR URL
    Testbot-->>User: Complete with PR link
sequenceDiagram
    participant GitHub
    participant Testbot as Testbot Respond
    participant GraphQL as GitHub GraphQL
    participant TestFile as Generated Test File
    participant LLM
    participant BuildSystem as Build/Test System

    GitHub->>Testbot: Review comment created
    Testbot->>GraphQL: Query review threads<br/>(unresolved, PR context)
    GraphQL-->>Testbot: Thread list
    Testbot->>Testbot: Filter actionable threads<br/>(not testbot source, not resolved)
    loop Per file with comments
        Testbot->>TestFile: Read current content
        TestFile-->>Testbot: File content
        Testbot->>LLM: build_batch_respond_prompt()<br/>(all file comments + context)
        LLM-->>Testbot: JSON(fix, replies[])
        alt Fix provided
            Testbot->>TestFile: Apply fix
            Testbot->>BuildSystem: validate_test()<br/>(run_test)
            BuildSystem-->>Testbot: Pass/Fail
            alt Validation passes
                Testbot->>GitHub: Commit & push fix
            else Validation fails (retries remain)
                Testbot->>LLM: Retry with error context
            end
        end
    end
    loop For each reply
        Testbot->>GitHub: Post comment reply
        Testbot->>GraphQL: Resolve thread<br/>(if indicated)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A testbot hops into view,
With tests it generates, shiny and new!
Coverage gaps it hunts with care,
LLMs help it write with flair—
PRs bloom where code gaps were bare! 🌱

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 35.59%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title "Add testbot: AI-powered test generation pipeline" clearly and concisely summarizes the main change in the PR, which introduces a new AI-driven test generation system. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 11

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

🟡 Minor comments (20)
src/scripts/coverage_agent/__init__.py-1-2 (1)

1-2: ⚠️ Potential issue | 🟡 Minor

Add # pylint: disable=line-too-long for the copyright header.

The copyright line is 103 characters, exceeding the 100-character limit and causing the pylint failure. As per coding guidelines, add the pylint disable comment instead of breaking the line.

Proposed fix
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/__init__.py` around lines 1 - 2, Add a pylint
disable for line-too-long to the module so the 103-character SPDX copyright
header doesn't fail linting; insert the comment "# pylint:
disable=line-too-long" in the top of the file near the SPDX header lines (the
two lines starting with "# SPDX-FileCopyrightText" and "#
SPDX-License-Identifier") so the long copyright line is ignored by pylint.
src/scripts/coverage_agent/nodes/__init__.py-1-2 (1)

1-2: ⚠️ Potential issue | 🟡 Minor

Add # pylint: disable=line-too-long for the copyright header.

Same pylint C0301 failure as reported. Add the disable comment per coding guidelines.

Proposed fix
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/nodes/__init__.py` around lines 1 - 2, Add a
module-level pylint disable for the long-line rule to silence C0301 for the SPDX
copyright header: insert the comment "# pylint: disable=line-too-long" at the
top of src/scripts/coverage_agent/nodes/__init__.py (immediately after or on the
same block as the existing SPDX header lines) so the copyright header no longer
triggers the lint failure.
src/scripts/coverage_agent/tools/file_ops.py-20-21 (1)

20-21: ⚠️ Potential issue | 🟡 Minor

Handle edge case when path has no directory component.

os.path.dirname("filename.txt") returns an empty string "", and os.makedirs("", exist_ok=True) raises FileNotFoundError. This will happen when writing to a file in the current directory without a path prefix.

Proposed fix
     try:
-        os.makedirs(os.path.dirname(path), exist_ok=True)
+        dir_name = os.path.dirname(path)
+        if dir_name:
+            os.makedirs(dir_name, exist_ok=True)
         with open(path, "w") as file:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/tools/file_ops.py` around lines 20 - 21, When
creating parent directories for a file path, guard the os.makedirs call so it
doesn't get passed an empty string; check the directory component returned by
os.path.dirname(path) and only call os.makedirs(dirpath, exist_ok=True) if
dirpath is non-empty (or otherwise treat empty as current directory) to avoid
FileNotFoundError when path has no directory component; update the logic around
the existing os.path.dirname(path) / os.makedirs invocation in file_ops.py to
perform that check.
src/scripts/coverage_agent/prompts/go_test.py-30-30 (1)

30-30: ⚠️ Potential issue | 🟡 Minor

Remove extraneous f-string prefix.

Line 30 uses an f-string but contains no placeholders. This was flagged by Ruff (F541).

Proposed fix
-    prompt = f"Generate Go unit tests for the following source file.\n\n"
+    prompt = "Generate Go unit tests for the following source file.\n\n"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/prompts/go_test.py` at line 30, The assignment to
the variable prompt uses an f-string with no placeholders (remove the
unnecessary "f" prefix) — locate the prompt = f"Generate Go unit tests for the
following source file.\n\n" statement and change it to a plain string literal
(prompt = "Generate Go unit tests for the following source file.\n\n") so the
extraneous f-string prefix flagged by Ruff (F541) is removed.
src/scripts/coverage_agent/prompts/ui_test.py-31-31 (1)

31-31: ⚠️ Potential issue | 🟡 Minor

Remove extraneous f-string prefix.

Line 31 uses an f-string but contains no placeholders. This was flagged by Ruff (F541).

Proposed fix
-    prompt = f"Generate Vitest tests for the following TypeScript source file.\n\n"
+    prompt = "Generate Vitest tests for the following TypeScript source file.\n\n"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/prompts/ui_test.py` at line 31, The prompt
assignment in ui_test.py uses an unnecessary f-string (variable prompt) with no
placeholders; replace the f-string literal with a regular string literal (remove
the leading f) in the prompt = ... statement so it becomes a normal string to
resolve the Ruff F541 warning.
src/scripts/coverage_agent/prompts/python_test.py-45-45 (1)

45-45: ⚠️ Potential issue | 🟡 Minor

Remove extraneous f-string prefix.

Line 45 uses an f-string but contains no placeholders. This was flagged by Ruff (F541).

Proposed fix
-    prompt = f"Generate unit tests for the following Python source file.\n\n"
+    prompt = "Generate unit tests for the following Python source file.\n\n"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/prompts/python_test.py` at line 45, The assignment
to the variable prompt uses an unnecessary f-string prefix (prompt = f"...")
even though there are no interpolation placeholders; update the prompt
assignment in python_test.py to use a regular string literal (remove the leading
"f") so the value is identical but avoids Ruff F541; locate the prompt variable
assignment in the file (named prompt) and replace the f-string with a plain
string literal.
src/scripts/coverage_agent/prompts/go_test.py-4-4 (1)

4-4: ⚠️ Potential issue | 🟡 Minor

Remove unnecessary f-string prefix at line 30.

Line 30 uses an f-string without any placeholders. Change f"Generate Go unit tests for the following source file.\n\n" to a regular string literal:

prompt = "Generate Go unit tests for the following source file.\n\n"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/prompts/go_test.py` at line 4, Change the
unnecessary f-string assignment to a normal string: locate the prompt variable
in go_test.py (the line setting prompt = f"Generate Go unit tests for the
following source file.\n\n") and remove the leading f so it becomes prompt =
"Generate Go unit tests for the following source file.\n\n"; no other logic
changes are needed.
src/scripts/coverage_agent/plugins/openai_writer.py-1-3 (1)

1-3: ⚠️ Potential issue | 🟡 Minor

Add module docstring to fix pylint C0114.

Pylint flagged the missing module docstring.

Proposed fix
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
+"""OpenAI-based test writer plugin using ChatOpenAI from langchain-openai."""

 import logging
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/plugins/openai_writer.py` around lines 1 - 3, Add
a module-level docstring to resolve pylint C0114 by placing a short descriptive
string at the top of the openai_writer.py module (first non-shebang/comment
lines); the docstring should briefly state the module's purpose (e.g.,
"Utilities for writing coverage data to OpenAI/remote writer" or similar) and be
separated by a blank line from the following imports/definitions so pylint
recognizes it. Ensure the docstring appears before any code, comments aside from
SPDX headers, and update any module-level documentation if needed.
src/scripts/coverage_agent/main.py-1-3 (1)

1-3: ⚠️ Potential issue | 🟡 Minor

Add module docstring to fix pylint C0114.

The pipeline failure indicates a missing module docstring.

Proposed fix
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
+"""CLI entrypoint for the AI Coverage Agent pipeline."""
 
 import argparse
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/main.py` around lines 1 - 3, Add a proper
module-level docstring at the top of this module (coverage_agent.main / main.py)
to satisfy pylint C0114: include a short description of the module’s
responsibility and, if helpful, a brief note about key classes/functions or
high-level behavior; place the docstring as the first statement in the file
(before the SPDX headers or immediately after if headers must remain) so pylint
recognizes it.
src/scripts/coverage_agent/main.py-18-21 (1)

18-21: ⚠️ Potential issue | 🟡 Minor

Fix line length violations flagged by pylint (C0301).

Lines exceed 100 characters. Per coding guidelines, add # pylint: disable=line-too-long or shorten the lines.

Proposed fix — shorten help strings
-    parser.add_argument("--lcov-path", default="bazel-out/_coverage/_coverage_report.dat",
-                        help="Path to backend LCOV coverage file (Python/Go)")
-    parser.add_argument("--ui-lcov-path", default="src/ui/coverage/lcov.info",
-                        help="Path to UI LCOV coverage file (Vitest/TypeScript)")
+    parser.add_argument(
+        "--lcov-path",
+        default="bazel-out/_coverage/_coverage_report.dat",
+        help="Path to backend LCOV file (Python/Go)",
+    )
+    parser.add_argument(
+        "--ui-lcov-path",
+        default="src/ui/coverage/lcov.info",
+        help="Path to UI LCOV file (Vitest/TypeScript)",
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/main.py` around lines 18 - 21, The two argument
definitions for parser.add_argument (flags --lcov-path and --ui-lcov-path) have
help strings that push the source lines past the 100-character limit; shorten
the help text to be concise (e.g., "Path to backend LCOV file" and "Path to UI
LCOV file") or move the long help text to a nearby constant and reference that
constant, ensuring each resulting source line is <=100 chars; alternatively, if
you intentionally want to keep the long descriptions, add a per-line pylint
exemption comment (# pylint: disable=line-too-long) next to the offending
parser.add_argument calls (prefer shortening first).
.github/workflows/coverage-agent-respond.yaml-89-92 (1)

89-92: ⚠️ Potential issue | 🟡 Minor

Missing OPENAI_API_KEY environment variable.

The main coverage-agent.yaml workflow passes OPENAI_API_KEY (line 148), but this workflow omits it. If the respond handler supports the openai provider, this secret should be included for consistency.

Proposed fix
         env:
           NVIDIA_API_KEY: ${{ secrets.NVIDIA_NIM_KEY }}
           ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/coverage-agent-respond.yaml around lines 89 - 92, Add the
missing OPENAI_API_KEY env entry to the workflow's env block so the respond
handler has parity with the main coverage-agent.yaml; specifically, update the
env list that currently contains NVIDIA_API_KEY, ANTHROPIC_API_KEY, and GH_TOKEN
to also include OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} so any code paths
that use the openai provider will receive the secret.
src/scripts/coverage_agent/lcov_parser.py-1-6 (1)

1-6: ⚠️ Potential issue | 🟡 Minor

Add pylint disable for long copyright line and module docstring.

The pipeline reports Line too long (103/100) and Missing module docstring. Per coding guidelines, add # pylint: disable=line-too-long for the copyright line and add a module docstring.

Proposed fix
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
+# pylint: enable=line-too-long
+"""LCOV coverage file parser for the coverage agent pipeline."""

 import dataclasses
 import fnmatch
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/lcov_parser.py` around lines 1 - 6, Add a pylint
disable and a module docstring at the top of
src/scripts/coverage_agent/lcov_parser.py: append a comment like "# pylint:
disable=line-too-long" to the long copyright header line and add a short
module-level docstring (a one- or two-sentence triple-quoted string) directly
after the SPDX header and before the imports (i.e., before the existing import
dataclasses / fnmatch) that briefly describes the purpose of the lcov_parser
module.
src/scripts/coverage_agent/tools/test_runner.py-14-30 (1)

14-30: ⚠️ Potential issue | 🟡 Minor

Return type annotation is misleading; _file_path_to_bazel_target always returns a string.

The function signature declares Optional[str] but the fallback on lines 28-30 guarantees a non-None return. This causes the mypy error at line 54 where shlex.quote(target) receives str | None. Either:

  1. Remove Optional since the function always returns a string (preferred), or
  2. Add a None check before calling shlex.quote.
Proposed fix
-def _file_path_to_bazel_target(file_path: str) -> Optional[str]:
+def _file_path_to_bazel_target(file_path: str) -> str:

And remove the unused Optional import if no longer needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/tools/test_runner.py` around lines 14 - 30, The
function _file_path_to_bazel_target always returns a string (either a discovered
target or the fallback), so change its return type annotation from Optional[str]
to str and remove the now-unused Optional import; this will fix the mypy error
when passing its result to shlex.quote (no None check needed). Locate
_file_path_to_bazel_target and update its signature and imports accordingly, and
run type checks to ensure no other callers relied on Optional.
src/scripts/coverage_agent/tests/test_lcov_parser.py-224-229 (1)

224-229: ⚠️ Potential issue | 🟡 Minor

Move import to module top level.

The import inside test_ui_files_detected_as_ui_type violates coding guidelines which state: "All imports must be at the top level of the module after the module docstring. No exceptions."

Proposed fix

Move the import to the top of the file with other imports:

 from coverage_agent.lcov_parser import CoverageEntry, parse_lcov
+from coverage_agent.plugins.base import TestType, detect_test_type

Then remove line 224 and update the test:

     def test_ui_files_detected_as_ui_type(self):
         """Verifies that UI files from Vitest LCOV get detect_test_type == UI.
 
         Note: generated.ts IS a valid UI file by path — it's filtered out by the
         LCOV parser's _is_ignored(), not by detect_test_type(). This test verifies
         both layers work correctly together.
         """
-        from coverage_agent.plugins.base import TestType, detect_test_type
-
         self.assertEqual(detect_test_type("src/ui/src/lib/utils.ts"), TestType.UI)

As per coding guidelines: "All imports must be at the top level of the module after the module docstring. No exceptions: imports inside functions are not allowed."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/tests/test_lcov_parser.py` around lines 224 - 229,
The inline import of TestType and detect_test_type inside the test function
test_ui_files_detected_as_ui_type violates the rule requiring top-level imports;
move the line "from coverage_agent.plugins.base import TestType,
detect_test_type" to the module import section with the other imports (so
detect_test_type and TestType are imported at top-level), remove the in-function
import, and ensure the test still references detect_test_type and TestType
unchanged.
src/scripts/coverage_agent/plugins/__init__.py-1-3 (1)

1-3: ⚠️ Potential issue | 🟡 Minor

Add pylint disable for long copyright line and module docstring.

The pipeline reports Line too long (103/100) and Missing module docstring.

Proposed fix
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
+# pylint: enable=line-too-long
+"""Plugin registry for coverage agent writer implementations."""

 from coverage_agent.plugins.base import (
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/plugins/__init__.py` around lines 1 - 3, Add a
short module docstring or a pylint disable directive at the top of
src/scripts/coverage_agent/plugins/__init__.py to silence the "Missing module
docstring" and "Line too long (103/100)" warnings; for example, add a one-line
docstring or prepend a comment like "# pylint:
disable=line-too-long,missing-module-docstring" immediately after the SPDX
header so the linter no longer raises those two errors for this module.
src/scripts/coverage_agent/nodes/create_pr.py-86-95 (1)

86-95: ⚠️ Potential issue | 🟡 Minor

PR title/body with special characters may break shell command.

The PR title and body are embedded directly in a shell command string. If file paths contain quotes or other shell metacharacters, this could fail or cause injection. Consider using gh pr create with --body-file or passing arguments as a list.

♻️ Proposed fix using --body-file
+    import shlex
+    import tempfile
+    with tempfile.NamedTemporaryFile(mode="w", suffix=".md", delete=False) as f:
+        f.write(pr_body)
+        body_file = f.name
+
     result = run_shell(
-        f'gh pr create --title "{pr_title}" --body "{pr_body}" --label ai-generated'
+        f'gh pr create --title {shlex.quote(pr_title)} --body-file {body_file} --label ai-generated'
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/nodes/create_pr.py` around lines 86 - 95, The
current use of run_shell with f'gh pr create --title "{pr_title}" --body
"{pr_body}" --label ai-generated' embeds pr_title/pr_body into a shell string
and can break or allow injection; change create_pr logic to write pr_body to a
temporary file and call run_shell with a safe argument form using --body-file
pointing to that temp file (or switch run_shell to accept an args list and
invoke gh directly), ensure pr_title is sanitized or passed as an argument (not
interpolated into a shell string), and keep the existing label-creation retry
flow by re-running the same safe invocation (using the same --body-file or args
list) after creating the ai-generated label; reference run_shell, pr_title,
pr_body and the retry block for changes.
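As a standalone illustration of the quoting approach (the helper name and arguments below are hypothetical, not from the PR):

```python
import shlex

def build_pr_command(pr_title: str, body_file: str) -> str:
    # shlex.quote wraps each value so shell metacharacters
    # (quotes, backticks, semicolons, $()) are passed through literally.
    return (
        f"gh pr create --title {shlex.quote(pr_title)} "
        f"--body-file {shlex.quote(body_file)} --label ai-generated"
    )

cmd = build_pr_command('Add tests for "utils"; echo pwned', "body.md")
# shlex.split round-trips the quoted title back as a single argument.
print(shlex.split(cmd)[4])  # → Add tests for "utils"; echo pwned
```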
src/scripts/coverage_agent/nodes/analyze.py-99-100 (1)

99-100: ⚠️ Potential issue | 🟡 Minor

Line exceeds 100 character limit.

Per the pipeline failure, this line is 119 characters. Break it up for compliance.

♻️ Proposed fix
     for target in targets:
-        logger.info("  %.1f%% %s (existing_test=%s)", target.coverage_pct, target.file_path, target.existing_test_path)
+        logger.info(
+            "  %.1f%% %s (existing_test=%s)",
+            target.coverage_pct, target.file_path, target.existing_test_path,
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/nodes/analyze.py` around lines 99 - 100, The
logger.info call inside the for target in targets loop exceeds 100 chars;
refactor the long log statement by constructing a shorter message or splitting
arguments before the call — e.g., build a short msg/fields like coverage =
f"{target.coverage_pct:.1f}%" and path = target.file_path (and existing =
target.existing_test_path), then call logger.info with a shorter format string
or multiple concatenated pieces instead of the single 119-char line so the final
logger.info invocation is under 100 characters.
src/scripts/coverage_agent/nodes/analyze.py-122-127 (1)

122-127: ⚠️ Potential issue | 🟡 Minor

Mutation of input data may cause side effects.

Directly modifying entry.file_path mutates the original CoverageEntry objects, which could cause unexpected behavior if the same entries are used elsewhere. Consider creating a copy or normalizing during target creation.

♻️ Proposed fix - normalize during target creation
-        for entry in ui_entries:
-            if not entry.file_path.startswith("src/ui/"):
-                entry.file_path = f"src/ui/{entry.file_path}"
         logger.info("UI: %d coverage entries", len(ui_entries))
         all_entries.extend(ui_entries)

Then in select_targets, normalize when creating the target:

file_path = entry.file_path
if not file_path.startswith("src/ui/") and test_type == TestType.UI:
    file_path = f"src/ui/{file_path}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/nodes/analyze.py` around lines 122 - 127, The loop
in analyze.py mutates CoverageEntry.file_path (ui_entries loop) which can cause
side effects; stop changing entry.file_path and instead normalize the path when
constructing targets (e.g., inside select_targets). Change target creation to
use a local variable: file_path = entry.file_path; if test_type == TestType.UI
and not file_path.startswith("src/ui/"): file_path = f"src/ui/{file_path}"; then
use file_path for the target creation (or return a copied/normalized
CoverageEntry) so original CoverageEntry objects remain unchanged.
src/scripts/coverage_agent/respond.py-128-132 (1)

128-132: ⚠️ Potential issue | 🟡 Minor

Fragile error detection via string prefix.

Checking file_content.startswith("Error") is unreliable—a legitimate file could start with "Error". Consider returning a tuple (success, content) or raising an exception from read_file.

♻️ Proposed fix
-    file_content = read_file(comment.file_path)
-    if file_content.startswith("Error"):
+    try:
+        file_content = read_file(comment.file_path)
+    except (FileNotFoundError, OSError) as e:
         _reply_to_comment(pr_number, comment_id, f"Unable to read file: {comment.file_path}")
         return

This requires read_file to raise exceptions on failure instead of returning error strings.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/respond.py` around lines 128 - 132, The code
currently treats read_file returning an "Error" string by checking
file_content.startswith("Error"), which is fragile; update read_file to either
raise an exception on failure or return a (success, content) tuple, and change
the caller in respond.py to handle that explicit contract: if read_file raises,
catch the exception around the call in the function that uses file_content
(keeping _reply_to_comment and pr_number/comment_id usage), log/reply with the
exception message and return; or if you adopt (success, content), destructure
into success, file_content and branch on success instead of doing startswith
checks. Ensure you modify read_file's signature and all its call sites
consistently (identify read_file and the caller that currently checks
file_content.startswith("Error")).
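A minimal sketch of the exception-based contract (the helper names mirror the review's suggestion but are hypothetical here):

```python
def read_file(path: str) -> str:
    # Raising on failure gives callers an unambiguous contract; a sentinel
    # string is ambiguous because a real file may legitimately start with "Error".
    with open(path, encoding="utf-8") as file:
        return file.read()

def respond_to_comment(path: str) -> str:
    try:
        return read_file(path)
    except OSError:
        return f"Unable to read file: {path}"

print(respond_to_comment("/no/such/file.py"))  # → Unable to read file: /no/such/file.py
```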
src/scripts/coverage_agent/plugins/base.py-1-8 (1)

1-8: ⚠️ Potential issue | 🟡 Minor

Fix pylint failures: add line-length disable and module docstring.

The pipeline reports two pylint violations:

  1. Line 1 exceeds 100 characters (103/100)
  2. Missing module docstring (C0114)

Per coding guidelines, add the pylint disable comment for the copyright line rather than breaking it.

Proposed fix
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
+# pylint: enable=line-too-long
+"""Base plugin abstractions and shared type utilities for the coverage agent pipeline."""
 
 import dataclasses
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/plugins/base.py` around lines 1 - 8, Append a
pylint disable for line length to the long SPDX copyright line (e.g., add "  #
pylint: disable=line-too-long" to the existing SPDX line) so you don't wrap it,
and add a short module docstring immediately after the SPDX/header lines (for
example a one-line description like "Base classes and types for coverage agent
plugins") to satisfy C0114; keep the rest of imports and code unchanged.
🧹 Nitpick comments (21)
src/scripts/coverage_agent/tools/file_ops.py (2)

1-2: Add # pylint: disable=line-too-long for the copyright header.

Same 103-character copyright line as other files in this package.

Proposed fix
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/tools/file_ops.py` around lines 1 - 2, The
copyright header line in src/scripts/coverage_agent/tools/file_ops.py exceeds
pylint's default line length; add a module-level pylint disable for
line-too-long (e.g., add "# pylint: disable=line-too-long") near the top of the
file (alongside the SPDX header) so pylint will ignore that long copyright line
while leaving other checks intact.

10-11: Specify explicit encoding for file operations.

Both open() calls rely on the platform default encoding, which can vary. Explicitly specify encoding="utf-8" for consistent behavior across environments, especially since this handles source code files.

Proposed fix
-        with open(path) as file:
+        with open(path, encoding="utf-8") as file:
             return file.read()
-        with open(path, "w") as file:
+        with open(path, "w", encoding="utf-8") as file:
             file.write(content)

Also applies to: 22-23

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/tools/file_ops.py` around lines 10 - 11, The
open() calls in src/scripts/coverage_agent/tools/file_ops.py use the platform
default encoding; update both occurrences (the open(path) in the read routine
and the other open(...) at lines ~22-23) to pass encoding="utf-8" so source
files are read consistently across environments (i.e., replace open(path) with
open(path, encoding="utf-8") in the functions that read files).
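The rationale can be demonstrated with a small round-trip (the helper and paths are illustrative only):

```python
import os
import tempfile

def roundtrip(text: str) -> str:
    # Write and read with the same explicit encoding so non-ASCII content
    # survives regardless of the platform default (e.g. cp1252 on Windows).
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as file:
            file.write(text)
        with open(path, encoding="utf-8") as file:
            return file.read()
    finally:
        os.unlink(path)

print(roundtrip("café ✓"))  # → café ✓
```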
src/scripts/coverage_agent/tests/__init__.py (1)

1-2: Add # pylint: disable=line-too-long for consistency.

Same 103-character copyright line. Add the pylint disable comment for consistency with the coding guidelines.

Proposed fix
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/tests/__init__.py` around lines 1 - 2, The file
src/scripts/coverage_agent/tests/__init__.py has a long copyright line and needs
the pylint directive to suppress line-too-long; add a single-line comment "#
pylint: disable=line-too-long" at the top of the module (e.g., immediately after
the SPDX header in __init__.py) so pylint won't flag the 103-character copyright
line.
src/scripts/coverage_agent/tools/__init__.py (1)

1-2: Add # pylint: disable=line-too-long for consistency.

Same 103-character copyright line as other __init__.py files. Add the pylint disable comment for consistency and to prevent future linting failures. As per coding guidelines, copyright lines exceeding 100 characters should include this disable comment.

Proposed fix
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/tools/__init__.py` around lines 1 - 2, Add a
pylint disable for line-too-long at the top of this package init file to match
other modules: insert the comment "# pylint: disable=line-too-long" immediately
after the SPDX copyright lines in src/scripts/coverage_agent/tools/__init__.py
so the long copyright string is exempt from the 100-character lint rule.
src/scripts/coverage_agent/prompts/__init__.py (1)

1-2: Add # pylint: disable=line-too-long for consistency.

Same 103-character copyright line. Add the pylint disable comment for consistency with the coding guidelines.

Proposed fix
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/prompts/__init__.py` around lines 1 - 2, Insert
the pylint disable comment to suppress line-too-long warnings in this module by
adding the line "# pylint: disable=line-too-long" near the top of
src/scripts/coverage_agent/prompts/__init__.py (e.g., immediately after the SPDX
header lines), so the long copyright line won't trigger pylint; ensure the
comment text matches exactly and keep formatting consistent.
src/scripts/coverage_agent/prompts/quality_rules.py (1)

1-2: Add # pylint: disable=line-too-long for the copyright header.

Same 103-character copyright line as other files in this package.

Proposed fix
+# pylint: disable=line-too-long
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/prompts/quality_rules.py` around lines 1 - 2, The
file-level copyright header exceeds the linter line length: add a module-level
pylint disable to suppress this check by inserting "# pylint:
disable=line-too-long" near the top of the module (adjacent to the existing SPDX
header) so the long 103-character copyright line is accepted by pylint; ensure
the directive is placed before or immediately after the SPDX lines to apply to
the whole file.
src/scripts/coverage_agent/plugins/openai_writer.py (1)

10-14: Remove unused imports.

Lines 10-14 import modules that are not used in this file: go_test, python_test, ui_test, file_ops, and shell. These are inherited from NemotronWriter and don't need to be re-imported here.

Proposed fix
 from coverage_agent.plugins.base import GeneratedTest, ValidationResult, WriterPlugin
 from coverage_agent.plugins.nemotron import NemotronWriter
-from coverage_agent.prompts.go_test import GO_TEST_SYSTEM_PROMPT, build_go_prompt
-from coverage_agent.prompts.python_test import PYTHON_TEST_SYSTEM_PROMPT, build_python_prompt
-from coverage_agent.prompts.ui_test import UI_TEST_SYSTEM_PROMPT, build_ui_prompt
-from coverage_agent.tools.file_ops import read_file, write_file
-from coverage_agent.tools.shell import run_shell
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/plugins/openai_writer.py` around lines 10 - 14,
Remove the unused imports at the top of openai_writer.py: delete the references
to GO_TEST_SYSTEM_PROMPT and build_go_prompt (from
coverage_agent.prompts.go_test), PYTHON_TEST_SYSTEM_PROMPT and
build_python_prompt (from coverage_agent.prompts.python_test),
UI_TEST_SYSTEM_PROMPT and build_ui_prompt (from coverage_agent.prompts.ui_test),
and read_file/write_file (from coverage_agent.tools.file_ops) and run_shell
(from coverage_agent.tools.shell) since those utilities are provided by or
inherited from NemotronWriter and are not referenced in this module; keep only
imports actually used in this file to eliminate lint warnings and unused
dependency clutter.
src/scripts/coverage_agent/BUILD (1)

9-9: Glob pattern includes test files in the binary.

glob(["**/*.py"]) will include all Python files including tests/*.py. Test files typically shouldn't be part of the production binary.

Proposed fix: Exclude test files
     srcs = glob(
         ["**/*.py"],
+        exclude = ["tests/**"],
     ),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/BUILD` at line 9, The srcs glob in the BUILD rule
currently uses glob(["**/*.py"]) which pulls test files into the binary; change
the srcs definition to exclude test patterns by replacing glob(["**/*.py"]) with
a glob that adds an exclude list (e.g., exclude=["**/tests/*.py",
"**/*_test.py", "**/test_*.py"]) so the BUILD's srcs no longer include test
modules.
src/scripts/coverage_agent/state.py (1)

16-16: Consider using Literal type for test_type.

The comment indicates valid values are "python" | "go" | "ui". Using Literal would provide type safety and better IDE support.

Proposed enhancement
 import dataclasses
-from typing import Optional, TypedDict
+from typing import Literal, Optional, TypedDict

 from coverage_agent.plugins.base import GeneratedTest


+TestType = Literal["python", "go", "ui"]
+
+
 @dataclasses.dataclass
 class CoverageTarget:
     file_path: str
     uncovered_ranges: list[tuple[int, int]]
     coverage_pct: float
     existing_test_path: Optional[str]
-    test_type: str  # "python" | "go" | "ui"
+    test_type: TestType
     build_package: str  # Bazel package path, e.g. "//src/utils/job"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/state.py` at line 16, Change the test_type
annotation from a plain str to a Literal to enforce the allowed values; import
Literal (from typing or typing_extensions as appropriate) and update the
annotation to Literal['python', 'go', 'ui'] for the test_type field (refer to
the test_type declaration in the dataclass/module) so IDEs and type checkers
will validate the three allowed options.
src/scripts/coverage_agent/tests/test_graph.py (1)

10-33: Consider adding lcov_path to the defaults for consistency.

The _make_state helper includes ui_lcov_path but omits lcov_path, which is present in main.py's initial_state. While this may not affect the routing functions under test, adding it ensures the helper produces a complete CoverageState for future test scenarios.

Suggested addition
         "dry_run": False,
+        "lcov_path": "bazel-out/_coverage/_coverage_report.dat",
         "ui_lcov_path": "src/ui/coverage/lcov.info",
         "errors": [],
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/tests/test_graph.py` around lines 10 - 33, The
test helper _make_state returns a CoverageState lacking the lcov_path key
present in the application's initial_state, so add a sensible default
"lcov_path" (e.g., same value as "ui_lcov_path" or the app's expected lcov path)
to the defaults dict in _make_state to keep the test CoverageState consistent
with main.py's initial_state; update the defaults variable inside the
_make_state function (which constructs CoverageState) to include "lcov_path":
"<appropriate default>".
src/scripts/coverage_agent/tests/test_analyze.py (1)

38-39: Consider using Path.touch() for cleaner file creation.

The open(...).close() pattern works but pathlib.Path.touch() is more idiomatic for creating empty files.

Suggested refactor
+from pathlib import Path
+
 # In test methods:
-            open(source, "w").close()
-            open(test_file, "w").close()
+            Path(source).touch()
+            Path(test_file).touch()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/tests/test_analyze.py` around lines 38 - 39,
Replace the open(...).close() calls with pathlib.Path.touch() for idiomatic
empty-file creation: convert the variables source and test_file to Path objects
if they are strings (e.g., Path(source).touch()) or call
source.touch()/test_file.touch() if they are already Path instances, and add
"from pathlib import Path" import if missing; ensure parents exist beforehand
(Path(...).parent.mkdir(parents=True, exist_ok=True)) when needed.
.github/workflows/coverage-agent.yaml (1)

121-127: Redundant error suppression.

Line 126 has || true in the command, and line 127 has continue-on-error: true. Only one is needed—the step-level continue-on-error is cleaner as it preserves the actual exit code in logs for debugging.

Proposed fix
       run: |
         pnpm install --frozen-lockfile
         # Ensure lcov reporter is included (produces coverage/lcov.info for the agent)
-        npx vitest run --coverage --coverage.reporter=lcov --coverage.reporter=text || true
+        npx vitest run --coverage --coverage.reporter=lcov --coverage.reporter=text
       continue-on-error: true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/coverage-agent.yaml around lines 121 - 127, Remove the
redundant command-level error suppression by deleting the trailing "|| true"
from the run line that invokes "npx vitest run --coverage
--coverage.reporter=lcov --coverage.reporter=text || true" in the "Run UI
coverage" step; keep the step-level "continue-on-error: true" so the workflow
preserves the real exit code in logs while still allowing the job to continue.
src/scripts/coverage_agent/lcov_parser.py (1)

69-69: Specify explicit encoding when opening files.

open(lcov_path) should specify encoding="utf-8" for consistent cross-platform behavior.

Proposed fix
-    with open(lcov_path) as file:
+    with open(lcov_path, encoding="utf-8") as file:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/lcov_parser.py` at line 69, Replace the bare file
open call "with open(lcov_path) as file:" so it explicitly sets the encoding
(e.g., encoding="utf-8") when opening the LCOV file; locate the open call that
uses the lcov_path variable in lcov_parser.py and update it to open with
encoding="utf-8" (and optionally errors="replace") to ensure consistent
cross-platform handling of the file read.
src/scripts/coverage_agent/tools/test_runner.py (1)

65-68: Specify explicit encoding when opening files.

open(part) uses the system default encoding, which can vary. Specify encoding="utf-8" for consistent behavior.

Proposed fix
-                with open(part) as log_file:
+                with open(part, encoding="utf-8") as log_file:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/tools/test_runner.py` around lines 65 - 68, The
file-reading block uses open(part) without an explicit encoding; change the open
call in the try block that reads test log parts (the one creating log_file and
reading into output using part and log_file) to specify encoding="utf-8" (e.g.,
open(part, encoding="utf-8")) so file reads are consistent across environments.
src/scripts/coverage_agent/tests/test_tools.py (2)

29-49: Temp directories created by mkdtemp() are not cleaned up.

tempfile.mkdtemp() creates directories that persist after tests. Consider using tempfile.TemporaryDirectory() as a context manager or adding cleanup in finally blocks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/tests/test_tools.py` around lines 29 - 49, The
tests in TestWriteFile (test_write_file_creates_file and
test_write_file_creates_parent_dirs) use tempfile.mkdtemp() which leaves temp
directories behind; update these tests to use tempfile.TemporaryDirectory() as a
context manager (or explicitly remove the created temp dir in the finally block)
so the temporary directories are removed after the test, and ensure the existing
cleanup that unlinks the file still runs; modify usages of mkdtemp() to
TemporaryDirectory() around the path construction for write_file to guarantee
automatic cleanup.
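Since no diff accompanies this finding, here is a minimal sketch of the context-manager pattern (the `write_file` stand-in is hypothetical):

```python
import os
import tempfile

def write_file(path: str, content: str) -> None:
    # Hypothetical stand-in for the helper under test.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as file:
        file.write(content)

def run_demo() -> tuple[bool, bool]:
    # TemporaryDirectory deletes the whole tree on exit, unlike
    # tempfile.mkdtemp(), which leaves it for the caller to clean up.
    with tempfile.TemporaryDirectory() as tmp_dir:
        target = os.path.join(tmp_dir, "nested", "out.txt")
        write_file(target, "hello")
        existed_inside = os.path.exists(target)
    return existed_inside, os.path.exists(tmp_dir)

print(run_demo())  # → (True, False)
```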

4-10: Unused import: signal.

The signal module is imported but never used in this file.

Proposed fix
 import os
-import signal
 import tempfile
 import unittest
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/tests/test_tools.py` around lines 4 - 10, Remove
the unused import of the signal module in the test file: locate the import line
that includes "import signal" in the top of the tests (alongside imports for os,
tempfile, unittest, read_file/write_file, run_shell) and delete that token so
only required modules are imported; run the test linter to confirm no
unused-import warnings remain.
src/scripts/coverage_agent/plugins/claude_code.py (1)

227-239: Specify explicit encoding when opening files.

The open() calls at lines 228 and 235 don't specify encoding, which can cause issues on systems with non-UTF-8 default encodings.

♻️ Proposed fix
         try:
-            with open(test_file_path) as file:
+            with open(test_file_path, encoding="utf-8") as file:
                 test_content = file.read()

             build_dir = os.path.dirname(test_file_path)
             build_path = os.path.join(build_dir, "BUILD")
             build_entry = None
             try:
-                with open(build_path) as file:
+                with open(build_path, encoding="utf-8") as file:
                     build_entry = file.read()
             except FileNotFoundError:
                 pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/plugins/claude_code.py` around lines 227 - 239,
The open() calls reading test_file_path and build_path do not specify an
encoding; update the file reads (where test_content is assigned and where
build_entry is read) to open files with an explicit encoding (e.g.,
encoding="utf-8") and explicit mode ("r") to avoid platform-dependent
defaults—modify the open(...) usages that reference test_file_path and
build_path in claude_code.py so they include these parameters and keep the
existing FileNotFoundError handling for build_entry.
src/scripts/coverage_agent/nodes/quality_gate.py (3)

62-72: Regex won't match async test methods.

The pattern def (test_\w+) won't capture async def test_* methods, which are common in pytest-asyncio codebases. Consider supporting both sync and async patterns if you expect to generate async tests.

♻️ Optional fix to support async test methods
-        method_pattern = r"def (test_\w+)\(self.*?\):(.*?)(?=\n    def |\nclass |\Z)"
+        method_pattern = r"(?:async\s+)?def (test_\w+)\(self.*?\):(.*?)(?=\n\s+(?:async\s+)?def |\nclass |\Z)"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/nodes/quality_gate.py` around lines 62 - 72, The
existing method_pattern only matches "def (test_...)" and thus misses async test
methods; update the regex used in the test_type == "python" branch (the variable
method_pattern) to accept an optional "async" prefix (e.g., allow
"(?:async\s+)?" before "def") while keeping DOTALL and the same lookahead so
methods = re.findall(method_pattern, content, re.DOTALL) still returns (name,
body); then run the same assertion check against PYTHON_ASSERTION_PATTERNS and
append to issues as before for name/body pairs that lack assertions.
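The widened pattern can be checked against a small sample (the constant names below are illustrative, not the module's real ones):

```python
import re

METHOD_PATTERN = (
    r"(?:async\s+)?def (test_\w+)\(self.*?\):"
    r"(.*?)(?=\n\s+(?:async\s+)?def |\nclass |\Z)"
)

SAMPLE = """
class TestThing:
    def test_sync(self):
        assert True

    async def test_async(self):
        assert True
"""

# With the optional async prefix, both sync and async methods are captured.
names = [name for name, _body in re.findall(METHOD_PATTERN, SAMPLE, re.DOTALL)]
print(names)  # → ['test_sync', 'test_async']
```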

208-211: Warnings appended to errors list creates semantic confusion.

Quality warnings are being appended to the errors list (line 211), mixing different severity levels. Consider using a separate warnings field in CoverageState or prefixing with a clear marker.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/nodes/quality_gate.py` around lines 208 - 211, The
code is appending result.warnings into the errors list (errors.append(...))
which mixes severities; change handling to record warnings separately: add or
use a warnings collection on CoverageState (e.g., CoverageState.warnings) and
append f"Quality warnings for {file_path}: {'; '.join(result.warnings)}" to that
list instead of errors, keeping the existing logger.info("  Warning: %s",
warning) loop unchanged; if you cannot add a field, alternatively prefix the
message (e.g., "WARNING: ...") before appending to errors so severity is
explicit.
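A sketch of the separate-bucket approach (the dataclass below is a hypothetical reduction of `CoverageState`):

```python
import dataclasses
from dataclasses import field

@dataclasses.dataclass
class GateState:
    # Hypothetical reduction of CoverageState with warnings split out.
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)

def record_warnings(state: GateState, file_path: str, warnings: list[str]) -> None:
    # Warnings land in their own bucket; errors keep hard failures only.
    if warnings:
        state.warnings.append(
            f"Quality warnings for {file_path}: {'; '.join(warnings)}"
        )

state = GateState()
record_warnings(state, "tests/test_analyze.py", ["no assertions in test_foo"])
print(state.errors)    # → []
print(state.warnings)
```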

192-198: Specify explicit encoding when opening files.

The open() call doesn't specify encoding, which can cause issues on systems with non-UTF-8 default encodings.

♻️ Proposed fix
-            with open(file_path) as file:
+            with open(file_path, encoding="utf-8") as file:
                 content = file.read()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/nodes/quality_gate.py` around lines 192 - 198, The
open(file_path) call in the Quality Gate code does not specify an encoding which
can lead to platform-dependent decoding errors; update the file reading in the
try block that uses file_path to open the file with an explicit encoding (e.g.,
encoding="utf-8") and keep the existing FileNotFoundError handling and
logger.warning / errors.append behavior intact so logger and errors usage
remains unchanged.
src/scripts/coverage_agent/tests/test_integration.py (1)

7-7: Remove unused imports.

MagicMock and patch are imported but not used in this test file.

♻️ Proposed fix
-from unittest.mock import MagicMock, patch
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/tests/test_integration.py` at line 7, Remove the
unused imports MagicMock and patch from the top-level import statement in this
test file (the "from unittest.mock import MagicMock, patch" line in
test_integration.py); simply delete the unused names so the file only imports
what it actually uses to avoid lint errors and clutter.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9550ee2d-3490-4186-9fdd-919a505db35b

📥 Commits

Reviewing files that changed from the base of the PR and between cb95814 and 2c8cba2.

📒 Files selected for processing (41)
  • .github/workflows/coverage-agent-respond.yaml
  • .github/workflows/coverage-agent.yaml
  • src/agent_requirements.txt
  • src/scripts/coverage_agent/BUILD
  • src/scripts/coverage_agent/__init__.py
  • src/scripts/coverage_agent/graph.py
  • src/scripts/coverage_agent/lcov_parser.py
  • src/scripts/coverage_agent/main.py
  • src/scripts/coverage_agent/nodes/__init__.py
  • src/scripts/coverage_agent/nodes/analyze.py
  • src/scripts/coverage_agent/nodes/create_pr.py
  • src/scripts/coverage_agent/nodes/quality_gate.py
  • src/scripts/coverage_agent/nodes/validate.py
  • src/scripts/coverage_agent/nodes/write.py
  • src/scripts/coverage_agent/plugins/__init__.py
  • src/scripts/coverage_agent/plugins/base.py
  • src/scripts/coverage_agent/plugins/claude_code.py
  • src/scripts/coverage_agent/plugins/nemotron.py
  • src/scripts/coverage_agent/plugins/openai_writer.py
  • src/scripts/coverage_agent/prompts/__init__.py
  • src/scripts/coverage_agent/prompts/go_test.py
  • src/scripts/coverage_agent/prompts/python_test.py
  • src/scripts/coverage_agent/prompts/quality_rules.py
  • src/scripts/coverage_agent/prompts/ui_test.py
  • src/scripts/coverage_agent/respond.py
  • src/scripts/coverage_agent/state.py
  • src/scripts/coverage_agent/tests/__init__.py
  • src/scripts/coverage_agent/tests/test_analyze.py
  • src/scripts/coverage_agent/tests/test_graph.py
  • src/scripts/coverage_agent/tests/test_integration.py
  • src/scripts/coverage_agent/tests/test_lcov_parser.py
  • src/scripts/coverage_agent/tests/test_plugins_base.py
  • src/scripts/coverage_agent/tests/test_quality_gate.py
  • src/scripts/coverage_agent/tests/test_respond.py
  • src/scripts/coverage_agent/tests/test_state.py
  • src/scripts/coverage_agent/tests/test_tools.py
  • src/scripts/coverage_agent/tools/__init__.py
  • src/scripts/coverage_agent/tools/bazel_query.py
  • src/scripts/coverage_agent/tools/file_ops.py
  • src/scripts/coverage_agent/tools/shell.py
  • src/scripts/coverage_agent/tools/test_runner.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

♻️ Duplicate comments (1)
src/scripts/coverage_agent/respond.py (1)

95-101: ⚠️ Potential issue | 🟠 Major

Command injection risk persists—escaping is inadequate.

The escaping on line 97 (message.replace('"', '\\"')) only handles double quotes. Shell metacharacters like backticks, $(), newlines, and others remain unescaped, allowing command injection via malicious comment bodies.

🔒 Proposed fix using subprocess directly
+import subprocess
+
 def _reply_to_comment(pr_number: int, comment_id: int, message: str) -> None:
     """Post a reply to a review comment."""
-    escaped_message = message.replace('"', '\\"')
-    run_shell(
-        f'gh api repos/{{owner}}/{{repo}}/pulls/{pr_number}/comments/{comment_id}/replies '
-        f'-f body="{escaped_message}"'
-    )
+    subprocess.run(
+        [
+            "gh", "api",
+            f"repos/{{owner}}/{{repo}}/pulls/{pr_number}/comments/{comment_id}/replies",
+            "-f", f"body={message}",
+        ],
+        check=False,
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/respond.py` around lines 95 - 101,
_reply_to_comment currently sanitizes only double quotes and builds a shell
string passed to run_shell, leaving it vulnerable to shell metacharacters and
command injection; change it to avoid shell interpolation by calling the GitHub
API with a safe subprocess call (or the existing run_shell abstraction) that
accepts an argument list or uses stdin/JSON payloads rather than a formatted
shell string: stop building f'... -f body="{escaped_message}"', instead pass the
body as a separate argument (or send JSON via stdin with subprocess.run([...],
input=body_bytes, check=True)) so the message content is never executed by a
shell; update references to _reply_to_comment and the call site of run_shell to
use the safe-argument approach.
🧹 Nitpick comments (6)
src/scripts/coverage_agent/prompts/python_test.py (1)

46-46: Remove extraneous f prefix from string literal.

Same issue as go_test.py - line 46 uses an f-string without placeholders.

Proposed fix
-    prompt = f"Generate unit tests for the following Python source file.\n\n"
+    prompt = "Generate unit tests for the following Python source file.\n\n"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/prompts/python_test.py` at line 46, The string
assigned to variable prompt in python_test.py is using an unnecessary f-string
with no placeholders; change the assignment of prompt to use a regular string
literal (remove the leading f) so it's a normal string ("Generate unit tests for
the following Python source file.\n\n"), ensuring the prompt variable remains
unchanged otherwise.
src/scripts/coverage_agent/tools/test_runner.py (1)

20-36: Return type Optional[str] is misleading; function always returns a string.

The function _file_path_to_bazel_target declares return type Optional[str], but it always returns a string—either the discovered target (line 32) or the fallback (line 36). The fallback path is unconditional, so None is never returned.

Proposed fix
-def _file_path_to_bazel_target(file_path: str) -> Optional[str]:
+def _file_path_to_bazel_target(file_path: str) -> str:

And remove the unused Optional import if no longer needed elsewhere.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/tools/test_runner.py` around lines 20 - 36, The
function _file_path_to_bazel_target currently annotates its return type as
Optional[str] but always returns a string (either the discovered target or the
fallback), so change the return annotation to str; also remove the Optional
import from the module if it's no longer used elsewhere, and leave the
implementation of _file_path_to_bazel_target (package, basename, run_shell call,
fallback) unchanged.
src/scripts/coverage_agent/prompts/go_test.py (1)

31-31: Remove extraneous f prefix from string literal.

Line 31 uses an f-string without any placeholders. This is flagged by Ruff (F541).

Proposed fix
-    prompt = f"Generate Go unit tests for the following source file.\n\n"
+    prompt = "Generate Go unit tests for the following source file.\n\n"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/prompts/go_test.py` at line 31, The string
assigned to variable prompt in src/scripts/coverage_agent/prompts/go_test.py is
using an unnecessary f-string (no placeholders); remove the leading f so the
assignment uses a plain string literal (e.g., change prompt = f"Generate Go unit
tests..." to prompt = "Generate Go unit tests...") to satisfy the Ruff F541
warning and keep the rest of the logic in the same function/module unchanged.
src/scripts/coverage_agent/prompts/ui_test.py (1)

32-32: Remove extraneous f prefix from string literal.

Same issue as the other prompt modules—line 32 uses an f-string without placeholders.

Proposed fix
-    prompt = f"Generate Vitest tests for the following TypeScript source file.\n\n"
+    prompt = "Generate Vitest tests for the following TypeScript source file.\n\n"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/prompts/ui_test.py` at line 32, The prompt string
assigned to the variable prompt in ui_test.py is using an unnecessary f-string
(f"Generate Vitest tests for the following TypeScript source file.\n\n") even
though there are no placeholders; remove the extraneous f prefix and make it a
normal string (e.g., "Generate Vitest tests for the following TypeScript source
file.\n\n") so the prompt assignment in that module/function (prompt variable)
uses a plain string literal.
src/scripts/coverage_agent/graph.py (1)

7-11: Add type annotations for fallback assignments.

The fallback assignments for END and StateGraph should include type-ignore comments to satisfy type checkers. This was partially addressed in the previous review, but the annotations are still missing.

♻️ Proposed fix
 try:
     from langgraph.graph import END, StateGraph
 except ImportError:
-    END = None
-    StateGraph = None
+    END = None  # type: ignore[misc,assignment]
+    StateGraph = None  # type: ignore[misc,assignment]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/graph.py` around lines 7 - 11, In the try/except
fallback in graph.py, annotate the fallback assignments for END and StateGraph
so the type checker is satisfied: either import typing.Any and declare END: Any
= None and StateGraph: Any = None, or append explicit assignment comments like
"END = None  # type: ignore" and "StateGraph = None  # type: ignore" inside the
except ImportError block to silence type errors when the import fails.
src/scripts/coverage_agent/plugins/nemotron.py (1)

10-13: Add type annotation for ChatNVIDIA fallback assignment.

The fallback ChatNVIDIA = None on line 13 lacks a type annotation, which may cause mypy issues. Add a type ignore comment.

♻️ Proposed fix
 try:
     from langchain_nvidia_ai_endpoints import ChatNVIDIA
 except ImportError:
-    ChatNVIDIA = None
+    ChatNVIDIA = None  # type: ignore[misc,assignment]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/plugins/nemotron.py` around lines 10 - 13, The
fallback assignment for ChatNVIDIA lacks a type annotation and can trigger mypy
errors; update the except block to annotate the fallback (e.g., change
"ChatNVIDIA = None" to "ChatNVIDIA = None  # type: ignore" or "ChatNVIDIA: Any =
None  # type: ignore" and add an appropriate import for Any if used) so the
ChatNVIDIA symbol has a mypy-safe fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/scripts/coverage_agent/nodes/create_pr.py`:
- Around line 87-96: The gh CLI invocation embeds pr_title and pr_body in double
quotes which breaks for quotes, backticks, or $ chars; fix by passing a safely
quoted title (use shlex.quote on pr_title before building the command) and
writing pr_body to a temporary file then using gh's --body-file flag (or,
alternately, extend run_shell to accept an input parameter and pipe the body via
stdin). Update both the initial call and the retry after label creation in
create_pr.py (references: run_shell, pr_title, pr_body) to use the quoted title
and --body-file with proper tempfile creation and cleanup, and leave
label-creation logic unchanged except to re-run the new command form.
- Around line 38-66: Sanitize all interpolated shell arguments before calling
run_shell to prevent shell injection: use shlex.quote on branch_name wherever
used (the git checkout and git push calls) and on each entry of
state["generated_files"] when building the files string for git add (quote every
filename individually); also pass a safely quoted commit_message to git commit
by shlex.quote(commit_message) instead of embedding raw double quotes. Locate
uses of run_shell, branch_name, commit_message, and state["generated_files"] in
create_pr.py and replace direct f-string interpolation with shlex.quote-wrapped
values.
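A minimal sketch of the quoting approach suggested for create_pr.py, assuming run_shell executes a shell command string; the helper names here are illustrative, not the actual functions in the PR:

```python
import os
import shlex
import tempfile


def build_pr_create_command(pr_title: str, pr_body: str) -> tuple[str, str]:
    """Quote the title with shlex.quote and pass the body via a temp
    file and --body-file so neither value is parsed by the shell.

    Returns (command, body_file_path); the caller deletes the temp
    file after running the command.
    """
    fd, body_path = tempfile.mkstemp(suffix=".md")
    with os.fdopen(fd, "w") as f:
        f.write(pr_body)
    command = (
        "gh pr create --base main "
        f"--title {shlex.quote(pr_title)} "
        f"--body-file {shlex.quote(body_path)}"
    )
    return command, body_path


def build_git_add_command(generated_files: list[str]) -> str:
    """Quote every generated file individually before interpolation."""
    return "git add " + " ".join(shlex.quote(f) for f in generated_files)
```

The caller would wrap the run in try/finally and `os.unlink(body_path)` so the temp file is cleaned up even on failure.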

In `@src/scripts/coverage_agent/nodes/quality_gate.py`:
- Around line 63-72: The current method_pattern assumes exactly four spaces ("\n
def ") which breaks on tabs or different indentation; update method_pattern to
use a whitespace-aware delimiter (e.g. use "\n\s+def " and "\n\s*class " in the
lookahead) and enable re.MULTILINE when calling re.findall so boundaries are
detected regardless of indentation. Specifically, change the method_pattern
variable and the re.findall call that uses it (referencing method_pattern and
the re.findall(..., re.DOTALL) invocation) so the lookahead uses "\n\s+def " and
"\n\s*class " (or equivalent \s patterns) and add re.MULTILINE to the flags.

In `@src/scripts/coverage_agent/respond.py`:
- Around line 133-136: The current check using file_content.startswith("Error")
is fragile; change read_file to use an explicit error signaling mechanism (e.g.,
raise an exception on failure or return a (success, content) tuple/None) and
then update respond.py to handle that explicitly: call read_file inside a
try/except (or inspect the success flag/None), and on failure call
_reply_to_comment(pr_number, comment_id, f"Unable to read file:
{comment.file_path}") and return; otherwise proceed with the valid file_content.
Ensure you update all usages of read_file accordingly and remove the
startswith("Error") check.
- Around line 178-181: The shell commands use unsanitized comment.file_path when
building git commands in run_shell, which can be unsafe; update the calls that
build commands (the three run_shell invocations around git add, git commit, and
git push in respond.py) to quote the file path using
shlex.quote(comment.file_path) before interpolation so paths are safely escaped,
and ensure the quoted value is used in the commit message and git add invocation
while leaving git push unchanged.
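The explicit-error read_file suggested above could be sketched as an Optional-return variant (raising an exception would work equally well; the real file_ops.py signature may differ):

```python
from pathlib import Path
from typing import Optional


def read_file(path: str) -> Optional[str]:
    """Return the file's contents, or None when it cannot be read.

    Returning None gives callers an unambiguous failure signal,
    unlike the fragile startswith("Error") string convention.
    """
    try:
        return Path(path).read_text()
    except OSError:
        return None
```

In respond.py the caller would then check `if content is None:`, reply "Unable to read file: …" to the thread, and return early.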

In `@src/scripts/coverage_agent/tools/shell.py`:
- Around line 18-41: The shell injection risk is that callers pass unescaped
strings into run_shell(...) which uses shell=True; update all call sites in
respond.py to consistently escape shell-injected variables using shlex.quote():
replace the ad-hoc message.replace('"', '\\"') in _reply_to_comment with either
JSON-safe invocation (use gh api -f with a JSON payload) or wrap the message
argument with shlex.quote(message) when invoking the shell, and for git
operations wrap comment.file_path with shlex.quote(comment.file_path) (used in
the git add / git commit calls). Use the same pattern as test_runner.py
(shlex.quote(...)) so all strings passed into shell commands are properly
escaped.
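One shared helper, used at every run_shell call site, keeps the quoting consistent. A sketch, modeled on the test_runner.py shlex.quote pattern the review points to:

```python
import shlex


def quote_args(*args: str) -> str:
    """Join values into a shell-safe fragment for interpolation into
    a run_shell command string."""
    return " ".join(shlex.quote(a) for a in args)

# Examples of the call-site pattern described above:
#   run_shell("git add " + quote_args(comment.file_path))
#   run_shell("git commit -m " + quote_args(commit_message))
```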

---

Duplicate comments:
In `@src/scripts/coverage_agent/respond.py`:
- Around line 95-101: _reply_to_comment currently sanitizes only double quotes
and builds a shell string passed to run_shell, leaving it vulnerable to shell
metacharacters and command injection; change it to avoid shell interpolation by
calling the GitHub API with a safe subprocess call (or the existing run_shell
abstraction) that accepts an argument list or uses stdin/JSON payloads rather
than a formatted shell string: stop building f'... -f body="{escaped_message}"',
instead pass the body as a separate argument (or send JSON via stdin with
subprocess.run([...], input=body_bytes, check=True)) so the message content is
never executed by a shell; update references to _reply_to_comment and the call
site of run_shell to use the safe-argument approach.

---

Nitpick comments:
In `@src/scripts/coverage_agent/graph.py`:
- Around line 7-11: In the try/except fallback in graph.py, annotate the
fallback assignments for END and StateGraph so the type checker is satisfied:
either import typing.Any and declare END: Any = None and StateGraph: Any = None,
or append explicit assignment comments like "END = None  # type: ignore" and
"StateGraph = None  # type: ignore" inside the except ImportError block to
silence type errors when the import fails.

In `@src/scripts/coverage_agent/plugins/nemotron.py`:
- Around line 10-13: The fallback assignment for ChatNVIDIA lacks a type
annotation and can trigger mypy errors; update the except block to annotate the
fallback (e.g., change "ChatNVIDIA = None" to "ChatNVIDIA = None  # type:
ignore" or "ChatNVIDIA: Any = None  # type: ignore" and add an appropriate
import for Any if used) so the ChatNVIDIA symbol has a mypy-safe fallback.

In `@src/scripts/coverage_agent/prompts/go_test.py`:
- Line 31: The string assigned to variable prompt in
src/scripts/coverage_agent/prompts/go_test.py is using an unnecessary f-string
(no placeholders); remove the leading f so the assignment uses a plain string
literal (e.g., change prompt = f"Generate Go unit tests..." to prompt =
"Generate Go unit tests...") to satisfy the Ruff F541 warning and keep the rest
of the logic in the same function/module unchanged.

In `@src/scripts/coverage_agent/prompts/python_test.py`:
- Line 46: The string assigned to variable prompt in python_test.py is using an
unnecessary f-string with no placeholders; change the assignment of prompt to
use a regular string literal (remove the leading f) so it's a normal string
("Generate unit tests for the following Python source file.\n\n"), ensuring the
prompt variable remains unchanged otherwise.

In `@src/scripts/coverage_agent/prompts/ui_test.py`:
- Line 32: The prompt string assigned to the variable prompt in ui_test.py is
using an unnecessary f-string (f"Generate Vitest tests for the following
TypeScript source file.\n\n") even though there are no placeholders; remove the
extraneous f prefix and make it a normal string (e.g., "Generate Vitest tests
for the following TypeScript source file.\n\n") so the prompt assignment in that
module/function (prompt variable) uses a plain string literal.

In `@src/scripts/coverage_agent/tools/test_runner.py`:
- Around line 20-36: The function _file_path_to_bazel_target currently annotates
its return type as Optional[str] but always returns a string (either the
discovered target or the fallback), so change the return annotation to str; also
remove the Optional import from the module if it's no longer used elsewhere, and
leave the implementation of _file_path_to_bazel_target (package, basename,
run_shell call, fallback) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1449cf70-001d-4ee6-82f7-f7baebe2b92a

📥 Commits

Reviewing files that changed from the base of the PR and between 2c8cba2 and d3dfbf6.

📒 Files selected for processing (37)
  • src/scripts/coverage_agent/__init__.py
  • src/scripts/coverage_agent/graph.py
  • src/scripts/coverage_agent/lcov_parser.py
  • src/scripts/coverage_agent/main.py
  • src/scripts/coverage_agent/nodes/__init__.py
  • src/scripts/coverage_agent/nodes/analyze.py
  • src/scripts/coverage_agent/nodes/create_pr.py
  • src/scripts/coverage_agent/nodes/quality_gate.py
  • src/scripts/coverage_agent/nodes/validate.py
  • src/scripts/coverage_agent/nodes/write.py
  • src/scripts/coverage_agent/plugins/__init__.py
  • src/scripts/coverage_agent/plugins/base.py
  • src/scripts/coverage_agent/plugins/claude_code.py
  • src/scripts/coverage_agent/plugins/nemotron.py
  • src/scripts/coverage_agent/plugins/openai_writer.py
  • src/scripts/coverage_agent/prompts/__init__.py
  • src/scripts/coverage_agent/prompts/go_test.py
  • src/scripts/coverage_agent/prompts/python_test.py
  • src/scripts/coverage_agent/prompts/quality_rules.py
  • src/scripts/coverage_agent/prompts/ui_test.py
  • src/scripts/coverage_agent/respond.py
  • src/scripts/coverage_agent/state.py
  • src/scripts/coverage_agent/tests/__init__.py
  • src/scripts/coverage_agent/tests/test_analyze.py
  • src/scripts/coverage_agent/tests/test_graph.py
  • src/scripts/coverage_agent/tests/test_integration.py
  • src/scripts/coverage_agent/tests/test_lcov_parser.py
  • src/scripts/coverage_agent/tests/test_plugins_base.py
  • src/scripts/coverage_agent/tests/test_quality_gate.py
  • src/scripts/coverage_agent/tests/test_respond.py
  • src/scripts/coverage_agent/tests/test_state.py
  • src/scripts/coverage_agent/tests/test_tools.py
  • src/scripts/coverage_agent/tools/__init__.py
  • src/scripts/coverage_agent/tools/bazel_query.py
  • src/scripts/coverage_agent/tools/file_ops.py
  • src/scripts/coverage_agent/tools/shell.py
  • src/scripts/coverage_agent/tools/test_runner.py
✅ Files skipped from review due to trivial changes (15)
  • src/scripts/coverage_agent/tests/__init__.py
  • src/scripts/coverage_agent/prompts/__init__.py
  • src/scripts/coverage_agent/nodes/__init__.py
  • src/scripts/coverage_agent/__init__.py
  • src/scripts/coverage_agent/tools/__init__.py
  • src/scripts/coverage_agent/tools/file_ops.py
  • src/scripts/coverage_agent/tools/bazel_query.py
  • src/scripts/coverage_agent/plugins/openai_writer.py
  • src/scripts/coverage_agent/tests/test_analyze.py
  • src/scripts/coverage_agent/prompts/quality_rules.py
  • src/scripts/coverage_agent/tests/test_plugins_base.py
  • src/scripts/coverage_agent/tests/test_state.py
  • src/scripts/coverage_agent/tests/test_graph.py
  • src/scripts/coverage_agent/state.py
  • src/scripts/coverage_agent/plugins/base.py
🚧 Files skipped from review as they are similar to previous changes (10)
  • src/scripts/coverage_agent/main.py
  • src/scripts/coverage_agent/nodes/write.py
  • src/scripts/coverage_agent/nodes/validate.py
  • src/scripts/coverage_agent/tests/test_tools.py
  • src/scripts/coverage_agent/lcov_parser.py
  • src/scripts/coverage_agent/tests/test_quality_gate.py
  • src/scripts/coverage_agent/tests/test_respond.py
  • src/scripts/coverage_agent/tests/test_lcov_parser.py
  • src/scripts/coverage_agent/tests/test_integration.py
  • src/scripts/coverage_agent/nodes/analyze.py

@jiaenren jiaenren marked this pull request as draft March 25, 2026 06:56

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (1)
src/scripts/coverage_agent/respond.py (1)

95-100: ⚠️ Potential issue | 🔴 Critical

Stop interpolating GitHub-controlled text into shell commands.

message and comment.file_path both come from the GitHub API, and this file already relies on shell parsing (2>/dev/null, || true). Replacing " alone is not enough here; a crafted review body or path can still alter the executed command.

Also applies to: 179-180

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/coverage_agent/respond.py` around lines 95 - 100, The code in
_reply_to_comment (and the other comment-posting call around lines 179-180)
dangerously interpolates GitHub-controlled strings into a shell f-string; stop
building a single shell command with message or comment.file_path inserted.
Instead invoke the GitHub API without shell interpolation by using
subprocess.run (or subprocess.check_call) with a list of arguments (e.g.,
["gh","api","repos/{owner}/{repo}/pulls/{pr_number}/comments/{comment_id}/replies","-f","body="+message])
or by passing the body via stdin/--input file (gh supports -f body=@- or reading
from a temporary file) so that message and file_path are never parsed by the
shell; update both _reply_to_comment and the other comment-posting site to
construct the command as an argument list or feed the message via stdin to
prevent command injection.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/scripts/coverage_agent/plugins/claude_code.py`:
- Line 195: The parameter named `_build_package` should be renamed to
`build_package` in the function/method signature (e.g., the function that
currently declares `_build_package`) and then explicitly marked unused inside
the function body with a `del build_package` statement to satisfy pylint's
invalid-name rule; apply the same pattern for other underscore-prefixed
parameters in that signature (rename to no-leading-underscore and add
corresponding `del <name>` lines inside the function).

In `@src/scripts/coverage_agent/prompts/python_test.py`:
- Around line 47-59: The prompt builder inserts raw content into triple-backtick
fences, so any backticks inside source_content, existing_test_content, or
reference_test_content can close the fence and corrupt the prompt; add a helper
function _format_code_block(content: str, language: str = "python") that picks a
backtick fence long enough to not appear in content (e.g., start with "```" and
append backticks while the fence exists in content), then return the fenced
block with the language tag and content, and replace direct string interpolation
in build_python_prompt (references: build_python_prompt, source_content,
existing_test_content, reference_test_content) to use _format_code_block(...)
for each inserted block.

In `@src/scripts/coverage_agent/respond.py`:
- Around line 83-92: The mutating shell calls in _update_response_count (and the
similar block around the other label/edit operations) are treated as
fire-and-forget via run_shell, so failures are ignored; change them to check the
command result and abort/raise on the first non-zero exit so the bot doesn't
pretend a change landed. Modify run_shell usage (or update run_shell itself) to
return exit status/raise on failure, then in _update_response_count call the
commands sequentially (remove-label, label create, add-label) and if any returns
non-zero propagate an exception or return an error so the caller can stop
processing and log/surface the failure; also apply the same pattern to the other
mutating GH/GIT command sequence referenced in the diff. Ensure error messages
include the failing command and its output.
- Around line 121-122: The current quota check (using _get_response_count and
should_skip_comment) races because it is only read up front and only updated on
the final success path; to fix, implement an atomic "reserve slot" step that
increments the stored agent-responses count before any automated reply
(including error and quality-gate replies) or serialize per-PR operations; add a
new helper (e.g., reserve_response_slot(pr_number) or acquire_response_quota)
that attempts to increment/persist the counter and returns success/failure, call
it at the top of every reply path that currently reads
_get_response_count/should_skip_comment (including the branches referenced
around the current call and at the other spots you noted) and only proceed to
send any comment if the reservation succeeded; ensure the final success path
keeps the counter consistent (or releases on early failures) so concurrent jobs
cannot both proceed from a stale count.
- Around line 150-156: The call to writer.generate_test is passing reviewer
guidance via retry_context (retry_context=fix_prompt), which causes the writer
to treat it as prior-failure context; instead, remove fix_prompt from
retry_context for the initial generation (set retry_context to None/empty) and
pass the reviewer guidance through a dedicated field (e.g.,
reviewer_instructions, initial_prompt, or review_guidance) supported by
writer.generate_test; update the call site that constructs generated to use the
new parameter and ensure retry_context is only used for true retry/failure
state.
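The reserve_response_slot idea above, sketched with an in-memory stand-in for the label-backed counter. True cross-job atomicity would need a compare-and-swap against the stored label, which this sketch does not provide:

```python
MAX_RESPONSES = 3  # illustrative per-PR reply cap


def reserve_response_slot(counts: dict, pr_number: int,
                          limit: int = MAX_RESPONSES) -> bool:
    """Increment the per-PR response counter before posting anything.

    Callers send a reply only when this returns True, so the check
    and the update happen together instead of at opposite ends of
    the run where a concurrent job could read a stale count.
    """
    current = counts.get(pr_number, 0)
    if current >= limit:
        return False
    counts[pr_number] = current + 1
    return True
```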

---

Duplicate comments:
In `@src/scripts/coverage_agent/respond.py`:
- Around line 95-100: The code in _reply_to_comment (and the other
comment-posting call around lines 179-180) dangerously interpolates
GitHub-controlled strings into a shell f-string; stop building a single shell
command with message or comment.file_path inserted. Instead invoke the GitHub
API without shell interpolation by using subprocess.run (or
subprocess.check_call) with a list of arguments (e.g.,
["gh","api","repos/{owner}/{repo}/pulls/{pr_number}/comments/{comment_id}/replies","-f","body="+message])
or by passing the body via stdin/--input file (gh supports -f body=@- or reading
from a temporary file) so that message and file_path are never parsed by the
shell; update both _reply_to_comment and the other comment-posting site to
construct the command as an argument list or feed the message via stdin to
prevent command injection.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d04f7157-cc65-43ae-ab7e-459c0adccae2

📥 Commits

Reviewing files that changed from the base of the PR and between cdda7c3 and bf5b19e.

📒 Files selected for processing (7)
  • src/scripts/coverage_agent/plugins/base.py
  • src/scripts/coverage_agent/plugins/claude_code.py
  • src/scripts/coverage_agent/plugins/openai_writer.py
  • src/scripts/coverage_agent/prompts/go_test.py
  • src/scripts/coverage_agent/prompts/python_test.py
  • src/scripts/coverage_agent/prompts/ui_test.py
  • src/scripts/coverage_agent/respond.py
✅ Files skipped from review due to trivial changes (2)
  • src/scripts/coverage_agent/prompts/ui_test.py
  • src/scripts/coverage_agent/plugins/base.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/scripts/coverage_agent/plugins/openai_writer.py
  • src/scripts/coverage_agent/prompts/go_test.py

jiaenren added a commit that referenced this pull request Mar 25, 2026
…t escaping

Fixes 7 valid findings from CodeRabbit review on PR #743:

Shell injection hardening (create_pr.py, respond.py):
- All file paths now wrapped with shlex.quote() in git/gh commands
- Commit messages written to temp files, passed via git commit -F
  (avoids shell metacharacter issues with inline -m)
- PR body written to temp file, passed via gh pr create --body-file
- Reply message written to temp file, passed via gh api -F body=@file
- Temp files cleaned up in finally blocks

Prompt fence escaping (prompts/):
- Add escape_fenced_content() in prompts/__init__.py that inserts
  zero-width space in ``` sequences to prevent source code with
  triple backticks from closing the prompt's markdown fences
- Applied to all three prompt builders (python, go, ui)

Documentation:
- Add coverage_agent to AGENTS.md Scripts section per project
  convention: "If adding a service, module, or major component,
  update the Codebase Structure section"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
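A sketch of the escape_fenced_content helper named in the commit message above (zero-width-space insertion; handling runs of four or more backticks robustly would need a loop):

```python
ZWSP = "\u200b"  # zero-width space


def escape_fenced_content(content: str) -> str:
    """Break up triple-backtick sequences so embedded source code
    cannot close the prompt's own markdown fence."""
    return content.replace("```", "`" + ZWSP + "``")
```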
jiaenren added a commit that referenced this pull request Mar 30, 2026
…t escaping

@jiaenren jiaenren force-pushed the jiaenr/ai-agent-pipeline-design branch from dad89f5 to b4e3b3f Compare March 30, 2026 17:09
@jiaenren jiaenren force-pushed the jiaenr/ai-agent-pipeline-design branch from 47b7f09 to 2fe512f Compare March 31, 2026 04:54
@jiaenren jiaenren force-pushed the jiaenr/ai-agent-pipeline-design branch from bcf2fb6 to 508af79 Compare March 31, 2026 08:16
jiaenren and others added 2 commits April 1, 2026 16:44
Testbot analyzes coverage gaps, generates tests via LLM, validates them
with bazel, and opens PRs for human review.

Pipeline: analyze → write → validate → review → create_pr
- Analyze: fetches coverage from Codecov API, selects low-coverage
  targets, caps uncovered lines per target (default 200)
- Write: calls LLM to generate tests with quality rules in the prompt
- Validate: syntax check then bazel test (Python/Go) or vitest (UI)
- Review: LLM-based quality review (PASS/FAIL with feedback)
- Create PR: branches from current HEAD, targets main

Respond module: batch-processes PR review comments per file
- Fetches review threads via GraphQL (includes resolution status)
- Groups unresolved threads by file, one LLM call per file
- Includes source file context for informed responses
- LLM returns JSON with fix, replies, and resolve verdict per comment
- Validates fix with retries, commits+pushes if tests pass
- Resolves threads the LLM marks as addressed

Key design decisions:
- Codecov API for coverage data (seconds vs 15+ min bazel coverage)
- LLM review as sole quality gate (no brittle regex static checks)
- Retry loop: validation/review failures feed back to LLM (max 3)
- BUILD entries auto-generated from existing deps (not LLM-generated)

Plugin architecture:
- LLMProvider (ABC) with LLMClient implementation
- Two providers: claude (default), nemotron (via NVIDIA inference API)
- Single env var: LLM_API_KEY

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
testbot.yaml:
- Triggers: weekday schedule (6AM UTC), manual dispatch
- Fetches coverage from Codecov API (no bazel coverage needed)
- Sets up Node.js/pnpm for UI test validation, Bazel for Python/Go
- Creates PR against main with ai-generated label
- Uses SVC_OSMO_CI_TOKEN so created PRs trigger CI

testbot-respond.yaml:
- Triggers on pull_request_review_comment for ai-generated PRs
- Sets up pnpm + Bazel for test validation
- Batch processes all unresolved threads grouped by file
- Resolves threads after LLM addresses them

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jiaenren (Collaborator, Author) commented Apr 1, 2026

@coderabbitai resolve

@coderabbitai (Bot) commented Apr 1, 2026

✅ Actions performed

Comments resolved. Approval is disabled; enable reviews.request_changes_workflow to allow explicit top-level @coderabbitai resolve or @coderabbitai approve commands.

@jiaenren (Collaborator, Author) commented Apr 1, 2026

@coderabbitai full review

@coderabbitai (Bot) commented Apr 1, 2026

✅ Actions performed

Full review triggered.

@coderabbitai (Bot) left a comment

Actionable comments posted: 18

♻️ Duplicate comments (4)
src/scripts/testbot/tools/bazel_query.py (1)

8-10: ⚠️ Potential issue | 🔴 Critical

Validate package before building the shell command.

package is directly interpolated into a shell command. Without strict validation, crafted input can inject additional commands.

Suggested patch
+import re
+
 from testbot.tools.shell import run_shell
 
+_VALID_BAZEL_PACKAGE_PATTERN = re.compile(r"^//[A-Za-z0-9/_\-\.:]+$")
+
 
 def list_build_targets(package: str) -> str:
     """List all build targets in a Bazel package."""
+    if not _VALID_BAZEL_PACKAGE_PATTERN.fullmatch(package):
+        return f"Error: Invalid Bazel package: {package}"
     result = run_shell(f"bazel query 'kind(rule, {package}/...)'")
     if result.returncode != 0:
         return f"Error querying {package}: {result.stderr}"
     return result.stdout
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/tools/bazel_query.py` around lines 8 - 10, The function
list_build_targets interpolates package into a shell command, allowing command
injection; update list_build_targets to validate and escape package before
calling run_shell: first reject any package that doesn't match a safe pattern
(e.g. regex like r'^[A-Za-z0-9_\-./]+$' or whatever Bazel package grammar you
accept) in list_build_targets, then use shlex.quote(package) (or pass an
argv-style command to a subprocess wrapper if run_shell supports it) when
constructing the bazel query string; reference function name list_build_targets
and the call to run_shell to locate and fix the interpolation.
src/scripts/testbot/respond.py (3)

220-223: ⚠️ Potential issue | 🟡 Minor

Fragile error detection: startswith("Error") could match legitimate content.

This pattern was flagged in a previous review and remains unaddressed. If a file legitimately starts with "Error" (e.g., error-handling module docstring), this check incorrectly treats it as a failure. The read_file function should signal errors explicitly (e.g., raise an exception or return a result type).

🔧 Suggested approach

Have read_file raise FileNotFoundError or a custom exception on failure, then use try/except:

-    file_content = read_file(file_path)
-    if file_content.startswith("Error"):
-        logger.error("Cannot read %s: %s", file_path, file_content)
-        return 0
+    try:
+        file_content = read_file(file_path)
+    except FileNotFoundError as exc:
+        logger.error("Cannot read %s: %s", file_path, exc)
+        return 0

This requires updating read_file to raise instead of returning an error string.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/respond.py` around lines 220 - 223, The current fragile
check using file_content.startswith("Error") should be replaced by making
read_file raise a clear exception on failure (e.g., FileNotFoundError or a
custom ReadFileError) and updating the caller in respond.py to wrap the call in
try/except; call read_file(file_path) inside a try block, catch the specific
exception(s), log the error via logger.error including the caught exception
details, and return 0 on exception—remove the startswith check and any reliance
on special error strings.
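A sketch of the raise-based contract the comment proposes: read_file lets OSError propagate, and the caller (shown here as a hypothetical process helper) handles it explicitly instead of sniffing for an "Error" prefix:

```python
import logging

logger = logging.getLogger(__name__)


def read_file(path: str) -> str:
    """Read a file; FileNotFoundError/PermissionError propagate to the caller."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def process(file_path: str) -> int:
    """Hypothetical caller: a file whose content starts with 'Error' is fine."""
    try:
        file_content = read_file(file_path)
    except OSError as exc:
        logger.error("Cannot read %s: %s", file_path, exc)
        return 0
    return len(file_content)
```

With this shape, a legitimate module whose docstring begins with "Error" is no longer misclassified as a read failure.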

233-236: ⚠️ Potential issue | 🟡 Minor

Same fragile error check repeated for source content.

The startswith("Error") pattern is repeated here for source file reading. Apply the same fix as above.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/respond.py` around lines 233 - 236, Replace the fragile
startswith("Error") check on source_content with the same robust pattern used
elsewhere: have read_file signal failure by returning None (or raising) and update
this caller to check for None (or catch the exception) instead of using
source_content.startswith("Error"); update the block that references
source_path, source_content and read_file to treat a None return as a failed
read (source_content = None) or catch and log the exception accordingly so you
no longer rely on the "Error" string sentinel.

368-374: ⚠️ Potential issue | 🟠 Major

The MAX_AUTO_RESPONSES quota check has a race condition.

The quota is checked once per file iteration, but concurrent workflow runs on the same PR can both proceed from the same count. This was noted in a previous review. Consider implementing an atomic reservation mechanism or serializing per-PR operations to prevent exceeding the quota.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/respond.py` around lines 368 - 374, The loop using
total_replied and MAX_AUTO_RESPONSES has a race condition; replace the
non-atomic check with an atomic reservation or per-PR serialization so
concurrent runs cannot exceed the quota. Implement a reservation API (e.g.,
reserve_responses(pr_number, count=1) backed by a small persistent/atomic store
or a per-PR lock) and call it before processing each file/thread batch in the
loop that invokes _process_file_threads(file_path, file_threads, provider); only
proceed if the reservation grants space and decrement/track based on the granted
amount, updating total_replied from the actual reserved count, or alternatively
acquire a per-PR mutex around the entire by_file iteration to serialize
operations for pr_number. Ensure MAX_AUTO_RESPONSES, total_replied,
_process_file_threads, by_file and pr_number are used consistently with the new
reservation/lock mechanism.
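As a sketch of the reservation idea, under a big assumption: that competing runs share a host, so a POSIX file lock can serialize them. Truly concurrent GitHub runners would additionally need a workflow-level concurrency group or an external store, which this sketch does not provide. The names reserve_responses and MAX_AUTO_RESPONSES follow the prompt; the counter-file layout is invented for illustration.

```python
import fcntl
import os

MAX_AUTO_RESPONSES = 10


def reserve_responses(counter_dir: str, pr_number: int, count: int = 1) -> int:
    """Atomically reserve response quota for a PR; returns how many were granted."""
    path = os.path.join(counter_dir, f"pr-{pr_number}.lock")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # serializes concurrent same-host runs
        raw = os.read(fd, 4096).decode() or "0"
        used = int(raw)
        granted = min(count, MAX_AUTO_RESPONSES - used)
        if granted > 0:
            os.lseek(fd, 0, os.SEEK_SET)
            os.ftruncate(fd, 0)
            os.write(fd, str(used + granted).encode())
        return max(granted, 0)
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

The caller would only process as many threads as the grant allows, so two racing runs can never jointly exceed the quota on one machine.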
🧹 Nitpick comments (7)
src/scripts/testbot/prompts/quality_rules.py (1)

5-5: Add an explicit type annotation for the exported preamble constant.

This improves readability and static checking for downstream prompt/template consumers.

♻️ Suggested change
-QUALITY_RULES_PREAMBLE = """\
+QUALITY_RULES_PREAMBLE: str = """\

As per coding guidelines, "Add type annotations where they improve code clarity and catch errors in Python code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/prompts/quality_rules.py` at line 5, The exported
constant QUALITY_RULES_PREAMBLE lacks an explicit type annotation; add a clear
type hint (e.g., declare QUALITY_RULES_PREAMBLE as type str) to the constant
definition so static checkers and readers know it's a string, updating the
declaration of QUALITY_RULES_PREAMBLE in
src/scripts/testbot/prompts/quality_rules.py accordingly.
src/scripts/testbot/tests/test_create_pr.py (1)

11-17: Type-annotate _make_state to make the test fixture contract explicit.

This helper is reused across multiple tests and benefits from a typed return.

♻️ Suggested change
+from typing import Sequence
+
 from testbot.nodes.create_pr import build_pr_body, build_pr_title
-from testbot.state import TestTarget
+from testbot.state import TestTarget, TestbotState
@@
-def _make_state(targets, generated_files):
+def _make_state(targets: Sequence[TestTarget], generated_files: Sequence[str]) -> TestbotState:
     """Create minimal state for PR building."""

As per coding guidelines, "Add type annotations where they improve code clarity and catch errors in Python code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/tests/test_create_pr.py` around lines 11 - 17, The helper
function _make_state lacks type annotations; add parameter and return type hints
to make the fixture contract explicit — annotate targets and generated_files
(e.g., List[str] or Iterable[str] as appropriate) and the return type as a
mapping (e.g., Dict[str, Any] or a specific TypedDict/PRState dataclass) so
callers and static checkers know the expected shape; update the signature of
_make_state and import any typing symbols (List, Dict, Any, TypedDict) needed to
reflect that the returned dict contains "targets", "generated_files", and
"provider".
src/scripts/testbot/plugins/llm_client.py (1)

247-260: Edge case: empty lines treated as code.

_looks_like_code returns True for empty lines (stripped == ""), which means if the response starts with blank lines, _find_code_start will return 0 and skip no prose. While likely benign, this could retain prose if the LLM starts with blank lines followed by explanatory text.

♻️ Consider starting search from first non-empty line
 def _find_code_start(lines: list[str]) -> int:
     """Find the line index where actual code begins after prose preamble.

     Skips blank lines between prose and code — requires a non-empty code line.
     """
     for i, line in enumerate(lines):
-        if line.strip() and _looks_like_code(line):
+        stripped = line.strip()
+        if stripped and _looks_like_code(line):
             return i
     return 0

And adjust _looks_like_code to not return True for empty:

 def _looks_like_code(line: str) -> bool:
     """Check if a line looks like the start of source code."""
     stripped = line.strip()
+    if not stripped:
+        return False
     return (
         stripped.startswith("#")
         or stripped.startswith("//")
         or stripped.startswith("import ")
         or stripped.startswith("from ")
         or stripped.startswith("package ")
         or stripped.startswith("def ")
         or stripped.startswith("class ")
         or stripped.startswith("func ")
-        or stripped == ""
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/plugins/llm_client.py` around lines 247 - 260, The helper
_looks_like_code currently treats empty lines as code (stripped == ""), causing
_find_code_start to misidentify code start when responses begin with blank
lines; update _looks_like_code to remove the empty-line check (do not return
True for stripped == "") and modify _find_code_start to begin its search from
the first non-empty line (skip leading blank lines) before calling
_looks_like_code on subsequent lines so prose preceded by blank lines isn't
mistaken for code.
src/scripts/testbot/graph.py (1)

119-123: Consider defining done_check before node registration for consistency.

The done_check function is defined in the middle of graph construction (after adding edges for write_test). While functional, defining it alongside other transition helpers at lines 90-104 or before the graph construction block would improve readability.

♻️ Suggested refactor
+    def done_check(state: TestbotState) -> TestbotState:
+        """No-op node — just a routing point for done/abort decisions."""
+        return state
+
     graph = StateGraph(TestbotState)
     graph.add_node("analyze", analyze_coverage)
     graph.add_node("write_test", write_test)
     graph.add_node("validate", validate_with_transition)
     graph.add_node("review", review_with_transition)
     graph.add_node("create_pr", create_pr)
+    graph.add_node("done_check", done_check)

     graph.set_entry_point("analyze")
     graph.add_conditional_edges("analyze", route_analyze, {
         "write_test": "write_test",
         "done": END,
     })
     graph.add_edge("write_test", "validate")
-    def done_check(state: TestbotState) -> TestbotState:
-        """No-op node — just a routing point for done/abort decisions."""
-        return state
-
-    graph.add_node("done_check", done_check)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/graph.py` around lines 119 - 123, The done_check no-op
routing function is defined immediately before graph.add_node("done_check")
which breaks consistency with other transition helpers; move the def
done_check(state: TestbotState) -> TestbotState: ... implementation to the same
helper section where other transition helper functions are declared (near the
other helper defs) and then keep the existing graph.add_node("done_check",
done_check) registration in the graph construction block so the function
definitions are grouped and registrations remain in the graph build.
src/scripts/testbot/nodes/create_pr.py (1)

90-93: Long PR titles may occur with many targets.

If state["targets"] contains many files, the title could become excessively long. Consider truncating with an ellipsis or count.

♻️ Suggested truncation
 def build_pr_title(state: TestbotState) -> str:
     """Build PR title from target file paths."""
     target_files = [t.file_path for t in state["targets"]]
-    return f"[testbot] Add tests for {', '.join(target_files)}"
+    if len(target_files) <= 2:
+        return f"[testbot] Add tests for {', '.join(target_files)}"
+    return f"[testbot] Add tests for {target_files[0]} and {len(target_files) - 1} more"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/nodes/create_pr.py` around lines 90 - 93, build_pr_title
currently joins all file paths from state["targets"] which can create
excessively long PR titles; update build_pr_title to limit items included (e.g.,
take first 3 file names), then append an ellipsis or a suffix like " (+N more)"
when there are additional targets, and ensure the final string remains within a
reasonable length; locate the build_pr_title function and change the join logic
to slice target_files and compute remaining_count to append the truncated
indicator.
src/scripts/testbot/nodes/write.py (1)

69-96: Consider atomic file operations for BUILD updates.

The function reads the BUILD file (lines 86-87), then appends in a separate operation (lines 94-95). If the process is interrupted between read and write, or if concurrent runs target the same BUILD file, inconsistencies could occur.

♻️ Suggested atomic approach
     with open(build_path, encoding="utf-8") as build_file:
         build_content = build_file.read()
     if test_basename in build_content:
         logger.info("BUILD already contains entry for %s", test_basename)
         return

     entry = _generate_default_build_entry(test_name, test_basename, source_path, build_content)
     logger.info("BUILD entry to append for %s:\n%s", test_name, entry)
-    with open(build_path, "a", encoding="utf-8") as build_file:
-        build_file.write("\n" + entry + "\n")
+    # Write atomically to avoid partial writes
+    new_content = build_content.rstrip() + "\n\n" + entry + "\n"
+    with open(build_path, "w", encoding="utf-8") as build_file:
+        build_file.write(new_content)
     logger.info("Appended BUILD entry for %s to %s", test_name, build_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/nodes/write.py` around lines 69 - 96, The
_apply_build_entry function currently reads BUILD into build_content and later
appends, which risks race conditions or partial updates; change it to perform an
atomic update: after computing entry with _generate_default_build_entry (using
test_name, test_basename, source_path, build_content), open a temporary file in
the same directory (e.g., BUILD.temp or use tempfile.NamedTemporaryFile with
dir=os.path.dirname(build_path)), write the original build_content plus the new
entry into the temp file, flush and fsync, then atomically replace the original
BUILD with os.replace(temp_path, build_path); ensure proper cleanup on error and
consider using a file lock (e.g., fcntl or portalocker) around the
read/generate/write/replace sequence to prevent concurrent writers.
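The tempfile-plus-os.replace sequence the prompt describes, sketched as a standalone helper (the function name is illustrative; it assumes the entry text has already been generated):

```python
import os
import tempfile


def append_build_entry_atomically(build_path: str, entry: str) -> None:
    """Rewrite BUILD via a temp file + os.replace so readers never see a partial append."""
    with open(build_path, encoding="utf-8") as f:
        content = f.read()
    new_content = content.rstrip() + "\n\n" + entry + "\n"
    # Temp file must live on the same filesystem for os.replace to be atomic
    dir_name = os.path.dirname(build_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, prefix=".BUILD.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_content)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, build_path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp_path)
        raise
```

This removes the partial-write window; guarding against a concurrent writer clobbering the read-modify-write would still need the file lock the prompt mentions.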
src/scripts/testbot/respond.py (1)

193-208: JSON parsing could be more robust.

The slice text[text.find("{"):text.rfind("}") + 1] can produce malformed JSON if there are nested braces or if { appears in a string value before the actual JSON object. Consider using a more robust extraction pattern.

♻️ More robust JSON extraction
 def parse_llm_json_response(response_text: str) -> dict:
     """Parse JSON LLM response. Returns dict with 'fix' and 'replies' keys."""
     text = response_text.strip()

-    for attempt_text in [text, text[text.find("{"):text.rfind("}") + 1] if "{" in text else ""]:
+    # Try parsing the full text first
+    try:
+        data = json.loads(text)
+        if isinstance(data, dict):
+            return data
+    except (json.JSONDecodeError, TypeError):
+        pass
+
+    # Try to extract JSON by finding balanced braces
+    if "{" in text:
+        start = text.find("{")
+        depth = 0
+        for i, char in enumerate(text[start:], start):
+            if char == "{":
+                depth += 1
+            elif char == "}":
+                depth -= 1
+                if depth == 0:
+                    try:
+                        data = json.loads(text[start:i + 1])
+                        if isinstance(data, dict):
+                            return data
+                    except (json.JSONDecodeError, TypeError):
+                        pass
+                    break
+
+    logger.warning("Failed to parse LLM JSON response")
+    return {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/respond.py` around lines 193 - 208, The current
parse_llm_json_response() uses text[text.find("{"):text.rfind("}")+1] which
breaks on nested braces or braces inside strings; replace that naive slice with
a robust extractor that scans from the first '{' and finds its matching closing
'}' using a brace-counter that ignores braces inside quoted strings and handles
escaped quotes, then json.loads that balanced substring (falling back to trying
the whole text), returning the parsed dict if successful; update references in
parse_llm_json_response to use this balanced-brace extraction routine before
attempting json.loads.
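A sketch of the string-aware balanced-brace extraction the prompt asks for: unlike a plain depth counter, it tracks quoting so braces inside JSON string values do not unbalance the scan (the function name is illustrative):

```python
import json


def extract_json_object(text: str) -> dict:
    """Return the first balanced top-level JSON object in text, or {} if none parses."""
    start = text.find("{")
    while start != -1:
        depth = 0
        in_string = False
        escaped = False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        data = json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break  # malformed; retry from the next opening brace
                    if isinstance(data, dict):
                        return data
                    break
        start = text.find("{", start + 1)
    return {}
```

Falling through to the next opening brace also handles prose like "use {placeholders}" appearing before the real payload.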
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/testbot-respond.yaml:
- Around line 36-46: The checkout step that uses ref:
github.event.pull_request.head.ref ("Checkout PR branch") is pulling and running
untrusted PR head code while exposing secrets SVC_OSMO_CI_TOKEN and
NVIDIA_NIM_KEY; change the workflow to avoid checking out or executing untrusted
PR code with repository secrets by gating that step: replace using the PR head
ref with a safe source (e.g., use the merge commit or github.sha /
pull_request.target ref) or only use the service secrets when a trusted
label/actor is present, and for untrusted PRs use the default GITHUB_TOKEN
without extra secrets and set persist-credentials: false; ensure both
occurrences (the checkout step at the top and the similar one later) are updated
accordingly.

In @.github/workflows/testbot.yaml:
- Around line 57-63: The workflow currently checks out the repo with a
credentialed checkout (actions/checkout@...) and exposes service tokens (token:
${{ secrets.SVC_OSMO_CI_TOKEN }}) to the same job that runs LLM-generated test
validation; change the flow so the validation runs in an unprivileged
job/container by setting persist-credentials: false on the checkout used for
test validation (or use a separate job that performs the checkout with
persist-credentials: false) and remove any secret/token injection (LLM_API_KEY,
CODECOV_TOKEN, GH_TOKEN, SVC_OSMO_CI_TOKEN) from that job; then add a separate,
later job/step that performs a credentialed checkout or uses token: ${{
secrets.SVC_OSMO_CI_TOKEN }} only when creating the PR or pushing results (the
PR-creation step), ensuring secrets are injected only into that write-only job.

In `@src/scripts/testbot/BUILD`:
- Around line 11-12: Update the stale comment referencing
src/agent_requirements.txt to point to the new requirements file added by this
PR (src/scripts/testbot/requirements.txt); locate the comment block that
mentions "langchain/langgraph deps are installed via pip in CI (not
Bazel-managed for v1)." and replace the path so it accurately documents where
those deps are declared, keeping the rest of the comment intact.

In `@src/scripts/testbot/main.py`:
- Around line 24-35: The CLI numeric flags defined via parser.add_argument
(--max-targets, --max-lines, --max-retries) currently allow invalid values; add
explicit validation after args = parser.parse_args() to enforce sensible bounds
(e.g., args.max_targets > 0, args.max_lines > 0, args.max_retries >= 0) and call
parser.error(...) with a clear message if any check fails; alternatively,
enforce constraints directly in the add_argument calls (use a custom
type/validator or choices) so that parser rejects negatives/zeros for
parser.add_argument("--max-targets"), parser.add_argument("--max-lines"), and
parser.add_argument("--max-retries") before proceeding.
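The add_argument-level enforcement can be done with custom type= validators, so argparse rejects bad values before main logic runs; a sketch (flag names match the prompt, defaults are illustrative):

```python
import argparse


def positive_int(value: str) -> int:
    """type= validator rejecting zero and negatives (ValueError also becomes a usage error)."""
    number = int(value)
    if number <= 0:
        raise argparse.ArgumentTypeError(f"expected a positive integer, got {value}")
    return number


def non_negative_int(value: str) -> int:
    number = int(value)
    if number < 0:
        raise argparse.ArgumentTypeError(f"expected a non-negative integer, got {value}")
    return number


parser = argparse.ArgumentParser(prog="testbot")
parser.add_argument("--max-targets", type=positive_int, default=1)
parser.add_argument("--max-lines", type=positive_int, default=100)
parser.add_argument("--max-retries", type=non_negative_int, default=3)
```

ArgumentTypeError gives the standard "invalid value" usage message, which is friendlier than a post-hoc parser.error call.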

In `@src/scripts/testbot/nodes/validate.py`:
- Around line 42-44: The call to llm.validate_test (from
get_llm(state["provider"])) can throw and currently will abort the node; wrap
the llm.validate_test(last_generated) invocation in a try/except that catches
all exceptions, log the exception, and convert it into a standardized failed
validation result (including the error message and a flag/field indicating it
should be retried) instead of letting the exception propagate; ensure the
returned/assigned variable used by this node follows the same validation-result
shape the pipeline expects so downstream retry handling can operate (refer to
get_llm, llm.validate_test, state["provider"], and last_generated to locate and
update the code).

In `@src/scripts/testbot/nodes/write.py`:
- Line 15: The current _DEPS_RE only captures the first quoted dep; update it to
capture the entire deps bracket (e.g. match from "deps" through the closing ]
with a multiline/ DOTALL-aware pattern) and then extract individual entries by
running a second re.findall for quoted strings (or split on commas) on the
captured block; modify uses of _DEPS_RE in this file (e.g., where deps are
parsed in write.py) to iterate over all extracted dep strings instead of taking
a single group.
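The two-stage extraction the prompt describes, sketched (regex and function names are illustrative, not testbot's actual identifiers):

```python
import re

# Stage 1: capture everything inside a deps = [ ... ] bracket (DOTALL spans lines)
_DEPS_BLOCK_RE = re.compile(r"deps\s*=\s*\[(.*?)\]", re.DOTALL)
# Stage 2: pull each quoted label out of the captured block
_DEP_LABEL_RE = re.compile(r'"([^"]+)"')


def extract_deps(build_content: str) -> list[str]:
    """Return every dep label from every deps bracket in a BUILD file."""
    deps: list[str] = []
    for block in _DEPS_BLOCK_RE.findall(build_content):
        deps.extend(_DEP_LABEL_RE.findall(block))
    return deps
```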

In `@src/scripts/testbot/plugins/__init__.py`:
- Around line 25-34: The get_llm function can raise KeyError because
_register_defaults() is never invoked; update get_llm to call
_register_defaults() at the start (before checking PLUGINS) so default providers
are registered, then proceed to check PLUGINS and use _instances as before; make
the call idempotent (since _register_defaults likely already guards against
re-registration). Apply the same change to the other provider-retrieval function
in this module (the one around the other provider-check block referencing
PLUGINS/_instances) so both entry points initialize defaults before validating
providers.
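A minimal sketch of the idempotent-registration pattern (the registry contents here are placeholders, not the real provider classes):

```python
from typing import Callable

_PLUGINS: dict[str, Callable[[], object]] = {}
_defaults_registered = False


def _register_defaults() -> None:
    """Idempotent: safe to call from every public entry point."""
    global _defaults_registered
    if _defaults_registered:
        return
    _PLUGINS["claude"] = lambda: "claude-client"      # placeholder factory
    _PLUGINS["nemotron"] = lambda: "nemotron-client"  # placeholder factory
    _defaults_registered = True


def get_llm(provider: str) -> object:
    _register_defaults()  # ensure defaults exist before validating the name
    if provider not in _PLUGINS:
        raise ValueError(f"Unknown LLM provider: {provider!r}")
    return _PLUGINS[provider]()
```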

In `@src/scripts/testbot/plugins/llm_client.py`:
- Around line 134-140: Add a defensive guard around accessing response.choices
in llm_client.py: check that response and response.choices exist and that
len(response.choices) > 0 before using response.choices[0]; if empty, log a
clear message via logger (including any available response metadata), set
content/finish_reason to safe defaults (e.g., "" and None) or return/raise a
well-defined error, and ensure downstream code that uses content/finish_reason
handles this case; update the block that assigns choice, content, and
finish_reason to use this guard and handle the empty-choices path gracefully.
- Around line 92-93: The if-check that raises RuntimeError("OpenAI client not
available. pip install openai") is unreachable because self.client is
unconditionally set in __init__ (OpenAI(...) at initialization); remove this
dead branch from the method containing that check (the runtime validation
referencing self.client) so the code no longer contains the impossible if
self.client is None path — or if deferred initialization was intended, replace
with a proper lazy-init pattern, but by default delete the unreachable check and
its raise to clean up the code.

In `@src/scripts/testbot/prompts/respond.py`:
- Around line 33-49: Replace the untyped list[dict] review payload with a
dataclass to improve type-safety: define a ReviewComment dataclass (fields
comment_id, author, line, body) and import it into this module, change the
function signature from comments: list[dict] to comments: list[ReviewComment]
(or List[ReviewComment]) and update the comprehension that builds comments_text
to access attributes (e.g., c.comment_id, c.author, c.line, c.body) instead of
dictionary keys; also update the docstring to describe the ReviewComment type
and add the necessary dataclasses/typing imports.
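The proposed ReviewComment dataclass, sketched with a hypothetical formatter to show attribute access replacing dict-key access (field names follow the prompt):

```python
from dataclasses import dataclass


@dataclass
class ReviewComment:
    """Typed payload for one inline review comment."""
    comment_id: str
    author: str
    line: int
    body: str


def format_comments(comments: list[ReviewComment]) -> str:
    """Render comments for a prompt; attribute access is checked statically."""
    return "\n".join(
        f"[{c.comment_id}] {c.author} (line {c.line}): {c.body}" for c in comments
    )
```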

In `@src/scripts/testbot/prompts/review.py`:
- Around line 49-55: The prompt builds a markdown-wrapped string using
source_content and test_content raw, which can break outer fences if those
strings contain ```; modify the return in review.py to first produce
escaped_source_content and escaped_test_content (e.g., replace all occurrences
of "```" with "`\\`\\`\\`" or otherwise escape triple backticks/newlines) and
then use those escaped variables in the f-string in place of source_content and
test_content so the outer ``` fences are preserved; update the return expression
to reference escaped_source_content and escaped_test_content and ensure escaping
logic runs before the f-string is constructed.
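Besides escaping, CommonMark offers a cleaner fix: an outer fence is only closed by a backtick run at least as long as the one that opened it, so the prompt builder can pick a fence longer than any run inside the embedded content. A sketch (the helper name is illustrative):

```python
def wrap_in_fence(content: str, lang: str = "python") -> str:
    """Wrap content in a code fence longer than any backtick run inside it."""
    longest = 0
    run = 0
    for ch in content:
        run = run + 1 if ch == "`" else 0
        longest = max(longest, run)
    fence = "`" * max(3, longest + 1)
    return f"{fence}{lang}\n{content}\n{fence}"
```

This keeps the embedded source byte-for-byte intact, which matters when the LLM is expected to echo it back in a fix.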

In `@src/scripts/testbot/README.md`:
- Around line 9-13: Add language identifiers to the three unlabeled fenced code
blocks in the README so markdownlint MD040 is satisfied: change the fences for
the block containing "Codecov API → analyze → write → validate → review →
create_pr" to use ```text, change the fence for the block that begins "PR
comment → GraphQL fetch threads → filter unresolved" to ```text, and change the
fence for the block showing the "src/scripts/testbot/ ├── main.py ... └──
tests/" tree to ```text; ensure each opening ``` has the corresponding ```text
label and leave block contents unchanged.

In `@src/scripts/testbot/respond.py`:
- Around line 56-61: The helper _get_repo_nwo currently swallows failures from
run_shell("gh repo view...") and returns the hardcoded "NVIDIA/OSMO", which can
cause actions against the wrong repo; update _get_repo_nwo to surface failures
instead of silently defaulting: when result.returncode != 0 or stdout is empty,
either log a warning with the captured stdout/stderr and the failing command
(use the module's logger if available) and then raise a RuntimeError (or a
custom exception) explaining the gh CLI failure and suggesting
authentication/repo context, or at minimum log the full error and re-raise;
reference _get_repo_nwo and run_shell to locate and change the return/default
behavior.

In `@src/scripts/testbot/tests/test_integration.py`:
- Around line 5-14: The tests call review_test() which calls
testbot.nodes.review.get_llm() and can hit real LLM services; instead stub
get_llm to a deterministic fake LLM client in these tests. Use
unittest.mock.patch to patch testbot.nodes.review.get_llm (or patch.object on
the review module) in the test module setup or per-test, returning a lightweight
fake object implementing the same interface used by review_test (e.g., a
generate/respond method that returns fixed outputs), apply the same patch for
the other tests around lines 91-117, and ensure the fake is used before calling
review_test() so no real network/credentials are accessed.
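A sketch of that patching pattern. In the real tests you would patch the string "testbot.nodes.review.get_llm"; here a SimpleNamespace stands in for the module so the sketch is self-contained, and FakeLLM's interface is an assumption about what review_test calls:

```python
import types
from unittest import mock


def _real_get_llm(provider: str):
    raise RuntimeError("would hit the network")


# Stand-in for testbot.nodes.review; real tests patch the module attribute instead.
review_module = types.SimpleNamespace(get_llm=_real_get_llm)


class FakeLLM:
    """Deterministic double implementing the interface review_test relies on."""

    def review_test(self, test_code: str) -> str:
        return "PASS"


def review_test(generated: str) -> str:
    """Simplified version of the node under test."""
    llm = review_module.get_llm("claude")
    return llm.review_test(generated)


with mock.patch.object(review_module, "get_llm", return_value=FakeLLM()):
    verdict = review_test("def test_example(): assert True")
```

Inside the patch context the fake is returned; outside it, the original lookup (and its network dependency) is restored.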

In `@src/scripts/testbot/tests/test_tools.py`:
- Around line 94-96: The test currently swallows PermissionError in the
kill-check except block (the "except PermissionError: pass" after the kill
signal attempt), which can mask a still-running process; remove that bare pass
so PermissionError is not treated as success — either let PermissionError
propagate (so the test fails) or include PermissionError in the outer except
tuple (e.g., except (FileNotFoundError, ValueError, PermissionError):) and
handle it as a failure case; update the exception handling around the kill/send
signal logic (the except blocks shown in test_tools.py) accordingly.

In `@src/scripts/testbot/tools/file_ops.py`:
- Around line 22-25: The write_file function fails when path is filename-only
and returns an incorrect byte count; fix it in write_file by computing dir =
os.path.dirname(path) and only calling os.makedirs(dir, exist_ok=True) if dir is
non-empty (or use dir or "."), then write the content as before but report the
actual number of bytes by computing byte_count = len(content.encode("utf-8"))
(or open in binary and write content.encode("utf-8") and use the written byte
length); update the return message to use byte_count and reference the
write_file function and the variables path and content so the change is easy to
locate.
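The corrected write_file per the prompt, sketched: guard the empty dirname and report the encoded byte length rather than the character count:

```python
import os


def write_file(path: str, content: str) -> str:
    """Write content, handling filename-only paths and reporting real byte counts."""
    parent = os.path.dirname(path)
    if parent:  # os.makedirs("") raises FileNotFoundError
        os.makedirs(parent, exist_ok=True)
    data = content.encode("utf-8")  # byte count, not len(content) characters
    with open(path, "wb") as f:
        f.write(data)
    return f"Wrote {len(data)} bytes to {path}"
```

The distinction matters for non-ASCII content, where UTF-8 bytes exceed the character count.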

In `@src/scripts/testbot/tools/test_runner.py`:
- Around line 125-129: The retry_hint currently uses only result.stderr which
omits important Vitest details printed to stdout; update the return value in
test_runner.py (the ValidationResult construction) so that retry_hint uses the
combined output (result.stdout + result.stderr) when result.returncode != 0,
while keeping passed as result.returncode == 0 and output as the combined
output; locate the ValidationResult instantiation in the function that returns
it and replace the retry_hint expression to reference the same combined output
string.

---

Duplicate comments:
In `@src/scripts/testbot/respond.py`:
- Around line 220-223: The current fragile check using
file_content.startswith("Error") should be replaced by making read_file raise a
clear exception on failure (e.g., FileNotFoundError or a custom ReadFileError)
and updating the caller in respond.py to wrap the call in try/except; call
read_file(file_path) inside a try block, catch the specific exception(s), log
the error via logger.error including the caught exception details, and return 0
on exception—remove the startswith check and any reliance on special error
strings.
- Around line 233-236: Replace the fragile startswith("Error") check on
source_content with the same robust pattern used elsewhere: have
read_file(signal) return None on failure (or raise) and update this caller to
check for None (or catch the exception) instead of using
source_content.startswith("Error"); update the block that references
source_path, source_content and read_file to treat a None return as a failed
read (source_content = None) or catch and log the exception accordingly so you
no longer rely on the "Error" string sentinel.
- Around line 368-374: The loop using total_replied and MAX_AUTO_RESPONSES has a
race condition; replace the non-atomic check with an atomic reservation or
per-PR serialization so concurrent runs cannot exceed the quota. Implement a
reservation API (e.g., reserve_responses(pr_number, count=1) backed by a small
persistent/atomic store or a per-PR lock) and call it before processing each
file/thread batch in the loop that invokes _process_file_threads(file_path,
file_threads, provider); only proceed if the reservation grants space and
decrement/track based on the granted amount, updating total_replied from the
actual reserved count, or alternatively acquire a per-PR mutex around the entire
by_file iteration to serialize operations for pr_number. Ensure
MAX_AUTO_RESPONSES, total_replied, _process_file_threads, by_file and pr_number
are used consistently with the new reservation/lock mechanism.
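The exception-based pattern the first two respond.py comments describe could be sketched as follows; `ReadFileError` and `count_addressed_threads` are hypothetical names standing in for the real helpers, not the module's actual API:

```python
import logging

logger = logging.getLogger(__name__)


class ReadFileError(Exception):
    """Raised when a file cannot be read (hypothetical custom exception)."""


def read_file(file_path: str) -> str:
    """Raise a clear exception on failure instead of returning an 'Error...' string."""
    try:
        with open(file_path, "r", encoding="utf-8") as handle:
            return handle.read()
    except OSError as exc:
        raise ReadFileError(f"cannot read {file_path}: {exc}") from exc


def count_addressed_threads(file_path: str) -> int:
    """Hypothetical caller: catch the specific exception, log it, and return 0."""
    try:
        file_content = read_file(file_path)
    except ReadFileError as exc:
        logger.error("Skipping %s: %s", file_path, exc)
        return 0
    # Placeholder for the real thread-processing logic.
    return file_content.count("\n")
```

This removes any reliance on the `"Error"` string sentinel: failures are typed exceptions the caller must handle explicitly.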

In `@src/scripts/testbot/tools/bazel_query.py`:
- Around line 8-10: The function list_build_targets interpolates package into a
shell command, allowing command injection; update list_build_targets to validate
and escape package before calling run_shell: first reject any package that
doesn't match a safe pattern (e.g. regex like r'^[A-Za-z0-9_\-./]+$' or whatever
Bazel package grammar you accept) in list_build_targets, then use
shlex.quote(package) (or pass an argv-style command to a subprocess wrapper if
run_shell supports it) when constructing the bazel query string; reference
function name list_build_targets and the call to run_shell to locate and fix the
interpolation.

---

Nitpick comments:
In `@src/scripts/testbot/graph.py`:
- Around line 119-123: The done_check no-op routing function is defined
immediately before graph.add_node("done_check") which breaks consistency with
other transition helpers; move the def done_check(state: TestbotState) ->
TestbotState: ... implementation to the same helper section where other
transition helper functions are declared (near the other helper defs) and then
keep the existing graph.add_node("done_check", done_check) registration in the
graph construction block so the function definitions are grouped and
registrations remain in the graph build.

In `@src/scripts/testbot/nodes/create_pr.py`:
- Around line 90-93: build_pr_title currently joins all file paths from
state["targets"] which can create excessively long PR titles; update
build_pr_title to limit items included (e.g., take first 3 file names), then
append an ellipsis or a suffix like " (+N more)" when there are additional
targets, and ensure the final string remains within a reasonable length; locate
the build_pr_title function and change the join logic to slice target_files and
compute remaining_count to append the truncated indicator.
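The truncation could be sketched like this; the title prefix and the limit of 3 are assumptions for illustration, not the real build_pr_title contract:

```python
def build_pr_title(target_files: list[str], limit: int = 3) -> str:
    """Show at most `limit` file names, summarizing the rest as (+N more)."""
    shown = target_files[:limit]
    remaining_count = len(target_files) - len(shown)
    title = "testbot: add tests for " + ", ".join(shown)
    if remaining_count > 0:
        title += f" (+{remaining_count} more)"
    return title
```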

In `@src/scripts/testbot/nodes/write.py`:
- Around line 69-96: The _apply_build_entry function currently reads BUILD into
build_content and later appends, which risks race conditions or partial updates;
change it to perform an atomic update: after computing entry with
_generate_default_build_entry (using test_name, test_basename, source_path,
build_content), open a temporary file in the same directory (e.g., BUILD.temp or
use tempfile.NamedTemporaryFile with dir=os.path.dirname(build_path)), write the
original build_content plus the new entry into the temp file, flush and fsync,
then atomically replace the original BUILD with os.replace(temp_path,
build_path); ensure proper cleanup on error and consider using a file lock
(e.g., fcntl or portalocker) around the read/generate/write/replace sequence to
prevent concurrent writers.
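A sketch of the tempfile-plus-os.replace sequence the comment suggests (file locking omitted); `atomic_append` is a hypothetical stand-in for `_apply_build_entry`:

```python
import os
import tempfile


def atomic_append(build_path: str, entry: str) -> None:
    """Append entry to build_path via a temp file and an atomic os.replace."""
    with open(build_path, "r", encoding="utf-8") as handle:
        original = handle.read()
    directory = os.path.dirname(build_path) or "."
    # Temp file must live on the same filesystem for os.replace to be atomic.
    fd, temp_path = tempfile.mkstemp(dir=directory, prefix=".BUILD.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(original + entry)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(temp_path, build_path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(temp_path)
        raise
```

Readers never observe a partially written BUILD file; a crash leaves either the old or the new content in place.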

In `@src/scripts/testbot/plugins/llm_client.py`:
- Around line 247-260: The helper _looks_like_code currently treats empty lines
as code (stripped == ""), causing _find_code_start to misidentify code start
when responses begin with blank lines; update _looks_like_code to remove the
empty-line check (do not return True for stripped == "") and modify
_find_code_start to begin its search from the first non-empty line (skip leading
blank lines) before calling _looks_like_code on subsequent lines so prose
preceded by blank lines isn't mistaken for code.

In `@src/scripts/testbot/prompts/quality_rules.py`:
- Line 5: The exported constant QUALITY_RULES_PREAMBLE lacks an explicit type
annotation; add a clear type hint (e.g., declare QUALITY_RULES_PREAMBLE as type
str) to the constant definition so static checkers and readers know it's a
string, updating the declaration of QUALITY_RULES_PREAMBLE in
src/scripts/testbot/prompts/quality_rules.py accordingly.

In `@src/scripts/testbot/respond.py`:
- Around line 193-208: The current parse_llm_json_response() uses
text[text.find("{"):text.rfind("}")+1] which breaks on nested braces or braces
inside strings; replace that naive slice with a robust extractor that scans from
the first '{' and finds its matching closing '}' using a brace-counter that
ignores braces inside quoted strings and handles escaped quotes, then json.loads
that balanced substring (falling back to trying the whole text), returning the
parsed dict if successful; update references in parse_llm_json_response to use
this balanced-brace extraction routine before attempting json.loads.
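The balanced-brace extraction could be sketched as below; `extract_balanced_json` is a hypothetical helper illustrating the technique, not the real `parse_llm_json_response`:

```python
import json


def extract_balanced_json(text: str):
    """Return the first balanced {...} object parsed from text, or None.

    Braces inside quoted strings (including escaped quotes) are ignored.
    """
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for index in range(start, len(text)):
        char = text[index]
        if escaped:
            escaped = False
            continue
        if char == "\\":
            escaped = True
            continue
        if char == '"':
            in_string = not in_string
            continue
        if in_string:
            continue
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                try:
                    return json.loads(text[start:index + 1])
                except json.JSONDecodeError:
                    return None
    return None
```

Unlike the `find("{")`/`rfind("}")` slice, this stops at the matching close brace of the first object, so trailing prose containing `}` cannot corrupt the parse.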

In `@src/scripts/testbot/tests/test_create_pr.py`:
- Around line 11-17: The helper function _make_state lacks type annotations; add
parameter and return type hints to make the fixture contract explicit — annotate
targets and generated_files (e.g., List[str] or Iterable[str] as appropriate)
and the return type as a mapping (e.g., Dict[str, Any] or a specific
TypedDict/PRState dataclass) so callers and static checkers know the expected
shape; update the signature of _make_state and import any typing symbols (List,
Dict, Any, TypedDict) needed to reflect that the returned dict contains
"targets", "generated_files", and "provider".

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 647c4d70-c896-4bf0-a289-69a591406c76

📥 Commits

Reviewing files that changed from the base of the PR and between c8bf5bc and 5c5051d.

📒 Files selected for processing (47)
  • .github/workflows/testbot-respond.yaml
  • .github/workflows/testbot.yaml
  • AGENTS.md
  • src/scripts/testbot/BUILD
  • src/scripts/testbot/README.md
  • src/scripts/testbot/__init__.py
  • src/scripts/testbot/codecov_client.py
  • src/scripts/testbot/graph.py
  • src/scripts/testbot/lcov_parser.py
  • src/scripts/testbot/main.py
  • src/scripts/testbot/nodes/__init__.py
  • src/scripts/testbot/nodes/analyze.py
  • src/scripts/testbot/nodes/create_pr.py
  • src/scripts/testbot/nodes/review.py
  • src/scripts/testbot/nodes/validate.py
  • src/scripts/testbot/nodes/write.py
  • src/scripts/testbot/plugins/__init__.py
  • src/scripts/testbot/plugins/base.py
  • src/scripts/testbot/plugins/llm_client.py
  • src/scripts/testbot/prompts/__init__.py
  • src/scripts/testbot/prompts/go_test.py
  • src/scripts/testbot/prompts/python_test.py
  • src/scripts/testbot/prompts/quality_rules.py
  • src/scripts/testbot/prompts/respond.py
  • src/scripts/testbot/prompts/review.py
  • src/scripts/testbot/prompts/ui_test.py
  • src/scripts/testbot/requirements.txt
  • src/scripts/testbot/respond.py
  • src/scripts/testbot/state.py
  • src/scripts/testbot/tests/__init__.py
  • src/scripts/testbot/tests/test_analyze.py
  • src/scripts/testbot/tests/test_codecov_client.py
  • src/scripts/testbot/tests/test_create_pr.py
  • src/scripts/testbot/tests/test_graph.py
  • src/scripts/testbot/tests/test_integration.py
  • src/scripts/testbot/tests/test_lcov_parser.py
  • src/scripts/testbot/tests/test_llm_client.py
  • src/scripts/testbot/tests/test_plugins_base.py
  • src/scripts/testbot/tests/test_respond.py
  • src/scripts/testbot/tests/test_state.py
  • src/scripts/testbot/tests/test_tools.py
  • src/scripts/testbot/tests/test_write_build_entry.py
  • src/scripts/testbot/tools/__init__.py
  • src/scripts/testbot/tools/bazel_query.py
  • src/scripts/testbot/tools/file_ops.py
  • src/scripts/testbot/tools/shell.py
  • src/scripts/testbot/tools/test_runner.py

Comment on lines +36 to +46
    if: contains(github.event.pull_request.labels.*.name, 'ai-generated')
    timeout-minutes: 15
    runs-on: ubuntu-latest
    steps:
      - name: Checkout PR branch
        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
        with:
          ref: ${{ github.event.pull_request.head.ref }}
          fetch-depth: 0
          token: ${{ secrets.SVC_OSMO_CI_TOKEN }}


⚠️ Potential issue | 🔴 Critical

Harden trust boundary before running PR-head code with secrets.

At lines 40-46 and 82-89, the job executes code from github.event.pull_request.head.ref while exposing SVC_OSMO_CI_TOKEN and NVIDIA_NIM_KEY. If an untrusted PR receives the ai-generated label, this becomes a secret-exfiltration path.

🔒 Suggested guardrail
-    if: contains(github.event.pull_request.labels.*.name, 'ai-generated')
+    if: >
+      contains(github.event.pull_request.labels.*.name, 'ai-generated') &&
+      github.event.pull_request.head.repo.full_name == github.repository &&
+      github.event.pull_request.user.login == 'testbot[bot]'

Also applies to: 82-89

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/testbot-respond.yaml around lines 36 - 46, The checkout
step that uses ref: github.event.pull_request.head.ref ("Checkout PR branch") is
pulling and running untrusted PR head code while exposing secrets
SVC_OSMO_CI_TOKEN and NVIDIA_NIM_KEY; change the workflow to avoid checking out
or executing untrusted PR code with repository secrets by gating that step:
replace using the PR head ref with a safe source (e.g., use the merge commit or
github.sha / pull_request.target ref) or only use the service secrets when a
trusted label/actor is present, and for untrusted PRs use the default
GITHUB_TOKEN without extra secrets and set persist-credentials: false; ensure
both occurrences (the checkout step at the top and the similar one later) are
updated accordingly.

Comment on lines +57 to +63
      - name: Checkout
        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
        with:
          lfs: true
          fetch-depth: 0
          # Use service account PAT so git push and PR creation trigger CI workflows
          token: ${{ secrets.SVC_OSMO_CI_TOKEN }}

⚠️ Potential issue | 🔴 Critical

Don't validate generated tests in the same credentialed job.

This workflow eventually executes LLM-generated test code, but the job also exposes LLM_API_KEY, CODECOV_TOKEN, GH_TOKEN, and a persisted service-account checkout credential to that same process tree. A prompt-injected or malicious test can read/exfiltrate those secrets during validation. Please run validation in an unprivileged job/container with persist-credentials: false, and only inject the write token in the later PR-creation step.

Also applies to: 111-122

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/testbot.yaml around lines 57 - 63, The workflow currently
checks out the repo with a credentialed checkout (actions/checkout@...) and
exposes service tokens (token: ${{ secrets.SVC_OSMO_CI_TOKEN }}) to the same job
that runs LLM-generated test validation; change the flow so the validation runs
in an unprivileged job/container by setting persist-credentials: false on the
checkout used for test validation (or use a separate job that performs the
checkout with persist-credentials: false) and remove any secret/token injection
(LLM_API_KEY, CODECOV_TOKEN, GH_TOKEN, SVC_OSMO_CI_TOKEN) from that job; then
add a separate, later job/step that performs a credentialed checkout or uses
token: ${{ secrets.SVC_OSMO_CI_TOKEN }} only when creating the PR or pushing
results (the PR-creation step), ensuring secrets are injected only into that
write-only job.

Comment thread src/scripts/testbot/BUILD
Comment on lines +11 to +12
# langchain/langgraph deps are installed via pip in CI (not Bazel-managed for v1).
# See src/agent_requirements.txt.

⚠️ Potential issue | 🟡 Minor

Fix stale requirements path in comment.

The comment points to src/agent_requirements.txt, but this PR adds src/scripts/testbot/requirements.txt. This is easy to misread during maintenance.

Suggested patch
-        # See src/agent_requirements.txt.
+        # See src/scripts/testbot/requirements.txt.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/BUILD` around lines 11 - 12, Update the stale comment
referencing src/agent_requirements.txt to point to the new requirements file
added by this PR (src/scripts/testbot/requirements.txt); locate the comment
block that mentions "langchain/langgraph deps are installed via pip in CI (not
Bazel-managed for v1)." and replace the path so it accurately documents where
those deps are declared, keeping the rest of the comment intact.

Comment on lines +24 to +35
    parser.add_argument("--max-targets", type=int, default=1,
                        help="Maximum number of files to target per run")
    parser.add_argument("--max-lines", type=int, default=200,
                        help="Maximum uncovered lines per target (larger files are partially covered)")
    parser.add_argument("--max-retries", type=int, default=3,
                        help="Maximum retries per target on test failure")
    parser.add_argument("--provider", default="claude",
                        choices=["nemotron", "claude"],
                        help="LLM provider for test generation")
    parser.add_argument("--dry-run", action="store_true",
                        help="Generate tests but don't create a PR")
    args = parser.parse_args()

⚠️ Potential issue | 🟡 Minor

Validate CLI numeric bounds early.

--max-targets, --max-lines, and --max-retries currently accept invalid negatives/zero values, which can put the pipeline in an invalid state.

✅ Suggested guard
     args = parser.parse_args()
+    if args.max_targets < 1:
+        parser.error("--max-targets must be >= 1")
+    if args.max_lines < 1:
+        parser.error("--max-lines must be >= 1")
+    if args.max_retries < 0:
+        parser.error("--max-retries must be >= 0")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/main.py` around lines 24 - 35, The CLI numeric flags
defined via parser.add_argument (--max-targets, --max-lines, --max-retries)
currently allow invalid values; add explicit validation after args =
parser.parse_args() to enforce sensible bounds (e.g., args.max_targets > 0,
args.max_lines > 0, args.max_retries >= 0) and call parser.error(...) with a
clear message if any check fails; alternatively, enforce constraints directly in
the add_argument calls (use a custom type/validator or choices) so that parser
rejects negatives/zeros for parser.add_argument("--max-targets"),
parser.add_argument("--max-lines"), and parser.add_argument("--max-retries")
before proceeding.

Comment on lines +63 to +70
    if source_content.startswith("Error"):
        logger.warning("Cannot read source file for review: %s", target.file_path)
        return True, ""

    llm = get_llm(state["provider"])
    if llm.client is None:
        logger.warning("LLM client not available, skipping LLM review")
        return True, ""

⚠️ Potential issue | 🟠 Major

Fail the review stage closed when the reviewer can't run.

Unreadable source files, missing clients, API exceptions, and unparsable replies all currently return (True, ""). That means the quality gate silently degrades to PASS exactly when it is least trustworthy, and the graph continues toward PR creation.

Return a failed review and surface the reason to the retry path
     if source_content.startswith("Error"):
         logger.warning("Cannot read source file for review: %s", target.file_path)
-        return True, ""
+        return False, f"Cannot read source file for review: {target.file_path}"
@@
     llm = get_llm(state["provider"])
     if llm.client is None:
         logger.warning("LLM client not available, skipping LLM review")
-        return True, ""
+        return False, "LLM client not available for review"
@@
     except Exception as exc:  # pylint: disable=broad-except
-        logger.warning("LLM review failed: %s. Skipping.", exc)
-        return True, ""
+        logger.warning("LLM review failed: %s", exc)
+        return False, f"LLM review failed: {exc}"
@@
     verdict_match = re.search(r"VERDICT:\s*(PASS|FAIL)", review_text, re.IGNORECASE)
     if not verdict_match:
-        logger.warning("Could not parse VERDICT from LLM review. Assuming PASS.")
-        return True, ""
+        logger.warning("Could not parse VERDICT from LLM review")
+        return False, "LLM review returned an unparseable verdict."

Also applies to: 81-101

Comment on lines +56 to +61
def _get_repo_nwo() -> str:
    """Get the owner/repo from gh CLI."""
    result = run_shell("gh repo view --json nameWithOwner --jq .nameWithOwner")
    if result.returncode == 0 and result.stdout.strip():
        return result.stdout.strip()
    return "NVIDIA/OSMO"

⚠️ Potential issue | 🟡 Minor

_get_repo_nwo silently falls back to hardcoded default.

If gh repo view fails (e.g., not in a git repo or gh not authenticated), this silently returns "NVIDIA/OSMO". This could cause the workflow to operate on the wrong repository. Consider logging a warning or raising an error.

🛡️ Suggested fix
 def _get_repo_nwo() -> str:
     """Get the owner/repo from gh CLI."""
     result = run_shell("gh repo view --json nameWithOwner --jq .nameWithOwner")
     if result.returncode == 0 and result.stdout.strip():
         return result.stdout.strip()
+    logger.warning("Failed to detect repo from gh CLI, falling back to NVIDIA/OSMO")
     return "NVIDIA/OSMO"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/respond.py` around lines 56 - 61, The helper
_get_repo_nwo currently swallows failures from run_shell("gh repo view...") and
returns the hardcoded "NVIDIA/OSMO", which can cause actions against the wrong
repo; update _get_repo_nwo to surface failures instead of silently defaulting:
when result.returncode != 0 or stdout is empty, either log a warning with the
captured stdout/stderr and the failing command (use the module's logger if
available) and then raise a RuntimeError (or a custom exception) explaining the
gh CLI failure and suggesting authentication/repo context, or at minimum log the
full error and re-raise; reference _get_repo_nwo and run_shell to locate and
change the return/default behavior.

Comment on lines +5 to +14
import os
import tempfile
import unittest

from testbot.graph import route_done, route_review, route_validation
from testbot.lcov_parser import CoverageEntry
from testbot.nodes.analyze import select_targets
from testbot.nodes.review import review_test
from testbot.plugins.base import GeneratedTest
from testbot.state import TestbotState, TestTarget

⚠️ Potential issue | 🟠 Major

Stub get_llm() in these tests instead of relying on ambient env state.

review_test() calls testbot.nodes.review.get_llm() directly. If CI or a developer shell has real LLM credentials configured, these tests stop being deterministic integration tests and start making live network calls with provider-dependent outcomes.

Patch the review client to a deterministic fake
 import os
 import tempfile
 import unittest
+from unittest import mock
@@
-            result = review_test(state)
+            with mock.patch(
+                "testbot.nodes.review.get_llm",
+                return_value=mock.Mock(client=None, model="nemotron"),
+            ):
+                result = review_test(state)
@@
-            result = review_test(state)
+            with mock.patch(
+                "testbot.nodes.review.get_llm",
+                return_value=mock.Mock(client=None, model="nemotron"),
+            ):
+                result = review_test(state)

Also applies to: 91-117

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/tests/test_integration.py` around lines 5 - 14, The tests
call review_test() which calls testbot.nodes.review.get_llm() and can hit real
LLM services; instead stub get_llm to a deterministic fake LLM client in these
tests. Use unittest.mock.patch to patch testbot.nodes.review.get_llm (or
patch.object on the review module) in the test module setup or per-test,
returning a lightweight fake object implementing the same interface used by
review_test (e.g., a generate/respond method that returns fixed outputs), apply
the same patch for the other tests around lines 91-117, and ensure the fake is
used before calling review_test() so no real network/credentials are accessed.

Comment on lines +94 to +96
            except PermissionError:
                pass  # also acceptable: process exists but we can't signal it
            except (FileNotFoundError, ValueError):

⚠️ Potential issue | 🟡 Minor

Don’t treat PermissionError as a successful kill check.

This can hide a still-running process and make the test pass incorrectly.

Suggested fix
-            except PermissionError:
-                pass  # also acceptable: process exists but we can't signal it
+            except PermissionError as permission_error:
+                self.fail(f"Unable to verify process termination: {permission_error}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/tests/test_tools.py` around lines 94 - 96, The test
currently swallows PermissionError in the kill-check except block (the "except
PermissionError: pass" after the kill signal attempt), which can mask a
still-running process; remove that bare pass so PermissionError is not treated
as success — either let PermissionError propagate (so the test fails) or include
PermissionError in the outer except tuple (e.g., except (FileNotFoundError,
ValueError, PermissionError):) and handle it as a failure case; update the
exception handling around the kill/send signal logic (the except blocks shown in
test_tools.py) accordingly.

Comment on lines +22 to +25
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as file:
        file.write(content)
    return f"Written {len(content)} bytes to {path}"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
import os
print("dirname:", repr(os.path.dirname("output.txt")))
try:
    os.makedirs(os.path.dirname("output.txt"), exist_ok=True)
    print("makedirs succeeded")
except Exception as e:
    print("makedirs error:", type(e).__name__, str(e))
PY

Repository: NVIDIA/OSMO

Length of output: 140


🏁 Script executed:

git ls-files src/scripts/testbot/tools/

Repository: NVIDIA/OSMO

Length of output: 247


🏁 Script executed:

cat -n src/scripts/testbot/tools/file_ops.py

Repository: NVIDIA/OSMO

Length of output: 1279


🏁 Script executed:

python3 << 'EOF'
# Test 1: ASCII string
ascii_content = "hello world"
print(f"ASCII string '{ascii_content}':")
print(f"  len(content): {len(ascii_content)}")
print(f"  len(content.encode('utf-8')): {len(ascii_content.encode('utf-8'))}")

# Test 2: Non-ASCII string
non_ascii_content = "hello 世界"  # Contains Chinese characters (3 bytes each in UTF-8)
print(f"\nNon-ASCII string '{non_ascii_content}':")
print(f"  len(content): {len(non_ascii_content)}")
print(f"  len(content.encode('utf-8')): {len(non_ascii_content.encode('utf-8'))}")

# Test 3: Emoji
emoji_content = "Hello 👋"  # Emoji is 4 bytes in UTF-8
print(f"\nEmoji string '{emoji_content}':")
print(f"  len(content): {len(emoji_content)}")
print(f"  len(content.encode('utf-8')): {len(emoji_content.encode('utf-8'))}")
EOF

Repository: NVIDIA/OSMO

Length of output: 298


Fix filename-only path handling and byte count accuracy in write_file.

The function fails on filename-only paths (e.g., "output.txt") because os.path.dirname() returns an empty string, causing os.makedirs("") to raise FileNotFoundError. Additionally, the return message reports bytes but counts characters using len(content), which is inaccurate for non-ASCII content—UTF-8 encoded text may be longer in bytes than in characters.

Suggested patch
 def write_file(path: str, content: str) -> str:
     """Write content to a file, creating parent directories if needed."""
     try:
-        os.makedirs(os.path.dirname(path), exist_ok=True)
+        parent_directory = os.path.dirname(path)
+        if parent_directory:
+            os.makedirs(parent_directory, exist_ok=True)
         with open(path, "w", encoding="utf-8") as file:
             file.write(content)
-        return f"Written {len(content)} bytes to {path}"
+        return f"Written {len(content.encode('utf-8'))} bytes to {path}"
     except OSError as exception:
         return f"Error writing {path}: {exception}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/tools/file_ops.py` around lines 22 - 25, The write_file
function fails when path is filename-only and returns an incorrect byte count;
fix it in write_file by computing dir = os.path.dirname(path) and only calling
os.makedirs(dir, exist_ok=True) if dir is non-empty (or use dir or "."), then
write the content as before but report the actual number of bytes by computing
byte_count = len(content.encode("utf-8")) (or open in binary and write
content.encode("utf-8") and use the written byte length); update the return
message to use byte_count and reference the write_file function and the
variables path and content so the change is easy to locate.

Comment on lines +125 to +129
    return ValidationResult(
        passed=result.returncode == 0,
        output=result.stdout + result.stderr,
        retry_hint=result.stderr if result.returncode != 0 else None,
    )

⚠️ Potential issue | 🟠 Major

Send the full Vitest failure output back to the retry loop.

Vitest often prints assertion diffs and compiler diagnostics to stdout, so using only stderr for retry_hint can leave the next LLM attempt blind even though output has the useful details.

Reuse the combined output for `retry_hint`
-    return ValidationResult(
-        passed=result.returncode == 0,
-        output=result.stdout + result.stderr,
-        retry_hint=result.stderr if result.returncode != 0 else None,
-    )
+    combined_output = result.stdout + result.stderr
+    return ValidationResult(
+        passed=result.returncode == 0,
+        output=combined_output,
+        retry_hint=combined_output if result.returncode != 0 else None,
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/tools/test_runner.py` around lines 125 - 129, The
retry_hint currently uses only result.stderr which omits important Vitest
details printed to stdout; update the return value in test_runner.py (the
ValidationResult construction) so that retry_hint uses the combined output
(result.stdout + result.stderr) when result.returncode != 0, while keeping
passed as result.returncode == 0 and output as the combined output; locate the
ValidationResult instantiation in the function that returns it and replace the
retry_hint expression to reference the same combined output string.

@jiaenren
Collaborator Author

jiaenren commented Apr 3, 2026

Closed in favor of #788
