Skip to content

feat: update definitions generation#1002

Open
pdp2121 wants to merge 5 commits into
mainfrom
update-definitions-generation
Open

feat: update definitions generation#1002
pdp2121 wants to merge 5 commits into
mainfrom
update-definitions-generation

Conversation

@pdp2121

@pdp2121 pdp2121 commented May 26, 2026

Copy link
Copy Markdown
Collaborator

High Level Overview of Change

Replace the regex-based definitions.json generation with artifact-based approach that downloads server_definitions.json directly from rippled CI.

Added definitions validation step to CI (unit-test) to ensure definitions.json is being checked for correctness before any PR is merged.

Context of Change

rippled PR#6858 added server_definitions as artifact for every merged PRs, making it usable for downstream consumers

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (non-breaking change that only restructures code)
  • Tests (You added tests for code that already exists, or your new feature included in this PR)
  • Documentation Updates
  • Release

Did you update CHANGELOG.md?

  • Yes
  • No, this change does not impact library users

Test Plan

CI passes.

@coderabbitai

coderabbitai Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@pdp2121, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 10 minutes and 1 second. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e232d7b0-bd18-4106-85b7-9801b61c0e67

📥 Commits

Reviewing files that changed from the base of the PR and between c50b6b1 and 14440a1.

📒 Files selected for processing (1)
  • CONTRIBUTING.md

Walkthrough

This PR modernizes the rippled definitions workflow by replacing header-parsing logic with GitHub Actions artifact retrieval, adding automated validation against a benchmark, and integrating validation into CI. Generate_definitions.py is completely rewritten to query artifacts; validate_definitions.py is introduced as a new validation layer; and Poe task definitions are restructured to coordinate the new commands.

Changes

Definitions Workflow Modernization

Layer / File(s) Summary
Poe Task Configuration
pyproject.toml
Task definitions restructured: definitions, validate_definitions, generate_tx_models, and test are now explicit Poe subtables with cmd, help, and argument wiring including ${BRANCH_OR_PR} for definitions and ${FILE_OR_GITHUB_PATH} for tx models.
Definitions Generation from CI Artifacts
tools/generate_definitions.py
Script rewritten to download server-definitions artifacts from GitHub Actions instead of parsing C++ headers. Resolves PR/branch specifiers, queries artifacts by PR head SHA then branch, downloads to temp directory, extracts server_definitions.json, and writes to output path with explicit error handling for missing gh CLI, absent artifacts, and extraction failures.
Definitions Validation Against Benchmark
tools/validate_definitions.py
New script downloads benchmark server_definitions.json from rippled develop branch via GitHub CLI, classifies JSON structure patterns (fields, simple_map, nested_map, format_array), validates required keys and per-key types, and compares local definitions entry-by-entry against the benchmark; prints detailed mismatch details and exits non-zero on failure.
CI Integration and Documentation
.github/workflows/unit_test.yml, CONTRIBUTING.md, .gitignore
Added validation step to lint-and-type-check job that runs poetry run poe validate_definitions with GH_TOKEN; updated CONTRIBUTING.md to document definitions workflow now sourced from rippled CI artifacts with pr:<id> and branch specifier forms; added .claude/worktrees/ to .gitignore.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as Definitions Script
  participant Args as Argument Parser
  participant GH as GitHub CLI
  participant Artifact as Artifact Download
  participant JSON as JSON Extraction
  participant Disk as Output File
  
  CLI->>Args: Parse source (pr/branch/fork)
  Args->>GH: Query PR/artifact info
  GH-->>Args: PR head SHA / branch name
  Args->>GH: Find artifact by SHA or branch
  GH-->>Args: Artifact run_id (non-expired)
  Args->>Artifact: Download artifact to temp dir
  Artifact-->>Args: Extraction complete
  Args->>JSON: Load server_definitions.json
  JSON-->>Args: Parsed definitions
  Args->>Disk: Write definitions.json (pretty-print)
  Disk-->>Args: Output path confirmed
Loading
sequenceDiagram
  participant Main as Validation Script
  participant GH as GitHub CLI
  participant Local as Local definitions.json
  participant Bench as Benchmark Artifact
  participant Classify as Structure Classifier
  participant Validate as Structural Validator
  participant Compare as Entry Comparator
  
  Main->>GH: Verify gh CLI available
  Main->>Local: Load local definitions.json
  Main->>GH: Download benchmark from develop
  GH-->>Bench: Artifact fetched and parsed
  Main->>Classify: Classify benchmark structure
  Classify-->>Main: Category map (fields/simple_map/etc)
  Main->>Validate: Enforce REQUIRED_KEYS and types
  Validate-->>Main: Structure valid or errors
  Main->>Compare: Compare by classification category
  Compare-->>Main: Mismatches detected (if any)
  Main->>Main: Exit with status (0=pass, 1=fail)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • XRPLF/xrpl-py#772: Directly related refactor of the same definitions generation script and Poe task wiring.

Suggested reviewers

  • mvadari
  • ckeshava
  • achowdhry-ripple
  • khancode

🐰 Hop, skip, and a leap so grand,
Definitions flow from artifact land,
No more headers to parse with care,
GitHub's artifacts floating in air!
Validation checks keep all things right,
The workflow now shines ever so bright! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 56.25% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: update definitions generation' clearly describes the main change: replacing regex-based definitions generation with artifact-based approach, which is the primary focus of the PR.
Description check ✅ Passed The PR description covers the high-level overview, context, type of change, and CHANGELOG consideration. However, it lacks substantive detail in the Test Plan section (only states 'CI passes').
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch update-definitions-generation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pyproject.toml`:
- Around line 94-97: The poe task definitions entry is passing BRANCH_OR_PR as a
variadic positional (multiple = true) causing argparse in
tools/generate_definitions.py (which expects a single optional positional arg,
nargs="?") to fail; change the task config for tool.poe.tasks.definitions so
that the BRANCH_OR_PR arg is not multiple (remove or set multiple = false) and
keep its default as "" so only zero or one positional value is forwarded to the
script, matching the script’s nargs="?" expectation.

In `@tools/generate_definitions.py`:
- Around line 27-31: The _exec function currently calls subprocess.run with
shell=True which allows command injection; change the signature of _exec(cmd:
list[str]) -> str (or similar) to accept an argv list and call
subprocess.run(cmd, shell=False, capture_output=True, text=True, check=True),
and update all callers that build gh api / gh run download commands (the call
sites that construct command strings for gh and the --jq filter) to pass a list
of arguments (separate "--jq" and its filter as its own list element) instead of
a single shell-quoted string so no shell interpolation occurs.

In `@tools/validate_definitions.py`:
- Around line 240-246: The loop over fields in the format_array validation
assumes each item is a dict and calls field.get(...), causing AttributeError for
non-dict entries; update the loop in the format_array validation (the for j,
field in enumerate(fields) block) to first check isinstance(field, dict) and if
not, append a validation error like '{key} "{type_name}"[{j}]: field must be an
object/dict' and continue, otherwise proceed to check field.get("name") and
field.get("optionality") as before.
- Around line 35-38: Replace the unsafe shell invocation in _exec: stop passing
a single command string with shell=True to subprocess.run and instead
accept/construct an argv-style list (or split the command safely) and call
subprocess.run(..., shell=False, args=[...], capture_output=True, text=True,
check=True) so commands are executed without a shell to prevent injection; and
in _validate_structure's format_array branch ensure you validate each item is a
dict before using field.get by adding an explicit type check (e.g., if not
isinstance(field, dict): record a structured validation error for that item) and
only then call field.get(...) so malformed inputs produce validation errors
rather than AttributeError.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7de9c1c2-acf3-4074-ad15-c0625317fb46

📥 Commits

Reviewing files that changed from the base of the PR and between 69573b2 and c50b6b1.

📒 Files selected for processing (7)
  • .github/workflows/unit_test.yml
  • .gitignore
  • CONTRIBUTING.md
  • pyproject.toml
  • tools/generate_definitions.py
  • tools/validate_definitions.py
  • xrpl/core/binarycodec/definitions/definitions.json

Comment thread pyproject.toml
Comment on lines +94 to +97
[tool.poe.tasks.definitions]
help = "Generate definitions.json from rippled CI artifacts"
cmd = "python3 tools/generate_definitions.py ${BRANCH_OR_PR}"
args = [{ name = "BRANCH_OR_PR", positional = true, default = "", multiple = true }]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate and show the relevant Poe task config in pyproject.toml
echo "=== pyproject.toml: tool.poe.tasks.definitions ==="
rg -n --hidden --no-ignore -S "tool\.poe\.tasks\.definitions|\\[tool\\.poe\\.tasks\\.definitions\\]" pyproject.toml || true
sed -n '80,120p' pyproject.toml

# Locate the script and inspect its CLI/argparse handling
echo
echo "=== tools/generate_definitions.py argparse ==="
ls -la tools/generate_definitions.py 2>/dev/null || true
rg -n --hidden --no-ignore -S "argparse|add_argument|nargs=|sys\.argv" tools/generate_definitions.py || true
python3 - <<'PY'
import itertools
from pathlib import Path
p=Path("tools/generate_definitions.py")
if not p.exists():
    print("tools/generate_definitions.py not found")
    raise SystemExit(0)
lines=p.read_text(encoding="utf-8").splitlines()
for i in range(0,len(lines),1):
    if "ArgumentParser" in lines[i] or "add_argument" in lines[i] or "nargs" in lines[i]:
        start=max(0,i-5); end=min(len(lines), i+15)
        print("\n--- snippet around line", i+1, "---")
        for j in range(start,end):
            print(f"{j+1:4d}: {lines[j]}")
PY

# Check for any other parser entrypoints or wrapper scripts
echo
echo "=== usages of generate_definitions.py ==="
rg -n --hidden --no-ignore -S "generate_definitions\.py" . || true

Repository: XRPLF/xrpl-py

Length of output: 5021


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "=== Context around tool.poe.tasks.definitions in pyproject.toml ==="
sed -n '60,120p' pyproject.toml

echo
echo "=== Search for BRANCH_OR_PR and multiple=true usage ==="
rg -n --hidden --no-ignore -S "BRANCH_OR_PR|multiple\s*=\s*true" pyproject.toml

Repository: XRPLF/xrpl-py

Length of output: 2109


🌐 Web query:

Poe the poet multiple=true positional args expansion into command line

💡 Result:

In "poethepoet," you can handle multiple values for a positional argument by setting the "multiple" option to true in your task configuration [1][2]. When this is enabled, the task will accept one or more values for that argument [1]. Key rules and usage: 1. Positioning: Only the last positional argument in a task's argument list may have the "multiple" option set to true [1][2]. 2. Availability: For cmd tasks, the values are provided as a single string of whitespace-separated values, which can be injected into your command using templates (e.g., ${ARG_NAME}) [1][2]. For script tasks, the values are passed as a list of values to your Python function [1]. 3. Example Configuration: [tool.poe.tasks.save] cmd = "echo ${FILE_PATHS}" args = [{ name = "FILE_PATHS", positional = true, multiple = true }] When you run this task from the command line, extra arguments provided after the task name are captured into the variable [1][3][4]: poe save file1.txt file2.txt file3.txt # Results in: echo file1.txt file2.txt file3.txt Note that "multiple" is not compatible with arguments of type "boolean" [1][2]. If you need to handle additional arbitrary arguments that do not fit into a defined positional structure, recent versions of poethepoet also provide mechanisms for forwarding "free" arguments (e.g., via $POE_EXTRA_ARGS) [3][5].

Citations:


Fix Poe positional argument cardinality for definitions.

tools/generate_definitions.py accepts exactly one optional positional argument (source, nargs="?"), but pyproject.toml configures ${BRANCH_OR_PR} with multiple = true, so supplying more than one value can forward extra positional args and make argparse fail (“unrecognized/too many positional arguments”).

Suggested fix
 [tool.poe.tasks.definitions]
 help = "Generate definitions.json from rippled CI artifacts"
 cmd = "python3 tools/generate_definitions.py ${BRANCH_OR_PR}"
-args = [{ name = "BRANCH_OR_PR", positional = true, default = "", multiple = true }]
+args = [{ name = "BRANCH_OR_PR", positional = true, default = "" }]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
[tool.poe.tasks.definitions]
help = "Generate definitions.json from rippled CI artifacts"
cmd = "python3 tools/generate_definitions.py ${BRANCH_OR_PR}"
args = [{ name = "BRANCH_OR_PR", positional = true, default = "", multiple = true }]
[tool.poe.tasks.definitions]
help = "Generate definitions.json from rippled CI artifacts"
cmd = "python3 tools/generate_definitions.py ${BRANCH_OR_PR}"
args = [{ name = "BRANCH_OR_PR", positional = true, default = "" }]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pyproject.toml` around lines 94 - 97, The poe task definitions entry is
passing BRANCH_OR_PR as a variadic positional (multiple = true) causing argparse
in tools/generate_definitions.py (which expects a single optional positional
arg, nargs="?") to fail; change the task config for tool.poe.tasks.definitions
so that the BRANCH_OR_PR arg is not multiple (remove or set multiple = false)
and keep its default as "" so only zero or one positional value is forwarded to
the script, matching the script’s nargs="?" expectation.

Comment on lines +27 to +31
def _exec(cmd: str) -> str:
"""Run a shell command and return its stripped stdout."""
result = subprocess.run(
cmd, shell=True, capture_output=True, text=True, check=True
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show the relevant section of the file
echo "=== tools/generate_definitions.py (lines 1-120) ==="
nl -ba tools/generate_definitions.py | sed -n '1,120p'

echo
echo "=== usages of _exec in tools/generate_definitions.py ==="
rg -n "_exec\\(" tools/generate_definitions.py

echo
echo "=== show call sites around each _exec usage ==="
# Print ~20 lines around each usage line number
rg -n "_exec\\(" tools/generate_definitions.py | cut -d: -f1 | while read -r ln; do
  start=$((ln-15)); end=$((ln+25));
  nl -ba tools/generate_definitions.py | sed -n "${start},${end}p" | sed "s/^/  /"
  echo "---"
done

echo
echo "=== find where CLI args parsed (likely argparse/typer/click) ==="
rg -n "argparse|click|typer|ArgumentParser|add_argument\\(|sys\\.argv" tools/generate_definitions.py

Repository: XRPLF/xrpl-py

Length of output: 170


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="tools/generate_definitions.py"
if [ ! -f "$FILE" ]; then
  echo "Missing $FILE"
  exit 1
fi

echo "=== tools/generate_definitions.py (lines 1-140) with numbers ==="
awk '{printf "%5d %s\n", NR, $0}' "$FILE" | sed -n '1,140p'

echo
echo "=== usages of _exec in tools/generate_definitions.py ==="
rg -n "_exec\\(" "$FILE" || true

echo
echo "=== show call-site neighborhoods for _exec usages ==="
rg -n "_exec\\(" "$FILE" | cut -d: -f1 | while read -r ln; do
  start=$((ln-20)); end=$((ln+40));
  awk '{printf "%5d %s\n", NR, $0}' "$FILE" | sed -n "${start},${end}p" | sed 's/^/  /'
  echo "---"
done

echo
echo "=== locate CLI parsing/inputs in this file ==="
rg -n "argparse|click|typer|add_argument\\(|sys\\.argv" "$FILE" || true
echo
echo "=== locate any references to branch/owner/pr/fork (common variable names) ==="
rg -n "branch|owner|pr\\b|fork|pull_request|github\\.com" "$FILE" || true

Repository: XRPLF/xrpl-py

Length of output: 24622


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="tools/generate_definitions.py"

awk 'NR>=150 && NR<=310 {printf "%5d %s\n", NR, $0}' "$FILE"

Repository: XRPLF/xrpl-py

Length of output: 5243


Remove shell=True from _exec in tools/generate_definitions.py.

_exec (lines 27-32) runs subprocess.run(cmd, shell=True, ...) where cmd is a shell string composed from CLI source (parsed into pr_number, branch, and fork_owner) and interpolated into gh api / gh run download commands (e.g., call sites at ~53, 69, 90, 104, 120). This enables command injection if the input contains shell metacharacters/quotes. Refactor _exec to take list[str] and call subprocess.run(..., shell=False); update call sites to build argv lists end-to-end (including passing the --jq filter as its own argument instead of embedding shell-quoted text).

🧰 Tools
🪛 OpenGrep (1.22.0)

[ERROR] 29-31: Dynamic command passed to subprocess with shell=True. Use a command list without shell=True, or use shlex.quote() to sanitize input.

(coderabbit.command-injection.python-shell-true)

🪛 Ruff (0.15.14)

[error] 29-29: subprocess call with shell=True identified, security issue

(S602)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/generate_definitions.py` around lines 27 - 31, The _exec function
currently calls subprocess.run with shell=True which allows command injection;
change the signature of _exec(cmd: list[str]) -> str (or similar) to accept an
argv list and call subprocess.run(cmd, shell=False, capture_output=True,
text=True, check=True), and update all callers that build gh api / gh run
download commands (the call sites that construct command strings for gh and the
--jq filter) to pass a list of arguments (separate "--jq" and its filter as its
own list element) instead of a single shell-quoted string so no shell
interpolation occurs.

Comment on lines +35 to +38
def _exec(cmd: str) -> str:
result = subprocess.run(
cmd, shell=True, capture_output=True, text=True, check=True
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate and show the relevant sections of tools/validate_definitions.py
FILE="tools/validate_definitions.py"

echo "== _exec definition =="
nl -ba "$FILE" | sed -n '1,140p' | sed -n '30,90p'

echo
echo "== validate structure around format_array =="
nl -ba "$FILE" | sed -n '200,320p'

echo
echo "== Search for _validate_structure and field handling =="
rg -n "def _validate_structure|format_array|field\.get\(" "$FILE" || true

Repository: XRPLF/xrpl-py

Length of output: 141


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="tools/validate_definitions.py"

echo "== _exec definition =="
cat -n "$FILE" | sed -n '30,90p'

echo
echo "== validate structure around format_array =="
cat -n "$FILE" | sed -n '200,320p'

echo
echo "== Search for _validate_structure and field handling =="
rg -n "def _validate_structure|format_array|field\.get\(" "$FILE" || true

Repository: XRPLF/xrpl-py

Length of output: 8865


Harden command execution and validation logic in tools/validate_definitions.py

  • Remove shell=True from _exec (subprocess.run(..., shell=True, ...), lines 35-38) and switch to argv-style invocation to avoid command-injection risks from dynamically built command strings.
  • In _validate_structure’s format_array branch, validate that each field is a dict before calling field.get(...) (lines 240-248) to prevent malformed input from raising AttributeError instead of producing structured validation errors.
🧰 Tools
🪛 OpenGrep (1.22.0)

[ERROR] 36-38: Dynamic command passed to subprocess with shell=True. Use a command list without shell=True, or use shlex.quote() to sanitize input.

(coderabbit.command-injection.python-shell-true)

🪛 Ruff (0.15.14)

[error] 36-36: subprocess call with shell=True identified, security issue

(S602)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/validate_definitions.py` around lines 35 - 38, Replace the unsafe shell
invocation in _exec: stop passing a single command string with shell=True to
subprocess.run and instead accept/construct an argv-style list (or split the
command safely) and call subprocess.run(..., shell=False, args=[...],
capture_output=True, text=True, check=True) so commands are executed without a
shell to prevent injection; and in _validate_structure's format_array branch
ensure you validate each item is a dict before using field.get by adding an
explicit type check (e.g., if not isinstance(field, dict): record a structured
validation error for that item) and only then call field.get(...) so malformed
inputs produce validation errors rather than AttributeError.

Comment on lines +240 to +246
for j, field in enumerate(fields):
if not isinstance(field.get("name"), str):
errors.append(
f'{key} "{type_name}"[{j}]: "name" must be a string'
)
if not isinstance(field.get("optionality"), int):
errors.append(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard non-dict entries before .get() in format_array validation.

If fields contains a non-dict entry, field.get(...) crashes with AttributeError instead of reporting a validation error.

Suggested fix
                 for j, field in enumerate(fields):
+                    if not isinstance(field, dict):
+                        errors.append(
+                            f'{key} "{type_name}"[{j}]: entry must be an object'
+                        )
+                        continue
                     if not isinstance(field.get("name"), str):
                         errors.append(
                             f'{key} "{type_name}"[{j}]: "name" must be a string'
                         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for j, field in enumerate(fields):
if not isinstance(field.get("name"), str):
errors.append(
f'{key} "{type_name}"[{j}]: "name" must be a string'
)
if not isinstance(field.get("optionality"), int):
errors.append(
for j, field in enumerate(fields):
if not isinstance(field, dict):
errors.append(
f'{key} "{type_name}"[{j}]: entry must be an object'
)
continue
if not isinstance(field.get("name"), str):
errors.append(
f'{key} "{type_name}"[{j}]: "name" must be a string'
)
if not isinstance(field.get("optionality"), int):
errors.append(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/validate_definitions.py` around lines 240 - 246, The loop over fields
in the format_array validation assumes each item is a dict and calls
field.get(...), causing AttributeError for non-dict entries; update the loop in
the format_array validation (the for j, field in enumerate(fields) block) to
first check isinstance(field, dict) and if not, append a validation error like
'{key} "{type_name}"[{j}]: field must be an object/dict' and continue, otherwise
proceed to check field.get("name") and field.get("optionality") as before.

raw = _exec(
f'gh api "repos/{repo}/actions/artifacts'
f"?name={ARTIFACT_NAME}&per_page=50\""
f" --jq '[.artifacts[] | select(.workflow_run.head_branch == \"{branch}\""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Severity: MEDIUM

The branch parameter is interpolated into a shell=True command via _exec. When called from the pr:NUMBER flow (line 236), branch is the GitHub API's headRefName. Git/GitHub allow single quotes in branch names, which break out of the jq argument's single-quoted shell context, enabling arbitrary command execution. Confirmed via PoC: a branch named test';touch /tmp/pwned;echo ' executes touch /tmp/pwned.
Helpful? Add 👍 / 👎

💡 Fix Suggestion

Suggestion: Refactor _find_artifact_by_branch (and similarly _find_artifact_by_sha and _find_pr_for_fork_branch) to avoid interpolating user-controlled values into shell commands. The recommended approach is to:

  1. Remove the --jq flag from the gh api call and retrieve raw JSON instead.
  2. Parse the JSON response in Python using json.loads().
  3. Perform the filtering (e.g., matching workflow_run.head_branch) in Python code, which eliminates the shell injection vector entirely.

For example, replace _find_artifact_by_branch with something like:

def _find_artifact_by_branch(repo: str, branch: str) -> str | None:
    try:
        raw = _exec(
            f'gh api "repos/{repo}/actions/artifacts'
            f'?name={ARTIFACT_NAME}&per_page=50"'
        )
        data = json.loads(raw)
        for artifact in data.get("artifacts", []):
            wr = artifact.get("workflow_run", {})
            if wr.get("head_branch") == branch and not artifact.get("expired", True):
                return str(wr.get("id"))
    except (subprocess.CalledProcessError, json.JSONDecodeError):
        pass
    return None

Apply the same pattern to _find_artifact_by_sha (filter on head_sha instead) and _find_pr_for_fork_branch. Also consider refactoring _exec to accept a list of arguments and avoid shell=True entirely, or at minimum validate/sanitize repo parameter which is also interpolated.

@@ -1,376 +1,288 @@
"""Script to generate the definitions.json file from rippled source code."""
"""Script to generate the definitions.json file from rippled CI artifacts.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Phu, Based on XRPLF/rippled#6858 and checking with Claude, the server-definitions artifact actually gets produced and retained for 3 days only when a cron job is run on each weekday (and not on every commit into develop branch, so many runs don't have those files). That means:

  1. The pr:, owner:branch, and arbitrary-branch flows do exactly the right API queries — but no PR-triggered or branch-push-triggered run ever uploads the artifact, so they consistently come up empty. So do we need to fix the upstream rippled workflow to allow file generate on every commit (on develop and feature branches)?

  2. Also the poetry run poe definitions won't give server-definitions for the latest commit on develop branch, but instead it would give the file generated based on the most recent run, due to same reason as upstream generates the files on a scheduled basis. And if there are no commits for > 3 days, the poetry run poe definitions starts failing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants