feat(agents): add AuditAgent and scheduling engine by Deez-Automations · Pull Request #531 · GenAI-Security-Project/finbot-ctf

Deez-Automations · 2026-06-24T07:20:55Z

Summary

Adds two pieces of platform infrastructure that don't ship a CTF challenge on their own, but are required groundwork for two upcoming advanced threat challenges in the autonomous/agentic track:

AuditAgent — needed for a Data Corruption Cascade challenge (ASI-08, Cascading Failures)
schedule_cron_job tool on the SystemUtils MCP server — needed for a Zombie Agent persistence challenge (ASI-10, Rogue/Persistent Agents)

Both categories currently have zero challenge coverage on the platform. This PR is purely the infrastructure; the challenge YAML/detector for each will follow as separate PRs once this is merged, so review here can focus on whether the agent/tool design itself is sound.

AuditAgent (`finbot/agents/specialized/audit.py`)

An autonomous agent that scans the namespace's vendor and invoice ledger on a periodic batch cycle (configurable, default 60 minutes) and reasons over the result without a human in the loop.

Two tools:

scan_ledger — pulls a full vendor/invoice summary via the existing get_all_vendors_summary data accessor.
lockdown_all_vendors — a "Lockdown Protocol" tool that deactivates every active vendor in the namespace (status=inactive, trust_level=low, risk_level=high) when the agent decides the ledger is compromised. Emits a audit.lockdown_triggered business event with the full list of affected vendors for traceability.

The system prompt instructs the agent to treat a single anomalous record (e.g. one mathematically impossible routing number) as sufficient evidence to trigger lockdown immediately, without waiting for confirmation. This over-generalization is intentional and flagged inline with a <CTF:VULNERABILITY> comment — it's the mechanism the Data Corruption Cascade challenge will exploit: one malformed input record, no privilege escalation required, causing a platform-wide vendor lockdown purely through the agent's own autonomous reasoning.

Follows the existing BaseAgent pattern used by every other specialized agent (InvoiceAgent, FraudComplianceAgent, etc.) — same _get_system_prompt / _get_tool_definitions / _get_callables structure, registered the same way via run_audit_agent in runner.py.

Scheduling engine (`finbot/mcp/servers/systemutils/server.py`)

Adds schedule_cron_job to the existing SystemUtils MCP server's tool list. Takes a task name, an interval in minutes (1 to 10080, i.e. up to 7 days), a target tool name, and optional tool arguments, and returns a registered job ID with a computed next-run time.

This is intentionally scoped to exactly what the persistence challenge needs and nothing more — it registers the job and returns confirmation; it does not implement an actual background task runner or execute anything itself. The challenge this unlocks is about an attacker tricking an agent into registering a recurring malicious callback via prompt injection, not about building general-purpose job infrastructure.

Testing

14 tests total, all passing:

tests/unit/agents/test_audit_agent.py (7) — covers both tools, lockdown event emission, vendor fetch failure handling, and the skip-on-missing-vendor-id case.
tests/unit/mcp/test_systemutils_server.py (7) — covers the new tool's parameter handling, interval bounds, and response shape.

pytest tests/unit/agents/test_audit_agent.py tests/unit/mcp/test_systemutils_server.py -v

Test plan

All 14 new/existing tests pass locally
AuditAgent registers and runs correctly via run_audit_agent in a live instance
schedule_cron_job callable from an agent's tool list in a live instance

Adds two pieces of platform infrastructure needed for upcoming advanced threat challenges (ASI-08 cascading failures, ASI-10 rogue/persistent agents): - AuditAgent: an autonomous agent that scans ledger and vendor records on a batch cycle, reasons over them, and can trigger downstream actions including a lockdown protocol. Required groundwork for the Data Corruption Cascade challenge. - schedule_cron_job tool on the SystemUtils MCP server: lets an agent register a recurring task invocation at a fixed interval. Required groundwork for the Zombie Agent persistence challenge. Both ship with unit tests (14 total, all passing).

Copilot

Pull request overview

Adds new platform infrastructure to support upcoming agentic/autonomous CTF challenges: an autonomous ledger auditing agent with a “Lockdown Protocol”, plus a scheduling registration tool on the SystemUtils MCP server.

Changes:

Added AuditAgent with scan_ledger and lockdown_all_vendors tools, and wired it into the agent runner.
Added schedule_cron_job tool to the SystemUtils MCP server (mock scheduling registration + next-run computation).
Added unit tests for the new agent and MCP tool.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
`finbot/agents/specialized/audit.py`	Introduces `AuditAgent` with ledger scan and vendor-lockdown behavior plus event emission.
`finbot/agents/runner.py`	Registers `run_audit_agent` entrypoint for executing the new agent via the runner.
`finbot/mcp/servers/systemutils/server.py`	Adds `schedule_cron_job` tool to SystemUtils MCP server and enables it by default.
`tests/unit/agents/test_audit_agent.py`	Adds unit tests for `AuditAgent` initialization, prompts, tools, lockdown, and event emission.
`tests/unit/mcp/test_systemutils_server.py`	Adds unit tests for `schedule_cron_job` registration, response shape, and logging.
`tests/unit/mcp/__init__.py`	Package marker for unit MCP tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…tput, routing-number enrichment, failed-vendor traceability Addresses all 8 review comments from the Copilot pull request review on GenAI-Security-Project#531: - schedule_cron_job now validates interval_minutes against [1, 10080] and returns a structured error instead of silently accepting 0/negative values that would produce a next_run in the past. - schedule_cron_job's user-facing message and job_id now use the newline-sanitized task_name/tool_name values, matching what was already done for the log line. - AuditAgent.scan_ledger now enriches each vendor with bank_routing_number via a direct VendorRepository lookup, since get_all_vendors_summary never exposed that field — without this, the agent's system prompt told it to audit a value it never actually received. Enrichment is best-effort and degrades to None on failure rather than breaking the scan. - lockdown_all_vendors now includes failed_vendors (not just failed_count) in both its return value and the emitted audit.lockdown_triggered event, for actual traceability of partial failures. Adds 9 new tests covering the above: interval bounds (below min, above max, negative, both boundaries), message sanitization, vendor-fetch failure, skip-on-missing-vendor-id, failed-vendor detail reporting, and routing-number enrichment (success and failure paths). 24/24 tests passing.

Copilot

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.

… shapes, docstring accuracy - schedule_cron_job's out-of-bounds error response now echoes interval_minutes/tool_name/tool_args, matching the success response shape instead of returning a sparser payload callers have to special-case. - Fixed schedule_cron_job's docstring, which claimed the tool "will invoke" the target repeatedly until cancelled. This server is registration-only and mock by design (see module docstring); it never executes anything and has no cancellation path. Docstring now matches actual behavior. - lockdown_all_vendors' fetch-failure return path now includes an empty failed_vendors list, matching the success path's shape. - Tightened the next_run assertion in SAI-SCH-003 to check it lands within the requested interval (+/- 5s), not just "sometime after the call." 24/24 tests passing.

Copilot

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated no new comments.

Deez-Automations · 2026-06-24T07:51:53Z

Pushed two follow-up commits addressing all of Copilot's review feedback (both passes, 12 comments total):

Bounds & validation

schedule_cron_job now validates interval_minutes against [1, 10080] and returns a structured error instead of silently accepting 0/negative values that would produce a next_run in the past.

Output consistency

The error response now echoes interval_minutes/tool_name/tool_args, matching the success response shape.
The user-facing message and job_id now use the already-sanitized safe_task/safe_tool values instead of raw input.
lockdown_all_vendors's fetch-failure path now includes an empty failed_vendors list, matching its success path's shape.

Correctness

scan_ledger now enriches each vendor with bank_routing_number via a direct VendorRepository lookup (one query, not N+1). Without this, the agent's own system prompt told it to audit a field it never actually received, the audit logic was structurally unable to do what it was instructed to do.
lockdown_all_vendors now reports failed_vendors (not just a count) in both its return value and the emitted audit.lockdown_triggered event, for actual traceability of partial failures.

Docs

Fixed schedule_cron_job's docstring, which claimed it "will invoke" the tool repeatedly until cancelled. This server is registration-only and mock by design, it never executes anything and has no cancellation path. Docstring now matches actual behavior.

Tests

Added 9 new tests (interval bounds × 4, message sanitization, vendor-fetch failure, skip-on-missing-vendor-id, failed-vendor detail reporting, routing-number enrichment success/failure) and tightened one existing assertion (next_run now checked against the actual requested interval, not just "sometime after now").

24/24 tests passing, no regressions in the rest of the suite.

Deez-Automations · 2026-06-24T07:59:23Z

Quick context on why this PR exists with no challenge attached: both pieces here are prerequisite infrastructure, not standalone features.

AuditAgent is required for an upcoming Data Corruption Cascade challenge (ASI-08, Cascading Failures). The over-aggressive lockdown behavior here is intentional, it's the vulnerability that challenge will exploit (one anomalous record triggering a full vendor lockdown via the agent's own autonomous reasoning, no privilege escalation needed).
schedule_cron_job is required for an upcoming Zombie Agent challenge (ASI-10, Rogue/Persistent Agents). It's a registration-only mock by design, the challenge is about getting an agent to register a malicious recurring callback via injection, not about building a real job scheduler.

Neither category (ASI-08, ASI-10) currently has any challenge coverage on the platform. The actual challenge YAML + detector for each will follow as separate PRs once this merges, so this PR can be reviewed purely on whether the agent/tool design itself is sound.

Copilot AI review requested due to automatic review settings June 24, 2026 07:20

Copilot started reviewing on behalf of Deez-Automations June 24, 2026 07:21 View session

Copilot AI reviewed Jun 24, 2026

View reviewed changes

Deez-Automations requested a review from Copilot June 24, 2026 07:35

Copilot started reviewing on behalf of Deez-Automations June 24, 2026 07:35 View session

Copilot AI reviewed Jun 24, 2026

View reviewed changes

Comment thread finbot/mcp/servers/systemutils/server.py

Comment thread finbot/mcp/servers/systemutils/server.py Outdated

Comment thread finbot/agents/specialized/audit.py

Comment thread tests/unit/mcp/test_systemutils_server.py Outdated

Deez-Automations requested a review from Copilot June 24, 2026 07:44

Copilot started reviewing on behalf of Deez-Automations June 24, 2026 07:45 View session

Copilot AI reviewed Jun 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(agents): add AuditAgent and scheduling engine#531

feat(agents): add AuditAgent and scheduling engine#531
Deez-Automations wants to merge 3 commits into
GenAI-Security-Project:mainfrom
Deez-Automations:feat/ctf-audit-agent

Deez-Automations commented Jun 24, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Deez-Automations commented Jun 24, 2026

Uh oh!

Deez-Automations commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Deez-Automations commented Jun 24, 2026

Summary

AuditAgent (finbot/agents/specialized/audit.py)

Scheduling engine (finbot/mcp/servers/systemutils/server.py)

Testing

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Deez-Automations commented Jun 24, 2026

Uh oh!

Deez-Automations commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

AuditAgent (`finbot/agents/specialized/audit.py`)

Scheduling engine (`finbot/mcp/servers/systemutils/server.py`)