feat(agents): add AuditAgent and scheduling engine #531
Deez-Automations wants to merge 3 commits into
Adds two pieces of platform infrastructure needed for upcoming advanced threat challenges (ASI-08 cascading failures, ASI-10 rogue/persistent agents):

- `AuditAgent`: an autonomous agent that scans ledger and vendor records on a batch cycle, reasons over them, and can trigger downstream actions, including a lockdown protocol. Required groundwork for the Data Corruption Cascade challenge.
- `schedule_cron_job` tool on the SystemUtils MCP server: lets an agent register a recurring task invocation at a fixed interval. Required groundwork for the Zombie Agent persistence challenge.

Both ship with unit tests (14 total, all passing).
Pull request overview
Adds new platform infrastructure to support upcoming agentic/autonomous CTF challenges: an autonomous ledger auditing agent with a “Lockdown Protocol”, plus a scheduling registration tool on the SystemUtils MCP server.
Changes:

- Added `AuditAgent` with `scan_ledger` and `lockdown_all_vendors` tools, and wired it into the agent runner.
- Added `schedule_cron_job` tool to the SystemUtils MCP server (mock scheduling registration + next-run computation).
- Added unit tests for the new agent and MCP tool.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `finbot/agents/specialized/audit.py` | Introduces `AuditAgent` with ledger scan and vendor-lockdown behavior plus event emission. |
| `finbot/agents/runner.py` | Registers the `run_audit_agent` entrypoint for executing the new agent via the runner. |
| `finbot/mcp/servers/systemutils/server.py` | Adds the `schedule_cron_job` tool to the SystemUtils MCP server and enables it by default. |
| `tests/unit/agents/test_audit_agent.py` | Adds unit tests for `AuditAgent` initialization, prompts, tools, lockdown, and event emission. |
| `tests/unit/mcp/test_systemutils_server.py` | Adds unit tests for `schedule_cron_job` registration, response shape, and logging. |
| `tests/unit/mcp/__init__.py` | Package marker for unit MCP tests. |
…tput, routing-number enrichment, failed-vendor traceability

Addresses all 8 review comments from the Copilot pull request review on GenAI-Security-Project#531:

- `schedule_cron_job` now validates `interval_minutes` against [1, 10080] and returns a structured error instead of silently accepting 0 or negative values, which would produce a `next_run` at or before the time of the call.
- `schedule_cron_job`'s user-facing message and `job_id` now use the newline-sanitized `task_name`/`tool_name` values, matching what was already done for the log line.
- `AuditAgent.scan_ledger` now enriches each vendor with `bank_routing_number` via a direct `VendorRepository` lookup, since `get_all_vendors_summary` never exposed that field. Without this, the agent's system prompt told it to audit a value it never actually received. Enrichment is best-effort and degrades to `None` on failure rather than breaking the scan.
- `lockdown_all_vendors` now includes `failed_vendors` (not just `failed_count`) in both its return value and the emitted `audit.lockdown_triggered` event, for actual traceability of partial failures.

Adds 9 new tests covering the above: interval bounds (below min, above max, negative, both boundaries), message sanitization, vendor-fetch failure, skip-on-missing-vendor-id, failed-vendor detail reporting, and routing-number enrichment (success and failure paths). 24/24 tests passing.
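A minimal sketch of the best-effort enrichment described in this commit, assuming the vendor summary is a list of dicts keyed by `id` and that the per-vendor `VendorRepository` lookup can be modeled as a callable returning a dict (or `None`). `enrich_with_routing_numbers`, `RoutingLookup`, and `_fake_lookup` are hypothetical names for illustration, not the code in this PR:

```python
from typing import Any, Callable, Optional

# Hypothetical stand-in for a per-vendor VendorRepository lookup: takes a vendor id
# and returns a dict with a "bank_routing_number" key, or None if not found.
RoutingLookup = Callable[[int], Optional[dict[str, Any]]]


def enrich_with_routing_numbers(
    vendors: list[dict[str, Any]], lookup: RoutingLookup
) -> list[dict[str, Any]]:
    """Copy each vendor summary and attach bank_routing_number, best-effort."""
    enriched = []
    for vendor in vendors:
        routing: Optional[str] = None
        vendor_id = vendor.get("id")
        if vendor_id is not None:
            try:
                record = lookup(vendor_id)
                if record is not None:
                    routing = record.get("bank_routing_number")
            except Exception:
                # One failed lookup degrades to None instead of aborting the scan.
                routing = None
        enriched.append({**vendor, "bank_routing_number": routing})
    return enriched


# Usage: vendor 2's lookup raises, vendor 3 has no record.
_records = {1: {"bank_routing_number": "123456789"}}


def _fake_lookup(vendor_id: int) -> Optional[dict[str, Any]]:
    if vendor_id == 2:
        raise RuntimeError("repository unavailable")
    return _records.get(vendor_id)


result = enrich_with_routing_numbers([{"id": 1}, {"id": 2}, {"id": 3}], _fake_lookup)
print(result)
```

Building a new dict per vendor leaves the original summary untouched, and treating both exceptions and missing records as `None` means every vendor still reaches the agent, just without a routing number to judge.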
… shapes, docstring accuracy

- `schedule_cron_job`'s out-of-bounds error response now echoes `interval_minutes`/`tool_name`/`tool_args`, matching the success response shape instead of returning a sparser payload that callers have to special-case.
- Fixed `schedule_cron_job`'s docstring, which claimed the tool "will invoke" the target repeatedly until cancelled. This server is registration-only and a mock by design (see module docstring); it never executes anything and has no cancellation path. The docstring now matches actual behavior.
- `lockdown_all_vendors`' fetch-failure return path now includes an empty `failed_vendors` list, matching the success path's shape.
- Tightened the `next_run` assertion in SAI-SCH-003 to check that it lands within the requested interval (+/- 5s), not just "sometime after the call."

24/24 tests passing.
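The return-shape consistency for `lockdown_all_vendors` can be illustrated with a sketch like the following. It assumes vendors are plain dicts and injects hypothetical `fetch_vendors`, `update_vendor`, and `emit_event` callables in place of the real repository and event bus; the `error` and `locked_vendors` keys are also assumptions. Only `failed_count`, `failed_vendors`, the three lockdown field values, and the `audit.lockdown_triggered` event name come from this PR:

```python
from typing import Any, Callable

LOCKDOWN_FIELDS = {"status": "inactive", "trust_level": "low", "risk_level": "high"}


def lockdown_all_vendors(
    fetch_vendors: Callable[[], list[dict[str, Any]]],
    update_vendor: Callable[[int, dict[str, str]], None],
    emit_event: Callable[[str, dict[str, Any]], None],
) -> dict[str, Any]:
    """Deactivate every active vendor; both return paths share one shape."""
    try:
        vendors = fetch_vendors()
    except Exception as exc:
        # Fetch failure: nothing was touched, but failed_vendors is still present.
        return {"error": str(exc), "locked_vendors": [], "failed_count": 0, "failed_vendors": []}

    locked: list[int] = []
    failed: list[dict[str, Any]] = []
    for vendor in vendors:
        vendor_id = vendor.get("id")
        if vendor_id is None or vendor.get("status") != "active":
            continue  # skip records without an id and vendors that are not active
        try:
            update_vendor(vendor_id, dict(LOCKDOWN_FIELDS))
            locked.append(vendor_id)
        except Exception as exc:
            # Record which vendor failed and why, not just a count.
            failed.append({"vendor_id": vendor_id, "error": str(exc)})

    result = {"error": None, "locked_vendors": locked, "failed_count": len(failed), "failed_vendors": failed}
    emit_event("audit.lockdown_triggered", dict(result))
    return result


# Usage with in-memory stand-ins: vendor 2's update fails, one record lacks an id.
updated: dict[int, dict[str, str]] = {}
events: list[tuple[str, dict[str, Any]]] = []


def _update(vendor_id: int, fields: dict[str, str]) -> None:
    if vendor_id == 2:
        raise RuntimeError("db write failed")
    updated[vendor_id] = fields


def _broken_fetch() -> list[dict[str, Any]]:
    raise RuntimeError("vendor fetch failed")


ok = lockdown_all_vendors(
    lambda: [{"id": 1, "status": "active"}, {"id": 2, "status": "active"},
             {"status": "active"}, {"id": 4, "status": "inactive"}],
    _update,
    lambda name, payload: events.append((name, payload)),
)
fetch_failed = lockdown_all_vendors(_broken_fetch, _update, lambda n, p: events.append((n, p)))
```

Because both paths return the same keys, a caller can always iterate `failed_vendors` without first checking which branch produced the response.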
Pushed two follow-up commits addressing all of Copilot's review feedback (both passes, 12 comments total):

- Bounds & validation
- Output consistency
- Correctness
- Docs
- Tests

24/24 tests passing, no regressions in the rest of the suite.
Quick context on why this PR has no challenge attached: both pieces are prerequisite infrastructure, not standalone features.
Neither category (ASI-08, ASI-10) currently has any challenge coverage on the platform. The challenge YAML and detector for each will follow as separate PRs once this merges, so this PR can be reviewed purely on whether the agent/tool design itself is sound.
Summary
Adds two pieces of platform infrastructure that don't ship a CTF challenge on their own, but are required groundwork for two upcoming advanced threat challenges in the autonomous/agentic track:
- `AuditAgent`: needed for a Data Corruption Cascade challenge (ASI-08, Cascading Failures)
- `schedule_cron_job` tool on the SystemUtils MCP server: needed for a Zombie Agent persistence challenge (ASI-10, Rogue/Persistent Agents)

Both categories currently have zero challenge coverage on the platform. This PR is purely the infrastructure; the challenge YAML/detector for each will follow as separate PRs once this is merged, so review here can focus on whether the agent/tool design itself is sound.
AuditAgent (`finbot/agents/specialized/audit.py`)

An autonomous agent that scans the namespace's vendor and invoice ledger on a periodic batch cycle (configurable, default 60 minutes) and reasons over the result without a human in the loop.

Two tools:

- `scan_ledger`: pulls a full vendor/invoice summary via the existing `get_all_vendors_summary` data accessor, then enriches each vendor with its `bank_routing_number` through a `VendorRepository` lookup, since the summary does not expose that field.
- `lockdown_all_vendors`: a "Lockdown Protocol" tool that deactivates every active vendor in the namespace (`status=inactive`, `trust_level=low`, `risk_level=high`) when the agent decides the ledger is compromised. Emits an `audit.lockdown_triggered` business event with the full list of affected vendors, plus any `failed_vendors`, for traceability.

The system prompt instructs the agent to treat a single anomalous record (e.g. one mathematically impossible routing number) as sufficient evidence to trigger lockdown immediately, without waiting for confirmation. This over-generalization is intentional and flagged inline with a `<CTF:VULNERABILITY>` comment. It is the mechanism the Data Corruption Cascade challenge will exploit: one malformed input record, with no privilege escalation required, causes a platform-wide vendor lockdown purely through the agent's own autonomous reasoning.

Follows the existing `BaseAgent` pattern used by every other specialized agent (`InvoiceAgent`, `FraudComplianceAgent`, etc.): the same `_get_system_prompt` / `_get_tool_definitions` / `_get_callables` structure, registered the same way via `run_audit_agent` in `runner.py`.

Scheduling engine (`finbot/mcp/servers/systemutils/server.py`)

Adds `schedule_cron_job` to the existing SystemUtils MCP server's tool list. It takes a task name, an interval in minutes (1 to 10080, i.e. up to 7 days), a target tool name, and optional tool arguments, and returns a registered job ID with a computed next-run time. An out-of-range interval returns a structured error instead.

This is intentionally scoped to exactly what the persistence challenge needs and nothing more: it registers the job and returns confirmation, but it does not implement an actual background task runner or execute anything itself. The challenge this unlocks is about an attacker tricking an agent into registering a recurring malicious callback via prompt injection, not about building general-purpose job infrastructure.
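A minimal, registration-only sketch of the `schedule_cron_job` behavior described above: bounds check, newline sanitization, an error payload that echoes the same input fields as the success payload, and next-run computation. The response keys `status`, `job_id`, `next_run`, and `message`, the job ID format, and the `_sanitize` helper are assumptions for illustration, not the server's actual code. Like the real tool, it never executes anything:

```python
import uuid
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

MIN_INTERVAL_MINUTES = 1
MAX_INTERVAL_MINUTES = 10080  # 7 days


def _sanitize(value: str) -> str:
    # Hypothetical helper: replace CR/LF so user-supplied names cannot forge extra lines.
    return value.replace("\r", " ").replace("\n", " ")


def schedule_cron_job(
    task_name: str,
    interval_minutes: int,
    tool_name: str,
    tool_args: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Registration-only mock: returns a confirmation, never runs the target tool."""
    tool_args = tool_args or {}
    safe_task = _sanitize(task_name)
    safe_tool = _sanitize(tool_name)
    # Fields echoed on both the success and the error path.
    base = {"interval_minutes": interval_minutes, "tool_name": safe_tool, "tool_args": tool_args}

    if not MIN_INTERVAL_MINUTES <= interval_minutes <= MAX_INTERVAL_MINUTES:
        return {
            **base,
            "status": "error",
            "message": f"interval_minutes must be between {MIN_INTERVAL_MINUTES} and {MAX_INTERVAL_MINUTES}",
        }

    next_run = datetime.now(timezone.utc) + timedelta(minutes=interval_minutes)
    return {
        **base,
        "status": "registered",
        "job_id": f"job-{safe_task}-{uuid.uuid4().hex[:8]}",
        "next_run": next_run.isoformat(),
        "message": f"Registered '{safe_task}' to call {safe_tool} every {interval_minutes} min",
    }
```

Comparing `next_run` against a timestamp taken just before the call, with a few seconds of tolerance, is the kind of check the tightened SAI-SCH-003 assertion performs.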
Testing
14 tests total, all passing:
- `tests/unit/agents/test_audit_agent.py` (7): covers both tools, lockdown event emission, vendor fetch failure handling, and the skip-on-missing-vendor-id case.
- `tests/unit/mcp/test_systemutils_server.py` (7): covers the new tool's parameter handling, interval bounds, and response shape.

Test plan

- `run_audit_agent` in a live instance
- `schedule_cron_job` callable from an agent's tool list in a live instance