Add HTTP bootstrapper plugin#424
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new drasi-bootstrap-http bootstrapper plugin crate that can fetch initial graph state from HTTP REST APIs and emit bootstrap events, including support for pagination, authentication, and response-to-element mapping.
Changes:
- Introduces a new HTTP bootstrapper provider with multi-endpoint fetching, auth strategies, pagination, and template-based mapping.
- Adds DTO + OpenAPI schema/descriptor integration for plugin config deserialization and validation.
- Adds extensive integration tests using real axum HTTP servers across pagination/auth/content-type scenarios.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 19 comments.
Show a summary per file
| File | Description |
|---|---|
| components/bootstrappers/http/tests/integration_test.rs | End-to-end tests for pagination/auth/parsing/mapping and descriptor pipeline |
| components/bootstrappers/http/src/template_engine.rs | Handlebars-based template rendering with “simple path” type preservation |
| components/bootstrappers/http/src/response.rs | Response item extraction + mapping items to Drasi Elements |
| components/bootstrappers/http/src/provider.rs | Main bootstrap provider: request loop, pagination orchestration, retry, label filtering |
| components/bootstrappers/http/src/pagination.rs | Pagination strategy implementations + lightweight JSONPath-like navigation |
| components/bootstrappers/http/src/lib.rs | Plugin crate module exports + dynamic plugin entrypoint |
| components/bootstrappers/http/src/descriptor.rs | DTO types, schema export, DTO→config mapping, provider construction |
| components/bootstrappers/http/src/content_parser.rs | Content-Type detection and JSON/YAML/XML body parsing |
| components/bootstrappers/http/src/config.rs | Core config types + validation logic |
| components/bootstrappers/http/src/auth.rs | Auth resolution (env-driven) + OAuth2 client-credentials token caching |
| components/bootstrappers/http/README.md | User-facing configuration and examples |
| components/bootstrappers/http/Cargo.toml | New crate manifest and dependencies |
| Cargo.toml | Adds the new crate to the workspace members |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…rsion, error truncation - Cycle detection uses HashSet to catch multi-hop cycles (A→B→A), not just immediate repeats - Retry backoff is now exponential (base * 2^(attempt-1)) matching docs - JSON objects convert recursively to ElementValue::Object instead of being stringified - Error response bodies truncated to 256 chars to prevent PII leakage Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ency Force HTTP/1.1 to prevent body corruption when multiple bootstrap calls share the same reqwest::Client. HTTP/2 frame reassembly under heavy multiplexing on a single connection corrupts large response bodies on page 2+. Also adds diagnostic logging (body length + first 200 chars) on parse failures to aid future debugging. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add 3 end-to-end integration tests that verify bootstrapped HTTP data is queryable via DrasiLib's Cypher engine: - test_bootstrapped_nodes_queryable_via_cypher: simple JSON endpoint → 3 User nodes → MATCH query - test_paginated_bootstrap_queryable_via_cypher: offset/limit pagination across 3 pages → 15 Product nodes - test_bootstrapped_relationships_queryable_via_cypher: multi-endpoint (nodes + relations) → graph traversal query Each test spins up a real axum HTTP server, creates an HttpBootstrapProvider, wires it through DrasiLib with a Cypher query, and asserts on get_query_results(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review feedback addressed in commit 88a1530All issues from this review have been addressed: Already fixed in prior commits:
Fixed in this commit:
DevSkim false positives:The localhost references (integration_test.rs:48, 53, 370 and e2e_query_test.rs:116, 121) are in test code that binds to |
- Remove unused imports: Context (pagination.rs), anyhow! (content_parser.rs), State/SocketAddr/BootstrapResult (integration_test.rs) - Remove unused Cargo.toml deps: uuid, jsonpath-rust, reqwest stream feature - Fix unused variable bindings: mut rx → rx, properties → .. - Eliminate per-label String allocation in should_include_element by comparing Arc<str> directly against &str without allocating Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add with_bootstrap_for_source() to DrasiLibBuilder so that programmatic users can register bootstrap provider metadata (kind + properties) that gets persisted in the component graph and appears in configuration snapshots. This enables config persistence for the HTTP bootstrapper when used through the builder API (not just the descriptor/server path). Includes two E2E tests: - snapshot_captures_bootstrap_via_builder_api: verifies metadata registration and snapshot retrieval - snapshot_bootstrap_builder_api_json_roundtrip: verifies lossless JSON serialization/deserialization of bootstrap config Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Match the HTTP source's convention of using kebab-case for serde discriminator tags (e.g., 'offset-limit', 'link-header', 'oauth2-client-credentials', 'api-key', 'next-url') instead of snake_case. Single-word variants (bearer, basic, cursor, header, query) are unaffected. Updated: DTOs, internal config, tests, and README. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
✅ Correctness Review
The HTTP bootstrap plugin is well-structured: double-checked locking for OAuth2 token caching is correct, the cycle-detection guard (seen_requests) prevents infinite pagination loops, navigate_segment handles negative array indices with a proper bounds check, and the Paginator state-machine pattern is sound. Error propagation follows project conventions throughout.
Three findings, none of them blockers:
-
🟡 Exponential backoff not capped (
provider.rsline 274-275):1u64 << (attempt - 1)grows without bound. Atmax_retries = 31the last delay is ≈12 days; at ≥65 thechecked_shloverflows tou64::MAX, clamps tou32::MAX, and produces a ≈136-year sleep.max_retrieshas no validated upper bound in config. Inline comment with suggested fix posted. -
🔵 Link header comma-split (
pagination.rsline 441):split(',')on the rawLinkheader value can split a URL that legally contains an unencoded comma, silently missingrel="next". Inline comment with a bracket-aware alternative posted. -
🔵
context_to_jsonsilently swallows errors (template_engine.rsline 170):unwrap_or(JsonValue::Null)hides a serialization failure; should beexpect(...)or propagate the error. Inline comment posted.
Generated by pr-correctness-reviewer for issue #424 · ● 1.2M
There was a problem hiding this comment.
🔒 Security Review
This PR adds a new HTTP bootstrap provider with solid fundamentals (credentials via env vars, cycle detection, page limits), but the two redirect-following pagination modes introduce SSRF vulnerabilities that should be addressed before merge.
The next-url and link-header paginators follow URLs sourced from attacker-influenced data (HTTP response body and response headers, respectively) without restricting them to the original endpoint's hostname. In a cloud-deployed Drasi instance, a compromised or malicious API endpoint could redirect the bootstrapper to IMDS (169.254.169.254), internal Kubernetes service IPs, or any other HTTP-reachable service, with the responses ingested as graph data. Both are filed as inline comments above.
A secondary issue is that the full OAuth2 token error response body is logged, which can leak auth-server internals to log aggregation systems.
Summary of findings (all filed inline):
- 🔴
pagination.rs:308— SSRF via attacker-controlled next-URL in response body - 🔴
pagination.rs:280— SSRF via attacker-controlled URL inLinkresponse header - 🟡
auth.rs:152— Full OAuth2 error response body propagated to logs - 🔵
config.rs:234— URL scheme not validated (defense-in-depth)
Generated by pr-security-reviewer for issue #424 · ● 1.2M
There was a problem hiding this comment.
🏗️ Design Review
The new components/bootstrappers/http crate follows the established bootstrapper plugin pattern (separate crate, BootstrapProvider trait, export_plugin! entry point) and is structurally consistent with drasi-bootstrap-mssql. The module decomposition into config, auth, pagination, template_engine, response, and descriptor is clean and appropriately separates concerns. The dual config/DTO model (internal config.rs types vs. descriptor.rs DTO types with ConfigValue<T>) is consistent with existing plugin patterns.
Four design issues are flagged via inline comments:
provider.rsline 48 🟡 —resolved_authsparallel-indexed toendpointswithout type-system enforcement; bundle them into aResolvedEndpointstruct.provider.rsline 93 🟡 —TemplateEngine(Handlebars) reconstructed on every endpoint fetch; move it to the provider constructor.pagination.rsline 32 🔵 —Paginatortrait adds heap allocation and indirection with no real extensibility since all construction goes through an exhaustive-match factory.lib/src/builder.rsline 133 🟡 —with_bootstrap_for_sourceregisters metadata independently from the actual provider instance, making it easy to silently mismatch; couple them at the same call site.
Generated by pr-design-reviewer for issue #424 · ● 652.4K
There was a problem hiding this comment.
🧪 Testing Review
Overall coverage is solid: 40 unit tests span config deserialization, validation, content parsing, pagination state machines, and template type preservation; 20 integration tests exercise real HTTP servers via axum covering all pagination strategies, all auth methods, XML/YAML, descriptor/DTO pipeline, multi-endpoint, relations, and retry recovery. The e2e query tests tie the full stack together. Three issues are worth addressing before merge:
- The
collect_eventstest helper usestry_recv()which is non-blocking and will silently miss events if they haven't been scheduled into the channel yet, making several tests potentially flaky (see inline comment at line 71 ofintegration_test.rs). - The retry test only covers the happy path (failure → success); the path where all retries are exhausted is untested (see inline comment at line 995 of
integration_test.rs). - The
OAuth2TokenProvider::get_token()caching and expiry logic has no unit tests inauth.rs; the integration test only exercises a single successful fetch (see inline comment at line 94 ofauth.rs).
Generated by pr-testing-reviewer for issue #424 · ● 1.4M
There was a problem hiding this comment.
📝 Documentation Review
The README is thorough and well-structured: configuration reference tables are complete, all five pagination strategies and four auth methods have concise explanations with JSON examples, and the real-world API walkthroughs add good practical value. Rust doc comments on public structs, enums, and methods across config.rs, auth.rs, provider.rs, and builder.rs are complete and accurate. Two minor issues found — one factual inaccuracy in the config table and one example snippet inconsistency.
Generated by pr-docs-reviewer for issue #424 · ● 1.1M
There was a problem hiding this comment.
🔍 Prior Art Review
The PR makes good use of existing crates throughout: reqwest for HTTP, handlebars for templating, serde_json/serde_yaml/quick-xml for content parsing, and a sensible hand-rolled OAuth2 client-credentials flow (the oauth2 crate would be disproportionately complex here). The only place where a well-maintained crate could replace custom code is the JSONPath navigation helper — see the inline comment on pagination.rs:351.
Generated by pr-prior-art-reviewer for issue #424 · ● 487.1K
…w findings Security (Critical): - Add SSRF protection: pagination-followed URLs must share the same host as the original endpoint URL. Rejects cross-host redirects in both link-header and next-url pagination strategies. - Add URL scheme validation in config (must be http:// or https://). - Truncate OAuth2 error response bodies to 256 chars in logs. Correctness: - Cap exponential backoff at 60 seconds (MAX_RETRY_DELAY) to prevent unbounded sleep times. - Fix Link header parsing to split on commas outside angle brackets, preventing false splits on URLs containing commas. - Replace unwrap_or(Null) with expect() in context_to_json to fail fast on serialization issues rather than silently dropping data. Design: - Bundle resolved_auths into ResolvedEndpoint struct to eliminate fragile parallel-array indexing. - Move TemplateEngine construction to provider constructor (avoid repeated Handlebars initialization per endpoint per call). Testing: - Fix collect_events() to use recv().await (blocking) instead of racy try_recv() that could miss events. - Add test_retries_exhausted_returns_error integration test. - Add 3 OAuth2 unit tests: token caching, expiry refresh, error truncation. - Add 3 SSRF unit tests: rejects different host, allows same host, rejects non-HTTP scheme. Documentation: - Fix README backoff description (exponential, capped at 60s). - Remove password_env: null from example (omit optional fields). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addressing Review FindingsAll findings from the drasi-reviewer bot have been addressed in commit 🔴 Critical — SSRF (Threads 8, 9)Fixed. Added 🟡 Should-Fix
🔵 Nits
Thread 1–2 (copilot-pull-request-reviewer): Unused imports/varsThese were from an older revision and no longer apply to the current code. Thread 3 (copilot-pull-request-reviewer): JSON objects stringifiedThis was fixed in an earlier commit — Test results: 46 unit tests + 21 integration tests = 67 total, all passing. |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Adds a new HTTP bootstrap provider plugin that fetches initial state from REST APIs to populate graph queries.
Features
ToSchemafor OpenAPI,deny_unknown_fields, andDtoMapperintegrationTesting
Files
components/bootstrappers/http/— full plugin crateCargo.toml— workspace member added