Skip to content

Add HTTP bootstrapper plugin#424

Open
danielgerlag wants to merge 15 commits intodrasi-project:mainfrom
danielgerlag:http-boot
Open

Add HTTP bootstrapper plugin#424
danielgerlag wants to merge 15 commits intodrasi-project:mainfrom
danielgerlag:http-boot

Conversation

@danielgerlag
Copy link
Copy Markdown
Contributor

Summary

Adds a new HTTP bootstrap provider plugin that fetches initial state from REST APIs to populate graph queries.

Features

  • Multi-endpoint support — configure multiple API endpoints per bootstrapper
  • 5 pagination strategies — offset/limit, page number, cursor, Link header (RFC 5988), next-URL-in-body
  • 4 authentication methods — Bearer token, API key (header/query), HTTP Basic, OAuth2 Client Credentials with token caching
  • Multi-format parsing — JSON, YAML, and XML response bodies (auto-detected from Content-Type or configurable override)
  • Handlebars templates — flexible element mapping with type-preserving property resolution
  • Retry with exponential backoff — configurable retries with overflow-safe delay calculation
  • Pagination safety — MAX_PAGES (10,000) limit and cycle detection to prevent infinite loops
  • Full DTO schema — 11 typed DTOs with ToSchema for OpenAPI, deny_unknown_fields, and DtoMapper integration
  • Config validation — validates non-empty URLs, page sizes, required fields, and relation template constraints

Testing

  • 40 unit tests — config deserialization, validation, content parsing, pagination strategies, template type preservation, path extraction
  • 20 integration tests — real HTTP servers via axum covering all pagination types, all auth methods, XML/YAML parsing, descriptor/DTO pipeline, unknown field rejection, validation error propagation, type preservation, label filtering, multi-endpoint, relations, and retry behavior

Files

  • components/bootstrappers/http/ — full plugin crate
  • Cargo.toml — workspace member added

@danielgerlag danielgerlag requested a review from a team as a code owner May 1, 2026 01:12
Comment thread components/bootstrappers/http/tests/integration_test.rs Fixed
Comment thread components/bootstrappers/http/tests/integration_test.rs Dismissed
Comment thread components/bootstrappers/http/tests/integration_test.rs Fixed
Comment thread components/bootstrappers/http/tests/integration_test.rs Dismissed
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new drasi-bootstrap-http bootstrapper plugin crate that can fetch initial graph state from HTTP REST APIs and emit bootstrap events, including support for pagination, authentication, and response-to-element mapping.

Changes:

  • Introduces a new HTTP bootstrapper provider with multi-endpoint fetching, auth strategies, pagination, and template-based mapping.
  • Adds DTO + OpenAPI schema/descriptor integration for plugin config deserialization and validation.
  • Adds extensive integration tests using real axum HTTP servers across pagination/auth/content-type scenarios.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 19 comments.

Show a summary per file
File Description
components/bootstrappers/http/tests/integration_test.rs End-to-end tests for pagination/auth/parsing/mapping and descriptor pipeline
components/bootstrappers/http/src/template_engine.rs Handlebars-based template rendering with “simple path” type preservation
components/bootstrappers/http/src/response.rs Response item extraction + mapping items to Drasi Elements
components/bootstrappers/http/src/provider.rs Main bootstrap provider: request loop, pagination orchestration, retry, label filtering
components/bootstrappers/http/src/pagination.rs Pagination strategy implementations + lightweight JSONPath-like navigation
components/bootstrappers/http/src/lib.rs Plugin crate module exports + dynamic plugin entrypoint
components/bootstrappers/http/src/descriptor.rs DTO types, schema export, DTO→config mapping, provider construction
components/bootstrappers/http/src/content_parser.rs Content-Type detection and JSON/YAML/XML body parsing
components/bootstrappers/http/src/config.rs Core config types + validation logic
components/bootstrappers/http/src/auth.rs Auth resolution (env-driven) + OAuth2 client-credentials token caching
components/bootstrappers/http/README.md User-facing configuration and examples
components/bootstrappers/http/Cargo.toml New crate manifest and dependencies
Cargo.toml Adds the new crate to the workspace members

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread components/bootstrappers/http/tests/integration_test.rs Outdated
Comment thread components/bootstrappers/http/tests/integration_test.rs Outdated
Comment thread components/bootstrappers/http/src/pagination.rs Outdated
Comment thread components/bootstrappers/http/tests/integration_test.rs Outdated
Comment thread components/bootstrappers/http/tests/integration_test.rs Outdated
Comment thread components/bootstrappers/http/src/provider.rs Outdated
Comment thread components/bootstrappers/http/src/provider.rs
Comment thread components/bootstrappers/http/tests/integration_test.rs
Comment thread components/bootstrappers/http/tests/integration_test.rs Outdated
Comment thread components/bootstrappers/http/tests/integration_test.rs Outdated
danielgerlag and others added 3 commits May 1, 2026 15:34
…rsion, error truncation

- Cycle detection uses HashSet to catch multi-hop cycles (A→B→A), not just
  immediate repeats
- Retry backoff is now exponential (base * 2^(attempt-1)) matching docs
- JSON objects convert recursively to ElementValue::Object instead of
  being stringified
- Error response bodies truncated to 256 chars to prevent PII leakage

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ency

Force HTTP/1.1 to prevent body corruption when multiple bootstrap calls
share the same reqwest::Client. HTTP/2 frame reassembly under heavy
multiplexing on a single connection corrupts large response bodies on
page 2+.

Also adds diagnostic logging (body length + first 200 chars) on parse
failures to aid future debugging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add 3 end-to-end integration tests that verify bootstrapped HTTP data
is queryable via DrasiLib's Cypher engine:

- test_bootstrapped_nodes_queryable_via_cypher: simple JSON endpoint → 3 User nodes → MATCH query
- test_paginated_bootstrap_queryable_via_cypher: offset/limit pagination across 3 pages → 15 Product nodes
- test_bootstrapped_relationships_queryable_via_cypher: multi-endpoint (nodes + relations) → graph traversal query

Each test spins up a real axum HTTP server, creates an HttpBootstrapProvider,
wires it through DrasiLib with a Cypher query, and asserts on get_query_results().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread components/bootstrappers/http/tests/e2e_query_test.rs Dismissed
Comment thread components/bootstrappers/http/tests/e2e_query_test.rs Dismissed
@danielgerlag
Copy link
Copy Markdown
Contributor Author

Review feedback addressed in commit 88a1530

All issues from this review have been addressed:

Already fixed in prior commits:

  • Exponential backoff (provider.rs:277) — Fixed in earlier commit with base * 2^(attempt-1) using checked_shl
  • Object conversion (response.rs:197) — Fixed in earlier commit with recursive ElementValue::Object instead of JSON stringify
  • Cycle detection (outdated) — Fixed with HashSet<(url, params)> tracking all seen states
  • Error body truncation (outdated) — Fixed with 256 char limit
  • Unused rx variables (lines 642, 714, 796, 894, 1066, 1288) — Already fixed in prior refactor

Fixed in this commit:

  • Unused Cargo.toml deps — Removed uuid, jsonpath-rust, and reqwest stream feature
  • Unused variable bindingsmut rxrx, properties.. in match patterns
  • Label filter allocation (provider.rs:384) — Eliminated per-label String allocation by comparing Arc<str> directly against &str via as_str() == &**l

DevSkim false positives:

The localhost references (integration_test.rs:48, 53, 370 and e2e_query_test.rs:116, 121) are in test code that binds to 127.0.0.1:0 for ephemeral test servers. These already have // DevSkim: ignore DS137138 suppression comments where needed. The remaining unsuppressed ones are in the axum TcpListener::bind call which is test-only infrastructure.

- Remove unused imports: Context (pagination.rs), anyhow! (content_parser.rs),
  State/SocketAddr/BootstrapResult (integration_test.rs)
- Remove unused Cargo.toml deps: uuid, jsonpath-rust, reqwest stream feature
- Fix unused variable bindings: mut rx → rx, properties → ..
- Eliminate per-label String allocation in should_include_element by
  comparing Arc<str> directly against &str without allocating

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread components/bootstrappers/http/tests/integration_test.rs Dismissed
@agentofreality agentofreality marked this pull request as draft May 6, 2026 17:35
danielgerlag and others added 4 commits May 6, 2026 13:35
Add with_bootstrap_for_source() to DrasiLibBuilder so that programmatic
users can register bootstrap provider metadata (kind + properties) that
gets persisted in the component graph and appears in configuration
snapshots.

This enables config persistence for the HTTP bootstrapper when used
through the builder API (not just the descriptor/server path).

Includes two E2E tests:
- snapshot_captures_bootstrap_via_builder_api: verifies metadata
  registration and snapshot retrieval
- snapshot_bootstrap_builder_api_json_roundtrip: verifies lossless
  JSON serialization/deserialization of bootstrap config

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Match the HTTP source's convention of using kebab-case for serde
discriminator tags (e.g., 'offset-limit', 'link-header',
'oauth2-client-credentials', 'api-key', 'next-url') instead of
snake_case. Single-word variants (bearer, basic, cursor, header,
query) are unaffected.

Updated: DTOs, internal config, tests, and README.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown

@drasi-reviewer drasi-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Correctness Review

The HTTP bootstrap plugin is well-structured: double-checked locking for OAuth2 token caching is correct, the cycle-detection guard (seen_requests) prevents infinite pagination loops, navigate_segment handles negative array indices with a proper bounds check, and the Paginator state-machine pattern is sound. Error propagation follows project conventions throughout.

Three findings, none of them blockers:

  • 🟡 Exponential backoff not capped (provider.rs line 274-275): 1u64 << (attempt - 1) grows without bound. At max_retries = 31 the last delay is ≈12 days; at ≥65 the checked_shl overflows to u64::MAX, clamps to u32::MAX, and produces a ≈136-year sleep. max_retries has no validated upper bound in config. Inline comment with suggested fix posted.

  • 🔵 Link header comma-split (pagination.rs line 441): split(',') on the raw Link header value can split a URL that legally contains an unencoded comma, silently missing rel="next". Inline comment with a bracket-aware alternative posted.

  • 🔵 context_to_json silently swallows errors (template_engine.rs line 170): unwrap_or(JsonValue::Null) hides a serialization failure; should be expect(...) or propagate the error. Inline comment posted.

Generated by pr-correctness-reviewer for issue #424 · ● 1.2M

Comment thread components/bootstrappers/http/src/provider.rs Outdated
Comment thread components/bootstrappers/http/src/pagination.rs Outdated
Comment thread components/bootstrappers/http/src/template_engine.rs Outdated
Copy link
Copy Markdown

@drasi-reviewer drasi-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security Review

This PR adds a new HTTP bootstrap provider with solid fundamentals (credentials via env vars, cycle detection, page limits), but the two redirect-following pagination modes introduce SSRF vulnerabilities that should be addressed before merge.

The next-url and link-header paginators follow URLs sourced from attacker-influenced data (HTTP response body and response headers, respectively) without restricting them to the original endpoint's hostname. In a cloud-deployed Drasi instance, a compromised or malicious API endpoint could redirect the bootstrapper to IMDS (169.254.169.254), internal Kubernetes service IPs, or any other HTTP-reachable service, with the responses ingested as graph data. Both are filed as inline comments above.

A secondary issue is that the full OAuth2 token error response body is logged, which can leak auth-server internals to log aggregation systems.

Summary of findings (all filed inline):

  • 🔴 pagination.rs:308 — SSRF via attacker-controlled next-URL in response body
  • 🔴 pagination.rs:280 — SSRF via attacker-controlled URL in Link response header
  • 🟡 auth.rs:152 — Full OAuth2 error response body propagated to logs
  • 🔵 config.rs:234 — URL scheme not validated (defense-in-depth)

Generated by pr-security-reviewer for issue #424 · ● 1.2M

Comment thread components/bootstrappers/http/src/pagination.rs
Comment thread components/bootstrappers/http/src/pagination.rs Outdated
Comment thread components/bootstrappers/http/src/auth.rs
Comment thread components/bootstrappers/http/src/config.rs
Copy link
Copy Markdown

@drasi-reviewer drasi-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏗️ Design Review

The new components/bootstrappers/http crate follows the established bootstrapper plugin pattern (separate crate, BootstrapProvider trait, export_plugin! entry point) and is structurally consistent with drasi-bootstrap-mssql. The module decomposition into config, auth, pagination, template_engine, response, and descriptor is clean and appropriately separates concerns. The dual config/DTO model (internal config.rs types vs. descriptor.rs DTO types with ConfigValue<T>) is consistent with existing plugin patterns.

Four design issues are flagged via inline comments:

  • provider.rs line 48 🟡 — resolved_auths parallel-indexed to endpoints without type-system enforcement; bundle them into a ResolvedEndpoint struct.
  • provider.rs line 93 🟡 — TemplateEngine (Handlebars) reconstructed on every endpoint fetch; move it to the provider constructor.
  • pagination.rs line 32 🔵 — Paginator trait adds heap allocation and indirection with no real extensibility since all construction goes through an exhaustive-match factory.
  • lib/src/builder.rs line 133 🟡 — with_bootstrap_for_source registers metadata independently from the actual provider instance, making it easy to silently mismatch; couple them at the same call site.

Generated by pr-design-reviewer for issue #424 · ● 652.4K

Comment thread components/bootstrappers/http/src/provider.rs Outdated
Comment thread components/bootstrappers/http/src/pagination.rs
Comment thread lib/src/builder.rs
Comment thread components/bootstrappers/http/src/provider.rs Outdated
Copy link
Copy Markdown

@drasi-reviewer drasi-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧪 Testing Review

Overall coverage is solid: 40 unit tests span config deserialization, validation, content parsing, pagination state machines, and template type preservation; 20 integration tests exercise real HTTP servers via axum covering all pagination strategies, all auth methods, XML/YAML, descriptor/DTO pipeline, multi-endpoint, relations, and retry recovery. The e2e query tests tie the full stack together. Three issues are worth addressing before merge:

  • The collect_events test helper uses try_recv() which is non-blocking and will silently miss events if they haven't been scheduled into the channel yet, making several tests potentially flaky (see inline comment at line 71 of integration_test.rs).
  • The retry test only covers the happy path (failure → success); the path where all retries are exhausted is untested (see inline comment at line 995 of integration_test.rs).
  • The OAuth2TokenProvider::get_token() caching and expiry logic has no unit tests in auth.rs; the integration test only exercises a single successful fetch (see inline comment at line 94 of auth.rs).

Generated by pr-testing-reviewer for issue #424 · ● 1.4M

Comment thread components/bootstrappers/http/tests/integration_test.rs Outdated
Comment thread components/bootstrappers/http/tests/integration_test.rs
Comment thread components/bootstrappers/http/src/auth.rs
Copy link
Copy Markdown

@drasi-reviewer drasi-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Documentation Review

The README is thorough and well-structured: configuration reference tables are complete, all five pagination strategies and four auth methods have concise explanations with JSON examples, and the real-world API walkthroughs add good practical value. Rust doc comments on public structs, enums, and methods across config.rs, auth.rs, provider.rs, and builder.rs are complete and accurate. Two minor issues found — one factual inaccuracy in the config table and one example snippet inconsistency.

Generated by pr-docs-reviewer for issue #424 · ● 1.1M

Comment thread components/bootstrappers/http/README.md Outdated
Comment thread components/bootstrappers/http/README.md Outdated
Copy link
Copy Markdown

@drasi-reviewer drasi-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 Prior Art Review

The PR makes good use of existing crates throughout: reqwest for HTTP, handlebars for templating, serde_json/serde_yaml/quick-xml for content parsing, and a sensible hand-rolled OAuth2 client-credentials flow (the oauth2 crate would be disproportionately complex here). The only place where a well-maintained crate could replace custom code is the JSONPath navigation helper — see the inline comment on pagination.rs:351.

Generated by pr-prior-art-reviewer for issue #424 · ● 487.1K

Comment thread components/bootstrappers/http/src/pagination.rs
…w findings

Security (Critical):
- Add SSRF protection: pagination-followed URLs must share the same
  host as the original endpoint URL. Rejects cross-host redirects
  in both link-header and next-url pagination strategies.
- Add URL scheme validation in config (must be http:// or https://).
- Truncate OAuth2 error response bodies to 256 chars in logs.

Correctness:
- Cap exponential backoff at 60 seconds (MAX_RETRY_DELAY) to prevent
  unbounded sleep times.
- Fix Link header parsing to split on commas outside angle brackets,
  preventing false splits on URLs containing commas.
- Replace unwrap_or(Null) with expect() in context_to_json to fail
  fast on serialization issues rather than silently dropping data.

Design:
- Bundle resolved_auths into ResolvedEndpoint struct to eliminate
  fragile parallel-array indexing.
- Move TemplateEngine construction to provider constructor (avoid
  repeated Handlebars initialization per endpoint per call).

Testing:
- Fix collect_events() to use recv().await (blocking) instead of
  racy try_recv() that could miss events.
- Add test_retries_exhausted_returns_error integration test.
- Add 3 OAuth2 unit tests: token caching, expiry refresh, error
  truncation.
- Add 3 SSRF unit tests: rejects different host, allows same host,
  rejects non-HTTP scheme.

Documentation:
- Fix README backoff description (exponential, capped at 60s).
- Remove password_env: null from example (omit optional fields).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielgerlag
Copy link
Copy Markdown
Contributor Author

Addressing Review Findings

All findings from the drasi-reviewer bot have been addressed in commit 10c4b41. Here is how each was resolved:

🔴 Critical — SSRF (Threads 8, 9)

Fixed. Added validate_pagination_url() that enforces same-host policy for pagination-followed URLs. Both link-header and next-url paginators now reject URLs whose host differs from the original endpoint. Added 3 unit tests covering: different-host rejection, same-host acceptance, and non-HTTP scheme rejection.

🟡 Should-Fix

# Finding Resolution
5 Exponential backoff unbounded Fixed. Added MAX_RETRY_DELAY = 60s cap via .min(MAX_RETRY_DELAY)
10 OAuth2 error body in logs Fixed. Truncated to 256 chars with "... (truncated)" suffix
12 Parallel array coupling Fixed. Introduced ResolvedEndpoint struct bundling config + auth
14 Builder metadata disconnect Acknowledged. This is inherent to the external-metadata pattern (no FFI changes). The builder docs make the coupling requirement explicit.
15 TemplateEngine per-call Fixed. Moved to provider constructor, stored as field
16 try_recv() racy Fixed. Changed collect_events() to use rx.recv().await (blocks until sender drops)
17 Exhausted retries untested Fixed. Added test_retries_exhausted_returns_error integration test
18 OAuth2 caching untested Fixed. Added 3 unit tests: test_oauth2_token_caching, test_oauth2_token_refresh_on_expiry, test_oauth2_error_is_truncated
19 README backoff description Fixed. Updated to "grows exponentially (delay × 2^(attempt−1)), capped at 60 seconds"
20 password_env: null in example Fixed. Omitted the optional field entirely

🔵 Nits

# Finding Resolution
6 Link header comma-split Fixed. Replaced split(",") with bracket-aware split_link_header() that only splits outside <>
7 context_to_json swallows errors Fixed. Changed to expect() with descriptive message
11 No URL scheme validation Fixed. Added check in EndpointConfig::validate() requiring http:// or https:// prefix
13 Paginator trait adds indirection Won't fix. The trait keeps each pagination strategy's state machine cleanly encapsulated; the heap allocation is negligible for a bootstrap-time operation.
21 Use serde_json_path crate Won't fix for now. Our subset (field access, array index, negative index) is 40 lines and avoids a new dependency. If we need wildcard/filter expressions later, we'll adopt the crate.

Thread 1–2 (copilot-pull-request-reviewer): Unused imports/vars

These were from an older revision and no longer apply to the current code.

Thread 3 (copilot-pull-request-reviewer): JSON objects stringified

This was fixed in an earlier commit — json_value_to_element_value now recursively converts objects to ElementValue::Object(ElementPropertyMap).


Test results: 46 unit tests + 21 integration tests = 67 total, all passing.

Comment thread components/bootstrappers/http/src/auth.rs
Comment thread components/bootstrappers/http/src/auth.rs
Comment thread components/bootstrappers/http/src/auth.rs
Comment thread components/bootstrappers/http/src/auth.rs
Comment thread components/bootstrappers/http/src/auth.rs Dismissed
Comment thread components/bootstrappers/http/src/auth.rs
danielgerlag and others added 2 commits May 7, 2026 15:46
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielgerlag danielgerlag marked this pull request as ready for review May 7, 2026 15:55
@github-actions github-actions Bot added the need-2nd-review Has one approval, needs a second review label May 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

need-2nd-review Has one approval, needs a second review review:all

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants