JS Asset Auditor engineering spec #608
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,216 @@ | ||
| # JS Asset Auditor — Engineering Spec | ||
|
|
||
| **Date:** 2026-04-01 | ||
| **Status:** Approved for engineering breakdown | ||
| **Related:** [JS Asset Proxy spec](2026-04-01-js-asset-proxy-design.md) | ||
|
|
||
| --- | ||
|
|
||
| ## Context | ||
|
|
||
| The JS Asset Proxy requires a `js-assets.toml` file declaring which third-party JS assets to proxy. Without tooling, populating this file means manually inspecting network requests in browser DevTools, extracting URLs, generating opaque slugs, and writing TOML. This tedious, error-prone process is a barrier to publisher onboarding. | ||
|
|
||
| The Auditor eliminates this friction. It sweeps a publisher's page using the Chrome DevTools MCP, detects third-party JS assets, auto-generates `js-assets.toml` entries, and auto-detects `inject_in_head` from the page DOM. The operator's only remaining decision is reviewing the output before committing. | ||
|
|
||
| It also runs as a monitoring tool — `--diff` mode compares a new sweep against the existing config and surfaces new or removed assets, giving publishers ongoing visibility into their third-party JS footprint. | ||
|
|
||
| **Implementation:** Pure Claude Code skill — no Rust, no compiled code, no additional dependencies. Uses the Chrome DevTools MCP already configured in `.claude/settings.json`. | ||
|
|
||
| --- | ||
|
|
||
| ## Command Interface | ||
|
|
||
| ```bash | ||
| /audit-js-assets https://www.publisher.com # init — generate js-assets.toml | ||
| /audit-js-assets https://www.publisher.com --diff # diff — compare against existing file | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Sweep Protocol | ||
|
|
||
| 1. Read `trusted-server.toml` → extract `publisher.domain` (defines first-party boundary) | ||
| 2. Open Chrome via `mcp__chrome-devtools__new_page`, navigate to target URL via `mcp__chrome-devtools__navigate_page` | ||
| 3. Wait for full page load + ~6s settle window for async script loads (`mcp__chrome-devtools__wait_for`) | ||
| 4. In parallel: | ||
|    - `mcp__chrome-devtools__list_network_requests` → filter for requests where the URL path ends in `.js` or the response `Content-Type` is `application/javascript`, and the origin ≠ `publisher.domain` | ||
|
|
||
| - `mcp__chrome-devtools__evaluate_script` → `Array.from(document.head.querySelectorAll('script[src]')).map(s => s.src)` → collect head-loaded script URLs | ||
|
|
||
| 5. Apply heuristic filter (see below) | ||
| 6. For each surviving asset, generate a `[[js_assets]]` entry (see below) | ||
|
|
||
| 7. Write output (init or diff mode) | ||
|
|
||
| 8. Print terminal summary | ||
| 9. Close page via `mcp__chrome-devtools__close_page` | ||
|
|
||
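The request filter in step 4 can be sketched as follows. This is a minimal illustration, not the skill's actual logic: the function name `is_third_party_js` is hypothetical, and treating subdomains of `publisher.domain` (such as `www.publisher.com`) as first-party is an assumption the spec does not state.

```python
from typing import Optional
from urllib.parse import urlsplit


def is_third_party_js(url: str, content_type: Optional[str], publisher_domain: str) -> bool:
    """Return True for JS requests whose host is outside the publisher's domain."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    domain = publisher_domain.lower()

    # Assumption: the publisher domain and all of its subdomains are first-party.
    is_first_party = host == domain or host.endswith("." + domain)

    # Compare only the media type, ignoring parameters such as "; charset=utf-8".
    media_type = (content_type or "").split(";")[0].strip().lower()

    # urlsplit() keeps the query string out of .path, so "lib.js?v=3" still matches.
    looks_like_js = parts.path.lower().endswith(".js") or media_type == "application/javascript"
    return looks_like_js and not is_first_party
```

Under these assumptions, `https://www.publisher.com/static/app.js` is dropped as first-party for `publisher.domain = "publisher.com"`, while a vendor URL ending in `.js` is kept.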
| --- | ||
|
|
||
| ## Heuristic Filter | ||
|
|
||
| The following origin categories are excluded from the generated entries without prompting the operator. The terminal summary still reports what was filtered and why, so operators can manually add entries if needed. | ||
|
|
||
| | Category | Excluded origins | | ||
| |---|---| | ||
| | Framework CDNs | `cdnjs.cloudflare.com`, `ajax.googleapis.com`, `cdn.jsdelivr.net`, `unpkg.com` | | ||
| | Error tracking | `sentry.io`, `bugsnag.com`, `rollbar.com` | | ||
| | Font services | `fonts.googleapis.com`, `fonts.gstatic.com` | | ||
| | Social embeds | `platform.twitter.com`, `connect.facebook.net` | | ||
|
|
||
| **`googletagmanager.com` is not filtered** — GTM is ad tech and should be proxied. | ||
|
|
||
|
|
||
| Everything else surfaces for operator review. | ||
|
|
||
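One way to apply the exclusion table is sketched below. The origin lists are copied from the table above; the names `EXCLUDED_ORIGINS`, `matched_origin`, and `apply_heuristic_filter` are hypothetical, and matching subdomains of a listed origin is an assumption.

```python
from collections import Counter
from typing import Optional
from urllib.parse import urlsplit

# Origins copied from the heuristic filter table, keyed by category.
EXCLUDED_ORIGINS = {
    "Framework CDNs": ["cdnjs.cloudflare.com", "ajax.googleapis.com", "cdn.jsdelivr.net", "unpkg.com"],
    "Error tracking": ["sentry.io", "bugsnag.com", "rollbar.com"],
    "Font services": ["fonts.googleapis.com", "fonts.gstatic.com"],
    "Social embeds": ["platform.twitter.com", "connect.facebook.net"],
}


def matched_origin(url: str) -> Optional[str]:
    """Return the excluded origin that the URL's host falls under, or None."""
    host = (urlsplit(url).hostname or "").lower()
    for origins in EXCLUDED_ORIGINS.values():
        for origin in origins:
            # Assumption: subdomains of a listed origin are excluded too.
            if host == origin or host.endswith("." + origin):
                return origin
    return None


def apply_heuristic_filter(urls):
    """Split URLs into surfaced assets and per-origin counts of filtered ones."""
    surfaced, filtered = [], Counter()
    for url in urls:
        origin = matched_origin(url)
        if origin is None:
            surfaced.append(url)
        else:
            filtered[origin] += 1
    return surfaced, filtered
```

The per-origin counter maps directly onto the `Filtered:` line of the terminal summary. `googletagmanager.com` is absent from the table, so GTM URLs fall through to `surfaced`.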
| --- | ||
|
|
||
| ## Asset Entry Generation | ||
|
|
||
| | Field | Derivation | | ||
| |---|---| | ||
| | `slug` | `{publisher_prefix}:{asset_stem}` — see slug algorithm below | | ||
| | `path` | `/{publisher_prefix}/{asset_stem}.js`, or wildcard variant if versioned path detected | | ||
|
|
||
| | `origin_url` | Full captured URL, with wildcard substitution applied if versioned | | ||
|
|
||
| | `ttl_sec` | Omitted; the proxy defaults to 1800 s (wildcard) or 3600 s (fixed) | ||
|
|
||
| | `inject_in_head` | `true` if URL appeared in head script list from DOM evaluation, else `false` | | ||
|
|
||
|
|
||
| ### Slug algorithm | ||
|
|
||
| ``` | ||
| publisher_prefix = first_8_chars(base62(sha256(publisher.domain + origin_url))) | ||
| asset_stem = filename_without_extension(origin_url) | ||
| slug = "{publisher_prefix}:{asset_stem}" | ||
| ``` | ||
|
|
||
| **Rationale:** Fully opaque and hash-derived — no human naming required, no ambiguity for cryptic vendor filenames. The KV metadata (`origin_url`, `content_type`, `asset_slug`) serves as the lookup table. Operators can query `js-asset:{slug}` in the KV store to retrieve full provenance. The terminal summary also prints slug → origin_url at generation time. | ||
|
|
||
| **Important:** This algorithm must produce identical output to the Proxy's KV key derivation. Engineering should implement this as a shared utility (e.g., a small JS/TS helper in the skill, or a standalone `scripts/` utility) rather than duplicating the logic. | ||
|
|
||
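The slug pseudocode could be made concrete along these lines. Several details are assumptions the spec leaves open and that would have to match the Proxy's KV key derivation: the domain and URL are concatenated with no separator, the SHA-256 digest is read as a big-endian integer, and the base62 alphabet is `0-9A-Za-z`. All function names are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumption: digits first, then uppercase, then lowercase. base62 has no single standard alphabet.
BASE62_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62(data: bytes) -> str:
    """Encode bytes as base62 by treating them as one big-endian integer."""
    n = int.from_bytes(data, "big")
    if n == 0:
        return BASE62_ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(BASE62_ALPHABET[rem])
    return "".join(reversed(digits))


def publisher_prefix(publisher_domain: str, origin_url: str) -> str:
    """First 8 base62 characters of sha256(domain + url), with no separator (assumption)."""
    digest = hashlib.sha256((publisher_domain + origin_url).encode("utf-8")).digest()
    return base62(digest)[:8]


def asset_stem(origin_url: str) -> str:
    """Filename of the URL path without its final extension."""
    return PurePosixPath(urlsplit(origin_url).path).stem


def make_slug(publisher_domain: str, origin_url: str) -> str:
    return f"{publisher_prefix(publisher_domain, origin_url)}:{asset_stem(origin_url)}"
```

Because the output depends on every one of these choices, pinning them down in one shared helper is what keeps the Auditor and the Proxy in agreement.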
| ### Wildcard detection | ||
|
|
||
| Path segments matching either pattern are replaced with `*`: | ||
| - Semver: `\d+\.\d+[\.\d-]*` (e.g., `1.19.8-hcskhn`) | ||
|
Collaborator
❓ question: Slug algorithm has two ambiguities
1. Concatenation separator:
2. base62 character set: base62 is not standardized; different implementations use
|
||
| - Hash-like: `[a-f0-9]{6,}` or `[A-Za-z0-9]{8,}` between path separators | ||
|
|
||
|
|
||
| The original URL is preserved as a comment above the generated entry so operators can verify the wildcard substitution is correct. | ||
|
|
||
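A sketch of the wildcard substitution using the two patterns as written. It assumes the semver pattern only needs to match at the start of a segment (so `1.19.8-hcskhn` qualifies even though its suffix contains letters) and that the hash-like patterns must match a whole segment. The function names are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Patterns copied from the spec.
SEMVER = re.compile(r"\d+\.\d+[\.\d-]*")
HASH_LIKE = re.compile(r"[a-f0-9]{6,}|[A-Za-z0-9]{8,}")


def is_versioned(segment: str) -> bool:
    # Assumption: semver is anchored at the segment start, hash-like covers the whole segment.
    return bool(SEMVER.match(segment) or HASH_LIKE.fullmatch(segment))


def apply_wildcards(url: str):
    """Return (url_with_wildcards, changed) after replacing versioned path segments with '*'."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    replaced = ["*" if is_versioned(s) else s for s in segments]
    new_url = urlunsplit(parts._replace(path="/".join(replaced)))
    return new_url, replaced != segments
```

With these rules, `https://raven-static.vendor.io/prod/1.19.8-hcskhn/raven.js` becomes `https://raven-static.vendor.io/prod/*/raven.js`, and the prebid URL from the init example is left unchanged because its segments contain hyphens. Note that a plain word such as `decade` satisfies the hex pattern as written, so false positives on ordinary path names are possible.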
| --- | ||
|
|
||
| ## Init Mode Output | ||
|
|
||
| ### `js-assets.toml` (written to repo root) | ||
|
|
||
| ```toml | ||
| # Generated by /audit-js-assets on 2026-04-01 | ||
|
Collaborator
🤔 thinking: Hash-like regex
The hex pattern
Consider requiring mixed character classes (must contain both letters and digits), a higher minimum (12+), or excluding common dictionary words.
|
||
| # Publisher: publisher.com | ||
| # Source URL: https://www.publisher.com | ||
|
|
||
| [[js_assets]] | ||
| # https://web.prebidwrapper.com/golf-WnLmpLyEjL/default-v2/prebid-load.js | ||
| slug = "aB3kR7mN:prebid-load" | ||
| path = "/sdk/aB3kR7mN.js" | ||
| origin_url = "https://web.prebidwrapper.com/golf-WnLmpLyEjL/default-v2/prebid-load.js" | ||
| inject_in_head = true | ||
|
|
||
| [[js_assets]] | ||
| # https://raven-static.vendor.io/prod/1.19.8-hcskhn/raven.js (wildcard detected) | ||
| slug = "xQ9pL2wY:raven" | ||
| path = "/raven-static/*" | ||
| origin_url = "https://raven-static.vendor.io/prod/*/raven.js" | ||
| inject_in_head = false | ||
| ``` | ||
|
|
||
| ### Terminal summary | ||
|
|
||
| ``` | ||
| JS Asset Audit — publisher.com | ||
| ──────────────────────────────── | ||
| Detected: 8 third-party JS requests | ||
| Filtered: 3 (cdnjs.cloudflare.com ×2, sentry.io ×1) | ||
| Surfaced: 5 assets → js-assets.toml | ||
|
|
||
| aB3kR7mN inject_in_head=true web.prebidwrapper.com/.../prebid-load.js | ||
| xQ9pL2wY inject_in_head=false raven-static.vendor.io/prod/*/raven.js [wildcard] | ||
| zM4nK8vP inject_in_head=true googletagmanager.com/gtm.js | ||
| ... | ||
|
|
||
| Review inject_in_head values and commit js-assets.toml when ready. | ||
| Diff mode: /audit-js-assets <url> --diff | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
|
|
||
| ## Diff Mode Output | ||
|
|
||
| Compares sweep results against the existing `js-assets.toml`. | ||
|
|
||
| | Condition | Behavior | | ||
| |---|---| | ||
| | Asset in sweep, not in file | **New** — appended to `js-assets.toml` as a commented-out block | | ||
| | Asset in file, not in sweep | **Missing** — flagged in terminal summary with `⚠`. Never auto-removed. | | ||
| | Asset in both | **Confirmed** — listed as present | | ||
|
|
||
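The three conditions in the table reduce to set operations. The sketch below assumes assets are identified by their `origin_url` after wildcard substitution; the spec does not say which field is the comparison key, and `classify_assets` is a hypothetical name.

```python
def classify_assets(sweep_urls, file_urls):
    """Bucket assets into new / missing / confirmed by comparing origin_url values."""
    sweep, existing = set(sweep_urls), set(file_urls)
    return {
        "new": sorted(sweep - existing),        # in sweep, not in file
        "missing": sorted(existing - sweep),    # in file, not in sweep (never auto-removed)
        "confirmed": sorted(sweep & existing),  # in both
    }
```

For example, a sweep that finds GTM and the prebid loader, compared against a file listing the prebid loader and raven, yields GTM as new, raven as missing, and the prebid loader as confirmed.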
| New entries are appended as TOML comments so the file stays valid and nothing is activated without the operator explicitly uncommenting. | ||
|
|
||
| ### `js-assets.toml` (new entry appended as comment) | ||
|
|
||
| ```toml | ||
| # --- NEW (detected by /audit-js-assets --diff on 2026-04-01, uncomment to activate) --- | ||
| # [[js_assets]] | ||
| # # https://googletagmanager.com/gtm.js | ||
| # slug = "zM4nK8vP:gtm" | ||
| # path = "/sdk/zM4nK8vP.js" | ||
| # origin_url = "https://googletagmanager.com/gtm.js" | ||
| # inject_in_head = true | ||
| ``` | ||
|
|
||
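Rendering an entry as a commented-out block could look like the sketch below. `render_entry` is a hypothetical helper, and it assumes field values contain no quotes or backslashes that would need TOML string escaping.

```python
def render_entry(entry: dict, captured_url: str, commented: bool = False) -> str:
    """Render one [[js_assets]] table; prefix every line with '# ' when commented."""
    flag = "true" if entry["inject_in_head"] else "false"
    lines = [
        "[[js_assets]]",
        f"# {captured_url}",  # original URL kept as a comment for operator review
        f'slug = "{entry["slug"]}"',
        f'path = "{entry["path"]}"',
        f'origin_url = "{entry["origin_url"]}"',
        f"inject_in_head = {flag}",
    ]
    if commented:
        # Commenting every line keeps the file valid TOML and the entry inactive.
        lines = [f"# {line}" for line in lines]
    return "\n".join(lines)
```

Calling it with `commented=True` on the GTM entry reproduces the body of the block above, minus the `--- NEW ... ---` banner line. The same helper with `commented=False` would serve init mode.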
| ### Terminal summary (diff mode) | ||
|
|
||
| ``` | ||
| JS Asset Audit (diff) — publisher.com | ||
| ──────────────────────────────── | ||
| Confirmed: 4 assets still present on page | ||
| New: 1 asset detected (appended as comment to js-assets.toml) | ||
| Missing: 1 asset no longer seen on page ⚠ | ||
|
|
||
| NEW zM4nK8vP googletagmanager.com/gtm.js → review in js-assets.toml | ||
| MISSING xQ9pL2wY raven-static.vendor.io/... → may have been removed or renamed | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Implementation | ||
|
|
||
| The Auditor is a Claude Code skill file. No compiled code. | ||
|
|
||
| **Skill location:** `.claude/skills/audit-js-assets.md` | ||
|
|
||
| **MCP tools used:** | ||
| - `mcp__chrome-devtools__new_page` — open browser tab | ||
| - `mcp__chrome-devtools__navigate_page` — load publisher URL | ||
|
|
||
| - `mcp__chrome-devtools__wait_for` — settle after page load | ||
| - `mcp__chrome-devtools__list_network_requests` — capture JS requests | ||
| - `mcp__chrome-devtools__evaluate_script` — detect head-loaded scripts via DOM query | ||
| - `mcp__chrome-devtools__close_page` — clean up tab | ||
|
|
||
| **File tools used:** | ||
| - `Read` — read `trusted-server.toml` (publisher domain) and existing `js-assets.toml` (diff mode) | ||
| - `Write` — write generated/updated `js-assets.toml` | ||
|
|
||
| --- | ||
|
|
||
| ## Delivery Order | ||
|
|
||
| The Auditor should be delivered **after Proxy Phase 1** (so `js-assets.toml` schema is defined) and **before Proxy Phase 2** (so engineering has real populated entries to test the cache pipeline against actual vendor origins). | ||
|
|
||
| See [delivery order in the Proxy spec](2026-04-01-js-asset-proxy-design.md). | ||
|
|
||
| --- | ||
|
|
||
| ## Verification | ||
|
|
||
| - Run `/audit-js-assets https://www.publisher.com` against a known test publisher page with identified third-party JS | ||
| - Verify generated entries match actual third-party JS observed on the page (cross-check in browser DevTools) | ||
| - Verify `inject_in_head = true` only for scripts that appear in `<head>` (not `<body>`) | ||
| - Verify wildcard detection fires for versioned path segments and not for stable paths | ||
| - Verify GTM (`googletagmanager.com`) is captured and not filtered | ||
| - Verify framework CDNs (`cdnjs.cloudflare.com` etc.) are filtered with reason in summary | ||
| - Run `--diff` against an unchanged page → all entries confirmed, no new/missing | ||
| - Run `--diff` after adding a new vendor script to the page → appears as `NEW` in summary | ||
| - Run `--diff` after removing a script → appears as `MISSING ⚠` in summary, file unchanged | ||