`docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md`: 216 additions, 0 deletions

# JS Asset Auditor — Engineering Spec

**Date:** 2026-04-01
**Status:** Approved for engineering breakdown
**Related:** [JS Asset Proxy spec](2026-04-01-js-asset-proxy-design.md)

---

## Context

The JS Asset Proxy requires a `js-assets.toml` file declaring which third-party JS assets to proxy. Without tooling, populating this file means manually inspecting network requests in browser DevTools, extracting URLs, generating opaque slugs, and writing TOML by hand: a tedious, error-prone process that is a barrier to publisher onboarding.

The Auditor eliminates this friction. It sweeps a publisher's page using the Chrome DevTools MCP, detects third-party JS assets, auto-generates `js-assets.toml` entries, and auto-detects `inject_in_head` from the page DOM. The operator's only remaining decision is reviewing the output before committing.

It also runs as a monitoring tool — `--diff` mode compares a new sweep against the existing config and surfaces new or removed assets, giving publishers ongoing visibility into their third-party JS footprint.

**Implementation:** Pure Claude Code skill — no Rust, no compiled code, no additional dependencies. Uses the Chrome DevTools MCP already configured in `.claude/settings.json`.

---

## Command Interface

```bash
/audit-js-assets https://www.publisher.com # init — generate js-assets.toml
/audit-js-assets https://www.publisher.com --diff # diff — compare against existing file
```

---

## Sweep Protocol

1. Read `trusted-server.toml` → extract `publisher.domain` (defines first-party boundary)
2. Open Chrome via `mcp__chrome-devtools__new_page`, navigate to target URL via `mcp__chrome-devtools__navigate_page`
3. Wait for the full page load plus a ~6s settle window so asynchronously loaded scripts are captured (`mcp__chrome-devtools__wait_for`)
4. In parallel:
   - `mcp__chrome-devtools__list_network_requests` → keep requests whose URL ends in `.js` or whose response has `Content-Type: application/javascript`, and whose origin ≠ `publisher.domain`
   - `mcp__chrome-devtools__evaluate_script` → `Array.from(document.head.querySelectorAll('script[src]')).map(s => s.src)` → collect head-loaded script URLs
5. Apply heuristic filter (see below)
6. For each surviving asset, generate a `[[js_assets]]` entry (see below)
7. Write output (init or diff mode)
8. Print terminal summary
9. Close page via `mcp__chrome-devtools__close_page`
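The request filter in step 4 can be sketched as a pure function over captured requests. This is a minimal TypeScript sketch, not the skill's implementation, and it makes three assumptions: the `CapturedRequest` shape is hypothetical (the real `list_network_requests` output may differ), the URL *path* is checked for `.js` so query strings such as `?id=...` do not hide the extension, and `publisher.domain` plus all of its subdomains (e.g. `www.publisher.com`) count as first party.

```typescript
// Hypothetical shape of one captured request; the real MCP output may differ.
export interface CapturedRequest {
  url: string;
  contentType?: string;
}

// Assumption: the publisher domain and all of its subdomains count as first party.
function isFirstParty(host: string, publisherDomain: string): boolean {
  const h = host.toLowerCase();
  const d = publisherDomain.toLowerCase();
  return h === d || h.endsWith("." + d);
}

// JS if the path ends in ".js" or the media type is application/javascript
// (any "; charset=..." parameter is ignored).
function looksLikeJs(url: URL, contentType: string | undefined): boolean {
  const mediaType = (contentType ?? "").split(";")[0].trim().toLowerCase();
  return url.pathname.endsWith(".js") || mediaType === "application/javascript";
}

// Returns de-duplicated third-party script URLs in first-seen order.
export function thirdPartyScriptUrls(
  requests: CapturedRequest[],
  publisherDomain: string,
): string[] {
  const seen = new Set<string>();
  for (const req of requests) {
    let url: URL;
    try {
      url = new URL(req.url);
    } catch {
      continue; // skip anything that is not an absolute URL
    }
    if (url.protocol !== "https:" && url.protocol !== "http:") continue;
    if (isFirstParty(url.hostname, publisherDomain)) continue;
    if (looksLikeJs(url, req.contentType)) seen.add(url.href);
  }
  return Array.from(seen);
}
```

Whether sibling hosts such as `cdn.publisher.com` should count as first party is a policy choice; the subdomain rule above is only one reading of "first-party boundary".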

---

## Heuristic Filter

Requests from the following origin categories are excluded without generating entries. The terminal summary still reports what was filtered and why, so operators can add entries manually if needed.

| Category | Excluded origins |
|---|---|
| Framework CDNs | `cdnjs.cloudflare.com`, `ajax.googleapis.com`, `cdn.jsdelivr.net`, `unpkg.com` |
| Error tracking | `sentry.io`, `bugsnag.com`, `rollbar.com` |
| Font services | `fonts.googleapis.com`, `fonts.gstatic.com` |
| Social embeds | `platform.twitter.com`, `connect.facebook.net` |

**`googletagmanager.com` is not filtered** — GTM is ad tech and should be proxied.

Everything else surfaces for operator review.
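A minimal TypeScript sketch of the exclusion check, transcribing the table above into data. It assumes a host matches an excluded origin when it equals that origin or is a subdomain of it (so a host like `x.sentry.io` is also filtered); the spec does not say how subdomains are handled. `googletagmanager.com` is deliberately absent, so GTM falls through to operator review. `exclusionFor` is a hypothetical name.

```typescript
interface ExclusionRule {
  category: string;
  origins: string[];
}

// Excluded origin categories, transcribed from the heuristic filter table.
const EXCLUSION_RULES: ExclusionRule[] = [
  { category: "Framework CDNs", origins: ["cdnjs.cloudflare.com", "ajax.googleapis.com", "cdn.jsdelivr.net", "unpkg.com"] },
  { category: "Error tracking", origins: ["sentry.io", "bugsnag.com", "rollbar.com"] },
  { category: "Font services", origins: ["fonts.googleapis.com", "fonts.gstatic.com"] },
  { category: "Social embeds", origins: ["platform.twitter.com", "connect.facebook.net"] },
];

export interface Exclusion {
  category: string;
  origin: string;
}

// Returns why a script URL is filtered, or null if it should surface for review.
export function exclusionFor(scriptUrl: string): Exclusion | null {
  const host = new URL(scriptUrl).hostname;
  for (const rule of EXCLUSION_RULES) {
    for (const origin of rule.origins) {
      // Assumption: subdomains of an excluded origin are excluded too.
      if (host === origin || host.endsWith("." + origin)) {
        return { category: rule.category, origin };
      }
    }
  }
  return null;
}
```

Returning the matched origin (not just a boolean) gives the terminal summary what it needs to report "what was filtered and why".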

---

## Asset Entry Generation

| Field | Derivation |
|---|---|
| `slug` | `{publisher_prefix}:{asset_stem}` (see slug algorithm below) |
| `path` | `/{publisher_prefix}/{asset_stem}.js`, or wildcard variant if versioned path detected |
| `origin_url` | Full captured URL, with wildcard substitution applied if versioned |
| `ttl_sec` | Omitted; proxy defaults to 1800 (wildcard) or 3600 (fixed) |
| `inject_in_head` | `true` if URL appeared in head script list from DOM evaluation, else `false` |

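The table maps onto one `[[js_assets]]` block per asset. A minimal TypeScript sketch of rendering such a block, assuming the field values (slug, path, wildcarded `origin_url`, head flag) have already been derived. `ttl_sec` is left out so the proxy defaults apply, and the captured URL is written as a comment in the position the init-mode example uses. `JsAssetEntry` and `renderJsAssetEntry` are hypothetical names. Escaping reuses `JSON.stringify`, which is adequate for ordinary ASCII URLs and slugs; a TOML library would be more robust.

```typescript
// Hypothetical in-memory form of one generated entry.
export interface JsAssetEntry {
  capturedUrl: string; // URL exactly as seen in the sweep (written as a comment)
  wildcard: boolean; // true if wildcard substitution changed the URL
  slug: string;
  path: string;
  originUrl: string;
  injectInHead: boolean;
}

// For plain ASCII URLs and slugs, JSON string escapes are also valid TOML basic-string escapes.
function tomlString(value: string): string {
  return JSON.stringify(value);
}

// Renders one [[js_assets]] block; ttl_sec is omitted so the proxy defaults apply.
export function renderJsAssetEntry(entry: JsAssetEntry): string {
  const note = entry.wildcard ? " (wildcard detected)" : "";
  return [
    "[[js_assets]]",
    `# ${entry.capturedUrl}${note}`,
    `slug = ${tomlString(entry.slug)}`,
    `path = ${tomlString(entry.path)}`,
    `origin_url = ${tomlString(entry.originUrl)}`,
    `inject_in_head = ${entry.injectInHead}`,
  ].join("\n");
}
```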
### Slug algorithm

```
publisher_prefix = first_8_chars(base62(sha256(publisher.domain + origin_url)))
asset_stem = filename_without_extension(origin_url)
slug = "{publisher_prefix}:{asset_stem}"
```

**Rationale:** Fully opaque and hash-derived — no human naming required, no ambiguity for cryptic vendor filenames. The KV metadata (`origin_url`, `content_type`, `asset_slug`) serves as the lookup table. Operators can query `js-asset:{slug}` in the KV store to retrieve full provenance. The terminal summary also prints slug → origin_url at generation time.

**Important:** This algorithm must produce identical output to the Proxy's KV key derivation. Engineering should implement this as a shared utility (e.g., a small JS/TS helper in the skill, or a standalone `scripts/` utility) rather than duplicating the logic.
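A minimal TypeScript sketch of the slug algorithm for Node.js, using `node:crypto` for SHA-256. Two details are assumptions because the spec does not pin them: the domain and URL are concatenated with no separator, and base62 uses the alphabet `0-9A-Za-z` applied to the digest read as a big-endian integer. Whatever the shared utility settles on, the Proxy must make identical choices. `base62Encode`, `assetStem`, and `jsAssetSlug` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// ASSUMPTION: alphabet order 0-9, A-Z, a-z (the spec does not pin one).
const BASE62_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Encodes a big-endian byte string as base62 by repeated division by 62.
export function base62Encode(bytes: Uint8Array): string {
  const digits = Array.from(bytes); // base-256 digits, most significant first
  let start = 0;
  while (start < digits.length && digits[start] === 0) start++;
  let out = "";
  while (start < digits.length) {
    let remainder = 0;
    for (let i = start; i < digits.length; i++) {
      const acc = remainder * 256 + digits[i];
      digits[i] = Math.floor(acc / 62);
      remainder = acc % 62;
    }
    out = BASE62_ALPHABET[remainder] + out;
    while (start < digits.length && digits[start] === 0) start++;
  }
  return out === "" ? BASE62_ALPHABET[0] : out;
}

// Filename of the URL path without its final extension.
export function assetStem(originUrl: string): string {
  const path = new URL(originUrl).pathname;
  const file = path.substring(path.lastIndexOf("/") + 1);
  const dot = file.lastIndexOf(".");
  return dot > 0 ? file.substring(0, dot) : file;
}

export function jsAssetSlug(publisherDomain: string, originUrl: string): string {
  // ASSUMPTION: raw concatenation with no separator, hashed as UTF-8.
  const digest = createHash("sha256").update(publisherDomain + originUrl, "utf8").digest();
  const prefix = base62Encode(digest).slice(0, 8);
  return `${prefix}:${assetStem(originUrl)}`;
}
```

Both the Auditor and the Proxy would import this one helper rather than each reimplementing the hash and alphabet.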

**Open review comment (Collaborator), question: the slug algorithm has two ambiguities.**

1. **Concatenation separator:** what separates the two inputs in `sha256(publisher.domain + origin_url)`? Raw concatenation produces `example.comhttps://vendor.io/script.js`. Since the Auditor and the Proxy must produce identical slugs, the separator (or its explicit absence) needs to be specified.
2. **base62 character set:** base62 is not standardized; different implementations use `[0-9A-Za-z]`, `[0-9a-zA-Z]`, or `[A-Za-z0-9]`. The spec should pin the exact ordering or name a reference implementation.

### Wildcard detection

Path segments matching either pattern are replaced with `*`:
- Semver: `\d+\.\d+[\.\d-]*` (e.g., `1.19.8-hcskhn`)
- Hash-like: `[a-f0-9]{6,}` or `[A-Za-z0-9]{8,}` between path separators

The original URL is preserved as a comment above the generated entry so operators can verify the wildcard substitution is correct.
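A TypeScript sketch of the wildcard step. The spec leaves the exact matching rules open, so this version assumes: only directory segments are checked (the filename is kept so `asset_stem` survives), the semver pattern is matched at the start of a segment (so `1.19.8-hcskhn` qualifies), the hash-like patterns must match a whole segment, and the query string is preserved. Under these rules it reproduces the `raven.js` example from the init-mode output, and it also shows that `[A-Za-z0-9]{8,}` wildcards ordinary words such as `analytics`. `applyWildcards` is a hypothetical name.

```typescript
// Patterns from the spec. Assumptions: semver is matched at the start of a
// segment; the hash-like patterns must match the whole segment.
const SEMVER_PREFIX = /^\d+\.\d+[.\d-]*/;
const HEX_SEGMENT = /^[a-f0-9]{6,}$/;
const ALNUM_SEGMENT = /^[A-Za-z0-9]{8,}$/;

export interface WildcardResult {
  originUrl: string;
  wildcarded: boolean;
}

function isVersionedSegment(segment: string): boolean {
  return SEMVER_PREFIX.test(segment) || HEX_SEGMENT.test(segment) || ALNUM_SEGMENT.test(segment);
}

// Replaces versioned directory segments with "*"; the filename and query are kept.
export function applyWildcards(capturedUrl: string): WildcardResult {
  const url = new URL(capturedUrl);
  const segments = url.pathname.split("/");
  let wildcarded = false;
  // segments[0] is "" (the path starts with "/") and the last element is the filename.
  for (let i = 1; i < segments.length - 1; i++) {
    if (isVersionedSegment(segments[i])) {
      segments[i] = "*";
      wildcarded = true;
    }
  }
  return { originUrl: url.origin + segments.join("/") + url.search, wildcarded };
}
```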

**Open review comment (Collaborator), 🤔 thinking: the hash-like regex `[A-Za-z0-9]{8,}` is too broad.** The hex pattern `[a-f0-9]{6,}` is well-scoped, but `[A-Za-z0-9]{8,}` between path separators would match stable, legitimate segments such as `analytics` (9), `bootstrap` (9), `modernizr` (9), and `dashboard` (9), which would then be wildcarded incorrectly. Consider requiring mixed character classes (both letters and digits), a higher minimum length (12+), or excluding common dictionary words.

---

## Init Mode Output

### `js-assets.toml` (written to repo root)

```toml
# Generated by /audit-js-assets on 2026-04-01
# Publisher: publisher.com
# Source URL: https://www.publisher.com

[[js_assets]]
# https://web.prebidwrapper.com/golf-WnLmpLyEjL/default-v2/prebid-load.js
slug = "aB3kR7mN:prebid-load"
path = "/sdk/aB3kR7mN.js"
origin_url = "https://web.prebidwrapper.com/golf-WnLmpLyEjL/default-v2/prebid-load.js"
inject_in_head = true

[[js_assets]]
# https://raven-static.vendor.io/prod/1.19.8-hcskhn/raven.js (wildcard detected)
slug = "xQ9pL2wY:raven"
path = "/raven-static/*"
origin_url = "https://raven-static.vendor.io/prod/*/raven.js"
inject_in_head = false
```

### Terminal summary

```
JS Asset Audit — publisher.com
────────────────────────────────
Detected: 8 third-party JS requests
Filtered: 3 (cdnjs.cloudflare.com ×2, sentry.io ×1)
Surfaced: 5 assets → js-assets.toml

aB3kR7mN inject_in_head=true web.prebidwrapper.com/.../prebid-load.js
xQ9pL2wY inject_in_head=false raven-static.vendor.io/prod/*/raven.js [wildcard]
zM4nK8vP inject_in_head=true googletagmanager.com/gtm.js
...

Review inject_in_head values and commit js-assets.toml when ready.
Diff mode: /audit-js-assets <url> --diff
```
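The count lines at the top of the summary follow directly from the sweep and the filter. A TypeScript sketch, assuming `detected` is the number of de-duplicated third-party script URLs and `filteredOrigins` holds the matched excluded origin once per filtered URL; `summaryCountLines` is a hypothetical name. With the numbers from the example above it yields the first three summary lines.

```typescript
// Builds the Detected / Filtered / Surfaced lines of the init-mode summary.
export function summaryCountLines(detected: number, filteredOrigins: string[]): string[] {
  const counts = new Map<string, number>();
  for (const origin of filteredOrigins) {
    counts.set(origin, (counts.get(origin) ?? 0) + 1);
  }
  // Map preserves insertion order, so origins are listed in first-seen order.
  const breakdown = Array.from(counts, ([origin, n]) => `${origin} ×${n}`).join(", ");
  const filtered = filteredOrigins.length;
  return [
    `Detected: ${detected} third-party JS requests`,
    `Filtered: ${filtered}` + (filtered > 0 ? ` (${breakdown})` : ""),
    `Surfaced: ${detected - filtered} assets → js-assets.toml`,
  ];
}
```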

---

## Diff Mode Output

Compares sweep results against the existing `js-assets.toml`.

| Condition | Behavior |
|---|---|
| Asset in sweep, not in file | **New** — appended to `js-assets.toml` as a commented-out block |
| Asset in file, not in sweep | **Missing** — flagged in terminal summary with `⚠`. Never auto-removed. |
| Asset in both | **Confirmed** — listed as present |

New entries are appended as TOML comments so the file stays valid and nothing is activated without the operator explicitly uncommenting.
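The three rows of the diff table reduce to set comparisons. A TypeScript sketch, assuming assets are matched by their `origin_url` value after wildcard substitution; the spec does not say which field identifies an asset across sweeps. `diffAssets` and `AssetDiff` are hypothetical names.

```typescript
export interface AssetDiff {
  confirmed: string[]; // in file and in sweep
  added: string[]; // in sweep only: appended as a commented-out block
  missing: string[]; // in file only: flagged, never auto-removed
}

// Assumption: assets are identified by origin_url (after wildcard substitution).
export function diffAssets(fileOriginUrls: string[], sweptOriginUrls: string[]): AssetDiff {
  const inFile = Array.from(new Set(fileOriginUrls));
  const inSweep = Array.from(new Set(sweptOriginUrls));
  const fileSet = new Set(inFile);
  const sweepSet = new Set(inSweep);
  return {
    confirmed: inFile.filter((u) => sweepSet.has(u)),
    added: inSweep.filter((u) => !fileSet.has(u)),
    missing: inFile.filter((u) => !sweepSet.has(u)),
  };
}
```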

### `js-assets.toml` (new entry appended as comment)

```toml
# --- NEW (detected by /audit-js-assets --diff on 2026-04-01, uncomment to activate) ---
# [[js_assets]]
# # https://googletagmanager.com/gtm.js
# slug = "zM4nK8vP:gtm"
# path = "/sdk/zM4nK8vP.js"
# origin_url = "https://googletagmanager.com/gtm.js"
# inject_in_head = true
```
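Appending a new entry as a comment only requires prefixing each rendered line with `# ` under a dated header. A TypeScript sketch, assuming the entry has already been rendered as in init mode; `commentOutNewEntry` is a hypothetical name. Applied to the GTM entry, it produces the block shown above.

```typescript
// Prefixes every line of a rendered [[js_assets]] block with "# " under a dated
// header, so the appended text stays valid TOML and nothing activates until uncommented.
export function commentOutNewEntry(renderedEntry: string, date: string): string {
  const header = `# --- NEW (detected by /audit-js-assets --diff on ${date}, uncomment to activate) ---`;
  const body = renderedEntry.split("\n").map((line) => (line === "" ? "#" : `# ${line}`));
  return [header, ...body].join("\n");
}
```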

### Terminal summary (diff mode)

```
JS Asset Audit (diff) — publisher.com
────────────────────────────────
Confirmed: 4 assets still present on page
New: 1 asset detected (appended as comment to js-assets.toml)
Missing: 1 asset no longer seen on page ⚠

NEW zM4nK8vP googletagmanager.com/gtm.js → review in js-assets.toml
MISSING xQ9pL2wY raven-static.vendor.io/... → may have been removed or renamed
```

---

## Implementation

The Auditor is a Claude Code skill file. No compiled code.

**Skill location:** `.claude/skills/audit-js-assets.md`

**MCP tools used:**
- `mcp__chrome-devtools__new_page` — open browser tab
- `mcp__chrome-devtools__navigate_page` — load publisher URL
- `mcp__chrome-devtools__wait_for` — settle after page load
- `mcp__chrome-devtools__list_network_requests` — capture JS requests
- `mcp__chrome-devtools__evaluate_script` — detect head-loaded scripts via DOM query
- `mcp__chrome-devtools__close_page` — clean up tab

**File tools used:**
- `Read` — read `trusted-server.toml` (publisher domain) and existing `js-assets.toml` (diff mode)
- `Write` — write generated/updated `js-assets.toml`

---

## Delivery Order

The Auditor should be delivered **after Proxy Phase 1** (so `js-assets.toml` schema is defined) and **before Proxy Phase 2** (so engineering has real populated entries to test the cache pipeline against actual vendor origins).

See [delivery order in the Proxy spec](2026-04-01-js-asset-proxy-design.md).

---

## Verification

- Run `/audit-js-assets https://www.publisher.com` against a known test publisher page with identified third-party JS
- Verify generated entries match actual third-party JS observed on the page (cross-check in browser DevTools)
- Verify `inject_in_head = true` only for scripts that appear in `<head>` (not `<body>`)
- Verify wildcard detection fires for versioned path segments and not for stable paths
- Verify GTM (`googletagmanager.com`) is captured and not filtered
- Verify framework CDNs (`cdnjs.cloudflare.com` etc.) are filtered with reason in summary
- Run `--diff` against an unchanged page → all entries confirmed, no new/missing
- Run `--diff` after adding a new vendor script to the page → appears as `NEW` in summary
- Run `--diff` after removing a script → appears as `MISSING ⚠` in summary, file unchanged