JS Asset Auditor engineering spec by jevansnyc · Pull Request #608 · IABTechLab/trusted-server

jevansnyc · 2026-04-01T18:18:00Z

Summary

Engineering spec for the JS Asset Auditor — a Claude Code slash command that sweeps a publisher page via Chrome DevTools MCP, detects third-party JS assets, and generates or diffs js-assets.toml.

Closes #606

Docs added

docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md — engineering spec covering sweep protocol, Chrome DevTools MCP tooling, heuristic filtering, slug generation, init and diff modes

Test plan

Review sweep protocol and verify Chrome DevTools MCP tools match what is available
Verify heuristic filter list covers expected ad tech vs non-ad tech origins
Confirm slug algorithm matches proxy spec derivation

Engineering spec for the /audit-js-assets . Covers sweep protocol, Chrome DevTools MCP tooling, heuristic filtering, slug generation, init and diff modes. Closes #606

prk-Jr

Summary

Reviewed the engineering spec and left a few implementation-facing questions inline. The main areas are URL normalization, wildcarding, third-party boundary matching, path derivation, and inject_in_head semantics for dynamic scripts.

prk-Jr · 2026-04-02T11:17:15Z

docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md

+|---|---|
+| `slug` | `{publisher_prefix}:{asset_stem}` — see slug algorithm below |
+| `path` | `/{publisher_prefix}/{asset_stem}.js`, or wildcard variant if versioned path detected |
+| `origin_url` | Full captured URL, with wildcard substitution applied if versioned |


❓ question — origin_url needs an explicit normalization policy

This reads as though we would persist the captured URL more or less verbatim. That seems likely to commit cache-busters, per-session tokens, consent params, or other transient query values into js-assets.toml, which would create churn and could leak noisy values into config.

Could we define the normalization rules here: whether fragments are dropped, which query params are preserved or stripped, and whether wildcarding happens before or after query normalization?

prk-Jr · 2026-04-02T11:17:15Z

docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md

+
+Path segments matching either pattern are replaced with `*`:
+- Semver: `\d+\.\d+[\.\d-]*` (e.g., `1.19.8-hcskhn`)
+- Hash-like: `[a-f0-9]{6,}` or `[A-Za-z0-9]{8,}` between path separators


❓ question — The hash-like wildcard heuristic looks broad enough to overmatch stable paths

[A-Za-z0-9]{8,} will match a lot of stable path segments in the wild, not just hashes. That seems likely to over-wildcard legitimate routes and produce unstable proxy config.

Would it make sense to tighten this to a higher-entropy pattern or otherwise define a more hash-specific rule so engineering does not have to guess where the false-positive boundary should be?

prk-Jr · 2026-04-02T11:17:15Z

docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md

+2. Open Chrome via `mcp__chrome-devtools__new_page`, navigate to target URL via `mcp__chrome-devtools__navigate_page`
+3. Wait for full page load + ~6s settle window for async script loads (`mcp__chrome-devtools__wait_for`)
+4. In parallel:
+   - `mcp__chrome-devtools__list_network_requests` → filter for requests where URL ends in `.js` or `Content-Type: application/javascript`, and origin ≠ `publisher.domain`


❓ question — Third-party boundary and filter matching semantics need to be explicit

origin ≠ publisher.domain leaves a few important cases open: www. variants, publisher-owned subdomains/CDNs, and how the heuristic filter list should match real hosts like www.googletagmanager.com instead of just apex domains.

Could we specify whether host matching here is exact-host, eTLD+1, or suffix-with-dot-boundary matching? That will directly affect what gets surfaced vs filtered.

prk-Jr · 2026-04-02T11:17:15Z

docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md

+| Field | Derivation |
+|---|---|
+| `slug` | `{publisher_prefix}:{asset_stem}` — see slug algorithm below |
+| `path` | `/{publisher_prefix}/{asset_stem}.js`, or wildcard variant if versioned path detected |


❓ question — The path derivation formula conflicts with the examples below

The table says /{publisher_prefix}/{asset_stem}.js, but the examples later use /sdk/aB3kR7mN.js and /raven-static/*. That leaves the implementation having to infer the canonical route shape.

Can we pick one formula and update the examples so they match it exactly?

prk-Jr · 2026-04-02T11:17:15Z

docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md

+| `path` | `/{publisher_prefix}/{asset_stem}.js`, or wildcard variant if versioned path detected |
+| `origin_url` | Full captured URL, with wildcard substitution applied if versioned |
+| `ttl_sec` | Omitted — proxy defaults to 1800 (wildcard) or 3600 (fixed) |
+| `inject_in_head` | `true` if URL appeared in head script list from DOM evaluation, else `false` |


❓ question — How should inject_in_head behave for dynamically inserted scripts that are no longer in document.head at snapshot time?

The current rule makes sense for static markup, but some loaders briefly insert or move script tags and then remove them. In those cases the network sweep may see the request even though the later DOM query does not.

Do we want the source of truth here to be the final DOM snapshot only, or should the implementation treat some dynamic head insertions as inject_in_head = true as well?

aram356

Summary

Well-structured engineering spec for the JS Asset Auditor. The Auditor/Proxy separation and diff mode design are strong. However, several MCP tool assumptions are incorrect and would cause the skill to fail at implementation time, and the path derivation algorithm contradicts its own examples.

Blocking

🔧 wrench

MCP tool names are wrong: every mcp__chrome-devtools__* reference uses an incorrect prefix — actual names are mcp__plugin_chrome-devtools-mcp_chrome-devtools__* (line 36+)
wait_for misuse: tool waits for text on page, not a settle window — need evaluate_script with setTimeout instead (line 37)
list_network_requests filtering: tool supports resourceTypes filtering, not URL/Content-Type filtering — post-processing needed (line 39)
Path derivation contradicts examples: algorithm says /{publisher_prefix}/{asset_stem}.js but example shows /sdk/aB3kR7mN.js — these are different structures (line 68 vs 119)

❓ question

Slug concatenation separator: sha256(publisher.domain + origin_url) doesn't specify a separator — and base62 encoding isn't standardized (character set ordering). Both must be pinned for Auditor/Proxy consistency (line 88)
Skill location: spec says .claude/skills/audit-js-assets.md but project uses .claude/commands/ for all slash commands — reason for new directory? (line 186)

Non-blocking

🤔 thinking

Wildcard regex too broad: [A-Za-z0-9]{8,} matches stable path segments like analytics, bootstrap (line 100)
stale_ttl_sec omitted: Proxy spec defines this field but Auditor doesn't mention it in generated entries (line 70)
platform.twitter.com: may need updating to platform.x.com post-rebrand (line 58)
6s settle window: may miss late-loading ad tech scripts — consider making configurable (line 37)
MCP permission grants: close_page and wait_for are not pre-approved in .claude/settings.json — will trigger permission prompts
No error handling specified: DNS failures, ad blockers, browser crashes during sweep — worth documenting expected behavior

🌱 seedling

docs/superpowers/specs/ is new: existing docs use docs/guide/, docs/internal/, docs/epics/ — convention should be documented
Wildcard detection asymmetry: only defined in Auditor spec, not cross-referenced in Proxy spec — future tools won't know the patterns

⛏ nitpick

Relative link to Proxy spec: dead if this PR merges first (line 6, 207)

👍 praise

Diff mode design: commented-out TOML + never-auto-remove is excellent
Auditor/Proxy separation: clean architectural boundary with minimal coupling
Terminal summary: shows filtered-with-reason, slug mapping, wildcard indicators
inject_in_head DOM detection: sound approach using document.head.querySelectorAll

CI Status

cargo fmt: PASS
cargo test: PASS
vitest: PASS
format-typescript: PASS
format-docs: FAIL
CodeQL: PASS
integration tests: PASS

aram356 · 2026-04-06T04:17:55Z

docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md

+2. Open Chrome via `mcp__chrome-devtools__new_page`, navigate to target URL via `mcp__chrome-devtools__navigate_page`
+3. Wait for full page load + ~6s settle window for async script loads (`mcp__chrome-devtools__wait_for`)
+4. In parallel:
+   - `mcp__chrome-devtools__list_network_requests` → filter for requests where URL ends in `.js` or `Content-Type: application/javascript`, and origin ≠ `publisher.domain`


🔧 wrench — MCP tool names are wrong throughout the spec

Every mcp__chrome-devtools__* reference uses an incorrect prefix. The actual tool names in .claude/settings.json and .claude/settings.local.json are:

mcp__plugin_chrome-devtools-mcp_chrome-devtools__new_page mcp__plugin_chrome-devtools-mcp_chrome-devtools__navigate_page mcp__plugin_chrome-devtools-mcp_chrome-devtools__list_network_requests ...etc

Affects 12 occurrences across the Sweep Protocol (lines 36-43) and Implementation (lines 188-193) sections. A skill implemented from this spec will fail on every MCP call.

Fix: Replace all mcp__chrome-devtools__ with mcp__plugin_chrome-devtools-mcp_chrome-devtools__.

aram356 · 2026-04-06T04:17:55Z

docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md

+3. Wait for full page load + ~6s settle window for async script loads (`mcp__chrome-devtools__wait_for`)
+4. In parallel:
+   - `mcp__chrome-devtools__list_network_requests` → filter for requests where URL ends in `.js` or `Content-Type: application/javascript`, and origin ≠ `publisher.domain`
+   - `mcp__chrome-devtools__evaluate_script` → `Array.from(document.head.querySelectorAll('script[src]')).map(s => s.src)` → collect head-loaded script URLs


🔧 wrench — wait_for does not provide a settle window

The spec says: "Wait for full page load + ~6s settle window for async script loads (wait_for)"

But wait_for waits for specific text to appear on the page (params: text — list of strings, timeout — max wait ms). It is not a network-idle or timeout mechanism.

Fix: Use evaluate_script with:

await new Promise(r => setTimeout(r, 6000))

Or poll list_network_requests to detect when no new script requests appear.

🤔 thinking — 6s may be too short for complex ad tech pages

Header bidding waterfalls, consent-gated loading, and lazy demand partners can take 10-15s. Consider making this configurable (e.g., --settle 10s) or documenting the tradeoff.

aram356 · 2026-04-06T04:17:55Z

docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md

+   - `mcp__chrome-devtools__list_network_requests` → filter for requests where URL ends in `.js` or `Content-Type: application/javascript`, and origin ≠ `publisher.domain`
+   - `mcp__chrome-devtools__evaluate_script` → `Array.from(document.head.querySelectorAll('script[src]')).map(s => s.src)` → collect head-loaded script URLs
+5. Apply heuristic filter (see below)
+6. For each surviving asset, generate a `[[js_assets]]` entry (see below)


🔧 wrench — list_network_requests filtering claim is inaccurate

The spec says: "filter for requests where URL ends in .js or Content-Type: application/javascript"

The tool supports resourceTypes filtering (e.g., ["script"]) but does not support URL pattern or Content-Type filtering. Those must be done in post-processing.

Fix: Rewrite to: "list_network_requests with resourceTypes: ["script"] → post-filter to exclude first-party origins matching publisher.domain"

aram356 · 2026-04-06T04:17:56Z

docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md

+| Social embeds | `platform.twitter.com`, `connect.facebook.net` |
+
+**`googletagmanager.com` is not filtered** — GTM is ad tech and should be proxied.
+


🤔 thinking — platform.twitter.com may be outdated

Twitter rebranded to X. The domain may already be platform.x.com or will change. Consider listing both, or verifying the current domain.

aram356 · 2026-04-06T04:17:57Z

docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md

+| Field | Derivation |
+|---|---|
+| `slug` | `{publisher_prefix}:{asset_stem}` — see slug algorithm below |
+| `path` | `/{publisher_prefix}/{asset_stem}.js`, or wildcard variant if versioned path detected |


🔧 wrench — Path derivation algorithm contradicts the examples

This table says path = "/{publisher_prefix}/{asset_stem}.js". For the first example (publisher_prefix = aB3kR7mN, asset_stem = prebid-load), this yields /aB3kR7mN/prebid-load.js.

But the example on line 119 shows path = "/sdk/aB3kR7mN.js" — a completely different structure. The Proxy spec shows the same contradictory examples.

Since path is the routing key between Auditor and Proxy, this must be resolved.

Fix: Align the algorithm description with the examples (or vice versa).

aram356 · 2026-04-06T04:17:57Z

docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md

+| `slug` | `{publisher_prefix}:{asset_stem}` — see slug algorithm below |
+| `path` | `/{publisher_prefix}/{asset_stem}.js`, or wildcard variant if versioned path detected |
+| `origin_url` | Full captured URL, with wildcard substitution applied if versioned |
+| `ttl_sec` | Omitted — proxy defaults to 1800 (wildcard) or 3600 (fixed) |


🤔 thinking — stale_ttl_sec is not mentioned

The Proxy spec defines stale_ttl_sec: Option<u32> (default 86400) in the JsAsset struct. This field is omitted from the Auditor's Asset Entry Generation table and generated TOML examples. While the proxy defaults handle the omission, it would be consistent to mention it (like ttl_sec is): "Omitted — proxy defaults to 86400".

aram356 · 2026-04-06T04:17:57Z

docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md

+
+**MCP tools used:**
+- `mcp__chrome-devtools__new_page` — open browser tab
+- `mcp__chrome-devtools__navigate_page` — load publisher URL


❓ question — Skill location: .claude/skills/ vs .claude/commands/

The spec places the skill at .claude/skills/audit-js-assets.md, but the project has no .claude/skills/ directory. All existing slash commands live in .claude/commands/ (check-ci.md, verify.md, test-all.md, etc.).

Is there a reason to introduce a new directory? If not, .claude/commands/audit-js-assets.md would be consistent with existing patterns.

aram356 · 2026-04-06T04:17:57Z

docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md

+**Date:** 2026-04-01  
+**Status:** Approved for engineering breakdown  
+**Related:** [JS Asset Proxy spec](2026-04-01-js-asset-proxy-design.md)
+


⛏ nitpick — Relative link to Proxy spec may be broken at merge time

[JS Asset Proxy spec](2026-04-01-js-asset-proxy-design.md) assumes the Proxy spec exists in the same directory. If this PR merges first, the link is dead. Same issue on line 207.

aram356 · 2026-04-06T04:17:57Z

docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md

+```
+
+---
+


👍 praise — Diff mode design is excellent

Appending new entries as commented-out TOML blocks keeps the file valid, requires explicit operator action, and provides full provenance. The "never auto-remove" policy for missing assets is the right default for a monitoring tool.

aram356 · 2026-04-06T04:17:57Z

docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md

+   - `mcp__chrome-devtools__evaluate_script` → `Array.from(document.head.querySelectorAll('script[src]')).map(s => s.src)` → collect head-loaded script URLs
+5. Apply heuristic filter (see below)
+6. For each surviving asset, generate a `[[js_assets]]` entry (see below)
+7. Write output (init or diff mode)


👍 praise — inject_in_head detection via DOM query is sound

document.head.querySelectorAll('script[src]') correctly captures what's in <head> at load time, which is exactly what the Proxy needs to replicate at runtime.

Add JS Asset Auditor engineering spec

89fab0b

Engineering spec for the /audit-js-assets . Covers sweep protocol, Chrome DevTools MCP tooling, heuristic filtering, slug generation, init and diff modes. Closes #606

jevansnyc requested review from ChristianPavilonis, aram356 and prk-Jr April 1, 2026 18:18

jevansnyc self-assigned this Apr 1, 2026

prk-Jr reviewed Apr 2, 2026

View reviewed changes

aram356 requested changes Apr 6, 2026

View reviewed changes

		\| Social embeds \| `platform.twitter.com`, `connect.facebook.net` \|

		`googletagmanager.com` is not filtered — GTM is ad tech and should be proxied.

		```

		---

Conversation

jevansnyc commented Apr 1, 2026

Summary

Docs added

Test plan

Uh oh!

prk-Jr left a comment

Choose a reason for hiding this comment

Summary

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aram356 left a comment

Choose a reason for hiding this comment

Summary

Blocking

🔧 wrench

❓ question

Non-blocking

🤔 thinking

🌱 seedling

⛏ nitpick

👍 praise

CI Status

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants