Skip to content

JS Asset Auditor engineering spec#608

Open
jevansnyc wants to merge 1 commit intomainfrom
js-asset-auditor-spec
Open

JS Asset Auditor engineering spec#608
jevansnyc wants to merge 1 commit intomainfrom
js-asset-auditor-spec

Conversation

@jevansnyc
Copy link
Copy Markdown
Collaborator

Summary

Engineering spec for the JS Asset Auditor — a Claude Code slash command that sweeps a publisher page via Chrome DevTools MCP, detects third-party JS assets, and generates or diffs js-assets.toml.

Closes #606

Docs added

  • docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md — engineering spec covering sweep protocol, Chrome DevTools MCP tooling, heuristic filtering, slug generation, init and diff modes

Test plan

  • Review sweep protocol and verify Chrome DevTools MCP tools match what is available
  • Verify heuristic filter list covers expected ad tech vs non-ad tech origins
  • Confirm slug algorithm matches proxy spec derivation

Engineering spec for the /audit-js-assets .
Covers sweep protocol, Chrome DevTools MCP tooling, heuristic filtering,
slug generation, init and diff modes.

Closes #606
@jevansnyc jevansnyc self-assigned this Apr 1, 2026
Copy link
Copy Markdown
Collaborator

@prk-Jr prk-Jr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Reviewed the engineering spec and left a few implementation-facing questions inline. The main areas are URL normalization, wildcarding, third-party boundary matching, path derivation, and inject_in_head semantics for dynamic scripts.

|---|---|
| `slug` | `{publisher_prefix}:{asset_stem}` — see slug algorithm below |
| `path` | `/{publisher_prefix}/{asset_stem}.js`, or wildcard variant if versioned path detected |
| `origin_url` | Full captured URL, with wildcard substitution applied if versioned |
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

questionorigin_url needs an explicit normalization policy

This reads as though we would persist the captured URL more or less verbatim. That seems likely to commit cache-busters, per-session tokens, consent params, or other transient query values into js-assets.toml, which would create churn and could leak noisy values into config.

Could we define the normalization rules here: whether fragments are dropped, which query params are preserved or stripped, and whether wildcarding happens before or after query normalization?


Path segments matching either pattern are replaced with `*`:
- Semver: `\d+\.\d+[\.\d-]*` (e.g., `1.19.8-hcskhn`)
- Hash-like: `[a-f0-9]{6,}` or `[A-Za-z0-9]{8,}` between path separators
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question — The hash-like wildcard heuristic looks broad enough to overmatch stable paths

[A-Za-z0-9]{8,} will match a lot of stable path segments in the wild, not just hashes. That seems likely to over-wildcard legitimate routes and produce unstable proxy config.

Would it make sense to tighten this to a higher-entropy pattern or otherwise define a more hash-specific rule so engineering does not have to guess where the false-positive boundary should be?

2. Open Chrome via `mcp__chrome-devtools__new_page`, navigate to target URL via `mcp__chrome-devtools__navigate_page`
3. Wait for full page load + ~6s settle window for async script loads (`mcp__chrome-devtools__wait_for`)
4. In parallel:
- `mcp__chrome-devtools__list_network_requests` → filter for requests where URL ends in `.js` or `Content-Type: application/javascript`, and origin ≠ `publisher.domain`
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question — Third-party boundary and filter matching semantics need to be explicit

origin ≠ publisher.domain leaves a few important cases open: www. variants, publisher-owned subdomains/CDNs, and how the heuristic filter list should match real hosts like www.googletagmanager.com instead of just apex domains.

Could we specify whether host matching here is exact-host, eTLD+1, or suffix-with-dot-boundary matching? That will directly affect what gets surfaced vs filtered.

| Field | Derivation |
|---|---|
| `slug` | `{publisher_prefix}:{asset_stem}` — see slug algorithm below |
| `path` | `/{publisher_prefix}/{asset_stem}.js`, or wildcard variant if versioned path detected |
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question — The path derivation formula conflicts with the examples below

The table says /{publisher_prefix}/{asset_stem}.js, but the examples later use /sdk/aB3kR7mN.js and /raven-static/*. That leaves the implementation having to infer the canonical route shape.

Can we pick one formula and update the examples so they match it exactly?

| `path` | `/{publisher_prefix}/{asset_stem}.js`, or wildcard variant if versioned path detected |
| `origin_url` | Full captured URL, with wildcard substitution applied if versioned |
| `ttl_sec` | Omitted — proxy defaults to 1800 (wildcard) or 3600 (fixed) |
| `inject_in_head` | `true` if URL appeared in head script list from DOM evaluation, else `false` |
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question — How should inject_in_head behave for dynamically inserted scripts that are no longer in document.head at snapshot time?

The current rule makes sense for static markup, but some loaders briefly insert or move script tags and then remove them. In those cases the network sweep may see the request even though the later DOM query does not.

Do we want the source of truth here to be the final DOM snapshot only, or should the implementation treat some dynamic head insertions as inject_in_head = true as well?

Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Well-structured engineering spec for the JS Asset Auditor. The Auditor/Proxy separation and diff mode design are strong. However, several MCP tool assumptions are incorrect and would cause the skill to fail at implementation time, and the path derivation algorithm contradicts its own examples.

Blocking

🔧 wrench

  • MCP tool names are wrong: every mcp__chrome-devtools__* reference uses an incorrect prefix — actual names are mcp__plugin_chrome-devtools-mcp_chrome-devtools__* (line 36+)
  • wait_for misuse: tool waits for text on page, not a settle window — need evaluate_script with setTimeout instead (line 37)
  • list_network_requests filtering: tool supports resourceTypes filtering, not URL/Content-Type filtering — post-processing needed (line 39)
  • Path derivation contradicts examples: algorithm says /{publisher_prefix}/{asset_stem}.js but example shows /sdk/aB3kR7mN.js — these are different structures (line 68 vs 119)

❓ question

  • Slug concatenation separator: sha256(publisher.domain + origin_url) doesn't specify a separator — and base62 encoding isn't standardized (character set ordering). Both must be pinned for Auditor/Proxy consistency (line 88)
  • Skill location: spec says .claude/skills/audit-js-assets.md but project uses .claude/commands/ for all slash commands — reason for new directory? (line 186)

Non-blocking

🤔 thinking

  • Wildcard regex too broad: [A-Za-z0-9]{8,} matches stable path segments like analytics, bootstrap (line 100)
  • stale_ttl_sec omitted: Proxy spec defines this field but Auditor doesn't mention it in generated entries (line 70)
  • platform.twitter.com: may need updating to platform.x.com post-rebrand (line 58)
  • 6s settle window: may miss late-loading ad tech scripts — consider making configurable (line 37)
  • MCP permission grants: close_page and wait_for are not pre-approved in .claude/settings.json — will trigger permission prompts
  • No error handling specified: DNS failures, ad blockers, browser crashes during sweep — worth documenting expected behavior

🌱 seedling

  • docs/superpowers/specs/ is new: existing docs use docs/guide/, docs/internal/, docs/epics/ — convention should be documented
  • Wildcard detection asymmetry: only defined in Auditor spec, not cross-referenced in Proxy spec — future tools won't know the patterns

⛏ nitpick

  • Relative link to Proxy spec: dead if this PR merges first (line 6, 207)

👍 praise

  • Diff mode design: commented-out TOML + never-auto-remove is excellent
  • Auditor/Proxy separation: clean architectural boundary with minimal coupling
  • Terminal summary: shows filtered-with-reason, slug mapping, wildcard indicators
  • inject_in_head DOM detection: sound approach using document.head.querySelectorAll

CI Status

  • cargo fmt: PASS
  • cargo test: PASS
  • vitest: PASS
  • format-typescript: PASS
  • format-docs: FAIL
  • CodeQL: PASS
  • integration tests: PASS

2. Open Chrome via `mcp__chrome-devtools__new_page`, navigate to target URL via `mcp__chrome-devtools__navigate_page`
3. Wait for full page load + ~6s settle window for async script loads (`mcp__chrome-devtools__wait_for`)
4. In parallel:
- `mcp__chrome-devtools__list_network_requests` → filter for requests where URL ends in `.js` or `Content-Type: application/javascript`, and origin ≠ `publisher.domain`
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — MCP tool names are wrong throughout the spec

Every mcp__chrome-devtools__* reference uses an incorrect prefix. The actual tool names in .claude/settings.json and .claude/settings.local.json are:

mcp__plugin_chrome-devtools-mcp_chrome-devtools__new_page
mcp__plugin_chrome-devtools-mcp_chrome-devtools__navigate_page
mcp__plugin_chrome-devtools-mcp_chrome-devtools__list_network_requests
...etc

Affects 12 occurrences across the Sweep Protocol (lines 36-43) and Implementation (lines 188-193) sections. A skill implemented from this spec will fail on every MCP call.

Fix: Replace all mcp__chrome-devtools__ with mcp__plugin_chrome-devtools-mcp_chrome-devtools__.

3. Wait for full page load + ~6s settle window for async script loads (`mcp__chrome-devtools__wait_for`)
4. In parallel:
- `mcp__chrome-devtools__list_network_requests` → filter for requests where URL ends in `.js` or `Content-Type: application/javascript`, and origin ≠ `publisher.domain`
- `mcp__chrome-devtools__evaluate_script` → `Array.from(document.head.querySelectorAll('script[src]')).map(s => s.src)` → collect head-loaded script URLs
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrenchwait_for does not provide a settle window

The spec says: "Wait for full page load + ~6s settle window for async script loads (wait_for)"

But wait_for waits for specific text to appear on the page (params: text — list of strings, timeout — max wait ms). It is not a network-idle or timeout mechanism.

Fix: Use evaluate_script with:

await new Promise(r => setTimeout(r, 6000))

Or poll list_network_requests to detect when no new script requests appear.


🤔 thinking — 6s may be too short for complex ad tech pages

Header bidding waterfalls, consent-gated loading, and lazy demand partners can take 10-15s. Consider making this configurable (e.g., --settle 10s) or documenting the tradeoff.

- `mcp__chrome-devtools__list_network_requests` → filter for requests where URL ends in `.js` or `Content-Type: application/javascript`, and origin ≠ `publisher.domain`
- `mcp__chrome-devtools__evaluate_script` → `Array.from(document.head.querySelectorAll('script[src]')).map(s => s.src)` → collect head-loaded script URLs
5. Apply heuristic filter (see below)
6. For each surviving asset, generate a `[[js_assets]]` entry (see below)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrenchlist_network_requests filtering claim is inaccurate

The spec says: "filter for requests where URL ends in .js or Content-Type: application/javascript"

The tool supports resourceTypes filtering (e.g., ["script"]) but does not support URL pattern or Content-Type filtering. Those must be done in post-processing.

Fix: Rewrite to: "list_network_requests with resourceTypes: ["script"] → post-filter to exclude first-party origins matching publisher.domain"

| Social embeds | `platform.twitter.com`, `connect.facebook.net` |

**`googletagmanager.com` is not filtered** — GTM is ad tech and should be proxied.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinkingplatform.twitter.com may be outdated

Twitter rebranded to X. The domain may already be platform.x.com or will change. Consider listing both, or verifying the current domain.

| Field | Derivation |
|---|---|
| `slug` | `{publisher_prefix}:{asset_stem}` — see slug algorithm below |
| `path` | `/{publisher_prefix}/{asset_stem}.js`, or wildcard variant if versioned path detected |
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — Path derivation algorithm contradicts the examples

This table says path = "/{publisher_prefix}/{asset_stem}.js". For the first example (publisher_prefix = aB3kR7mN, asset_stem = prebid-load), this yields /aB3kR7mN/prebid-load.js.

But the example on line 119 shows path = "/sdk/aB3kR7mN.js" — a completely different structure. The Proxy spec shows the same contradictory examples.

Since path is the routing key between Auditor and Proxy, this must be resolved.

Fix: Align the algorithm description with the examples (or vice versa).

| `slug` | `{publisher_prefix}:{asset_stem}` — see slug algorithm below |
| `path` | `/{publisher_prefix}/{asset_stem}.js`, or wildcard variant if versioned path detected |
| `origin_url` | Full captured URL, with wildcard substitution applied if versioned |
| `ttl_sec` | Omitted — proxy defaults to 1800 (wildcard) or 3600 (fixed) |
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinkingstale_ttl_sec is not mentioned

The Proxy spec defines stale_ttl_sec: Option<u32> (default 86400) in the JsAsset struct. This field is omitted from the Auditor's Asset Entry Generation table and generated TOML examples. While the proxy defaults handle the omission, it would be consistent to mention it (like ttl_sec is): "Omitted — proxy defaults to 86400".


**MCP tools used:**
- `mcp__chrome-devtools__new_page` — open browser tab
- `mcp__chrome-devtools__navigate_page` — load publisher URL
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question — Skill location: .claude/skills/ vs .claude/commands/

The spec places the skill at .claude/skills/audit-js-assets.md, but the project has no .claude/skills/ directory. All existing slash commands live in .claude/commands/ (check-ci.md, verify.md, test-all.md, etc.).

Is there a reason to introduce a new directory? If not, .claude/commands/audit-js-assets.md would be consistent with existing patterns.

**Date:** 2026-04-01
**Status:** Approved for engineering breakdown
**Related:** [JS Asset Proxy spec](2026-04-01-js-asset-proxy-design.md)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick — Relative link to Proxy spec may be broken at merge time

[JS Asset Proxy spec](2026-04-01-js-asset-proxy-design.md) assumes the Proxy spec exists in the same directory. If this PR merges first, the link is dead. Same issue on line 207.

```

---

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 praise — Diff mode design is excellent

Appending new entries as commented-out TOML blocks keeps the file valid, requires explicit operator action, and provides full provenance. The "never auto-remove" policy for missing assets is the right default for a monitoring tool.

- `mcp__chrome-devtools__evaluate_script` → `Array.from(document.head.querySelectorAll('script[src]')).map(s => s.src)` → collect head-loaded script URLs
5. Apply heuristic filter (see below)
6. For each surviving asset, generate a `[[js_assets]]` entry (see below)
7. Write output (init or diff mode)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 praiseinject_in_head detection via DOM query is sound

document.head.querySelectorAll('script[src]') correctly captures what's in <head> at load time, which is exactly what the Proxy needs to replicate at runtime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

JS Asset Auditor — Publisher Page Sweep and Config Generation

3 participants