
Restore stealth plugin and fix fetcher language configuration#1247

Open
clementbiron wants to merge 17 commits into main from fetcher-language-config

Member

@clementbiron clementbiron commented May 26, 2026

This PR reactivates puppeteer-extra-plugin-stealth in the full DOM fetcher. Since v10.3.1 (commit 5e4945d2), puppeteer.use(stealthPlugin(...)) has been called inside fetch() rather than before puppeteer.launch(), so puppeteer-extra never bound the plugin's onPageCreated hooks and the plugin was silently inert. As a result, navigator.webdriver and HeadlessChrome were exposed to every tracked service for around four months.
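The ordering problem described above can be illustrated with a toy model. This is only a sketch: ToyLauncher, toyStealth and the webdriverHidden flag are hypothetical stand-ins invented for illustration, not puppeteer-extra's real internals. The one assumption carried over from the description is that page hooks are bound for the plugins known at launch time.

```javascript
// Toy model only: ToyLauncher and its methods are hypothetical stand-ins,
// not puppeteer-extra's real implementation.
class ToyLauncher {
  constructor() {
    this.plugins = [];
  }

  // Register a plugin; it only takes effect for browsers launched afterwards.
  use(plugin) {
    this.plugins.push(plugin);
    return this;
  }

  // Snapshot the plugins known at launch time and bind their page hooks.
  launch() {
    const boundPlugins = [...this.plugins];
    return {
      newPage() {
        const page = { webdriverHidden: false };
        for (const plugin of boundPlugins) plugin.onPageCreated(page);
        return page;
      },
    };
  }
}

// Hypothetical plugin that marks the page as protected when its hook runs.
const toyStealth = {
  onPageCreated(page) {
    page.webdriverHidden = true;
  },
};

// Broken order: the browser is launched first, the plugin is registered later.
const lateLauncher = new ToyLauncher();
const lateBrowser = lateLauncher.launch();
lateLauncher.use(toyStealth);
const unprotectedPage = lateBrowser.newPage();

// Fixed order: the plugin is registered before launch.
const earlyLauncher = new ToyLauncher().use(toyStealth);
const protectedPage = earlyLauncher.launch().newPage();

console.log(unprotectedPage.webdriverHidden, protectedPage.webdriverHidden); // false true
```

In the broken order nothing throws and nothing is logged, which matches the "silently inert" behaviour: the late registration succeeds, it just never reaches the already-launched browser.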

It also applies the @opentermsarchive/engine.fetcher.language configuration to navigator.language and navigator.languages through the stealth sub-evasions, not just to the Accept-Language HTTP header, which it already affected. The stealth wrapper's locale option was being silently ignored. The redundant setExtraHTTPHeaders and CDP Network.setUserAgentOverride calls in configurePage are removed because they duplicated stealth's path with subtly different results.

Breaking: quality factors (;q=...) are no longer accepted in @opentermsarchive/engine.fetcher.language. Previously accepted values such as en-IE,en-GB;q=0.9,en;q=0.8 now throw at launchHeadlessBrowser. Combined with the now-removed setExtraHTTPHeaders call, those values used to produce malformed headers like q=0.9;q=0.9, and the surface area was not worth preserving. Provide a plain comma-separated priority list instead (e.g. en-IE,en-GB,en); the browser derives the Accept-Language quality factors from tag order.

As a side effect of applying the configuration to navigator.languages, language: "en" now exposes navigator.languages as ["en"] instead of the previous default ["en-US", "en"]. Setting language: "en-US,en" restores the previous default.
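A minimal sketch of the new language handling, under stated assumptions, might look like the following. parseLanguageList and deriveAcceptLanguage are hypothetical helper names, not functions from this PR. The quality-factor rule (implicit q=1 for the first tag, then decreasing by 0.1 per tag with a floor of 0.1) is an assumption about how a Chromium-based browser derives the header from tag order, not something the PR specifies.

```javascript
// Hypothetical helper: turn the configured string into an ordered list of tags.
// Rejects quality factors, mirroring the breaking change described above.
function parseLanguageList(value) {
  if (typeof value !== 'string' || value.trim() === '') {
    throw new TypeError('language must be a non-empty string');
  }
  if (value.includes(';')) {
    throw new Error(
      `Quality factors are not supported in "${value}"; ` +
        'use a plain priority list such as "en-IE,en-GB,en"',
    );
  }
  return value
    .split(',')
    .map(tag => tag.trim())
    .filter(Boolean);
}

// Assumed derivation rule: the first tag carries an implicit q=1, and each
// following tag drops by 0.1, never going below 0.1.
function deriveAcceptLanguage(tags) {
  return tags
    .map((tag, index) => {
      const q10 = Math.max(10 - index, 1);
      return q10 === 10 ? tag : `${tag};q=0.${q10}`;
    })
    .join(',');
}

// The parsed list is what would be exposed as navigator.languages.
console.log(parseLanguageList('en')); // [ 'en' ]
console.log(parseLanguageList('en-US,en')); // [ 'en-US', 'en' ]
console.log(deriveAcceptLanguage(parseLanguageList('en-IE,en-GB,en'))); // en-IE,en-GB;q=0.9,en;q=0.8
```

Under these assumptions, the plain list en-IE,en-GB,en yields the same header that the previously accepted value en-IE,en-GB;q=0.9,en;q=0.8 spelled out by hand, while language: "en" yields a single-entry list.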

@clementbiron clementbiron requested a review from Ndpnt May 26, 2026 14:52
Comment thread src/archivist/fetcher/fullDomFetcher.js Outdated
Comment thread src/archivist/fetcher/fullDomFetcher.test.js Outdated
@clementbiron clementbiron requested a review from Ndpnt May 27, 2026 13:17


2 participants