fix(harvest): retry transient network errors + secure deploy plan#3
Open
yashksaini-coder wants to merge 6 commits into
Open
fix(harvest): retry transient network errors + secure deploy plan#3yashksaini-coder wants to merge 6 commits into
yashksaini-coder wants to merge 6 commits into
Conversation
The top-level fetchWeeklyContributions had zero retry, so one TCP hiccup during the response gzip-decode (ERR_STREAM_PREMATURE_CLOSE) killed the whole weekly workflow — that's what knocked out run 28319315389. Wrap client.request in p-retry (already a dep). Retry network-class codes (ECONNRESET, ETIMEDOUT, ENOTFOUND, EAI_AGAIN, ERR_STREAM_PREMATURE_CLOSE, UND_ERR_SOCKET) and HTTP 5xx. Abort via AbortError on 4xx so a bad token doesn't burn the workflow's 10-min budget on three pointless retries. 3 attempts, exponential backoff (1s → 10s). Per-commit enrichment path already retries via githubLimiter; this just protects the top.
6-phase plan to ship a public read-only blog viewer on Vercel while keeping the long-running pipeline + owner-only admin on Render. Replaces the scrypt PIN with GitHub OAuth + OWNER_GITHUB_ID check — no credential exists to brute-force; GitHub's 2FA/passkey stack handles the auth. Pipeline can't move to Vercel (60s function budget vs. ~30-90s narration + harvest + publish), so the split keeps the right host for each job.
|
|
Mirrors the Node 22 + pnpm setup used by weekly-dispatch.yml. Runs: - pnpm lint - pnpm exec tsc --noEmit (catches type errors in non-test code) - pnpm test concurrency cancels in-flight runs on a rapid push so we don't queue stale jobs. Read-only permissions (no need to write anything from CI). Closes the gap where PRs got zero automated feedback — every PR from here on gets a green/red signal before merge.
Per docs/plans/secure-public-deploy.md Phase 1: move every authenticated route under /admin/* so the public/private boundary is purely a URL prefix. Sets up Phase 2 (GitHub OAuth) to swap auth at one place without touching route definitions. Path changes: - /runs → /admin/runs - /run → /admin/run - /preview → /admin/preview - /login → /admin/login - /logout → /admin/logout Kept public: /, /health, /generated/* Adds a regex-based 301 redirect middleware for the legacy paths (~1 week grace period — remove after 2026-07-06). GET/HEAD only; POST callers (Bearer scripts) need to update to the new paths explicitly. Updates every internal link (forms, redirects, header nav, tokenGate), safeInternalPath default, auth-middleware login redirect, and the existing tests that reference the old paths. No auth-scheme changes in this phase — scrypt + session cookie still gate /admin/*. Verified: pnpm test (78/78), pnpm exec tsc --noEmit clean, pnpm lint (0 errors), manual smoke confirmed legacy 301s and new /admin/* routes respond correctly with auth disabled.
CI on PR #3 failed because validateEnv() runs at module import time and calls process.exit(1) when required env vars are missing. CI has no .env.local, so every test file that transitively imports config/env.ts crashed at the import phase — vitest reported these as "(0 tests)" rather than failures, which made the symptom easy to miss locally. Same pattern bit narrator.agent.ts: it constructs a Google model at module load via createGoogleModel → nextKey, which throws on empty GOOGLE_API_KEYS. Fix: when process.env.VITEST is set (auto by the runner), short-circuit validation with stub values for the five required fields. Real env vars from process.env override the stubs via spread order, so .env.local values still take effect locally. The strict production path is unchanged — failed validation still exits. Verified: pnpm test passes 78/78 both with AND without .env.local (simulating the CI runner). tsc clean, lint clean.
Adds passwordless owner authentication via GitHub OAuth + an owner-id
gate. Strictly additive — scrypt password stays as a fallback so the
existing Render deploy keeps working until OAuth env vars are wired up.
Flow:
GET /admin/auth/github → mint random state, set state cookie,
redirect to github.com authorize
GET /admin/auth/github/callback → verify state (constant-time), exchange
code for token, fetch /user, check
id === OWNER_GITHUB_ID, mint session
New env vars (all four required together, or none):
GITHUB_OAUTH_CLIENT_ID
GITHUB_OAUTH_CLIENT_SECRET
OWNER_GITHUB_ID — numeric (gh api /user --jq .id)
OAUTH_CALLBACK_URL — must match the OAuth app's callback URL exactly
Cross-field check in validateEnv() refuses partial config. .env.example
and render.yaml updated with the new section. Login page now shows a
"Sign in with GitHub" button when OAuth is on; both schemes mint the
same in-memory session cookie via createSession().
Security:
- read:user scope only (just enough to fetch the user id)
- state cookie HttpOnly + Secure + SameSite=Lax + Path=/admin/auth + 5min TTL
- constant-time state compare via safeEqual()
- non-owner user id → 403 (credentials valid, just not authorized)
- no PKCE — server-side confidential client with a real client_secret
- token exchange + /user responses parsed via zod at the boundary
- token-exchange internals never leak to the browser
Tests: 9 new (87 total). Verified end-to-end via smoke:
- with no OAuth env: /admin/auth/* → 404, login page falls back to password
- with OAuth env: login shows GitHub button, /admin/auth/github → 302 to
github.com, callback with bad state → 400.
Phase 3 (HMAC session cookie) and Phase 4 (CSRF) follow.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two related changes:
CI fix — wrap the top-level GitHub GraphQL harvest in
p-retryso a transientERR_STREAM_PREMATURE_CLOSE(or any network-class blip) doesn't kill the weekly workflow. This was the root cause of run 28319315389 failing on 2026-06-28.Plan doc —
docs/plans/secure-public-deploy.mdcovers the 6-phase migration to a Vercel-read + Render-write split with GitHub OAuth replacing the scrypt PIN. No code from the plan is in this PR — that lands in follow-up phased PRs.Why the retry is scoped to the top-level call
enrichWithCommitDetailsalready routes throughgithubLimiter(which has its ownpRetry) and is wrapped in best-effort try/catch — per-commit failures are non-fatal. But the top-levelclient.requesthad zero retry, so one TCP hiccup = whole pipeline dead. This PR closes that gap and leaves the rest of the call graph alone.What the classifier retries vs. aborts
ECONNRESET,ETIMEDOUT,ENOTFOUND,EAI_AGAIN,ERR_STREAM_PREMATURE_CLOSE,UND_ERR_SOCKET, HTTP 5xxAbortError: HTTP 4xx (bad token, missing user), unknown codes, genericErrorAvoids burning the workflow's 10-minute budget retrying a hard auth failure.
Confirmed decisions for the secure-deploy plan (locked this session)
OWNER_GITHUB_IDcheck (no password to crack)*.vercel.appURL for nowDEVNOTION_API_TOKENBearer for scripts/CITest plan
pnpm test— 78/78 pass (11 new ingithub-retry.test.ts)pnpm lint— 0 errors (5 pre-existinganywarnings, unrelated)gh run rerun 28319315389to verify the retry actually catches the flake on replayFollow-ups (separate PRs, in order per the plan doc)
/admin/*(mechanical, ~1h)OWNER_GITHUB_IDcheck (~2-3h)