Fix Supabase Disk IO bottleneck: drop unused deleted column + index hot queries#135

Draft
Hacksore wants to merge 2 commits into main from cursor/diagnose-disk-io-bottleneck-9a3e

@Hacksore Hacksore commented Jun 17, 2026

Owner

Diagnosis

The Disk IO budget alerts come from the Post table being sequentially scanned in full on nearly every page view.

The home page, /list, and /ai all set export const dynamic = "force-dynamic" and call getTodaysLaunches / getTodaysLaunchesPaginated (apps/web/src/app/lib/launches.ts) on every request. Those queries filter Post over a createdAt day range; /api/list, /api/stats, /slop, and sitemap.ts hit the same table the same way. Post had no indexes except its primary key, so every request read the whole table to return a few hundred rows.
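As a rough illustration of the day-range filter described above, here is a minimal sketch of how such a `where` clause could be built. The helper names (`utcDayRange`, `launchesWhere`) and the choice of UTC day boundaries are assumptions for illustration; the real code in launches.ts may compute the range differently.

```typescript
// Hypothetical helpers, not the actual launches.ts implementation.
interface DayRange {
  gte: Date; // inclusive start of the day
  lt: Date; // exclusive start of the next day
}

interface LaunchWhere {
  hasAi?: boolean;
  createdAt: DayRange;
}

// Half-open range covering the UTC day that contains `now` (UTC is an assumption).
function utcDayRange(now: Date): DayRange {
  const gte = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()),
  );
  const lt = new Date(gte.getTime() + 24 * 60 * 60 * 1000);
  return { gte, lt };
}

// Prisma-style filter object: createdAt range, optionally narrowed by hasAi.
function launchesWhere(now: Date, hasAi?: boolean): LaunchWhere {
  const createdAt = utcDayRange(now);
  return hasAi === undefined ? { createdAt } : { hasAi, createdAt };
}

const exampleWhere = launchesWhere(new Date("2026-06-17T01:14:00Z"), false);
console.log(JSON.stringify(exampleWhere, null, 2));
```

Without an index whose leading columns match this filter, the database has to check every row of Post against the range, which is the full scan described in the diagnosis.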

Two changes

1. Drop the deleted column (it was vestigial)

deleted dates back to when AI filtering ran every 15 minutes and removed posts mid-day. Now ingestion runs once a day, and the only writer that set it to true was the daily updateMany in ingest-posts that flipped every post not in today's fetch to deleted=true — which is already implied by the createdAt day-range filter every read path uses. It was also silently limiting the sitemap to today's posts only (deleted=false).

So this PR:

  • removes the deleted column from Post,
  • removes the daily updateMany (a large daily write — also IO) and the deleted writes in ingest-posts,
  • removes the now-redundant deleted filters from the launch queries,
  • lets the sitemap include all posts (fixes the latent SEO bug).

A soft-delete flag can be reintroduced if/when we actually want soft deletes.
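The redundancy argument can be checked on toy data. The sketch below models the old daily job and compares the old read filter (day range plus deleted=false) with the new one (day range only). The `Post` shape, the sample rows, and the assumption that today's fetch contains exactly the posts created today are all illustrative, not taken from the repo.

```typescript
// Toy model; field names mirror the PR text, everything else is hypothetical.
interface Post {
  id: number;
  createdAt: Date;
  deleted: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const dayStart = new Date("2026-06-17T00:00:00Z").getTime();
const dayEnd = dayStart + DAY_MS;

// The createdAt day-range filter that every read path applies.
const createdToday = (p: Post): boolean => {
  const t = p.createdAt.getTime();
  return t >= dayStart && t < dayEnd;
};

// Model of the old daily updateMany: every post not in today's fetch gets deleted=true.
function markStaleDeleted(posts: Post[], todaysIds: Set<number>): Post[] {
  return posts.map((p) => ({
    ...p,
    deleted: p.deleted || !todaysIds.has(p.id),
  }));
}

const samplePosts: Post[] = [
  { id: 1, createdAt: new Date("2026-06-16T09:00:00Z"), deleted: false },
  { id: 2, createdAt: new Date("2026-06-17T08:00:00Z"), deleted: false },
  { id: 3, createdAt: new Date("2026-06-17T15:30:00Z"), deleted: false },
];

const afterJob = markStaleDeleted(samplePosts, new Set([2, 3]));

// Old read path: day range AND deleted=false. New read path: day range only.
const oldRead = afterJob.filter((p) => createdToday(p) && !p.deleted).map((p) => p.id);
const newRead = afterJob.filter(createdToday).map((p) => p.id);

console.log(oldRead, newRead);
```

Under that assumption both filters select the same posts, so the flag adds a daily bulk write without changing what the launch queries return.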

2. Index the hot read paths

  • Post @@index([hasAi, createdAt]): the launch queries (filter hasAi + createdAt range).
  • Post @@index([createdAt]): createdAt-range/ordered reads without a hasAi filter (/api/list, /api/stats, sitemap).
  • TopicPost @@index([postId]): the composite PK is keyed on topicId first, so include: { topics } couldn't use it.
  • Metric @@index([timestamp]): /slop window + /api/stats/time ordering.
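The column-order reasoning behind these indexes can be sketched with a deliberately simplified model: a multi-column B-tree index is treated as efficiently usable only for the run of leading columns that the query filters on. This is not how the Postgres planner actually decides (it is cost-based and has other strategies), and the second PK column (`postId`) is an assumption; the PR only states that the PK leads with topicId.

```typescript
// Simplified model: count how many leading index columns the query constrains.
// 0 means the index cannot narrow the search by its sort order.
function usablePrefixLength(indexColumns: string[], filtered: Set<string>): number {
  let n = 0;
  for (const col of indexColumns) {
    if (!filtered.has(col)) break;
    n++;
  }
  return n;
}

// TopicPost: composite PK assumed to be (topicId, postId); lookups are by postId.
const topicPostPk = ["topicId", "postId"];
const byPostId = new Set(["postId"]);
console.log(usablePrefixLength(topicPostPk, byPostId)); // PK does not help
console.log(usablePrefixLength(["postId"], byPostId)); // new index matches

// Post: queries without a hasAi filter cannot lean on [hasAi, createdAt],
// which is why a separate [createdAt] index is added.
const byCreatedAt = new Set(["createdAt"]);
console.log(usablePrefixLength(["hasAi", "createdAt"], byCreatedAt));
console.log(usablePrefixLength(["createdAt"], byCreatedAt));
```

The same model explains why both Post indexes exist: the composite one serves filters on hasAi plus createdAt, and the single-column one serves reads that only constrain createdAt.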

votesCount is intentionally left out of the indexes so the every‑15‑minutes update-vote-count cron doesn't pay index write amplification.

Evidence

Reproduced on Postgres with a production-shaped dataset (~154k posts) using EXPLAIN (ANALYZE, BUFFERS). Shared buffers are counted in 8KB pages touched (≈ disk reads when uncached):

| Hot query | Before | After | Page reads ↓ |
| --- | --- | --- | --- |
| Today's launches (hasAi=false) | Parallel Seq Scan, 4400 pages, 11ms | Index Scan [hasAi, createdAt], 206 pages, 0.17ms | ~21× |
| Today's count | Parallel Seq Scan, 4400 pages, 9.5ms | Index Only Scan, 5 pages, 0.05ms | ~880× |
| Today's launches (no hasAi filter) | Parallel Seq Scan, 4400 pages, 8.9ms | Bitmap Index Scan [createdAt], 14 pages, 0.17ms | ~310× |
| Sitemap | Seq Scan + temp-file sort, 26.7ms | Index Scan Backward [createdAt] (now returns all posts) | sort spill removed |
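To make the page counts concrete, the arithmetic behind the reduction column can be reproduced directly from the figures above. This is only a worked calculation, assuming the 8KB page size stated in this section; the type and function names are illustrative.

```typescript
// Page counts copied from the before/after measurements in this PR.
const PAGE_BYTES = 8 * 1024;

interface Measurement {
  query: string;
  pagesBefore: number;
  pagesAfter: number;
}

const measurements: Measurement[] = [
  { query: "Today's launches (hasAi=false)", pagesBefore: 4400, pagesAfter: 206 },
  { query: "Today's count", pagesBefore: 4400, pagesAfter: 5 },
  { query: "Today's launches (no hasAi filter)", pagesBefore: 4400, pagesAfter: 14 },
];

// How many times fewer pages the indexed plan touches.
function reduction(m: Measurement): number {
  return m.pagesBefore / m.pagesAfter;
}

// Convert a page count to MiB read in the worst (fully uncached) case.
function pagesToMiB(pages: number): number {
  return (pages * PAGE_BYTES) / (1024 * 1024);
}

for (const m of measurements) {
  console.log(
    `${m.query}: ${pagesToMiB(m.pagesBefore).toFixed(2)} MiB -> ` +
      `${pagesToMiB(m.pagesAfter).toFixed(2)} MiB (~${Math.round(reduction(m))}x fewer pages)`,
  );
}
```

At 4400 pages, each uncached request touched roughly 34 MiB of the Post table, which is why force-dynamic pages turned ordinary traffic into sustained Disk IO.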

Full before/after EXPLAIN ANALYZE output attached.

disk_io_explain_no_deleted.log

App still works end-to-end

After applying the schema change to a seeded DB, home → /list → detail all render real content and the AI filter works (AI • 1.2K / 2.8K • No AI):

oghunt_after_removing_deleted.mp4

Home page after removing deleted
Post detail after removing deleted

Applying in production (important)

This repo uses prisma db push. Dropping the column + creating indexes with a plain db push locks the table and needs --accept-data-loss. On the large production Post table, do it manually instead — build indexes concurrently and drop the column in a quick separate statement:

CREATE INDEX CONCURRENTLY IF NOT EXISTS "Post_hasAi_createdAt_idx" ON "Post" ("hasAi", "createdAt");
CREATE INDEX CONCURRENTLY IF NOT EXISTS "Post_createdAt_idx"       ON "Post" ("createdAt");
CREATE INDEX CONCURRENTLY IF NOT EXISTS "TopicPost_postId_idx"     ON "TopicPost" ("postId");
CREATE INDEX CONCURRENTLY IF NOT EXISTS "Metric_timestamp_idx"     ON "Metric" ("timestamp");
ALTER TABLE "Post" DROP COLUMN IF EXISTS "deleted";
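One practical detail when scripting these statements: Postgres does not allow CREATE INDEX CONCURRENTLY inside a transaction block, so each statement has to be sent on its own rather than wrapped in BEGIN/COMMIT. The sketch below shows one way to structure that. The `Exec` callback is a hypothetical stand-in for whatever database client is used, and the validity query is a suggested follow-up check, not part of this PR.

```typescript
// Hypothetical runner: `exec` stands in for a real client's query function.
type Exec = (sql: string) => Promise<void>;

// Statements from this PR, kept as separate entries so each runs on its own.
const statements: string[] = [
  'CREATE INDEX CONCURRENTLY IF NOT EXISTS "Post_hasAi_createdAt_idx" ON "Post" ("hasAi", "createdAt")',
  'CREATE INDEX CONCURRENTLY IF NOT EXISTS "Post_createdAt_idx" ON "Post" ("createdAt")',
  'CREATE INDEX CONCURRENTLY IF NOT EXISTS "TopicPost_postId_idx" ON "TopicPost" ("postId")',
  'CREATE INDEX CONCURRENTLY IF NOT EXISTS "Metric_timestamp_idx" ON "Metric" ("timestamp")',
  'ALTER TABLE "Post" DROP COLUMN IF EXISTS "deleted"',
];

// A failed concurrent build can leave an invalid index behind; IF NOT EXISTS
// would then skip it on a re-run, so listing invalid indexes afterwards is a
// reasonable sanity check.
const invalidIndexQuery =
  "SELECT c.relname FROM pg_index i JOIN pg_class c ON c.oid = i.indexrelid WHERE NOT i.indisvalid";

// Run statements strictly one at a time, never inside an explicit transaction.
async function applyStatements(exec: Exec, sqls: string[]): Promise<number> {
  let applied = 0;
  for (const sql of sqls) {
    await exec(sql);
    applied++;
  }
  return applied;
}

// Dry run: print what would be executed instead of touching a database.
void applyStatements(async (sql) => console.log(sql + ";"), statements);
console.log("Follow-up check: " + invalidIndexQuery);
```

Running the index builds before the column drop keeps the only locking statement (the ALTER TABLE) short, which matches the ordering given above.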

Verification

  • pnpm db:push applied the drop + 4 indexes (confirmed via pg_indexes).
  • pnpm db:generate, pnpm check, pnpm build all pass (build now emits sitemap chunks 0–3, i.e. it includes all posts).
  • ✅ App serves real content across /, /list, /ai, /api/list, /slop (see demo).
  • EXPLAIN (ANALYZE, BUFFERS) before/after confirms seq scan → index scan.


The hot read paths (home, /list, /ai, /api/list, /api/stats) are
force-dynamic and filter Post by deleted + hasAi over a createdAt day
range, ordered by votesCount. With no matching index Postgres did a
sequential scan of the entire Post table on every request, which is the
primary driver of Supabase Disk IO budget consumption as history grows.

Add covering indexes for the launch/sitemap queries on Post, an index on
TopicPost.postId for the topics include, and an index on Metric.timestamp
for the /slop and stats routes. votesCount is intentionally left out of
the indexes so the every-15-minutes vote-count cron does not pay index
write amplification.

Co-authored-by: Sean Boult <Hacksore@users.noreply.github.com>
vercel Bot commented Jun 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| oghunt | Ready | Preview, Comment | Jun 17, 2026 1:14am |

The deleted flag was a holdover from when AI filtering ran every 15
minutes and removed posts mid-day. Now ingestion runs once a day and the
only writer (the daily updateMany in ingest-posts) just flipped every
post not in today's fetch to deleted=true, which is already implied by
the createdAt day-range filter the read paths use. It also silently
limited the sitemap to today's posts only.

Drop the column and the daily updateMany, remove the now-redundant
deleted filters from the launch queries, and let the sitemap include all
posts. Indexes become [hasAi, createdAt] and [createdAt] (no deleted),
which still serve the hot read paths and avoid full table scans.

We can reintroduce a deleted flag if/when we actually want soft deletes.

Co-authored-by: Sean Boult <Hacksore@users.noreply.github.com>
cursor Bot changed the title from "Fix Supabase Disk IO bottleneck: add indexes for full-table-scan queries" to "Fix Supabase Disk IO bottleneck: drop unused deleted column + index hot queries" on Jun 17, 2026