Fix Supabase Disk IO bottleneck: drop unused deleted column + index hot queries #135
Draft
Hacksore wants to merge 2 commits into
The hot read paths (home, /list, /ai, /api/list, /api/stats) are force-dynamic and filter Post by deleted + hasAi over a createdAt day range, ordered by votesCount. With no matching index, Postgres did a sequential scan of the entire Post table on every request, which is the primary driver of Supabase Disk IO budget consumption as history grows. Add covering indexes for the launch/sitemap queries on Post, an index on TopicPost.postId for the topics include, and an index on Metric.timestamp for the /slop and stats routes. votesCount is intentionally left out of the indexes so the every-15-minutes vote-count cron does not pay index write amplification.

Co-authored-by: Sean Boult <Hacksore@users.noreply.github.com>
The deleted flag was a holdover from when AI filtering ran every 15 minutes and removed posts mid-day. Now ingestion runs once a day, and the only writer (the daily updateMany in ingest-posts) just flipped every post not in today's fetch to deleted=true, which is already implied by the createdAt day-range filter the read paths use. It also silently limited the sitemap to today's posts only. Drop the column and the daily updateMany, remove the now-redundant deleted filters from the launch queries, and let the sitemap include all posts. Indexes become [hasAi, createdAt] and [createdAt] (no deleted), which still serve the hot read paths and avoid full table scans. We can reintroduce a deleted flag if/when we actually want soft deletes.

Co-authored-by: Sean Boult <Hacksore@users.noreply.github.com>
Diagnosis
The Disk IO budget alerts come from the `Post` table being sequentially scanned in full on nearly every page view.

The home page, `/list`, and `/ai` are all `export const dynamic = "force-dynamic"` and call `getTodaysLaunches` / `getTodaysLaunchesPaginated` (`apps/web/src/app/lib/launches.ts`) on every request. Those queries filter `Post` over a `createdAt` day range; `/api/list`, `/api/stats`, `/slop`, and `sitemap.ts` hit the same table the same way. `Post` had no indexes except its primary key, so every request read the whole table to return a few hundred rows.

Two changes
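Both changes revolve around the `createdAt` day-range filter that the hot read paths share. A minimal sketch of that filter's shape follows, assuming UTC day boundaries and a half-open `[start, end)` window; `utcDayRange` and `launchWhere` are hypothetical helpers for illustration, not the code in `launches.ts`.

```typescript
// Hypothetical helper: the half-open UTC day window a "today's launches"
// query filters on. Assumes days are bucketed in UTC; the real
// getTodaysLaunches may use a different timezone or bounds.
interface DayRange {
  gte: Date; // inclusive start of the day
  lt: Date; // exclusive start of the next day
}

const DAY_MS = 24 * 60 * 60 * 1000;

function utcDayRange(now: Date): DayRange {
  const start = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()),
  );
  return { gte: start, lt: new Date(start.getTime() + DAY_MS) };
}

// Shape of a Prisma `where` for the launch queries (hypothetical), e.g.
// prisma.post.findMany({ where: launchWhere(new Date(), false) })
function launchWhere(now: Date, hasAi: boolean) {
  return { hasAi, createdAt: utcDayRange(now) };
}
```

A predicate of this shape (equality on `hasAi`, range on `createdAt`) is exactly what a `[hasAi, createdAt]` B-tree index can answer without touching the rest of the table.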
1. Drop the `deleted` column (it was vestigial)

`deleted` dates back to when AI filtering ran every 15 minutes and removed posts mid-day. Now ingestion runs once a day, and the only writer that set it to `true` was the daily `updateMany` in `ingest-posts`, which flipped every post not in today's fetch to `deleted=true`. That is already implied by the `createdAt` day-range filter every read path uses. It was also silently limiting the sitemap to today's posts only (`deleted=false`).

So this PR:

- drops the `deleted` column from `Post`,
- removes the daily `updateMany` (a large daily write, which also costs IO) and the `deleted` writes in `ingest-posts`,
- removes the `deleted` filters from the launch queries,
- lets the sitemap include all posts.

A soft-delete flag can be reintroduced if/when we actually want soft deletes.
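The redundancy argument can be checked with a tiny in-memory model. This sketch assumes "today's fetch" contains exactly the posts whose `createdAt` falls in today's UTC day, and models the old daily `updateMany` as setting `deleted = true` on every other post. `Post`, `applyDailyFlip`, and the filter functions are hypothetical stand-ins, not the repo's code.

```typescript
// Minimal stand-in for a Post row: only the fields this argument needs.
interface Post {
  id: number;
  createdAt: Date;
  deleted: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Assumption: "today" is the UTC day containing `now`.
function inToday(p: Post, now: Date): boolean {
  const start = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const t = p.createdAt.getTime();
  return t >= start && t < start + DAY_MS;
}

// Old behaviour (modelled): the daily updateMany set deleted = true on every
// post not in today's fetch, where "today's fetch" = posts created today.
function applyDailyFlip(posts: Post[], now: Date): Post[] {
  return posts.map((p) => ({ ...p, deleted: !inToday(p, now) }));
}

// Launch query before this PR: deleted = false AND createdAt in today.
const oldLaunchFilter = (p: Post, now: Date) => !p.deleted && inToday(p, now);
// Launch query after this PR: the createdAt range alone.
const newLaunchFilter = (p: Post, now: Date) => inToday(p, now);
// Old sitemap filter: deleted = false, which after the flip means "today only".
const oldSitemapFilter = (p: Post) => !p.deleted;
```

Under that assumption `oldLaunchFilter` and `newLaunchFilter` select the same rows, while `oldSitemapFilter` drops every post from earlier days, which is the sitemap limitation described above.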
2. Index the hot read paths
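Two Postgres B-tree behaviours drive the index choices listed here. First, a composite index supports a range scan only when the query pins its leading column(s). Second, an `UPDATE` that changes no indexed column can be a heap-only ("HOT") update that writes no index entries (given free space on the page), while one that changes an indexed column adds a new entry to every index on the table. A toy model of both follows, assuming an index is just an array of entries sorted by its key tuple. `buildIndex`, `rangeScan`, and `indexWritesForUpdate` are hypothetical illustrations, not Postgres internals.

```typescript
// Toy composite index: entries kept sorted by their key tuple.
type Key = number[]; // e.g. [hasAi ? 1 : 0, createdAtMs]

interface Entry {
  key: Key;
  rowId: number;
}

function compareKeys(a: Key, b: Key): number {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

function buildIndex(entries: Entry[]): Entry[] {
  return [...entries].sort((x, y) => compareKeys(x.key, y.key));
}

// First position whose key is >= `prefix` (binary search, like a B-tree descent).
function lowerBound(index: Entry[], prefix: Key): number {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(index[mid].key, prefix) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Equality on the leading column + half-open range [from, to) on the second.
// Only the matching slice (plus the entry that ends it) is visited.
function rangeScan(index: Entry[], leading: number, from: number, to: number) {
  const rowIds: number[] = [];
  let visited = 0;
  for (let i = lowerBound(index, [leading, from]); i < index.length; i++) {
    visited++;
    const [lead, second] = index[i].key;
    if (lead !== leading || second >= to) break; // left the range: stop
    rowIds.push(index[i].rowId);
  }
  return { rowIds, visited };
}

// Simplified HOT rule: if no changed column appears in any index, the update
// can be heap-only and writes 0 index entries (assuming page free space);
// otherwise every index on the table gets a new entry.
function indexWritesForUpdate(changed: string[], indexes: string[][]): number {
  const touchesIndexed = indexes.some((cols) => cols.some((c) => changed.includes(c)));
  return touchesIndexed ? indexes.length : 0;
}
```

With `votesCount` kept out of every index, `indexWritesForUpdate(["votesCount"], …)` is 0, so the vote-count cron's updates stay HOT-eligible. The same model shows why `TopicPost`'s composite PK, keyed on `topicId` first, can't serve a lookup by `postId` alone: without the leading column pinned there is no contiguous slice to seek to, so the whole index would have to be walked.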
- `Post @@index([hasAi, createdAt])`: the launch queries (filter on `hasAi` + `createdAt` range).
- `Post @@index([createdAt])`: `createdAt`-range/ordered reads without a `hasAi` filter (`/api/list`, `/api/stats`, sitemap).
- `TopicPost @@index([postId])`: the composite PK is keyed on `topicId` first, so `include: { topics }` couldn't use it.
- `Metric @@index([timestamp])`: the `/slop` window + `/api/stats/time` ordering.

`votesCount` is intentionally left out of the indexes so the every-15-minutes `update-vote-count` cron doesn't pay index write amplification.

Evidence
Reproduced on Postgres with a production-shaped dataset (~154k posts).
`EXPLAIN (ANALYZE, BUFFERS)`; `shared` buffers = 8 KB pages touched (≈ disk reads when uncached). After this PR:

- … `hasAi=false`): index `[hasAi, createdAt]`, 206 pages, 0.17 ms
- … `hasAi` filter): index `[createdAt]`, 14 pages, 0.17 ms
- …: index `[createdAt]` (now returns all posts)

Full before/after `EXPLAIN ANALYZE` output attached: disk_io_explain_no_deleted.log
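To turn the attached log into page and byte counts, a sketch that reads the top plan node's `shared` buffer counters from text-format `EXPLAIN (ANALYZE, BUFFERS)` output is shown below. It assumes the default 8 KB block size and that the first `Buffers:` line in the output belongs to the top node (whose counts already include its children), so it does not sum every line. `topNodeSharedBuffers` is a hypothetical helper.

```typescript
// Default Postgres block size (BLCKSZ); a custom build could differ.
const PAGE_BYTES = 8 * 1024;

interface SharedBuffers {
  hit: number; // pages already in shared_buffers
  read: number; // pages read from the OS / disk
  pages: number; // hit + read
  kib: number; // pages * 8 KiB
}

// Assumes text format, where the top node's "Buffers:" line comes first and
// already includes its children; "Planning:" buffers appear after the plan.
function topNodeSharedBuffers(explainText: string): SharedBuffers | null {
  const line = explainText.split("\n").find((l) => l.trim().startsWith("Buffers:"));
  if (!line) return null;
  // e.g. "Buffers: shared hit=200 read=6, temp read=9" -> " hit=200 read=6"
  const shared = /shared((?: \w+=\d+)+)/.exec(line);
  if (!shared) return null;
  const counts: Record<string, number> = {};
  const re = /(\w+)=(\d+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(shared[1])) !== null) counts[m[1]] = Number(m[2]);
  const hit = counts.hit ?? 0;
  const read = counts.read ?? 0;
  const pages = hit + read;
  return { hit, read, pages, kib: (pages * PAGE_BYTES) / 1024 };
}
```

At 8 KiB per page, the 206 pages above are 206 × 8 = 1,648 KiB and the 14 pages are 112 KiB per request, the worst case when none of them are cached.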
App still works end-to-end
After applying the schema change to a seeded DB, home → `/list` → detail all render real content, and the AI filter works (AI • 1.2K/2.8K • No AI): oghunt_after_removing_deleted.mp4
Home page after removing deleted
Post detail after removing deleted
Applying in production (important)
This repo uses `prisma db push`. Dropping the column and creating indexes with a plain `db push` locks the table and needs `--accept-data-loss`. On the large production `Post` table, do it manually instead: build the indexes concurrently, and drop the column in a quick, separate statement.

Verification
- `pnpm db:push` applied the drop + 4 indexes (confirmed via `pg_indexes`).
- `pnpm db:generate`, `pnpm check`, and `pnpm build` all pass (the build now emits sitemap chunks 0–3, i.e. it includes all posts).
- `/`, `/list`, `/ai`, `/api/list`, `/slop` (see demo).
- `EXPLAIN (ANALYZE, BUFFERS)` before/after confirms seq scan → index scan.
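For the manual production apply described under "Applying in production", a sketch that generates the statements and checks the result against `pg_indexes` is shown below. It assumes table names equal the Prisma model names (no `@@map`) and that index names follow Prisma's default `{Table}_{columns}_idx` convention, so a later `db push` sees them as already present. `productionStatements` and `missingIndexes` are hypothetical helpers. `CREATE INDEX CONCURRENTLY` cannot run inside a transaction block, so send each statement on its own rather than wrapping them in one transaction.

```typescript
// Assumptions (not confirmed by this PR): no @@map on the models, and
// Prisma's default "{Table}_{col1}_{col2}_idx" index naming.
interface IndexSpec {
  table: string;
  columns: string[];
}

const INDEXES: IndexSpec[] = [
  { table: "Post", columns: ["hasAi", "createdAt"] },
  { table: "Post", columns: ["createdAt"] },
  { table: "TopicPost", columns: ["postId"] },
  { table: "Metric", columns: ["timestamp"] },
];

// Quote an identifier Postgres-style (Prisma uses case-sensitive names).
const quote = (ident: string) => `"${ident.replace(/"/g, '""')}"`;

function indexName(spec: IndexSpec): string {
  return `${spec.table}_${spec.columns.join("_")}_idx`;
}

function productionStatements(): string[] {
  // CONCURRENTLY builds without blocking writes to the table.
  const creates = INDEXES.map(
    (s) =>
      `CREATE INDEX CONCURRENTLY IF NOT EXISTS ${quote(indexName(s))} ` +
      `ON ${quote(s.table)} (${s.columns.map(quote).join(", ")});`,
  );
  // DROP COLUMN only updates the catalog (no table rewrite), so it is quick
  // once it obtains its brief ACCESS EXCLUSIVE lock.
  return [...creates, `ALTER TABLE ${quote("Post")} DROP COLUMN IF EXISTS ${quote("deleted")};`];
}

// Given index names from `SELECT indexname FROM pg_indexes WHERE ...`,
// list the expected indexes that are still missing.
function missingIndexes(existing: string[]): string[] {
  const have = new Set(existing);
  return INDEXES.map(indexName).filter((n) => !have.has(n));
}
```

One caveat with this approach: if a concurrent build fails partway, it leaves an `INVALID` index behind that `IF NOT EXISTS` will silently skip, so such an index needs to be dropped before retrying.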