fix: reduce enrichment payload size to prevent temporal failures (CM-1065)#4001
…ty failures Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability. Example:
Projects:
Please add a Jira issue key to your PR title.
Pull request overview
Reduces JSON payload sizes returned by enrichment-related Postgres queries to avoid exceeding Temporal activity payload limits (observed in enrichMember for orgs with very large identity counts).
Changes:
- Replaced full-row `jsonb_agg(<row>)` aggregations with `jsonb_build_object(...)` projections to return only downstream-used columns.
- Filtered org identities in `fetchMemberDataForLLMSquashing` to only include verified `primary-domain` identities and normalized null aggregates to empty arrays.
- Removed redundant application-side filtering in `doesIncomingOrgExistInExistingOrgs` based on the updated query behavior.
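To make the effect of projecting only downstream-used columns concrete, here is a minimal sketch that compares the serialized size of full rows against a projected shape. The row shape, the column names, and the choice of projected fields are assumptions for illustration and are not taken from the PR's schema.

```typescript
// Hypothetical row shape; the real organizationIdentities table may differ.
interface OrgIdentityRow {
  organizationId: string;
  platform: string;
  type: string;
  value: string;
  verified: boolean;
  sourceId: string | null;
  integrationId: string | null;
  createdAt: string;
  updatedAt: string;
}

// Assumed set of fields that downstream code reads.
type ProjectedIdentity = Pick<OrgIdentityRow, 'type' | 'value' | 'verified'>;

// Application-side analogue of a jsonb_build_object(...) projection.
function project(row: OrgIdentityRow): ProjectedIdentity {
  return { type: row.type, value: row.value, verified: row.verified };
}

// Length in UTF-16 code units; equals the byte count for ASCII-only data.
function serializedLength(value: unknown): number {
  return JSON.stringify(value).length;
}

// Synthetic data only, not production rows.
const rows: OrgIdentityRow[] = Array.from({ length: 1000 }, (_, i) => ({
  organizationId: `org-${i}`,
  platform: 'example-platform',
  type: 'primary-domain',
  value: `domain-${i}.example.com`,
  verified: i % 2 === 0,
  sourceId: null,
  integrationId: null,
  createdAt: '2024-01-01T00:00:00.000Z',
  updatedAt: '2024-01-01T00:00:00.000Z',
}));

const fullSize = serializedLength(rows);
const projectedSize = serializedLength(rows.map(project));
console.log({ fullSize, projectedSize });
```

The absolute numbers depend entirely on the synthetic data; the point is only that dropping unused keys shrinks every element of the aggregated array.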
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| services/libs/data-access-layer/src/old/apps/merge_suggestions_worker/memberMergeSuggestions.repo.ts | Shrinks member identity aggregation payload for merge suggestions. |
| services/libs/data-access-layer/src/old/apps/members_enrichment_worker/index.ts | Shrinks enrichment query payloads and adds DB-side filtering/coalescing for org identities. |
| services/apps/members_enrichment_worker/src/activities/enrichment.ts | Removes now-redundant org identity filtering in app logic to align with query output. |
Comments suppressed due to low confidence (1)
services/libs/data-access-layer/src/old/apps/members_enrichment_worker/index.ts:42
- The aggregate `FILTER` limits which identities are returned, but the query still INNER JOINs all `organizationIdentities` rows for the org, so members tied to orgs with thousands of identities still generate a large intermediate join result. To reduce work at the DB level (and better match the intent here), push `oi.type = 'primary-domain' AND oi.verified = true` into the join/WHERE clause; if you still need to keep org rows that have no matching primary-domain identities, switch the join to a LEFT JOIN and ensure the aggregate excludes null-join rows.
)) filter (where oi.type = 'primary-domain' and oi.verified = true) as identities
from "memberOrganizations" mo
inner join organizations o on mo."organizationId" = o.id
inner join "organizationIdentities" oi on oi."organizationId" = o.id
where mo."memberId" = $(memberId)
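One possible shape of the suggestion above, sketched as a query string constant. This is an untested assumption rather than code from the PR: the selected columns, the fields inside `jsonb_build_object`, the `group by`, and the constant name `orgIdentitiesQuerySketch` are all hypothetical. Only the table names, the join keys, and the `$(memberId)` parameter come from the snippet under review.

```typescript
// Hypothetical rewrite: the identity predicate moves into the join condition,
// and a LEFT JOIN keeps orgs that have no matching identities.
const orgIdentitiesQuerySketch = `
  select
    o.id as "organizationId",
    coalesce(
      jsonb_agg(
        -- assumed field list; oi.value is not confirmed by the PR text
        jsonb_build_object('type', oi.type, 'value', oi.value, 'verified', oi.verified)
      ) filter (where oi."organizationId" is not null), -- skip null-join rows
      '[]'::jsonb
    ) as identities
  from "memberOrganizations" mo
  inner join organizations o on mo."organizationId" = o.id
  left join "organizationIdentities" oi
    on oi."organizationId" = o.id
   and oi.type = 'primary-domain'
   and oi.verified = true
  where mo."memberId" = $(memberId)
  group by o.id
`;

console.log(orgIdentitiesQuerySketch.trim());
```

With the predicate in the `on` clause, non-matching identity rows never enter the join result, while the `filter` plus `coalesce` pair turns the single null-extended row of an org without matches into an empty array.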
Summary
- Replaced full-row aggregations (`jsonb_agg(oi)`, `jsonb_agg(mi.*)`) with selective `jsonb_build_object(...)` in three enrichment-related queries, fetching only the columns used downstream
- Added a `FILTER` clause in `fetchMemberDataForLLMSquashing` to only return `primary-domain` + `verified` org identities at the DB level, since that's all the downstream code checks
- Removed the now-redundant filtering in `doesIncomingOrgExistInExistingOrgs`
Context
The `enrichMember` workflow was failing for members associated with orgs that have many identities (e.g. IBM with ~9,750 org identities). The `fetchMemberDataForLLMSquashing` activity result was 3.8 MB, exceeding Temporal's ~2 MB payload size limit. The same full-row aggregation pattern existed in two other queries that could hit the same issue.
Performance
Benchmarked against production data (worst-case members):
- `fetchMemberDataForLLMSquashing`
- `fetchMembersForLFIDEnrichment`
- `getMembers` (merge suggestions)
Query plans are structurally identical (same indexes, same joins); the improvement comes from building and sorting smaller JSON objects.
Note
Medium Risk
Changes SQL aggregation shapes and filtering for member/org identities in enrichment and merge-suggestions queries, which could affect downstream consumers expecting full-row JSON or unfiltered identities. Logic impact is intended to be narrower payloads but warrants verification on edge cases (missing identities, matching by domain).
Overview
Reduces enrichment-related DB payload sizes by replacing full-row JSON aggregations with `jsonb_build_object(...)` projections in the member/org identity queries used by `fetchMemberDataForLLMSquashing`, `fetchMembersForLFIDEnrichment`, and merge-suggestions `getMembers`.
Moves filtering of organization identities to the database (only verified `primary-domain` identities are returned), adds `coalesce` defaults for empty identity arrays, and simplifies `doesIncomingOrgExistInExistingOrgs` to assume the pre-filtered identity set.
Reviewed by Cursor Bugbot for commit e180176.
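The risk noted above comes down to an assumption about input shape. The sketch below contrasts a check that filters in application code with one that trusts the query to have filtered already. Both functions, the types, and the sample data are hypothetical illustrations; they are not the actual `doesIncomingOrgExistInExistingOrgs` implementation.

```typescript
// Hypothetical shapes; the real types in the repo may differ.
interface OrgIdentity {
  type: string;
  value: string;
  verified: boolean;
}

interface ExistingOrg {
  name: string;
  identities: OrgIdentity[];
}

// Variant that re-checks type and verification in application code.
function orgExistsWithAppSideFilter(orgs: ExistingOrg[], domain: string): boolean {
  return orgs.some((org) =>
    org.identities.some(
      (i) => i.type === 'primary-domain' && i.verified && i.value === domain,
    ),
  );
}

// Variant that assumes the query already returned only verified
// primary-domain identities (and [] when there are none).
function orgExistsAssumingPrefiltered(orgs: ExistingOrg[], domain: string): boolean {
  return orgs.some((org) => org.identities.some((i) => i.value === domain));
}

// Pre-filtered input: both variants agree.
const prefiltered: ExistingOrg[] = [
  { name: 'Example Org', identities: [{ type: 'primary-domain', value: 'example.com', verified: true }] },
  { name: 'Org Without Identities', identities: [] },
];

// Unfiltered input: the variants can disagree, which is the edge case to verify.
const unfiltered: ExistingOrg[] = [
  { name: 'Other Org', identities: [{ type: 'email', value: 'example.net', verified: false }] },
];

console.log(orgExistsWithAppSideFilter(prefiltered, 'example.com'));
console.log(orgExistsAssumingPrefiltered(prefiltered, 'example.com'));
console.log(orgExistsWithAppSideFilter(unfiltered, 'example.net'));
console.log(orgExistsAssumingPrefiltered(unfiltered, 'example.net'));
```

The simplified variant is only equivalent as long as every caller feeds it the filtered query output, so any other code path that builds the same structure from unfiltered rows would be worth checking.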