fix: reduce enrichment payload size to prevent temporal failures (CM-1065) #4001

Merged
skwowet merged 2 commits into main from fix/enrichment-query-payload-size
Apr 6, 2026
@skwowet (Collaborator) commented Apr 6, 2026

Summary

  • Replaced full-row aggregations (jsonb_agg(oi), jsonb_agg(mi.*)) with selective jsonb_build_object(...) in three enrichment-related queries, fetching only the columns used downstream
  • Added a FILTER clause in fetchMemberDataForLLMSquashing to only return primary-domain + verified org identities at the DB level, since that's all the downstream code checks
  • Removed the now-redundant application-level filter in doesIncomingOrgExistInExistingOrgs
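The effect of the first bullet can be illustrated outside SQL. The following is a minimal TypeScript sketch, not the PR's code: the `OrgIdentityRow` shape, the choice of projected fields, and the sample data are all assumptions made for illustration, since the actual column list is not shown in this description. It mimics what `jsonb_build_object(...)` does by keeping only named fields before serialization.

```typescript
// Hypothetical full identity row; the real table's column list is not shown in the PR.
interface OrgIdentityRow {
  organizationId: string;
  platform: string;
  type: string;
  value: string;
  verified: boolean;
  sourceId: string | null;
  integrationId: string | null;
  createdAt: string;
  updatedAt: string;
}

// Assumed subset of fields that downstream code actually reads.
type SlimIdentity = Pick<OrgIdentityRow, "type" | "value" | "verified">;

// App-side analogue of a selective jsonb_build_object(...): keep named fields only.
function projectIdentity(row: OrgIdentityRow): SlimIdentity {
  return { type: row.type, value: row.value, verified: row.verified };
}

// Length of the JSON text, used here as a rough proxy for payload size.
function jsonLength(value: unknown): number {
  return JSON.stringify(value).length;
}

// Synthetic rows standing in for an org with many identities.
const sampleRows: OrgIdentityRow[] = Array.from({ length: 1000 }, (_, i) => ({
  organizationId: "00000000-0000-0000-0000-000000000001",
  platform: "custom",
  type: "primary-domain",
  value: `example-${i}.com`,
  verified: i % 2 === 0,
  sourceId: null,
  integrationId: null,
  createdAt: "2026-01-01T00:00:00.000Z",
  updatedAt: "2026-01-01T00:00:00.000Z",
}));

const fullLength = jsonLength(sampleRows);
const slimLength = jsonLength(sampleRows.map(projectIdentity));
console.log(`full rows: ${fullLength} chars, projected: ${slimLength} chars`);
```

The reduction in the real queries depends on how wide the underlying rows are; the numbers printed here say nothing about production sizes.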

Context

The enrichMember workflow was failing for members associated with orgs that have many identities (e.g. IBM with ~9,750 org identities). The fetchMemberDataForLLMSquashing activity result was 3.8 MB, exceeding Temporal's ~2 MB payload size limit. The same full-row aggregation pattern existed in two other queries that could hit the same issue.
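A size check of this kind can be sketched in a few lines. This is an illustrative TypeScript sketch only: the helper names are hypothetical, the threshold is taken from the "~2 MB" figure quoted above rather than from Temporal's documentation, and it assumes the activity result is serialized as JSON.

```typescript
// Assumed threshold, based on the "~2 MB" figure quoted in this PR.
const APPROX_PAYLOAD_LIMIT_BYTES = 2 * 1024 * 1024;

// UTF-8 byte length of a string, computed without any runtime-specific API.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff) {
      bytes += 4; // high surrogate: the pair encodes to 4 bytes
      i++; // skip the low surrogate
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Size of a result once serialized as JSON (undefined serializes to nothing).
function payloadSizeBytes(result: unknown): number {
  return utf8ByteLength(JSON.stringify(result) ?? "");
}

// True when the serialized result would exceed the assumed limit.
function exceedsPayloadLimit(
  result: unknown,
  limitBytes: number = APPROX_PAYLOAD_LIMIT_BYTES,
): boolean {
  return payloadSizeBytes(result) > limitBytes;
}
```

A check like this could be logged in a test against worst-case members to catch regressions before they surface as workflow failures.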

Performance

Benchmarked against production data (worst-case members):

| Query | Execution time | Payload size |
| --- | --- | --- |
| fetchMemberDataForLLMSquashing | 101ms → 14ms (7x) | 3.8 MB → 128 KB (30x) |
| fetchMembersForLFIDEnrichment | 9ms → 0.26ms (35x) | 3 MB → negligible |
| getMembers (merge suggestions) | 144ms → 81ms (1.8x) | 3 MB → negligible |

Query plans are structurally identical (same indexes, same joins) - the improvement comes from building and sorting smaller JSON objects.


Note

Medium Risk
Changes SQL aggregation shapes and filtering for member/org identities in enrichment and merge-suggestions queries, which could affect downstream consumers expecting full-row JSON or unfiltered identities. Logic impact is intended to be narrower payloads but warrants verification on edge cases (missing identities, matching by domain).

Overview
Reduces enrichment-related DB payload sizes by replacing full-row JSON aggregations with jsonb_build_object(...) projections in the member/org identity queries used by fetchMemberDataForLLMSquashing, fetchMembersForLFIDEnrichment, and merge-suggestions getMembers.

Moves filtering of organization identities to the database (only verified primary-domain identities are returned), adds coalesce defaults for empty identity arrays, and simplifies doesIncomingOrgExistInExistingOrgs to assume the pre-filtered identity set.
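The risk noted above comes from the application code now trusting the query to have filtered identities. The sketch below illustrates that trade-off in TypeScript. The function names, signatures, and matching rule (exact domain equality) are hypothetical and do not reproduce the actual `doesIncomingOrgExistInExistingOrgs` implementation.

```typescript
interface OrgIdentity {
  type: string;
  value: string;
  verified: boolean;
}

interface ExistingOrg {
  id: string;
  identities: OrgIdentity[];
}

// Hypothetical "before" shape: the app re-checks type and verified on every identity.
function orgExistsWithAppFilter(existing: ExistingOrg[], incomingDomain: string): boolean {
  return existing.some((org) =>
    org.identities.some(
      (i) => i.type === "primary-domain" && i.verified && i.value === incomingDomain,
    ),
  );
}

// Hypothetical "after" shape: assumes the query already returned only
// verified primary-domain identities, so only the value is compared.
function orgExistsAssumingPrefiltered(existing: ExistingOrg[], incomingDomain: string): boolean {
  return existing.some((org) => org.identities.some((i) => i.value === incomingDomain));
}

// Stand-in for the DB-side FILTER clause, used here to build pre-filtered input.
function prefilter(existing: ExistingOrg[]): ExistingOrg[] {
  return existing.map((org) => ({
    id: org.id,
    identities: org.identities.filter((i) => i.type === "primary-domain" && i.verified),
  }));
}

const unfiltered: ExistingOrg[] = [
  { id: "org-1", identities: [{ type: "primary-domain", value: "example.com", verified: false }] },
  { id: "org-2", identities: [{ type: "primary-domain", value: "example.org", verified: true }] },
];
```

On pre-filtered input both variants agree; on unfiltered input the simplified variant would also match unverified identities, which is why any other caller of the simplified function needs the same query-level guarantee.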

Reviewed by Cursor Bugbot for commit e180176.

…ty failures

Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
@skwowet self-assigned this Apr 6, 2026
Copilot AI review requested due to automatic review settings April 6, 2026 09:31
@github-actions bot (Contributor) commented Apr 6, 2026

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

@skwowet requested a review from mbani01 April 6, 2026 09:31
@skwowet changed the title from "fix: reduce enrichment query payload sizes to prevent temporal activity failures" to "fix: reduce enrichment payload size to prevent temporal failures (CM-1065)" Apr 6, 2026
Copilot AI (Contributor) left a comment

Pull request overview

Reduces JSON payload sizes returned by enrichment-related Postgres queries to avoid exceeding Temporal activity payload limits (observed in enrichMember for orgs with very large identity counts).

Changes:

  • Replaced full-row jsonb_agg(<row>) aggregations with jsonb_build_object(...) projections to return only downstream-used columns.
  • Filtered org identities in fetchMemberDataForLLMSquashing to only include verified primary-domain identities and normalized null aggregates to empty arrays.
  • Removed redundant application-side filtering in doesIncomingOrgExistInExistingOrgs based on the updated query behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| services/libs/data-access-layer/src/old/apps/merge_suggestions_worker/memberMergeSuggestions.repo.ts | Shrinks member identity aggregation payload for merge suggestions. |
| services/libs/data-access-layer/src/old/apps/members_enrichment_worker/index.ts | Shrinks enrichment query payloads and adds DB-side filtering/coalescing for org identities. |
| services/apps/members_enrichment_worker/src/activities/enrichment.ts | Removes now-redundant org identity filtering in app logic to align with query output. |

Comments suppressed due to low confidence (1)

services/libs/data-access-layer/src/old/apps/members_enrichment_worker/index.ts:42

  • The aggregate FILTER limits which identities are returned, but the query still INNER JOINs all organizationIdentities rows for the org, so members tied to orgs with thousands of identities still generate a large intermediate join result. To reduce work at the DB level (and better match the intent here), push oi.type = 'primary-domain' AND oi.verified = true into the join/WHERE clause; if you still need to keep org rows that have no matching primary-domain identities, switch the join to a LEFT JOIN and ensure the aggregate excludes null-join rows.
                            )) filter (where oi.type = 'primary-domain' and oi.verified = true) as identities
                        from "memberOrganizations" mo
                            inner join organizations o on mo."organizationId" = o.id
                            inner join "organizationIdentities" oi on oi."organizationId" = o.id
                        where mo."memberId" = $(memberId)
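The join semantics this suppressed comment refers to can be simulated in memory. The TypeScript sketch below is only an analogy under stated assumptions: the types, function names, and sample data are hypothetical, and it models the two SQL shapes (filter in an INNER JOIN/WHERE versus filter in a LEFT JOIN's ON clause with an empty-array default) rather than running them.

```typescript
interface Identity {
  organizationId: string;
  type: string;
  value: string;
  verified: boolean;
}

interface OrgWithDomains {
  organizationId: string;
  domains: string[];
}

const isVerifiedPrimaryDomain = (i: Identity): boolean =>
  i.type === "primary-domain" && i.verified;

// Analogue of LEFT JOIN with the filter in ON plus coalesce(..., '[]'):
// every org is kept, and orgs without a matching identity get an empty array.
function leftJoinFiltered(orgIds: string[], identities: Identity[]): OrgWithDomains[] {
  return orgIds.map((organizationId) => ({
    organizationId,
    domains: identities
      .filter((i) => i.organizationId === organizationId && isVerifiedPrimaryDomain(i))
      .map((i) => i.value),
  }));
}

// Analogue of INNER JOIN with the filter in the join/WHERE clause:
// orgs with no matching identity produce no row at all.
function innerJoinFiltered(orgIds: string[], identities: Identity[]): OrgWithDomains[] {
  return leftJoinFiltered(orgIds, identities).filter((org) => org.domains.length > 0);
}

const orgIds = ["org-a", "org-b"];
const identities: Identity[] = [
  { organizationId: "org-a", type: "primary-domain", value: "a.example", verified: true },
  { organizationId: "org-b", type: "primary-domain", value: "b.example", verified: false },
];

console.log(innerJoinFiltered(orgIds, identities).map((o) => o.organizationId));
console.log(leftJoinFiltered(orgIds, identities).map((o) => o.organizationId));
```

In this model, the inner-join variant drops `org-b` entirely, while the left-join variant keeps it with an empty `domains` array, which is the distinction the comment asks the author to weigh.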

@mbani01 (Contributor) left a comment

LGTM, thanks @skwowet!

@skwowet merged commit 02aa8f9 into main Apr 6, 2026
11 checks passed
@skwowet deleted the fix/enrichment-query-payload-size branch April 6, 2026 14:27

3 participants