Skip to content

CORE: various fixes for ucc_mem_map#1306

Open
wfaderhold21 wants to merge 2 commits into
openucx:masterfrom
wfaderhold21:topic/memh-fixes
Open

CORE: various fixes for ucc_mem_map#1306
wfaderhold21 wants to merge 2 commits into
openucx:masterfrom
wfaderhold21:topic/memh-fixes

Conversation

@wfaderhold21

Copy link
Copy Markdown
Collaborator

What

Fix a collection of correctness bugs in the ucc_mem_map/ucc_mem_unmap path and specifically TL/UCP that can lead to failures or memory leaks.

@greptile-apps

greptile-apps Bot commented May 5, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a collection of correctness bugs across ucc_mem_map/ucc_mem_unmap and TL/UCP: wrong tl_ctx[i]tl_ctx[j] index and wrong base context in failed_mem_map, missing UCC_MEM_MAP_TL_NAME_LEN + sizeof(size_t) in the offload-import offset advance, NULL guard on tl_data before unmap, new IMPORT_OFFLOAD ucp_mem_unmap call, a separate tl_h allocation for the EXPORT_OFFLOAD result handle, and accumulated (non-early-return) error reporting in ucc_mem_unmap.

  • tl_ucp_context.c: Offset bug in ucc_tl_ucp_mem_map_offload_import that caused the TL-name search to skip entries incorrectly is fixed; ucc_tl_ucp_mem_unmap now handles IMPORT_OFFLOAD and guards against NULL tl_data.
  • ucc_context.c: Critical index/context/mode bugs in failed_mem_map are fixed; packed_buffers and tl_h allocations are sized with ucc_max; the EXPORT_OFFLOAD result handle gets its own tl_h allocation instead of aliasing the input handle's array.
  • Three previously-flagged issues remain unresolved: (1) the packed_buffers array pointer itself is never freed in failed_pack; (2) in EXPORT_OFFLOAD mode, failed_pack's cleanup loop iterates j < i (importer counter) but packing stored at packed_buffers[tlh_index] (exporter slot), so any buffer stored at tlh_index >= i is silently leaked; (3) failed_mem_map unconditionally calls mem_unmap on and then frees local_memh even in EXPORT_OFFLOAD mode where local_memh is the caller's own handle.

Confidence Score: 3/5

The PR corrects several real bugs but leaves three unresolved issues in the EXPORT_OFFLOAD error paths that can cause memory leaks and corrupt the caller's handle on any pack failure.

Multiple issues remain open from the previous review round: the packed_buffers array pointer is never freed in failed_pack; the EXPORT_OFFLOAD cleanup loop uses the importer TL counter as the free index but packing stored at exporter-slot indices, leaking any buffer whose tlh_index >= i; and failed_mem_map frees local_memh and calls mem_unmap on its slots even in EXPORT_OFFLOAD mode where local_memh is the caller's own handle.

src/core/ucc_context.c — specifically the failed_pack / failed_mem_map labels in ucc_mem_map_export and their interaction with EXPORT_OFFLOAD mode

Important Files Changed

Filename Overview
src/components/tl/ucp/tl_ucp_context.c Fixes offset calculation in offload import, adds NULL guard on tl_data before unmap, adds IMPORT_OFFLOAD ucp_mem_unmap, sets NULLs after release to prevent double-free, and aligns alloc size to UCC_TL_UCP_MEMH_TL_HEADER_SIZE.
src/core/ucc_context.c Fixes several correctness bugs (wrong tl_ctx index, wrong base context, wrong mode in failed_mem_map, tl_h alloc size, packed_buffers size, EXPORT_OFFLOAD tl_h now separately allocated); however, three previously-flagged P1 issues in the failed_pack/failed_mem_map EXPORT_OFFLOAD paths remain unfixed.

Reviews (3): Last reviewed commit: "TL/UCP: various memh fixes" | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants