fix(eval): normalize uppercase GUID eval ids at ingestion (PC-4688) #1702

Open
Chibionos wants to merge 2 commits into main from fix/PC-4688-eval-id-casing

@Chibionos
Contributor

Summary

Fixes PC-4688: eval runs stall after the first batch (batch size 10) when eval items have uppercase GUID ids (e.g. B063907C-…). The run stays in Running until it fails at the run timeout. Lowercase ids work. Reported by FDE (Brian Burke); the customer workaround is to lowercase the ids in the eval-set JSON.

Root cause

GUIDs are case-insensitive, but the eval runtime correlates ids as plain strings. The backend canonicalizes GUIDs to lowercase, while the SDK/runtime preserves whatever casing the eval-set JSON contains. Every id-keyed correlation derives from eval_item.id:

  • EvaluationSet.extract_selected_evals(eval_ids) — case-sensitive set membership (the --eval-ids selection the cloud orchestrator passes per batch),
  • execution_id / span execution.id / eval_item_id and the progress-reporter eval_run_ids cache — all str(eval_item.id).

When the casing of the JSON id and the backend-normalized id diverge, these lookups silently miss, so the runner can't reconcile completed work and stops making progress.
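
The mismatch can be reproduced in isolation. This is a minimal sketch, not the runtime's code: the GUID is a made-up example value and `selected` is a hypothetical stand-in for the per-batch `eval_ids` selection.

```python
# Illustrative only: this GUID is an example value, not an id from the report.
json_id = "3F2504E0-4F89-11D3-9A0C-0305E82C3301"  # casing as written in the eval-set JSON
backend_id = json_id.lower()                      # the backend canonicalizes GUIDs to lowercase

# Hypothetical stand-in for the ids the orchestrator selects for a batch.
selected = {backend_id}

print(json_id in selected)          # False: plain string comparison is case-sensitive
print(json_id.lower() in selected)  # True once both sides share the same casing
```

The first lookup misses without raising anything, which matches the silent miss described above.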

Note: the codebase already guards evaluator ids with StringComparer.OrdinalIgnoreCase on the backend — eval-item ids were simply missed.

Fix

Normalize ids to lowercase (the casing already known to work) at the ingestion boundary, so all downstream correlation is casing-agnostic:

  • normalize_eval_id() lowercases a value only when it matches the canonical 8-4-4-4-12 GUID form; non-GUID ids (slugs like test-eval-1) are left untouched.
  • Applied via a field_validator("id") on EvaluationItem and LegacyEvaluationItem, so id is canonical everywhere it's read (selection, execution_id, span/cache keys).
  • extract_selected_evals normalizes the incoming eval_ids too, so selection matches regardless of the caller's casing.
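
The three points above can be sketched with the standard library alone. Treat this as an approximation under stated assumptions: the regex is a guess at the 8-4-4-4-12 pattern (the PR's `_GUID_RE` is not shown here), `Item` is a hypothetical dataclass standing in for `EvaluationItem`, with `__post_init__` playing the role of the Pydantic `field_validator("id")`, and `select` is a hypothetical stand-in for `extract_selected_evals`.

```python
import re
from dataclasses import dataclass

# Assumed pattern for the canonical 8-4-4-4-12 form; the PR's real _GUID_RE may differ.
_GUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def normalize_eval_id(value: str) -> str:
    """Lowercase GUID-shaped ids; return anything else unchanged."""
    if isinstance(value, str) and _GUID_RE.fullmatch(value):
        return value.lower()
    return value


@dataclass
class Item:  # hypothetical stand-in for EvaluationItem
    id: str

    def __post_init__(self) -> None:
        # Stands in for the field_validator("id") on the real models.
        self.id = normalize_eval_id(self.id)


def select(items: list[Item], eval_ids: list[str]) -> list[Item]:
    """Hypothetical stand-in for extract_selected_evals."""
    wanted = {normalize_eval_id(eval_id) for eval_id in eval_ids}
    return [item for item in items if item.id in wanted]


items = [Item("3F2504E0-4F89-11D3-9A0C-0305E82C3301"), Item("test-eval-1")]
picked = select(items, ["3F2504E0-4F89-11D3-9A0C-0305E82C3301", "test-eval-1"])
print([item.id for item in picked])
```

Because both the stored id and the incoming selection pass through the same function, the caller's casing no longer matters for GUID ids, while slug ids are compared exactly as before.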

Tests

tests/cli/eval/test_eval_id_casing.py:

  • uppercase GUID id → stored lowercase (both EvaluationItem and LegacyEvaluationItem),
  • non-GUID id unchanged,
  • extract_selected_evals matches when called with an uppercase GUID against a normalized set.

Full eval suites pass; ruff + format clean.

Scope / follow-up

This change covers only ingestion in the uipath SDK / uipath eval runtime. If the .NET cloud orchestration also compares eval-item ids case-sensitively anywhere, a matching guard there should be tracked under PC-4688. Flagging for @maria so the "normalize IDs" decision is covered end-to-end.

🤖 Generated with Claude Code

Eval items whose id is an uppercase GUID stalled the cloud eval runner after
the first batch. The backend canonicalizes GUIDs to lowercase, so case-sensitive
id correlation on the runtime side (selection via extract_selected_evals, plus
span/cache keying derived from eval_item.id) never matched the backend-stored
ids. Normalize GUID-form ids to canonical lowercase on EvaluationItem and
LegacyEvaluationItem, and normalize the incoming ids in extract_selected_evals,
so selection and correlation are casing-agnostic. Non-GUID ids are untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 5, 2026 18:32
@github-actions (bot) added the test:uipath-langchain and test:uipath-integrations labels on Jun 5, 2026

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 227da8c472

@classmethod
def _normalize_id(cls, value: str) -> str:
    """Normalize GUID ids to canonical lowercase."""
    return normalize_eval_id(value)

[P2] Normalize override keys when lowercasing eval IDs

When an eval set contains an uppercase GUID and the caller provides --input-overrides keyed by that ID copied from the JSON, this validator changes eval_item.id to lowercase while the override maps are left untouched. The runtime later calls apply_input_overrides(..., eval_id=eval_item.id) and the legacy conversational migration does input_overrides.get(evaluation.id, {}), so those uppercase-keyed overrides are silently skipped in the same scenario this patch targets. Please canonicalize the override keys at ingestion/lookup as well.
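
One way the override keys could be canonicalized is sketched below. It is only an illustration: `normalize_override_keys` is a hypothetical helper, not part of the SDK, the regex is an assumed 8-4-4-4-12 pattern, and the override payloads are made up.

```python
import re

# Assumed GUID pattern; the PR's real _GUID_RE is not reproduced here.
_GUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def normalize_eval_id(value: str) -> str:
    """Lowercase GUID-shaped ids; return anything else unchanged."""
    if isinstance(value, str) and _GUID_RE.fullmatch(value):
        return value.lower()
    return value


def normalize_override_keys(overrides: dict[str, dict]) -> dict[str, dict]:
    """Hypothetical helper: re-key an override map by canonical eval id."""
    return {normalize_eval_id(key): value for key, value in overrides.items()}


# Keys copied from the eval-set JSON, so the GUID key is still uppercase.
overrides = {
    "3F2504E0-4F89-11D3-9A0C-0305E82C3301": {"query": "hello"},
    "test-eval-1": {"query": "hi"},
}
canonical = normalize_override_keys(overrides)

# A lookup by the normalized item id now finds the override.
print(canonical.get("3f2504e0-4f89-11d3-9a0c-0305e82c3301", {}))
```

If two keys differ only by GUID casing, the later one wins in this sketch; a real implementation would need to decide whether to merge them or reject the input.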

Copilot AI left a comment

Pull request overview

Normalizes evaluation item IDs at ingestion to prevent eval-run batching/correlation stalls when eval-set JSON contains uppercase GUID IDs (PC-4688), aligning SDK/runtime behavior with backend canonicalization.

Changes:

  • Added normalize_eval_id() and applied it via Pydantic field_validator("id") to normalize GUID-shaped eval item IDs to lowercase.
  • Normalized incoming eval_ids in extract_selected_evals() (both current and legacy sets) to make selection casing-agnostic for GUID IDs.
  • Added regression tests covering GUID normalization, non-GUID pass-through, and selection behavior.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| packages/uipath/src/uipath/eval/models/evaluation_set.py | Introduces GUID normalization for eval item IDs and normalizes selection IDs for casing-insensitive GUID matching. |
| packages/uipath/tests/cli/eval/test_eval_id_casing.py | Adds tests ensuring uppercase GUID IDs normalize at ingestion and selection works regardless of caller casing. |

Comment on lines +25 to +32
def normalize_eval_id(value: str) -> str:
    """Canonicalize a GUID id to lowercase; leave non-GUID ids unchanged.

    GUIDs are case-insensitive, but downstream correlation (selection,
    span/cache keying) compares ids as plain strings, so a mixed-case id
    must be normalized at ingestion to stay matchable.
    """
    return value.lower() if isinstance(value, str) and _GUID_RE.match(value) else value

github-actions Bot commented Jun 5, 2026

🚨 Heads up: uipath-integrations cross-tests are FAILING 🚨

Your changes may break one or more integrations in uipath-integrations-python:

  • uipath-openai-agents
  • uipath-google-adk
  • uipath-agent-framework
  • uipath-llamaindex
  • uipath-pydantic-ai

⚠️ These checks are NOT enforced by branch protection rules. Please review the failures before merging.
