fix(eval): normalize uppercase GUID eval ids at ingestion (PC-4688) by Chibionos · Pull Request #1702 · UiPath/uipath-python

Chibionos · 2026-06-05T18:32:36Z

Summary

Fixes PC-4688: eval runs stall after the first batch (batch size 10) when eval items have uppercase GUID ids (e.g. B063907C-…). The run stays in Running and then fails at the run timeout; lowercase ids work. Reported by FDE (Brian Burke); the customer workaround is to lowercase the ids in the eval-set JSON.
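Until the fix ships, the workaround can be scripted. This is a minimal sketch that assumes the eval-set JSON stores each item's id under an `"id"` key (the exact schema is not shown here). It lowercases only GUID-shaped values under that key, at any nesting depth, and writes the result to a new file so the original is kept.

```python
import json
import re
from typing import Any

# Assumption: ids use the canonical 8-4-4-4-12 hex GUID form, any letter case.
GUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def lowercase_guid_ids(node: Any) -> Any:
    """Return a copy of node with GUID-shaped "id" values lowercased."""
    if isinstance(node, dict):
        return {
            key: (
                value.lower()
                if key == "id" and isinstance(value, str) and GUID_RE.fullmatch(value)
                else lowercase_guid_ids(value)
            )
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [lowercase_guid_ids(item) for item in node]
    # Scalars (including non-GUID id strings) pass through unchanged.
    return node


def rewrite_eval_set(src: str, dst: str) -> None:
    """Read an eval-set JSON file and write a copy with lowercased GUID ids."""
    with open(src, encoding="utf-8") as fh:
        data = json.load(fh)
    with open(dst, "w", encoding="utf-8") as fh:
        json.dump(lowercase_guid_ids(data), fh, indent=2, ensure_ascii=False)
```

Slug-style ids such as `test-eval-1` are left as written, so the rewrite only touches values whose casing is irrelevant to their meaning.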

Root cause

GUIDs are case-insensitive, but the eval runtime correlates ids as plain strings. The backend canonicalizes GUIDs to lowercase, while the SDK/runtime preserves whatever casing the eval-set JSON contains. Every id-keyed correlation derives from eval_item.id:

EvaluationSet.extract_selected_evals(eval_ids) — case-sensitive set membership (the --eval-ids selection the cloud orchestrator passes per batch),
execution_id, span execution.id, eval_item_id, and the progress reporter's eval_run_ids cache, all derived from str(eval_item.id).

When the casing of the JSON id and the backend-normalized id diverge, these lookups silently miss, so the runner can't reconcile completed work and stops making progress.
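The mismatch is easy to reproduce with plain strings. A minimal illustration using a made-up GUID; `selected` and `eval_run_ids` are hypothetical stand-ins for the batch's --eval-ids selection and the progress reporter's cache, not their real structures.

```python
# Made-up GUID: the eval-set JSON keeps the author's casing ...
json_id = "ABCDEF01-2345-6789-ABCD-EF0123456789"
# ... while the backend canonicalizes it to lowercase.
backend_id = json_id.lower()

# Selection: case-sensitive membership of the item's id
# in the ids the orchestrator passes for the batch.
selected = {backend_id}
print(json_id in selected)  # False: the item is never selected

# Correlation: a cache keyed by str(eval_item.id) misses the backend's id.
eval_run_ids = {json_id: "run-1"}
print(eval_run_ids.get(backend_id))  # None: completed work can't be reconciled

# Canonicalizing the item's id before the lookup makes it hit.
print(json_id.lower() in selected)  # True
```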

Note: the codebase already compares evaluator ids with StringComparer.OrdinalIgnoreCase on the backend; eval-item ids were simply missed.
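For comparison, a Python analog of such a guard could compare GUID ids by value instead of by string. This `same_eval_id` helper is hypothetical and not part of the PR, which instead canonicalizes once at ingestion; it relies on the standard library's `uuid.UUID` parser, which accepts either hex case.

```python
import uuid


def same_eval_id(a: str, b: str) -> bool:
    """Compare ids by GUID value when both parse as UUIDs, else exactly.

    Hypothetical helper: GUID ids match regardless of casing, while slugs
    such as "test-eval-1" stay case-sensitive.
    """
    try:
        return uuid.UUID(a) == uuid.UUID(b)
    except ValueError:
        # At least one side is not a GUID: fall back to plain string equality.
        return a == b


print(same_eval_id("ABCDEF01-2345-6789-ABCD-EF0123456789",
                   "abcdef01-2345-6789-abcd-ef0123456789"))  # True
print(same_eval_id("test-eval-1", "TEST-EVAL-1"))  # False
print(str(uuid.UUID("ABCDEF01-2345-6789-ABCD-EF0123456789")))
# abcdef01-2345-6789-abcd-ef0123456789
```

One design difference worth knowing: `uuid.UUID` also accepts braces, a `urn:uuid:` prefix, and unhyphenated hex, so it is more permissive than a strict 8-4-4-4-12 pattern.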

Fix

Normalize GUID ids to lowercase at the ingestion boundary (the casing already known to work), so all downstream correlation is casing-agnostic:

normalize_eval_id() lowercases a value only when it matches the canonical 8-4-4-4-12 GUID form; non-GUID ids (slugs like test-eval-1) are left untouched.
Applied via a field_validator("id") on EvaluationItem and LegacyEvaluationItem, so id is canonical everywhere it's read (selection, execution_id, span/cache keys).
extract_selected_evals normalizes the incoming eval_ids too, so selection matches regardless of the caller's casing.
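The fix can be sketched as follows. This is a minimal, stdlib-only sketch, not the PR's code: the regex is an assumed rendering of the 8-4-4-4-12 form, a dataclass `__post_init__` stands in for the Pydantic `field_validator("id")`, and `EvalItemSketch` and the free function `extract_selected_evals` are hypothetical simplifications of `EvaluationItem` and `EvaluationSet.extract_selected_evals`.

```python
import re
from dataclasses import dataclass

# Assumed pattern for the canonical 8-4-4-4-12 GUID form, either case.
_GUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def normalize_eval_id(value: str) -> str:
    """Lowercase GUID-shaped ids; leave slugs such as 'test-eval-1' alone."""
    # fullmatch: the whole string must be a GUID, not just its prefix.
    if isinstance(value, str) and _GUID_RE.fullmatch(value):
        return value.lower()
    return value


@dataclass
class EvalItemSketch:
    """Stand-in for EvaluationItem; __post_init__ plays the role of the
    field_validator("id") so the stored id is always canonical."""

    id: str
    name: str = ""

    def __post_init__(self) -> None:
        self.id = normalize_eval_id(self.id)


def extract_selected_evals(
    items: list[EvalItemSketch], eval_ids: list[str]
) -> list[EvalItemSketch]:
    """Keep items whose id was requested, normalizing the caller's ids too."""
    wanted = {normalize_eval_id(eval_id) for eval_id in eval_ids}
    return [item for item in items if item.id in wanted]


items = [
    EvalItemSketch("ABCDEF01-2345-6789-ABCD-EF0123456789"),
    EvalItemSketch("test-eval-1"),
]
print(items[0].id)  # abcdef01-2345-6789-abcd-ef0123456789
print(len(extract_selected_evals(items, ["ABCDEF01-2345-6789-ABCD-EF0123456789"])))  # 1
```

Selection stays case-sensitive for slugs: `extract_selected_evals(items, ["TEST-EVAL-1"])` returns an empty list, consistent with leaving non-GUID ids untouched.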

Tests

tests/cli/eval/test_eval_id_casing.py:

uppercase GUID id → stored lowercase (both EvaluationItem and LegacyEvaluationItem),
non-GUID id unchanged,
extract_selected_evals matches when called with an uppercase GUID against a normalized set.

The full eval suites pass; ruff lint and formatting are clean.

Scope / follow-up

This normalizes the uipath SDK / uipath eval runtime ingestion. If the .NET cloud orchestration also compares eval-item ids case-sensitively anywhere, a matching guard there should be tracked under PC-4688 — flagging for @maria so the "normalize IDs" decision is covered end-to-end.

🤖 Generated with Claude Code

Eval items whose id is an uppercase GUID stalled the cloud eval runner after the first batch. The backend canonicalizes GUIDs to lowercase, so case-sensitive id correlation on the runtime side (selection via extract_selected_evals, plus span/cache keying derived from eval_item.id) never matched the backend-stored ids. Normalize GUID-form ids to canonical lowercase on EvaluationItem and LegacyEvaluationItem, and normalize the incoming ids in extract_selected_evals, so selection and correlation are casing-agnostic. Non-GUID ids are untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 227da8c472


chatgpt-codex-connector · 2026-06-05T18:34:42Z

+    @classmethod
+    def _normalize_id(cls, value: str) -> str:
+        """Normalize GUID ids to canonical lowercase."""
+        return normalize_eval_id(value)


Normalize override keys when lowercasing eval IDs

When an eval set contains an uppercase GUID and the caller provides --input-overrides keyed by that ID copied from the JSON, this validator changes eval_item.id to lowercase while the override maps are left untouched. The runtime later calls apply_input_overrides(..., eval_id=eval_item.id) and the legacy conversational migration does input_overrides.get(evaluation.id, {}), so those uppercase-keyed overrides are silently skipped in the same scenario this patch targets. Please canonicalize the override keys at ingestion/lookup as well.
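One way to act on this suggestion, sketched under assumptions: the override maps are plain dicts keyed by eval id (their value shape is not shown here, so payloads are treated opaquely), and `normalize_override_keys` is a hypothetical helper, not an existing SDK function. It re-keys the map once at ingestion with the same normalizer applied to `eval_item.id`, so later lookups by the normalized id hit.

```python
from typing import Any, Callable, Mapping


def normalize_override_keys(
    overrides: Mapping[str, Any],
    normalize: Callable[[str], str],
) -> dict[str, Any]:
    """Re-key per-eval overrides so lookups by the normalized eval id hit.

    Keys that differ only in casing collapse to one entry; if their payloads
    disagree, fail loudly instead of silently keeping one of them.
    """
    result: dict[str, Any] = {}
    for key, payload in overrides.items():
        canonical = normalize(key)
        if canonical in result and result[canonical] != payload:
            raise ValueError(f"conflicting input overrides for eval id {canonical!r}")
        result[canonical] = payload
    return result


overrides = {"ABCDEF01-2345-6789-ABCD-EF0123456789": {"query": "hi"}}
# str.lower stands in for normalize_eval_id only to keep the sketch self-contained.
canonical = normalize_override_keys(overrides, str.lower)
print(canonical.get("abcdef01-2345-6789-abcd-ef0123456789", {}))  # {'query': 'hi'}
```

In the SDK the `normalize` argument would be `normalize_eval_id`, which, unlike `str.lower`, leaves non-GUID keys as written.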


Copilot

Pull request overview

Normalizes evaluation item IDs at ingestion to prevent eval-run batching/correlation stalls when eval-set JSON contains uppercase GUID IDs (PC-4688), aligning SDK/runtime behavior with backend canonicalization.

Changes:

Added normalize_eval_id() and applied it via Pydantic field_validator("id") to normalize GUID-shaped eval item IDs to lowercase.
Normalized incoming eval_ids in extract_selected_evals() (both current and legacy sets) to make selection casing-agnostic for GUID IDs.
Added regression tests covering GUID normalization, non-GUID pass-through, and selection behavior.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

File	Description
packages/uipath/src/uipath/eval/models/evaluation_set.py	Introduces GUID normalization for eval item IDs and normalizes selection IDs for casing-insensitive GUID matching.
packages/uipath/tests/cli/eval/test_eval_id_casing.py	Adds tests ensuring uppercase GUID IDs normalize at ingestion and selection works regardless of caller casing.


+def normalize_eval_id(value: str) -> str:
+    """Canonicalize a GUID id to lowercase; leave non-GUID ids unchanged.
+
+    GUIDs are case-insensitive, but downstream correlation (selection,
+    span/cache keying) compares ids as plain strings, so a mixed-case id
+    must be normalized at ingestion to stay matchable.
+    """
+    return value.lower() if isinstance(value, str) and _GUID_RE.match(value) else value
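The excerpt does not show `_GUID_RE`, so whether it is end-anchored is not visible here. Because `re.match` anchors only at the start of the string, an unanchored pattern would also lowercase a slug that merely begins with a GUID, contradicting "non-GUID ids are left untouched". A short check with a hypothetical unanchored pattern:

```python
import re

# Hypothetical pattern WITHOUT an end anchor.
unanchored = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

slug = "ABCDEF01-2345-6789-ABCD-EF0123456789-Retry"
print(bool(unanchored.match(slug)))      # True: prefix match, slug would be lowercased
print(bool(unanchored.fullmatch(slug)))  # False: the whole string must be a GUID

# "$" also matches just before a trailing newline; fullmatch (or "\Z") does not.
dollar = re.compile(unanchored.pattern + "$")
print(bool(dollar.match("ABCDEF01-2345-6789-ABCD-EF0123456789\n")))          # True
print(bool(unanchored.fullmatch("ABCDEF01-2345-6789-ABCD-EF0123456789\n")))  # False
```

Using `fullmatch`, or ending the pattern with `\Z`, keeps the "GUID-only" guarantee exact.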


github-actions · 2026-06-05T18:35:57Z

🚨 Heads up: `uipath-integrations` cross-tests are FAILING 🚨

Your changes may break one or more integrations in uipath-integrations-python:

uipath-openai-agents
uipath-google-adk
uipath-agent-framework
uipath-llamaindex
uipath-pydantic-ai

⚠️ These checks are NOT enforced by branch protection rules. Please review the failures before merging.


sonarqubecloud · 2026-06-05T18:36:56Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
93.3% Coverage on New Code
0.0% Duplication on New Code


Copilot AI review requested due to automatic review settings June 5, 2026 18:32

github-actions added the test:uipath-langchain (triggers tests in the uipath-langchain-python repository) and test:uipath-integrations labels on Jun 5, 2026.

Copilot started reviewing on behalf of Chibionos on June 5, 2026 18:32.

chore(eval): bump uipath to 2.10.78

6fe2b28

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Chibionos wants to merge 2 commits into main from fix/PC-4688-eval-id-casing.
