fix(eval): normalize uppercase GUID eval ids at ingestion (PC-4688) #1702
Chibionos wants to merge 2 commits
Eval items whose id is an uppercase GUID stalled the cloud eval runner after the first batch. The backend canonicalizes GUIDs to lowercase, so case-sensitive id correlation on the runtime side (selection via extract_selected_evals, plus span/cache keying derived from eval_item.id) never matched the backend-stored ids. Normalize GUID-form ids to canonical lowercase on EvaluationItem and LegacyEvaluationItem, and normalize the incoming ids in extract_selected_evals, so selection and correlation are casing-agnostic. Non-GUID ids are untouched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 227da8c472

    @classmethod
    def _normalize_id(cls, value: str) -> str:
        """Normalize GUID ids to canonical lowercase."""
        return normalize_eval_id(value)

Normalize override keys when lowercasing eval IDs
When an eval set contains an uppercase GUID and the caller provides --input-overrides keyed by that ID copied from the JSON, this validator changes eval_item.id to lowercase while the override maps are left untouched. The runtime later calls apply_input_overrides(..., eval_id=eval_item.id) and the legacy conversational migration does input_overrides.get(evaluation.id, {}), so those uppercase-keyed overrides are silently skipped in the same scenario this patch targets. Please canonicalize the override keys at ingestion/lookup as well.
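One way to act on this suggestion, as a minimal sketch only: it assumes the override map is a plain dict keyed by eval id. `canonicalize_override_keys` is a hypothetical helper, and the `_GUID_RE` pattern and local `normalize_eval_id` are stand-ins written for this example, not the SDK's actual definitions.

```python
import re
from typing import Any

# Assumed pattern for the canonical 8-4-4-4-12 GUID form; the SDK's real
# _GUID_RE is not shown in this review and may differ.
_GUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def normalize_eval_id(value: str) -> str:
    """Lowercase GUID-shaped ids; return anything else unchanged."""
    return value.lower() if isinstance(value, str) and _GUID_RE.fullmatch(value) else value


def canonicalize_override_keys(overrides: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: re-key an override map by normalized eval id."""
    return {normalize_eval_id(key): value for key, value in overrides.items()}


# Keys copied verbatim from an eval-set JSON that used an uppercase GUID.
# The GUID below is made up for illustration.
raw_overrides = {
    "ABCDEF12-3456-7890-ABCD-EF1234567890": {"query": "overridden"},
    "test-eval-1": {"query": "slug override"},
}
input_overrides = canonicalize_override_keys(raw_overrides)

# The lookup now succeeds with the lowercase id the validator produces.
normalized_id = normalize_eval_id("ABCDEF12-3456-7890-ABCD-EF1234567890")
print(input_overrides.get(normalized_id, {}))
```

Re-keying once at ingestion keeps every later lookup a plain dict access; normalizing at each lookup site instead would work too, but leaves more places to miss.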
Pull request overview
Normalizes evaluation item IDs at ingestion to prevent eval-run batching/correlation stalls when eval-set JSON contains uppercase GUID IDs (PC-4688), aligning SDK/runtime behavior with backend canonicalization.
Changes:
- Added `normalize_eval_id()` and applied it via Pydantic `field_validator("id")` to normalize GUID-shaped eval item IDs to lowercase.
- Normalized incoming `eval_ids` in `extract_selected_evals()` (both current and legacy sets) to make selection casing-agnostic for GUID IDs.
- Added regression tests covering GUID normalization, non-GUID pass-through, and selection behavior.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/uipath/src/uipath/eval/models/evaluation_set.py | Introduces GUID normalization for eval item IDs and normalizes selection IDs for casing-insensitive GUID matching. |
| packages/uipath/tests/cli/eval/test_eval_id_casing.py | Adds tests ensuring uppercase GUID IDs normalize at ingestion and selection works regardless of caller casing. |

    def normalize_eval_id(value: str) -> str:
        """Canonicalize a GUID id to lowercase; leave non-GUID ids unchanged.

        GUIDs are case-insensitive, but downstream correlation (selection,
        span/cache keying) compares ids as plain strings, so a mixed-case id
        must be normalized at ingestion to stay matchable.
        """
        return value.lower() if isinstance(value, str) and _GUID_RE.match(value) else value

Summary
Fixes PC-4688: eval runs stall after the first batch (batch size 10) when eval items have uppercase GUID ids (e.g. `B063907C-…`); the run sits on Running and errors at the run timeout. Lowercase ids work. Reported by FDE (Brian Burke); the customer workaround is to lowercase the ids in the eval-set JSON.
Root cause
GUIDs are case-insensitive, but the eval runtime correlates ids as plain strings. The backend canonicalizes GUIDs to lowercase, while the SDK/runtime preserves whatever casing the eval-set JSON contains. Every id-keyed correlation derives from `eval_item.id`:
- `EvaluationSet.extract_selected_evals(eval_ids)`: case-sensitive set membership (the `--eval-ids` selection the cloud orchestrator passes per batch),
- `execution_id` / span `execution.id` / `eval_item_id` and the progress-reporter `eval_run_ids` cache: all `str(eval_item.id)`.
When the casing of the JSON id and the backend-normalized id diverge, these lookups silently miss, so the runner can't reconcile completed work and stops making progress.
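The mismatch can be reproduced in isolation. The sketch below is illustrative only: the GUID is made up (the id in the report is elided), and the variable names are not taken from the SDK.

```python
import uuid

# Hypothetical GUID for illustration only.
json_id = "ABCDEF12-3456-7890-ABCD-EF1234567890"  # casing as written in the eval-set JSON
backend_id = json_id.lower()                      # casing after backend canonicalization

# Parsed as GUIDs, the two values are the same identifier.
same_guid = uuid.UUID(json_id) == uuid.UUID(backend_id)

# Compared as plain strings, set membership is case-sensitive and the lookup misses.
requested = {backend_id}  # ids requested for a batch
selected = [item_id for item_id in [json_id] if item_id in requested]

print(same_guid)  # True
print(selected)   # []
```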
Note: the codebase already guards evaluator ids with `StringComparer.OrdinalIgnoreCase` on the backend; eval-item ids were simply missed.
Fix
Normalize at the ingestion boundary (the proven-good lowercase state), so all downstream correlation is casing-agnostic:
- `normalize_eval_id()` lowercases a value only when it matches the canonical 8-4-4-4-12 GUID form; non-GUID ids (slugs like `test-eval-1`) are left untouched.
- `field_validator("id")` on `EvaluationItem` and `LegacyEvaluationItem`, so `id` is canonical everywhere it's read (selection, execution_id, span/cache keys).
- `extract_selected_evals` normalizes the incoming `eval_ids` too, so selection matches regardless of the caller's casing.
Tests
`tests/cli/eval/test_eval_id_casing.py`:
- uppercase GUID ids normalize to lowercase at ingestion (`EvaluationItem` and `LegacyEvaluationItem`),
- `extract_selected_evals` matches when called with an uppercase GUID against a normalized set.
Full eval suites pass; ruff + format clean.
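A minimal, stdlib-only sketch of the normalization and the kind of checks described above. To avoid a Pydantic dependency, a plain dataclass stands in for the model and `__post_init__` plays the role of `field_validator("id")`. `Item`, `select_items`, the `_GUID_RE` pattern, and the sample ids are assumptions made for this example, not the SDK's code.

```python
import re
from dataclasses import dataclass

# Assumed pattern for the canonical 8-4-4-4-12 GUID form; the real _GUID_RE may differ.
_GUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def normalize_eval_id(value: str) -> str:
    """Lowercase GUID-shaped ids; leave every other id unchanged."""
    return value.lower() if isinstance(value, str) and _GUID_RE.fullmatch(value) else value


@dataclass
class Item:
    """Stand-in for EvaluationItem; __post_init__ mimics the id validator."""

    id: str

    def __post_init__(self) -> None:
        self.id = normalize_eval_id(self.id)


def select_items(items: list[Item], eval_ids: list[str]) -> list[Item]:
    """Hypothetical stand-in for extract_selected_evals."""
    wanted = {normalize_eval_id(eval_id) for eval_id in eval_ids}
    return [item for item in items if item.id in wanted]


GUID = "ABCDEF12-3456-7890-ABCD-EF1234567890"  # made-up GUID
items = [Item(id=GUID), Item(id="Test-Eval-1")]

# GUID-shaped ids are lowercased at construction; slugs keep their casing.
assert items[0].id == GUID.lower()
assert items[1].id == "Test-Eval-1"

# Selection matches whether the caller passes upper- or lowercase.
assert [i.id for i in select_items(items, [GUID])] == [GUID.lower()]
assert [i.id for i in select_items(items, [GUID.lower()])] == [GUID.lower()]
```

Normalizing on both sides (stored id and requested id) means neither the eval-set author nor the caller has to know which casing the other used.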
Scope / follow-up
This normalizes the `uipath` SDK / `uipath eval` runtime ingestion. If the .NET cloud orchestration also compares eval-item ids case-sensitively anywhere, a matching guard there should be tracked under PC-4688; flagging for @maria so the "normalize IDs" decision is covered end-to-end.
🤖 Generated with Claude Code