UiPath · ajay-kesavan · May 21, 2026 · May 21, 2026 · May 22, 2026 · May 24, 2026
diff --git a/packages/uipath/pyproject.toml b/packages/uipath/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "uipath"
-version = "2.10.70"
+version = "2.10.72"
 description = "Python SDK and CLI for UiPath Platform, enabling programmatic interaction with automation services, process management, and deployment tools."
 readme = { file = "README.md", content-type = "text/markdown" }
 requires-python = ">=3.11"

diff --git a/packages/uipath/samples/classifier_demo/README.md b/packages/uipath/samples/classifier_demo/README.md
@@ -0,0 +1,139 @@
+# Classifier evaluator end-to-end demo
+
+A minimal intent-classification agent that exercises the new
+`ClassifierEvaluator` end-to-end. Use this as the test fixture for both
+SDK-only validation (Path A below) and Studio Web full-stack validation
+(Path B).
+
+## What's here
+
+```
+classifier_demo/
+├── main.py                       # 3-class keyword classifier
+├── uipath.json
+├── pyproject.toml
+├── bindings.json
+└── evaluations/
+    ├── eval-sets/
+    │   └── main.json             # 9 datapoints, 3 per class; the agent gets 1 per class wrong by design
+    └── evaluators/
+        ├── intent_match.json     # per-datapoint ExactMatch on agent_output.intent
+        └── intent_classifier.json # the new uipath-classifier (pure metadata)
+```
+
+The eval set is wired so that for every datapoint both evaluators run:
+- `intent_match` produces a 1.0/0.0 score with `{"expected": "...", "actual": "..."}` justification.
+- `intent_classifier` produces a sentinel 0.0 score with `{"classes": [...], "source_evaluator": "intent_match"}` justification.
+
+Downstream (the C# layer in Studio Web) reads both to compute precision /
+recall / F-score across the dataset.
+
+> Heads-up — every datapoint must have an entry for the classifier in
+> `evaluationCriterias` (even an empty `{}`). The runtime currently skips
+> evaluators that aren't keyed in `evaluationCriterias` for a datapoint, so
+> omitting them silently drops the classifier results.
+
+## Path A — SDK only (real run, ~30 seconds)
+
+```bash
+cd packages/uipath
+uv sync --all-extras
+
+cd samples/classifier_demo
+uv run --project ../.. uipath eval main main.json --no-report --output-file /tmp/out.json
+```
+
+Expected: a results table with two columns (`intent_classifier`, `intent_match`).
+`intent_match` averages 6/9 ≈ 0.667 (possibly shown rounded to 0.7). `intent_classifier`
+shows 0.0 per row by design; its real work is to ship the classes list to the backend.
+
+To see the metadata payload that lands in the backend's
+`CodedEvaluatorScore.Justification`:
+
+```bash
+python3 -c "
+import json
+with open('/tmp/out.json') as f: d = json.load(f)
+for r in d['evaluationSetResults'][0]['evaluationRunResults']:
+    print(r['evaluatorName'], r['result'].get('details'))
+"
+```
+
+You should see something like:
+
+```
+intent_classifier  {'expected': '', 'actual': '', 'classes': ['book', 'cancel', 'reschedule'], 'source_evaluator': 'intent_match'}
+intent_match       {'expected': 'book', 'actual': 'book'}
+```
+
+## Path B — Full Studio Web stack (real UI, click Run, see panel)
+
+Currently blocked: I (the assistant who built this) didn't have the required
+environment available locally, so this path has not been run. The pieces:
+
+### Prereqs (per `Agents/LOCAL_DEVELOPMENT.md`)
+- Docker installed and running
+- `make` available
+- Authenticated Azure CLI session (`az login`)
+- Azure DevOps PAT exported as `AZURE_DEVOPS_PAT`
+- GitHub NPM registry token exported as `GH_NPM_REGISTRY_TOKEN`
+- Azure access token exported as `AZURE_ACCESS_TOKEN` (for the python worker build)
+- `cloud-provider-kind` binary (used for the local KinD cluster)
+
+### Steps
+
+1. **Point python-eval-worker at the local SDK branch.** The published
+   `uipath` package on PyPI doesn't yet have `ClassifierEvaluator`. Edit
+   `Agents/python-eval-worker/pyproject.toml`:
+
+   ```toml
+   [tool.uv.sources]
+   uipath = { path = "../../uipath-python/packages/uipath", editable = true }
+   ```
+
+   Then `cd python-eval-worker && uv lock && uv sync`.
+
+2. **Bring up the local KinD cluster** (from `Agents/`):
+   ```bash
+   make create-kind-cluster
+   kubectl get nodes
+   sudo ./bin/cloud-provider-kind &      # in a separate shell or background
+   make up
+   make deploy
+   ```
+
+3. **Build the backend with the classifier changes:**
+   ```bash
+   git checkout feat/eval-classifier-backend       # in Agents repo
+   # Re-trigger the helm/skaffold deploy for the backend
+   make deploy
+   ```
+
+4. **Build the frontend with the UI changes:**
+   ```bash
+   git checkout feat/eval-dataset-evaluators-ui    # in Agents repo
+   make deploy                                     # same command rebuilds the frontend image
+   ```
+
+5. **Open Studio Web** (URL surfaced by the deploy output), create an agent
+   project, upload the eval-set + evaluator JSONs from this directory (or
+   author them in the UI — the picker now shows a "Classifier" entry under
+   the AGGREGATION section), and click Run.
+
+6. **Verify** the Aggregations panel renders between the run header and the
+   datapoint table, with the confusion matrix matching what Path A's Python
+   shim computes (macro F1 ≈ 0.667 on this fixture).
+
+### Open questions for the team owning local dev
+
+- Does the existing PAT / token set get refreshed automatically by the dev tooling, or do contributors need to rotate them periodically?
+- Is there a simpler "local-only" path that bypasses the KinD cluster (e.g. docker-compose) for changes that don't touch K8s manifests?
+- What's the standard pattern for pointing the python worker at a non-PyPI uipath build? The `[tool.uv.sources]` override above is the standard uv path — confirm there's no Helm/skaffold complication.
+
+## Companion PRs
+
+| Repo | Branch | PR | What |
+|---|---|---|---|
+| uipath-python | `feat/eval-classifier-evaluator` | [#1674](https://github.com/UiPath/uipath-python/pull/1674) | SDK `ClassifierEvaluator` |
+| Agents | `feat/eval-classifier-backend` | [#5313](https://github.com/UiPath/Agents/pull/5313) | C# math + activity + envelope storage |
+| Agents | `feat/eval-dataset-evaluators-ui` | [#5306](https://github.com/UiPath/Agents/pull/5306) | Frontend picker + Aggregations panel |
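The macro F1 ≈ 0.667 quoted in the README's verification step can be reproduced from the `expected`/`actual` pairs that `intent_match` emits. The following is a minimal sketch, not the C# backend's implementation: the function name `macro_f1` is hypothetical, and the hard-coded pairs are an assumption obtained by tracing `main.py` by hand over the nine datapoints.

```python
from collections import Counter

CLASSES = ["book", "cancel", "reschedule"]

# (expected, actual) pairs, traced by hand from main.py over the nine datapoints.
# These are an illustration, not output captured from a real run.
PAIRS = [
    ("book", "book"), ("book", "book"), ("book", "cancel"),
    ("cancel", "cancel"), ("cancel", "cancel"), ("cancel", "reschedule"),
    ("reschedule", "reschedule"), ("reschedule", "reschedule"), ("reschedule", "book"),
]


def macro_f1(pairs, classes):
    """Unweighted mean of per-class F1 over (expected, actual) pairs."""
    confusion = Counter(pairs)  # (expected, actual) -> count
    f1_scores = []
    for cls in classes:
        tp = confusion[(cls, cls)]
        fp = sum(n for (exp, act), n in confusion.items() if act == cls and exp != cls)
        fn = sum(n for (exp, act), n in confusion.items() if exp == cls and act != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


accuracy = sum(exp == act for exp, act in PAIRS) / len(PAIRS)
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1(PAIRS, CLASSES):.3f}")
```

On this fixture every class has two true positives, one false positive, and one false negative, so precision, recall, and F1 are all 2/3 per class and the macro average equals the accuracy.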
diff --git a/packages/uipath/samples/classifier_demo/bindings.json b/packages/uipath/samples/classifier_demo/bindings.json
@@ -0,0 +1,4 @@
+{
+  "version": "2.0",
+  "resources": []
+}
diff --git a/packages/uipath/samples/classifier_demo/evaluations/eval-sets/main.json b/packages/uipath/samples/classifier_demo/evaluations/eval-sets/main.json
@@ -0,0 +1,173 @@
+{
+  "version": "1.0",
+  "id": "classifier-demo-eval-set",
+  "name": "Classifier demo eval set",
+  "evaluatorRefs": [
+    "intent_match",
+    "intent_classifier"
+  ],
+  "evaluations": [
+    {
+      "id": "book-1",
+      "name": "book \u2014 straightforward",
+      "inputs": {
+        "utterance": "I want to book a table for two"
+      },
+      "expectedOutput": {
+        "intent": "book"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "book"
+          }
+        },
+        "intent_classifier": {}
+      }
+    },
+    {
+      "id": "book-2",
+      "name": "book \u2014 schedule keyword",
+      "inputs": {
+        "utterance": "Please schedule an appointment"
+      },
+      "expectedOutput": {
+        "intent": "book"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "book"
+          }
+        },
+        "intent_classifier": {}
+      }
+    },
+    {
+      "id": "book-3",
+      "name": "book \u2014 agent misclassifies (utterance triggers cancel keyword)",
+      "inputs": {
+        "utterance": "I had to cancel my last attempt but I want to reserve a slot now"
+      },
+      "expectedOutput": {
+        "intent": "book"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "book"
+          }
+        },
+        "intent_classifier": {}
+      }
+    },
+    {
+      "id": "cancel-1",
+      "name": "cancel \u2014 straightforward",
+      "inputs": {
+        "utterance": "Please cancel my reservation"
+      },
+      "expectedOutput": {
+        "intent": "cancel"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "cancel"
+          }
+        },
+        "intent_classifier": {}
+      }
+    },
+    {
+      "id": "cancel-2",
+      "name": "cancel \u2014 void synonym",
+      "inputs": {
+        "utterance": "I want to void the order"
+      },
+      "expectedOutput": {
+        "intent": "cancel"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "cancel"
+          }
+        },
+        "intent_classifier": {}
+      }
+    },
+    {
+      "id": "cancel-3",
+      "name": "cancel \u2014 agent misclassifies (utterance has 'move' which triggers reschedule)",
+      "inputs": {
+        "utterance": "I need to move past this and cancel everything"
+      },
+      "expectedOutput": {
+        "intent": "cancel"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "cancel"
+          }
+        },
+        "intent_classifier": {}
+      }
+    },
+    {
+      "id": "reschedule-1",
+      "name": "reschedule \u2014 straightforward",
+      "inputs": {
+        "utterance": "I want to reschedule the meeting"
+      },
+      "expectedOutput": {
+        "intent": "reschedule"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "reschedule"
+          }
+        },
+        "intent_classifier": {}
+      }
+    },
+    {
+      "id": "reschedule-2",
+      "name": "reschedule \u2014 move synonym",
+      "inputs": {
+        "utterance": "Can we move the slot to tomorrow"
+      },
+      "expectedOutput": {
+        "intent": "reschedule"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "reschedule"
+          }
+        },
+        "intent_classifier": {}
+      }
+    },
+    {
+      "id": "reschedule-3",
+      "name": "reschedule \u2014 agent misclassifies (falls through to default 'book')",
+      "inputs": {
+        "utterance": "Different timing please"
+      },
+      "expectedOutput": {
+        "intent": "reschedule"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "reschedule"
+          }
+        },
+        "intent_classifier": {}
+      }
+    }
+  ]
+}
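The README warns that an evaluator missing from a datapoint's `evaluationCriterias` is silently skipped. A small pre-flight check along the following lines could catch that before a run. This is a sketch under assumptions: `missing_criteria` is a hypothetical helper, not part of the SDK, and the inline sample (including the `hypothetical-missing` datapoint) is made up to show what the check reports.

```python
def missing_criteria(eval_set: dict) -> dict[str, list[str]]:
    """Map datapoint id -> evaluator refs with no key in its evaluationCriterias."""
    refs = eval_set.get("evaluatorRefs", [])
    gaps = {}
    for datapoint in eval_set.get("evaluations", []):
        criteria = datapoint.get("evaluationCriterias", {})
        missing = [ref for ref in refs if ref not in criteria]
        if missing:
            gaps[datapoint["id"]] = missing
    return gaps


# Inline stand-in for evaluations/eval-sets/main.json. The second datapoint is
# deliberately broken (hypothetical) so the check has something to report.
sample = {
    "evaluatorRefs": ["intent_match", "intent_classifier"],
    "evaluations": [
        {
            "id": "book-1",
            "evaluationCriterias": {
                "intent_match": {"expectedOutput": {"intent": "book"}},
                "intent_classifier": {},
            },
        },
        {
            "id": "hypothetical-missing",
            "evaluationCriterias": {
                "intent_match": {"expectedOutput": {"intent": "book"}},
            },
        },
    ],
}

gaps = missing_criteria(sample)
print(gaps)
```

Loading the real file with `json.load` and passing it to the same function should return an empty dict for the eval set in this diff, since all nine datapoints key both evaluators.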
diff --git a/packages/uipath/samples/classifier_demo/evaluations/evaluators/intent_classifier.json b/packages/uipath/samples/classifier_demo/evaluations/evaluators/intent_classifier.json
@@ -0,0 +1,11 @@
+{
+  "version": "1.0",
+  "id": "intent_classifier",
+  "description": "Classification aggregator. Pure metadata — carries the classes list + source evaluator name to downstream consumers (the C# backend computes precision/recall/F-score over the dataset). Per-datapoint result is a no-op carrying the metadata.",
+  "evaluatorTypeId": "uipath-classifier",
+  "evaluatorConfig": {
+    "name": "intent_classifier",
+    "classes": ["book", "cancel", "reschedule"],
+    "sourceEvaluator": "intent_match"
+  }
+}
diff --git a/packages/uipath/samples/classifier_demo/evaluations/evaluators/intent_match.json b/packages/uipath/samples/classifier_demo/evaluations/evaluators/intent_match.json
@@ -0,0 +1,15 @@
+{
+  "version": "1.0",
+  "id": "intent_match",
+  "description": "Per-datapoint ExactMatch on the agent's `intent` output. Produces expected/actual justification that the ClassifierEvaluator pipeline reads.",
+  "evaluatorTypeId": "uipath-exact-match",
+  "evaluatorConfig": {
+    "name": "intent_match",
+    "targetOutputKey": "intent",
+    "caseSensitive": false,
+    "negated": false,
+    "defaultEvaluationCriteria": {
+      "expectedOutput": "book"
+    }
+  }
+}
diff --git a/packages/uipath/samples/classifier_demo/main.py b/packages/uipath/samples/classifier_demo/main.py
@@ -0,0 +1,42 @@
+"""Tiny intent-classification agent for the ClassifierEvaluator demo.
+
+Given an utterance, returns the intent label. Three intents, whole-token match:
+  - book        (token "book" / "reserve" / "schedule"; also the fallback)
+  - cancel      (token "cancel" / "void"; checked before book)
+  - reschedule  (token "reschedule" / "move"; checked first)
+
+Three datapoints (one per class) are deliberately misclassified so the run-level
+classification metrics (precision/recall/F-score) are non-trivial.
+"""
+
+from dataclasses import dataclass
+
+
+@dataclass
+class IntentInput:
+    utterance: str
+
+
+@dataclass
+class IntentOutput:
+    intent: str
+
+
+BOOK_KEYWORDS = {"book", "reserve", "schedule"}
+CANCEL_KEYWORDS = {"cancel", "void"}
+RESCHEDULE_KEYWORDS = {"reschedule", "move"}
+
+
+async def main(input: IntentInput) -> IntentOutput:
+    """Classify the utterance into book / cancel / reschedule."""
+    text = input.utterance.lower()
+    tokens = set(text.split())
+
+    if tokens & RESCHEDULE_KEYWORDS:
+        return IntentOutput(intent="reschedule")
+    if tokens & CANCEL_KEYWORDS:
+        return IntentOutput(intent="cancel")
+    if tokens & BOOK_KEYWORDS:
+        return IntentOutput(intent="book")
+    # Fallback to "book" — deliberately wrong-ish so the matrix is interesting.
+    return IntentOutput(intent="book")
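To sanity-check the "6/9 correct" figure without running `uipath eval`, the keyword logic can be traced over the nine utterances from `main.json`. The sketch below uses a synchronous copy of `main()` named `classify`, which is a hypothetical stand-in written for illustration; it is not how the eval runtime invokes the agent.

```python
RESCHEDULE_KEYWORDS = {"reschedule", "move"}
CANCEL_KEYWORDS = {"cancel", "void"}
BOOK_KEYWORDS = {"book", "reserve", "schedule"}


def classify(utterance: str) -> str:
    """Synchronous copy of main(): first matching keyword set wins, default 'book'."""
    tokens = set(utterance.lower().split())
    if tokens & RESCHEDULE_KEYWORDS:
        return "reschedule"
    if tokens & CANCEL_KEYWORDS:
        return "cancel"
    if tokens & BOOK_KEYWORDS:
        return "book"
    return "book"  # fallback, same as the sample agent


# (expected intent, utterance) for the nine datapoints in main.json.
DATAPOINTS = [
    ("book", "I want to book a table for two"),
    ("book", "Please schedule an appointment"),
    ("book", "I had to cancel my last attempt but I want to reserve a slot now"),
    ("cancel", "Please cancel my reservation"),
    ("cancel", "I want to void the order"),
    ("cancel", "I need to move past this and cancel everything"),
    ("reschedule", "I want to reschedule the meeting"),
    ("reschedule", "Can we move the slot to tomorrow"),
    ("reschedule", "Different timing please"),
]

predictions = [(expected, classify(text)) for expected, text in DATAPOINTS]
correct = sum(expected == actual for expected, actual in predictions)
print(f"{correct}/{len(DATAPOINTS)} correct")
for expected, actual in predictions:
    if expected != actual:
        print(f"  expected={expected} actual={actual}")
```

Because tokens come from a plain `split()`, punctuation attached to a keyword (for example `cancel.`) would not match and the utterance would fall through to `book`. The fixture utterances avoid this by carrying no punctuation.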