test(uipath-troubleshoot): Word Add Picture failure diagnosis tasks by Stefan-Virgil · Pull Request #1447 · UiPath/skills

Stefan-Virgil · 2026-06-12T12:23:13Z

What

Four e2e / mode:diagnose coder-eval tasks for the Word Add Picture (WordAddImage) failure family, one per branch of the add-picture-failures.md playbook added in #1439:

| Task | Branch | Root cause |
|---|---|---|
| `word-addpicture-missing-scope` | C1 | `Add Picture` is a sibling of (outside) the `Use Word File` scope → no open document |
| `word-addpicture-com-interop` | C2 | Environmental `TYPE_E_LIBNOTREGISTERED` (HRESULT `0x8002801D`) on the robot host; the workflow is correct |
| `word-addpicture-bookmark-missing` | C3 | `InsertRelativeTo="Bookmark"` but the runtime document lacks the bookmark |
| `word-addpicture-image-variable` | C4 | `ImagePath` bound to a `UiPath.Core.Image` via `.ToString()` → opens a file literally named `UiPath.Core.Image` |

Each grades skill_triggered (w 1.0) + a branch-specific llm_judge (w 3.0, pass_threshold 0.7) against a RESOLUTION.md ground truth, with wrong-branch answers capped at 0.5.
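A minimal sketch of how the two criteria might combine, assuming `weighted_score` is a weight-normalized mean of the per-criterion scores. The PR does not spell out the aggregation, so that formula and all function names below are assumptions rather than the coder-eval API. It mainly illustrates why a wrong-branch answer capped at 0.5 cannot clear the judge's 0.7 `pass_threshold`.

```python
# Sketch only: assumes weighted_score is a weight-normalized mean.
# Function and constant names are hypothetical, not the coder-eval API.

SKILL_WEIGHT = 1.0       # skill_triggered weight (from the PR description)
JUDGE_WEIGHT = 3.0       # llm_judge weight (from the PR description)
PASS_THRESHOLD = 0.7     # llm_judge pass_threshold
WRONG_BRANCH_CAP = 0.5   # cap applied to wrong-branch answers


def judge_score(raw: float, right_branch: bool) -> float:
    """Clamp the judge score to the cap when the wrong branch was diagnosed."""
    return raw if right_branch else min(raw, WRONG_BRANCH_CAP)


def weighted_score(skill: float, judge: float) -> float:
    """Weight-normalized mean of the two criterion scores (assumed formula)."""
    total = SKILL_WEIGHT + JUDGE_WEIGHT
    return (SKILL_WEIGHT * skill + JUDGE_WEIGHT * judge) / total


def judge_passes(judge: float) -> bool:
    """True when the judge score reaches the pass threshold."""
    return judge >= PASS_THRESHOLD
```

Under these assumptions, an otherwise convincing answer on the wrong branch is clamped to 0.5 and fails the judge criterion even though the skill was triggered.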

Passing run

--repeats 2 -j 3, claude-sonnet-4-6 coder, runs/2026-06-12_13-58-58:

4/4 tasks, 2/2 replicates each at weighted_score = 1.0.

An earlier run clipped two replicates at the under-spec max_turns: 45; normalizing all four to the troubleshoot standard task_timeout 5400 / max_turns 60 / turn_timeout 3600 produced the clean sweep.
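The normalization can be checked mechanically. The sketch below assumes a task has already been loaded into a dict with a top-level `run_limits` key; the checker is hypothetical and is not part of `/lint-task`.

```python
# Sketch only: the dict layout mirrors the run_limits keys named in this PR;
# the checker itself is hypothetical and not part of /lint-task.

TROUBLESHOOT_STANDARD = {
    "task_timeout": 5400,
    "max_turns": 60,
    "turn_timeout": 3600,
}


def run_limit_deviations(task: dict) -> dict:
    """Return {key: (actual, expected)} for each limit that differs."""
    limits = task.get("run_limits", {})  # expected at the top level
    return {
        key: (limits.get(key), expected)
        for key, expected in TROUBLESHOOT_STANDARD.items()
        if limits.get(key) != expected
    }
```

A task still carrying the earlier `max_turns: 45` would be reported as `{"max_turns": (45, 60)}`, while a normalized task yields an empty dict.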

Lint

/lint-task on all four: OK (0 Critical/High/Medium/Low). No self-report, no over-specification, distinct branches (good scaffold reuse), no command_executed verbs to check, sandbox is python: {} only, run_limits top-level.

Merge order

Depends on #1439. Base is the playbook branch so this diff is tests-only; GitHub will retarget to main once #1439 merges.

🤖 Generated with Claude Code

Update — Word process-crash task (RPC_E_WRONG_THREAD) + playbook split

Added a 5th task and refactored the COM branch into a package-level playbook (see companion commit on #1439):

| Task | Branch | Root cause |
|---|---|---|
| `word-addpicture-word-crash` | E4 | WINWORD.EXE crashes mid-insert; the `0x8001010E` (`RPC_E_WRONG_THREAD`) `InvalidCastException` to `Word._Document` on `WordInteropActivity.EndExecute` is a downstream symptom of the process crash. Workflow correct → diagnose the crash (faulting module / Office repair / bitness / orphaned WINWORD.EXE) and/or pre-resize the large image (`Add Picture` has no resize property). |

The environmental COM family was lifted out of add-picture-failures.md (C2) into the new package-level word-com-interop-failures.md playbook (causes E1 type-library/class not registered, E2 bitness, E3 busy/blocked, E4 process crash), since it applies to all Word activities, not just Add Picture. The existing word-addpicture-com-interop task is repointed to that playbook (E1, reached via the delegating C2). verify_manifest_commands.py: 72/72 shapes valid. /lint-task and a fresh coder-eval run still pending.
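As an illustration of the E1/E4 split, the sketch below maps the two HRESULTs named in this PR to their playbook causes. The lookup helper is hypothetical. E2 and E3 are left out because the PR does not name HRESULTs for them. The normalization step is there because .NET exposes `Exception.HResult` as a signed 32-bit integer, so the same code can show up as a negative number in logs.

```python
# Sketch only: the two HRESULT values and their E1/E4 assignments come from
# this PR; the lookup helper is hypothetical. E2 and E3 are omitted because
# the PR does not name HRESULTs for them.

HRESULT_TO_BRANCH = {
    0x8002801D: ("TYPE_E_LIBNOTREGISTERED", "E1"),
    0x8001010E: ("RPC_E_WRONG_THREAD", "E4"),
}


def normalize_hresult(value: int) -> int:
    """Map a signed 32-bit HRESULT to its unsigned form."""
    return value & 0xFFFFFFFF


def classify(value: int):
    """Return (symbolic name, playbook cause), or None if the code is unknown."""
    return HRESULT_TO_BRANCH.get(normalize_hresult(value))
```

Note that for E4 the HRESULT is only a downstream symptom: a match should lead to the process-crash investigation and should not be treated as the root cause itself.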

Four e2e diagnose-mode coder-eval tasks, one per branch of the add-picture-failures playbook:

- missing-scope (C1): Add Picture sits outside the Use Word File scope
- com-interop (C2): environmental TYPE_E_LIBNOTREGISTERED on the robot host
- bookmark-missing (C3): InsertRelativeTo=Bookmark, bookmark absent at runtime
- image-variable (C4): ImagePath bound to a UiPath.Core.Image via .ToString()

Passing run: 4/4 tasks, 2/2 replicates each at score 1.0 (runs/2026-06-12_13-58-58, claude-sonnet-4-6 coder, --repeats 2 -j 3). run_limits normalized to the troubleshoot standard 5400/60/3600.

Depends on #1439 (adds the playbook the llm_judge prompts reference).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-12T12:23:27Z

Claude finished @Stefan-Virgil's task in 3m 25s

PR Review: test(uipath-troubleshoot): Word Add Picture failure diagnosis tasks

Read review criteria and project conventions
Analyze full diff against base branch
Review all 4 task.yaml files in detail
Review all 4 RESOLUTION.md files
Review fixture/mock files for consistency
Review process files (Main.xaml, project.json)
Check for cross-task consistency and anti-patterns
Post final review

Summary

Adds four e2e / mode:diagnose coder-eval tasks for the Word Add Picture (WordAddImage) failure family, one per branch (C1–C4) of the add-picture-failures.md playbook. Each task provides synthetic Orchestrator mock fixtures, a process snapshot with the seeded defect, and an LLM-judge rubric graded against a detailed RESOLUTION.md ground truth. Well-structured, internally consistent, and follows the troubleshoot test CLAUDE.md conventions closely.

Change-by-Change Review

1. `word-addpicture-missing-scope/task.yaml` — C1: Activity outside Use Word File scope

Severity: OK

Tags include required uipath-troubleshoot, e2e, mode:diagnose, lifecycle:discover. run_limits correctly set to the troubleshoot standard (task_timeout: 5400, max_turns: 60, turn_timeout: 3600). Success criteria follow the lean skill_triggered + llm_judge pattern per CLAUDE.md. Judge prompt grades on presentation (Dimension A: playbook match, Dimension B: root cause + fix), with correct wrong-branch cap at 0.5. Simulation block is well-configured. initial_prompt is appropriately minimal — presents the error and asks "why", without hand-holding.
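For reference, the C1 defect is a structural one and could be detected with a plain XML walk. The sketch below is a hypothetical check, not part of the task or the skill: the element local names are assumptions based on the class names quoted in this thread, and a real `Main.xaml` may wrap the scope body in additional elements (the walk tolerates arbitrary nesting depth).

```python
import xml.etree.ElementTree as ET

# Sketch only: the element local names are assumptions based on the class
# names quoted in this thread; real XAML may nest activities differently.
SCOPE = "WordApplicationScope"
ADD_PICTURE = "WordAddImage"


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def add_picture_outside_scope(xaml_text: str) -> int:
    """Count Add Picture elements that have no enclosing Use Word File scope."""
    root = ET.fromstring(xaml_text)
    count = 0

    def walk(node, in_scope):
        nonlocal count
        name = local_name(node.tag)
        if name == ADD_PICTURE and not in_scope:
            count += 1
        for child in node:
            # Children are in scope if this node or any ancestor is the scope.
            walk(child, in_scope or name == SCOPE)

    walk(root, False)
    return count
```

A sibling layout such as `<Sequence><WordApplicationScope/><WordAddImage/></Sequence>` counts as one violation, while the same activity nested inside the scope counts as zero.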

2. `word-addpicture-com-interop/task.yaml` — C2: Environmental COM interop fault

Severity: OK

Same solid structure. The judge prompt correctly emphasizes that trying to "fix" the XAML is the wrong answer (cap at 0.5) — the workflow is correct and the fault is environmental. The Main.xaml process snapshot correctly shows a structurally valid workflow (Add Picture nested in Use Word File, valid absolute path, InsertRelativeTo=Document), which matches the scenario requirement that the agent must conclude "nothing to fix in code."

3. `word-addpicture-bookmark-missing/task.yaml` — C3: Bookmark not found

Severity: OK

Judge rubric correctly distinguishes the bookmark-not-found cause from the image-file-missing misdiagnosis. The Main.xaml shows InsertRelativeTo="Bookmark" BookmarkName="LogoAnchor" and the mock logs show the BusinessException trace at the right point in the timeline.

4. `word-addpicture-image-variable/task.yaml` — C4: Image variable bound to ImagePath

Severity: OK

The Main.xaml correctly includes the defect: ImagePath="[screenshotImage.ToString()]" with screenshotImage declared as Variable x:TypeArguments="ui:Image" and the TakeScreenshot activity producing it. The mock logs show the resolved path with the telltale UiPath.Core.Image filename segment. The project.json correctly includes the additional UiPath.UIAutomation.Activities dependency (for TakeScreenshot) that the other three tasks don't need.
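The C4 defect has a recognizable textual shape. As a rough illustration, the hypothetical check below matches only the attribute form quoted above; it does not inspect the variable's declared type, so a `.ToString()` call on a genuine string variable would also be flagged.

```python
import re

# Sketch only: a hypothetical lint for the C4 defect. It matches the
# attribute shape quoted in the review, not every possible binding.
IMAGE_TOSTRING = re.compile(r'ImagePath="\[(\w+)\.ToString\(\)\]"')


def suspicious_image_paths(xaml_text: str) -> list[str]:
    """Return variable names whose .ToString() result is bound to ImagePath."""
    return IMAGE_TOSTRING.findall(xaml_text)
```

Run against the seeded `Main.xaml` attribute, this would return `["screenshotImage"]`; a literal file path yields an empty list.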

5. All 4 RESOLUTION.md files

Severity: OK

Thorough and well-structured. Each includes: root cause, what went wrong (timeline), why (mechanism), evidence (Orchestrator + logs + source), cross-check section explicitly ruling out the other three branches, recommended fix (with prevention), and investigation summary table. The cross-check sections are a strong design choice — they give the LLM judge clear negative signals to penalize wrong-branch conclusions.

6. All 4 `fixtures/mocks/responses/manifest.json`

Severity: OK

Identical manifest structure across all four tasks: docsai ask → passthrough (correct per CLAUDE.md), then canned rules for or folders list, or jobs list, or jobs logs, or jobs history, or jobs get. Permissive unmocked_default returns empty array. The _doc in each manifest accurately describes the scenario and expected investigation chain.
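The resolution order described above can be pictured as a first-match rule list with a permissive fallback. The sketch below is not the real manifest schema: the rule shape and the `resolve` helper are hypothetical, and only the command names, the passthrough for `docsai ask`, and the empty-array default come from the review text.

```python
# Sketch only: the rule shape below is hypothetical; only the command names,
# the passthrough for `docsai ask`, and the empty-array default come from
# the review text.

RULES = [
    {"prefix": "docsai ask", "action": "passthrough"},
    {"prefix": "or folders list", "action": "canned"},
    {"prefix": "or jobs list", "action": "canned"},
    {"prefix": "or jobs logs", "action": "canned"},
    {"prefix": "or jobs history", "action": "canned"},
    {"prefix": "or jobs get", "action": "canned"},
]
UNMOCKED_DEFAULT = []  # permissive fallback: an empty array


def resolve(command: str):
    """Return the action of the first rule whose prefix matches the command."""
    for rule in RULES:
        if command.startswith(rule["prefix"]):
            return rule["action"]
    return UNMOCKED_DEFAULT
```

Because the fallback is permissive, a command the scenario did not anticipate returns an empty result instead of failing the run, which keeps the agent's investigation moving.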

7. All 4 `fixtures/mocks/responses/or-jobs-*.json`

Severity: OK

Each scenario uses distinct, non-colliding job keys (aa111111-*, bb222222-*, cc335555-*, dd444444-*), process names, and folder keys. Log timestamps are realistic and ordered. Error messages match both the initial_prompt in the task YAML and the RESOLUTION.md evidence sections. Host is consistently MOCK-HOST, robot AUTOMATION1 — properly scrubbed.

8. All 4 `process/` directories

Severity: OK

Each contains Main.xaml, project.json, entry-points.json, project.uiproj. The Main.xaml files faithfully reproduce the specific defect for their branch. project.json files use realistic structure and appropriate dependency versions.

9. Tags — new values outside closed vocabulary

Severity: Low

The tags word-activities, add-picture, bookmark-not-found, com-interop, image-variable, missing-scope are free-form tags not listed in the README's closed tag taxonomy. However, this follows the same convention as existing troubleshoot tasks (e.g., excel-activities, read-range, file-locked, null-reference in the excel-rr-* family). The README says "Propose new values in the PR" — the PR implicitly proposes these by using them across 4 tasks (well above the 2-task minimum), but the PR description doesn't explicitly call them out as new tag proposals.

This is consistent with prior art and the tags are useful for filtering (word-activities + add-picture slices all four tasks, then the branch tag narrows to one). No action needed.

10. `expected_skill` vs `expected: "yes"` on `skill_triggered`

Severity: Low

All four tasks use expected_skill: "uipath-troubleshoot" on the skill_triggered criterion, while the README example shows expected: "yes". This is consistent with all existing troubleshoot tasks (68+ tasks use expected_skill), so it's an established pattern in this test suite. The passing run confirms the framework accepts it.

What's Missing

Missing rpa tag: The troubleshoot test CLAUDE.md says rpa should be applied for "Anything touching an activity package or .xaml workflow - default for any activity-package-related failure." These four tasks exercise Word .xaml workflows but don't include rpa. However, the recent excel-rr-* tasks also omit rpa while using the same mode:diagnose pattern, so this appears to be an evolving convention. Not blocking.

Area Ratings

| Area | Status | Notes |
|---|---|---|
| Frontmatter | N/A | Tasks, not skills; no SKILL.md frontmatter |
| E2E Tests | OK | 4 well-crafted e2e tasks, one per playbook branch; passing run claimed |
| Task Body | OK | Lean criteria (`skill_triggered` + `llm_judge` only), correct `run_limits`, minimal prompts |
| References & Assets | OK | Fixtures, process snapshots, and RESOLUTION.md are thorough and internally consistent |
| Repo Hygiene | OK | CODEOWNERS covers the path; no secrets/PII; all host/robot names scrubbed; no cross-skill references |

Issues for Manual Review

Domain accuracy of mock error messages and stack traces: The synthetic error messages (e.g., UiPath.Word.Activities.WordAddImage.ResolveInsertLocation, WordApplicationScopeRuntime.EnsureWordApplication) and stack frames are plausible but fabricated for the test scenario. If these don't match real Word Activities error output, agents that have seen real errors may be confused by the mismatch. Only verifiable by someone who has seen these errors in production.
docsai ask passthrough: All four manifests proxy docsai ask to the real CLI, which requires valid uip auth at test runtime. Confirm CI runners have this configured.

Conclusion

Clean, well-structured PR. All four tasks follow the troubleshoot test conventions closely: lean criteria, detailed ground truths, correct mock structures, and faithful process snapshots that reproduce each defect. The tag taxonomy note and missing rpa tag are both Low — neither blocks merge. No changes requested.

github-actions · 2026-06-12T12:23:27Z

Claude finished @Stefan-Virgil's task in 2m 0s

Coder-eval task lint (advisory)

4 task YAMLs changed; verdicts: 0 Critical, 0 High, 0 Medium, 0 Low, 4 OK.

Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge.

Evidence of passing run

✅ Author confirms passing run: "4/4 tasks, 2/2 replicates each at weighted_score = 1.0." (--repeats 2 -j 3, claude-sonnet-4-6 coder, runs/2026-06-12_13-58-58)

Per-task lint

`tests/tasks/uipath-troubleshoot/word-addpicture-bookmark-missing/task.yaml` — verdict: OK

`tests/tasks/uipath-troubleshoot/word-addpicture-com-interop/task.yaml` — verdict: OK

`tests/tasks/uipath-troubleshoot/word-addpicture-image-variable/task.yaml` — verdict: OK

`tests/tasks/uipath-troubleshoot/word-addpicture-missing-scope/task.yaml` — verdict: OK

Within-PR duplicates

No duplicate clusters detected.

All 4 tasks share the same YAML scaffold (sandbox config, skill_triggered + llm_judge criteria shape, simulation block, run_limits) but each exercises a materially distinct branch of the add-picture-failures.md playbook — C1 (missing scope / InvalidOperationException), C2 (COM interop / TYPE_E_LIBNOTREGISTERED), C3 (bookmark not found / BusinessException), C4 (image variable / FileNotFoundException for literal UiPath.Core.Image). Each has a unique error message, distinct root cause, different fix, and branch-specific wrong-answer caps in the judge rubric. This is good scaffold reuse, not duplication.

Conclusion

✅ All changed tasks pass the rubric. Evidence of passing run confirmed.

Notes:

The skill_triggered + llm_judge pattern (with no command_executed / file_exists) is the mandated pattern for troubleshoot scenarios — the llm_judge has full ground truth via include_reference: true against each task's RESOLUTION.md and every judge prompt caps wrong-branch answers at 0.5 (below the 0.7 pass_threshold).
No command_executed criteria → CLI verb reachability axis is N/A.
No node: / env_packages in sandbox → no redundant CLI install issue.
run_limits correctly placed at top level (not under agent:).

…) diagnosis task

Add word-addpicture-word-crash e2e/mode:diagnose task: WINWORD.EXE crashes mid-insert and the job faults with InvalidCastException to Word._Document / 0x8001010E (RPC_E_WRONG_THREAD) on WordInteropActivity.EndExecute. Workflow is correct; the COM error is a downstream symptom of the process crash. Ground truth = environmental Word crash (capture faulting module, repair Office, bitness, orphaned WINWORD.EXE) and/or pre-resize the large image (Add Picture has no resize property), not an XAML edit. Grades against the new word-com-interop-failures.md playbook (E4).

Repoint the existing word-addpicture-com-interop task to the same package-level playbook (E1, reached via add-picture-failures.md C2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Stefan-Virgil requested review from MarinRzv, costin-uipath, dmorosanu and vladimir-cozma as code owners June 12, 2026 12:23

Stefan-Virgil wants to merge 2 commits into `docs/uipath-troubleshoot-word-add-picture-playbook` from `test/uipath-troubleshoot-word-addpicture-tasks`
