test(uipath-troubleshoot): Word Add Picture failure diagnosis tasks #1447

Open
Stefan-Virgil wants to merge 2 commits into docs/uipath-troubleshoot-word-add-picture-playbook from test/uipath-troubleshoot-word-addpicture-tasks

Conversation

Stefan-Virgil commented Jun 12, 2026

Contributor

What

Four e2e / mode:diagnose coder-eval tasks for the Word Add Picture (WordAddImage) failure family, one per branch of the add-picture-failures.md playbook added in #1439:

| Task | Branch | Root cause |
| --- | --- | --- |
| word-addpicture-missing-scope | C1 | Add Picture is a sibling of (outside) the Use Word File scope → no open document |
| word-addpicture-com-interop | C2 | Environmental TYPE_E_LIBNOTREGISTERED (HRESULT 0x8002801D) on the robot host; the workflow is correct |
| word-addpicture-bookmark-missing | C3 | InsertRelativeTo="Bookmark" but the runtime document lacks the bookmark |
| word-addpicture-image-variable | C4 | ImagePath bound to a UiPath.Core.Image via .ToString() → opens a file literally named UiPath.Core.Image |

Each grades skill_triggered (w 1.0) + a branch-specific llm_judge (w 3.0, pass_threshold 0.7) against a RESOLUTION.md ground truth, with wrong-branch answers capped at 0.5.
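
As a rough illustration of how these numbers interact, the sketch below assumes the harness combines the two criteria as a weight-normalized average and applies the wrong-branch cap to the judge score before the threshold check. The weights, threshold, and cap come from the description above; the function names and the aggregation rule are assumptions, not the coder-eval implementation.

```python
SKILL_WEIGHT = 1.0       # weight of skill_triggered (from the PR description)
JUDGE_WEIGHT = 3.0       # weight of llm_judge (from the PR description)
PASS_THRESHOLD = 0.7     # llm_judge pass_threshold (from the PR description)
WRONG_BRANCH_CAP = 0.5   # cap for wrong-branch answers (from the PR description)


def judge_score(raw: float, wrong_branch: bool) -> float:
    """Apply the wrong-branch cap to a raw judge score in [0, 1]."""
    return min(raw, WRONG_BRANCH_CAP) if wrong_branch else raw


def judge_passes(judge: float) -> bool:
    """The judge criterion passes only at or above the threshold."""
    return judge >= PASS_THRESHOLD


def weighted_score(skill_triggered: bool, judge: float) -> float:
    """Assumed aggregation: weight-normalized average of both criteria."""
    total = SKILL_WEIGHT + JUDGE_WEIGHT
    return (SKILL_WEIGHT * float(skill_triggered) + JUDGE_WEIGHT * judge) / total
```

Under these assumptions a wrong-branch answer can never pass the judge: even a raw score of 0.9 is capped to 0.5, which sits below the 0.7 threshold.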

Passing run

--repeats 2 -j 3, claude-sonnet-4-6 coder, runs/2026-06-12_13-58-58:

4/4 tasks, 2/2 replicates each at weighted_score = 1.0.

An earlier run clipped two replicates at the under-spec max_turns: 45; normalizing all four to the troubleshoot standard task_timeout 5400 / max_turns 60 / turn_timeout 3600 produced the clean sweep.
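
A small check like the following could flag under-spec limits before a run. It is a sketch that assumes each task.yaml has already been parsed into a dict with a top-level run_limits key; the helper name is hypothetical and not part of the repository.

```python
# Troubleshoot standard quoted in this PR.
TROUBLESHOOT_RUN_LIMITS = {
    "task_timeout": 5400,
    "max_turns": 60,
    "turn_timeout": 3600,
}


def run_limit_deviations(task: dict) -> dict:
    """Return {field: (actual, expected)} for every limit that differs
    from the troubleshoot standard. A missing field reports None."""
    limits = task.get("run_limits", {})
    return {
        field: (limits.get(field), expected)
        for field, expected in TROUBLESHOOT_RUN_LIMITS.items()
        if limits.get(field) != expected
    }
```

For the earlier under-spec configuration, `run_limit_deviations({"run_limits": {"task_timeout": 5400, "max_turns": 45, "turn_timeout": 3600}})` reports only the max_turns field.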

Lint

/lint-task on all four: OK (0 Critical/High/Medium/Low). No self-report, no over-specification, distinct branches (good scaffold reuse), no command_executed verbs to check, sandbox is python: {} only, run_limits top-level.

Merge order

Depends on #1439. Base is the playbook branch so this diff is tests-only; GitHub will retarget to main once #1439 merges.

🤖 Generated with Claude Code


Update — Word process-crash task (RPC_E_WRONG_THREAD) + playbook split

Added a 5th task and refactored the COM branch into a package-level playbook (see companion commit on #1439):

| Task | Branch | Root cause |
| --- | --- | --- |
| word-addpicture-word-crash | E4 | WINWORD.EXE crashes mid-insert; the 0x8001010E (RPC_E_WRONG_THREAD) InvalidCastException to Word._Document on WordInteropActivity.EndExecute is a downstream symptom of the process crash. Workflow correct → diagnose the crash (faulting module / Office repair / bitness / orphaned WINWORD.EXE) and/or pre-resize the large image (Add Picture has no resize property). |

The environmental COM family was lifted out of add-picture-failures.md (C2) into the new package-level word-com-interop-failures.md playbook (causes E1 type-library/class not registered, E2 bitness, E3 busy/blocked, E4 process crash), since it applies to all Word activities, not just Add Picture. The existing word-addpicture-com-interop task is repointed to that playbook (E1, reached via the delegating C2). verify_manifest_commands.py: 72/72 shapes valid. /lint-task and a fresh coder-eval run still pending.
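
To make the HRESULT-to-cause pairing concrete, here is a minimal sketch that pulls an HRESULT out of a job error message and maps it to the playbook cause named in this PR. The lookup table only covers the two codes mentioned here, and the function is hypothetical; as the E4 row notes, a code like 0x8001010E is a downstream symptom, so the result is a first-pass hint rather than a diagnosis.

```python
import re

# HRESULT -> (symbolic name, playbook cause) as described in this PR.
HRESULT_HINTS = {
    0x8002801D: ("TYPE_E_LIBNOTREGISTERED", "E1"),
    0x8001010E: ("RPC_E_WRONG_THREAD", "E4"),
}

# Matches an 8-digit hexadecimal literal such as 0x8002801D.
_HRESULT_RE = re.compile(r"0x[0-9A-Fa-f]{8}")


def hresult_hint(message: str):
    """Return (name, cause) for the first known HRESULT in the message,
    or None when no known code is present."""
    for match in _HRESULT_RE.finditer(message):
        hint = HRESULT_HINTS.get(int(match.group(0), 16))
        if hint is not None:
            return hint
    return None
```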

Four e2e diagnose-mode coder-eval tasks, one per branch of the
add-picture-failures playbook:
- missing-scope (C1): Add Picture sits outside the Use Word File scope
- com-interop (C2): environmental TYPE_E_LIBNOTREGISTERED on the robot host
- bookmark-missing (C3): InsertRelativeTo=Bookmark, bookmark absent at runtime
- image-variable (C4): ImagePath bound to a UiPath.Core.Image via .ToString()

Passing run: 4/4 tasks, 2/2 replicates each at score 1.0
(runs/2026-06-12_13-58-58, claude-sonnet-4-6 coder, --repeats 2 -j 3).
run_limits normalized to the troubleshoot standard 5400/60/3600.

Depends on #1439 (adds the playbook the llm_judge prompts reference).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
github-actions Bot commented Jun 12, 2026

Contributor

Claude finished @Stefan-Virgil's task in 3m 25s


PR Review: test(uipath-troubleshoot): Word Add Picture failure diagnosis tasks

  • Read review criteria and project conventions
  • Analyze full diff against base branch
  • Review all 4 task.yaml files in detail
  • Review all 4 RESOLUTION.md files
  • Review fixture/mock files for consistency
  • Review process files (Main.xaml, project.json)
  • Check for cross-task consistency and anti-patterns
  • Post final review

Summary

Adds four e2e / mode:diagnose coder-eval tasks for the Word Add Picture (WordAddImage) failure family, one per branch (C1–C4) of the add-picture-failures.md playbook. Each task provides synthetic Orchestrator mock fixtures, a process snapshot with the seeded defect, and an LLM-judge rubric graded against a detailed RESOLUTION.md ground truth. Well-structured, internally consistent, and follows the troubleshoot test CLAUDE.md conventions closely.

Change-by-Change Review

1. word-addpicture-missing-scope/task.yaml — C1: Activity outside Use Word File scope

Severity: OK

Tags include required uipath-troubleshoot, e2e, mode:diagnose, lifecycle:discover. run_limits correctly set to the troubleshoot standard (task_timeout: 5400, max_turns: 60, turn_timeout: 3600). Success criteria follow the lean skill_triggered + llm_judge pattern per CLAUDE.md. Judge prompt grades on presentation (Dimension A: playbook match, Dimension B: root cause + fix), with correct wrong-branch cap at 0.5. Simulation block is well-configured. initial_prompt is appropriately minimal — presents the error and asks "why", without hand-holding.

2. word-addpicture-com-interop/task.yaml — C2: Environmental COM interop fault

Severity: OK

Same solid structure. The judge prompt correctly emphasizes that trying to "fix" the XAML is the wrong answer (cap at 0.5) — the workflow is correct and the fault is environmental. The Main.xaml process snapshot correctly shows a structurally valid workflow (Add Picture nested in Use Word File, valid absolute path, InsertRelativeTo=Document), which matches the scenario requirement that the agent must conclude "nothing to fix in code."

3. word-addpicture-bookmark-missing/task.yaml — C3: Bookmark not found

Severity: OK

Judge rubric correctly distinguishes the bookmark-not-found cause from the image-file-missing misdiagnosis. The Main.xaml shows InsertRelativeTo="Bookmark" BookmarkName="LogoAnchor" and the mock logs show the BusinessException trace at the right point in the timeline.

4. word-addpicture-image-variable/task.yaml — C4: Image variable bound to ImagePath

Severity: OK

The Main.xaml correctly includes the defect: ImagePath="[screenshotImage.ToString()]" with screenshotImage declared as Variable x:TypeArguments="ui:Image" and the TakeScreenshot activity producing it. The mock logs show the resolved path with the telltale UiPath.Core.Image filename segment. The project.json correctly includes the additional UiPath.UIAutomation.Activities dependency (for TakeScreenshot) that the other three tasks don't need.

5. All 4 RESOLUTION.md files

Severity: OK

Thorough and well-structured. Each includes: root cause, what went wrong (timeline), why (mechanism), evidence (Orchestrator + logs + source), cross-check section explicitly ruling out the other three branches, recommended fix (with prevention), and investigation summary table. The cross-check sections are a strong design choice — they give the LLM judge clear negative signals to penalize wrong-branch conclusions.

6. All 4 fixtures/mocks/responses/manifest.json

Severity: OK

Identical manifest structure across all four tasks: `docsai ask` → passthrough (correct per CLAUDE.md), then canned rules for `or folders list`, `or jobs list`, `or jobs logs`, `or jobs history`, and `or jobs get`. A permissive unmocked_default returns an empty array. The _doc in each manifest accurately describes the scenario and expected investigation chain.
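
The rule ordering described here can be sketched as a first-match dispatcher. The manifest shape, field names, and response file names below are hypothetical, since the real manifest.json schema is not shown in this review; only the behavior (passthrough for docsai ask, canned responses for the listed commands, empty-array default) follows the description.

```python
# Hypothetical manifest shape, for illustration only.
MANIFEST = {
    "rules": [
        {"match": "docsai ask", "action": "passthrough"},
        {"match": "or folders list", "action": "canned", "response": "or-folders-list.json"},
        {"match": "or jobs list", "action": "canned", "response": "or-jobs-list.json"},
        {"match": "or jobs logs", "action": "canned", "response": "or-jobs-logs.json"},
    ],
    "unmocked_default": [],
}


def resolve(command: str, manifest: dict = MANIFEST):
    """Return (kind, payload) for a CLI command using first-match rules."""
    for rule in manifest["rules"]:
        if command.startswith(rule["match"]):
            if rule["action"] == "passthrough":
                return ("passthrough", None)   # forwarded to the real CLI
            return ("canned", rule["response"])  # served from a fixture file
    # Nothing matched: fall back to the permissive default.
    return ("default", manifest["unmocked_default"])
```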

7. All 4 fixtures/mocks/responses/or-jobs-*.json

Severity: OK

Each scenario uses distinct, non-colliding job keys (aa111111-*, bb222222-*, cc335555-*, dd444444-*), process names, and folder keys. Log timestamps are realistic and ordered. Error messages match both the initial_prompt in the task YAML and the RESOLUTION.md evidence sections. Host is consistently MOCK-HOST, robot AUTOMATION1 — properly scrubbed.

8. All 4 process/ directories

Severity: OK

Each contains Main.xaml, project.json, entry-points.json, project.uiproj. The Main.xaml files faithfully reproduce the specific defect for their branch. project.json files use realistic structure and appropriate dependency versions.

9. Tags — new values outside closed vocabulary

Severity: Low

The tags word-activities, add-picture, bookmark-not-found, com-interop, image-variable, missing-scope are free-form tags not listed in the README's closed tag taxonomy. However, this follows the same convention as existing troubleshoot tasks (e.g., excel-activities, read-range, file-locked, null-reference in the excel-rr-* family). The README says "Propose new values in the PR" — the PR implicitly proposes these by using them across 4 tasks (well above the 2-task minimum), but the PR description doesn't explicitly call them out as new tag proposals.

This is consistent with prior art and the tags are useful for filtering (word-activities + add-picture slices all four tasks, then the branch tag narrows to one). No action needed.
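
The slicing behavior can be illustrated with plain set containment. The per-task tag sets below are an assumption inferred from the task names and the tag list above (required tags such as uipath-troubleshoot and e2e are omitted for brevity), and the helper is hypothetical.

```python
# Assumed tag assignment, inferred from task names; not copied from the task YAMLs.
TASK_TAGS = {
    "word-addpicture-missing-scope": {"word-activities", "add-picture", "missing-scope"},
    "word-addpicture-com-interop": {"word-activities", "add-picture", "com-interop"},
    "word-addpicture-bookmark-missing": {"word-activities", "add-picture", "bookmark-not-found"},
    "word-addpicture-image-variable": {"word-activities", "add-picture", "image-variable"},
}


def select_tasks(required: set) -> list:
    """Return the sorted names of tasks whose tags include every required tag."""
    return sorted(name for name, tags in TASK_TAGS.items() if required <= tags)
```

With this assignment, `select_tasks({"word-activities", "add-picture"})` returns all four tasks, and adding a branch tag such as `"com-interop"` narrows the result to one.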

10. expected_skill vs expected: "yes" on skill_triggered

Severity: Low

All four tasks use expected_skill: "uipath-troubleshoot" on the skill_triggered criterion, while the README example shows expected: "yes". This is consistent with all existing troubleshoot tasks (68+ tasks use expected_skill), so it's an established pattern in this test suite. The passing run confirms the framework accepts it.

What's Missing

  • Missing rpa tag: The troubleshoot test CLAUDE.md says rpa should be applied for "Anything touching an activity package or .xaml workflow - default for any activity-package-related failure." These four tasks exercise Word .xaml workflows but don't include rpa. However, the recent excel-rr-* tasks also omit rpa while using the same mode:diagnose pattern, so this appears to be an evolving convention. Not blocking.

Area Ratings

| Area | Status | Notes |
| --- | --- | --- |
| Frontmatter | N/A | Tasks, not skills; no SKILL.md frontmatter |
| E2E Tests | OK | 4 well-crafted e2e tasks, one per playbook branch; passing run claimed |
| Task Body | OK | Lean criteria (skill_triggered + llm_judge only), correct run_limits, minimal prompts |
| References & Assets | OK | Fixtures, process snapshots, and RESOLUTION.md are thorough and internally consistent |
| Repo Hygiene | OK | CODEOWNERS covers the path; no secrets/PII; all host/robot names scrubbed; no cross-skill references |

Issues for Manual Review

  • Domain accuracy of mock error messages and stack traces: The synthetic error messages (e.g., UiPath.Word.Activities.WordAddImage.ResolveInsertLocation, WordApplicationScopeRuntime.EnsureWordApplication) and stack frames are plausible but fabricated for the test scenario. If these don't match real Word Activities error output, agents that have seen real errors may be confused by the mismatch. Only verifiable by someone who has seen these errors in production.
  • docsai ask passthrough: All four manifests proxy docsai ask to the real CLI, which requires valid uip auth at test runtime. Confirm CI runners have this configured.

Conclusion

Clean, well-structured PR. All four tasks follow the troubleshoot test conventions closely: lean criteria, detailed ground truths, correct mock structures, and faithful process snapshots that reproduce each defect. The tag taxonomy note and missing rpa tag are both Low — neither blocks merge. No changes requested.

github-actions Bot commented Jun 12, 2026

Contributor

Claude finished @Stefan-Virgil's task in 2m 0s


Coder-eval task lint (advisory)

4 task YAMLs changed; verdicts: 0 Critical, 0 High, 0 Medium, 0 Low, 4 OK.

Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge.

Evidence of passing run

✅ Author confirms passing run: "4/4 tasks, 2/2 replicates each at weighted_score = 1.0." (--repeats 2 -j 3, claude-sonnet-4-6 coder, runs/2026-06-12_13-58-58)

Per-task lint

tests/tasks/uipath-troubleshoot/word-addpicture-bookmark-missing/task.yaml — verdict: OK

tests/tasks/uipath-troubleshoot/word-addpicture-com-interop/task.yaml — verdict: OK

tests/tasks/uipath-troubleshoot/word-addpicture-image-variable/task.yaml — verdict: OK

tests/tasks/uipath-troubleshoot/word-addpicture-missing-scope/task.yaml — verdict: OK

Within-PR duplicates

No duplicate clusters detected.

All 4 tasks share the same YAML scaffold (sandbox config, skill_triggered + llm_judge criteria shape, simulation block, run_limits) but each exercises a materially distinct branch of the add-picture-failures.md playbook — C1 (missing scope / InvalidOperationException), C2 (COM interop / TYPE_E_LIBNOTREGISTERED), C3 (bookmark not found / BusinessException), C4 (image variable / FileNotFoundException for literal UiPath.Core.Image). Each has a unique error message, distinct root cause, different fix, and branch-specific wrong-answer caps in the judge rubric. This is good scaffold reuse, not duplication.

Conclusion

✅ All changed tasks pass the rubric. Evidence of passing run confirmed.

Notes:

  • The skill_triggered + llm_judge pattern (with no command_executed / file_exists) is the mandated pattern for troubleshoot scenarios — the llm_judge has full ground truth via include_reference: true against each task's RESOLUTION.md and every judge prompt caps wrong-branch answers at 0.5 (below the 0.7 pass_threshold).
  • No command_executed criteria → CLI verb reachability axis is N/A.
  • No node: / env_packages in sandbox → no redundant CLI install issue.
  • run_limits correctly placed at top level (not under agent:).

…) diagnosis task

Add word-addpicture-word-crash e2e/mode:diagnose task: WINWORD.EXE crashes
mid-insert and the job faults with InvalidCastException to Word._Document /
0x8001010E (RPC_E_WRONG_THREAD) on WordInteropActivity.EndExecute. Workflow
is correct; the COM error is a downstream symptom of the process crash.
Ground truth = environmental Word crash (capture faulting module, repair
Office, bitness, orphaned WINWORD.EXE) and/or pre-resize the large image
(Add Picture has no resize property) — not an XAML edit. Grades against the
new word-com-interop-failures.md playbook (E4).

Repoint the existing word-addpicture-com-interop task to the same
package-level playbook (E1, reached via add-picture-failures.md C2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>