fix(agent-eval): bump LLM-judge default model off EOL claude-sonnet-4 by alexluong · Pull Request #965 · hookdeck/outpost

alexluong · 2026-06-19T15:08:39Z

What

The agent-evaluation LLM-judge defaulted to claude-sonnet-4-20250514, which is deprecated and now returns 404 not_found_error. CI doesn't set EVAL_SCORE_MODEL, so it fell back to the dead default and every scenario failed:

Eval scenario failed (01-basics-curl.md): Error: Anthropic API 404: {"type":"error","error":{"type":"not_found_error","message":"model: claude-sonnet-4-20250514"}}

Change

Bump the default to claude-sonnet-4-6 in:

src/llm-judge.ts — the actual DEFAULT_SCORE_MODEL
.env.example — commented example
src/score-eval.ts — help text

Still overridable via EVAL_SCORE_MODEL.
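
For readers unfamiliar with the fallback, here is a minimal sketch of how a default-plus-override lookup like this might work. It is not the actual contents of src/llm-judge.ts, which this page does not show. DEFAULT_SCORE_MODEL and EVAL_SCORE_MODEL are taken from the description above; the helper name resolveScoreModel, its signature, and the handling of blank values are assumptions made for illustration.

```typescript
// Illustrative sketch only; the real src/llm-judge.ts may differ.
// The default value is the one this PR bumps to.
const DEFAULT_SCORE_MODEL = "claude-sonnet-4-6";

// Hypothetical helper (not from the repo). It takes the environment as an
// argument so the fallback logic can be exercised without touching process.env.
function resolveScoreModel(env: Record<string, string | undefined>): string {
  const override = env.EVAL_SCORE_MODEL?.trim();
  // An unset, empty, or whitespace-only override falls back to the default.
  // Treating blank values as unset is an assumption of this sketch.
  return override ? override : DEFAULT_SCORE_MODEL;
}

// With no override, as in the CI setup described above, the default is used.
console.log(resolveScoreModel({}));
// With an override, the caller's value is used instead.
console.log(resolveScoreModel({ EVAL_SCORE_MODEL: "some-other-model" }));
```

In a Node script the helper would presumably be called as resolveScoreModel(process.env). Because CI leaves EVAL_SCORE_MODEL unset, the value of the default decides which model every scenario is scored with, which is why a retired default failed all of them.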

🤖 Generated with Claude Code

The LLM-judge default was pinned to claude-sonnet-4-20250514, which is deprecated and now returns a 404 not_found_error. CI doesn't set EVAL_SCORE_MODEL, so it fell back to the dead default and every scenario failed. Bump the default (and the .env.example / help-text references) to claude-sonnet-4-6.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

leggetter · 2026-06-19T15:15:27Z

Thanks @alexluong

alexbouchardd approved these changes Jun 19, 2026

leggetter merged commit 41c95f9 into main Jun 19, 2026
2 of 3 checks passed

leggetter deleted the fix/eval-judge-model-eol branch June 19, 2026 15:18
