
fix(agent-eval): bump LLM-judge default model off EOL claude-sonnet-4#965

Merged
leggetter merged 1 commit into main from fix/eval-judge-model-eol on Jun 19, 2026


@alexluong

Collaborator

What

The agent-evaluation LLM-judge defaulted to claude-sonnet-4-20250514, which is deprecated and now returns a 404 not_found_error. CI doesn't set EVAL_SCORE_MODEL, so it fell back to the dead default and every scenario failed:

Eval scenario failed (01-basics-curl.md): Error: Anthropic API 404: {"type":"error","error":{"type":"not_found_error","message":"model: claude-sonnet-4-20250514"}}
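For illustration only, and not part of this PR: the failure above can be recognized from the response itself, since the body has the shape shown in the log. The helper name modelNotFound is hypothetical, and the body shape is assumed from this single log line rather than from the repo's client code.

```typescript
// Shape of the error body seen in the CI log (assumed from that one sample).
interface AnthropicErrorBody {
  type: string;
  error?: { type?: string; message?: string };
}

// Hypothetical helper: returns the model id when a response looks like the
// "model not found" failure from the log, otherwise undefined.
function modelNotFound(status: number, body: string): string | undefined {
  if (status !== 404) return undefined;
  let parsed: AnthropicErrorBody | null;
  try {
    parsed = JSON.parse(body) as AnthropicErrorBody | null;
  } catch {
    return undefined; // not JSON, so not the error shape we expect
  }
  if (parsed?.type !== "error" || parsed.error?.type !== "not_found_error") {
    return undefined;
  }
  // The message in the log has the form "model: <id>".
  const match = /^model:\s*(\S+)/.exec(parsed.error.message ?? "");
  return match ? match[1] : undefined;
}
```

A caller could use the returned id to print a hint such as "model X is unavailable; set EVAL_SCORE_MODEL", instead of failing every scenario with the raw 404.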

Change

Bump the default to claude-sonnet-4-6 in:

  • src/llm-judge.ts — the actual DEFAULT_SCORE_MODEL
  • .env.example — commented example
  • src/score-eval.ts — help text

Still overridable via EVAL_SCORE_MODEL.
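A minimal sketch of the override-then-default behavior described above. DEFAULT_SCORE_MODEL and EVAL_SCORE_MODEL come from the PR; resolveScoreModel is a hypothetical name, and the actual code in src/llm-judge.ts may be structured differently. The environment is passed in explicitly so the sketch does not depend on Node typings.

```typescript
// Default judge model after this PR.
const DEFAULT_SCORE_MODEL = "claude-sonnet-4-6";

// Hypothetical helper: prefer EVAL_SCORE_MODEL when it is set to a non-blank
// value, otherwise fall back to DEFAULT_SCORE_MODEL.
function resolveScoreModel(env: Record<string, string | undefined>): string {
  const override = env.EVAL_SCORE_MODEL?.trim();
  return override ? override : DEFAULT_SCORE_MODEL;
}
```

In Node this would be called as resolveScoreModel(process.env). Treating an empty or whitespace-only value as unset (rather than using a bare `??`) avoids sending an empty model id when CI exports the variable without a value.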

🤖 Generated with Claude Code

The LLM-judge default was pinned to claude-sonnet-4-20250514, which is
deprecated and now returns a 404 not_found_error. CI doesn't set
EVAL_SCORE_MODEL, so it fell back to the dead default and every scenario
failed. Bump the default (and the .env.example / help-text references)
to claude-sonnet-4-6.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@leggetter

Collaborator

Thanks @alexluong

@leggetter merged commit 41c95f9 into main on Jun 19, 2026
2 of 3 checks passed
@leggetter deleted the fix/eval-judge-model-eol branch on June 19, 2026 at 15:18


3 participants