fix: guard against IndexError when LLM API returns empty choices list #1876

Open
qizwiz wants to merge 1 commit into microsoft:main from qizwiz:fix/llm-response-empty-choices-crash

@qizwiz qizwiz commented May 14, 2026

Problem

Three places in markitdown call response.choices[0].message.content immediately after client.chat.completions.create(...) without checking whether choices is non-empty:

  • packages/markitdown/src/markitdown/converters/_image_converter.py:138
  • packages/markitdown/src/markitdown/converters/_llm_caption.py:50
  • packages/markitdown-ocr/src/markitdown_ocr/_ocr_service.py:102

The OpenAI API (and OpenAI-compatible providers) can return an empty choices list in three documented scenarios:

  1. Content filtering — when the image triggers a policy violation, the API returns finish_reason: "content_filter" with an empty choices list
  2. Streaming edge cases — SSE stream closed before any choices are emitted
  3. OpenAI-compatible providers — local LLMs, proxies, and alternative providers may return non-standard response shapes

In all three cases, choices[0] raises IndexError: list index out of range. The crash does not appear in development (dev images don't hit content filters) and surfaces only in production on real user content.
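
A minimal sketch of the failure mode described above. It uses a `SimpleNamespace` stand-in rather than a real API response, and `unguarded_content` is a hypothetical name that only mirrors the unguarded access pattern, not the actual markitdown code:

```python
from types import SimpleNamespace

# Hypothetical stand-in for a chat-completions response whose `choices`
# list is empty; no real API call is made.
empty_response = SimpleNamespace(choices=[])


def unguarded_content(response):
    # Mirrors the unguarded pattern: index 0 is read without a length check.
    return response.choices[0].message.content


try:
    unguarded_content(empty_response)
    outcome = "no error"
except IndexError as exc:
    outcome = f"IndexError: {exc}"

print(outcome)  # IndexError: list index out of range
```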

Formal verification

This was found via pact static analysis and formally verified with the Z3 SMT solver:

Bug model (SAT): content_filtered=True → choices_len=0; with access_index=0, the out-of-bounds condition access_index ≥ choices_len holds (0 ≥ 0) → IndexError
Fix model (UNSAT): With the `if not response.choices` guard, access_attempted ∧ choices_len=0 is a contradiction, so IndexError is unreachable on all trigger paths.
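
The two models can be illustrated without a solver. The sketch below is not the Z3 encoding used for this PR; it swaps in a brute-force enumeration over a small, bounded state space, and the function name and the `max_len` bound are assumptions made for illustration:

```python
from itertools import product


def index_error_reachable(guarded: bool, max_len: int = 3) -> bool:
    """Return True if some modeled state reaches an out-of-bounds read at index 0."""
    for content_filtered, choices_len in product([False, True], range(max_len + 1)):
        # Modeling assumption from the description: filtering implies empty choices.
        if content_filtered and choices_len != 0:
            continue
        # The `if not response.choices` guard returns early on an empty list,
        # so the indexed access is only attempted when the guard does not fire.
        access_attempted = not (guarded and choices_len == 0)
        # Out-of-bounds condition for access_index = 0.
        if access_attempted and 0 >= choices_len:
            return True
    return False


print(index_error_reachable(guarded=False))  # True  (bug model: satisfiable)
print(index_error_reachable(guarded=True))   # False (fix model: unsatisfiable)
```

Unlike an SMT proof, this only covers list lengths up to `max_len`, which is enough here because the failing state is always `choices_len = 0`.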

Fix

# _image_converter.py and _llm_caption.py
response = client.chat.completions.create(model=model, messages=messages)
if not response.choices:
    return None
return response.choices[0].message.content

# _ocr_service.py (inline — consistent with existing `text or ""` guard below)
text = response.choices[0].message.content if response.choices else None

The _ocr_service.py path already has a broad except Exception handler that returns OCRResult(text="") on failure, and the existing text.strip() if text else "" guard on the next line converts the None into an empty string, so it propagates safely.
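
As a rough sketch of how the None flows through the OCR path, the function below combines the inline ternary with the `text.strip() if text else ""` guard. `extract_text` and the `SimpleNamespace` responses are hypothetical stand-ins, not the real `_ocr_service.py` code:

```python
from types import SimpleNamespace


def extract_text(response) -> str:
    # Inline ternary from the fix: None when there are no choices.
    text = response.choices[0].message.content if response.choices else None
    # Guard described above: None or empty text becomes "".
    return text.strip() if text else ""


# Stand-in responses; no real API call is made.
empty = SimpleNamespace(choices=[])
normal = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="  hello  "))]
)

print(repr(extract_text(empty)))   # ''
print(repr(extract_text(normal)))  # 'hello'
```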

Prior art

This exact crash pattern appears in multiple open issues across the LLM ecosystem: plastic-labs/honcho#676, aden-hive/hive#4767, TheR1D/shell_gpt#741, langchain-community#475.

qizwiz commented May 14, 2026

@microsoft-github-policy-service agree

The OpenAI API can return an empty choices list when:
- Content filtering blocks the image response
- A streaming edge case closes before choices are emitted
- An OpenAI-compatible provider returns a non-standard response shape

In all three cases, `response.choices[0]` raises IndexError. This is a
silent crash in production — content filters fire on real user images,
not on dev test images, so the bug is invisible in local testing.

Three affected paths:
- _image_converter.py: return None when no choices (caller handles None)
- _llm_caption.py: return None when no choices (caller handles None)
- _ocr_service.py: inline ternary, consistent with existing `text or ""`
  guard already on the next line

Formally verified: Z3 SMT solver proves IndexError is satisfiable under
content_filtered=True → choices_len=0 (SAT), and proves the guard makes
it UNSAT — no assignment produces IndexError after the check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@qizwiz qizwiz force-pushed the fix/llm-response-empty-choices-crash branch from 5719e76 to 9f80bf3 May 14, 2026 14:06

qizwiz commented May 17, 2026

Note: second crash vector discovered

Testing against live providers found that Gemini 2.5 Flash returns HTTP 200 with choices[0].message = None (not an empty choices list) when content is filtered — finish_reason: PROHIBITED_CONTENT. This causes AttributeError: 'NoneType' object has no attribute 'content' even when choices is non-empty.

The `if not response.choices` guards in this PR still access .message.content unconditionally afterwards, so they may remain vulnerable to this path. A more complete guard is:

if not response.choices or response.choices[0].message is None:
    return  # or handle appropriately

Happy to push an update if the current guards are incomplete.
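
One way the combined check could be factored out is a small helper that covers both crash vectors. This is only a sketch: `first_message_content` is a hypothetical name, not something in this PR, and the responses are `SimpleNamespace` stand-ins rather than real provider objects:

```python
from types import SimpleNamespace
from typing import Optional


def first_message_content(response) -> Optional[str]:
    """Return the first choice's message content, or None if it is unavailable."""
    choices = getattr(response, "choices", None)
    if not choices:
        # Covers an empty list as well as a missing or None `choices`.
        return None
    message = getattr(choices[0], "message", None)
    if message is None:
        # Covers the second vector: a choice is present but its message is None.
        return None
    return getattr(message, "content", None)


# Stand-in responses for the three shapes discussed in this thread.
empty = SimpleNamespace(choices=[])
no_message = SimpleNamespace(choices=[SimpleNamespace(message=None)])
normal = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="a caption"))]
)

print(first_message_content(empty))       # None
print(first_message_content(no_message))  # None
print(first_message_content(normal))      # a caption
```

Returning None in every degraded case keeps the callers' existing None handling as the single fallback path.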
