fix: guard against IndexError when LLM API returns empty choices list by qizwiz · Pull Request #1876 · microsoft/markitdown

qizwiz · 2026-05-14T05:17:06Z

Problem

Three places in markitdown call response.choices[0].message.content immediately after client.chat.completions.create(...) without checking whether choices is non-empty:

packages/markitdown/src/markitdown/converters/_image_converter.py:138
packages/markitdown/src/markitdown/converters/_llm_caption.py:50
packages/markitdown-ocr/src/markitdown_ocr/_ocr_service.py:102

The OpenAI API (and OpenAI-compatible providers) can return an empty choices list in three documented scenarios:

Content filtering — when the image triggers a policy violation, the API returns finish_reason: "content_filter" with an empty choices list
Streaming edge cases — SSE stream closed before any choices are emitted
OpenAI-compatible providers — local LLMs, proxies, and alternative providers may return non-standard response shapes

In all three cases, choices[0] raises IndexError: list index out of range. The crash goes unnoticed in development, where test images rarely trigger content filters, and surfaces only in production on real user content.
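
To make the failure mode concrete, here is a minimal reproduction. It assumes a hypothetical stand-in object built with `SimpleNamespace` in place of a real API response; `unguarded_caption` is an illustrative name, not a function from markitdown.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a chat-completions response whose `choices`
# list came back empty (for example, after a content filter fired).
empty_response = SimpleNamespace(choices=[])


def unguarded_caption(response):
    # Mirrors the unchecked access pattern: index 0 is read without
    # first confirming that the list has any elements.
    return response.choices[0].message.content


try:
    unguarded_caption(empty_response)
    outcome = "no error"
except IndexError as exc:
    outcome = f"IndexError: {exc}"

print(outcome)  # IndexError: list index out of range
```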

Formal verification

This was found via pact static analysis and formally verified with Z3 SMT:

Bug model (SAT): content_filtered=True → choices_len=0, access_index=0 → access_index ≥ choices_len (0 ≥ 0) → IndexError
Fix model (UNSAT): With if not response.choices guard, access_attempted ∧ choices_len=0 is a contradiction — IndexError is unreachable on all trigger paths.
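
The two models can be approximated without an SMT solver. The sketch below is not the Z3 encoding used for this PR; it swaps in a plain-Python bounded enumeration over small list lengths, and the function names and the bound of 16 are assumptions chosen for illustration.

```python
def unguarded_access_fails(choices_len: int) -> bool:
    # Index 0 is out of range exactly when access_index >= choices_len.
    access_index = 0
    return access_index >= choices_len


def guarded_access_fails(choices_len: int) -> bool:
    # The guard `if not response.choices` returns early on an empty list,
    # so the index access is only attempted when choices_len > 0.
    access_attempted = choices_len > 0
    return access_attempted and unguarded_access_fails(choices_len)


lengths = range(0, 16)  # bounded check, not a proof over all integers
bug_witnesses = [n for n in lengths if unguarded_access_fails(n)]
fix_witnesses = [n for n in lengths if guarded_access_fails(n)]

print(bug_witnesses)  # [0]  -> the bug model has a satisfying assignment
print(fix_witnesses)  # []   -> no assignment reaches the failing access
```

Unlike an SMT query, this only covers the enumerated lengths, but it shows the same shape: one witness for the unguarded access and none once the guard is in place.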

Fix

# _image_converter.py and _llm_caption.py
response = client.chat.completions.create(model=model, messages=messages)
if not response.choices:
    return None
return response.choices[0].message.content

# _ocr_service.py (inline — consistent with existing `text or ""` guard below)
text = response.choices[0].message.content if response.choices else None

The _ocr_service.py path already wraps the call in a broad except Exception that returns OCRResult(text="") on failure, and the None produced by the ternary is handled by the existing text.strip() if text else "" guard on the next line.
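
A rough sketch of how the None flows through the OCR path, under stated assumptions: `OCRResult` here is a minimal hypothetical dataclass (the real class may carry more fields), `extract_text` is an illustrative name, and the responses are `SimpleNamespace` stand-ins rather than real API objects.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class OCRResult:
    # Hypothetical minimal stand-in for the real result type.
    text: str


def extract_text(response) -> OCRResult:
    try:
        # Inline guard: yield None instead of indexing an empty list.
        text = response.choices[0].message.content if response.choices else None
        # Existing-style guard: None (or "") collapses to an empty string.
        return OCRResult(text=text.strip() if text else "")
    except Exception:
        # Broad fallback for any other failure in this path.
        return OCRResult(text="")


empty = SimpleNamespace(choices=[])
full = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="  hello  "))]
)

print(extract_text(empty))  # OCRResult(text='')
print(extract_text(full))   # OCRResult(text='hello')
```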

Prior art

This exact crash pattern appears in multiple open issues across the LLM ecosystem: plastic-labs/honcho#676, aden-hive/hive#4767, TheR1D/shell_gpt#741, langchain-community#475.

qizwiz · 2026-05-14T05:32:38Z

@microsoft-github-policy-service agree

The OpenAI API can return an empty choices list when:
- Content filtering blocks the image response
- A streaming edge case closes before choices are emitted
- An OpenAI-compatible provider returns a non-standard response shape

In all three cases, `response.choices[0]` raises IndexError. This is a silent crash in production: content filters fire on real user images, not on dev test images, so the bug is invisible in local testing.

Three affected paths:
- _image_converter.py: return None when no choices (caller handles None)
- _llm_caption.py: return None when no choices (caller handles None)
- _ocr_service.py: inline ternary, consistent with existing `text or ""` guard already on the next line

Formally verified: Z3 SMT solver proves IndexError is satisfiable under content_filtered=True → choices_len=0 (SAT), and proves the guard makes it UNSAT; no assignment produces IndexError after the check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

qizwiz · 2026-05-17T22:05:20Z

Note: second crash vector discovered

Testing against live providers found that Gemini 2.5 Flash returns HTTP 200 with choices[0].message = None (not an empty choices list) when content is filtered — finish_reason: PROHIBITED_CONTENT. This causes AttributeError: 'NoneType' object has no attribute 'content' even when choices is non-empty.

If the files in this PR have if not response.choices: return style guards but then access .message.content unconditionally, they may still be vulnerable to this path. The comprehensive guard is:

if not response.choices or response.choices[0].message is None:
    return  # or handle appropriately
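
One way the combined guard could be factored into a single helper is sketched below. `first_message_content` is a hypothetical name and is not part of this PR or of the OpenAI client; the responses in the usage lines are `SimpleNamespace` stand-ins.

```python
from types import SimpleNamespace
from typing import Any, Optional


def first_message_content(response: Any) -> Optional[str]:
    """Return the first choice's message content, or None if unavailable."""
    choices = getattr(response, "choices", None)
    if not choices:
        # Covers both a missing attribute and an empty list.
        return None
    message = getattr(choices[0], "message", None)
    if message is None:
        # Covers providers that return a choice with message set to None.
        return None
    return getattr(message, "content", None)


no_choices = SimpleNamespace(choices=[])
null_message = SimpleNamespace(choices=[SimpleNamespace(message=None)])
normal = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="a cat"))]
)

print(first_message_content(no_choices))    # None
print(first_message_content(null_message))  # None
print(first_message_content(normal))        # a cat
```

Returning None in every degraded case keeps the callers' existing None handling as the single place where a missing caption is dealt with.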

Happy to push an update if the current guards are incomplete.

qizwiz force-pushed the fix/llm-response-empty-choices-crash branch from 5719e76 to 9f80bf3 Compare May 14, 2026 14:06

qizwiz wants to merge 1 commit into microsoft:main from qizwiz:fix/llm-response-empty-choices-crash
