[BUG] ArithmeticException integer overflow in BytesRestResponse causes silent 504 hang for responses >= ~715 MB #22311

Description

@asimmahmood1

Describe the bug

Two stacked bugs in the REST response path combine to produce a silent 5-minute hang (zero bytes flushed to client, 504 from the load balancer) whenever a search response body reaches ~715 MB.

Bug A — integer overflow in BytesRestResponse(String) serialization path

BytesRestResponse(RestStatus, String contentType, String content) calls new BytesArray(content), which calls new BytesRef(text), which calls UnicodeUtil.maxUTF8Length(text.length()). That method returns Math.multiplyExact(utf16Length, MAX_UTF8_BYTES_PER_CHAR) where MAX_UTF8_BYTES_PER_CHAR == 3. The multiplication overflows once text.length() > Integer.MAX_VALUE / 3 = 715_827_882, regardless of which characters the string contains. For ASCII-dominant JSON, where each char encodes to one UTF-8 byte, that corresponds to a body of ~715 MB. The resulting ArithmeticException is thrown after the channel is half-closed, so no error response is flushed.
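The cliff can be checked without Lucene on the classpath. The sketch below is a stand-in for the arithmetic described above, not Lucene's source: the class name and the overflows helper are made up for illustration, and MAX_UTF8_BYTES_PER_CHAR is assumed to be 3 as stated in this issue.

```java
// Stand-in for the arithmetic described in this issue; not Lucene's actual source.
final class Utf8LengthOverflowDemo {
    // Assumption: same value as the Lucene constant named in the issue.
    static final int MAX_UTF8_BYTES_PER_CHAR = 3;

    // Exact int multiplication: throws ArithmeticException instead of wrapping around.
    static int maxUtf8Length(int utf16Length) {
        return Math.multiplyExact(utf16Length, MAX_UTF8_BYTES_PER_CHAR);
    }

    // Hypothetical helper: true if the worst-case byte length does not fit in an int.
    static boolean overflows(int utf16Length) {
        try {
            maxUtf8Length(utf16Length);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int lastSafe = Integer.MAX_VALUE / 3;         // 715_827_882
        System.out.println(overflows(lastSafe));      // false: 2_147_483_646 fits in an int
        System.out.println(overflows(lastSafe + 1));  // true: 2_147_483_649 does not
    }
}
```

Only the length is consulted, so the exception fires before any character is read, which is why the reproducer further down needs no large allocation.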

Upstream precedent: OS#1651 / PR #7963 fixed the same overflow on the request (ingest) side. The response side was never patched.

Bug B — non-idempotent close() in ResourceHandlingHttpChannel

RestController$ResourceHandlingHttpChannel.close() uses AtomicBoolean.compareAndSet(false, true) and throws IllegalStateException("Channel is already closed") when called a second time. When Bug A's exception is thrown mid-sendResponse, the first close() call has already half-closed the channel. RestActionListener.onFailure then tries to send the error response by calling sendResponse again, which re-enters close() and this time throws IllegalStateException. Neither response is written, and the underlying Netty channel is left dangling until the upstream load balancer times out (e.g. 300 s).
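A toy model of that sequence, for readers who want to see the interaction in isolation. Everything here is hypothetical (class name, the Runnable-based sendResponse, the responsesWritten counter); it only assumes, as the stack trace below suggests, that sendResponse calls close() before the write that can throw.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model only: names are hypothetical, this is not OpenSearch's code.
final class StrictCloseChannelDemo {
    private final AtomicBoolean closed = new AtomicBoolean();
    int responsesWritten = 0;

    // Non-idempotent guard: the second caller gets an exception.
    void close() {
        if (!closed.compareAndSet(false, true)) {
            throw new IllegalStateException("Channel is already closed");
        }
    }

    // Marks the channel closed first, then runs the write, which may throw.
    void sendResponse(Runnable write) {
        close();
        write.run();
        responsesWritten++;
    }

    // Returns the exception seen by the error path, or null if a response went out.
    static RuntimeException simulate(StrictCloseChannelDemo channel) {
        try {
            // Success path: the write fails the way Bug A does.
            channel.sendResponse(() -> { throw new ArithmeticException("integer overflow"); });
        } catch (ArithmeticException bugA) {
            try {
                // Error path: try to send a failure response on the same channel.
                channel.sendResponse(() -> { });
            } catch (IllegalStateException bugB) {
                bugB.addSuppressed(bugA);
                return bugB;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        StrictCloseChannelDemo channel = new StrictCloseChannelDemo();
        RuntimeException failure = simulate(channel);
        System.out.println(failure + ", responses written: " + channel.responsesWritten);
    }
}
```

In this model the error path never reaches its write, so the counter stays at zero, matching the "zero bytes flushed" symptom.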

Related component

Search

To Reproduce

Bug A

// Reproducer (no large buffer allocation — overflow fires before any char access)
int overflowingLength = (Integer.MAX_VALUE / 3) + 1; // 715_827_883
CharSequence text = new CharSequence() {
    @Override public int length() { return overflowingLength; }
    @Override public char charAt(int i) { throw new AssertionError("not reached"); }
    @Override public CharSequence subSequence(int s, int e) { throw new UnsupportedOperationException(); }
};
assertThrows(ArithmeticException.class, () -> new BytesArray(new BytesRef(text)));

Bug B

// Proves that a second sendResponse throws instead of no-op'ing
restController.registerHandler(GET, "/repro-bug-b", (req, channel, c) -> {
    channel.sendResponse(new BytesRestResponse(OK, TEXT_CONTENT_TYPE, BytesArray.EMPTY));
    // second call mirrors RestActionListener.onFailure after Bug A fires
    assertThrows(IllegalStateException.class, () ->
        channel.sendResponse(new BytesRestResponse(INTERNAL_SERVER_ERROR, TEXT_CONTENT_TYPE, BytesArray.EMPTY)));
});

Both reproducer tests are included in the companion PR.

Expected behavior

Bug A: Pre-check the response size before crossing into Lucene's UnicodeUtil.maxUTF8Length. If (long) content.length() * 3 > Integer.MAX_VALUE, fail with a typed exception that maps to a clean HTTP 413/507 error response — not a raw ArithmeticException thrown mid-write. Mirror the pattern from PR #7963 (request side).
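A minimal sketch of such a pre-check, assuming the limit is expressed as worst-case UTF-8 bytes. ResponseTooLargeException and ResponseSizeGuard are placeholder names invented for this example; the real fix would presumably use an existing OpenSearch exception type and pick the HTTP status there.

```java
// Hypothetical exception type; this issue does not name one.
final class ResponseTooLargeException extends RuntimeException {
    ResponseTooLargeException(String message) {
        super(message);
    }
}

// Hypothetical guard, meant to run before the content is handed to BytesRef.
final class ResponseSizeGuard {
    // Assumption: same value as the Lucene constant named in the issue.
    static final int MAX_UTF8_BYTES_PER_CHAR = 3;

    // Widen to long before multiplying so the check itself cannot overflow.
    static boolean exceedsLimit(int utf16Length) {
        return (long) utf16Length * MAX_UTF8_BYTES_PER_CHAR > Integer.MAX_VALUE;
    }

    // Fails fast with a typed exception instead of a raw ArithmeticException.
    static void ensureEncodable(CharSequence content) {
        if (exceedsLimit(content.length())) {
            throw new ResponseTooLargeException(
                "response of " + content.length() + " chars exceeds the encodable limit of "
                    + (Integer.MAX_VALUE / MAX_UTF8_BYTES_PER_CHAR) + " chars");
        }
    }

    public static void main(String[] args) {
        ensureEncodable("{\"hits\":[]}");               // small body: passes silently
        System.out.println(exceedsLimit(715_827_882));  // false
        System.out.println(exceedsLimit(715_827_883));  // true
    }
}
```

Because the check runs before any serialization starts, the caller can still build a normal error response on an untouched channel.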

Bug B: ResourceHandlingHttpChannel.close() should be idempotent — a second call should be a no-op, not throw IllegalStateException. This allows the error path in RestActionListener.onFailure to reach the client gracefully even if the success path already closed the channel.
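One possible shape for the idempotent variant, sketched on a stand-alone class rather than the real channel. The class name and the releases counter are illustrative assumptions; the counter stands in for whatever resources the real close() releases.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative only: not the actual ResourceHandlingHttpChannel.
final class IdempotentCloseDemo {
    private final AtomicBoolean closed = new AtomicBoolean();
    int releases = 0; // stand-in for releasing the real channel's resources

    // Only the caller that wins the compare-and-set releases; later calls are no-ops.
    void close() {
        if (closed.compareAndSet(false, true)) {
            releases++;
        }
    }

    public static void main(String[] args) {
        IdempotentCloseDemo channel = new IdempotentCloseDemo();
        channel.close();
        channel.close(); // second call returns quietly instead of throwing
        System.out.println(channel.releases); // 1
    }
}
```

Keeping the compareAndSet preserves the release-exactly-once guarantee; only the throw on the losing branch is dropped.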

Additional Details

Observed failure mode (production stack trace, OS 3.1):

[WARN ][r.suppressed] path: /<index>/_search
java.lang.ArithmeticException: integer overflow
  at java.lang.Math.multiplyExact(Math.java:992)
  at org.apache.lucene.util.UnicodeUtil.maxUTF8Length(UnicodeUtil.java:676)
  at org.apache.lucene.util.BytesRef.<init>(BytesRef.java:80)
  at org.opensearch.core.common.bytes.BytesArray.<init>(BytesArray.java:56)
  at org.opensearch.rest.BytesRestResponse.<init>(BytesRestResponse.java:89)
  at ...HttpResponseChannel.sendResponse(HttpResponseAdapter.java:140)
  at AbstractSearchAsyncAction.sendSearchResponse(AbstractSearchAsyncAction.java:773)
  at ExpandSearchPhase.run(ExpandSearchPhase.java:132)

[ERROR][o.o.r.a.RestResponseListener] failed to send failure response
java.lang.IllegalStateException: Channel is already closed
  at RestController$ResourceHandlingHttpChannel.close(RestController.java:648)
  at RestController$ResourceHandlingHttpChannel.sendResponse(RestController.java:641)
  at RestActionListener.onFailure(RestActionListener.java:88)
  ...
  Suppressed: java.lang.ArithmeticException: integer overflow  (Bug A, above)

Cliff value: utf16Length > Integer.MAX_VALUE / 3 = 715_827_882 (~715 MB ASCII JSON). Confirmed on cluster with 74x m7i.4xlarge data nodes; responses <=650 MB succeed (~20 s), responses >=700 MB hang the full 300 s load-balancer timeout with zero bytes flushed to the client.

Version: OS 3.1 (stack trace); both code paths are present in current main.

Related issues:
