In line with the repository's published vulnerability-reporting guidance, I am reporting this issue here as a public issue.
Summary
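When parsing crafted XML containing an out-of-range numeric character reference such as � (in the trace under Observed Result the decoded value is 1114112, one past the maximum Unicode code point U+10FFFF), XML#toJSONObject() throws an uncaught IllegalArgumentException instead of a controlled parsing exception such as JSONException.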
As a result, applications that parse attacker-controlled XML may encounter an uncaught runtime exception. Depending on the integration, this may result in request failure or denial of service.
I reproduced this in release 20251224.
Details
The apparent root cause is in XMLTokener#unescapeEntity(), where a decoded numeric character reference is passed to string construction without first validating that it is a valid Unicode code point:
    if (e.charAt(0) == '#') {
        int cp;
        if (e.charAt(1) == 'x' || e.charAt(1) == 'X') {
            // hex encoded unicode
            cp = Integer.parseInt(e.substring(2), 16);
        } else {
            // decimal encoded unicode
            cp = Integer.parseInt(e.substring(1));
        }
        return new String(new int[] {cp},0,1);
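For illustration, here is a minimal sketch of the kind of range check that appears to be missing. It is not a patch against the library: NumericEntityDecoder and EntityDecodeException are hypothetical names, with the latter standing in for a controlled exception such as JSONException. The only JDK calls used are Integer.parseInt and Character.isValidCodePoint.

```java
// Hypothetical stand-alone helper; NOT part of org.json.
final class NumericEntityDecoder {

    // Hypothetical stand-in for a controlled parsing exception such as JSONException.
    static final class EntityDecodeException extends RuntimeException {
        EntityDecodeException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Decodes an entity body without the leading '&' and trailing ';',
    // for example "#x41" or "#65".
    static String decode(String e) {
        if (e == null || e.length() < 2 || e.charAt(0) != '#') {
            throw new EntityDecodeException("Not a numeric character reference: " + e, null);
        }
        int cp;
        try {
            if (e.charAt(1) == 'x' || e.charAt(1) == 'X') {
                cp = Integer.parseInt(e.substring(2), 16); // hex form
            } else {
                cp = Integer.parseInt(e.substring(1));     // decimal form
            }
        } catch (NumberFormatException ex) {
            throw new EntityDecodeException("Malformed numeric character reference: " + e, ex);
        }
        // Reject anything outside 0..0x10FFFF before building the string,
        // so the String constructor never sees an invalid code point.
        if (!Character.isValidCodePoint(cp)) {
            throw new EntityDecodeException("Code point out of range: " + cp, null);
        }
        return new String(new int[] {cp}, 0, 1);
    }

    public static void main(String[] args) {
        System.out.println(decode("#x41"));
        try {
            decode("#x110000");
        } catch (EntityDecodeException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

Whether the library should throw here or fall back to leaving the entity text unescaped is a design decision for the maintainers; the sketch only shows where a check could sit.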
Minimal PoC
XML.toJSONObject("<a>�</a>");
I also checked a few closely related inputs while narrowing this down:
- � reproduces the same behavior.
- The same behavior is also reachable from an attribute value, e.g. <a b="�"/>.
- � did not reproduce the same uncaught exception in my testing.
This suggests that the immediate issue here is specifically the handling of out-of-range Unicode code points during string construction, rather than XML-invalid numeric character references in general.
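The distinction can be checked against the JDK alone, without the library. The sketch below uses a hypothetical CodePointProbe class and sample values of my own choosing (they are not the inputs from the tests above). It probes which values the String(int[], int, int) constructor accepts. As far as I can tell, a lone surrogate such as 0xD800 counts as a valid code point for this constructor, while values above 0x10FFFF and negative values are rejected, which would be consistent with only some XML-invalid references triggering the exception.

```java
// Hypothetical probe class; uses only java.lang.String from the JDK.
final class CodePointProbe {

    // Returns true if String(int[], int, int) accepts the given value,
    // false if it throws IllegalArgumentException.
    static boolean constructs(int cp) {
        try {
            new String(new int[] {cp}, 0, 1);
            return true;
        } catch (IllegalArgumentException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Illustrative samples: ASCII, NUL, lone surrogate, maximum code point,
        // one past the maximum, and a negative value.
        int[] samples = {0x41, 0x0, 0xD800, 0x10FFFF, 0x110000, -1};
        for (int cp : samples) {
            System.out.printf("%d accepted=%b%n", cp, constructs(cp));
        }
    }
}
```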
Observed Result
java.lang.IllegalArgumentException: 1114112
at java.base/java.lang.StringUTF16.toBytes(Unknown Source)
at java.base/java.lang.String.<init>(Unknown Source)
at org.json.XMLTokener.unescapeEntity(XMLTokener.java:171)
at org.json.XMLTokener.nextEntity(XMLTokener.java:148)
at org.json.XMLTokener.nextContent(XMLTokener.java:117)
at org.json.XML.parse(XML.java:407)
at org.json.XML.toJSONObject(XML.java:780)
at org.json.XML.toJSONObject(XML.java:866)
at org.json.XML.toJSONObject(XML.java:665)
at PoC.main(PoC.java:7)
Source of the snippet quoted under Details: JSON-java/src/main/java/org/json/XMLTokener.java, lines 162 to 171 in cf65368.