Antalya 26.3: Support Nullable(Tuple) for Arrow, ArrowStream, ORC, legacy Parquet formats #1802
zvonand added a commit that referenced this pull request on May 17, 2026
The cherry-pick of upstream PR ClickHouse#101272 added test `04019_formats_nullable_empty_tuple_roundtrip` (which exercises `MsgPack` with `Nullable(Tuple())`) but missed the matching change to `MsgPackVisitor::start_array` from upstream commit acf7821. As a result, the `SELECT` half of the `MsgPack` roundtrip throws `ILLEGAL_COLUMN: Cannot insert MessagePack array into column with type Nullable(Tuple())`.

Apply the upstream fix verbatim: unwrap the `Nullable` before dispatching on `isTuple`, mark the row non-null in the null map, and descend into the inner `ColumnTuple`.

Addresses 1 failing test in Stateless tests (amd_asan, distributed plan, parallel, 1/4) on #1802.

Failure report: https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1802&sha=4dccbce9dfb174c622d7a6cd95a7aba252e20e9f&name_0=PR&name_1=Stateless%20tests%20%28amd_asan%2C%20distributed%20plan%2C%20parallel%2C%201%2F4%29

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
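For readers who have not seen the upstream diff, the shape of the fix can be sketched with toy types. This is an illustration only: `ToyColumn`, `Kind`, `makeNullableTuple` and `startArray` are hypothetical names invented here, not the real `ColumnNullable`/`ColumnTuple` classes or the real `MsgPackVisitor::start_array`, and the arity check is an assumption of the toy model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT ClickHouse's real column classes.
enum class Kind { Nullable, Tuple, Other };

struct ToyColumn
{
    Kind kind = Kind::Other;
    std::unique_ptr<ToyColumn> nested;   // inner column, set when kind == Kind::Nullable
    std::vector<std::uint8_t> null_map;  // one byte per row (1 = NULL), used when kind == Kind::Nullable
    std::size_t arity = 0;               // element count, used when kind == Kind::Tuple
    std::size_t rows = 0;                // rows started so far, used when kind == Kind::Tuple
};

// Hypothetical helper that builds a toy Nullable(Tuple(...)) column.
ToyColumn makeNullableTuple(std::size_t arity)
{
    ToyColumn tuple;
    tuple.kind = Kind::Tuple;
    tuple.arity = arity;

    ToyColumn nullable;
    nullable.kind = Kind::Nullable;
    nullable.nested = std::make_unique<ToyColumn>(std::move(tuple));
    return nullable;
}

// Hypothetical analogue of an "array started" visitor callback.
// Returns the tuple column that the array's elements should be written into.
ToyColumn & startArray(ToyColumn & column, std::size_t size)
{
    ToyColumn * target = &column;
    std::vector<std::uint8_t> * null_map = nullptr;

    // Step 1: unwrap Nullable BEFORE checking whether the column is a tuple.
    if (target->kind == Kind::Nullable)
    {
        null_map = &target->null_map;
        target = target->nested.get();
    }

    if (target == nullptr || target->kind != Kind::Tuple)
        throw std::runtime_error("Cannot insert array into a non-tuple column");
    if (size != target->arity)
        throw std::runtime_error("Array size does not match tuple arity");

    // Step 2: an array was received, so this row is not NULL.
    if (null_map != nullptr)
        null_map->push_back(0);

    // Step 3: descend into the inner tuple.
    ++target->rows;
    return *target;
}
```

In this toy model, checking `kind == Kind::Tuple` on the outer column without the unwrap in step 1 would reject every `Nullable(Tuple(...))` column, which mirrors the `ILLEGAL_COLUMN` failure described above.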
…solution in next commit)

---

Original cherry-pick message follows:

Merge pull request ClickHouse#101272 from nihalzp/support-arrow-orc-nullable-tuple

Support `Nullable(Tuple)` for `Arrow`, `ArrowStream`, `ORC`, legacy `Parquet` formats
8481925 to 8b48e5d
The cherry-pick of upstream PR ClickHouse#101272 added tests `04019_formats_nullable_empty_tuple_roundtrip` and `04064_tuple_inside_nullable_arrow_orc_roundtrip` (which exercise `Arrow`/`ArrowStream` roundtrip of `Nullable(Tuple(...))`) but missed the matching change to `fillArrowArrayWithTupleColumnData` from upstream commit `1e5b4344275` ("Add `Nullable(Tuple)` Arrow writing support"). As a result, the Arrow writer ignored the struct-level null bytemap: `NULL` rows roundtripped as default-valued tuples (`()` instead of `\N`, `((0,''),0)` instead of `((NULL,20))`, etc.).

Apply the upstream fix verbatim: pass `nullptr` for the children's null bytemap (struct-level and child-level nulls are independent in Arrow) and use `builder.AppendNull` for null struct rows.

Addresses failing tests in `Stateless tests (amd_asan, distributed plan, parallel, 1/4)`, `Stateless tests (amd_asan, distributed plan, parallel, 3/4)`, `Stateless tests (amd_debug, parallel)`, and `Stateless tests (arm_binary, parallel)` on #1802.

Failure report: https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1802&sha=e77fa9f7de8f967c20e9e55efa8be5ac4e5bb482&name_0=PR

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
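To make the struct-level versus child-level distinction concrete, here is a minimal sketch that does not use the Arrow library at all. `ToyStructArray`, `writeTuples` and `renderRow` are hypothetical names invented for this illustration; the real change lives in `fillArrowArrayWithTupleColumnData` and may differ in detail. The toy models a one-element tuple whose element is itself nullable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Toy model of a struct array: one validity flag per struct row, plus a single
// child column that carries its own, independent nulls.
struct ToyStructArray
{
    std::vector<bool> struct_valid;                  // struct-level validity
    std::vector<std::optional<std::int64_t>> child;  // child-level values and nulls
};

// Hypothetical writer. `struct_null_map` uses 1 = NULL; passing nullptr models
// a writer that ignores the struct-level null map.
ToyStructArray writeTuples(
    const std::vector<std::optional<std::int64_t>> & child_values,
    const std::vector<std::uint8_t> * struct_null_map)
{
    ToyStructArray out;
    for (std::size_t row = 0; row < child_values.size(); ++row)
    {
        const bool is_null = struct_null_map != nullptr && (*struct_null_map)[row] != 0;

        // Struct level: a NULL row is recorded on the struct itself.
        out.struct_valid.push_back(!is_null);

        // Child level: the child is written for every row and keeps its own
        // nullability; the struct-level map is not forwarded to it.
        out.child.push_back(child_values[row]);
    }
    return out;
}

// Renders a row the way a text format might show it after reading back.
std::string renderRow(const ToyStructArray & array, std::size_t row)
{
    if (!array.struct_valid[row])
        return "\\N";
    if (!array.child[row].has_value())
        return "(NULL)";
    return "(" + std::to_string(*array.child[row]) + ")";
}
```

With child values `{10, NULL, 0}` and a struct null map `{0, 0, 1}`, the toy renders the third row as `\N`. Passing `nullptr` instead of the null map renders it as `(0)`, which is the default-valued tuple symptom described in the commit message.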
`CSVFormatReader::readField` treats an empty unquoted field as the default value when `input_format_csv_empty_as_default` is enabled (the default). For a `Nullable(Tuple())` column the default is `NULL`, so the empty-tuple values written by `SerializationTuple::serializeTextCSV` (which produces an empty field for a tuple with zero elements) are read back as `NULL` instead of `()`. This breaks the `04019_formats_nullable_empty_tuple_roundtrip` test added in the cherry-pick of upstream PR ClickHouse#101272.

The fix is the same as upstream commit `29f6f23cafe` from upstream PR ClickHouse#100038 ("Fix `Nullable(Tuple)` not working with `CSV`, `MsgPack` format properly"), which ClickHouse#101272 depends on: special-case `Nullable(Tuple())` (zero-element tuple) so the empty field falls through to normal deserialization instead of being replaced with `NULL`. Applied verbatim (diff matches upstream byte-for-byte).

Addresses CSV section failure in test `04019_formats_nullable_empty_tuple_roundtrip` on #1802.

Failure report: https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1802&sha=36ae78193451a1aaf81d238b7b3bcb25fab3ee3d&name_0=PR&name_1=Stateless+tests+%28amd_asan%2C+distributed+plan%2C+parallel%2C+1%2F4%29&name_2=Tests

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
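The special case can be sketched as a small decision function. This is a toy model under stated assumptions: `ToyType` and `readCsvField` are hypothetical names, the function returns a rendered string instead of filling a column, and the `"<default>"` placeholder stands in for whatever default a non-nullable type would get. It is not the real `CSVFormatReader::readField`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy description of a column type; NOT ClickHouse's real IDataType.
struct ToyType
{
    bool is_nullable = false;
    bool is_tuple = false;
    std::size_t tuple_arity = 0;
};

// Hypothetical reader for one unquoted CSV field. Returns how the value would
// be rendered after reading: "\N" for NULL, "()" for an empty tuple.
std::string readCsvField(const std::string & field, const ToyType & type, bool empty_as_default)
{
    // Nullable(Tuple()) serializes its only non-NULL value as an empty field,
    // so an empty field must not be treated as "use the default" for it.
    const bool nullable_empty_tuple = type.is_nullable && type.is_tuple && type.tuple_arity == 0;

    if (field.empty() && empty_as_default && !nullable_empty_tuple)
        return type.is_nullable ? "\\N" : "<default>";

    // Normal deserialization path (heavily simplified).
    if (type.is_nullable && field == "\\N")
        return "\\N";
    if (type.is_tuple && type.tuple_arity == 0 && field.empty())
        return "()";
    return field;
}
```

In this sketch an empty field read into a toy `Nullable(Tuple())` yields `()`, while the same empty field read into any other nullable toy type still yields `\N`, so the empty-as-default behavior is unchanged for every other type.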
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Support Nullable(Tuple) for Arrow, ArrowStream, ORC, legacy Parquet formats (ClickHouse#101272 by @nihalzp).