Antalya 26.3: Support Nullable(Tuple) for Arrow, ArrowStream, ORC, legacy Parquet formats#1802

Open
zvonand wants to merge 6 commits into antalya-26.3 from feature/antalya-26.3/ClickHouse-ClickHouse-pr-101272

@zvonand zvonand commented May 15, 2026

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Support Nullable(Tuple) for Arrow, ArrowStream, ORC, legacy Parquet formats (ClickHouse#101272 by @nihalzp).

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@zvonand zvonand added the releasy (Created/managed by RelEasy), ai-resolved (Port conflict auto-resolved by Claude), and auto-prereq-added (Combined PR includes auto-added prerequisite PR(s)) labels May 15, 2026
github-actions Bot commented May 15, 2026

Workflow [PR], commit [ec4b124]

@svb-alt svb-alt added the backport Backport label May 16, 2026
@zvonand

This comment was marked as outdated.

zvonand added a commit that referenced this pull request May 17, 2026
The cherry-pick of upstream PR ClickHouse#101272 added test
`04019_formats_nullable_empty_tuple_roundtrip` (which exercises
`MsgPack` with `Nullable(Tuple())`) but missed the matching change to
`MsgPackVisitor::start_array` from upstream commit acf7821. As a
result the `SELECT` half of the `MsgPack` roundtrip throws
`ILLEGAL_COLUMN: Cannot insert MessagePack array into column with type
Nullable(Tuple())`.

Apply the upstream fix verbatim: unwrap the `Nullable` before
dispatching on `isTuple`, mark the row non-null in the null map, and
descend into the inner `ColumnTuple`.
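
The three steps above can be sketched in isolation. This is a minimal model, assuming hypothetical stand-in types (`Column`, `TupleColumn`, `NullableColumn`) and a hypothetical free function `startArray`; it is not the ClickHouse column API or the actual body of `MsgPackVisitor::start_array`, and it handles only the tuple case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for the column classes; not the real ClickHouse API.
struct Column
{
    virtual ~Column() = default;
};

struct TupleColumn : Column
{
    std::size_t rows = 0;  // number of tuples started so far
};

struct NullableColumn : Column
{
    std::unique_ptr<Column> nested;      // the wrapped column
    std::vector<std::uint8_t> null_map;  // 1 = NULL, 0 = value present
};

// Models what happens when the parser sees the start of a MessagePack array.
TupleColumn & startArray(Column & column)
{
    // Step 1: unwrap Nullable before checking whether the target is a tuple.
    auto * nullable = dynamic_cast<NullableColumn *>(&column);
    Column * target = &column;
    if (nullable)
    {
        assert(nullable->nested != nullptr);
        target = nullable->nested.get();
    }

    auto * tuple = dynamic_cast<TupleColumn *>(target);
    if (!tuple)
        throw std::invalid_argument("Cannot insert MessagePack array into this column");

    // Step 2: an array is a present value, so mark the row non-null.
    if (nullable)
        nullable->null_map.push_back(0);

    // Step 3: descend into the inner tuple column.
    ++tuple->rows;
    return *tuple;
}
```

Checking the tuple type before touching the null map keeps the null map and the nested column the same length when the insert is rejected.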

Addresses 1 failing test in Stateless tests (amd_asan, distributed
plan, parallel, 1/4) on #1802.

Failure report:
https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1802&sha=4dccbce9dfb174c622d7a6cd95a7aba252e20e9f&name_0=PR&name_1=Stateless%20tests%20%28amd_asan%2C%20distributed%20plan%2C%20parallel%2C%201%2F4%29

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nihalzp and others added 3 commits May 18, 2026 22:41
…solution in next commit)

---
Original cherry-pick message follows:

Merge pull request ClickHouse#101272 from nihalzp/support-arrow-orc-nullable-tuple

Support `Nullable(Tuple)` for `Arrow`, `ArrowStream`, `ORC`, legacy `Parquet` formats
@zvonand zvonand force-pushed the feature/antalya-26.3/ClickHouse-ClickHouse-pr-101272 branch from 8481925 to 8b48e5d Compare May 18, 2026 20:42
@zvonand zvonand changed the title from "Antalya 26.3: Add Arrow Flight SQL support" to "Antalya 26.3: Support Nullable(Tuple) for Arrow, ArrowStream, ORC, legacy Parquet formats" May 18, 2026
@zvonand zvonand removed the ai-resolved (Port conflict auto-resolved by Claude), auto-prereq-added (Combined PR includes auto-added prerequisite PR(s)), and releasy (Created/managed by RelEasy) labels May 18, 2026
zvonand and others added 2 commits May 19, 2026 12:21
The cherry-pick of upstream PR ClickHouse#101272 added tests
`04019_formats_nullable_empty_tuple_roundtrip` and
`04064_tuple_inside_nullable_arrow_orc_roundtrip` (which exercise
`Arrow`/`ArrowStream` roundtrip of `Nullable(Tuple(...))`) but missed
the matching change to `fillArrowArrayWithTupleColumnData` from
upstream commit `1e5b4344275` ("Add `Nullable(Tuple)` Arrow writing
support"). As a result, the Arrow writer ignored the struct-level null
bytemap: `NULL` rows roundtripped as default-valued tuples (`()`
instead of `\N`, `((0,''),0)` instead of `((NULL,20))`, etc.).

Apply the upstream fix verbatim: pass `nullptr` for the children's
null bytemap (struct-level and child-level nulls are independent in
Arrow) and use `builder.AppendNull` for null struct rows.
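
The independence of struct-level and child-level nulls can be illustrated with a toy model. This sketch assumes hypothetical types (`ChildArray`, `StructArray`) and a hypothetical function `writeNullableTuple` over a single-element tuple; it uses only the standard library and does not call the Arrow builder API or reproduce `fillArrowArrayWithTupleColumnData`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model of a struct array with one child. Each level has its own validity.
struct ChildArray
{
    std::vector<std::int64_t> values;
    std::vector<bool> valid;  // child-level validity
};

struct StructArray
{
    std::vector<bool> valid;  // struct-level validity
    ChildArray child;
};

// tuple_null_map: 1 = the whole tuple is NULL. elements: the tuple's single
// element per row, which may itself be NULL.
StructArray writeNullableTuple(
    const std::vector<std::uint8_t> & tuple_null_map,
    const std::vector<std::optional<std::int64_t>> & elements)
{
    assert(tuple_null_map.size() == elements.size());

    StructArray out;
    for (std::size_t i = 0; i < elements.size(); ++i)
    {
        // Struct level: the analogue of appending a null struct row
        // instead of a valid one.
        out.valid.push_back(tuple_null_map[i] == 0);

        // Child level: validity comes only from the element itself.
        // The tuple's null map is never passed down to the child.
        out.child.values.push_back(elements[i].value_or(0));
        out.child.valid.push_back(elements[i].has_value());
    }
    return out;
}
```

In this model a reader can tell a `NULL` tuple (struct-level flag unset) from a tuple holding a `NULL` element (child-level flag unset), which is the distinction the roundtrip tests rely on.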

Addresses failing tests in `Stateless tests (amd_asan, distributed
plan, parallel, 1/4)`, `Stateless tests (amd_asan, distributed plan,
parallel, 3/4)`, `Stateless tests (amd_debug, parallel)`, and
`Stateless tests (arm_binary, parallel)` on
#1802.

Failure report:
https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1802&sha=e77fa9f7de8f967c20e9e55efa8be5ac4e5bb482&name_0=PR

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`CSVFormatReader::readField` treats an empty unquoted field as the
default value when `input_format_csv_empty_as_default` is enabled (the
default). For a `Nullable(Tuple())` column the default is `NULL`, so
the empty-tuple values written by `SerializationTuple::serializeTextCSV`
(which produces an empty field for a tuple with zero elements) are
read back as `NULL` instead of `()`. This breaks the
`04019_formats_nullable_empty_tuple_roundtrip` test added in the
cherry-pick of upstream PR ClickHouse#101272.

The fix is the same as upstream commit `29f6f23cafe` from upstream PR
ClickHouse#100038 ("Fix `Nullable(Tuple)` not working with `CSV`, `MsgPack` format
properly"), which ClickHouse#101272 depends on: special-case
`Nullable(Tuple())` (zero-element tuple) so the empty field falls
through to normal deserialization instead of being replaced with
`NULL`. Applied verbatim (diff matches upstream byte-for-byte).
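
The decision described above can be sketched as a small standalone function. This is a simplified model, assuming a hypothetical `ColumnType` descriptor, `FieldResult` enum, and `readCsvField` function; it ignores quoting and is not the code of `CSVFormatReader::readField`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified type descriptor; not the ClickHouse type system.
struct ColumnType
{
    bool is_nullable;
    bool is_tuple;
    std::size_t tuple_size;  // number of tuple elements, 0 for non-tuples
};

enum class FieldResult
{
    Null,        // empty field replaced by NULL (default of a Nullable column)
    Default,     // empty field replaced by the type's default value
    EmptyTuple,  // empty field deserialized normally as ()
    Parsed       // non-empty field deserialized normally
};

// empty_as_default models the input_format_csv_empty_as_default setting.
FieldResult readCsvField(const std::string & raw, const ColumnType & type, bool empty_as_default)
{
    assert(type.is_tuple || type.tuple_size == 0);

    const bool empty_tuple = type.is_tuple && type.tuple_size == 0;
    const bool nullable_empty_tuple = type.is_nullable && empty_tuple;

    // Special case: for Nullable(Tuple()) an empty field is a real value,
    // so it must not take the "empty means default" shortcut.
    if (raw.empty() && empty_as_default && !nullable_empty_tuple)
        return type.is_nullable ? FieldResult::Null : FieldResult::Default;

    // Normal deserialization: a zero-element tuple is written as an empty field.
    if (raw.empty() && empty_tuple)
        return FieldResult::EmptyTuple;

    return FieldResult::Parsed;
}
```

With the special case in place, an empty field in a `Nullable(Tuple())` column reads back as `()`, while other `Nullable` columns keep the existing empty-as-default behavior.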

Addresses CSV section failure in test
`04019_formats_nullable_empty_tuple_roundtrip` on
#1802.

Failure report:
https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1802&sha=36ae78193451a1aaf81d238b7b3bcb25fab3ee3d&name_0=PR&name_1=Stateless+tests+%28amd_asan%2C+distributed+plan%2C+parallel%2C+1%2F4%29&name_2=Tests

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>