…olution in next commit)
---
Original cherry-pick message follows:
Merge d0085a8 into 12f07ad
# Conflicts:
# src/Disks/DiskObjectStorage/ObjectStorages/S3/diskSettings.cpp
1 failed test(s) considered — cost $1.1164 full report
AI narration
Let me check what the failure looks like and verify the build setup.
The single failing test 01171_mv_select_insert_isolation_long failed with "not enough iterations of select_insert_action has been made" (10/15 in two of the eight parallel actions). It's a long-running stress test that:
Spawns 8 concurrent transactional workloads against local MergeTree tables and a materialized view
Requires each workload to complete ≥15 iterations within a 400-second window
Has been retuned for timing several times (its git history includes entries such as "adjust MIN_ITERATIONS", "adjust settings", and "add min time") and is tagged long, no-ordinary-database, no-encrypted-storage, no-msan for that reason
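The pass criterion described above can be sketched as a small check. This is only an illustration in Python, not the test's actual code, which is not shown here. The minimum of 15 iterations and the 400-second window come from the report; the function name and the `worker_N` labels are hypothetical.

```python
MIN_ITERATIONS = 15  # per-workload minimum within the 400 s window (from the report)


def workers_below_budget(iterations, min_iterations=MIN_ITERATIONS):
    """Return the sorted names of workloads that completed fewer than min_iterations."""
    return sorted(name for name, done in iterations.items() if done < min_iterations)


# Hypothetical counts mirroring the reported shape: two of eight workers reached only 10.
counts = {f"worker_{i}": 15 for i in range(8)}
counts["worker_2"] = 10
counts["worker_5"] = 10

print(workers_below_budget(counts))  # ['worker_2', 'worker_5']
```

A check of this shape fails on throughput alone, which is why a slow shard can trip it without any correctness assertion firing.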
The actual delta this PR introduces (b38a2e29460 + 22afc923047) is a cherry-pick of ClickHouse#96802, which contains only S3 client / diskSettings changes. None of that touches MergeTree, transactions, or materialized views. The failure is a wall-clock budget miss in a known-flaky stress test on an arm_asan debug build: no correctness assertion fired, and only two of the eight parallel workers fell short. I can't write a one-sentence "this PR broke this test because ..." justification, so per the scoping rule the failure is out of scope.
No code changes made.
Final summary:
01171_mv_select_insert_isolation_long — [unrelated] — long-running stress test (8 parallel transactional workers, 400s wall-clock budget, MIN_ITERATIONS=15). Failed with "not enough iterations of select_insert_action has been made from 33 until 433" — a timing budget miss, not a correctness failure. Test history shows repeated MIN_ITERATIONS/timing tuning, confirming it's a chronic stress-test flake on slow shards (debug+ASan on ARM). This PR's actual delta is a cherry-pick of Share S3 client cache per bucket ClickHouse/ClickHouse#96802 (S3 client / diskSettings only) with no plausible link to local-MergeTree transactional MV throughput.
UNRELATED
⏭️ UNRELATED — Integration tests (amd_tsan, 2/6)
11 failed test(s) considered — cost $3.5695 full report
AI narration
I have enough information to classify all failures. Let me summarize my findings.
Investigation summary
The PR (feature/antalya-26.3/ClickHouse-ClickHouse-pr-96802) is a cherry-pick of upstream PR ClickHouse#96802. The actual diff vs origin/antalya-26.3 (the proper base) is only 4 files in src/IO/S3/.
(The huge git diff antalya-26.3..HEAD shown by --stat is misleading because the local antalya-26.3 is stale — origin/antalya-26.3 already has the CPULeaseAllocation UAF fix, the scheduler refactors, etc. The merge-base of HEAD vs origin/antalya-26.3 is 59bfcc082dc; the PR only adds 2 commits on top of that.)
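One way to avoid the stale-branch trap described above is to diff from the merge-base against the remote-tracking ref instead of the local branch. The sketch below is a hypothetical helper, not part of the analysis tooling; it uses only the standard `git merge-base` and `git diff --stat` commands, and the injectable `run` parameter is an assumption added so the helper can be exercised without a repository.

```python
import subprocess


def diff_stat_from_merge_base(base_ref, head="HEAD", run=subprocess.run):
    """Return `git diff --stat` output between merge-base(head, base_ref) and head."""
    # Resolve the common ancestor first, so a stale local branch cannot inflate the diff.
    merge_base = run(
        ["git", "merge-base", head, base_ref],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return run(
        ["git", "diff", "--stat", f"{merge_base}..{head}"],
        capture_output=True, text=True, check=True,
    ).stdout
```

Inside a checkout, after fetching the remote, `diff_stat_from_merge_base("origin/antalya-26.3")` would list only the files changed by the commits that sit on top of the merge-base.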
None of the 11 failing tests exercise S3 client code:
test_create_workload_under_load hits a pre-existing TSan data race in SemaphoreConstraint::active() at src/Common/Scheduler/Nodes/SemaphoreConstraint.h:205 (race between removeChild on the WorkloadResMgr thread and finishRequest on a worker). That file is not in this PR's diff. The TSan abort kills the server, and the cluster fixture is module-scoped, so the other 6 scheduler tests then fail with Connection refused / Attempt to read after eof during the clear_workloads_and_resources teardown.
The 4 test_token_roles_mapping tests are annotated flaky-elsewhere on 5 other tracked PRs — strong master-flake/infra signal, and they are unrelated to S3 client code.
No code changes are warranted. Since I am not editing any code, no rebuild or re-run is needed (the rules say to re-run only CAUSED-BY-THIS-PR tests).
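The dead-server cascade reasoning above (one originating sanitizer abort, then connection errors in every later test that shares the module-scoped cluster) can be sketched as a simple triage over failure messages. This is an illustrative sketch only: the function name is hypothetical, the dead-server substrings are the ones quoted in this report, and treating the substring "ThreadSanitizer" as the origin marker is an assumption about how the abort appears in the logs.

```python
# Substrings quoted in the report that indicate the server was already gone.
DEAD_SERVER_MARKERS = (
    "Connection refused",
    "Attempt to read after eof",
    "ATTEMPT_TO_READ_AFTER_EOF",
)


def split_origin_and_cascade(failures, origin_marker="ThreadSanitizer"):
    """Split (test_name, error_text) pairs into (origins, cascades, other)."""
    origins, cascades, other = [], [], []
    for name, error_text in failures:
        if origin_marker in error_text:
            origins.append(name)      # the failure that took the server down
        elif any(marker in error_text for marker in DEAD_SERVER_MARKERS):
            cascades.append(name)     # collateral: server was already dead
        else:
            other.append(name)        # needs separate investigation
    return origins, cascades, other


# Error strings here are illustrative, not copied from the CI logs.
example = [
    ("test_create_workload_under_load", "WARNING: ThreadSanitizer: data race"),
    ("test_drop_workload_during_query", "Connection refused"),
    ("some_other_test", "AssertionError"),
]
print(split_origin_and_cascade(example))
```

Failures that land in the third bucket are not explained by the cascade and still need their own justification.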
Final report
test_scheduler_cpu_preemptive/test.py::test_cpu_time_fairness[fixed_longer_dev] — [unrelated] collateral damage: server died earlier in the run from a pre-existing TSan race in SemaphoreConstraint::active (file untouched by this S3-only PR); subsequent queries get ATTEMPT_TO_READ_AFTER_EOF while the connection drops.
test_scheduler_cpu_preemptive/test.py::test_cpu_time_fairness[random_equal] — [unrelated] server already dead from the same TSan abort; Connection refused in teardown. Also flaky on another tracked PR.
test_scheduler_cpu_preemptive/test.py::test_cpu_time_fairness[random_longer_prd] — [unrelated] same dead-server cascade; flaky-elsewhere.
test_scheduler_cpu_preemptive/test.py::test_cpu_time_fairness[random_longer_dev] — [unrelated] same dead-server cascade; flaky-elsewhere.
test_scheduler_cpu_preemptive/test.py::test_downscaling[cpu-slot-preemption-timeout-1ms] — [unrelated] same dead-server cascade; flaky-elsewhere.
test_scheduler_cpu_preemptive/test.py::test_drop_workload_during_query — [unrelated] same dead-server cascade; flaky-elsewhere.
test_scheduler_cpu_preemptive/test.py::test_create_workload_under_load — [unrelated] originating TSan data race in DB::SemaphoreConstraint::active() between WorkloadResourceManager::Resource::deleteNode (writer) and CPULeaseAllocation::RequestChain::finish (reader). The race site is in src/Common/Scheduler/Nodes/SemaphoreConstraint.h, which this PR (S3 client only) does not touch. Also flaky on another tracked PR.
test_token_roles_mapping/test.py::test_mapped_guid_grants_mapped_role — [unrelated] HTTP/auth test, no causal connection to S3 client code; failing on 5 other tracked PRs (strong master/infra flake).
test_token_roles_mapping/test.py::test_multiple_guids_grant_multiple_roles — [unrelated] same; flaky on 5 other PRs.
test_token_roles_mapping/test.py::test_unmapped_guid_is_dropped_by_filter — [unrelated] same; flaky on 5 other PRs.
test_token_roles_mapping/test.py::test_only_unmapped_guids_yield_no_roles — [unrelated] same; flaky on 5 other PRs.
git status --porcelain
Tree clean. No commits authored or reverted.
UNRELATED
🤖 Posted automatically by releasy analyze-fails. Re-run the command to refresh.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Share S3 ClientCache per bucket, reduce repeated region discovery (ClickHouse#96802 by @zvonand).
Cherry-picked from ClickHouse#96802.
Closes ClickHouse#92482
Documentation entry for user-facing changes