Introduce scan.primary-branch which is symmetrical to scan.fallback-branch#7553
Introduce scan.primary-branch which is symmetrical to scan.fallback-branch#7553tsreaper merged 4 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Introduces a new table option scan.primary-branch to allow partition-level “priority branch” reads (symmetric to the existing scan.fallback-branch) so users can move hot/large partitions to a non-PK branch to improve query performance.
Changes:
- Add
scan.primary-branchoption and wire it into table creation/branch resolution so reads prefer the configured primary branch when a partition exists there. - Generalize
FallbackReadFileStoreTableto support both “fallback mode” (wrapped-first) and “primary mode” (other-first), and adjust related chain-table plumbing. - Update tests and generated configuration docs to cover/describe the new option and merged-branch behavior.
Reviewed changes
Copilot reviewed 10 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| paimon-core/src/test/java/org/apache/paimon/table/FallbackReadFileStoreTableTest.java | Parameterizes tests to validate both priority orders and adds a write-delegation test. |
| paimon-core/src/main/java/org/apache/paimon/utils/ChainTableUtils.java | Updates chain primary-table resolution to the renamed accessor (other()). |
| paimon-core/src/main/java/org/apache/paimon/table/system/ReadOptimizedTable.java | Delegates scan construction to FallbackReadFileStoreTable to support both modes. |
| paimon-core/src/main/java/org/apache/paimon/table/FileStoreTableFactory.java | Adds scan.primary-branch handling and factors branch-table creation into a helper. |
| paimon-core/src/main/java/org/apache/paimon/table/FallbackReadFileStoreTable.java | Generalizes merged scanning/reading to support either wrapped-first or other-first priority. |
| paimon-core/src/main/java/org/apache/paimon/table/ChainGroupReadTable.java | Adapts to the new FallbackReadFileStoreTable constructor and renamed accessors. |
| paimon-core/src/main/java/org/apache/paimon/table/AbstractFileStoreTable.java | Prevents deletion of a branch configured as scan.primary-branch. |
| paimon-core/src/main/java/org/apache/paimon/schema/SchemaValidation.java | Validates mutual exclusivity of scan.fallback-branch and scan.primary-branch, and validates existence. |
| paimon-api/src/main/java/org/apache/paimon/schema/Schema.java | Tweaks primary-key normalization logic when primary-key is present in options. |
| paimon-api/src/main/java/org/apache/paimon/CoreOptions.java | Adds the scan.primary-branch config option and description. |
| docs/layouts/shortcodes/generated/*.html | Updates generated configuration docs to include the new option and ordering changes. |
Comments suppressed due to low confidence (1)
paimon-api/src/main/java/org/apache/paimon/schema/Schema.java:199
- The new conflict check allows
optionsto contain an emptyprimary-keyvalue without throwing, but the code still parses it and overwrites any DDL primary keys with an empty list. This can silently drop primary keys when users passprimary-key=. Consider treating empty/blank option values as "not set" (skip parsing and keep the existingprimaryKeys), or keep the previous behavior of rejecting any simultaneous DDL+option primary key definitions regardless of emptiness.
if (options.containsKey(CoreOptions.PRIMARY_KEY.key())) {
String pk = options.get(CoreOptions.PRIMARY_KEY.key());
if (!primaryKeys.isEmpty() && !StringUtils.isEmpty(pk)) {
throw new RuntimeException(
"Cannot define primary key on DDL and table options at the same time.");
}
primaryKeys =
Arrays.stream(pk.split(","))
.map(String::trim)
.filter(s -> !s.isEmpty())
.collect(Collectors.toList());
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Case 1: WHERE pt = 1 | ||
| // Partition 1 exists in priority table. Should never be read from supplemental. | ||
| DataTableScan scan1 = table.newScan(); | ||
| scan1.withFilter(builder.equal(0, 1)); |
There was a problem hiding this comment.
testPlanWithDataFilter no longer applies any data predicate (it only filters on partition pt). This means the test no longer covers the scenario described in the Javadoc (where data filters can eliminate all files in a partition and must not affect partition ownership). Consider reintroducing a data predicate (e.g., pt = 1 AND a = 100) to ensure the plan logic is validated against data-filtered scans.
| // Case 1: WHERE pt = 1 | |
| // Partition 1 exists in priority table. Should never be read from supplemental. | |
| DataTableScan scan1 = table.newScan(); | |
| scan1.withFilter(builder.equal(0, 1)); | |
| // Case 1: WHERE pt = 1 AND a = 100 | |
| // Partition 1 exists in priority table. Should never be read from supplemental. | |
| DataTableScan scan1 = table.newScan(); | |
| scan1.withFilter( | |
| builder.and(builder.equal(0, 1), builder.equal(1, 100))); |
| private Read() { | ||
| this.mainRead = wrapped.newRead(); | ||
| this.fallbackRead = fallback.newRead(); | ||
| FileStoreTable first = wrappedFirst ? wrapped : other; | ||
| FileStoreTable second = wrappedFirst ? other : wrapped; | ||
| this.mainRead = first.newRead(); | ||
| this.fallbackRead = second.newRead(); | ||
| } |
There was a problem hiding this comment.
In primary-branch mode (wrappedFirst=false), fallbackRead/isFallback correspond to reads from the current branch (the non-priority table). The later log message still says "Reading from fallback branch...", which becomes misleading. Consider renaming these variables/messages to neutral terms (e.g., priority/supplemental) or including the actual branch/table name in the log.
| if (!StringUtils.isNullOrWhitespaceOnly(primaryBranch)) { | ||
| checkArgument( | ||
| schemaManager.copyWithBranch(primaryBranch).latest().isPresent(), | ||
| "Cannot set '%s' = '%s' because the branch '%s' isn't existed.", |
There was a problem hiding this comment.
Error message grammar is inconsistent with the fallback-branch validation above ("does not exist"). This string uses "isn't existed", which is ungrammatical; align it to "does not exist" for clarity and consistency.
| "Cannot set '%s' = '%s' because the branch '%s' isn't existed.", | |
| "Cannot set '%s' = '%s' because the branch '%s' does not exist.", |
| String.format( | ||
| "can not delete the primary branch. " | ||
| + "branchName to be deleted is %s. you have set 'scan.primary-branch' = '%s'. " | ||
| + "you should reset 'scan.primary-branch' before deleting this branch.", | ||
| branchName, primaryBranch)); |
There was a problem hiding this comment.
The newly added exception message is user-facing and has several grammar/capitalization issues ("can not", sentence casing, repeated "you"). Consider rephrasing to a clearer message (e.g., "Cannot delete branch '%s' because it is configured as scan.primary-branch; unset scan.primary-branch first.") and, ideally, aligning with the fallback-branch message for consistency.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 15 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| public FileStoreTable other() { | ||
| return other; | ||
| } | ||
|
|
There was a problem hiding this comment.
fallback() was renamed/removed in favor of other(). Because FallbackReadFileStoreTable is public, this is a binary/source incompatible change for any external users relying on fallback(). Consider keeping a deprecated fallback() accessor delegating to other() (and/or a clearer accessor like supplemental()), at least for one release cycle.
| /** | |
| * @deprecated Use {@link #other()} instead. | |
| */ | |
| @Deprecated | |
| public FileStoreTable fallback() { | |
| return other(); | |
| } |
| public DataTableScan newScan(Function<FileStoreTable, DataTableScan> scanCreator) { | ||
| validateSchema(); | ||
| FileStoreTable first = wrappedFirst ? wrapped : other; | ||
| FileStoreTable second = wrappedFirst ? other : wrapped; | ||
| return new FallbackReadScan( |
There was a problem hiding this comment.
newScan(Function<FileStoreTable, DataTableScan>) correctly allows callers (e.g. system tables) to customize how scans are created, but FallbackReadScan still creates its partition-listing scans via wrappedTable.newScan() / fallbackTable.newScan() internally. That bypasses the provided scanCreator and can make partition-ownership decisions inconsistent with the actual mainScan/fallbackScan behavior (notably for ReadOptimizedTable). Consider threading scanCreator through to FallbackReadScan and using it for the partition listing scans too.
| if (options.containsKey(CoreOptions.PRIMARY_KEY.key())) { | ||
| if (!primaryKeys.isEmpty()) { | ||
| String pk = options.get(CoreOptions.PRIMARY_KEY.key()); | ||
| if (!primaryKeys.isEmpty() && !StringUtils.isEmpty(pk)) { |
There was a problem hiding this comment.
If table options contain primary-key with an empty value (e.g. ''), the new condition !primaryKeys.isEmpty() && !StringUtils.isEmpty(pk) will skip the conflict check and then overwrite the DDL-defined primary key with an empty list (because pk.split(",") yields only empty tokens). That silently drops the primary key definition. It would be safer to treat null/blank pk as “option not set” (ignore it and keep the DDL primaryKeys) instead of normalizing it into an empty PK list.
| if (!primaryKeys.isEmpty() && !StringUtils.isEmpty(pk)) { | |
| // Treat null/blank primary-key option as "not set" and keep DDL-defined primary keys. | |
| if (StringUtils.isEmpty(pk)) { | |
| return primaryKeys; | |
| } | |
| if (!primaryKeys.isEmpty()) { |
| String primaryBranch = coreOptions().toConfiguration().get(CoreOptions.SCAN_PRIMARY_BRANCH); | ||
| if (!StringUtils.isNullOrWhitespaceOnly(primaryBranch) | ||
| && branchName.equals(primaryBranch)) { | ||
| throw new IllegalArgumentException( | ||
| String.format( |
There was a problem hiding this comment.
The new guard preventing deletion of the scan.primary-branch is not covered by tests (there are existing tests around branch deletion / fallback-branch). Please add a test asserting deleteBranch(primaryBranch) fails with the expected message when scan.primary-branch is set, similar to the existing fallback-branch coverage.
Purpose
Some users may have accumulated large amounts of data in their main primary-key branch, and they want to organize the data into another non-primary-key branch to speed up querying. This PR introduces
scan.primary-branchwhich is symmetrical toscan.fallback-branch.Tests
FallbackReadFileStoreTableTest