Skip to content

Introduce scan.primary-branch which is symmetrical to scan.fallback-branch#7553

Merged
tsreaper merged 4 commits intoapache:masterfrom
tsreaper:primary-branch
Mar 31, 2026
Merged

Introduce scan.primary-branch which is symmetrical to scan.fallback-branch#7553
tsreaper merged 4 commits intoapache:masterfrom
tsreaper:primary-branch

Conversation

@tsreaper
Copy link
Copy Markdown
Contributor

Purpose

Some users may have accumulated large amounts of data in their main primary-key branch, and they want to organize the data into another non-primary-key branch to speed up querying. This PR introduces scan.primary-branch which is symmetrical to scan.fallback-branch.

Tests

  • FallbackReadFileStoreTableTest

@tsreaper tsreaper marked this pull request as draft March 30, 2026 11:02
@tsreaper tsreaper marked this pull request as ready for review March 30, 2026 11:49
@tsreaper tsreaper requested a review from Copilot March 30, 2026 11:51
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces a new table option scan.primary-branch to allow partition-level “priority branch” reads (symmetric to the existing scan.fallback-branch) so users can move hot/large partitions to a non-PK branch to improve query performance.

Changes:

  • Add scan.primary-branch option and wire it into table creation/branch resolution so reads prefer the configured primary branch when a partition exists there.
  • Generalize FallbackReadFileStoreTable to support both “fallback mode” (wrapped-first) and “primary mode” (other-first), and adjust related chain-table plumbing.
  • Update tests and generated configuration docs to cover/describe the new option and merged-branch behavior.

Reviewed changes

Copilot reviewed 10 out of 12 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
paimon-core/src/test/java/org/apache/paimon/table/FallbackReadFileStoreTableTest.java Parameterizes tests to validate both priority orders and adds a write-delegation test.
paimon-core/src/main/java/org/apache/paimon/utils/ChainTableUtils.java Updates chain primary-table resolution to the renamed accessor (other()).
paimon-core/src/main/java/org/apache/paimon/table/system/ReadOptimizedTable.java Delegates scan construction to FallbackReadFileStoreTable to support both modes.
paimon-core/src/main/java/org/apache/paimon/table/FileStoreTableFactory.java Adds scan.primary-branch handling and factors branch-table creation into a helper.
paimon-core/src/main/java/org/apache/paimon/table/FallbackReadFileStoreTable.java Generalizes merged scanning/reading to support either wrapped-first or other-first priority.
paimon-core/src/main/java/org/apache/paimon/table/ChainGroupReadTable.java Adapts to the new FallbackReadFileStoreTable constructor and renamed accessors.
paimon-core/src/main/java/org/apache/paimon/table/AbstractFileStoreTable.java Prevents deletion of a branch configured as scan.primary-branch.
paimon-core/src/main/java/org/apache/paimon/schema/SchemaValidation.java Validates mutual exclusivity of scan.fallback-branch and scan.primary-branch, and validates existence.
paimon-api/src/main/java/org/apache/paimon/schema/Schema.java Tweaks primary-key normalization logic when primary-key is present in options.
paimon-api/src/main/java/org/apache/paimon/CoreOptions.java Adds the scan.primary-branch config option and description.
docs/layouts/shortcodes/generated/*.html Updates generated configuration docs to include the new option and ordering changes.
Comments suppressed due to low confidence (1)

paimon-api/src/main/java/org/apache/paimon/schema/Schema.java:199

  • The new conflict check allows options to contain an empty primary-key value without throwing, but the code still parses it and overwrites any DDL primary keys with an empty list. This can silently drop primary keys when users pass primary-key=. Consider treating empty/blank option values as "not set" (skip parsing and keep the existing primaryKeys), or keep the previous behavior of rejecting any simultaneous DDL+option primary key definitions regardless of emptiness.
        if (options.containsKey(CoreOptions.PRIMARY_KEY.key())) {
            String pk = options.get(CoreOptions.PRIMARY_KEY.key());
            if (!primaryKeys.isEmpty() && !StringUtils.isEmpty(pk)) {
                throw new RuntimeException(
                        "Cannot define primary key on DDL and table options at the same time.");
            }
            primaryKeys =
                    Arrays.stream(pk.split(","))
                            .map(String::trim)
                            .filter(s -> !s.isEmpty())
                            .collect(Collectors.toList());

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +173 to +176
// Case 1: WHERE pt = 1
// Partition 1 exists in priority table. Should never be read from supplemental.
DataTableScan scan1 = table.newScan();
scan1.withFilter(builder.equal(0, 1));
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testPlanWithDataFilter no longer applies any data predicate (it only filters on partition pt). This means the test no longer covers the scenario described in the Javadoc (where data filters can eliminate all files in a partition and must not affect partition ownership). Consider reintroducing a data predicate (e.g., pt = 1 AND a = 100) to ensure the plan logic is validated against data-filtered scans.

Suggested change
// Case 1: WHERE pt = 1
// Partition 1 exists in priority table. Should never be read from supplemental.
DataTableScan scan1 = table.newScan();
scan1.withFilter(builder.equal(0, 1));
// Case 1: WHERE pt = 1 AND a = 100
// Partition 1 exists in priority table. Should never be read from supplemental.
DataTableScan scan1 = table.newScan();
scan1.withFilter(
builder.and(builder.equal(0, 1), builder.equal(1, 100)));

Copilot uses AI. Check for mistakes.
Comment on lines 601 to 606
private Read() {
this.mainRead = wrapped.newRead();
this.fallbackRead = fallback.newRead();
FileStoreTable first = wrappedFirst ? wrapped : other;
FileStoreTable second = wrappedFirst ? other : wrapped;
this.mainRead = first.newRead();
this.fallbackRead = second.newRead();
}
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In primary-branch mode (wrappedFirst=false), fallbackRead/isFallback correspond to reads from the current branch (the non-priority table). The later log message still says "Reading from fallback branch...", which becomes misleading. Consider renaming these variables/messages to neutral terms (e.g., priority/supplemental) or including the actual branch/table name in the log.

Copilot uses AI. Check for mistakes.
if (!StringUtils.isNullOrWhitespaceOnly(primaryBranch)) {
checkArgument(
schemaManager.copyWithBranch(primaryBranch).latest().isPresent(),
"Cannot set '%s' = '%s' because the branch '%s' isn't existed.",
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error message grammar is inconsistent with the fallback-branch validation above ("does not exist"). This string uses "isn't existed", which is ungrammatical; align it to "does not exist" for clarity and consistency.

Suggested change
"Cannot set '%s' = '%s' because the branch '%s' isn't existed.",
"Cannot set '%s' = '%s' because the branch '%s' does not exist.",

Copilot uses AI. Check for mistakes.
Comment on lines +722 to +726
String.format(
"can not delete the primary branch. "
+ "branchName to be deleted is %s. you have set 'scan.primary-branch' = '%s'. "
+ "you should reset 'scan.primary-branch' before deleting this branch.",
branchName, primaryBranch));
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The newly added exception message is user-facing and has several grammar/capitalization issues ("can not", sentence casing, repeated "you"). Consider rephrasing to a clearer message (e.g., "Cannot delete branch '%s' because it is configured as scan.primary-branch; unset scan.primary-branch first.") and, ideally, aligning with the fallback-branch message for consistency.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

public FileStoreTable other() {
return other;
}

Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fallback() was renamed/removed in favor of other(). Because FallbackReadFileStoreTable is public, this is a binary/source incompatible change for any external users relying on fallback(). Consider keeping a deprecated fallback() accessor delegating to other() (and/or a clearer accessor like supplemental()), at least for one release cycle.

Suggested change
/**
* @deprecated Use {@link #other()} instead.
*/
@Deprecated
public FileStoreTable fallback() {
return other();
}

Copilot uses AI. Check for mistakes.
Comment on lines 204 to 208
public DataTableScan newScan(Function<FileStoreTable, DataTableScan> scanCreator) {
validateSchema();
FileStoreTable first = wrappedFirst ? wrapped : other;
FileStoreTable second = wrappedFirst ? other : wrapped;
return new FallbackReadScan(
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newScan(Function<FileStoreTable, DataTableScan>) correctly allows callers (e.g. system tables) to customize how scans are created, but FallbackReadScan still creates its partition-listing scans via wrappedTable.newScan() / fallbackTable.newScan() internally. That bypasses the provided scanCreator and can make partition-ownership decisions inconsistent with the actual mainScan/fallbackScan behavior (notably for ReadOptimizedTable). Consider threading scanCreator through to FallbackReadScan and using it for the partition listing scans too.

Copilot uses AI. Check for mistakes.
if (options.containsKey(CoreOptions.PRIMARY_KEY.key())) {
if (!primaryKeys.isEmpty()) {
String pk = options.get(CoreOptions.PRIMARY_KEY.key());
if (!primaryKeys.isEmpty() && !StringUtils.isEmpty(pk)) {
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If table options contain primary-key with an empty value (e.g. ''), the new condition !primaryKeys.isEmpty() && !StringUtils.isEmpty(pk) will skip the conflict check and then overwrite the DDL-defined primary key with an empty list (because pk.split(",") yields only empty tokens). That silently drops the primary key definition. It would be safer to treat null/blank pk as “option not set” (ignore it and keep the DDL primaryKeys) instead of normalizing it into an empty PK list.

Suggested change
if (!primaryKeys.isEmpty() && !StringUtils.isEmpty(pk)) {
// Treat null/blank primary-key option as "not set" and keep DDL-defined primary keys.
if (StringUtils.isEmpty(pk)) {
return primaryKeys;
}
if (!primaryKeys.isEmpty()) {

Copilot uses AI. Check for mistakes.
Comment on lines +717 to +721
String primaryBranch = coreOptions().toConfiguration().get(CoreOptions.SCAN_PRIMARY_BRANCH);
if (!StringUtils.isNullOrWhitespaceOnly(primaryBranch)
&& branchName.equals(primaryBranch)) {
throw new IllegalArgumentException(
String.format(
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new guard preventing deletion of the scan.primary-branch is not covered by tests (there are existing tests around branch deletion / fallback-branch). Please add a test asserting deleteBranch(primaryBranch) fails with the expected message when scan.primary-branch is set, similar to the existing fallback-branch coverage.

Copilot uses AI. Check for mistakes.
@tsreaper tsreaper merged commit 6beab20 into apache:master Mar 31, 2026
13 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants