PHOENIX-7751: [SyncTable Tool] Feature to validate table data using PhoenixSyncTable tool b/w source and target cluster #2379

Open
rahulLiving wants to merge 35 commits into apache:master from rahulLiving:PHOENIX-7751

Conversation

@rahulLiving
Contributor

No description provided.

@rahulLiving rahulLiving marked this pull request as ready for review March 12, 2026 12:36

/**
* PhoenixSyncTableTool chunk metadata cell qualifiers. These define the wire protocol between
* hoenixSyncTableRegionScanner (server-side coprocessor) and PhoenixSyncTableMapper (client-side
Contributor

Typo: missing 'P'.


public static Long getPhoenixSyncTableFromTime(Configuration conf) {
Preconditions.checkNotNull(conf);
String value = conf.get(PHOENIX_SYNC_TABLE_FROM_TIME);
Contributor

@tkhurana tkhurana Mar 18, 2026

Why didn't you use conf.getLong()?

conf.setLong(PHOENIX_SYNC_TABLE_TO_TIME, toTime);
}

public static Long getPhoenixSyncTableToTime(Configuration conf) {
Contributor

@tkhurana tkhurana Mar 18, 2026

Here also, why didn't you use conf.getLong()?

qTable = SchemaUtil.getQualifiedTableName(schemaName, tableName);
qSchemaName = SchemaUtil.normalizeIdentifier(schemaName);
PhoenixMapReduceUtil.validateTimeRange(startTime, endTime, qTable);
PhoenixMapReduceUtil.validateMaxLookbackAge(configuration, endTime, qTable);
Contributor

Do we need the end time to be within the max lookback window? How will the sync tool break if the end time is outside of the max lookback window?

Contributor Author

@rahulLiving rahulLiving Mar 24, 2026

Right, this check is not useful.

Contributor

On the other hand, we should not only enforce that it is not outside the window, we should also enforce a "safety buffer" to accommodate data in flight. Even when the endTime is within the window, if it is too close to the current time, it may miss data that is still in flight and cause false positives. In practice this may not matter, since the time it takes to set up and run could be on the order of several minutes and thus enough for the catch-up to complete, but I think it is better to make it explicit by enforcing a safety buffer and make this more deterministic.

If we remove this check and allow the endTime to be in the future, the possibility of false positives due to data in flight becomes a lot more pronounced. By enforcing both startTime and endTime, we can ensure a "consistent window" where data is guaranteed to be fully replicated and 'quiesced' on both sides. WDYT?
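A minimal sketch of the safety-buffer idea described above, in plain Java. The class name, buffer size, and method are illustrative assumptions, not part of the patch; the point is only that endTime must sit at least one buffer interval behind "now" so in-flight replication can settle.

```java
// Hypothetical sketch: reject an endTime that is closer to "now" than the safety buffer.
// SAFETY_BUFFER_MS and the class/method names are illustrative, not from the patch.
public class SyncWindowValidator {
    static final long SAFETY_BUFFER_MS = 60 * 60 * 1000L; // assumed 1-hour buffer

    static long validateEndTime(long requestedEndTime, long nowMs) {
        long latestSafe = nowMs - SAFETY_BUFFER_MS;
        if (requestedEndTime > latestSafe) {
            throw new IllegalArgumentException(
                "endTime is within the safety buffer; in-flight data may cause false positives");
        }
        return requestedEndTime;
    }
}
```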

Contributor

I was thinking more about the "consistent window" or "quiesced window" approach that I suggested above and realized this is actually a race against sliding window during long-running jobs.

If a sync job takes several hours to complete, a startTime that was valid at the beginning of the job might actually 'slide' out of the lookback window by the time the final Mappers execute. Since HBase compactions on the Source and Target clusters aren't synchronized, couldn't this lead to false-positive mismatches if one cluster purges historical data mid-run while the other hasn't yet?

It may not always be possible to make the "safety buffer" on the startTime large enough to account for the job execution time: what if the max lookback window is only a few hours and the job itself takes hours? Does this require utilizing HBase Snapshots to 'freeze' the data state for the duration of the sync? Are there existing patterns that other systems might have employed to handle this issue?

@kadirozde @tkhurana

Contributor Author

We need to think about this from two perspectives: running the sync job regularly as a cron, and using it for migration validation.
For migration validation, the start time would definitely be before maxLookbackAge. It is up to the owner whether they want to validate all versions and delete markers or just the latest version.
For a regular cron job used in PhoenixHA, we can configure the start/end time to be within maxLookbackAge.
Tanuj's suggestion of giving the user the flexibility to choose the rawScan & allVersion options would be helpful. And since we plan to fix the mismatched rows as well, we can consider the source as SOT and fix accordingly.
Though there can be instances where it can't be fixed, e.g. the source has removed a delete marker via compaction but the target still has it. Such rows can be flagged as not fixable as per design.

Btw, the default endTime is (currentTime - 1 hour), to ensure the target has the desired data.

PhoenixConfigurationUtil.setPhoenixSyncTableChunkSizeBytes(configuration, chunkSizeBytes);
}
if (tenantId != null) {
PhoenixConfigurationUtil.setTenantId(configuration, tenantId);
Contributor

Can you verify if the tenantid is being correctly set as a key prefix on the scan ?

Contributor

If you have a table region with multiple tenants and we pass a tenant id then our scan range should start with the tenantid prefix.
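To illustrate the reviewer's point, here is a self-contained sketch of deriving tenant-bounded scan boundaries. The class and method names are hypothetical; the stop-key logic is a simplification of HBase's "next row after prefix" computation (it ignores 0xFF carry), not the actual Phoenix implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: when a tenantId is supplied, the scan range should be bounded by
// the tenant-id key prefix, since rowkeys in multi-tenant Phoenix tables lead with it.
public class TenantScanBounds {
    // Inclusive start of the tenant's key space: the tenant id bytes themselves.
    static byte[] tenantStartKey(String tenantId) {
        return tenantId.getBytes(StandardCharsets.UTF_8);
    }

    // Exclusive stop: tenant id bytes with the last byte incremented.
    // Simplified "next prefix" logic; a real implementation must handle 0xFF carry.
    static byte[] tenantStopKey(String tenantId) {
        byte[] stop = Arrays.copyOf(
            tenantId.getBytes(StandardCharsets.UTF_8), tenantId.length());
        stop[stop.length - 1]++;
        return stop;
    }
}
```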

Contributor Author

@rahulLiving rahulLiving Mar 24, 2026

Yes, it only creates input ranges and scans for tenant-specific rows. We have an IT for the same.

* Configures a Configuration object with ZooKeeper settings from a ZK quorum string.
* @param baseConf Base configuration to create from (typically job configuration)
* @param zkQuorum ZooKeeper quorum string in format: "zk_quorum:port:znode" Example:
* "zk1,zk2,zk3:2181:/hbase"
Contributor

This is actually not the only format for a zk quorum. There are other valid formats where the port number is specified separately for each server. There is a very useful API in HBase called HBaseConfiguration.createClusterConf(job.getConfiguration(), targetZkQuorum). We should use that, as it also works for the zk registry.
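A plain-Java illustration (no HBase dependency) of the two quorum shapes the comment refers to. HBaseConfiguration.createClusterConf handles both; this sketch only shows why a parser that assumes the single "host,host:port:znode" shape is insufficient. The class and method are illustrative.

```java
// Illustrative sketch of the two valid zk quorum shapes:
//   shared-port form:  "zk1,zk2,zk3:2181:/hbase"  -> hosts share one trailing port/znode
//   per-server form:   "zk1:2181,zk2:2182,zk3:2183" -> each host carries its own port
public class QuorumFormats {
    static boolean hasPerServerPorts(String quorum) {
        // In the per-server form, every comma-separated entry (including the first)
        // contains its own colon-delimited port.
        String firstEntry = quorum.split(",")[0];
        return firstEntry.contains(":");
    }
}
```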


String query = "SELECT START_ROW_KEY, END_ROW_KEY FROM " + SYNC_TABLE_CHECKPOINT_TABLE_NAME
+ " WHERE TABLE_NAME = ? AND TARGET_CLUSTER = ?"
+ " AND TYPE = ? AND FROM_TIME = ? AND TO_TIME = ? AND STATUS IN ( ?, ?)";
Contributor

I am not 100% positive that you can assume that the output of this query is always sorted by row key. You might have to add an ORDER BY clause here. If you are adding an ORDER BY clause it will be better to add all the PK columns to make the sorting more efficient.
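A sketch of the reviewer's suggestion applied to the quoted checkpoint query. The column and table names come from the snippet above; the full PK column list used in the ORDER BY is an assumption, not taken from the actual checkpoint table DDL.

```java
// Hypothetical sketch: append an ORDER BY over the (assumed) full primary key so the
// row-key ordering of the checkpoint query results is an explicit guarantee rather
// than an implementation detail.
public class CheckpointQuery {
    static String withOrderBy(String baseQuery) {
        // Ordering by the full PK should let Phoenix satisfy the sort from the
        // natural row order instead of buffering and re-sorting results.
        return baseQuery
            + " ORDER BY TABLE_NAME, TARGET_CLUSTER, TYPE, FROM_TIME, TO_TIME, START_ROW_KEY";
    }
}
```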

int completedIdx = 0;

// Two pointer comparison across splitRange and completedRange
while (splitIdx < allSplits.size() && completedIdx < completedRegions.size()) {
Contributor

I think you are assuming here that completedRegions is already sorted. Please see my comment on the getProcessedMapperRegions function.

Contributor

Won't the results be sorted in the PK order already? I see that the new commit adds ORDER BY, but not sure why that is required.

PhoenixInputSplit split = (PhoenixInputSplit) allSplits.get(splitIdx);
KeyRange splitRange = split.getKeyRange();
KeyRange completedRange = completedRegions.get(completedIdx);
byte[] splitStart = splitRange.getLowerRange();
Contributor

Will the end key of the split range always be exclusive? If yes, can you please add a comment.

Contributor Author

Yes, for both splitRange and completedRange, the start key is always inclusive and the end key always exclusive. Will add a comment.

* @return List of (startKey, endKey) pairs representing unprocessed ranges
*/
@VisibleForTesting
public List<Pair<byte[], byte[]>> calculateUnprocessedRanges(byte[] mapperRegionStart,
Contributor

@tkhurana tkhurana Mar 24, 2026

Maybe we could return a List<KeyRange>
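A simplified, self-contained sketch of the two-pointer walk discussed in this thread, using int ranges in place of row-key byte[]s. Both split and completed ranges are treated as [start, end) (start inclusive, end exclusive, matching the semantics confirmed above) and are assumed sorted and non-overlapping; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: subtract sorted, completed [start, end) ranges from sorted split ranges,
// returning the unprocessed gaps. int[]{start, end} stands in for row-key pairs.
public class UnprocessedRanges {
    static List<int[]> subtract(List<int[]> splits, List<int[]> completed) {
        List<int[]> out = new ArrayList<>();
        int c = 0; // pointer into completed ranges
        for (int[] s : splits) {
            int cursor = s[0];
            // Skip completed ranges that end before this split begins.
            while (c < completed.size() && completed.get(c)[1] <= cursor) c++;
            int ci = c;
            // Walk completed ranges overlapping this split, emitting the gaps.
            while (ci < completed.size() && completed.get(ci)[0] < s[1]) {
                if (completed.get(ci)[0] > cursor) {
                    out.add(new int[] { cursor, completed.get(ci)[0] });
                }
                cursor = Math.max(cursor, completed.get(ci)[1]);
                ci++;
            }
            if (cursor < s[1]) {
                out.add(new int[] { cursor, s[1] });
            }
        }
        return out;
    }
}
```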

if (hasStartBoundary) {
queryBuilder.append(" AND END_ROW_KEY >= ?");
}
queryBuilder.append(" AND STATUS IN (?, ?)");
Contributor

Same as above we don't need to pass status

scan.setCacheBlocks(false);
scan.setTimeRange(fromTime, toTime);
if (isTargetScan) {
scan.setLimit(1);
Contributor

Can you add a comment here why we are setting limit to 1 and caching to 1

Scan scan = new Scan();
scan.withStartRow(startKey, isStartKeyInclusive);
scan.withStopRow(endKey, isEndKeyInclusive);
scan.setRaw(true);
Contributor

Are we sure we have to do a raw scan?

Contributor

Also, can we make this configurable via the SyncTool command line?

scan.withStartRow(startKey, isStartKeyInclusive);
scan.withStopRow(endKey, isEndKeyInclusive);
scan.setRaw(true);
scan.readAllVersions();
Contributor

Same can we make the behavior of reading all versions configurable.

@@ -0,0 +1,2267 @@
/*
Contributor

Can you add a test where rows are deleted on both the source and target tables but you have run compaction on only one. We can have actually 2 cases where compaction is run on the source but not on target and vice versa. I saw that you are doing raw scan. Maxlookback settings will also impact this.

Contributor

@haridsv haridsv left a comment

I just skimmed through and left some comments at the surface level.


try (PreparedStatement ps = connection.prepareStatement(UPSERT_CHECKPOINT_SQL)) {
ps.setString(1, row.getTableName());
ps.setString(2, row.getTargetCluster());
ps.setString(3, row.getType().name());
Contributor

I would recommend storing a byte code rather a long string to reduce the size of the row key.

Contributor

Have you thought about this?

Contributor Author

The 35 MB calculation was for the primary key. 100K regions, each region with 10 chunks, gives a total of 1.1M rows.
Each row with CHUNK/REGION will be 35 bytes (this is for all columns/cells); each row with C/R will be 7 bytes.
1.1M × 35 − 1.1M × 7 bytes, which roughly equals 35 MB for a table with 100K regions.

Contributor

@haridsv haridsv Apr 9, 2026

OK, I guess the difference will reduce further after compression; we should enable it along with column encoding as well.

Contributor Author

By default, column encoding is set to 2 for user tables.
I just checked: none of the Phoenix system tables specify COMPRESSION in their DDL; maybe it depends on what compression they want to use.
So COMPRESSION can be set explicitly as an HBase column family attribute.


@haridsv
Contributor

haridsv commented Mar 31, 2026

I see a generics compiler warning that can be fixed with the following change:

-public class PhoenixSyncTableInputFormat extends PhoenixInputFormat {
+public class PhoenixSyncTableInputFormat extends PhoenixInputFormat<DBWritable> {

The existing code also has a couple of warnings that can be fixed:

-public class PhoenixServerBuildIndexInputFormat<T extends DBWritable> extends PhoenixInputFormat {
+public class PhoenixServerBuildIndexInputFormat<T extends DBWritable> extends PhoenixInputFormat<T> {
-  extends PhoenixServerBuildIndexInputFormat {
+  extends PhoenixServerBuildIndexInputFormat<T> {

@rahulLiving
Contributor Author

I see a generics compiler warning that can be fixed with the following change:

Fixed it for my changes.

// Not using try-with-resources since ChunkScannerContext owns the table lifecycle
Table hTable =
conn.unwrap(PhoenixConnection.class).getQueryServices().getTable(physicalTableName);
Scan scan =
Contributor

@rahulLiving One thing I realized: we are not setting the ttl attribute on the scan. We should, so that we can mask expired rows. Also, add a test case for the same.
