
MDEV-36025: backup taken from a replica with optimistic parallel replication fails to restore most of the time #4888

Open
hemantdangi-gc wants to merge 1 commit into 10.11 from 10.11_MDEV-36025

@hemantdangi-gc
Contributor

Issue:
The commit 5836191 (MDEV-21168) was deliberately NOT ported to 10.5+. It added an optional --rollback-xa flag to mariabackup in 10.4 only, with this note in the commit message:
"The fix MUST NOT be ported on 10.5+, as MDEV-742 fix solves the issue for slaves."
However, MDEV-742 does not solve the problem for internal XA transactions, as MDEV-36025 demonstrates. The --rollback-xa option, SRV_OPERATION_RESTORE_ROLLBACK_XA, and related code are completely absent from the 10.6 codebase.

Solution:
Port the MDEV-21168 fix to MariaDB 10.6.

Add SRV_OPERATION_RESTORE_ROLLBACK_XA server operation mode and --rollback-xa option (enabled by default) to mariabackup --prepare. This automatically rolls back prepared XA transactions during prepare, since the backup does not contain the binary log needed to resolve them.

Prevent the incompatible combination of the --rollback-xa and --export options. The combination leaves InnoDB's MTR system in an inconsistent mmap state, leading to a crash.
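The option handling described above can be illustrated with a small toy model. This is a hedged Python sketch under stated assumptions, not the mariabackup code: apart from SRV_OPERATION_RESTORE_ROLLBACK_XA, --rollback-xa and --export, which come from this PR, every name here (PrepareMode, choose_mode, finish_prepare, and the RESTORE and RESTORE_EXPORT modes) is hypothetical. The PR also does not say how the forbidden option pair is reported; the sketch simply rejects it with an error.

```python
from enum import Enum, auto


class PrepareMode(Enum):
    """Hypothetical stand-ins for server operation modes used by --prepare.

    Only RESTORE_ROLLBACK_XA mirrors a name from the PR
    (SRV_OPERATION_RESTORE_ROLLBACK_XA); the other members are assumptions.
    """

    RESTORE = auto()
    RESTORE_EXPORT = auto()
    RESTORE_ROLLBACK_XA = auto()


def choose_mode(rollback_xa: bool = True, export: bool = False) -> PrepareMode:
    """Map the two options to a mode; rollback_xa defaults to on, as in the PR."""
    if rollback_xa and export:
        # The PR forbids this pair because it crashes InnoDB. Raising an
        # error here is an assumption about how the conflict is reported.
        raise ValueError("--rollback-xa cannot be combined with --export")
    if export:
        return PrepareMode.RESTORE_EXPORT
    if rollback_xa:
        return PrepareMode.RESTORE_ROLLBACK_XA
    return PrepareMode.RESTORE


def finish_prepare(mode: PrepareMode, prepared_xids: list[str]) -> list[str]:
    """Return the XA transactions still in the PREPARED state after --prepare.

    The backup carries no binary log to decide commit vs. rollback, so in
    RESTORE_ROLLBACK_XA mode every prepared transaction is rolled back.
    """
    if mode is PrepareMode.RESTORE_ROLLBACK_XA:
        return []
    # Other modes leave them prepared for the restored server to resolve.
    return list(prepared_xids)


if __name__ == "__main__":
    leftover = finish_prepare(choose_mode(), ["xid-1", "xid-2"])
    print(len(leftover))  # prints 0
    leftover = finish_prepare(choose_mode(rollback_xa=False), ["xid-1", "xid-2"])
    print(len(leftover))  # prints 2
```

In this model, the default settings select the rollback mode, so no prepared transactions survive into the restored server. With rollback_xa=False they survive, which corresponds to the "Found n prepared transactions" situation the MDEV-36025 report describes.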

@andrelkin
Contributor

However, MDEV-742 does not solve the problem for internal XA transactions, as MDEV-36025 demonstrates

How does it demonstrate that?
I believe it is then covered by an mtr test. Could you please point to that file and its block?

At any rate, the commit message should be more verbose in this part. Please describe that scenario.

@hemantdangi-gc
Contributor Author

hemantdangi-gc commented Apr 3, 2026

However, MDEV-742 does not solve the problem for internal XA transactions, as MDEV-36025 demonstrates

How does it demonstrate that? I believe it is then covered by an mtr test. Could you please point to that file and its block?

At any rate, the commit message should be more verbose in this part. Please describe that scenario.

The commit 5836191 (MDEV-21168) was deliberately NOT ported to 10.5+. It added an optional --rollback-xa flag to mariabackup in 10.4 only, with this note in the commit message: "The fix MUST NOT be ported on 10.5+, as MDEV-742 fix solves the issue for slaves."

The mariabackup.xa_prepared_on_restore test, which fails with the MDEV-36025 'Found n prepared transactions' error, passes after porting MDEV-21168.

My point is that MDEV-742 did not fix the needed issue, so we do have to port MDEV-21168 to handle the MDEV-36025 error. I added this line because I wanted the commit message to state why MDEV-21168 is needed.

hemantdangi-gc requested a review from dr-m on April 3, 2026 10:09
@andrelkin
Contributor

andrelkin commented Apr 3, 2026

@hemantdangi-gc, whatever MDEV-742 failed to fix has to be described in full detail in this PR.
So could you please take care of my concern about

MDEV-742 does not solve the problem for internal XA transactions, as MDEV-36025 demonstrates
that is, how specifically were the internal XA (aka normal) transactions not covered by MDEV-21168? Please describe this at the necessary length in the PR description, and a shorter version (if possible) in the commit message.

I thought I would see that failure scenario in some test, and that's exactly what a good commit message must point to.

The Solution section needs better structure too.
It can certainly start with a reference to the existing work

Port the MDEV-21168 fix
but it should then expand on how and why that work fixes the MDEV-36025 report.

As MDEV-36025 is reported for a slave, the refined issue description must either confirm that this is indeed the slave side, or exonerate 😄 the good old slave (in which case the blame is on the general server).

PS: If you need to discuss the technical side of the issue, I'll be available from next Tuesday.
