MDEV-36025: backup taken from a replica with optimistic parallel replication fails to restore most of the time #4888
hemantdangi-gc wants to merge 1 commit into 10.11
41e76d3 to e61f267
How does it demonstrate that? At any rate, the commit message should be more verbose in this part. Please describe that scenario.
What I am saying here is that MDEV-742 did not fix the issue at hand, so we have to port MDEV-21168 to handle the MDEV-36025 error. I wanted the commit message to state why MDEV-21168 is needed, so I added this line.
@hemantdangi-gc, whatever MDEV-742 failed to fix, that issue has to be described in full detail in this PR.
I expected to see the failure scenario in some test, and that is exactly what a good commit message must point to. The solution section needs to be structured better too.
As MDEV-36025 is reported for the slave, the refined issue description must either confirm that this is indeed a slave-side problem or exonerate 😄 the good old slave (in which case the blame is on the general server). PS. If you need to discuss the technical side of the issue, I'll be available from next Tue.
e61f267 to 14ef88c
…ication fails
to restore most of the time
Issue:
Backups taken from a replica running optimistic parallel replication can
restore into a server that aborts on startup with:
Found N prepared transactions! ...
In the MDEV-36025 reproducer the application never issues XA SQL, but on
startup InnoDB reports several transactions “in the XA prepared state” and the
server aborts. These internal XA transactions created by optimistic parallel
replication on the replica are not covered by MDEV-742 and end up prepared after
restore, causing the “Found N prepared transactions” startup failure. This is
reproducible with the mariabackup.xa_prepared_on_restore test case, which
fails with 'Found N prepared transactions'.
Solution:
Port the MDEV-21168 fix to MariaDB 10.6.
Add SRV_OPERATION_RESTORE_ROLLBACK_XA server operation mode and
--rollback-xa option (enabled by default) to mariabackup --prepare.
This automatically rolls back prepared XA transactions during prepare,
since the backup does not contain the binary log needed to resolve them.
Prevent the incompatible combination of the --rollback-xa and --export
options. The combination creates mmap state inconsistency in InnoDB's MTR
system, leading to a crash.
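
The option-compatibility rule in the last paragraph can be sketched as follows. This is only an illustration under stated assumptions: `PrepareOptions` and `check_prepare_options` are hypothetical names, not mariabackup's real structures, and rejecting the combination with an error is one possible reading of "prevent"; the commit message does not say whether the patch errors out or silently disables one of the options.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the parsed --prepare options.
struct PrepareOptions {
  bool rollback_xa = true;     // described in the commit message as enabled by default
  bool export_tables = false;  // corresponds to --export
};

// Returns an empty string when the options are compatible, otherwise an
// error message. Assumption: the combination is rejected outright.
std::string check_prepare_options(const PrepareOptions &opt) {
  if (opt.rollback_xa && opt.export_tables)
    return "--rollback-xa and --export cannot be combined";
  return "";
}
```

Under this sketch, a user who wants --export would have to pass --skip-rollback-xa explicitly.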
14ef88c to 984d632
Added testcase detail in commit message now:
I have added replica usage in commit message now:
I have revised the commit message based on your suggestions and removed the wrong expectation that MDEV-742 would resolve the internal XA transaction issue. Please review and let me know if you have any further suggestions.
@hemantdangi-gc, thanks for the mariabackup.xa_prepared_on_restore references! From it I can see that the issue is present in the general server.
Note that MDEV-21168 removed, or at least was supposed to remove, the option related to user XA. Normal transactions were not targeted. Once again, the point of this bug is that a normal BEGIN ... COMMIT transaction left in the prepared state must be rolled back automatically. A normal transaction can be identified by its XID, which has a "mysql" string prefix as part of its identifier.
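
The identification rule mentioned above could look roughly like this. It is a minimal sketch, not code from the patch: `is_internal_xid` is a hypothetical helper, and the exact prefix literal "MySQLXid" is an assumption (the comment above only says the identifier carries a "mysql" string prefix).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Assumption: server-generated XIDs start with this literal.
static const char INTERNAL_XID_PREFIX[] = "MySQLXid";

// Returns true if the XID data looks server-generated, i.e. it belongs to an
// ordinary BEGIN ... COMMIT transaction rather than a user-issued XA START.
bool is_internal_xid(const std::string &xid_data) {
  const std::size_t n = sizeof(INTERNAL_XID_PREFIX) - 1;  // length without '\0'
  return xid_data.size() >= n &&
         std::memcmp(xid_data.data(), INTERNAL_XID_PREFIX, n) == 0;
}
```

A prefix test alone cannot rule out a user who deliberately names an XA transaction with the same prefix, so a real implementation may need a stricter check.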
innodb_shutdown();
/* Without buf_flush_sync(), the rolled-back changes would exist only
in the buffer pool and be lost on shutdown, leaving the data files in
an inconsistent state.
In the innodb_preshutdown(), the condition was updated to include
SRV_OPERATION_RESTORE_ROLLBACK_XA so it waits for transactions when
srv_fast_shutdown == 0. The innodb_preshutdown() is called by
innodb_shutdown(), which will wait for any active transactions to
finish and shut down purge and undo background sources for
SRV_OPERATION_RESTORE_ROLLBACK_XA. */
if (xtrabackup_rollback_xa)
  buf_flush_sync_batch(0, false);

innodb_shutdown();

innodb_free_param();
innodb_free_param();
The comment is rather confusing. Why can’t we invoke the higher-level function log_make_checkpoint() here? Can we issue some messages indicating that the backup has now diverged from the server it was copied from, and mention the new checkpoint LSN?
{"rollback-xa", OPT_XTRA_ROLLBACK_XA,
 "Rollback prepared XA transactions on --prepare. Enabled by default; "
 "use --skip-rollback-xa to disable. "
 "After preparing target directory with this option "
 "it can no longer be a base for incremental backup.",
 (G_PTR *) &xtrabackup_rollback_xa, (G_PTR *) &xtrabackup_rollback_xa, 0,
 GET_BOOL, NO_ARG, 1, 0, 0, 0, 0, 0},
I don’t think it is acceptable to change the default behaviour in a stable release series. The need to change a large number of existing tests in a stable release series should be a warning sign to any reviewer.
Furthermore, I don’t think it is acceptable to break incremental backup by default, in any release.
--exec $MYSQLD_LAST_CMD

--let SEARCH_PATTERN= Found .* prepared transactions!
--source include/search_pattern_in_file.inc
Note this is the general server, not slave.
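
For readers unfamiliar with the test idiom above, the check amounts to a regular-expression search over the server error log. The following is a standalone sketch of that idea only; `reports_prepared_transactions` is a hypothetical helper and it does not reproduce how search_pattern_in_file.inc is actually implemented.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if a log line matches the SEARCH_PATTERN used in the test.
bool reports_prepared_transactions(const std::string &log_line) {
  static const std::regex pattern("Found .* prepared transactions!");
  return std::regex_search(log_line, pattern);
}
```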