[LLM EXPERIMENT] mysql: Add regression test for backup/restore DROP TABLE race#35926
Draft
Adds a test that reproduces the race condition in database-issues#7683: during a mysqldump --all-databases restore, the replication worker's verify_schemas() queries the CURRENT state of information_schema to detect table drops. The flood of DDL events from the restore causes the worker to fall behind, and by the time verify_schemas runs for a tracked table's DROP TABLE event, that table has already been recreated — so the drop goes undetected and the source silently continues. This test is expected to fail until the fix (parsing DROP TABLE statements directly instead of querying MySQL state) is implemented. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
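The race described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Ddl` enum and the three functions are hypothetical names invented for illustration, not Materialize's actual types or the real `verify_schemas()` logic. It contrasts a check against the current upstream state with a check against the event itself.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified DDL events as they would appear in the binlog.
#[derive(Debug, Clone, PartialEq)]
enum Ddl {
    DropTable(String),
    CreateTable(String),
}

/// Replays every event into a set that stands in for the upstream
/// `information_schema` as it looks once the restore has finished.
fn upstream_state_after(events: &[Ddl]) -> HashSet<String> {
    let mut tables = HashSet::new();
    for ev in events {
        match ev {
            Ddl::DropTable(t) => {
                tables.remove(t);
            }
            Ddl::CreateTable(t) => {
                tables.insert(t.clone());
            }
        }
    }
    tables
}

/// State-based check (assumed model of a lagging worker): by the time the
/// DROP event is processed, only the *current* upstream state is consulted.
fn drop_detected_by_state(events: &[Ddl], tracked: &str) -> bool {
    let current = upstream_state_after(events);
    let saw_drop = events
        .iter()
        .any(|ev| matches!(ev, Ddl::DropTable(t) if t.as_str() == tracked));
    saw_drop && !current.contains(tracked)
}

/// Event-based check: the DROP event itself is treated as the evidence.
fn drop_detected_by_event(events: &[Ddl], tracked: &str) -> bool {
    events
        .iter()
        .any(|ev| matches!(ev, Ddl::DropTable(t) if t.as_str() == tracked))
}
```

With the event sequence `DROP TABLE t` followed by `CREATE TABLE t`, the state-based check reports no drop because the table exists again, while the event-based check still reports it.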
Change `contains:table was dropped` to `contains:Source error` so our CI output is identical to the nightly failures in builds 15970/15981. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add docstring noting relationship to mysql-cdc-resumption. Also include the partial fix: parse DROP TABLE statements directly instead of calling verify_schemas. This correctly detects the drop (health transitions to Stalled, errored_outputs blocks new rows) but the error emitted via give_fueled does not reach the query result. The error likely gets stuck in the reclock/persist pipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
a3e2af8 to 76b335e
Summary
Regression test and investigation for database-issues#7683: during a `mysqldump --all-databases` restore, the MySQL replication worker can miss a `DROP TABLE` event because `verify_schemas()` queries the current state of `information_schema` rather than parsing the DDL statement. The flood of DDL events from system tables causes the worker to fall behind the binlog stream, and by the time `verify_schemas` runs, the table has already been recreated.

This is the same bug observed in nightly builds 15970 and 15981, where the `backup_restore_mysql` scenario in `mysql-cdc-resumption` failed with `query succeeded, but expected error containing "Source error"`. PR #35910 commented out that assertion to unblock CI.

Commits
1. `mysql: Add regression test for backup/restore DROP TABLE race`: Adds a minimal reproduction (`test/mysql-cdc/drop-recreate/`) that does mysqldump + piped restore against a single tracked table. Fails identically to the nightly.
2. `test: Match assertion to existing nightly failure`: Changes `contains:table was dropped` to `contains:Source error` so CI output matches the nightly failure messages exactly.
3. `PARTIAL fix`: Parses `DROP TABLE` statements directly in `handle_query_event` instead of calling `verify_schemas`. Also adds a docstring noting the relationship to `mysql-cdc-resumption/backup_restore_mysql`.

Investigation findings
The partial fix in commit 3 correctly detects the drop, but the error does not reach queries:

- `` DROP TABLE IF EXISTS `t` /* generated by server */ ``: note the server-appended comment that must be stripped
- `DefiniteError::TableDropped` emitted via `give_fueled` at the correct GTID
- `Stalled` in `mz_source_statuses` with the correct error
- `handle_rows_event` filters errored outputs; no new data from the recreated table reaches the dataflow
- `SELECT * FROM drop_recreate` returns the original snapshot data (1 before), not a `Source error`

The error emitted at the `DROP TABLE` GTID does not make it to persist at a timestamp the query can read. The likely cause is in the reclock operator or persist sink: the error gets stuck at a GTID timestamp that the output's frontier never advances past.

Completing this fix requires understanding the remap/reclock/persist pipeline for per-output errors, which is beyond the scope of this PR.
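The statement-parsing step noted in the findings (stripping the server-appended comment before reading the table name) might look roughly like the sketch below. It is not the code from this PR: `strip_block_comments` and `dropped_tables` are hypothetical helpers, and the parsing is deliberately naive (it assumes no `/*` inside quoted identifiers and ignores trailing `RESTRICT`/`CASCADE`).

```rust
/// Removes `/* ... */` comments from a statement.
/// Simplification: assumes comments never appear inside quoted identifiers.
fn strip_block_comments(sql: &str) -> String {
    let mut out = String::with_capacity(sql.len());
    let mut rest = sql;
    while let Some(start) = rest.find("/*") {
        out.push_str(&rest[..start]);
        match rest[start + 2..].find("*/") {
            // Skip past the closing `*/` and keep scanning.
            Some(end) => rest = &rest[start + 2 + end + 2..],
            // Unterminated comment: discard the remainder.
            None => rest = "",
        }
    }
    out.push_str(rest);
    out
}

/// Returns the table names of a `DROP [TEMPORARY] TABLE [IF EXISTS] ...`
/// statement, or `None` if the statement is something else.
fn dropped_tables(sql: &str) -> Option<Vec<String>> {
    let cleaned = strip_block_comments(sql);
    let tokens: Vec<&str> = cleaned.split_whitespace().collect();

    let mut i = 0;
    if !tokens.get(i)?.eq_ignore_ascii_case("DROP") {
        return None;
    }
    i += 1;
    if tokens.get(i)?.eq_ignore_ascii_case("TEMPORARY") {
        i += 1;
    }
    if !tokens.get(i)?.eq_ignore_ascii_case("TABLE") {
        return None;
    }
    i += 1;
    let has_if = tokens.get(i).map_or(false, |t| t.eq_ignore_ascii_case("IF"));
    let has_exists = tokens
        .get(i + 1)
        .map_or(false, |t| t.eq_ignore_ascii_case("EXISTS"));
    if has_if && has_exists {
        i += 2;
    }

    // The remaining tokens form a comma-separated list of identifiers;
    // backticks are removed, so `db`.`t` becomes db.t.
    let names: Vec<String> = tokens[i..]
        .join(" ")
        .split(',')
        .map(|n| n.trim().trim_end_matches(';').trim().replace('`', ""))
        .filter(|n| !n.is_empty())
        .collect();

    if names.is_empty() {
        None
    } else {
        Some(names)
    }
}
```

A caller would compare the returned names against the set of tracked tables and emit the per-output error only for matches; how that error then travels through reclock and persist is the open question described above and is outside what this sketch covers.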
Test plan
- `bin/mzcompose --find mysql-cdc run drop-recreate` fails with `query succeeded, but expected error containing "Source error"`, which confirms the race condition (CI build 120314)
- `verify-source-failed.td` can be uncommented in `mysql-cdc-resumption`

Generated with Claude Code