[FLINK-39824][mysql] Cache table filter results to reduce regex matching #4422

Open
taoran92 wants to merge 1 commit into apache:master from taoran92:cache-mysql-table-filter

taoran92 commented Jun 2, 2026

What is the purpose of the change

This PR reduces high CPU usage in MySQL CDC source when synchronizing a large number of tables.

In large-table scenarios, MySQL binlog event processing may repeatedly check whether the same TableId should be included by the configured table filters. The hot path goes through Debezium's table filter predicates, which rely on regex matching:

java.util.regex.Matcher.match
java.util.regex.Matcher.matches
io.debezium.relational.RelationalTableFilters
io.debezium.connector.mysql.MySqlStreamingChangeEventSource.informAboutUnknownTableIfRequired

When the table list is large or the regex patterns are complex, repeatedly evaluating the same table filter result can consume significant CPU and cause TaskManager CPU usage to stay close to 100%.

This PR caches the table filter result by TableId after constructing the Debezium table filter. The cached filter preserves the existing semantics of the Debezium include filter and Flink CDC excludeTableList, while avoiding repeated regex evaluation for the same table.
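The caching idea can be illustrated with a minimal, self-contained sketch. This is not the code in this PR: the class name `CachedTableFilterSketch`, the helper `cached`, and the use of plain strings in place of Debezium's TableId are all assumptions made for illustration. It wraps a deterministic regex-backed predicate in a `ConcurrentHashMap` so that each distinct key is matched against the regex only once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical illustration only; names do not come from Flink CDC or Debezium.
final class CachedTableFilterSketch {

    /**
     * Wraps a deterministic predicate so that the delegate is evaluated
     * at most once per distinct key; later calls read the stored result.
     */
    static <T> Predicate<T> cached(Predicate<T> delegate) {
        Map<T, Boolean> results = new ConcurrentHashMap<>();
        return key -> results.computeIfAbsent(key, delegate::test);
    }

    public static void main(String[] args) {
        // Stand-in for a regex-based include filter (pattern is made up).
        Pattern include = Pattern.compile("shop\\.orders_\\d+");
        AtomicInteger regexEvaluations = new AtomicInteger();

        Predicate<String> regexFilter = id -> {
            regexEvaluations.incrementAndGet(); // count how often the regex runs
            return include.matcher(id).matches();
        };
        Predicate<String> filter = cached(regexFilter);

        // Simulate many binlog events that touch the same two tables.
        for (int i = 0; i < 1000; i++) {
            filter.test("shop.orders_1");
            filter.test("shop.users");
        }

        // Two distinct ids were checked, so the regex ran twice.
        System.out.println("regex evaluations: " + regexEvaluations.get());
    }
}
```

In this sketch the cache grows with the number of distinct keys seen, which is acceptable only under the assumption that the set of tables is finite and the filter result for a given key never changes.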

Brief change log

  • Cache MySQL CDC table filter results by TableId in MySqlSourceConfig
  • Preserve existing include/exclude table filter semantics when using the cached filter
  • Add unit tests to verify repeated checks for the same table reuse the cached result
  • Add unit tests to verify excludeTableList behavior is unchanged

No configuration option is added because this only caches deterministic table filter results and does not change filtering semantics.
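To make the "semantics unchanged" point concrete, the sketch below caches the combined include/exclude decision rather than only the include step, then compares cached and uncached results. It is an assumption-laden illustration, not the PR's implementation: `IncludeExcludeFilterSketch`, `combined`, `cached`, the sample patterns, and the string table ids are hypothetical stand-ins for the Debezium include filter and Flink CDC excludeTableList.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical illustration only; names do not come from Flink CDC or Debezium.
final class IncludeExcludeFilterSketch {

    /** Accepts an id that matches the include pattern and does not match the exclude pattern. */
    static Predicate<String> combined(String includeRegex, String excludeRegex) {
        Pattern include = Pattern.compile(includeRegex);
        Pattern exclude = Pattern.compile(excludeRegex);
        return id -> include.matcher(id).matches() && !exclude.matcher(id).matches();
    }

    /** Memoizes the final decision per key, so both regex checks are skipped on repeat lookups. */
    static <T> Predicate<T> cached(Predicate<T> delegate) {
        Map<T, Boolean> results = new ConcurrentHashMap<>();
        return key -> results.computeIfAbsent(key, delegate::test);
    }

    public static void main(String[] args) {
        Predicate<String> plain = combined("shop\\..*", "shop\\.tmp_.*");
        Predicate<String> cachedFilter = cached(plain);

        // The repeated id exercises the cached path.
        for (String id : List.of("shop.orders", "shop.tmp_import", "other.orders", "shop.orders")) {
            boolean expected = plain.test(id);
            boolean actual = cachedFilter.test(id);
            System.out.println(id + " -> " + actual + (expected == actual ? "" : " (MISMATCH)"));
        }
    }
}
```

Because the wrapped predicate is deterministic, the cached and uncached filters agree for every id in this sketch, which is the property that makes a configuration switch unnecessary.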

Verifying this change

This change is verified by unit tests:

  • MySqlSourceConfigTest#testCachesTableFilterResults
  • MySqlSourceConfigTest#testTableFilterWithExcludeTableList

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e. is any changed class annotated with @Public(@PublicEvolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): yes
  • Anything that affects deployment or recovery: no

Documentation

Does this pull request introduce a new feature? no

If yes, how is the feature documented? not applicable

taoran92 commented Jun 2, 2026

@lvyanquan Could you help review this PR?
