Skip to content

[hotfix] Use explicit UTF-8 charset in coerceToBytes and guard Optional.get() in Kafka serializers#4427

Open
dangzitou wants to merge 1 commit into
apache:masterfrom
dangzitou:fix/charset-coerceToBytes
Open

[hotfix] Use explicit UTF-8 charset in coerceToBytes and guard Optional.get() in Kafka serializers#4427
dangzitou wants to merge 1 commit into
apache:masterfrom
dangzitou:fix/charset-coerceToBytes

Conversation

@dangzitou
Copy link
Copy Markdown

What this PR does

This PR fixes two minor but important issues:

1. Platform-default charset in SchemaMergingUtils.coerceToBytes()

File: flink-cdc-common/.../SchemaMergingUtils.java:617

originalField.toString().getBytes() uses the platform-default charset, which can vary across JVM configurations and operating systems. This causes inconsistent behavior when converting fields to bytes during schema evolution.

Fix: Use StandardCharsets.UTF_8 explicitly.

2. Unchecked Optional.get() in Kafka serialization schemas

Files:

  • CsvSerializationSchema.java:109
  • JsonSerializationSchema.java:132

Both files call schema.getColumn(...).get() without checking if the Optional is present. If a primary key column is missing from the schema (e.g., after schema evolution), this throws a bare NoSuchElementException with no context.

Fix: Replace with orElseThrow() that includes the missing column name in the error message.

Testing

These are straightforward defensive improvements:

  • The charset fix ensures consistent UTF-8 encoding regardless of JVM locale
  • The Optional guard improves error diagnostics without changing happy-path behavior

@dangzitou dangzitou force-pushed the fix/charset-coerceToBytes branch from 98af393 to 64bdcdb Compare June 3, 2026 11:57
…onnectors

Fix platform-default charset usage in String.getBytes() calls:
- SchemaMergingUtils.coerceToBytes()
- DebeziumJsonSerializationSchema.convertDefaultValue()
- RowDataTiKVEventDeserializationSchemaBase (TiDB connector)
- SchemaMergingUtilsTest, BinaryTypeReturningClass, VarBinaryTypeReturningClass

Fix unchecked Optional.get() on schema column lookups:
- CsvSerializationSchema / JsonSerializationSchema (Kafka)
- DorisMetadataApplier / DorisSchemaUtils (Doris)
- StarRocksUtils (StarRocks)

All .getBytes() now use StandardCharsets.UTF_8 explicitly.
All .get() replaced with .orElseThrow() with descriptive error messages.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant