From 5e7f3ede0702f132cd61f01130d3fa1d1059a305 Mon Sep 17 00:00:00 2001
From: Kat Batuigas
Date: Wed, 4 Mar 2026 15:32:21 -0800
Subject: [PATCH 1/6] Draft stress testing and tuning doc

---
 .../iceberg/iceberg-performance-tuning.adoc | 207 ++++++++++++++++++
 1 file changed, 207 insertions(+)
 create mode 100644 modules/manage/pages/iceberg/iceberg-performance-tuning.adoc

diff --git a/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc b/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc
new file mode 100644
index 0000000000..60b985d362
--- /dev/null
+++ b/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc
@@ -0,0 +1,207 @@
+= Tune Performance for Iceberg Topics
+:description: Optimize Redpanda Iceberg translation throughput and Parquet file sizes by tuning message size limits, lag configuration, and flush thresholds.
+:page-categories: Iceberg, Management
+:page-topic-type: best-practices
+:personas: ops_admin, streaming_developer
+:learning-objective-1: Evaluate the impact of message size on Iceberg translation throughput and Parquet file sizes
+:learning-objective-2: Choose appropriate flush threshold and lag target values for large-message workloads
+:learning-objective-3: Identify translation performance signals using Iceberg metrics
+
+// tag::single-source[]
+
+ifndef::env-cloud[]
+[NOTE]
+====
+include::shared:partial$enterprise-license.adoc[]
+====
+endif::[]
+
+Use this guide to optimize Redpanda Iceberg translation performance. It explains how the translation pipeline works, describes message size limits, and provides recommendations for tuning throughput and Parquet file sizes.
+
+After reading this page, you will be able to:
+
+* [ ] {learning-objective-1}
+* [ ] {learning-objective-2}
+* [ ] {learning-objective-3}
+
+== Prerequisites
+
+Before tuning Iceberg performance, you need to be familiar with how Iceberg topics work in Redpanda. See xref:manage:iceberg/about-iceberg-topics.adoc[About Iceberg Topics].
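+
+As a quick sanity check before you start tuning, confirm that Iceberg translation is enabled on the topic you plan to tune. This is a sketch that assumes a topic named `demo-topic`; any topic whose `redpanda.iceberg.mode` is set to a value other than `disabled` has translation enabled:
+
+[,bash]
+----
+# Print the topic's configuration and filter for the Iceberg mode
+rpk topic describe demo-topic -c | grep redpanda.iceberg.mode
+----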
+
+== Translation pipeline overview
+
+Redpanda translates Kafka topic data to Iceberg format using a set of background _translators_. Each CPU shard runs one translator, which reads data from its assigned topic partitions, writes it to local scratch space as Parquet files, then uploads those files to object storage and commits the changes to the Iceberg catalog.
+
+Key pipeline characteristics:
+
+ifndef::env-cloud[]
+* *Throughput*: Approximately 5 MiB/s per shard under typical conditions.
+* *Lag target*: Controlled by xref:reference:properties/cluster-properties.adoc#iceberg_target_lag_ms[`iceberg_target_lag_ms`] (default: 1 minute). Redpanda tries to commit all data produced to an Iceberg-enabled topic within this window.
+* *Flush threshold*: Controlled by xref:reference:properties/cluster-properties.adoc#datalake_translator_flush_bytes[`datalake_translator_flush_bytes`] (default: 32 MiB). Each translator uploads its on-disk data when accumulated data reaches this threshold.
+endif::[]
+ifdef::env-cloud[]
+* *Throughput*: Approximately 5 MiB/s per shard under typical conditions.
+* *Lag target*: Controlled by `iceberg_target_lag_ms` (default: 1 minute). Redpanda tries to commit all data produced to an Iceberg-enabled topic within this window.
+* *Flush threshold*: Controlled by `datalake_translator_flush_bytes` (default: 32 MiB). Each translator uploads its on-disk data when accumulated data reaches this threshold.
+endif::[]
+
+The flush threshold and lag target together determine the size of the Parquet files written to object storage. Larger Parquet files generally improve downstream query performance by reducing the number of metadata operations query engines must perform.
+
+== Message size limits
+
+Redpanda has validated 32 MiB as the maximum recommended message size for Iceberg-enabled topics. At this size, a shard produces approximately two messages per Parquet file (assuming one or more Kafka partitions per shard). The 32 MiB figure corresponds to the default value of
+ifndef::env-cloud[]
+xref:reference:properties/cluster-properties.adoc#datalake_translator_flush_bytes[`datalake_translator_flush_bytes`].
+endif::[]
+ifdef::env-cloud[]
+`datalake_translator_flush_bytes`.
+endif::[]
+
+// TODO: Confirm with PM — include or omit the following?
+// From ENG-889 stress testing: 60-80 MiB messages cause OOM conditions in various Redpanda
+// subsystems, even with Iceberg disabled. 100 MiB messages are known to fail (customer case).
+
+////
+[WARNING]
+====
+Messages larger than 32 MiB are not recommended for Iceberg-enabled topics. Messages in the 60-80 MiB range can cause out-of-memory (OOM) conditions in Redpanda subsystems, even with Iceberg disabled. Messages of 100 MiB or larger are known to cause failures.
+
+If your workload requires large messages, see <<tune-for-large-messages>>.
+====
+////
+
+=== Effect on query performance
+
+Large messages produce large Parquet files with few records per file. Query engines must load entire Parquet files even when accessing only a subset of columns, which can result in high memory usage and slow scans for analytical workloads. If query latency is a concern, consider:
+
+* Reducing individual message sizes if your data model allows it
+* Applying the tuning in <<tune-for-large-messages>> to optimize file sizes for your workload
+
+== Configuration reference
+
+The following properties are the primary controls for Iceberg translation performance. None require a cluster restart.
+
+ifndef::env-cloud[]
+* xref:reference:properties/cluster-properties.adoc#datalake_translator_flush_bytes[`datalake_translator_flush_bytes`] (default: `33554432` / 32 MiB): Per-translator data threshold before uploading on-disk data to object storage. This is the primary control for Parquet file size.
+* xref:reference:properties/cluster-properties.adoc#iceberg_target_lag_ms[`iceberg_target_lag_ms`] (default: `60000` / 1 minute): Default lag target for all Iceberg-enabled topics. Override per topic with `redpanda.iceberg.target.lag.ms`.
+* xref:reference:properties/cluster-properties.adoc#iceberg_catalog_commit_interval_ms[`iceberg_catalog_commit_interval_ms`] (default: `60000` / 1 minute): Interval between catalog commit transactions across all topics.
+* xref:reference:properties/cluster-properties.adoc#iceberg_target_backlog_size[`iceberg_target_backlog_size`] (default: `104857600` / 100 MiB): Average per-partition backlog size target. Controls when the backlog controller increases translation CPU priority.
+endif::[]
+ifdef::env-cloud[]
+* `datalake_translator_flush_bytes` (default: 32 MiB): Per-translator data threshold before uploading on-disk data to object storage. This is the primary control for Parquet file size.
+* `iceberg_target_lag_ms` (default: 1 minute): Default lag target for all Iceberg-enabled topics. Override per topic with `redpanda.iceberg.target.lag.ms`.
+* `iceberg_catalog_commit_interval_ms` (default: 1 minute): Interval between catalog commit transactions across all topics.
+* `iceberg_target_backlog_size` (default: 100 MiB): Average per-partition backlog size target. Controls when the backlog controller increases translation CPU priority.
+endif::[]
+
+// TODO: Looks like only redpanda.iceberg.target.lag.ms is available
+// to tune in Cloud. Confirm what to include in this section
+ifndef::env-cloud[]
+[#tune-for-large-messages]
+== Tune for large messages
+
+If your workload consistently produces large messages, increase both the flush threshold and the lag target together. This lets each translator accumulate more data per upload cycle, producing Parquet files with more records per file.
+
+. Increase `datalake_translator_flush_bytes` to exceed your typical message size. A good starting value is two to four times your average message size:
++
+[,bash]
+----
+rpk cluster config set datalake_translator_flush_bytes <bytes>
+----
++
+For example, for a workload with an average message size of 16 MiB:
++
+[,bash]
+----
+rpk cluster config set datalake_translator_flush_bytes 67108864
+----
+
+. Increase `iceberg_target_lag_ms` to give translators more time to accumulate data. A value of five minutes is a reasonable starting point for large-message workloads:
++
+[,bash]
+----
+rpk cluster config set iceberg_target_lag_ms 300000
+----
++
+You can also set the lag target per topic using the
+xref:reference:properties/topic-properties.adoc#redpanda-iceberg-target-lag-ms[`redpanda.iceberg.target.lag.ms`] topic property.
++
+[NOTE]
+====
+Increasing the lag target means Iceberg tables receive new data less frequently. Choose a lag value that balances file efficiency against how current your downstream data must be.
+====
+
+[TIP]
+====
+`datalake_translator_flush_bytes` and `iceberg_target_lag_ms` work best when tuned together. A high flush threshold combined with a short lag window may not improve file sizes if the lag window expires before enough data has accumulated.
+====
+endif::[]
+
+== Backlog control
+
+When translation falls behind, Redpanda's backlog controller automatically increases the translation scheduling group's CPU priority to help it catch up. If the backlog grows large enough to exceed the throttle threshold, Redpanda applies backpressure to producers to prevent the lag from growing further.
+
+ifndef::env-cloud[]
+The following tunable properties control this behavior. In most cases, the defaults are appropriate. Contact https://support.redpanda.com/hc/en-us/requests/new[Redpanda support^] before adjusting them.
+
+* xref:reference:properties/cluster-properties.adoc#iceberg_target_backlog_size[`iceberg_target_backlog_size`] (default: `104857600` / 100 MiB): Average per-partition backlog size the controller targets. When exceeded, the controller increases translation scheduling priority.
+* xref:reference:properties/cluster-properties.adoc#iceberg_backlog_controller_p_coeff[`iceberg_backlog_controller_p_coeff`] (default: `0.00001`): Proportional coefficient for the backlog controller.
+* xref:reference:properties/cluster-properties.adoc#iceberg_backlog_controller_i_coeff[`iceberg_backlog_controller_i_coeff`] (default: `0.005`): Integral coefficient for accumulated backlog errors.
+endif::[]
+ifdef::env-cloud[]
+The backlog control behavior is governed by `iceberg_target_backlog_size`, `iceberg_backlog_controller_p_coeff`, and `iceberg_backlog_controller_i_coeff`. In most cases, the defaults are appropriate. Contact https://support.redpanda.com/hc/en-us/requests/new[Redpanda support^] before adjusting them.
+endif::[]
+
+== Partition count limits
+
+// Max partition count testing in progress
+
+For general partitioning best practices in the meantime, see xref:manage:iceberg/about-iceberg-topics.adoc#use-custom-partitioning[Use custom partitioning].
+
+== Monitor translation performance
+
+Use the following xref:reference:public-metrics-reference.adoc#iceberg-metrics[Iceberg metrics] to understand whether translation is keeping pace with incoming data:
+
+* *Translation lag*: Compare the rate of `redpanda_iceberg_translation_parquet_rows_added` or `redpanda_iceberg_translation_raw_bytes_processed` against your source write rate. A widening gap indicates translation is falling behind. No single metric measures translation lag directly. These rate comparisons are the primary signal.
+* *CPU utilization*: Translation is CPU-intensive. Monitor xref:reference:public-metrics-reference.adoc#infrastructure-metrics[infrastructure metrics] such as `redpanda_cpu_busy_seconds_total` for sustained high utilization, which may indicate the cluster is undersized for the combined broker and translation workload.
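+
+To make the rate comparison concrete, the following sketch computes translation throughput from two samples of `redpanda_iceberg_translation_raw_bytes_processed` taken a fixed interval apart (for example, scraped from your monitoring system). The sample values are illustrative:
+
+[,bash]
+----
+# Compute translation throughput in MiB/s from two counter samples taken
+# interval_s seconds apart.
+rate_mib_s() {
+  local first=$1 second=$2 interval_s=$3
+  echo $(( (second - first) / interval_s / 1048576 ))
+}
+
+# Example: 600 MiB of raw bytes processed over a 120-second window
+# works out to 5 MiB/s, in line with the expected per-shard baseline.
+rate_mib_s 0 $((600 * 1048576)) 120
+----
+
+If the computed rate stays consistently below your aggregate produce rate, translation is falling behind and lag will grow.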
+
+=== Iceberg translation metrics
+
+The following metrics provide detail on translation throughput, file output, and errors:
+
+* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_raw_bytes_processed[`redpanda_iceberg_translation_raw_bytes_processed`]: Total raw bytes consumed for translation input. Use this to monitor input throughput and compare against the expected 5 MiB/s per shard baseline.
+* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_parquet_bytes_added[`redpanda_iceberg_translation_parquet_bytes_added`]: Total bytes written to Parquet files. Divide by `redpanda_iceberg_translation_files_created` to estimate the average file size produced by your workload.
+* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_files_created[`redpanda_iceberg_translation_files_created`]: Number of Parquet files created. A high file creation rate relative to bytes added indicates many small files. Consider increasing `datalake_translator_flush_bytes` and `iceberg_target_lag_ms`.
+* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_parquet_rows_added[`redpanda_iceberg_translation_parquet_rows_added`]: Total rows written to Parquet files. Useful for understanding record-level throughput.
+* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_dlq_files_created[`redpanda_iceberg_translation_dlq_files_created`]: Number of dead letter queue (DLQ) Parquet files created. A non-zero and increasing value indicates records are failing to translate.
+* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_invalid_records[`redpanda_iceberg_translation_invalid_records`]: Number of invalid records encountered during translation, labeled by cause.
+* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_translations_finished[`redpanda_iceberg_translation_translations_finished`]: Number of completed translator executions. A stalling or zero rate indicates translation has stopped.
+* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_rest_client_num_commit_table_update_requests_failed[`redpanda_iceberg_rest_client_num_commit_table_update_requests_failed`]: Failed table commit requests to the REST catalog. Applies only when using a REST catalog (`iceberg_catalog_type: rest`). Persistent failures indicate catalog connectivity or permission issues.
+
+ifndef::env-cloud[]
+To check the current values of key translation cluster properties:
+
+[,bash]
+----
+rpk cluster config get datalake_translator_flush_bytes
+rpk cluster config get iceberg_target_lag_ms
+rpk cluster config get iceberg_target_backlog_size
+----
+endif::[]
+
+[TIP]
+====
+If translation consistently lags despite available CPU headroom, the workload may be partition-bound. Each shard translates its assigned partitions independently, so distributing data across more partitions allows more shards to contribute to translation and can improve total throughput.
+====
+
+== Troubleshoot Parquet read performance
+
+This section covers internal pipeline details that are relevant only if your query engine reports unexpectedly poor performance when reading the Parquet files generated by Redpanda.
+
+=== Page size and flush interval
+
+Redpanda's translator uses a 512 KiB internal page size for Parquet files, and pages are flushed from memory at most every 30 seconds. These values are not user-configurable.
+
+When a message contains a field whose data approaches 512 KiB, the resulting Parquet page may be larger than expected. This does not affect data correctness but can increase the memory requirements for query engines reading those files.
+
+If your query engine is reporting unexpectedly large Parquet pages or high per-query memory usage, review your message schemas for fields approaching 512 KiB in size. Splitting large fields or reducing field sizes can help.
+Contact https://support.redpanda.com/hc/en-us/requests/new[Redpanda support^] for additional guidance.
+
+// end::single-source[]

From 76d5823b3f569cc055e3f70bc381b4fcd34bc791 Mon Sep 17 00:00:00 2001
From: Kat Batuigas
Date: Tue, 24 Mar 2026 13:27:44 -0700
Subject: [PATCH 2/6] Reorganize draft based on SME feedback

---
 .../pages/iceberg/about-iceberg-topics.adoc   | 129 +------------
 .../iceberg/iceberg-performance-tuning.adoc   | 170 +++++++-----------
 .../iceberg/iceberg-topics-gcp-biglake.adoc   |   4 +-
 .../iceberg/iceberg-troubleshooting.adoc      | 109 +++++++++++
 .../pages/iceberg/specify-iceberg-schema.adoc |   4 +-
 5 files changed, 176 insertions(+), 240 deletions(-)
 create mode 100644 modules/manage/pages/iceberg/iceberg-troubleshooting.adoc

diff --git a/modules/manage/pages/iceberg/about-iceberg-topics.adoc b/modules/manage/pages/iceberg/about-iceberg-topics.adoc
index d785e11fd7..621db28358 100644
--- a/modules/manage/pages/iceberg/about-iceberg-topics.adoc
+++ b/modules/manage/pages/iceberg/about-iceberg-topics.adoc
@@ -312,136 +312,11 @@ Querying the Iceberg table for `demo-topic` includes the new column `ts`:
 +---------+--------------+--------------------------+
 ----
 
-== Troubleshoot errors
-
-If Redpanda encounters an error while writing a record to the Iceberg table, Redpanda by default writes the record to a separate dead-letter queue (DLQ) Iceberg table named `<topic-name>~dlq`. The following can cause errors to occur when translating records in the `value_schema_id_prefix` and `value_schema_latest` modes to the Iceberg table format:
-
-- Redpanda cannot find the embedded schema ID in the Schema Registry.
-- Redpanda fails to translate one or more schema data types to an Iceberg type.
-- In `value_schema_id_prefix` mode, you do not use the Schema Registry wire format with the magic byte.
-
-The DLQ table itself uses the `key_value` schema, consisting of two columns: the record metadata including the key, and a binary column for the record's value.
-
-NOTE: Topic property misconfiguration, such as xref:manage:iceberg/specify-iceberg-schema.adoc#override-value-schema-latest-default[overriding the default behavior of `value_schema_latest` mode] but not specifying the fully qualified Protobuf message name, does not cause records to be written to the DLQ table. Instead, Redpanda pauses the topic data translation to the Iceberg table until you fix the misconfiguration.
-
-=== Inspect DLQ table
-
-You can inspect the DLQ table for records that failed to write to the Iceberg table, and you can take further action on these records, such as transforming and reprocessing them, or debugging issues that occurred upstream.
-
-The following example produces a record to a topic named `ClickEvent` and does not use the Schema Registry wire format that includes the magic byte and schema ID:
-
-[,bash,role=no-copy]
-----
-echo '"key1" {"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}' | rpk topic produce ClickEvent --format='%k %v\n'
-----
-
-Querying the DLQ table returns the record that was not translated:
-
-[,sql]
-----
-SELECT
-  value
-FROM <namespace>."ClickEvent~dlq"; -- Fully qualified table name
-----
-
-[,bash,role=no-copy]
-----
-+-------------------------------------------------+
-| value                                           |
-+-------------------------------------------------+
-| 7b 22 75 73 65 72 5f 69 64 22 3a 32 33 32 34 2c |
-| 22 65 76 65 6e 74 5f 74 79 70 65 22 3a 22 42 55 |
-| 54 54 4f 4e 5f 43 4c 49 43 4b 22 2c 22 74 73 22 |
-| 3a 22 32 30 32 34 2d 31 31 2d 32 35 54 32 30 3a |
-| 32 33 3a 35 39 2e 33 38 30 5a 22 7d             |
-+-------------------------------------------------+
-----
-
-The data is in binary format, and the first byte is not `0x00`, indicating that it was not produced with a schema.
-
-=== Reprocess DLQ records
-
-You can apply a transformation and reprocess the record in your data lakehouse to the original Iceberg table. In this case, you have a JSON value represented as a UTF-8 binary. Depending on your query engine, you might need to decode the binary value first before extracting the JSON fields. Some engines may automatically decode the binary value for you:
-
-.ClickHouse SQL example to reprocess DLQ record
-[,sql]
-----
-SELECT
-  CAST(jsonExtractString(json, 'user_id') AS Int32) AS user_id,
-  jsonExtractString(json, 'event_type') AS event_type,
-  jsonExtractString(json, 'ts') AS ts
-FROM (
-  SELECT
-    CAST(value AS String) AS json
-  FROM <database>.`ClickEvent~dlq` -- Ensure that the table name is properly parsed
-);
-----
-
-[,bash,role=no-copy]
-----
-+---------+--------------+--------------------------+
-| user_id | event_type   | ts                       |
-+---------+--------------+--------------------------+
-| 2324    | BUTTON_CLICK | 2024-11-25T20:23:59.380Z |
-+---------+--------------+--------------------------+
-----
-
-You can now insert the transformed record back into the main Iceberg table. Redpanda recommends employing a strategy for exactly-once processing to avoid duplicates when reprocessing records.
-
-=== Drop invalid records
-
-ifndef::env-cloud[]
-To disable the default behavior and drop an invalid record, set the xref:reference:properties/topic-properties.adoc#redpanda-iceberg-invalid-record-action[`redpanda.iceberg.invalid.record.action`] topic property to `drop`. You can also configure the default cluster-wide behavior for invalid records by setting the `iceberg_invalid_record_action` property.
-endif::[]
-ifdef::env-cloud[]
-To disable the default behavior and drop an invalid record, set the `redpanda.iceberg.invalid.record.action` topic property to `drop`. You can also configure the default cluster-wide behavior for invalid records by setting the `iceberg_invalid_record_action` property.
-endif::[]
-
-== Performance considerations
-
-When you enable Iceberg for any substantial workload and start translating topic data to the Iceberg format, you may see most of your cluster's CPU utilization increase. If this additional workload overwhelms the brokers and causes the Iceberg table lag to exceed the configured target lag, Redpanda automatically applies backpressure to producers to prevent Iceberg tables from lagging further. This ensures that Iceberg tables keep up with the volume of incoming data, but sacrifices ingress throughput of the cluster.
-
-You may need to increase the size of your Redpanda cluster to accommodate the additional workload. To ensure that your cluster is sized appropriately, contact the Redpanda Customer Success team.
-
-=== Use custom partitioning
-
-ifndef::env-cloud[]
-To improve query performance, consider implementing custom https://iceberg.apache.org/docs/nightly/partitioning/[partitioning^] for the Iceberg topic. Use the xref:reference:properties/topic-properties.adoc#redpanda-iceberg-partition-spec[`redpanda.iceberg.partition.spec`] topic property to define the partitioning scheme:
-endif::[]
-ifdef::env-cloud[]
-To improve query performance, consider implementing custom https://iceberg.apache.org/docs/nightly/partitioning/[partitioning^] for the Iceberg topic. Use the `redpanda.iceberg.partition.spec` topic property to define the partitioning scheme:
-endif::[]
-
-[,bash,]
-----
-# Create new topic with five topic partitions, replication factor 3, and custom table partitioning for Iceberg
-rpk topic create <topic-name> -p5 -r3 -c redpanda.iceberg.mode=value_schema_id_prefix -c "redpanda.iceberg.partition.spec=(<partition-key>, <partition-key>, ...)"
-----
-
-Valid `<partition-key>` values include a source column name or a transformation of a column. The columns referenced can be Redpanda-defined (such as `redpanda.timestamp`) or user-defined based on a schema that you register for the topic. The Iceberg table stores records that share different partition key values in separate files based on this specification.
-
-For example:
-
-* To partition the table by a single key, such as a column `col1`, use: `redpanda.iceberg.partition.spec=(col1)`.
-* To partition by multiple columns, use a comma-separated list: `redpanda.iceberg.partition.spec=(col1, col2)`.
-* To partition by the year of a timestamp column `ts1`, and a string column `col1`, use: `redpanda.iceberg.partition.spec=(year(ts1), col1)`.
-
-To learn more about how partitioning schemes can affect query performance, and for details on the partitioning specification such as allowed transforms, see the https://iceberg.apache.org/spec/#partitioning[Apache Iceberg documentation^].
-
-[TIP]
-====
-* Partition by columns that you frequently use in queries. Columns with relatively few unique values, also known as low cardinality, are also good candidates for partitioning.
-* If you must partition based on columns with high cardinality, for example timestamps, use Iceberg's available transforms such as extracting the year, month, or day to avoid creating too many partitions. Too many partitions can be detrimental to performance because more files need to be scanned and managed.
-====
-
-=== Avoid high column count
-
-A high column count or schema field count results in more overhead when translating topics to the Iceberg table format. Small message sizes can also increase CPU utilization. To minimize the performance impact on your cluster, keep to a low column count and large message size for Iceberg topics.
-
 == Next steps
 
 * xref:manage:iceberg/use-iceberg-catalogs.adoc[]
-* xref:manage:iceberg/migrate-to-iceberg-topics.adoc[Migrate existing Iceberg integrations to Iceberg Topics]
+* xref:manage:iceberg/iceberg-performance-tuning.adoc[Tune Performance for Iceberg Topics]
+* xref:manage:iceberg/iceberg-troubleshooting.adoc[Troubleshoot Iceberg Topics]
 
 == Suggested reading
 
diff --git a/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc b/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc
index 60b985d362..5da48531be 100644
--- a/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc
+++ b/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc
@@ -1,22 +1,18 @@
 = Tune Performance for Iceberg Topics
-:description: Optimize Redpanda Iceberg translation throughput and Parquet file sizes by tuning message size limits, lag configuration, and flush thresholds.
+:description: Optimize query performance and translation throughput for Iceberg topics with partitioning, compaction, flush threshold tuning, and cluster sizing guidance.
 :page-categories: Iceberg, Management
 :page-topic-type: best-practices
 :personas: ops_admin, streaming_developer
-:learning-objective-1: Evaluate the impact of message size on Iceberg translation throughput and Parquet file sizes
-:learning-objective-2: Choose appropriate flush threshold and lag target values for large-message workloads
+:learning-objective-1: Apply partitioning and compaction strategies to improve query performance
+:learning-objective-2: Choose appropriate flush threshold and lag target values for your workload
 :learning-objective-3: Identify translation performance signals using Iceberg metrics
 
-// tag::single-source[]
-
-ifndef::env-cloud[]
 [NOTE]
 ====
 include::shared:partial$enterprise-license.adoc[]
 ====
-endif::[]
 
-Use this guide to optimize Redpanda Iceberg translation performance. It explains how the translation pipeline works, describes message size limits, and provides recommendations for tuning throughput and Parquet file sizes.
+Use this guide to optimize the performance of Iceberg topics in Redpanda. It covers strategies for improving downstream query performance, tuning the Iceberg translation pipeline, and monitoring translation throughput.
 
 After reading this page, you will be able to:
 
@@ -28,93 +24,80 @@ After reading this page, you will be able to:
 
 Before tuning Iceberg performance, you need to be familiar with how Iceberg topics work in Redpanda. See xref:manage:iceberg/about-iceberg-topics.adoc[About Iceberg Topics].
 
-== Translation pipeline overview
+== Optimize query performance
 
-Redpanda translates Kafka topic data to Iceberg format using a set of background _translators_. Each CPU shard runs one translator, which reads data from its assigned topic partitions, writes it to local scratch space as Parquet files, then uploads those files to object storage and commits the changes to the Iceberg catalog.
+Query engines read Parquet files from object storage to process Iceberg table data. Partitioning, compaction, and schema design affect how efficiently those reads perform.
 
-Key pipeline characteristics:
+=== Use custom partitioning
 
-ifndef::env-cloud[]
-* *Throughput*: Approximately 5 MiB/s per shard under typical conditions.
-* *Lag target*: Controlled by xref:reference:properties/cluster-properties.adoc#iceberg_target_lag_ms[`iceberg_target_lag_ms`] (default: 1 minute). Redpanda tries to commit all data produced to an Iceberg-enabled topic within this window.
-* *Flush threshold*: Controlled by xref:reference:properties/cluster-properties.adoc#datalake_translator_flush_bytes[`datalake_translator_flush_bytes`] (default: 32 MiB). Each translator uploads its on-disk data when accumulated data reaches this threshold.
-endif::[]
-ifdef::env-cloud[]
-* *Throughput*: Approximately 5 MiB/s per shard under typical conditions.
-* *Lag target*: Controlled by `iceberg_target_lag_ms` (default: 1 minute). Redpanda tries to commit all data produced to an Iceberg-enabled topic within this window.
-* *Flush threshold*: Controlled by `datalake_translator_flush_bytes` (default: 32 MiB). Each translator uploads its on-disk data when accumulated data reaches this threshold.
-endif::[]
+To improve query performance, consider implementing custom https://iceberg.apache.org/docs/nightly/partitioning/[partitioning^] for the Iceberg topic. Use the xref:reference:properties/topic-properties.adoc#redpanda-iceberg-partition-spec[`redpanda.iceberg.partition.spec`] topic property to define the partitioning scheme:
 
-The flush threshold and lag target together determine the size of the Parquet files written to object storage. Larger Parquet files generally improve downstream query performance by reducing the number of metadata operations query engines must perform.
+[,bash,]
+----
+# Create new topic with five topic partitions, replication factor 3, and custom table partitioning for Iceberg
+rpk topic create <topic-name> -p5 -r3 -c redpanda.iceberg.mode=value_schema_id_prefix -c "redpanda.iceberg.partition.spec=(<partition-key>, <partition-key>, ...)"
+----
 
-== Message size limits
+Valid `<partition-key>` values include a source column name or a transformation of a column. The columns referenced can be Redpanda-defined (such as `redpanda.timestamp`) or user-defined based on a schema that you register for the topic. The Iceberg table stores records that share different partition key values in separate files based on this specification.
 
-Redpanda has validated 32 MiB as the maximum recommended message size for Iceberg-enabled topics. At this size, a shard produces approximately two messages per Parquet file (assuming one or more Kafka partitions per shard). The 32 MiB figure corresponds to the default value of
-ifndef::env-cloud[]
-xref:reference:properties/cluster-properties.adoc#datalake_translator_flush_bytes[`datalake_translator_flush_bytes`].
-endif::[]
-ifdef::env-cloud[]
-`datalake_translator_flush_bytes`.
-endif::[]
+For example:
 
-// TODO: Confirm with PM — include or omit the following?
-// From ENG-889 stress testing: 60-80 MiB messages cause OOM conditions in various Redpanda
-// subsystems, even with Iceberg disabled. 100 MiB messages are known to fail (customer case).
+* To partition the table by a single key, such as a column `col1`, use: `redpanda.iceberg.partition.spec=(col1)`.
+* To partition by multiple columns, use a comma-separated list: `redpanda.iceberg.partition.spec=(col1, col2)`.
+* To partition by the year of a timestamp column `ts1`, and a string column `col1`, use: `redpanda.iceberg.partition.spec=(year(ts1), col1)`.
 
-////
-[WARNING]
-====
-Messages larger than 32 MiB are not recommended for Iceberg-enabled topics. Messages in the 60-80 MiB range can cause out-of-memory (OOM) conditions in Redpanda subsystems, even with Iceberg disabled. Messages of 100 MiB or larger are known to cause failures.
+To learn more about how partitioning schemes can affect query performance, and for details on the partitioning specification such as allowed transforms, see the https://iceberg.apache.org/spec/#partitioning[Apache Iceberg documentation^].
 
-If your workload requires large messages, see <<tune-for-large-messages>>.
+[TIP]
+====
+* Partition by columns that you frequently use in queries. Columns with relatively few unique values, also known as low cardinality, are also good candidates for partitioning.
+* If you must partition based on columns with high cardinality, for example timestamps, use Iceberg's available transforms such as extracting the year, month, or day to avoid creating too many partitions. Too many partitions can be detrimental to performance because more files need to be scanned and managed.
 ====
-////
-=== Effect on query performance
 
-Large messages produce large Parquet files with few records per file. Query engines must load entire Parquet files even when accessing only a subset of columns, which can result in high memory usage and slow scans for analytical workloads. If query latency is a concern, consider:
+=== Compact Iceberg tables
 
-* Reducing individual message sizes if your data model allows it
-* Applying the tuning in <<tune-for-large-messages>> to optimize file sizes for your workload
+Over time, Iceberg translation can produce many small Parquet files, especially with low-throughput topics or short lag targets. Compaction merges small files into larger ones, reducing the number of metadata operations query engines must perform and improving read performance.
 
-== Configuration reference
+* Managed service: Some managed query engines and data platforms, such as Snowflake and Databricks, automatically compact Iceberg tables.
+* Manual or scheduled compaction: Tools like https://spark.apache.org/[Apache Spark^] can run compaction jobs on a schedule. This is useful if your query engine does not compact automatically.
 
-The following properties are the primary controls for Iceberg translation performance. None require a cluster restart.
+If you observe degraded read performance or a high number of small files, investigate whether your query engine supports automatic compaction or schedule periodic compaction jobs.
 
-ifndef::env-cloud[]
-* xref:reference:properties/cluster-properties.adoc#datalake_translator_flush_bytes[`datalake_translator_flush_bytes`] (default: `33554432` / 32 MiB): Per-translator data threshold before uploading on-disk data to object storage. This is the primary control for Parquet file size.
-* xref:reference:properties/cluster-properties.adoc#iceberg_target_lag_ms[`iceberg_target_lag_ms`] (default: `60000` / 1 minute): Default lag target for all Iceberg-enabled topics. Override per topic with `redpanda.iceberg.target.lag.ms`.
-* xref:reference:properties/cluster-properties.adoc#iceberg_catalog_commit_interval_ms[`iceberg_catalog_commit_interval_ms`] (default: `60000` / 1 minute): Interval between catalog commit transactions across all topics.
-* xref:reference:properties/cluster-properties.adoc#iceberg_target_backlog_size[`iceberg_target_backlog_size`] (default: `104857600` / 100 MiB): Average per-partition backlog size target. Controls when the backlog controller increases translation CPU priority.
-endif::[]
-ifdef::env-cloud[]
-* `datalake_translator_flush_bytes` (default: 32 MiB): Per-translator data threshold before uploading on-disk data to object storage. This is the primary control for Parquet file size.
-* `iceberg_target_lag_ms` (default: 1 minute): Default lag target for all Iceberg-enabled topics. Override per topic with `redpanda.iceberg.target.lag.ms`.
-* `iceberg_catalog_commit_interval_ms` (default: 1 minute): Interval between catalog commit transactions across all topics.
-* `iceberg_target_backlog_size` (default: 100 MiB): Average per-partition backlog size target. Controls when the backlog controller increases translation CPU priority.
-endif::[]
+=== Avoid high column count
 
-// TODO: Looks like only redpanda.iceberg.target.lag.ms is available
-// to tune in Cloud. Confirm what to include in this section
-ifndef::env-cloud[]
-== Tune for large messages
+A high column count or schema field count results in more overhead when translating topics to the Iceberg table format. Small message sizes can also increase CPU utilization. To minimize the performance impact on your cluster, keep to a low column count and large message size for Iceberg topics.
 
-If your workload consistently produces large messages, increase both the flush threshold and the lag target together. This lets each translator accumulate more data per upload cycle, producing Parquet files with more records per file.
+== Tune translation performance
 
+Translation is the process in which Redpanda converts topic data into Parquet files for the Iceberg table. Each round of translation processes one topic partition at a time.
-* xref:reference:properties/cluster-properties.adoc#iceberg_catalog_commit_interval_ms[`iceberg_catalog_commit_interval_ms`] (default: `60000` / 1 minute): Interval between catalog commit transactions across all topics. -* xref:reference:properties/cluster-properties.adoc#iceberg_target_backlog_size[`iceberg_target_backlog_size`] (default: `104857600` / 100 MiB): Average per-partition backlog size target. Controls when the backlog controller increases translation CPU priority. -endif::[] -ifdef::env-cloud[] -* `datalake_translator_flush_bytes` (default: 32 MiB): Per-translator data threshold before uploading on-disk data to object storage. This is the primary control for Parquet file size. -* `iceberg_target_lag_ms` (default: 1 minute): Default lag target for all Iceberg-enabled topics. Override per topic with `redpanda.iceberg.target.lag.ms`. -* `iceberg_catalog_commit_interval_ms` (default: 1 minute): Interval between catalog commit transactions across all topics. -* `iceberg_target_backlog_size` (default: 100 MiB): Average per-partition backlog size target. Controls when the backlog controller increases translation CPU priority. -endif::[] +=== Avoid high column count -// TODO: Looks like only redpanda.iceberg.target.lag.ms is available -// to tune in Cloud. Confirm what to include in this section -ifndef::env-cloud[] -== Tune for large messages +A high column count or schema field count results in more overhead when translating topics to the Iceberg table format. Small message sizes can also increase CPU utilization. To minimize the performance impact on your cluster, keep to a low column count and large message size for Iceberg topics. + +== Tune translation performance + +Translation is the process in which Redpanda converts topic data into Parquet files for the Iceberg table. Each round of translation processes one topic partition at a time. 
+
+Under typical conditions, Iceberg translation has the following performance characteristics:
+
+* Throughput: Approximately 5 MiB/s per core.
+* Flush threshold: Controlled by xref:reference:properties/cluster-properties.adoc#datalake_translator_flush_bytes[`datalake_translator_flush_bytes`] (default: 32 MiB). Each translation process uploads its on-disk data when accumulated data reaches this threshold. This is the primary control for Parquet file size.
+* Lag target: Controlled by xref:reference:properties/cluster-properties.adoc#iceberg_target_lag_ms[`iceberg_target_lag_ms`] (default: 1 minute). Redpanda tries to commit all data produced to an Iceberg-enabled topic within this window.
+
+The flush threshold and lag target together determine the size of the Parquet files written to object storage. Larger Parquet files generally improve downstream query performance by reducing the number of metadata operations query engines must perform.
-If your workload consistently produces large messages, increase both the flush threshold and the lag target together. This lets each translator accumulate more data per upload cycle, producing Parquet files with more records per file.
+=== Tune flush threshold and lag target
-. Increase `datalake_translator_flush_bytes` to exceed your typical message size. A good starting value is two to four times your average message size:
+Increase both the flush threshold and the lag target together to produce larger Parquet files with more records per file. This is the primary way to tune Iceberg translation performance.
+
+. Increase `datalake_translator_flush_bytes` to control the size of Parquet files. A good starting value depends on your workload:
+
[,bash]
----
rpk cluster config set datalake_translator_flush_bytes <number-of-bytes>
----
+
-For example, for a workload with an average message size of 16 MiB:
+For example, to set a 64 MiB flush threshold:
+
[,bash]
----
rpk cluster config set datalake_translator_flush_bytes 67108864
----
-.
Increase `iceberg_target_lag_ms` to give translators more time to accumulate data. A value of five minutes is a reasonable starting point for large-message workloads: +. Increase `iceberg_target_lag_ms` to give translators more time to accumulate data before committing: + [,bash] ---- @@ -133,50 +116,34 @@ Increasing the lag target means Iceberg tables receive new data less frequently. ==== `datalake_translator_flush_bytes` and `iceberg_target_lag_ms` work best when tuned together. A high flush threshold combined with a short lag window may not improve file sizes if the lag window expires before enough data has accumulated. ==== -endif::[] -== Backlog control +=== Message size -When translation falls behind, Redpanda's backlog controller automatically increases the translation scheduling group's CPU priority to help it catch up. If the backlog grows large enough to exceed the throttle threshold, Redpanda applies backpressure to producers to prevent the lag from growing further. +Redpanda has validated 32 MiB as the maximum recommended message size for Iceberg-enabled topics. With large messages, each Parquet file contains fewer records because the flush threshold is reached sooner. This can reduce the efficiency of analytical queries that need to scan many records. -ifndef::env-cloud[] -The following tunable properties control this behavior. In most cases, the defaults are appropriate. Contact https://support.redpanda.com/hc/en-us/requests/new[Redpanda support^] before adjusting them. +If query latency is a concern and your workload produces large messages, consider: -* xref:reference:properties/cluster-properties.adoc#iceberg_target_backlog_size[`iceberg_target_backlog_size`] (default: `104857600` / 100 MiB): Average per-partition backlog size the controller targets. When exceeded, the controller increases translation scheduling priority. 
-* xref:reference:properties/cluster-properties.adoc#iceberg_backlog_controller_p_coeff[`iceberg_backlog_controller_p_coeff`] (default: `0.00001`): Proportional coefficient for the backlog controller.
-* xref:reference:properties/cluster-properties.adoc#iceberg_backlog_controller_i_coeff[`iceberg_backlog_controller_i_coeff`] (default: `0.005`): Integral coefficient for accumulated backlog errors.
-endif::[]
-ifdef::env-cloud[]
-The backlog control behavior is governed by `iceberg_target_backlog_size`, `iceberg_backlog_controller_p_coeff`, and `iceberg_backlog_controller_i_coeff`. In most cases, the defaults are appropriate. Contact https://support.redpanda.com/hc/en-us/requests/new[Redpanda support^] before adjusting them.
-endif::[]
+* Reducing individual message sizes if your data model allows it.
+* Increasing `datalake_translator_flush_bytes` to produce Parquet files with more records per file.
-== Partition count limits
+=== Cluster sizing and backpressure
-// Max partition count testing in progress
+When you enable Iceberg for any substantial workload and start translating topic data to the Iceberg format, you may see CPU utilization increase across most of your cluster. If this additional workload overwhelms the brokers and causes the Iceberg table lag to exceed the configured target lag, Redpanda automatically applies backpressure to producers to prevent Iceberg tables from lagging further. This ensures that Iceberg tables keep up with the volume of incoming data, but sacrifices the ingress throughput of the cluster.
-For general partitioning best practices in the meantime, see xref:manage:iceberg/about-iceberg-topics.adoc#use-custom-partitioning[Use custom partitioning].
+You may need to increase the size of your Redpanda cluster to accommodate the additional workload. To ensure that your cluster is sized appropriately, contact the Redpanda Customer Success team.
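+As a rough capacity check against the approximately 5 MiB/s per core translation baseline, you can estimate how many cores a given produce rate consumes for translation alone. This is a back-of-the-envelope sketch, not a sizing guarantee, and the 40 MiB/s ingress figure below is a hypothetical example:

```shell
# Back-of-the-envelope translation sizing.
# Assumption: ~5 MiB/s translation throughput per core (baseline from this guide).
ingress_mib_per_s=40      # hypothetical aggregate produce rate to Iceberg-enabled topics
per_core_mib_per_s=5
# Round up: cores = ceil(ingress / per-core throughput)
cores_needed=$(( (ingress_mib_per_s + per_core_mib_per_s - 1) / per_core_mib_per_s ))
echo "$cores_needed"   # → 8
```

+If the estimate approaches the number of cores in your cluster, there is little headroom left for the regular broker workload, and backpressure or resizing becomes likely.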
-== Monitor translation performance +=== Monitor translation performance Use the following xref:reference:public-metrics-reference.adoc#iceberg-metrics[Iceberg metrics] to understand whether translation is keeping pace with incoming data: -* *Translation lag*: Compare the rate of `redpanda_iceberg_translation_parquet_rows_added` or `redpanda_iceberg_translation_raw_bytes_processed` against your source write rate. A widening gap indicates translation is falling behind. No single metric measures translation lag directly. These rate comparisons are the primary signal. -* *CPU utilization*: Translation is CPU-intensive. Monitor xref:reference:public-metrics-reference.adoc#infrastructure-metrics[infrastructure metrics] such as `redpanda_cpu_busy_seconds_total` for sustained high utilization, which may indicate the cluster is undersized for the combined broker and translation workload. - -=== Iceberg translation metrics - -The following metrics provide detail on translation throughput, file output, and errors: - -* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_raw_bytes_processed[`redpanda_iceberg_translation_raw_bytes_processed`]: Total raw bytes consumed for translation input. Use this to monitor input throughput and compare against the expected 5 MiB/s per shard baseline. +* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_raw_bytes_processed[`redpanda_iceberg_translation_raw_bytes_processed`]: Total raw bytes consumed for translation input. Use this to monitor input throughput and compare against the expected 5 MiB/s per core baseline. * xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_parquet_bytes_added[`redpanda_iceberg_translation_parquet_bytes_added`]: Total bytes written to Parquet files. Divide by `redpanda_iceberg_translation_files_created` to estimate the average file size produced by your workload. 
* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_files_created[`redpanda_iceberg_translation_files_created`]: Number of Parquet files created. A high file creation rate relative to bytes added indicates many small files. Consider increasing `datalake_translator_flush_bytes` and `iceberg_target_lag_ms`. * xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_parquet_rows_added[`redpanda_iceberg_translation_parquet_rows_added`]: Total rows written to Parquet files. Useful for understanding record-level throughput. -* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_dlq_files_created[`redpanda_iceberg_translation_dlq_files_created`]: Number of dead letter queue (DLQ) Parquet files created. A non-zero and increasing value indicates records are failing to translate. -* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_invalid_records[`redpanda_iceberg_translation_invalid_records`]: Number of invalid records encountered during translation, labeled by cause. * xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_translations_finished[`redpanda_iceberg_translation_translations_finished`]: Number of completed translator executions. A stalling or zero rate indicates translation has stopped. -* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_rest_client_num_commit_table_update_requests_failed[`redpanda_iceberg_rest_client_num_commit_table_update_requests_failed`]: Failed table commit requests to the REST catalog. Applies only when using a REST catalog (`iceberg_catalog_type: rest`). Persistent failures indicate catalog connectivity or permission issues. -ifndef::env-cloud[] +For metrics related to DLQ files, invalid records, and catalog commit failures, see xref:manage:iceberg/iceberg-troubleshooting.adoc#troubleshooting-metrics[Troubleshooting metrics]. 
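+For example, to estimate the average Parquet file size your workload produces, divide the change in bytes added by the change in files created over the same sampling window. The counter values below are hypothetical, shown only to illustrate the arithmetic:

```shell
# Average Parquet file size from two counters sampled over the same window.
parquet_bytes_added=3221225472   # hypothetical delta of redpanda_iceberg_translation_parquet_bytes_added
files_created=96                 # hypothetical delta of redpanda_iceberg_translation_files_created
avg_file_mib=$(( parquet_bytes_added / files_created / 1024 / 1024 ))
echo "${avg_file_mib} MiB"   # → 32 MiB
```

+An average well below your flush threshold suggests the lag window, rather than `datalake_translator_flush_bytes`, is cutting files short.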
+ To check the current values of key translation cluster properties: [,bash] @@ -185,23 +152,8 @@ rpk cluster config get datalake_translator_flush_bytes rpk cluster config get iceberg_target_lag_ms rpk cluster config get iceberg_target_backlog_size ---- -endif::[] [TIP] ==== -If translation consistently lags despite available CPU headroom, the workload may be partition-bound. Each shard translates its assigned partitions independently, so distributing data across more partitions allows more shards to contribute to translation and can improve total throughput. +If translation consistently lags despite available CPU headroom, the workload may be partition-bound. Each core translates its assigned partitions independently, so distributing data across more partitions allows more cores to contribute to translation and can improve total throughput. ==== - -== Troubleshoot Parquet read performance - -This section covers internal pipeline details that are relevant only if your query engine reports unexpectedly poor performance when reading the Parquet files generated by Redpanda. - -=== Page size and flush interval - -Redpanda's translator uses a 512 KiB internal page size for Parquet files, and pages are flushed from memory at most every 30 seconds. These values are not user-configurable. - -When a message contains a field whose data approaches 512 KiB, the resulting Parquet page may be larger than expected. This does not affect data correctness but can increase the memory requirements for query engines reading those files. - -If your query engine is reporting unexpectedly large Parquet pages or high per-query memory usage, review your message schemas for fields approaching 512 KiB in size. Splitting large fields or reducing field sizes can help. Contact https://support.redpanda.com/hc/en-us/requests/new[Redpanda support^] for additional guidance. 
- -// end::single-source[] diff --git a/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc b/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc index ba52d60375..d4d2c7b446 100644 --- a/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc +++ b/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc @@ -246,7 +246,7 @@ iceberg_dlq_table_suffix: _dlq + -- * Replace `` with your bucket name and `` with your Google Cloud project ID. -* You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). The example above uses `_dlq` as the suffix for the xref:manage:iceberg/about-iceberg-topics.adoc#troubleshoot-errors[dead-letter queue (DLQ) table]. +* You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). The example above uses `_dlq` as the suffix for the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue-dlq[dead-letter queue (DLQ) table]. -- + NOTE: If you edit `bootstrap.yml`, you can skip the cluster configuration step in <> and proceed to the next step in that section to enable Iceberg for a topic. @@ -293,7 +293,7 @@ iceberg_dlq_table_suffix: _dlq + -- * Replace `` with your bucket name and `` with your Google Cloud project ID. -* You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). The example above uses `_dlq` as the suffix for the xref:manage:iceberg/about-iceberg-topics.adoc#troubleshoot-errors[dead-letter queue (DLQ) table]. +* You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). The example above uses `_dlq` as the suffix for the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue-dlq[dead-letter queue (DLQ) table]. 
-- ifndef::env-cloud[] diff --git a/modules/manage/pages/iceberg/iceberg-troubleshooting.adoc b/modules/manage/pages/iceberg/iceberg-troubleshooting.adoc new file mode 100644 index 0000000000..79161bdf71 --- /dev/null +++ b/modules/manage/pages/iceberg/iceberg-troubleshooting.adoc @@ -0,0 +1,109 @@ += Troubleshoot Iceberg Topics +:description: Diagnose and resolve errors in Redpanda Iceberg translation, including dead-letter queue inspection and record reprocessing. +:page-categories: Iceberg, Troubleshooting + +// tag::single-source[] + +ifndef::env-cloud[] +[NOTE] +==== +include::shared:partial$enterprise-license.adoc[] +==== +endif::[] + +This page covers how to diagnose and resolve errors that occur during Iceberg translation, including working with dead-letter queue (DLQ) tables and handling invalid records. + +== Dead-letter queue (DLQ) + +If Redpanda encounters an error while writing a record to the Iceberg table, Redpanda by default writes the record to a separate dead-letter queue (DLQ) Iceberg table named `~dlq`. The following can cause errors to occur when translating records in the `value_schema_id_prefix` and `value_schema_latest` modes to the Iceberg table format: + +- Redpanda cannot find the embedded schema ID in the Schema Registry. +- Redpanda fails to translate one or more schema data types to an Iceberg type. +- In `value_schema_id_prefix` mode, you do not use the Schema Registry wire format with the magic byte. + +The DLQ table itself uses the `key_value` schema, consisting of two columns: the record metadata including the key, and a binary column for the record's value. + +NOTE: Topic property misconfiguration, such as xref:manage:iceberg/specify-iceberg-schema.adoc#override-value-schema-latest-default[overriding the default behavior of `value_schema_latest` mode] but not specifying the fully qualified Protobuf message name, does not cause records to be written to the DLQ table. 
Instead, Redpanda pauses the topic data translation to the Iceberg table until you fix the misconfiguration.
+
+=== Inspect DLQ table
+
+You can inspect the DLQ table for records that failed to write to the Iceberg table, and you can take further action on these records, such as transforming and reprocessing them, or debugging issues that occurred upstream.
+
+The following example produces a record to a topic named `ClickEvent` and does not use the Schema Registry wire format that includes the magic byte and schema ID:
+
+[,bash,role=no-copy]
+----
+echo '"key1" {"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}' | rpk topic produce ClickEvent --format='%k %v\n'
+----
+
+Querying the DLQ table returns the record that was not translated:
+
+[,sql]
+----
+SELECT
+  value
+FROM <namespace>."ClickEvent~dlq"; -- Fully qualified table name
+----
+
+[,bash,role=no-copy]
+----
++-------------------------------------------------+
+| value                                           |
++-------------------------------------------------+
+| 7b 22 75 73 65 72 5f 69 64 22 3a 32 33 32 34 2c |
+| 22 65 76 65 6e 74 5f 74 79 70 65 22 3a 22 42 55 |
+| 54 54 4f 4e 5f 43 4c 49 43 4b 22 2c 22 74 73 22 |
+| 3a 22 32 30 32 34 2d 31 31 2d 32 35 54 32 30 3a |
+| 32 33 3a 35 39 2e 33 38 30 5a 22 7d             |
++-------------------------------------------------+
+----
+
+The data is in binary format, and the first byte is not `0x00`, indicating that it was not produced with a schema.
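+To double-check what a DLQ `value` column contains, you can decode the hex bytes locally. The following sketch (an illustration, assuming a shell whose `printf '%b'` expands `\xHH` escapes, such as bash) reconstructs the original record value from the dump above:

```shell
# Hex bytes copied from the DLQ query output above
hex='7b 22 75 73 65 72 5f 69 64 22 3a 32 33 32 34 2c 22 65 76 65 6e 74 5f 74 79 70 65 22 3a 22 42 55 54 54 4f 4e 5f 43 4c 49 43 4b 22 2c 22 74 73 22 3a 22 32 30 32 34 2d 31 31 2d 32 35 54 32 30 3a 32 33 3a 35 39 2e 33 38 30 5a 22 7d'
# Turn each byte into a \xHH escape, then let printf '%b' expand the escapes
decoded=$(printf '%b' "$(printf '\\x%s' $hex)")
echo "$decoded"   # → {"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}
```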
Some engines may automatically decode the binary value for you:
+
+.ClickHouse SQL example to reprocess DLQ record
+[,sql]
+----
+SELECT
+  CAST(jsonExtractString(json, 'user_id') AS Int32) AS user_id,
+  jsonExtractString(json, 'event_type') AS event_type,
+  jsonExtractString(json, 'ts') AS ts
+FROM (
+  SELECT
+    CAST(value AS String) AS json
+  FROM <namespace>.`ClickEvent~dlq` -- Ensure that the table name is properly parsed
+);
+----
+
+[,bash,role=no-copy]
+----
++---------+--------------+--------------------------+
+| user_id | event_type   | ts                       |
++---------+--------------+--------------------------+
+| 2324    | BUTTON_CLICK | 2024-11-25T20:23:59.380Z |
++---------+--------------+--------------------------+
+----
+
+You can now insert the transformed record back into the main Iceberg table. Redpanda recommends employing a strategy for exactly-once processing to avoid duplicates when reprocessing records.
+
+=== Drop invalid records
+
+ifndef::env-cloud[]
+To disable the default behavior and drop an invalid record, set the xref:reference:properties/topic-properties.adoc#redpanda-iceberg-invalid-record-action[`redpanda.iceberg.invalid.record.action`] topic property to `drop`. You can also configure the default cluster-wide behavior for invalid records by setting the `iceberg_invalid_record_action` property.
+endif::[]
+ifdef::env-cloud[]
+To disable the default behavior and drop an invalid record, set the `redpanda.iceberg.invalid.record.action` topic property to `drop`. You can also configure the default cluster-wide behavior for invalid records by setting the `iceberg_invalid_record_action` property.
+endif::[] + +== Troubleshooting metrics + +The following xref:reference:public-metrics-reference.adoc#iceberg-metrics[Iceberg metrics] help identify translation errors, invalid records, and catalog connectivity issues: + +* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_dlq_files_created[`redpanda_iceberg_translation_dlq_files_created`]: Number of dead letter queue (DLQ) Parquet files created. A non-zero and increasing value indicates records are failing to translate. +* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_invalid_records[`redpanda_iceberg_translation_invalid_records`]: Number of invalid records encountered during translation, labeled by cause. +* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_rest_client_num_commit_table_update_requests_failed[`redpanda_iceberg_rest_client_num_commit_table_update_requests_failed`]: Failed table commit requests to the REST catalog. Applies only when using a REST catalog (`iceberg_catalog_type: rest`). Persistent failures indicate catalog connectivity or permission issues. + +// end::single-source[] diff --git a/modules/manage/pages/iceberg/specify-iceberg-schema.adoc b/modules/manage/pages/iceberg/specify-iceberg-schema.adoc index d8ccd38c37..006cc152df 100644 --- a/modules/manage/pages/iceberg/specify-iceberg-schema.adoc +++ b/modules/manage/pages/iceberg/specify-iceberg-schema.adoc @@ -60,7 +60,7 @@ The following modes are compatible with producing to an Iceberg topic using Redp - `key_value` - Starting in version 25.2, `value_schema_latest` with a JSON schema -Otherwise, records may fail to write to the Iceberg table and instead write to the xref:manage:iceberg/about-iceberg-topics.adoc#manage-dead-letter-queue[dead-letter queue]. +Otherwise, records may fail to write to the Iceberg table and instead write to the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue-dlq[dead-letter queue]. 
==== == Configure Iceberg mode for a topic @@ -184,7 +184,7 @@ CREATE TABLE ClickEvent ( As you produce records to the topic, the data also becomes available in object storage for Iceberg-compatible clients to consume. You can use the same analytical tools to xref:manage:iceberg/query-iceberg-topics.adoc[read the Iceberg topic data] in a data lake as you would for a relational database. -If Redpanda fails to translate the record to the columnar format as defined by the schema, it writes the record to a dead-letter queue (DLQ) table. See xref:manage:iceberg/about-iceberg-topics.adoc#troubleshoot-errors[Troubleshoot errors] for more information. +If Redpanda fails to translate the record to the columnar format as defined by the schema, it writes the record to a dead-letter queue (DLQ) table. See xref:manage:iceberg/iceberg-troubleshooting.adoc[Troubleshoot Iceberg Topics] for more information. NOTE: You cannot use schemas to parse or decode record keys for Iceberg. The record keys are always stored in binary format in the `redpanda.key` column. 
From 84739973b349c8cc80ea0251f3d2fd74ee660749 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Tue, 24 Mar 2026 13:36:21 -0700 Subject: [PATCH 3/6] Update Iceberg xrefs --- docs-data/property-overrides.json | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs-data/property-overrides.json b/docs-data/property-overrides.json index b22cd6c0b9..6011cbe92a 100644 --- a/docs-data/property-overrides.json +++ b/docs-data/property-overrides.json @@ -965,7 +965,7 @@ "description": "Default value for the `redpanda.iceberg.invalid.record.action` topic property.", "related_topics": [ "self-managed-only: xref:reference:properties/topic-properties.adoc#redpanda-iceberg-invalid-record-action[`redpanda.iceberg.invalid.record.action`]", - "self-managed-only: xref:manage:iceberg/about-iceberg-topics.adoc#troubleshoot-errors[Troubleshoot errors]" + "self-managed-only: xref:manage:iceberg/iceberg-troubleshooting.adoc[Troubleshoot Iceberg Topics]" ], "config_scope": "cluster" }, @@ -1710,7 +1710,7 @@ "redpanda.iceberg.invalid.record.action": { "description": "Whether to write invalid records to a dead-letter queue (DLQ).", "related_topics": [ - "xref:manage:iceberg/about-iceberg-topics.adoc#troubleshoot-errors[Troubleshoot errors]" + "xref:manage:iceberg/iceberg-troubleshooting.adoc[Troubleshoot Iceberg Topics]" ], "config_scope": "topic" }, @@ -1724,7 +1724,7 @@ "redpanda.iceberg.partition.spec": { "description": "The link:https://iceberg.apache.org/docs/nightly/partitioning/[partitioning^] specification for the Iceberg table.", "related_topics": [ - "xref:manage:iceberg/about-iceberg-topics.adoc#use-custom-partitioning[Use custom partitioning]" + "xref:manage:iceberg/iceberg-performance-tuning.adoc#use-custom-partitioning[Use custom partitioning]" ], "config_scope": "topic" }, From d4ec3b6f15f9dc75fb06d9a74725444754e27ed1 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Tue, 24 Mar 2026 14:02:05 -0700 Subject: [PATCH 4/6] Minor xref edit --- 
modules/manage/pages/iceberg/about-iceberg-topics.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/modules/manage/pages/iceberg/about-iceberg-topics.adoc b/modules/manage/pages/iceberg/about-iceberg-topics.adoc index 621db28358..6b179d6f22 100644 --- a/modules/manage/pages/iceberg/about-iceberg-topics.adoc +++ b/modules/manage/pages/iceberg/about-iceberg-topics.adoc @@ -315,8 +315,8 @@ Querying the Iceberg table for `demo-topic` includes the new column `ts`: == Next steps * xref:manage:iceberg/use-iceberg-catalogs.adoc[] -* xref:manage:iceberg/iceberg-performance-tuning.adoc[Tune Performance for Iceberg Topics] -* xref:manage:iceberg/iceberg-troubleshooting.adoc[Troubleshoot Iceberg Topics] +* xref:manage:iceberg/iceberg-performance-tuning.adoc[] +* xref:manage:iceberg/iceberg-troubleshooting.adoc[] == Suggested reading From d9955d43dbb95e2283333d78a4bec80b3a06f1da Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Tue, 24 Mar 2026 14:55:39 -0700 Subject: [PATCH 5/6] Style improvements --- .../iceberg/iceberg-performance-tuning.adoc | 8 +++--- .../iceberg/iceberg-topics-gcp-biglake.adoc | 4 +-- .../iceberg/iceberg-troubleshooting.adoc | 25 +++++++++++++------ .../pages/iceberg/specify-iceberg-schema.adoc | 2 +- 4 files changed, 24 insertions(+), 15 deletions(-) diff --git a/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc b/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc index 5da48531be..5bcc91f8bb 100644 --- a/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc +++ b/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc @@ -12,7 +12,7 @@ include::shared:partial$enterprise-license.adoc[] ==== -Use this guide to optimize the performance of Iceberg topics in Redpanda. It covers strategies for improving downstream query performance, tuning the Iceberg translation pipeline, and monitoring translation throughput. 
+This guide covers strategies for optimizing the performance of Iceberg topics in Redpanda, including improving downstream query performance, tuning the Iceberg translation pipeline, and monitoring translation throughput. After reading this page, you will be able to: @@ -22,7 +22,7 @@ After reading this page, you will be able to: == Prerequisites -Before tuning Iceberg performance, you need to be familiar with how Iceberg topics work in Redpanda. See xref:manage:iceberg/about-iceberg-topics.adoc[About Iceberg Topics]. +You must be familiar with how Iceberg topics work in Redpanda. See xref:manage:iceberg/about-iceberg-topics.adoc[About Iceberg Topics]. == Optimize query performance @@ -32,7 +32,7 @@ Query engines read Parquet files from object storage to process Iceberg table da To improve query performance, consider implementing custom https://iceberg.apache.org/docs/nightly/partitioning/[partitioning^] for the Iceberg topic. Use the xref:reference:properties/topic-properties.adoc#redpanda-iceberg-partition-spec[`redpanda.iceberg.partition.spec`] topic property to define the partitioning scheme: -[,bash,] +[,bash] ---- # Create new topic with five topic partitions, replication factor 3, and custom table partitioning for Iceberg rpk topic create -p5 -r3 -c redpanda.iceberg.mode=value_schema_id_prefix -c "redpanda.iceberg.partition.spec=(, , ...)" @@ -50,7 +50,7 @@ To learn more about how partitioning schemes can affect query performance, and f [TIP] ==== -* Partition by columns that you frequently use in queries. Columns with relatively few unique values, also known as low cardinality, are also good candidates for partitioning. +* Partition by columns that you frequently use in queries. Columns with relatively few unique values (low cardinality) are good candidates for partitioning. 
* If you must partition based on columns with high cardinality, for example timestamps, use Iceberg's available transforms such as extracting the year, month, or day to avoid creating too many partitions. Too many partitions can be detrimental to performance because more files need to be scanned and managed. ==== diff --git a/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc b/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc index d4d2c7b446..076cc1a54e 100644 --- a/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc +++ b/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc @@ -246,7 +246,7 @@ iceberg_dlq_table_suffix: _dlq + -- * Replace `` with your bucket name and `` with your Google Cloud project ID. -* You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). The example above uses `_dlq` as the suffix for the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue-dlq[dead-letter queue (DLQ) table]. +* You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). The example above uses `_dlq` as the suffix for the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[dead-letter queue (DLQ) table]. -- + NOTE: If you edit `bootstrap.yml`, you can skip the cluster configuration step in <> and proceed to the next step in that section to enable Iceberg for a topic. @@ -293,7 +293,7 @@ iceberg_dlq_table_suffix: _dlq + -- * Replace `` with your bucket name and `` with your Google Cloud project ID. -* You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). The example above uses `_dlq` as the suffix for the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue-dlq[dead-letter queue (DLQ) table]. +* You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). 
The example above uses `_dlq` as the suffix for the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[dead-letter queue (DLQ) table]. -- ifndef::env-cloud[] diff --git a/modules/manage/pages/iceberg/iceberg-troubleshooting.adoc b/modules/manage/pages/iceberg/iceberg-troubleshooting.adoc index 79161bdf71..9aa3faa91c 100644 --- a/modules/manage/pages/iceberg/iceberg-troubleshooting.adoc +++ b/modules/manage/pages/iceberg/iceberg-troubleshooting.adoc @@ -1,6 +1,10 @@ = Troubleshoot Iceberg Topics -:description: Diagnose and resolve errors in Redpanda Iceberg translation, including dead-letter queue inspection and record reprocessing. +:description: Diagnose and resolve errors in Redpanda Iceberg translation, including dead-letter queue (DLQ) inspection and record reprocessing. :page-categories: Iceberg, Troubleshooting +:page-topic-type: troubleshooting +:personas: ops_admin, streaming_developer +:learning-objective-1: Diagnose Iceberg translation errors using DLQ tables and metrics +:learning-objective-2: Reprocess or drop invalid records from the DLQ table // tag::single-source[] @@ -11,11 +15,16 @@ include::shared:partial$enterprise-license.adoc[] ==== endif::[] -This page covers how to diagnose and resolve errors that occur during Iceberg translation, including working with dead-letter queue (DLQ) tables and handling invalid records. +{description} -== Dead-letter queue (DLQ) +Use this page to: -If Redpanda encounters an error while writing a record to the Iceberg table, Redpanda by default writes the record to a separate dead-letter queue (DLQ) Iceberg table named `<table-name>~dlq`.
The following can cause errors to occur when translating records in the `value_schema_id_prefix` and `value_schema_latest` modes to the Iceberg table format: +* [ ] {learning-objective-1} +* [ ] {learning-objective-2} + +== Dead-letter queue + +If Redpanda encounters an error while writing a record to the Iceberg table, Redpanda by default writes the record to a separate DLQ Iceberg table named `<table-name>~dlq`. The following can cause errors when translating records in the `value_schema_id_prefix` and `value_schema_latest` modes to the Iceberg table format: - Redpanda cannot find the embedded schema ID in the Schema Registry. - Redpanda fails to translate one or more schema data types to an Iceberg type. @@ -62,7 +71,7 @@ The data is in binary format, and the first byte is not `0x00`, indicating that === Reprocess DLQ records -You can apply a transformation and reprocess the record in your data lakehouse to the original Iceberg table. In this case, you have a JSON value represented as a UTF-8 binary. Depending on your query engine, you might need to decode the binary value first before extracting the JSON fields. Some engines may automatically decode the binary value for you: +You can apply a transformation in your data lakehouse and reprocess the record into the original Iceberg table. In this case, you have a JSON value represented as a UTF-8 binary. Depending on your query engine, you might need to decode the binary value first before extracting the JSON fields. Some query engines decode the binary value automatically: .ClickHouse SQL example to reprocess DLQ record [,sql] ---- @@ -87,7 +96,7 @@ FROM ( +---------+--------------+--------------------------+ ---- -You can now insert the transformed record back into the main Iceberg table. Redpanda recommends employing a strategy for exactly-once processing to avoid duplicates when reprocessing records. +You can now insert the transformed record back into the main Iceberg table.
Redpanda recommends using an exactly-once processing strategy to avoid duplicates when reprocessing records. === Drop invalid records @@ -102,8 +111,8 @@ endif::[] The following xref:reference:public-metrics-reference.adoc#iceberg-metrics[Iceberg metrics] help identify translation errors, invalid records, and catalog connectivity issues: -* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_dlq_files_created[`redpanda_iceberg_translation_dlq_files_created`]: Number of dead letter queue (DLQ) Parquet files created. A non-zero and increasing value indicates records are failing to translate. -* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_invalid_records[`redpanda_iceberg_translation_invalid_records`]: Number of invalid records encountered during translation, labeled by cause. +* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_dlq_files_created[`redpanda_iceberg_translation_dlq_files_created`]: Number of DLQ Parquet files created. A non-zero and increasing value indicates records are failing to translate. See <> to examine the failed records. +* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_invalid_records[`redpanda_iceberg_translation_invalid_records`]: Number of invalid records encountered during translation, labeled by cause. See <> to configure how Redpanda handles these records. * xref:reference:public-metrics-reference.adoc#redpanda_iceberg_rest_client_num_commit_table_update_requests_failed[`redpanda_iceberg_rest_client_num_commit_table_update_requests_failed`]: Failed table commit requests to the REST catalog. Applies only when using a REST catalog (`iceberg_catalog_type: rest`). Persistent failures indicate catalog connectivity or permission issues. 
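As a sketch of the decode step described in the reprocessing section above: outside of a query engine, a small script can turn the UTF-8 binary value from a DLQ record into JSON fields. This is not Redpanda tooling; the record value and field names are hypothetical, and the 0x00 magic-byte check follows the common Schema Registry wire format (1 magic byte plus a 4-byte schema ID).

```python
import json

def decode_dlq_value(raw: bytes) -> dict:
    """Decode a DLQ record value that stores JSON as UTF-8 bytes.

    If the first byte is 0x00, assume Schema Registry wire framing
    (magic byte + 4-byte schema ID) and strip it; otherwise treat the
    bytes as plain JSON text, which matches the DLQ case described
    above where the first byte is not 0x00.
    """
    if raw[:1] == b"\x00":
        raw = raw[5:]  # strip magic byte and 4-byte schema ID
    return json.loads(raw.decode("utf-8"))

# Hypothetical DLQ value that failed translation because it carries
# no embedded schema ID:
fields = decode_dlq_value(b'{"user_id": 42, "action": "click"}')
print(fields["user_id"])  # 42
```

After decoding, the extracted fields can be inserted back into the main table with whatever exactly-once strategy your pipeline uses, as recommended above.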
// end::single-source[] diff --git a/modules/manage/pages/iceberg/specify-iceberg-schema.adoc b/modules/manage/pages/iceberg/specify-iceberg-schema.adoc index 006cc152df..672d12cb7e 100644 --- a/modules/manage/pages/iceberg/specify-iceberg-schema.adoc +++ b/modules/manage/pages/iceberg/specify-iceberg-schema.adoc @@ -60,7 +60,7 @@ The following modes are compatible with producing to an Iceberg topic using Redp - `key_value` - Starting in version 25.2, `value_schema_latest` with a JSON schema -Otherwise, records may fail to write to the Iceberg table and instead write to the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue-dlq[dead-letter queue]. +Otherwise, records may fail to write to the Iceberg table and instead write to the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[dead-letter queue]. ==== == Configure Iceberg mode for a topic From 9df8cf545190b8ce4fac32bb18af1f49f8319f59 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Fri, 10 Apr 2026 15:39:23 -0700 Subject: [PATCH 6/6] Apply suggestions from SME review --- .../iceberg/iceberg-performance-tuning.adoc | 34 +++++++++++-------- 1 file changed, 20 insertions(+), 14 deletions(-) diff --git a/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc b/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc index 5bcc91f8bb..80810c1770 100644 --- a/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc +++ b/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc @@ -58,8 +58,8 @@ To learn more about how partitioning schemes can affect query performance, and f Over time, Iceberg translation can produce many small Parquet files, especially with low-throughput topics or short lag targets. Compaction merges small files into larger ones, reducing the number of metadata operations query engines must perform and improving read performance. 
-* Managed service: Some managed query engines and data platforms, such as Snowflake and Databricks, automatically compact Iceberg tables. -* Manual or scheduled compaction: Tools like https://spark.apache.org/[Apache Spark^] can run compaction jobs on a schedule. This is useful if your query engine does not compact automatically. +* Automatic compaction: Some catalog and data platform services, such as AWS Glue and Databricks, automatically compact Iceberg tables. +* Manual or scheduled compaction: Tools like https://spark.apache.org/[Apache Spark^] can run compaction jobs on a schedule. This is useful if your catalog or platform does not compact automatically. If you observe degraded read performance or a high number of small files, investigate whether your query engine supports automatic compaction or schedule periodic compaction jobs. @@ -117,6 +117,21 @@ Increasing the lag target means Iceberg tables receive new data less frequently. `datalake_translator_flush_bytes` and `iceberg_target_lag_ms` work best when tuned together. A high flush threshold combined with a short lag window may not improve file sizes if the lag window expires before enough data has accumulated. ==== +To check the current values of key translation properties: + +[,bash] +---- +rpk cluster config get datalake_translator_flush_bytes +rpk cluster config get iceberg_target_lag_ms +---- + +To check topic-level overrides: + +[,bash] +---- +rpk topic describe <topic-name> -c +---- + === Message size Redpanda has validated 32 MiB as the maximum recommended message size for Iceberg-enabled topics. With large messages, each Parquet file contains fewer records because the flush threshold is reached sooner. This can reduce the efficiency of analytical queries that need to scan many records.
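The interaction between the flush threshold and the lag target described above can be sanity-checked with simple arithmetic. The following sketch (illustrative numbers only, not Redpanda internals) estimates a translator's Parquet file size as the smaller of the flush threshold and the bytes its partitions produce within the lag window:

```python
def estimated_parquet_file_size(flush_bytes: int,
                                produce_rate_bps: float,
                                lag_ms: int) -> float:
    """Rough upper bound on one translator's Parquet file size.

    A translator uploads when it accumulates flush_bytes, or when the
    lag target expires, whichever comes first, so file size is capped
    by the bytes produced within the lag window.
    """
    bytes_in_window = produce_rate_bps * (lag_ms / 1000)
    return min(flush_bytes, bytes_in_window)

MiB = 1024 * 1024
# With the defaults (32 MiB flush threshold, 1-minute lag target), a
# partition producing 100 KiB/s accumulates only ~5.9 MiB per window,
# so raising the flush threshold alone would not yield larger files.
size = estimated_parquet_file_size(32 * MiB, 100 * 1024, 60_000)
print(f"{size / MiB:.1f} MiB")  # 5.9 MiB
```

This is why the note above recommends tuning `datalake_translator_flush_bytes` and `iceberg_target_lag_ms` together: whichever limit is hit first determines the file size.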
@@ -124,11 +139,11 @@ Redpanda has validated 32 MiB as the maximum recommended message size for Iceber If query latency is a concern and your workload produces large messages, consider: * Reducing individual message sizes if your data model allows it. -* Increasing `datalake_translator_flush_bytes` to produce Parquet files with more records per file. +* Increasing `datalake_translator_flush_bytes` and `iceberg_target_lag_ms` to produce Parquet files with more records per file. See <>. -=== Cluster sizing and backpressure +=== Cluster sizing -When you enable Iceberg for any substantial workload and start translating topic data to the Iceberg format, you may see most of your cluster's CPU utilization increase. If this additional workload overwhelms the brokers and causes the Iceberg table lag to exceed the configured target lag, Redpanda automatically applies backpressure to producers to prevent Iceberg tables from lagging further. This ensures that Iceberg tables keep up with the volume of incoming data, but sacrifices ingress throughput of the cluster. +When you enable Iceberg for any substantial workload and start translating topic data to the Iceberg format, you may see CPU utilization increase across most of your cluster. If this additional workload overwhelms the brokers and causes the Iceberg table lag to exceed the configured target lag, Redpanda automatically increases the scheduling priority of Iceberg translation to help it catch up with incoming data. However, this is not a substitute for adequate cluster resources. You may need to increase the size of your Redpanda cluster to accommodate the additional workload. To ensure that your cluster is sized appropriately, contact the Redpanda Customer Success team.
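As a back-of-the-envelope aid for the sizing discussion above, the following sketch estimates aggregate translation throughput. It assumes the roughly 5 MiB/s per-shard rate noted earlier, an even distribution of partitions across shards, and no other bottlenecks; treat it as an estimate, not a guarantee.

```python
def estimated_translation_throughput(partitions: int,
                                     shards: int,
                                     per_shard_mibps: float = 5.0) -> float:
    """Estimate aggregate Iceberg translation throughput in MiB/s.

    Each shard runs one translator that processes only its assigned
    partitions, so throughput is bounded by whichever is smaller:
    the shard count or the partition count.
    """
    return min(partitions, shards) * per_shard_mibps

# A 4-partition topic on a 12-shard cluster leaves 8 translators idle:
print(estimated_translation_throughput(partitions=4, shards=12))  # 20.0
```

If this estimate falls short of your produce rate, either add partitions (if partition-bound) or add shards and brokers (if CPU-bound).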
@@ -144,15 +159,6 @@ Use the following xref:reference:public-metrics-reference.adoc#iceberg-metrics[I For metrics related to DLQ files, invalid records, and catalog commit failures, see xref:manage:iceberg/iceberg-troubleshooting.adoc#troubleshooting-metrics[Troubleshooting metrics]. -To check the current values of key translation cluster properties: - -[,bash] ----- -rpk cluster config get datalake_translator_flush_bytes -rpk cluster config get iceberg_target_lag_ms -rpk cluster config get iceberg_target_backlog_size ----- - [TIP] ==== If translation consistently lags despite available CPU headroom, the workload may be partition-bound. Each core translates its assigned partitions independently, so distributing data across more partitions allows more cores to contribute to translation and can improve total throughput.