diff --git a/docs-data/property-overrides.json b/docs-data/property-overrides.json index b22cd6c0b9..6011cbe92a 100644 --- a/docs-data/property-overrides.json +++ b/docs-data/property-overrides.json @@ -965,7 +965,7 @@ "description": "Default value for the `redpanda.iceberg.invalid.record.action` topic property.", "related_topics": [ "self-managed-only: xref:reference:properties/topic-properties.adoc#redpanda-iceberg-invalid-record-action[`redpanda.iceberg.invalid.record.action`]", - "self-managed-only: xref:manage:iceberg/about-iceberg-topics.adoc#troubleshoot-errors[Troubleshoot errors]" + "self-managed-only: xref:manage:iceberg/iceberg-troubleshooting.adoc[Troubleshoot Iceberg Topics]" ], "config_scope": "cluster" }, @@ -1710,7 +1710,7 @@ "redpanda.iceberg.invalid.record.action": { "description": "Whether to write invalid records to a dead-letter queue (DLQ).", "related_topics": [ - "xref:manage:iceberg/about-iceberg-topics.adoc#troubleshoot-errors[Troubleshoot errors]" + "xref:manage:iceberg/iceberg-troubleshooting.adoc[Troubleshoot Iceberg Topics]" ], "config_scope": "topic" }, @@ -1724,7 +1724,7 @@ "redpanda.iceberg.partition.spec": { "description": "The link:https://iceberg.apache.org/docs/nightly/partitioning/[partitioning^] specification for the Iceberg table.", "related_topics": [ - "xref:manage:iceberg/about-iceberg-topics.adoc#use-custom-partitioning[Use custom partitioning]" + "xref:manage:iceberg/iceberg-performance-tuning.adoc#use-custom-partitioning[Use custom partitioning]" ], "config_scope": "topic" }, diff --git a/modules/manage/pages/iceberg/about-iceberg-topics.adoc b/modules/manage/pages/iceberg/about-iceberg-topics.adoc index d785e11fd7..6b179d6f22 100644 --- a/modules/manage/pages/iceberg/about-iceberg-topics.adoc +++ b/modules/manage/pages/iceberg/about-iceberg-topics.adoc @@ -312,136 +312,11 @@ Querying the Iceberg table for `demo-topic` includes the new column `ts`: +---------+--------------+--------------------------+ ---- -== 
Troubleshoot errors - -If Redpanda encounters an error while writing a record to the Iceberg table, Redpanda by default writes the record to a separate dead-letter queue (DLQ) Iceberg table named `~dlq`. The following can cause errors to occur when translating records in the `value_schema_id_prefix` and `value_schema_latest` modes to the Iceberg table format: - -- Redpanda cannot find the embedded schema ID in the Schema Registry. -- Redpanda fails to translate one or more schema data types to an Iceberg type. -- In `value_schema_id_prefix` mode, you do not use the Schema Registry wire format with the magic byte. - -The DLQ table itself uses the `key_value` schema, consisting of two columns: the record metadata including the key, and a binary column for the record's value. - -NOTE: Topic property misconfiguration, such as xref:manage:iceberg/specify-iceberg-schema.adoc#override-value-schema-latest-default[overriding the default behavior of `value_schema_latest` mode] but not specifying the fully qualified Protobuf message name, does not cause records to be written to the DLQ table. Instead, Redpanda pauses the topic data translation to the Iceberg table until you fix the misconfiguration. - -=== Inspect DLQ table - -You can inspect the DLQ table for records that failed to write to the Iceberg table, and you can take further action on these records, such as transforming and reprocessing them, or debugging issues that occurred upstream. 
- -The following example produces a record to a topic named `ClickEvent` and does not use the Schema Registry wire format that includes the magic byte and schema ID: - -[,bash,role=no-copy] ----- -echo '"key1" {"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}' | rpk topic produce ClickEvent --format='%k %v\n' ----- - -Querying the DLQ table returns the record that was not translated: - -[,sql] ----- -SELECT - value -FROM ."ClickEvent~dlq"; -- Fully qualified table name ----- - -[,bash,role=no-copy] ----- -+-------------------------------------------------+ -| value | -+-------------------------------------------------+ -| 7b 22 75 73 65 72 5f 69 64 22 3a 32 33 32 34 2c | -| 22 65 76 65 6e 74 5f 74 79 70 65 22 3a 22 42 55 | -| 54 54 4f 4e 5f 43 4c 49 43 4b 22 2c 22 74 73 22 | -| 3a 22 32 30 32 34 2d 31 31 2d 32 35 54 32 30 3a | -| 32 33 3a 35 39 2e 33 38 30 5a 22 7d | -+-------------------------------------------------+ ----- - -The data is in binary format, and the first byte is not `0x00`, indicating that it was not produced with a schema. - -=== Reprocess DLQ records - -You can apply a transformation and reprocess the record in your data lakehouse to the original Iceberg table. In this case, you have a JSON value represented as a UTF-8 binary. Depending on your query engine, you might need to decode the binary value first before extracting the JSON fields. 
Some engines may automatically decode the binary value for you: - -.ClickHouse SQL example to reprocess DLQ record -[,sql] ----- -SELECT - CAST(jsonExtractString(json, 'user_id') AS Int32) AS user_id, - jsonExtractString(json, 'event_type') AS event_type, - jsonExtractString(json, 'ts') AS ts -FROM ( - SELECT - CAST(value AS String) AS json - FROM .`ClickEvent~dlq` -- Ensure that the table name is properly parsed -); ----- - -[,bash,role=no-copy] ----- -+---------+--------------+--------------------------+ -| user_id | event_type | ts | -+---------+--------------+--------------------------+ -| 2324 | BUTTON_CLICK | 2024-11-25T20:23:59.380Z | -+---------+--------------+--------------------------+ ----- - -You can now insert the transformed record back into the main Iceberg table. Redpanda recommends employing a strategy for exactly-once processing to avoid duplicates when reprocessing records. - -=== Drop invalid records - -ifndef::env-cloud[] -To disable the default behavior and drop an invalid record, set the xref:reference:properties/topic-properties.adoc#redpanda-iceberg-invalid-record-action[`redpanda.iceberg.invalid.record.action`] topic property to `drop`. You can also configure the default cluster-wide behavior for invalid records by setting the `iceberg_invalid_record_action` property. -endif::[] -ifdef::env-cloud[] -To disable the default behavior and drop an invalid record, set the `redpanda.iceberg.invalid.record.action` topic property to `drop`. You can also configure the default cluster-wide behavior for invalid records by setting the `iceberg_invalid_record_action` property. -endif::[] - -== Performance considerations - -When you enable Iceberg for any substantial workload and start translating topic data to the Iceberg format, you may see most of your cluster's CPU utilization increase. 
If this additional workload overwhelms the brokers and causes the Iceberg table lag to exceed the configured target lag, Redpanda automatically applies backpressure to producers to prevent Iceberg tables from lagging further. This ensures that Iceberg tables keep up with the volume of incoming data, but sacrifices ingress throughput of the cluster. - -You may need to increase the size of your Redpanda cluster to accommodate the additional workload. To ensure that your cluster is sized appropriately, contact the Redpanda Customer Success team. - -=== Use custom partitioning - -ifndef::env-cloud[] -To improve query performance, consider implementing custom https://iceberg.apache.org/docs/nightly/partitioning/[partitioning^] for the Iceberg topic. Use the xref:reference:properties/topic-properties.adoc#redpanda-iceberg-partition-spec[`redpanda.iceberg.partition.spec`] topic property to define the partitioning scheme: -endif::[] -ifdef::env-cloud[] -To improve query performance, consider implementing custom https://iceberg.apache.org/docs/nightly/partitioning/[partitioning^] for the Iceberg topic. Use the `redpanda.iceberg.partition.spec` topic property to define the partitioning scheme: -endif::[] - -[,bash,] ----- -# Create new topic with five topic partitions, replication factor 3, and custom table partitioning for Iceberg -rpk topic create -p5 -r3 -c redpanda.iceberg.mode=value_schema_id_prefix -c "redpanda.iceberg.partition.spec=(, , ...)" ----- - -Valid `` values include a source column name or a transformation of a column. The columns referenced can be Redpanda-defined (such as `redpanda.timestamp`) or user-defined based on a schema that you register for the topic. The Iceberg table stores records that share different partition key values in separate files based on this specification. - -For example: - -* To partition the table by a single key, such as a column `col1`, use: `redpanda.iceberg.partition.spec=(col1)`. 
-* To partition by multiple columns, use a comma-separated list: `redpanda.iceberg.partition.spec=(col1, col2)`. -* To partition by the year of a timestamp column `ts1`, and a string column `col1`, use: `redpanda.iceberg.partition.spec=(year(ts1), col1)`. - -To learn more about how partitioning schemes can affect query performance, and for details on the partitioning specification such as allowed transforms, see the https://iceberg.apache.org/spec/#partitioning[Apache Iceberg documentation^]. - -[TIP] -==== -* Partition by columns that you frequently use in queries. Columns with relatively few unique values, also known as low cardinality, are also good candidates for partitioning. -* If you must partition based on columns with high cardinality, for example timestamps, use Iceberg's available transforms such as extracting the year, month, or day to avoid creating too many partitions. Too many partitions can be detrimental to performance because more files need to be scanned and managed. -==== - -=== Avoid high column count - -A high column count or schema field count results in more overhead when translating topics to the Iceberg table format. Small message sizes can also increase CPU utilization. To minimize the performance impact on your cluster, keep to a low column count and large message size for Iceberg topics. 
- == Next steps * xref:manage:iceberg/use-iceberg-catalogs.adoc[] -* xref:manage:iceberg/migrate-to-iceberg-topics.adoc[Migrate existing Iceberg integrations to Iceberg Topics] +* xref:manage:iceberg/iceberg-performance-tuning.adoc[] +* xref:manage:iceberg/iceberg-troubleshooting.adoc[] == Suggested reading diff --git a/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc b/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc new file mode 100644 index 0000000000..80810c1770 --- /dev/null +++ b/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc @@ -0,0 +1,165 @@ += Tune Performance for Iceberg Topics +:description: Optimize query performance and translation throughput for Iceberg topics with partitioning, compaction, flush threshold tuning, and cluster sizing guidance. +:page-categories: Iceberg, Management +:page-topic-type: best-practices +:personas: ops_admin, streaming_developer +:learning-objective-1: Apply partitioning and compaction strategies to improve query performance +:learning-objective-2: Choose appropriate flush threshold and lag target values for your workload +:learning-objective-3: Identify translation performance signals using Iceberg metrics + +[NOTE] +==== +include::shared:partial$enterprise-license.adoc[] +==== + +This guide covers strategies for optimizing the performance of Iceberg topics in Redpanda, including improving downstream query performance, tuning the Iceberg translation pipeline, and monitoring translation throughput. + +After reading this page, you will be able to: + +* [ ] {learning-objective-1} +* [ ] {learning-objective-2} +* [ ] {learning-objective-3} + +== Prerequisites + +You must be familiar with how Iceberg topics work in Redpanda. See xref:manage:iceberg/about-iceberg-topics.adoc[About Iceberg Topics]. + +== Optimize query performance + +Query engines read Parquet files from object storage to process Iceberg table data. 
Partitioning, compaction, and schema design affect how efficiently those reads perform.
+
+=== Use custom partitioning
+
+To improve query performance, consider implementing custom https://iceberg.apache.org/docs/nightly/partitioning/[partitioning^] for the Iceberg topic. Use the xref:reference:properties/topic-properties.adoc#redpanda-iceberg-partition-spec[`redpanda.iceberg.partition.spec`] topic property to define the partitioning scheme:
+
+[,bash]
+----
+# Create new topic with five topic partitions, replication factor 3, and custom table partitioning for Iceberg
+rpk topic create <topic-name> -p5 -r3 -c redpanda.iceberg.mode=value_schema_id_prefix -c "redpanda.iceberg.partition.spec=(<partition-key>, <partition-key>, ...)"
+----
+
+Valid `<partition-key>` values include a source column name or a transformation of a column. The columns referenced can be Redpanda-defined (such as `redpanda.timestamp`) or user-defined based on a schema that you register for the topic. The Iceberg table stores records that share different partition key values in separate files based on this specification.
+
+For example:
+
+* To partition the table by a single key, such as a column `col1`, use: `redpanda.iceberg.partition.spec=(col1)`.
+* To partition by multiple columns, use a comma-separated list: `redpanda.iceberg.partition.spec=(col1, col2)`.
+* To partition by the year of a timestamp column `ts1`, and a string column `col1`, use: `redpanda.iceberg.partition.spec=(year(ts1), col1)`.
+
+To learn more about how partitioning schemes can affect query performance, and for details on the partitioning specification such as allowed transforms, see the https://iceberg.apache.org/spec/#partitioning[Apache Iceberg documentation^].
+
+[TIP]
+====
+* Partition by columns that you frequently use in queries. Columns with relatively few unique values (low cardinality) are good candidates for partitioning.
+* If you must partition based on columns with high cardinality, for example timestamps, use Iceberg's available transforms such as extracting the year, month, or day to avoid creating too many partitions. Too many partitions can be detrimental to performance because more files need to be scanned and managed. +==== + +=== Compact Iceberg tables + +Over time, Iceberg translation can produce many small Parquet files, especially with low-throughput topics or short lag targets. Compaction merges small files into larger ones, reducing the number of metadata operations query engines must perform and improving read performance. + +* Automatic compaction: Some catalog and data platform services, such as AWS Glue and Databricks, automatically compact Iceberg tables. +* Manual or scheduled compaction: Tools like https://spark.apache.org/[Apache Spark^] can run compaction jobs on a schedule. This is useful if your catalog or platform does not compact automatically. + +If you observe degraded read performance or a high number of small files, investigate whether your query engine supports automatic compaction or schedule periodic compaction jobs. + +=== Avoid high column count + +A high column count or schema field count results in more overhead when translating topics to the Iceberg table format. Small message sizes can also increase CPU utilization. To minimize the performance impact on your cluster, keep to a low column count and large message size for Iceberg topics. + +== Tune translation performance + +Translation is the process in which Redpanda converts topic data into Parquet files for the Iceberg table. Each round of translation processes one topic partition at a time. + +Under typical conditions, Iceberg translation has the following performance characteristics: + +* Throughput: Approximately 5 MiB/s per core. 
+* Flush threshold: Controlled by xref:reference:properties/cluster-properties.adoc#datalake_translator_flush_bytes[`datalake_translator_flush_bytes`] (default: 32 MiB). Each translation process uploads its on-disk data when accumulated data reaches this threshold. This is the primary control for Parquet file size.
+* Lag target: Controlled by xref:reference:properties/cluster-properties.adoc#iceberg_target_lag_ms[`iceberg_target_lag_ms`] (default: 1 minute). Redpanda tries to commit all data produced to an Iceberg-enabled topic within this window.
+
+The flush threshold and lag target together determine the size of the Parquet files written to object storage. Larger Parquet files generally improve downstream query performance by reducing the number of metadata operations query engines must perform.
+
+=== Tune flush threshold and lag target
+
+Increase both the flush threshold and the lag target together to produce larger Parquet files with more records per file. This is the primary way to tune Iceberg translation performance.
+
+. Increase `datalake_translator_flush_bytes` to control the size of Parquet files. A good starting value depends on your workload:
++
+[,bash]
+----
+rpk cluster config set datalake_translator_flush_bytes <value>
+----
++
+For example, to set a 64 MiB flush threshold:
++
+[,bash]
+----
+rpk cluster config set datalake_translator_flush_bytes 67108864
+----
+
+. Increase `iceberg_target_lag_ms` to give translators more time to accumulate data before committing:
++
+[,bash]
+----
+rpk cluster config set iceberg_target_lag_ms 300000
+----
++
+You can also set the lag target per topic using the
+xref:reference:properties/topic-properties.adoc#redpanda-iceberg-target-lag-ms[`redpanda.iceberg.target.lag.ms`] topic property.
++
+[NOTE]
+====
+Increasing the lag target means Iceberg tables receive new data less frequently. Choose a lag value that balances file efficiency against how current your downstream data must be.
+==== + +[TIP] +==== +`datalake_translator_flush_bytes` and `iceberg_target_lag_ms` work best when tuned together. A high flush threshold combined with a short lag window may not improve file sizes if the lag window expires before enough data has accumulated. +==== + +To check the current values of key translation properties: + +[,bash] +---- +rpk cluster config get datalake_translator_flush_bytes +rpk cluster config get iceberg_target_lag_ms +---- + +To check topic-level overrides: + +[,bash] +---- +rpk topic describe -c +---- + +=== Message size + +Redpanda has validated 32 MiB as the maximum recommended message size for Iceberg-enabled topics. With large messages, each Parquet file contains fewer records because the flush threshold is reached sooner. This can reduce the efficiency of analytical queries that need to scan many records. + +If query latency is a concern and your workload produces large messages, consider: + +* Reducing individual message sizes if your data model allows it. +* Increasing `datalake_translator_flush_bytes` and `iceberg_target_lag_ms` to produce Parquet files with more records per file. See <>. + +=== Cluster sizing + +When you enable Iceberg for any substantial workload and start translating topic data to the Iceberg format, you may see most of your cluster's CPU utilization increase. If this additional workload overwhelms the brokers and causes the Iceberg table lag to exceed the configured target lag, Redpanda automatically increases the scheduling priority of Iceberg translation to help it catch up with incoming data. However, this does not substitute for adequate cluster resources. + +You may need to increase the size of your Redpanda cluster to accommodate the additional workload. To ensure that your cluster is sized appropriately, contact the Redpanda Customer Success team. 
+ +=== Monitor translation performance + +Use the following xref:reference:public-metrics-reference.adoc#iceberg-metrics[Iceberg metrics] to understand whether translation is keeping pace with incoming data: + +* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_raw_bytes_processed[`redpanda_iceberg_translation_raw_bytes_processed`]: Total raw bytes consumed for translation input. Use this to monitor input throughput and compare against the expected 5 MiB/s per core baseline. +* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_parquet_bytes_added[`redpanda_iceberg_translation_parquet_bytes_added`]: Total bytes written to Parquet files. Divide by `redpanda_iceberg_translation_files_created` to estimate the average file size produced by your workload. +* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_files_created[`redpanda_iceberg_translation_files_created`]: Number of Parquet files created. A high file creation rate relative to bytes added indicates many small files. Consider increasing `datalake_translator_flush_bytes` and `iceberg_target_lag_ms`. +* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_parquet_rows_added[`redpanda_iceberg_translation_parquet_rows_added`]: Total rows written to Parquet files. Useful for understanding record-level throughput. +* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_translations_finished[`redpanda_iceberg_translation_translations_finished`]: Number of completed translator executions. A stalling or zero rate indicates translation has stopped. + +For metrics related to DLQ files, invalid records, and catalog commit failures, see xref:manage:iceberg/iceberg-troubleshooting.adoc#troubleshooting-metrics[Troubleshooting metrics]. + +[TIP] +==== +If translation consistently lags despite available CPU headroom, the workload may be partition-bound. 
Each core translates its assigned partitions independently, so distributing data across more partitions allows more cores to contribute to translation and can improve total throughput.
+====
diff --git a/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc b/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc
index ba52d60375..076cc1a54e 100644
--- a/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc
+++ b/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc
@@ -246,7 +246,7 @@ iceberg_dlq_table_suffix: _dlq
+
--
* Replace `<bucket-name>` with your bucket name and `<project-id>` with your Google Cloud project ID.
-* You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). The example above uses `_dlq` as the suffix for the xref:manage:iceberg/about-iceberg-topics.adoc#troubleshoot-errors[dead-letter queue (DLQ) table].
+* You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). The example above uses `_dlq` as the suffix for the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[dead-letter queue (DLQ) table].
--
+
NOTE: If you edit `bootstrap.yml`, you can skip the cluster configuration step in <> and proceed to the next step in that section to enable Iceberg for a topic.
@@ -293,7 +293,7 @@ iceberg_dlq_table_suffix: _dlq
+
--
* Replace `<bucket-name>` with your bucket name and `<project-id>` with your Google Cloud project ID.
-* You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). The example above uses `_dlq` as the suffix for the xref:manage:iceberg/about-iceberg-topics.adoc#troubleshoot-errors[dead-letter queue (DLQ) table].
+* You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). The example above uses `_dlq` as the suffix for the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[dead-letter queue (DLQ) table].
--

ifndef::env-cloud[]

diff --git a/modules/manage/pages/iceberg/iceberg-troubleshooting.adoc b/modules/manage/pages/iceberg/iceberg-troubleshooting.adoc
new file mode 100644
index 0000000000..9aa3faa91c
--- /dev/null
+++ b/modules/manage/pages/iceberg/iceberg-troubleshooting.adoc
@@ -0,0 +1,118 @@
+= Troubleshoot Iceberg Topics
+:description: Diagnose and resolve errors in Redpanda Iceberg translation, including dead-letter queue (DLQ) inspection and record reprocessing.
+:page-categories: Iceberg, Troubleshooting
+:page-topic-type: troubleshooting
+:personas: ops_admin, streaming_developer
+:learning-objective-1: Diagnose Iceberg translation errors using DLQ tables and metrics
+:learning-objective-2: Reprocess or drop invalid records from the DLQ table
+
+// tag::single-source[]
+
+ifndef::env-cloud[]
+[NOTE]
+====
+include::shared:partial$enterprise-license.adoc[]
+====
+endif::[]
+
+{description}
+
+Use this page to:
+
+* [ ] {learning-objective-1}
+* [ ] {learning-objective-2}
+
+== Dead-letter queue
+
+If Redpanda encounters an error while writing a record to the Iceberg table, Redpanda by default writes the record to a separate DLQ Iceberg table named `<topic-name>~dlq`. The following can cause errors to occur when translating records in the `value_schema_id_prefix` and `value_schema_latest` modes to the Iceberg table format:
+
+- Redpanda cannot find the embedded schema ID in the Schema Registry.
+- Redpanda fails to translate one or more schema data types to an Iceberg type.
+- In `value_schema_id_prefix` mode, you do not use the Schema Registry wire format with the magic byte.
+
+The DLQ table itself uses the `key_value` schema, consisting of two columns: the record metadata including the key, and a binary column for the record's value.
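For reference, the Schema Registry wire format mentioned above frames each serialized value with a magic byte (`0x00`) followed by a four-byte, big-endian schema ID. The following Python sketch is illustrative only (it is not part of Redpanda or `rpk`) and shows how to tell framed values apart from bare payloads like the ones that land in the DLQ:

```python
import struct

def parse_wire_format(value: bytes):
    """Return the schema ID if `value` uses the Schema Registry wire format
    (0x00 magic byte + 4-byte big-endian schema ID), else None."""
    if len(value) >= 5 and value[0] == 0x00:
        return struct.unpack(">I", value[1:5])[0]
    # Not wire format: in value_schema_id_prefix mode such a record is
    # treated as invalid and written to the DLQ table by default.
    return None

# A bare JSON payload has no framing, so parsing fails:
print(parse_wire_format(b'{"user_id":2324}'))                            # None
# The same payload framed with the magic byte and schema ID 3:
print(parse_wire_format(b"\x00\x00\x00\x00\x03" + b'{"user_id":2324}'))  # 3
```

Serializers that integrate with the Schema Registry add this framing automatically; records produced with plain tools (such as `echo | rpk topic produce`) do not have it.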
+
+NOTE: Topic property misconfiguration, such as xref:manage:iceberg/specify-iceberg-schema.adoc#override-value-schema-latest-default[overriding the default behavior of `value_schema_latest` mode] but not specifying the fully qualified Protobuf message name, does not cause records to be written to the DLQ table. Instead, Redpanda pauses the topic data translation to the Iceberg table until you fix the misconfiguration.
+
+=== Inspect DLQ table
+
+You can inspect the DLQ table for records that failed to write to the Iceberg table, and you can take further action on these records, such as transforming and reprocessing them, or debugging issues that occurred upstream.
+
+The following example produces a record to a topic named `ClickEvent` and does not use the Schema Registry wire format that includes the magic byte and schema ID:
+
+[,bash,role=no-copy]
+----
+echo '"key1" {"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}' | rpk topic produce ClickEvent --format='%k %v\n'
+----
+
+Querying the DLQ table returns the record that was not translated:
+
+[,sql]
+----
+SELECT
+  value
+FROM <catalog>."ClickEvent~dlq"; -- Fully qualified table name
+----
+
+[,bash,role=no-copy]
+----
++-------------------------------------------------+
+| value                                           |
++-------------------------------------------------+
+| 7b 22 75 73 65 72 5f 69 64 22 3a 32 33 32 34 2c |
+| 22 65 76 65 6e 74 5f 74 79 70 65 22 3a 22 42 55 |
+| 54 54 4f 4e 5f 43 4c 49 43 4b 22 2c 22 74 73 22 |
+| 3a 22 32 30 32 34 2d 31 31 2d 32 35 54 32 30 3a |
+| 32 33 3a 35 39 2e 33 38 30 5a 22 7d             |
++-------------------------------------------------+
+----
+
+The data is in binary format, and the first byte is not `0x00`, indicating that it was not produced with a schema.
+
+=== Reprocess DLQ records
+
+You can apply a transformation in your data lakehouse and reprocess the record into the original Iceberg table. In this case, you have a JSON value represented as a UTF-8 binary.
Depending on your query engine, you might need to decode the binary value first before extracting the JSON fields. Some query engines decode the binary value automatically:
+
+.ClickHouse SQL example to reprocess DLQ record
+[,sql]
+----
+SELECT
+  CAST(jsonExtractString(json, 'user_id') AS Int32) AS user_id,
+  jsonExtractString(json, 'event_type') AS event_type,
+  jsonExtractString(json, 'ts') AS ts
+FROM (
+  SELECT
+    CAST(value AS String) AS json
+  FROM <database>.`ClickEvent~dlq` -- Ensure that the table name is properly parsed
+);
+----
+
+[,bash,role=no-copy]
+----
++---------+--------------+--------------------------+
+| user_id | event_type   | ts                       |
++---------+--------------+--------------------------+
+| 2324    | BUTTON_CLICK | 2024-11-25T20:23:59.380Z |
++---------+--------------+--------------------------+
+----
+
+You can now insert the transformed record back into the main Iceberg table. Redpanda recommends using an exactly-once processing strategy to avoid duplicates when reprocessing records.
+
+=== Drop invalid records
+
+ifndef::env-cloud[]
+To disable the default behavior and drop an invalid record, set the xref:reference:properties/topic-properties.adoc#redpanda-iceberg-invalid-record-action[`redpanda.iceberg.invalid.record.action`] topic property to `drop`. You can also configure the default cluster-wide behavior for invalid records by setting the `iceberg_invalid_record_action` property.
+endif::[]
+ifdef::env-cloud[]
+To disable the default behavior and drop an invalid record, set the `redpanda.iceberg.invalid.record.action` topic property to `drop`. You can also configure the default cluster-wide behavior for invalid records by setting the `iceberg_invalid_record_action` property.
+endif::[]
+
+== Troubleshooting metrics
+
+The following xref:reference:public-metrics-reference.adoc#iceberg-metrics[Iceberg metrics] help identify translation errors, invalid records, and catalog connectivity issues:
+
+* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_dlq_files_created[`redpanda_iceberg_translation_dlq_files_created`]: Number of DLQ Parquet files created. A non-zero and increasing value indicates records are failing to translate. See <<inspect-dlq-table>> to examine the failed records.
+* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_invalid_records[`redpanda_iceberg_translation_invalid_records`]: Number of invalid records encountered during translation, labeled by cause. See <<drop-invalid-records>> to configure how Redpanda handles these records.
+* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_rest_client_num_commit_table_update_requests_failed[`redpanda_iceberg_rest_client_num_commit_table_update_requests_failed`]: Failed table commit requests to the REST catalog. Applies only when using a REST catalog (`iceberg_catalog_type: rest`). Persistent failures indicate catalog connectivity or permission issues.
+
+// end::single-source[]
diff --git a/modules/manage/pages/iceberg/specify-iceberg-schema.adoc b/modules/manage/pages/iceberg/specify-iceberg-schema.adoc
index d8ccd38c37..672d12cb7e 100644
--- a/modules/manage/pages/iceberg/specify-iceberg-schema.adoc
+++ b/modules/manage/pages/iceberg/specify-iceberg-schema.adoc
@@ -60,7 +60,7 @@ The following modes are compatible with producing to an Iceberg topic using Redp
- `key_value`
- Starting in version 25.2, `value_schema_latest` with a JSON schema

-Otherwise, records may fail to write to the Iceberg table and instead write to the xref:manage:iceberg/about-iceberg-topics.adoc#manage-dead-letter-queue[dead-letter queue].
+Otherwise, records may fail to write to the Iceberg table and instead write to the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[dead-letter queue]. ==== == Configure Iceberg mode for a topic @@ -184,7 +184,7 @@ CREATE TABLE ClickEvent ( As you produce records to the topic, the data also becomes available in object storage for Iceberg-compatible clients to consume. You can use the same analytical tools to xref:manage:iceberg/query-iceberg-topics.adoc[read the Iceberg topic data] in a data lake as you would for a relational database. -If Redpanda fails to translate the record to the columnar format as defined by the schema, it writes the record to a dead-letter queue (DLQ) table. See xref:manage:iceberg/about-iceberg-topics.adoc#troubleshoot-errors[Troubleshoot errors] for more information. +If Redpanda fails to translate the record to the columnar format as defined by the schema, it writes the record to a dead-letter queue (DLQ) table. See xref:manage:iceberg/iceberg-troubleshooting.adoc[Troubleshoot Iceberg Topics] for more information. NOTE: You cannot use schemas to parse or decode record keys for Iceberg. The record keys are always stored in binary format in the `redpanda.key` column.