What would you like to happen?
The Apache Beam Iceberg connector currently supports only Google Cloud Storage (GCS) out of the box. The expansion service JAR includes iceberg-gcp but does not bundle iceberg-aws, which is required for AWS S3 and S3-compatible storage backends (e.g., MinIO or Supabase Storage).
Attempting to write to an S3-compatible destination using the current expansion service results in the following error when running on Dataflow:
Error message from worker: org.apache.beam.sdk.util.UserCodeException: java.lang.IllegalArgumentException: Cannot initialize FileIO implementation org.apache.iceberg.aws.s3.S3FileIO: Cannot find constructor for interface org.apache.iceberg.io.FileIO
Missing org.apache.iceberg.aws.s3.S3FileIO [java.lang.ClassNotFoundException: org.apache.iceberg.aws.s3.S3FileIO]
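One way to confirm this diagnosis is to check whether the FileIO classes can be loaded from a given classpath. The following is only a minimal sketch, assuming it is run with the same classpath as the expansion service JAR; `ClasspathCheck` and `isLoadable` are hypothetical names used for illustration and are not part of Beam or Iceberg.

```java
class ClasspathCheck {
    /**
     * Returns true if the named class can be located by this class's
     * class loader. The class is not initialized, only looked up.
     */
    public static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // FileIO implementations shipped in iceberg-gcp and iceberg-aws respectively.
        String[] fileIoClasses = {
            "org.apache.iceberg.gcp.gcs.GCSFileIO",
            "org.apache.iceberg.aws.s3.S3FileIO",
        };
        for (String name : fileIoClasses) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```

If the second class is reported as not loadable, the JAR on the classpath most likely lacks iceberg-aws, which would match the ClassNotFoundException above.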
Current workarounds:
- Building and deploying a custom expansion service JAR that includes iceberg-aws
- Using IcebergIO directly (which is generally discouraged in favor of using Managed IO)
Proposal
Bundle iceberg-aws with the official Iceberg expansion service JAR to enable native S3 and S3-compatible storage support.
This would allow writing to S3-compatible destinations using a REST-based catalog configuration such as:
ImmutableMap<String, String> catalogProperties = ImmutableMap.<String, String>builder()
    // REST catalog connection settings
    .put("type", "rest")
    .put("uri", options.getCatalogUri())
    .put("token", options.getCatalogToken())
    .put("warehouse", options.getWarehouse())
    // S3 / S3-compatible storage settings
    .put("client.region", "us-east-1")
    .put("s3.endpoint", options.getS3Endpoint())
    .put("s3.access-key-id", options.getS3AccessKeyId())
    .put("s3.secret-access-key", options.getS3SecretAccessKey())
    .put("s3.path-style-access", "true")
    .put("s3.remote-signing-enabled", "false")
    .build();
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner