diff --git a/docs/pics/cloud/integrations/databricks/storage-direct-access-keys.png b/docs/pics/cloud/integrations/databricks/storage-direct-access-keys.png
new file mode 100644
index 000000000..57da17a99
Binary files /dev/null and b/docs/pics/cloud/integrations/databricks/storage-direct-access-keys.png differ
diff --git a/docs/pics/cloud/integrations/databricks/storage-direct-access-role.png b/docs/pics/cloud/integrations/databricks/storage-direct-access-role.png
new file mode 100644
index 000000000..573adfa3a
Binary files /dev/null and b/docs/pics/cloud/integrations/databricks/storage-direct-access-role.png differ
diff --git a/docs/snippets/cloud/integrations/databricks.mdx b/docs/snippets/cloud/integrations/databricks.mdx
index 9514f122f..e47e8f7ec 100644
--- a/docs/snippets/cloud/integrations/databricks.mdx
+++ b/docs/snippets/cloud/integrations/databricks.mdx
@@ -38,6 +38,91 @@ Then, select your authentication method:
   long-lived personal access tokens.
 
+### Storage Access
+
+Elementary requires access to table history to enable automated monitors, such as volume and freshness monitors.
+You can configure this access in one of the following ways:
+
+#### Option 1: Fetch history using `DESCRIBE HISTORY`
+
+Elementary can fetch table history by running `DESCRIBE HISTORY` queries on your Databricks warehouse.
+In the Elementary UI, choose **None** under **Storage access method**.
+
+This requires `SELECT` access on the relevant tables, as described in the permissions and security section above.
+
+#### Option 2: Credentials vending
+
+Elementary can access storage using temporary credentials issued by Databricks through [credential vending](https://docs.databricks.com/aws/en/external-access/credential-vending).
+In the Elementary UI, choose **Credentials vending** under **Storage access method**.
+
+This requires granting `EXTERNAL USE SCHEMA` on the relevant schemas.
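+
+For example, you can grant this privilege with a SQL statement like the following (the catalog, schema, and principal names are placeholders for your own):
+
+```sql
+-- Allow Elementary's principal to request temporary storage credentials
+-- for tables in this schema via credential vending.
+GRANT EXTERNAL USE SCHEMA ON SCHEMA my_catalog.my_schema TO `elementary-user@example.com`;
+```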
+
+When using this option, Elementary only reads the Delta transaction log files from storage.
+
+#### Option 3: Direct storage access
+
+Elementary can access storage directly using credentials that you configure.
+In the Elementary UI, choose **Direct storage access** under **Storage access method**.
+
+When using this option, Elementary only reads the Delta transaction log files from storage.
+
+For S3-backed Databricks storage, you can configure access in one of the following ways:
+
+__AWS Role authentication__
+
+![Databricks direct storage access using AWS role ARN](/pics/cloud/integrations/databricks/storage-direct-access-role.png)
+
+This is the recommended approach, as it avoids long-lived credentials and follows AWS security best practices.
+After choosing **Direct storage access**, select **AWS role ARN** under **Select S3 authentication method**.
+
+1. Create an IAM role that Elementary can assume.
+2. Select "Another AWS account" as the trusted entity.
+3. Enter Elementary's AWS account ID: `743289191656`.
+4. Optionally, enable an external ID.
+5. Attach a policy that grants read access to the Delta log files.
+
+Use a policy similar to the following:
+
+```json
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Sid": "VisualEditor0",
+      "Effect": "Allow",
+      "Action": [
+        "s3:GetObject",
+        "s3:ListBucket"
+      ],
+      "Resource": [
+        "arn:aws:s3:::databricks-metastore-bucket",
+        "arn:aws:s3:::databricks-metastore-bucket/*_delta_log*"
+      ]
+    }
+  ]
+}
+```
+
+Provide the role ARN in the Elementary UI, along with the external ID if you configured one.
+
+__AWS access keys__
+
+![Databricks direct storage access using AWS access keys](/pics/cloud/integrations/databricks/storage-direct-access-keys.png)
+
+If needed, you can instead provide static AWS credentials.
+After choosing **Direct storage access**, select **Secret access key** under **Select S3 authentication method**.
+
+1. Create an IAM user that Elementary will use for storage access.
+2. Enable programmatic access.
+3. Attach the same read-only S3 policy shown above.
+4. Provide the AWS access key ID and secret access key in the Elementary UI.
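+
+As a sketch, the steps above can also be performed with the AWS CLI (the user name and policy file name are placeholders; the policy document is the read-only S3 policy shown above):
+
+```shell
+# Create a dedicated IAM user for Elementary's storage access.
+aws iam create-user --user-name elementary-delta-log-reader
+
+# Attach the read-only Delta log policy, saved locally as a JSON file.
+aws iam put-user-policy \
+  --user-name elementary-delta-log-reader \
+  --policy-name delta-log-read-only \
+  --policy-document file://delta-log-policy.json
+
+# Generate the access key ID and secret access key to paste into the Elementary UI.
+aws iam create-access-key --user-name elementary-delta-log-reader
+```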
+
 #### Access token (legacy)