docs/snippets/cloud/integrations/databricks.mdx

### Storage Access

Elementary requires access to table history to enable automated monitors, such as the volume and freshness monitors.
You can configure this in one of the following ways:

#### Option 1: Fetch history using `DESCRIBE HISTORY`

Elementary can fetch the table history by running `DESCRIBE HISTORY` queries on your Databricks warehouse.
In the Elementary UI, choose **None** under **Storage access method**.

This requires `SELECT` access on the relevant tables, as described in the permissions and security section above.
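As an illustration, the per-table query this option relies on can be sketched in Python; the catalog, schema, and table names below are placeholders, not values Elementary requires:

```python
def describe_history_query(catalog: str, schema: str, table: str) -> str:
    """Build a DESCRIBE HISTORY statement for a fully qualified Delta table.

    Identifiers are backtick-quoted, as Databricks SQL expects for names
    that may contain special characters.
    """
    qualified = ".".join(f"`{part}`" for part in (catalog, schema, table))
    return f"DESCRIBE HISTORY {qualified}"
```

For example, `describe_history_query("main", "sales", "orders")` yields ``DESCRIBE HISTORY `main`.`sales`.`orders` ``.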

#### Option 2: Credentials vending

Elementary can access the storage using temporary credentials issued by Databricks through [credential vending](https://docs.databricks.com/aws/en/external-access/credential-vending).
In the Elementary UI, choose **Credentials vending** under **Storage access method**.

This requires granting `EXTERNAL USE SCHEMA` on the relevant schemas.
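A grant along these lines would be issued once per schema; the catalog, schema, and principal names below are illustrative. A small helper to build the statement:

```python
def external_use_schema_grant(catalog: str, schema: str, principal: str) -> str:
    """Build a GRANT EXTERNAL USE SCHEMA statement for one schema.

    The principal would typically be the service principal or user that
    Elementary connects with (an illustrative name here).
    """
    return (
        f"GRANT EXTERNAL USE SCHEMA ON SCHEMA `{catalog}`.`{schema}` "
        f"TO `{principal}`"
    )
```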

When using this option, Elementary only reads the Delta transaction log files from storage.

#### Option 3: Direct storage access

Elementary can access the storage directly using credentials that you configure.
In the Elementary UI, choose **Direct storage access** under **Storage access method**.

When using this option, Elementary only reads the Delta transaction log files from storage.

For S3-backed Databricks storage, you can configure access in one of the following ways:

__AWS Role authentication__

<img
src="/pics/cloud/integrations/databricks/storage-direct-access-role.png"
alt="Databricks direct storage access using AWS role ARN"
/>

This is the recommended approach, as it provides better security and follows AWS best practices.
After choosing **Direct storage access**, select **AWS role ARN** under **Select S3 authentication method**.

1. Create an IAM role that Elementary can assume.
2. Select **Another AWS account** as the trusted entity.
3. Enter Elementary's AWS account ID: `743289191656`.
4. Optionally enable an external ID.
5. Attach a policy that grants read access to the Delta log files.

Use a policy similar to the following:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::databricks-metastore-bucket",
        "arn:aws:s3:::databricks-metastore-bucket/*_delta_log*"
      ]
    }
  ]
}
```
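The `Resource` entries above generalize to any metastore bucket (the bucket name shown is illustrative); a small helper makes the pattern explicit:

```python
def delta_log_policy_resources(bucket: str) -> list[str]:
    """S3 resources the read-only policy needs:

    - the bucket itself, for s3:ListBucket
    - any object key containing `_delta_log`, for s3:GetObject,
      which scopes reads to Delta transaction log files only
    """
    return [
        f"arn:aws:s3:::{bucket}",
        f"arn:aws:s3:::{bucket}/*_delta_log*",
    ]
```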

Provide the role ARN in the Elementary UI, along with the external ID if you configured one.

__AWS access keys__

<img
src="/pics/cloud/integrations/databricks/storage-direct-access-keys.png"
alt="Databricks direct storage access using AWS access keys"
/>

If the role-based approach is not an option, you can provide AWS access keys directly instead.
After choosing **Direct storage access**, select **Secret access key** under **Select S3 authentication method**.

1. Create an IAM user that Elementary will use for storage access.
2. Enable programmatic access.
3. Attach the same read-only S3 policy shown above.
4. Provide the AWS access key ID and secret access key in the Elementary UI.

#### Access token (legacy)
