SQL GA - Get started #571
base: rp-sql
Changes from all commits
a2ee5eb
9fe925a
b787bf7
6caf890
469c5e8
61bd49e
76d113a
fa463e1
6eb3a66
50d7131
d525341
a0b51f4
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,181 @@ | ||
| = Enable Redpanda SQL on a BYOC cluster | ||
| :description: Enable the Redpanda SQL engine on a BYOC cluster so you can query streaming data with standard PostgreSQL syntax. | ||
| :page-topic-type: how-to | ||
| :personas: platform_admin, data_engineer | ||
| :learning-objective-1: Enable Redpanda SQL on a new or existing BYOC cluster | ||
| :learning-objective-2: Scale or disable the SQL engine using the Cloud API | ||
| :learning-objective-3: Verify that the SQL engine is running and ready to accept connections | ||
|
|
||
| Enable Redpanda SQL on a BYOC cluster so you can query streaming data in Redpanda topics using standard PostgreSQL syntax. You can also query Iceberg-enabled topics alongside their Iceberg-translated history. See xref:sql:query-data/query-iceberg-topics.adoc[Query Iceberg topics] for that workflow. | ||
|
|
||
| After reading this page, you will be able to: | ||
|
|
||
| * [ ] {learning-objective-1} | ||
| * [ ] {learning-objective-2} | ||
| * [ ] {learning-objective-3} | ||
|
|
||
| == Prerequisites | ||
|
|
||
| To enable the Redpanda SQL engine, you need: | ||
|
|
||
| * Admin permissions in your Redpanda Cloud organization. | ||
| * If using the link:/api/doc/cloud-controlplane/topic/topic-cloud-api-overview[Cloud API] to enable SQL, a valid bearer token for the API. See link:/api/doc/cloud-controlplane/authentication[Authenticate to the Cloud API]. | ||
|
|
||
| == Enable Redpanda SQL | ||
|
|
||
| You can enable Redpanda SQL when you create a new BYOC cluster or on an existing cluster. | ||
|
|
||
| === On a new cluster | ||
|
|
||
| [tabs] | ||
| ===== | ||
| Cloud Console:: | ||
| + | ||
| -- | ||
| . Log in to https://cloud.redpanda.com[Redpanda Cloud^]. | ||
| . Start creating a new BYOC cluster on AWS. For details and prerequisites, see xref:get-started:cluster-types/byoc/aws/create-byoc-cluster-aws.adoc[]. | ||
| . In the cluster creation form, select the option to enable SQL. | ||
| // TODO: Confirm guidance to provide on selecting number of nodes | ||
| . Choose the number of SQL nodes to deploy. | ||
| + | ||
| Enabling SQL requires at least one SQL node. You can scale up to a maximum of nine nodes or scale down later, but the cluster must always keep at least one SQL node to run the engine. | ||
| . Complete the remaining cluster configuration and deploy. | ||
| -- | ||
|
|
||
| Cloud API:: | ||
| + | ||
| -- | ||
| . Authenticate to the link:/api/doc/cloud-controlplane/topic/topic-cloud-api-overview[Cloud API]. For details, see link:/api/doc/cloud-controlplane/authentication[Authenticate to the Cloud API]. | ||
| // TODO: confirm field name change to rpsql | ||
| // Is selecting the number of nodes available with this endpoint? | ||
| . Make a link:/api/doc/cloud-controlplane/operation/operation-clusterservice_createcluster[`POST /v1/clusters`] request with `oxla.enabled` set to `true` in the cluster spec: | ||
| + | ||
| [,bash] | ||
| ---- | ||
| curl -X POST "https://api.redpanda.com/v1/clusters" \ | ||
| -H "Authorization: Bearer $AUTH_TOKEN" \ | ||
| -H "Content-Type: application/json" \ | ||
| -d '{ | ||
| "cluster": { | ||
| "name": "<cluster-name>", | ||
| "cloud_provider": "CLOUD_PROVIDER_AWS", | ||
| "type": "TYPE_BYOC", | ||
| "region": "<region>", | ||
| "zones": [ <zones> ], | ||
| "throughput_tier": "<tier>", | ||
| "resource_group_id": "<resource-group-id>", | ||
| "oxla": { | ||
| "enabled": true | ||
| } | ||
| } | ||
| }' | ||
| ---- | ||
| + | ||
| For the full request body and field reference, see the link:/api/doc/cloud-controlplane/operation/operation-clusterservice_createcluster[Create Cluster API]. | ||
| . The request returns the ID of a long-running operation. Poll the link:/api/doc/cloud-controlplane/operation/operation-operationservice_getoperation[`GET /v1/operations/{operation.id}`] endpoint until the operation completes. | ||
| -- | ||
| ===== | ||
|
|
||
| === On an existing cluster | ||
|
|
||
| To enable, scale, or disable SQL on an existing cluster, you also need the cluster ID, which you can find in the *Details* section of the cluster overview in the Cloud Console. | ||
|
|
||
| // TODO: Confirm UI functionality | ||
|
Review comment: @c-julin should have knowledge about this matter. |
||
|
|
||
| . Authenticate to the link:/api/doc/cloud-controlplane/topic/topic-cloud-api-overview[Cloud API]. For details, see link:/api/doc/cloud-controlplane/authentication[Authenticate to the Cloud API]. | ||
| . Make a link:/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster[`PATCH /v1/clusters/{cluster.id}`] request, replacing `{cluster.id}` with your cluster ID: | ||
| + | ||
| [,bash] | ||
| ---- | ||
| curl -X PATCH "https://api.redpanda.com/v1/clusters/{cluster.id}" \ | ||
|
Review comment: Confirm the payload of this request with Mr. Marat (mentioned above). |
||
| -H "Authorization: Bearer $AUTH_TOKEN" \ | ||
| -H "Content-Type: application/json" \ | ||
| -d '{"oxla":{"enabled":true}}' | ||
| ---- | ||
| + | ||
| The request returns the ID of a long-running operation. Poll the link:/api/doc/cloud-controlplane/operation/operation-operationservice_getoperation[`GET /v1/operations/{operation.id}`] endpoint until the operation completes: | ||
| + | ||
| [,bash] | ||
| ---- | ||
| curl -X GET "https://api.redpanda.com/v1/operations/{operation.id}" \ | ||
| -H "Authorization: Bearer $AUTH_TOKEN" \ | ||
| -H "Content-Type: application/json" | ||
| ---- | ||
| + | ||
| When the operation is complete, the response shows `"state": "STATE_COMPLETED"`. | ||
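The polling step above can be scripted. The following is a minimal sketch, not an official client. It assumes a POSIX shell with `curl` and the `$AUTH_TOKEN` variable from the earlier examples. The helper names (`fetch_operation`, `wait_for_operation`) and the tuning variables (`POLL_INTERVAL_SECONDS`, `POLL_MAX_ATTEMPTS`) are hypothetical. The sketch searches the response text for `STATE_COMPLETED` instead of parsing JSON, and it does not detect failed operations, because failure state names are not given on this page.

```shell
# Sketch: poll a long-running operation until it reports STATE_COMPLETED.
# Assumptions: the 10-second interval and 180-attempt cap (about 30 minutes)
# are illustrative defaults, not values taken from the Cloud API.

# fetch_operation OPERATION_ID
# Prints the raw response body of GET /v1/operations/{operation.id}.
fetch_operation() {
  curl -sS -X GET "https://api.redpanda.com/v1/operations/$1" \
    -H "Authorization: Bearer $AUTH_TOKEN"
}

# wait_for_operation OPERATION_ID [FETCH_COMMAND]
# Returns 0 once the response contains "state": "STATE_COMPLETED",
# or 1 if the attempts run out first.
wait_for_operation() {
  wfo_id=$1
  wfo_fetch=${2:-fetch_operation}
  wfo_attempt=0
  while [ "$wfo_attempt" -lt "${POLL_MAX_ATTEMPTS:-180}" ]; do
    if "$wfo_fetch" "$wfo_id" |
      grep -Eq '"state"[[:space:]]*:[[:space:]]*"STATE_COMPLETED"'; then
      return 0
    fi
    wfo_attempt=$((wfo_attempt + 1))
    sleep "${POLL_INTERVAL_SECONDS:-10}"
  done
  echo "operation $wfo_id did not complete in time" >&2
  return 1
}
```

After sourcing these functions, `wait_for_operation "<operation-id>"` blocks until the operation completes or the attempt cap is reached. The optional second argument lets you substitute a different fetch command, which is useful for dry runs.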
|
Review comment: We need to talk a bit more about this. I'll send you a link to the Slack thread. |
||
|
|
||
| == Scale Redpanda SQL | ||
|
|
||
| Redpanda SQL supports horizontal scaling from one to nine nodes per cluster. You cannot scale to zero nodes. To remove Redpanda SQL from a cluster, disable the SQL engine instead. | ||
|
|
||
| // TODO: Confirm UI functionality | ||
|
Review comment: @simonlord is adding this. |
||
|
|
||
| Make a link:/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster[`PATCH /v1/clusters/{cluster.id}`] request with the new replica count. Replace `{cluster.id}` with your cluster ID and `<n>` with a value between 1 and 9: | ||
|
|
||
| [,bash] | ||
| ---- | ||
| curl -X PATCH "https://api.redpanda.com/v1/clusters/{cluster.id}" \ | ||
| -H "Authorization: Bearer $AUTH_TOKEN" \ | ||
| -H "Content-Type: application/json" \ | ||
| -d '{"oxla":{"replicas":<n>}}' | ||
| ---- | ||
|
|
||
| The request returns the ID of a long-running operation. Poll link:/api/doc/cloud-controlplane/operation/operation-operationservice_getoperation[`GET /v1/operations/{operation.id}`] until the operation completes. | ||
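Because the API accepts only one to nine SQL nodes, it can help to validate the replica count before sending the request. The following sketch assumes a POSIX shell with `curl` and the `$AUTH_TOKEN` variable from the examples above. The function names `build_scale_payload` and `scale_sql` are hypothetical, and the payload shape is copied from the example above, which is itself still under review.

```shell
# build_scale_payload REPLICAS
# Prints the PATCH body for scaling, or fails if REPLICAS is not an
# integer from 1 to 9.
build_scale_payload() {
  case $1 in
    [1-9]) printf '{"oxla":{"replicas":%s}}\n' "$1" ;;
    *)
      echo "replica count must be an integer from 1 to 9, got: '$1'" >&2
      return 1
      ;;
  esac
}

# scale_sql CLUSTER_ID REPLICAS
# Sends the scale request only when the payload passes validation.
scale_sql() {
  ss_payload=$(build_scale_payload "$2") || return 1
  curl -sS -X PATCH "https://api.redpanda.com/v1/clusters/$1" \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$ss_payload"
}
```

For example, `scale_sql "<cluster-id>" 3` sends the request, while `scale_sql "<cluster-id>" 0` fails locally without calling the API.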
|
|
||
| == Verify the SQL engine is running | ||
|
|
||
| After you enable Redpanda SQL, the cluster overview page in the Cloud Console shows the *SQL* tab, and the *Details* pane displays the number of SQL nodes deployed with the cluster. | ||
|
|
||
| The *SQL* tab appears as soon as you enable SQL, but you can't connect until the engine is fully provisioned. Provisioning can take up to 30 minutes. Wait for the node-ready status indicator on the overview page to show the engine is ready. For the API flow, poll the long-running operation until it returns `STATE_COMPLETED`. | ||
|
|
||
| To verify the SQL engine is running, use the connection details on the *SQL* tab to connect with a PostgreSQL client, such as `psql` (v16 or later required). | ||
|
|
||
| The following shows how to connect using a bearer token. Log in to Redpanda Cloud with `rpk`, then retrieve a temporary authentication token for the SQL engine with `rpk cloud auth token` (xref:manage:rpk/rpk-install.adoc[`rpk` v26.1.6+] required): | ||
|
|
||
| [,bash] | ||
| ---- | ||
| rpk cloud login | ||
|
|
||
| rpsql_token=$(rpk cloud auth token) | ||
|
|
||
| psql "host=<sql-external-endpoint> port=5432 dbname=oxla user=ignored password=$rpsql_token options='-c auth_method=bearer' sslmode=require" | ||
|
Review comment: We should add something about supported psql versions. @grzebiel should have the list.
Review comment (reply): psql client v16+ |
||
| ---- | ||
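For a non-interactive readiness check, the same connection can run a single statement and exit. This is a sketch under assumptions: the function names `sql_conninfo` and `check_sql_ready` are hypothetical, the connection parameters are copied from the example above, and the token is assumed to contain no spaces or quotes that would need escaping inside a connection string.

```shell
# sql_conninfo HOST TOKEN
# Prints the psql connection string used on this page.
sql_conninfo() {
  printf "host=%s port=5432 dbname=oxla user=ignored password=%s options='-c auth_method=bearer' sslmode=require" "$1" "$2"
}

# check_sql_ready HOST
# Fetches a fresh token with rpk, then runs SHOW NODES once and exits.
# Returns non-zero if the token cannot be fetched or the connection fails.
check_sql_ready() {
  csr_token=$(rpk cloud auth token) || return 1
  psql -X -c "SHOW NODES;" "$(sql_conninfo "$1" "$csr_token")"
}
```

After `rpk cloud login`, calling `check_sql_ready "<sql-external-endpoint>"` prints the node list if the engine accepts connections. The `-X` flag skips any local `.psqlrc` so the check behaves the same on every machine.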
|
|
||
| == Inspect your SQL cluster | ||
|
|
||
| Redpanda SQL provides built-in commands to inspect the state of your SQL cluster: | ||
|
|
||
| [,sql] | ||
| ---- | ||
| SHOW NODES; -- List SQL compute nodes and their status | ||
|
Review comment: I'll think adding some queries related to |
||
| SHOW REDPANDA TABLES; -- List SQL tables mapped to Redpanda topics | ||
| SHOW QUERIES; -- List currently running queries | ||
| ---- | ||
|
|
||
| == Disable Redpanda SQL | ||
|
|
||
| [WARNING] | ||
| ==== | ||
| Disabling Redpanda SQL purges the stored catalog state for the SQL engine (catalog metadata, table mappings, and role/grant data) and deletes its data from object storage, including Iceberg-translated data for Iceberg-enabled topics. In-flight queries fail when SQL is disabled. | ||
| ==== | ||
|
|
||
| Redpanda topic data and Schema Registry subjects are not affected. The Redpanda cluster itself continues to run normally; only the SQL engine and its associated state are removed. | ||
|
|
||
| Re-enabling SQL on the same cluster provisions a fresh engine: no prior catalog state, table mappings, or grants are restored. You must re-create catalogs, tables, and grants after re-enabling. | ||
|
|
||
| Make a link:/api/doc/cloud-controlplane/operation/operation-clusterservice_updatecluster[`PATCH /v1/clusters/{cluster.id}`] request with `oxla.enabled` set to `false`. Replace `{cluster.id}` with your cluster ID: | ||
|
|
||
| [,bash] | ||
| ---- | ||
| curl -X PATCH "https://api.redpanda.com/v1/clusters/{cluster.id}" \ | ||
| -H "Authorization: Bearer $AUTH_TOKEN" \ | ||
| -H "Content-Type: application/json" \ | ||
| -d '{"oxla":{"enabled":false}}' | ||
| ---- | ||
|
|
||
| The request returns the ID of a long-running operation. Poll link:/api/doc/cloud-controlplane/operation/operation-operationservice_getoperation[`GET /v1/operations/{operation.id}`] until the operation completes. | ||
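Because disabling SQL is destructive, a script that sends this request may benefit from an explicit confirmation guard. The following is one possible sketch, not a required procedure. It assumes a POSIX shell with `curl` and the `$AUTH_TOKEN` variable from the examples above. The function name `disable_sql` and the `CURL_BIN` override are hypothetical.

```shell
# disable_sql CLUSTER_ID CONFIRMATION
# Sends the disable request only when CONFIRMATION repeats the cluster ID.
# CURL_BIN is a hypothetical override so the call can be dry-run with a stub.
disable_sql() {
  if [ -z "$1" ] || [ "$1" != "$2" ]; then
    echo "refusing to disable SQL: confirmation must match the cluster ID" >&2
    return 1
  fi
  "${CURL_BIN:-curl}" -sS -X PATCH "https://api.redpanda.com/v1/clusters/$1" \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"oxla":{"enabled":false}}'
}
```

Requiring the caller to repeat the cluster ID, as in `disable_sql "<cluster-id>" "<cluster-id>"`, makes it harder to purge the catalog of the wrong cluster by accident.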
|
|
||
| == Next steps | ||
|
|
||
| * xref:sql:get-started/sql-quickstart.adoc[Quickstart]: Connect to Redpanda SQL with `psql` and run your first query. | ||
| * xref:reference:sql/index.adoc[Redpanda SQL reference]: Explore the full SQL syntax, data types, functions, and clauses. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| = Get Started with Redpanda SQL | ||
| :description: Get started with Redpanda SQL, a column-oriented OLAP query engine built into Redpanda Cloud that lets you query streaming topics using standard SQL. | ||
| :page-layout: index |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,210 @@ | ||
| = Redpanda SQL Quickstart | ||
| :description: Connect to Redpanda SQL on a BYOC cluster and run your first query on streaming data. | ||
| :page-topic-type: tutorial | ||
| :personas: streaming_developer, data_engineer, platform_admin | ||
| :learning-objective-1: Connect to Redpanda SQL using psql and a bearer token | ||
| :learning-objective-2: Query a Redpanda topic with SQL | ||
|
|
||
| Redpanda SQL is a PostgreSQL-compatible SQL engine built into Redpanda Bring Your Own Cloud (BYOC). It lets you query streaming data in your Redpanda topics with standard SQL, without building ETL pipelines or deploying a separate analytics system. In this quickstart, you connect with `psql` and run your first query against a Redpanda topic. | ||
|
|
||
| For Iceberg-enabled topics, Redpanda SQL can also query the full Iceberg-translated history alongside live records. For that workflow, see xref:sql:query-data/query-iceberg-topics.adoc[Query Iceberg topics]. | ||
|
|
||
| After reading this page, you will be able to: | ||
|
|
||
| * [ ] {learning-objective-1} | ||
| * [ ] {learning-objective-2} | ||
|
|
||
| == Prerequisites | ||
|
|
||
| * A Redpanda BYOC cluster on AWS with Redpanda SQL enabled. See xref:sql:get-started/deploy-sql-cluster.adoc[]. | ||
| * Admin access to your cluster in the Redpanda Cloud Console, or a role with the *SQL: Manage* permission. You need one of these to view SQL connection details and to create catalogs, tables, and grants in the SQL engine. For more information on authorization in Redpanda Cloud, see xref:security:authorization/rbac/index.adoc[]. | ||
| * A Redpanda topic with a schema registered in Schema Registry. If you don't have one, follow the optional <<optional-produce-sample-data,Produce sample data>> section below to create a sample `orders` topic. | ||
| * xref:manage:rpk/rpk-install.adoc[`rpk` v26.1.6] or later installed on your local machine to generate an authentication token. | ||
| * https://www.postgresql.org/download/[`psql`^] v16 or later (PostgreSQL client) installed on your local machine. | ||
|
|
||
| [#optional-produce-sample-data] | ||
| == (Optional) Produce sample data | ||
|
|
||
| [TIP] | ||
| ==== | ||
| Skip this section if you already have a Redpanda topic with a schema registered in Schema Registry that you want to query. | ||
| ==== | ||
|
|
||
| If you don't have a schema-registered topic to query yet, follow these steps to create an `orders` topic with a small set of sample records. Redpanda SQL reads the topic's schema from Schema Registry to map fields to SQL columns, so the topic must have a registered schema before you can query it. | ||
|
|
||
| You also need permissions to create topics, register schemas, and produce records. | ||
|
|
||
| . https://cloud.redpanda.com/[Log in to Redpanda Cloud^] and select your cluster. | ||
|
|
||
| . On the *Topics* page, click *Create Topic*. Name the topic `orders` and create it with default settings. | ||
|
|
||
| . On the *Schema Registry* page, click *Create new schema*. | ||
|
|
||
| . Create a new schema with the following: | ||
| + | ||
| * *Strategy*: Topic | ||
| * *Topic name*: orders | ||
| * *Schema applies to*: Value | ||
| * *Schema definition*: Select Protobuf and paste the following schema definition: | ||
| + | ||
| [,proto] | ||
| ---- | ||
| syntax = "proto3"; | ||
|
|
||
| message Order { | ||
| int64 order_id = 1; | ||
| string customer = 2; | ||
| string product = 3; | ||
| int64 amount = 4; // amount in cents | ||
| string status = 5; // "pending", "shipped", "completed" | ||
| } | ||
| ---- | ||
|
|
||
| . Return to the *Topics* page and select the `orders` topic. Produce a few sample records: | ||
| // TODO: Verify exact steps to produce records in UI | ||
| + | ||
| [,json] | ||
| ---- | ||
| {"order_id": 1, "customer": "alice", "product": "keyboard", "amount": 7500, "status": "completed"} | ||
| ---- | ||
| + | ||
| [,json] | ||
| ---- | ||
| {"order_id": 2, "customer": "bob", "product": "monitor", "amount": 32000, "status": "shipped"} | ||
| ---- | ||
| + | ||
| [,json] | ||
| ---- | ||
| {"order_id": 3, "customer": "carol", "product": "mouse", "amount": 4500, "status": "pending"} | ||
| ---- | ||
| + | ||
| [,json] | ||
| ---- | ||
| {"order_id": 4, "customer": "alice", "product": "monitor", "amount": 32000, "status": "completed"} | ||
| ---- | ||
| + | ||
| [,json] | ||
| ---- | ||
| {"order_id": 5, "customer": "dave", "product": "keyboard", "amount": 7500, "status": "pending"} | ||
| ---- | ||
|
|
||
| When you continue to the next section, use `orders` as the topic name when you define the SQL table. | ||
|
|
||
| == Connect to Redpanda SQL | ||
|
|
||
| SQL connection details are available on your cluster's *SQL* tab in the https://cloud.redpanda.com/[Cloud Console]. To connect using `psql`: | ||
|
|
||
| . Log in to Redpanda Cloud with `rpk`. This opens a browser window for SSO sign-in: | ||
| + | ||
| [,bash] | ||
| ---- | ||
| rpk cloud login | ||
| ---- | ||
|
|
||
| . Retrieve a temporary authentication token for the SQL engine: | ||
| + | ||
| [,bash] | ||
| ---- | ||
| rpsql_token=$(rpk cloud auth token) | ||
| ---- | ||
|
|
||
| . Copy and run the `psql` connection string from the *SQL* tab: | ||
| + | ||
| [,bash] | ||
| ---- | ||
| psql "host=<sql-external-endpoint> port=5432 dbname=oxla user=ignored password=$rpsql_token options='-c auth_method=bearer' sslmode=require" | ||
| ---- | ||
|
|
||
| On a successful connection, you see output similar to: | ||
|
|
||
| // TODO: Verify current psql banner text. | ||
| [.no-copy] | ||
| ---- | ||
| psql (17.8 (Homebrew), server 16.0 (oxla version: 1.0.0, build: af2dffb-Release-x86_64-GNU, asio)) | ||
| SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, compression: off, ALPN: none) | ||
| Type "help" for help. | ||
|
|
||
| => | ||
| ---- | ||
|
|
||
| == Query topic data | ||
|
|
||
| When you enable Redpanda SQL, the engine automatically creates a Redpanda catalog named `default_redpanda_catalog` that connects to your cluster. To query a Redpanda topic, define a SQL table against the topic, then run standard SQL queries against the table. | ||
|
|
||
| . Define a SQL table from the topic with `CREATE TABLE`. The following example uses the `orders` topic from the optional sample data section. To use your own topic, replace `orders` with your topic name and `orders-value` with the Schema Registry subject that holds the topic's value schema. Your topic must have a schema registered in Schema Registry. | ||
| + | ||
| [,sql] | ||
| ---- | ||
| CREATE TABLE default_redpanda_catalog=>orders WITH ( | ||
| topic = 'orders', | ||
| schema_subject = 'orders-value', | ||
| confluent_wire_protocol = 'false' | ||
| ); | ||
| ---- | ||
| + | ||
| Redpanda SQL reads the registered schema and maps each top-level field to a SQL column. | ||
| + | ||
| Records produced through the Cloud Console don't carry the Confluent Schema Registry wire-format prefix, so the example sets `confluent_wire_protocol = 'false'`. If your producer client adds the wire format, set this option to `'true'` or omit it. | ||
|
|
||
| . Run SQL queries against the table. These examples use the `orders` schema from the optional sample data section. | ||
| + | ||
| View a sample of records: | ||
| + | ||
| [,sql] | ||
| ---- | ||
| SELECT * FROM default_redpanda_catalog=>orders LIMIT 10; | ||
| ---- | ||
| + | ||
| Count orders by status: | ||
| + | ||
| [,sql] | ||
| ---- | ||
| SELECT status, COUNT(*) AS total_orders | ||
| FROM default_redpanda_catalog=>orders | ||
| GROUP BY status; | ||
| ---- | ||
| + | ||
| Find the largest orders: | ||
| + | ||
| [,sql] | ||
| ---- | ||
| SELECT order_id, customer, product, amount | ||
| FROM default_redpanda_catalog=>orders | ||
| WHERE amount > 10000 | ||
| ORDER BY amount DESC | ||
| LIMIT 20; | ||
| ---- | ||
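To sanity-check query results against the five sample records, the same aggregation and filter can be reproduced locally. This sketch does not contact Redpanda SQL. It only recomputes what the `GROUP BY status` and `WHERE amount > 10000` queries select from the sample data, using standard `sed`, `sort`, `uniq`, and `awk`. The function names are hypothetical.

```shell
# sample_orders: prints the five sample records, one JSON object per line.
sample_orders() {
  cat <<'EOF'
{"order_id": 1, "customer": "alice", "product": "keyboard", "amount": 7500, "status": "completed"}
{"order_id": 2, "customer": "bob", "product": "monitor", "amount": 32000, "status": "shipped"}
{"order_id": 3, "customer": "carol", "product": "mouse", "amount": 4500, "status": "pending"}
{"order_id": 4, "customer": "alice", "product": "monitor", "amount": 32000, "status": "completed"}
{"order_id": 5, "customer": "dave", "product": "keyboard", "amount": 7500, "status": "pending"}
EOF
}

# count_by_status: local equivalent of the GROUP BY status query.
# Prints "<status> <count>" lines sorted by status name.
count_by_status() {
  sample_orders |
    sed 's/.*"status": "\([a-z]*\)".*/\1/' |
    sort | uniq -c | awk '{print $2, $1}'
}

# large_order_ids: order IDs whose amount exceeds 10000.
large_order_ids() {
  sample_orders |
    sed 's/.*"order_id": \([0-9]*\),.*"amount": \([0-9]*\),.*/\1 \2/' |
    awk '$2 > 10000 {print $1}'
}
```

With the sample data, `count_by_status` reports two completed, two pending, and one shipped order, and `large_order_ids` lists orders 2 and 4. The SQL engine may return the grouped rows in a different order, because the `GROUP BY` query has no `ORDER BY` clause.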
|
|
||
| == (Optional) Grant access to a non-admin user | ||
|
|
||
| Redpanda Cloud's data-plane RBAC controls Redpanda SQL access through two role permissions: | ||
|
|
||
| * *SQL: Manage*: superuser access to the SQL engine. A user with this role can read all topics, create catalogs and tables, and grant access to other users. | ||
| * *SQL: Access*: regular user access. A user with this role can connect to the SQL engine but has no access to any catalog or table until a SQL: Manage user grants it. | ||
|
|
||
| When you assign one of these roles to a user in Redpanda Cloud, the cluster provisions a corresponding user in the SQL engine. No manual `CREATE USER` is required. A SQL: Manage user then uses standard SQL `GRANT` statements to give the user access to specific catalogs or tables. Wildcard patterns are supported. | ||
|
|
||
| . In Redpanda Cloud, assign a role with the *SQL: Access* permission to the user. Roles are managed in *Organization IAM > Roles*; SQL permissions are under the *Data Plane* tab when you create or edit a role. See xref:security:authorization/rbac/rbac_dp.adoc[]. | ||
|
|
||
| . As a SQL: Manage user, grant `SELECT` on a specific table. The user identifier is the email on the user's Redpanda Cloud account: | ||
| + | ||
| [,sql] | ||
| ---- | ||
| GRANT SELECT ON TABLE default_redpanda_catalog=>orders TO "alice@example.com"; | ||
| ---- | ||
| + | ||
| Or grant `SELECT` on multiple tables that match a pattern: | ||
| + | ||
| [,sql] | ||
| ---- | ||
| GRANT SELECT ON TABLE default_redpanda_catalog=>orders_* TO "alice@example.com"; | ||
| ---- | ||
|
|
||
| The user can now connect to Redpanda SQL and run `SELECT` against the tables they've been granted access to. | ||
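When several users need the same access, the `GRANT` statements can be generated rather than typed by hand. The following sketch assumes a POSIX shell. The function name `grant_select_sql` is hypothetical, the statement syntax is copied from the examples above, and the email addresses are assumed to contain no double quotes.

```shell
# grant_select_sql TABLE EMAIL...
# Prints one GRANT SELECT statement per email for a table (or table pattern)
# in default_redpanda_catalog.
grant_select_sql() {
  gs_table=$1
  shift
  for gs_email in "$@"; do
    printf 'GRANT SELECT ON TABLE default_redpanda_catalog=>%s TO "%s";\n' \
      "$gs_table" "$gs_email"
  done
}
```

For example, `grant_select_sql orders "alice@example.com" "bob@example.com"` prints two statements. Review the output before piping it to the `psql` connection used earlier on this page.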
|
|
||
| == Next steps | ||
|
|
||
| // TODO: Uncomment once DOC-1990 and DOC-2006 merge (target pages are on those branches). | ||
| // * xref:sql:query-data/query-streaming-topics.adoc[Query streaming topics]: Map a Redpanda topic to a SQL table and run analytical queries against live streaming data. | ||
| // * xref:sql:query-data/query-iceberg-topics.adoc[Query Iceberg topics]: Run a single SQL query that spans live records and the Iceberg-translated history of a topic. | ||
| * xref:reference:sql/index.adoc[Redpanda SQL reference]: Explore the full SQL syntax, data types, functions, and clauses. |
Review comment: Please double-check the request payload with @rpdevmp.