2 changes: 1 addition & 1 deletion modules/ROOT/nav.adoc
@@ -343,7 +343,7 @@
** xref:sql:get-started/index.adoc[Get Started]
*** xref:sql:get-started/sql-quickstart.adoc[Quickstart]
*** xref:sql:get-started/deploy-sql-cluster.adoc[Enable Redpanda SQL]
*** xref:sql:get-started/what-is-redpanda-sql.adoc[Overview]
*** xref:sql:get-started/overview.adoc[]
**** xref:sql:get-started/oltp-vs-olap.adoc[]
**** xref:sql:get-started/redpanda-sql-vs-postgresql.adoc[]
** xref:sql:connect-to-sql/index.adoc[Connect to Redpanda SQL]
16 changes: 12 additions & 4 deletions modules/sql/pages/get-started/oltp-vs-olap.adoc
@@ -1,12 +1,20 @@
= OLTP vs OLAP
:description: Understand the difference between OLTP (transactional) and OLAP (analytical) processing, and why Redpanda SQL uses an OLAP model for querying Kafka data.
:description: Understand the difference between OLTP (transactional) and OLAP (analytical) processing, and why Redpanda SQL uses an OLAP model for querying streaming data.
:page-topic-type: concept
:personas: app_developer, data_engineer, evaluator
:learning-objective-1: Distinguish OLTP from OLAP processing patterns
:learning-objective-2: Explain why Redpanda SQL uses an OLAP model

Redpanda SQL uses an OLAP (Online Analytical Processing) model — optimized for analytical queries over large datasets — rather than the OLTP (Online Transaction Processing) model used by traditional relational databases. This makes OLAP suitable for querying Redpanda topics at scale. This page explains the differences between OLTP and OLAP and how they apply to querying data with Redpanda SQL.
Redpanda SQL uses an OLAP (Online Analytical Processing) model, optimized for analytical queries over large datasets, rather than the OLTP (Online Transaction Processing) model used by traditional relational databases. This model is well suited to querying Redpanda glossterm:topic[,topics] at scale. This page explains the differences between OLTP and OLAP and how they apply to querying data with Redpanda SQL.

After reading this page, you will be able to:

* [ ] {learning-objective-1}
* [ ] {learning-objective-2}

== What is OLTP?

Online Transaction Processing (OLTP) supports transaction-oriented applications under a 3-tier architecture (such as a https://en.wikipedia.org/wiki/Third_normal_form[3NF^] approach). OLTP usually administers day-to-day transactions through a relational database.
Online Transaction Processing (OLTP) supports transaction-oriented applications, typically built on a 3-tier architecture, and stores data in normalized relational schemas (such as https://en.wikipedia.org/wiki/Third_normal_form[3NF^]). OLTP handles day-to-day transactions through a relational database.
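As a minimal illustration, the following standard PostgreSQL statements show a typical OLTP access pattern. The `accounts` table and its `account_id` and `balance` columns are hypothetical. Each statement locates a single row by key, and the transaction commits both updates atomically.

[,sql]
----
-- Hypothetical OLTP workload: transfer 100 between two accounts.
-- Each UPDATE touches one row, found by its primary key.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;
----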

Some daily use cases for transactional processing include:

@@ -17,7 +25,7 @@ Some daily use cases for transactional processing include:

== What is OLAP?

OLAP stands for Online Analytical Processing and provides data analysis for business decisions. With OLAP, you can get information on multiple databases and data types with the ability to analyze them at the same time, even with complex queries.
OLAP stands for Online Analytical Processing and provides data analysis for business decisions. With OLAP, you can run complex queries that analyze data across multiple databases and data types at once.
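By contrast, a typical OLAP query scans many rows, reads only a few columns, and aggregates them into a small result. The following sketch uses standard PostgreSQL syntax against a hypothetical `orders` table with `region`, `order_ts`, and `amount` columns.

[,sql]
----
-- Hypothetical OLAP query: monthly revenue per region.
-- Reads every order, but only three of its columns.
SELECT region,
       date_trunc('month', order_ts) AS order_month,
       SUM(amount) AS revenue
FROM orders
GROUP BY region, date_trunc('month', order_ts)
ORDER BY region, order_month;
----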

Some examples of OLAP in business analytics include:

105 changes: 105 additions & 0 deletions modules/sql/pages/get-started/overview.adoc
@@ -0,0 +1,105 @@
= Redpanda SQL Overview
:description: Redpanda SQL is a column-oriented OLAP query engine in Redpanda Cloud BYOC for querying live and Iceberg-translated Redpanda topics with PostgreSQL syntax.
:page-topic-type: overview
:page-aliases: sql:get-started/what-is-redpanda-sql.adoc
:personas: app_developer, data_engineer, evaluator
:learning-objective-1: Identify scenarios where Redpanda SQL fits your analytical needs
:learning-objective-2: Identify the query patterns Redpanda SQL supports
:learning-objective-3: Describe the architectural characteristics that enable those patterns

Redpanda SQL turns your Redpanda glossterm:topic[,topics], including their Iceberg-translated history, into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) glossterm:cluster[]. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. It is a PostgreSQL-compatible query engine that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip.

Redpanda SQL handles a wide range of analytical workloads in a single system. You can power real-time business intelligence (BI) dashboards, process log data, run time-series analytics, and perform exploratory queries over large datasets without switching tools or maintaining separate systems.

After reading this page, you will be able to:

* [ ] {learning-objective-1}
* [ ] {learning-objective-2}
* [ ] {learning-objective-3}

== Why use Redpanda SQL

Querying real-time streaming data alongside historical lakehouse data typically means building ETL pipelines, copying data between systems, and running multiple analytical engines. Redpanda SQL eliminates this overhead by querying both live and historical data in place.

Redpanda SQL scales horizontally across multiple nodes within a cluster (up to 9 nodes) and uses hardware efficiently within each node, so analytical workloads can grow without proportional infrastructure cost.

== Primary use cases

* *Real-time analytics on data streams*: Query Redpanda topics directly with SQL. No ETL pipelines required. Useful for analyst-driven investigations in the streaming layer, debugging streaming applications, and prototyping consumers.
* *Hybrid streaming and historical analytics*: Query Iceberg-enabled topics in a single SQL query that spans live records and historical Iceberg-committed records, including records older than your topic retention.
* *Application-embedded operational analytics*: Run high-concurrency OLAP queries for dashboards and operational tools from any PostgreSQL client.

== What you can do with Redpanda SQL

Redpanda SQL exposes data through xref:sql:query-data/redpanda-catalogs.adoc[catalogs], which are named collections of source data exposed as queryable SQL tables. You can work with that data using two primary query patterns.

=== Query streaming topics

You can expose a Redpanda Streaming topic as a SQL table inside a Redpanda catalog. Redpanda SQL reads the topic's glossterm:schema[] from glossterm:Schema Registry[] to map fields to SQL columns, and you query the table with `SELECT`:

[,sql]
----
CREATE TABLE default_redpanda_catalog=>orders WITH (
  topic = 'orders',
  schema_subject = 'orders-value'
);

SELECT customer_id, SUM(amount) AS total
FROM default_redpanda_catalog=>orders
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10;
----

Analysts and developers can run these queries directly from any PostgreSQL client without moving data into a separate analytics store.

=== Query Iceberg topics

When a Redpanda topic is configured for Iceberg translation, Redpanda SQL queries its Iceberg-committed data through the same SQL surface as live streaming topics, reading Parquet data and Iceberg metadata directly from cloud storage.

On Iceberg-enabled topics, a single SQL query can return both the live records and the Iceberg-committed history in one result, with no overlap between them. Redpanda SQL plans this union internally, so you don't need to write a `UNION ALL` and rows aren't duplicated at the boundary between live and historical data.
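For comparison, the following sketch shows the kind of manual union this replaces when the live and historical copies of a topic are stored in two separate systems. It uses standard PostgreSQL syntax and is not Redpanda SQL's internal plan. The `live_orders` and `iceberg_orders` tables, their `kafka_partition` and `kafka_offset` columns, and the boundary offsets are all hypothetical. The point is that each partition needs an agreed boundary so that no record is returned twice.

[,sql]
----
-- Hypothetical per-partition boundary: records below committed_offset are
-- already in the historical (Iceberg) copy; records at or above it are read
-- from the live copy, so each record appears exactly once.
WITH boundary (partition_id, committed_offset) AS (
  VALUES (0, 1000000), (1, 998500)
)
SELECT h.customer_id, h.amount
FROM iceberg_orders AS h
JOIN boundary AS b ON h.kafka_partition = b.partition_id
WHERE h.kafka_offset < b.committed_offset
UNION ALL
SELECT l.customer_id, l.amount
FROM live_orders AS l
JOIN boundary AS b ON l.kafka_partition = b.partition_id
WHERE l.kafka_offset >= b.committed_offset;
----

With Redpanda SQL, a plain `SELECT` against the catalog table of an Iceberg-enabled topic returns the combined result without this bookkeeping.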

== Read-only query engine

Redpanda SQL operates as a read-only query engine. It doesn't accept data-manipulation statements such as `INSERT`, `UPDATE`, or `DELETE`, and it doesn't support most `CREATE TABLE` forms that materialize new data. The `CREATE TABLE ... WITH (topic = ...)` form shown earlier maps an existing topic into a catalog rather than creating data. Upstream systems write data into Redpanda topics (with optional Iceberg translation), and you expose that data to Redpanda SQL through catalog mappings. This architecture lets you run analytical queries over streaming and historical data without duplicating or moving it.

== Architecture characteristics

Redpanda SQL is built from the ground up in C++ for analytical workloads, with a focus on resource efficiency. The following sections describe the core architectural decisions that shape its performance and scalability.

=== Vectorized query execution

Redpanda SQL combines vectorized execution, in which operators process batches of values at a time rather than one row at a time, with a massively parallel processing (MPP) architecture that distributes work across CPU cores and nodes. MPP has been the standard in analytics systems for over a decade; Redpanda SQL implements it in a clean-slate C++ codebase, without JVM overhead or third-party engine components, with a focus on <<optimized-data-transfer-between-cpu-and-ram,low-level optimizations that improve resource efficiency>> in the query engine and across the system.

=== Columnar storage optimization

Transactional (OLTP) databases like PostgreSQL or Microsoft SQL Server use a row-oriented design by default, which suits frequent writes of individual records. Column-oriented engines like Redpanda SQL instead organize data by column, so an analytical query reads only the columns it references and can scan and aggregate them efficiently.

=== Decoupled storage and compute

Redpanda SQL uses a decoupled storage and compute architecture. Compute resources can be scaled independently of storage, allowing for more efficient resource allocation, easier deployment, and better cost control.

=== Distributed, multi-node architecture

Redpanda SQL is distributed, running across multiple nodes in parallel for horizontal scaling. Adaptive query pipelines handle different operations efficiently across nodes, and execution strategies are selected at runtime based on workload characteristics for optimal performance in both single-node and multi-node setups.

=== PostgreSQL wire protocol and SQL dialect

Redpanda SQL uses its own declarative query language under the hood but exposes a xref:reference:sql/index.adoc[PostgreSQL-compatible SQL surface] to users, including the PostgreSQL wire protocol. This means you can connect with `psql`, JDBC, ODBC, or any other PostgreSQL client and write SQL using familiar syntax.

=== Optimized data transfer between CPU and RAM

Redpanda SQL applies low-level memory access and caching optimizations to keep analytical workloads CPU-cache efficient rather than memory-bandwidth-bound:

* User-space storage caches minimize overhead from kernel-level memory operations.
* A custom data format enhances data locality.
* Hybrid row/column formats allow better alignment with CPU cache lines and vectorized execution.
* Caching that exploits temporal locality keeps frequently used data in memory longer, reducing cache misses.

== Next steps

* xref:sql:get-started/sql-quickstart.adoc[Quickstart]: enable Redpanda SQL on a BYOC cluster and run your first query.
* xref:sql:connect-to-sql/index.adoc[Connect to Redpanda SQL]: connect from psql, JDBC, PHP PDO, or .NET Dapper.
* xref:reference:sql/index.adoc[Redpanda SQL Reference]: supported SQL statements, clauses, data types, functions, and operators.
* xref:sql:get-started/oltp-vs-olap.adoc[OLTP vs OLAP]: understand why Redpanda SQL uses an analytical (OLAP) model.
* xref:sql:get-started/redpanda-sql-vs-postgresql.adoc[Redpanda SQL vs PostgreSQL]: supported functions, operators, and behavioral differences.
134 changes: 34 additions & 100 deletions modules/sql/pages/get-started/redpanda-sql-vs-postgresql.adoc
Contributor Author


Removed the tables under Functions and Mathematical operators as they didn't seem to describe any actual differences from PostgreSQL, and so may not be worth keeping. Are there any actual known differences w.r.t. functions and operators (other than the one with JSON)?

@@ -1,118 +1,46 @@
= Redpanda SQL vs PostgreSQL
:description: Comparison of Redpanda SQL and PostgreSQL covering supported functions, operators, and behavioral differences.
:page-topic-type: concept
:page-topic-type: reference
:personas: app_developer, data_engineer
:learning-objective-1: Identify which PostgreSQL functions and operators Redpanda SQL supports
:learning-objective-2: Recognize behavioral differences between Redpanda SQL and PostgreSQL

Redpanda SQL aims for close compatibility with PostgreSQL but differs in some functions, operators, and behaviors. Use this page to check which features are supported and where Redpanda SQL diverges from PostgreSQL.
// TODO: Expand the comparison with engineering input on:
// - Data-type coverage gaps relative to PostgreSQL
// - System-catalog coverage and divergences
// - Authn/authz model differences (e.g., GBAC in Redpanda SQL)
// - Stored procedures and triggers (not yet documented as supported or not)
// - Connection-protocol and session-semantics limitations beyond no-transactions
// - Any other significant differences in supported SQL syntax, functions, or behavior that impact users migrating from PostgreSQL or using both systems in parallel

== Functions

=== Mathematical
Redpanda SQL aims for close compatibility with PostgreSQL semantics but differs significantly in design and behavior.

A mathematical function operates on input values provided as arguments and returns a numeric value as the operation's output.
Use this reference to:

[cols="1,3,2,1",options="header"]
|===
|Function |Description |Example |Available in Redpanda SQL
* [ ] {learning-objective-1}
* [ ] {learning-objective-2}

|ABS
|Returns the absolute value of a number.
|`SELECT ABS(-11);`
|Yes
The most fundamental difference is architectural: PostgreSQL is an online transaction processing (OLTP) database by default, whereas Redpanda SQL is an online analytical processing (OLAP) query engine. For more on the distinction, see xref:sql:get-started/oltp-vs-olap.adoc[].

|CEIL
|Returns the value after rounding up any positive or negative value to the nearest largest integer.
|`SELECT CEIL(53.7);`
|Yes
Redpanda SQL doesn't support common PostgreSQL transaction-processing operations like direct writes or upserts.

|FLOOR
|Returns the value after rounding down any positive or negative decimal value to the nearest integer.
|`SELECT FLOOR(53.6);`
|Yes

|LN
|Returns the natural logarithm of a given number.
|`SELECT LN(3);`
|Yes

|RANDOM
|Returns a random value between 0 and 1.
|`SELECT RANDOM();`
|Yes

|SQRT
|Returns the square root of a given positive number.
|`SELECT SQRT(225);`
|Yes
|===
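For example, the following PostgreSQL upsert has no Redpanda SQL equivalent. It assumes a hypothetical `customers` table with a unique constraint on `customer_id`. With Redpanda SQL, you'd instead produce the updated record to a Redpanda topic.

[,sql]
----
-- Standard PostgreSQL upsert: insert a row, or update it if the key exists.
-- Redpanda SQL doesn't accept statements like this.
INSERT INTO customers (customer_id, email)
VALUES (42, 'user@example.com')
ON CONFLICT (customer_id) DO UPDATE SET email = EXCLUDED.email;
----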
Instead, Kafka-compatible producers write records to topics in Redpanda Streaming. From there, Redpanda SQL queries topics in local storage (the "hot" tier) as well as Apache Iceberg-compatible tables in object storage (the "cold" tier). A single query against a linked Redpanda catalog returns records from both tiers in one result, including records older than your topic retention, with no overlap between the live tail and the Iceberg-translated history.

=== Trigonometric
Redpanda SQL is _semantically_ compatible with PostgreSQL but not _code_ compatible: it isn't built on the PostgreSQL codebase, so it can't use common PostgreSQL extensions such as pgvector, PostGIS, or pg_cron.

[cols="1,3,2,1",options="header"]
|===
|Function |Description |Example |Available in Redpanda SQL
== Functions

|SIN
|Returns the sine of the specified radian.
|`SELECT sin(0.2);`
|Yes
|===
// TODO: SME confirmation — are there meaningful function-level differences
// from PostgreSQL beyond the JSON operator? If not, remove this section
// entirely. Open question on PR #573 review thread
// (https://github.com/redpanda-data/cloud-docs/pull/573#discussion_r3228320300).

== Operators

=== Mathematical operators

[cols="1,1,2,1,1",options="header"]
|===
|Operator |Description |Example |Result |Available in Redpanda SQL

|`+`
|Addition
|`SELECT 5 + 8;`
|`13`
|Yes

|`-`
|Subtraction
|`SELECT 2 - 3;`
|`-1`
|Yes

|`-`
|Negation
|`SELECT -4;`
|`-4`
|Yes

|`*`
|Multiplication
|`SELECT 3 * 3;`
|`9`
|Yes

|`/`
|Division
|`SELECT 10 / 2;`
|`5`
|Yes

|`%`
|Modulo
|`SELECT 20 % 3;`
|`2`
|Yes

|`&`
|Bitwise AND
|`SELECT 91 & 15;`
|`11`
|Yes

|`#`
|Bitwise XOR
|`SELECT 17 # 5;`
|`20`
|Yes
|===
// TODO: SME confirmation — same question as for Functions above. Remove this
// subsection if there are no meaningful mathematical-operator differences.

=== JSON operators

@@ -183,11 +111,11 @@ SELECT ABS(-1.0);
* Redpanda SQL returns `1`
* PostgreSQL returns `1.0`

== Error differences
== Error-handling differences

[cols="1,2,2,2",options="header"]
|===
|Function |Input |Output Redpanda SQL |Output PostgreSQL
|Function |Input |Output (Redpanda SQL) |Output (PostgreSQL)

|LN
|`LN(0)`
@@ -214,3 +142,9 @@ SELECT ABS(-1.0);
|_unknown function pi_
|working as expected
|===

== Suggested reading

* xref:sql:get-started/overview.adoc[Redpanda SQL Overview]: what Redpanda SQL is, how it fits into Redpanda Cloud BYOC, and the analytical workloads it supports.
* xref:sql:get-started/oltp-vs-olap.adoc[OLTP vs OLAP]: why Redpanda SQL uses an analytical (OLAP) model rather than the transactional (OLTP) model used by traditional relational databases.
* xref:reference:sql/index.adoc[Redpanda SQL Reference]: supported SQL statements, clauses, data types, functions, and operators.