Aggregate Functions

Navigation: Functions — Date / Time · Functions — Conditional

This page documents aggregate functions for summarizing and analyzing data.

COUNT
SUM
AVG
MIN
MAX
FIRST_VALUE
LAST_VALUE
ARRAY_AGG
STDDEV / VARIANCE family
PERCENTILE_CONT / PERCENTILE_DISC
ROW_NUMBER / RANK / DENSE_RANK (ranking windows)

Overview

Aggregate functions perform calculations on sets of rows and return a single result. They are commonly used with GROUP BY clauses to summarize data by categories.

Key Concepts:

Aggregate Functions: Operate on multiple rows to produce a single result
GROUP BY: Groups rows that have the same values in specified columns
HAVING: Filters groups based on aggregate conditions
Window Functions: Perform calculations across rows related to the current row

Function: COUNT

Description:
Count rows or non-null expressions. With DISTINCT counts distinct values.

Syntax:

COUNT(*)
COUNT(expr)
COUNT(DISTINCT expr)
COUNT(*) OVER (PARTITION BY partition_expr, ...) --(since v0.14.0)

Inputs:

* - Count all rows (including NULLs)
expr - Count non-NULL values in expression
DISTINCT expr - Count distinct non-NULL values

Output:

BIGINT - Number of rows/values

NULL Handling:

COUNT(*) includes rows with NULL values
COUNT(expr) excludes NULL values
COUNT(DISTINCT expr) excludes NULL values

Examples:

Basic COUNT:

-- Count all rows
SELECT COUNT(*) AS total FROM emp;
-- Result: total = 42

-- Count non-NULL values
SELECT COUNT(manager) AS employees_with_manager FROM emp;

-- Count with WHERE
SELECT COUNT(*) AS it_employees 
FROM emp 
WHERE department = 'IT';

COUNT DISTINCT:

-- Count distinct salaries
SELECT COUNT(DISTINCT salary) AS distinct_salaries FROM emp;
-- Result: 8

-- Count distinct departments
SELECT COUNT(DISTINCT department) AS dept_count FROM emp;

-- Count distinct non-NULL emails
SELECT COUNT(DISTINCT email) AS unique_emails FROM users;

COUNT with GROUP BY:

-- Count employees per department
SELECT 
  department,
  COUNT(*) AS employee_count
FROM emp
GROUP BY department;

-- Count orders per customer
SELECT 
  customer_id,
  COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;

-- Count with multiple columns
SELECT 
  department,
  job_title,
  COUNT(*) AS count
FROM emp
GROUP BY department, job_title;

COUNT with HAVING:

-- Departments with more than 10 employees
SELECT 
  department,
  COUNT(*) AS count
FROM emp
GROUP BY department
HAVING COUNT(*) > 10;

-- Customers with multiple orders
SELECT 
  customer_id,
  COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 1;

COUNT vs COUNT(*):

-- COUNT(*) counts all rows
SELECT COUNT(*) AS total_rows FROM emp;
-- Result: 42 (includes rows with NULL values)

-- COUNT(column) counts non-NULL values
SELECT COUNT(email) AS rows_with_email FROM emp;
-- Result: 38 (excludes 4 NULL emails)

-- Difference shows NULL count
SELECT 
  COUNT(*) AS total,
  COUNT(email) AS with_email,
  COUNT(*) - COUNT(email) AS without_email
FROM emp;

Practical Examples:

1. Data quality check:

SELECT 
  COUNT(*) AS total_records,
  COUNT(email) AS records_with_email,
  COUNT(phone) AS records_with_phone,
  COUNT(CASE WHEN email IS NOT NULL AND phone IS NOT NULL THEN 1 END) AS complete_records
FROM contacts;

2. Completion rate:

SELECT 
  department,
  COUNT(*) AS total,
  COUNT(performance_review) AS reviewed,
  ROUND(COUNT(performance_review) * 100.0 / COUNT(*), 2) AS review_completion_rate
FROM employees
GROUP BY department;

3. Active users:

SELECT 
  DATE_TRUNC('month', login_date) AS month,
  COUNT(DISTINCT user_id) AS active_users
FROM user_logins
WHERE login_date >= DATE_SUB(CURRENT_DATE, INTERVAL 6 MONTH)
GROUP BY DATE_TRUNC('month', login_date)
ORDER BY month;

Function: SUM

Description:
Sum of values.

Syntax:

SUM(expr)
SUM(DISTINCT expr)
SUM(expr) OVER (PARTITION BY partition_expr, ...) --(since v0.14.0)

Inputs:

expr - Numeric expression (INT, DOUBLE, DECIMAL, etc.)
DISTINCT expr - Sum of distinct values only

Output:

NUMERIC - Sum of values (same type as input, or promoted)

NULL Handling:

NULL values are ignored
If all values are NULL, returns NULL
Empty set returns NULL

Examples:

Basic SUM:

-- Total salary
SELECT SUM(salary) AS total_salary FROM emp;

-- Total revenue
SELECT SUM(amount) AS total_revenue FROM sales;

-- Sum with WHERE
SELECT SUM(salary) AS it_total_salary 
FROM emp 
WHERE department = 'IT';

SUM with GROUP BY:

-- Total salary per department
SELECT 
  department,
  SUM(salary) AS total_salary
FROM emp
GROUP BY department;

-- Revenue per product
SELECT 
  product_id,
  SUM(quantity * price) AS total_revenue
FROM order_items
GROUP BY product_id;

-- Monthly sales
SELECT 
  DATE_TRUNC('month', order_date) AS month,
  SUM(total_amount) AS monthly_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

SUM DISTINCT:

-- Sum of distinct salaries (removes duplicates)
SELECT SUM(DISTINCT salary) AS sum_distinct_salaries FROM emp;

-- Sum of unique prices
SELECT SUM(DISTINCT price) AS sum_unique_prices FROM products;

SUM with Calculations:

-- Total compensation (salary + bonus)
SELECT 
  department,
  SUM(salary + COALESCE(bonus, 0)) AS total_compensation
FROM emp
GROUP BY department;

-- Total order value with tax
SELECT 
  SUM(subtotal + tax + shipping) AS total_order_value
FROM orders;

-- Weighted average preparation
SELECT 
  SUM(score * weight) AS weighted_sum,
  SUM(weight) AS total_weight
FROM test_scores;

SUM with CASE:

-- Conditional sum
SELECT 
  SUM(CASE WHEN status = 'completed' THEN amount ELSE 0 END) AS completed_revenue,
  SUM(CASE WHEN status = 'pending' THEN amount ELSE 0 END) AS pending_revenue
FROM orders;

-- Sum by category
SELECT 
  department,
  SUM(CASE WHEN gender = 'M' THEN salary ELSE 0 END) AS male_total,
  SUM(CASE WHEN gender = 'F' THEN salary ELSE 0 END) AS female_total
FROM emp
GROUP BY department;

Practical Examples:

1. Financial summary:

SELECT 
  DATE_TRUNC('quarter', order_date) AS quarter,
  SUM(total_amount) AS revenue,
  SUM(cost) AS expenses,
  SUM(total_amount - cost) AS profit
FROM orders
GROUP BY DATE_TRUNC('quarter', order_date)
ORDER BY quarter;

2. Inventory value:

SELECT 
  category,
  SUM(quantity * unit_price) AS inventory_value
FROM inventory
GROUP BY category
ORDER BY inventory_value DESC;

3. Running total (with window function):

SELECT 
  order_date,
  amount,
  SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders
ORDER BY order_date;

Function: AVG

Description:
Average of values.

Syntax:

AVG(expr)
AVG(DISTINCT expr)
AVG(expr) OVER (PARTITION BY partition_expr, ...) --(since v0.14.0)

Inputs:

expr - Numeric expression
DISTINCT expr - Average of distinct values only

Output:

DOUBLE - Average value

NULL Handling:

NULL values are ignored
If all values are NULL, returns NULL
Empty set returns NULL

Examples:

Basic AVG:

-- Average salary
SELECT AVG(salary) AS avg_salary FROM emp;

-- Average with WHERE
SELECT AVG(salary) AS avg_it_salary 
FROM emp 
WHERE department = 'IT';

-- Average order value
SELECT AVG(total_amount) AS avg_order_value FROM orders;

AVG with GROUP BY:

-- Average salary per department
SELECT 
  department,
  AVG(salary) AS avg_salary
FROM emp
GROUP BY department;

-- Average rating per product
SELECT 
  product_id,
  AVG(rating) AS avg_rating,
  COUNT(*) AS review_count
FROM reviews
GROUP BY product_id;

-- Average daily sales
SELECT 
  DATE_TRUNC('day', order_date) AS day,
  AVG(total_amount) AS avg_daily_order
FROM orders
GROUP BY DATE_TRUNC('day', order_date)
ORDER BY day;

AVG DISTINCT:

-- Average of distinct salaries
SELECT AVG(DISTINCT salary) AS avg_distinct_salary FROM emp;

-- Average of unique prices
SELECT 
  category,
  AVG(DISTINCT price) AS avg_unique_price
FROM products
GROUP BY category;

AVG with Rounding:

-- Round average to 2 decimals
SELECT 
  department,
  ROUND(AVG(salary), 2) AS avg_salary
FROM emp
GROUP BY department;

-- Format as currency
SELECT 
  CONCAT('$', ROUND(AVG(salary), 2)) AS avg_salary_formatted
FROM emp;

AVG vs Manual Calculation:

-- Using AVG function
SELECT AVG(salary) AS avg_salary FROM emp;

-- Manual calculation (equivalent)
SELECT SUM(salary) / COUNT(salary) AS avg_salary FROM emp;

-- Difference with NULL handling
SELECT 
  AVG(bonus) AS avg_with_function,
  SUM(bonus) / COUNT(*) AS avg_manual_all,
  SUM(bonus) / COUNT(bonus) AS avg_manual_non_null
FROM emp;

Practical Examples:

1. Performance metrics:

SELECT 
  employee_id,
  AVG(sales_amount) AS avg_sale,
  AVG(customer_rating) AS avg_rating,
  COUNT(*) AS total_sales
FROM sales
WHERE sale_date >= DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH)
GROUP BY employee_id
HAVING AVG(sales_amount) > 1000;

2. Grade analysis:

SELECT 
  course_id,
  AVG(score) AS avg_score,
  MIN(score) AS min_score,
  MAX(score) AS max_score,
  COUNT(*) AS student_count
FROM exam_results
GROUP BY course_id
ORDER BY avg_score DESC;

3. Response time analysis:

SELECT 
  service_name,
  AVG(response_time_ms) AS avg_response,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY response_time_ms) AS p95_response
FROM api_logs
WHERE log_date >= CURRENT_DATE - INTERVAL 1 DAY
GROUP BY service_name;

Function: MIN

Description:
Minimum value in group.

Syntax:

MIN(expr)
MIN(expr) OVER (PARTITION BY partition_expr, ...) --(since v0.14.0)

Inputs:

expr - Any comparable type (numeric, string, date, etc.)

Output:

Same type as input

NULL Handling:

NULL values are ignored
If all values are NULL, returns NULL
Empty set returns NULL

Examples:

Basic MIN:

-- Minimum salary
SELECT MIN(salary) AS min_salary FROM emp;

-- Earliest hire date
SELECT MIN(hire_date) AS earliest FROM emp;

-- Lowest price
SELECT MIN(price) AS lowest_price FROM products;

MIN with Different Types:

-- Numeric MIN
SELECT MIN(age) AS youngest FROM users;

-- Date MIN
SELECT MIN(order_date) AS first_order FROM orders;

-- String MIN (alphabetically first)
SELECT MIN(name) AS first_alphabetically FROM products;

-- Timestamp MIN
SELECT MIN(created_at) AS earliest_record FROM logs;

MIN with GROUP BY:

-- Minimum salary per department
SELECT 
  department,
  MIN(salary) AS min_salary
FROM emp
GROUP BY department;

-- Earliest order per customer
SELECT 
  customer_id,
  MIN(order_date) AS first_order_date
FROM orders
GROUP BY customer_id;

-- Lowest price per category
SELECT 
  category,
  MIN(price) AS min_price,
  MAX(price) AS max_price
FROM products
GROUP BY category;

MIN with WHERE:

-- Minimum salary in IT department
SELECT MIN(salary) AS min_it_salary 
FROM emp 
WHERE department = 'IT';

-- Earliest order in 2025
SELECT MIN(order_date) AS first_2025_order 
FROM orders 
WHERE YEAR(order_date) = 2025;

Practical Examples:

1. Find oldest/newest records:

SELECT 
  customer_id,
  MIN(order_date) AS first_order,
  MAX(order_date) AS last_order,
  DATEDIFF(MAX(order_date), MIN(order_date)) AS customer_lifetime_days
FROM orders
GROUP BY customer_id;

2. Price range analysis:

SELECT 
  category,
  MIN(price) AS min_price,
  AVG(price) AS avg_price,
  MAX(price) AS max_price,
  MAX(price) - MIN(price) AS price_range
FROM products
GROUP BY category;

3. Performance bounds:

SELECT 
  server_name,
  MIN(response_time) AS best_response,
  AVG(response_time) AS avg_response,
  MAX(response_time) AS worst_response
FROM server_logs
WHERE log_date = CURRENT_DATE
GROUP BY server_name;

Function: MAX

Description:
Maximum value in group.

Syntax:

MAX(expr)
MAX(expr) OVER (PARTITION BY partition_expr, ...) --(since v0.14.0)

Inputs:

expr - Any comparable type (numeric, string, date, etc.)

Output:

Same type as input

NULL Handling:

NULL values are ignored
If all values are NULL, returns NULL
Empty set returns NULL

Examples:

Basic MAX:

-- Maximum salary
SELECT MAX(salary) AS top_salary FROM emp;

-- Latest hire date
SELECT MAX(hire_date) AS most_recent FROM emp;

-- Highest price
SELECT MAX(price) AS highest_price FROM products;

MAX with Different Types:

-- Numeric MAX
SELECT MAX(age) AS oldest FROM users;

-- Date MAX
SELECT MAX(order_date) AS last_order FROM orders;

-- String MAX (alphabetically last)
SELECT MAX(name) AS last_alphabetically FROM products;

-- Timestamp MAX
SELECT MAX(updated_at) AS latest_update FROM records;

MAX with GROUP BY:

-- Maximum salary per department
SELECT 
  department,
  MAX(salary) AS max_salary
FROM emp
GROUP BY department;

-- Latest order per customer
SELECT 
  customer_id,
  MAX(order_date) AS last_order_date
FROM orders
GROUP BY customer_id;

-- Highest price per category
SELECT 
  category,
  MAX(price) AS max_price
FROM products
GROUP BY category;

MAX with WHERE:

-- Maximum salary in IT department
SELECT MAX(salary) AS max_it_salary 
FROM emp 
WHERE department = 'IT';

-- Latest order in 2025
SELECT MAX(order_date) AS last_2025_order 
FROM orders 
WHERE YEAR(order_date) = 2025;

Practical Examples:

1. Find top performers:

SELECT 
  department,
  MAX(salary) AS top_salary,
  AVG(salary) AS avg_salary,
  MAX(salary) - AVG(salary) AS gap_to_top
FROM emp
GROUP BY department;

2. Latest activity:

SELECT 
  user_id,
  MAX(login_date) AS last_login,
  DATEDIFF(CURRENT_DATE, MAX(login_date)) AS days_since_login
FROM user_logins
GROUP BY user_id
HAVING DATEDIFF(CURRENT_DATE, MAX(login_date)) > 30;

3. Peak values:

SELECT 
  DATE_TRUNC('day', timestamp) AS day,
  MAX(cpu_usage) AS peak_cpu,
  MAX(memory_usage) AS peak_memory,
  MAX(active_connections) AS peak_connections
FROM system_metrics
WHERE timestamp >= DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY)
GROUP BY DATE_TRUNC('day', timestamp)
ORDER BY day;

Function: FIRST_VALUE

Description:
Window function: returns the first value in an ordered partition. Pushed as top_hits size=1 to Elasticsearch when possible.

Syntax:

FIRST_VALUE(expr) OVER (
  [PARTITION BY partition_expr, ...]
  [ORDER BY order_expr [ASC|DESC], ...]
)

Inputs:

expr - Expression to return
PARTITION BY - Optional grouping columns
ORDER BY - Ordering specification (if not provided, only expr column name is used for sorting)

Output:

Same type as input expression

Behavior:

Returns the first value based on ORDER BY within each partition
If OVER is not provided, only the expr column name is used for sorting
Optimized to Elasticsearch top_hits aggregation with size=1

Examples:

Basic FIRST_VALUE:

-- First salary in each department (ordered by hire date)
SELECT 
  department,
  name,
  salary,
  FIRST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
  ) AS first_salary
FROM emp;

Without PARTITION BY:

-- First hire across entire company
SELECT 
  name,
  hire_date,
  FIRST_VALUE(name) OVER (ORDER BY hire_date ASC) AS first_hired_employee
FROM emp;

Without OVER clause:

-- Uses expr column name for sorting
SELECT 
  department,
  FIRST_VALUE(salary) AS first_salary_value
FROM emp;

Multiple Partitions:

-- First employee hired in each department and job title
SELECT 
  department,
  job_title,
  name,
  hire_date,
  FIRST_VALUE(name) OVER (
    PARTITION BY department, job_title
    ORDER BY hire_date ASC
  ) AS first_in_role
FROM emp;

Practical Examples:

1. First purchase per customer:

SELECT 
  customer_id,
  order_id,
  order_date,
  total_amount,
  FIRST_VALUE(total_amount) OVER (
    PARTITION BY customer_id
    ORDER BY order_date ASC
  ) AS first_order_amount
FROM orders;

2. Initial stock price:

SELECT 
  stock_symbol,
  trade_date,
  closing_price,
  FIRST_VALUE(closing_price) OVER (
    PARTITION BY stock_symbol
    ORDER BY trade_date ASC
  ) AS initial_price
FROM stock_prices;

3. Baseline metrics:

SELECT 
  server_name,
  timestamp,
  cpu_usage,
  FIRST_VALUE(cpu_usage) OVER (
    PARTITION BY server_name
    ORDER BY timestamp ASC
  ) AS baseline_cpu
FROM server_metrics
WHERE DATE(timestamp) = CURRENT_DATE;

Comparison with MIN:

-- FIRST_VALUE (order-dependent)
SELECT 
  department,
  FIRST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
  ) AS first_hired_salary
FROM emp;

-- MIN (value-dependent)
SELECT 
  department,
  MIN(salary) AS lowest_salary
FROM emp
GROUP BY department;

Function: LAST_VALUE

Description:
Window function: returns the last value in an ordered partition. Pushed to Elasticsearch by flipping sort order in top_hits.

Syntax:

LAST_VALUE(expr) OVER (
  [PARTITION BY partition_expr, ...]
  [ORDER BY order_expr [ASC|DESC], ...]
)

Inputs:

expr - Expression to return
PARTITION BY - Optional grouping columns
ORDER BY - Ordering specification (if not provided, only expr column name is used for sorting)

Output:

Same type as input expression

Behavior:

Returns the last value based on ORDER BY within each partition
If OVER is not provided, only the expr column name is used for sorting
Optimized to Elasticsearch top_hits by reversing sort order

Examples:

Basic LAST_VALUE:

-- Last salary in each department (ordered by hire date)
SELECT 
  department,
  name,
  salary,
  LAST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
  ) AS last_salary
FROM emp;

Without PARTITION BY:

-- Most recent hire across entire company
SELECT 
  name,
  hire_date,
  LAST_VALUE(name) OVER (ORDER BY hire_date ASC) AS last_hired_employee
FROM emp;

Without OVER clause:

-- Uses expr column name for sorting
SELECT 
  department,
  LAST_VALUE(salary) AS last_salary_value
FROM emp;

Multiple Partitions:

-- Last employee hired in each department and job title
SELECT 
  department,
  job_title,
  name,
  hire_date,
  LAST_VALUE(name) OVER (
    PARTITION BY department, job_title
    ORDER BY hire_date ASC
  ) AS last_in_role
FROM emp;

Practical Examples:

1. Most recent purchase per customer:

SELECT 
  customer_id,
  order_id,
  order_date,
  total_amount,
  LAST_VALUE(total_amount) OVER (
    PARTITION BY customer_id
    ORDER BY order_date ASC
  ) AS last_order_amount
FROM orders;

2. Latest stock price:

SELECT 
  stock_symbol,
  trade_date,
  closing_price,
  LAST_VALUE(closing_price) OVER (
    PARTITION BY stock_symbol
    ORDER BY trade_date ASC
  ) AS current_price
FROM stock_prices;

3. Current status:

SELECT 
  user_id,
  status_change_date,
  status,
  LAST_VALUE(status) OVER (
    PARTITION BY user_id
    ORDER BY status_change_date ASC
  ) AS current_status
FROM user_status_history;

Comparison with MAX:

-- LAST_VALUE (order-dependent)
SELECT 
  department,
  LAST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
  ) AS last_hired_salary
FROM emp;

-- MAX (value-dependent)
SELECT 
  department,
  MAX(salary) AS highest_salary
FROM emp
GROUP BY department;

FIRST_VALUE vs LAST_VALUE:

-- Compare first and last values
SELECT 
  department,
  FIRST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
  ) AS first_hire_salary,
  LAST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
  ) AS last_hire_salary,
  LAST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
  ) - FIRST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
  ) AS salary_change
FROM emp;

Function: ARRAY_AGG

Description:
Collect values into an array for each partition. Implemented using OVER and pushed to Elasticsearch as top_hits. Post-processing converts hits to an array of scalars.

Syntax:

ARRAY_AGG(expr) OVER (
  [PARTITION BY partition_expr, ...]
  [ORDER BY order_expr [ASC|DESC], ...]
)

Inputs:

expr - Expression to collect
PARTITION BY - Optional grouping columns
ORDER BY - Optional ordering (if not provided, only expr column name is used for sorting)

Output:

ARRAY<type_of_expr> - Array of collected values

Behavior:

Collects all values of expr within each partition into an array
If OVER is not provided, only the expr column name is used for sorting
Optimized to Elasticsearch top_hits aggregation
Post-processing converts hits to array of scalars

Examples:

Basic ARRAY_AGG:

-- Collect employee names per department
SELECT 
  department,
  ARRAY_AGG(name) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
    LIMIT 100
  ) AS employees
FROM emp;
-- Result: employees as an array of name values per department (sorted and limited)

Without PARTITION BY:

-- Collect all employee names (ordered)
SELECT 
  ARRAY_AGG(name) OVER (ORDER BY hire_date ASC) AS all_employees
FROM emp;

Without OVER clause:

-- Uses expr column name for sorting
SELECT 
  department,
  ARRAY_AGG(name) AS employee_list
FROM emp;

With Multiple Columns:

-- Collect salaries per department
SELECT 
  department,
  ARRAY_AGG(salary) OVER (
    PARTITION BY department 
    ORDER BY salary DESC
  ) AS salary_list
FROM emp;

Ordered Collection:

-- Collect products by price (high to low)
SELECT 
  category,
  ARRAY_AGG(product_name) OVER (
    PARTITION BY category 
    ORDER BY price DESC
  ) AS products_by_price
FROM products
LIMIT 100;

Practical Examples:

1. Customer order history:

SELECT 
  customer_id,
  ARRAY_AGG(order_id) OVER (
    PARTITION BY customer_id 
    ORDER BY order_date DESC
  ) AS order_history,
  ARRAY_AGG(total_amount) OVER (
    PARTITION BY customer_id 
    ORDER BY order_date DESC
  ) AS amount_history
FROM orders
LIMIT 1000;

2. Product tags:

SELECT 
  product_id,
  product_name,
  ARRAY_AGG(tag) OVER (
    PARTITION BY product_id 
    ORDER BY tag ASC
  ) AS tags
FROM product_tags
LIMIT 500;

3. Timeline of events:

SELECT 
  user_id,
  ARRAY_AGG(event_type) OVER (
    PARTITION BY user_id 
    ORDER BY event_timestamp ASC
  ) AS event_timeline,
  ARRAY_AGG(event_timestamp) OVER (
    PARTITION BY user_id 
    ORDER BY event_timestamp ASC
  ) AS timestamp_timeline
FROM user_events
WHERE event_timestamp >= DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY)
LIMIT 1000;

4. Hierarchical data:

SELECT 
  manager_id,
  ARRAY_AGG(employee_name) OVER (
    PARTITION BY manager_id 
    ORDER BY hire_date ASC
  ) AS direct_reports
FROM employees
WHERE manager_id IS NOT NULL
LIMIT 100;

LIMIT Consideration:

-- Always use LIMIT with ARRAY_AGG to prevent memory issues
SELECT 
  department,
  ARRAY_AGG(name) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
    LIMIT 100  -- Important: limits result set size
  ) AS employees
FROM emp;

Function: STDDEV / VARIANCE family

Description:
The six ANSI statistical aggregates compute the standard deviation and variance of a numeric column. All map to a single Elasticsearch extended_stats aggregation per call; each function projects the matching field from the response.

STDDEV is an alias for STDDEV_SAMP, and VARIANCE is an alias for VAR_SAMP — i.e. both default to the sample (Bessel-corrected) form, matching PostgreSQL, Snowflake, and MySQL 8.0+. (MySQL 5.5 and earlier defaulted STDDEV to population.)

Syntax:

STDDEV(expr)  | STDDEV_SAMP(expr)  | STDDEV_POP(expr)
VARIANCE(expr) | VAR_SAMP(expr)     | VAR_POP(expr)

-- windowed form
STDDEV(expr) OVER (PARTITION BY partition_expr, ...)

Inputs:

expr - Numeric column
PARTITION BY - Optional grouping columns (windowed form)

Output:

DOUBLE

Function → Elasticsearch extended_stats field:

SQL	ES `extended_stats` field	Min ES version
`STDDEV(x)`	`std_deviation_sampling`	7.7+
`STDDEV_SAMP(x)`	`std_deviation_sampling`	7.7+
`STDDEV_POP(x)`	`std_deviation`	6+
`VARIANCE(x)`	`variance_sampling`	7.7+
`VAR_SAMP(x)`	`variance_sampling`	7.7+
`VAR_POP(x)`	`variance`	6+

Behavior:

NULL values are ignored.
The un-suffixed std_deviation / variance keys are the population values (present on Elasticsearch 6+); the _sampling keys are the sample values (introduced in Elasticsearch 7.7). Consequently the sample variants — including the default STDDEV / VARIANCE — require Elasticsearch 7.7+. On older clusters the column is returned as null and a warning is logged.
Each call emits its own extended_stats aggregation; two stat calls over the same column emit two aggregations.

Examples:

-- Per-group sample standard deviation and population variance
SELECT department,
       STDDEV(salary)  AS sd,
       VAR_POP(salary) AS vp
FROM emp
GROUP BY department;

-- Windowed sample variance per partition
SELECT name, salary,
       VARIANCE(salary) OVER (PARTITION BY department) AS v
FROM emp;

Function: PERCENTILE_CONT / PERCENTILE_DISC

Description:
Compute a percentile of a numeric column. PERCENTILE_CONT is continuous (interpolated); PERCENTILE_DISC is the discrete form. Both map to the Elasticsearch percentiles aggregation (TDigest). Elasticsearch has no native discrete percentile, so PERCENTILE_DISC is continuous-backed — it returns the same interpolated value as PERCENTILE_CONT rather than the nearest actual data point.

Syntax:

PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY column)
PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY column) OVER (PARTITION BY partition_expr, ...)
PERCENTILE_CONT(p) OVER (PARTITION BY partition_expr, ... ORDER BY column)
PERCENTILE_CONT(column, p)

Inputs:

p - Percentile fraction, a literal in [0, 1] (e.g. 0.99 for p99). Out-of-range values are rejected at parse time.
column - Numeric value column. Supplied by the ORDER BY clause (WITHIN GROUP or OVER), or the shorthand's first argument.
Grouping comes from OVER (PARTITION BY ...) or a top-level GROUP BY (or neither — one value over the whole result set).

Output:

DOUBLE

Behavior:

NULL values are ignored.
Multiple percentile calls on the same value column may be coalesced into a single Elasticsearch aggregation.
Works on Elasticsearch 6+.

Examples:

-- p99 request latency per endpoint (SRE latency analysis)
SELECT endpoint,
       PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY duration_ms) AS p99
FROM requests
GROUP BY endpoint;

-- OVER form: partition and value column both in OVER
SELECT name,
       PERCENTILE_CONT(0.95) OVER (PARTITION BY department ORDER BY salary) AS p95
FROM emp;

-- Column-first shorthand (BI-tool friendly)
SELECT department, PERCENTILE_CONT(salary, 0.5) AS median
FROM emp
GROUP BY department;

Function: ROW_NUMBER / RANK / DENSE_RANK (ranking windows)

Description: The three ANSI ranking window functions assign an ordinal to each row within a partition, ordered by the OVER (... ORDER BY ...) clause:

ROW_NUMBER() — sequential 1-based ordinals (1, 2, 3, ...); no ties recognized.
RANK() — ties share a rank and the next rank skips (1, 2, 2, 4, ...).
DENSE_RANK() — ties share a rank and the next rank does not skip (1, 2, 2, 3, ...).

ORDER BY is required inside OVER; PARTITION BY is optional (absent → the whole result set is one partition). A LIMIT N inside OVER is pushed into the Elasticsearch top_hits sub-aggregation, returning only the top-N rows per partition.

Syntax:

ROW_NUMBER() OVER ([PARTITION BY ...] ORDER BY ... [LIMIT N])
RANK()       OVER ([PARTITION BY ...] ORDER BY ... [LIMIT N])
DENSE_RANK() OVER ([PARTITION BY ...] ORDER BY ... [LIMIT N])

Example:

SELECT name, salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn,
       RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS rk,
       DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dr
FROM emp;

Full ranking-window details and top-N push-down examples are in DQL statements — ranking windows.