Navigation: Functions — Date / Time · Functions — Conditional
This page documents aggregate functions for summarizing and analyzing data.
- COUNT
- SUM
- AVG
- MIN
- MAX
- FIRST_VALUE
- LAST_VALUE
- ARRAY_AGG
- STDDEV / VARIANCE family
- PERCENTILE_CONT / PERCENTILE_DISC
- ROW_NUMBER / RANK / DENSE_RANK (ranking windows)
Aggregate functions perform calculations on sets of rows and return a single result. They are commonly used with GROUP BY clauses to summarize data by categories.
Key Concepts:
- Aggregate Functions: Operate on multiple rows to produce a single result
- GROUP BY: Groups rows that have the same values in specified columns
- HAVING: Filters groups based on aggregate conditions
- Window Functions: Perform calculations across rows related to the current row
Description:
Count rows or non-null expressions. With DISTINCT counts distinct values.
Syntax:
COUNT(*)
COUNT(expr)
COUNT(DISTINCT expr)
COUNT(*) OVER (PARTITION BY partition_expr, ...) --(since v0.14.0)Inputs:
*- Count all rows (including NULLs)expr- Count non-NULL values in expressionDISTINCT expr- Count distinct non-NULL values
Output:
BIGINT- Number of rows/values
NULL Handling:
COUNT(*)includes rows with NULL valuesCOUNT(expr)excludes NULL valuesCOUNT(DISTINCT expr)excludes NULL values
Examples:
Basic COUNT:
-- Count all rows
SELECT COUNT(*) AS total FROM emp;
-- Result: total = 42
-- Count non-NULL values
SELECT COUNT(manager) AS employees_with_manager FROM emp;
-- Count with WHERE
SELECT COUNT(*) AS it_employees
FROM emp
WHERE department = 'IT';COUNT DISTINCT:
-- Count distinct salaries
SELECT COUNT(DISTINCT salary) AS distinct_salaries FROM emp;
-- Result: 8
-- Count distinct departments
SELECT COUNT(DISTINCT department) AS dept_count FROM emp;
-- Count distinct non-NULL emails
SELECT COUNT(DISTINCT email) AS unique_emails FROM users;COUNT with GROUP BY:
-- Count employees per department
SELECT
department,
COUNT(*) AS employee_count
FROM emp
GROUP BY department;
-- Count orders per customer
SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;
-- Count with multiple columns
SELECT
department,
job_title,
COUNT(*) AS count
FROM emp
GROUP BY department, job_title;COUNT with HAVING:
-- Departments with more than 10 employees
SELECT
department,
COUNT(*) AS count
FROM emp
GROUP BY department
HAVING COUNT(*) > 10;
-- Customers with multiple orders
SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 1;COUNT vs COUNT(*):
-- COUNT(*) counts all rows
SELECT COUNT(*) AS total_rows FROM emp;
-- Result: 42 (includes rows with NULL values)
-- COUNT(column) counts non-NULL values
SELECT COUNT(email) AS rows_with_email FROM emp;
-- Result: 38 (excludes 4 NULL emails)
-- Difference shows NULL count
SELECT
COUNT(*) AS total,
COUNT(email) AS with_email,
COUNT(*) - COUNT(email) AS without_email
FROM emp;Practical Examples:
1. Data quality check:
SELECT
COUNT(*) AS total_records,
COUNT(email) AS records_with_email,
COUNT(phone) AS records_with_phone,
COUNT(CASE WHEN email IS NOT NULL AND phone IS NOT NULL THEN 1 END) AS complete_records
FROM contacts;2. Completion rate:
SELECT
department,
COUNT(*) AS total,
COUNT(performance_review) AS reviewed,
ROUND(COUNT(performance_review) * 100.0 / COUNT(*), 2) AS review_completion_rate
FROM employees
GROUP BY department;3. Active users:
SELECT
DATE_TRUNC('month', login_date) AS month,
COUNT(DISTINCT user_id) AS active_users
FROM user_logins
WHERE login_date >= DATE_SUB(CURRENT_DATE, INTERVAL 6 MONTH)
GROUP BY DATE_TRUNC('month', login_date)
ORDER BY month;Description:
Sum of values.
Syntax:
SUM(expr)
SUM(DISTINCT expr)
SUM(expr) OVER (PARTITION BY partition_expr, ...) --(since v0.14.0)Inputs:
expr- Numeric expression (INT,DOUBLE,DECIMAL, etc.)DISTINCT expr- Sum of distinct values only
Output:
NUMERIC- Sum of values (same type as input, or promoted)
NULL Handling:
- NULL values are ignored
- If all values are NULL, returns NULL
- Empty set returns NULL
Examples:
Basic SUM:
-- Total salary
SELECT SUM(salary) AS total_salary FROM emp;
-- Total revenue
SELECT SUM(amount) AS total_revenue FROM sales;
-- Sum with WHERE
SELECT SUM(salary) AS it_total_salary
FROM emp
WHERE department = 'IT';SUM with GROUP BY:
-- Total salary per department
SELECT
department,
SUM(salary) AS total_salary
FROM emp
GROUP BY department;
-- Revenue per product
SELECT
product_id,
SUM(quantity * price) AS total_revenue
FROM order_items
GROUP BY product_id;
-- Monthly sales
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(total_amount) AS monthly_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;SUM DISTINCT:
-- Sum of distinct salaries (removes duplicates)
SELECT SUM(DISTINCT salary) AS sum_distinct_salaries FROM emp;
-- Sum of unique prices
SELECT SUM(DISTINCT price) AS sum_unique_prices FROM products;SUM with Calculations:
-- Total compensation (salary + bonus)
SELECT
department,
SUM(salary + COALESCE(bonus, 0)) AS total_compensation
FROM emp
GROUP BY department;
-- Total order value with tax
SELECT
SUM(subtotal + tax + shipping) AS total_order_value
FROM orders;
-- Weighted average preparation
SELECT
SUM(score * weight) AS weighted_sum,
SUM(weight) AS total_weight
FROM test_scores;SUM with CASE:
-- Conditional sum
SELECT
SUM(CASE WHEN status = 'completed' THEN amount ELSE 0 END) AS completed_revenue,
SUM(CASE WHEN status = 'pending' THEN amount ELSE 0 END) AS pending_revenue
FROM orders;
-- Sum by category
SELECT
department,
SUM(CASE WHEN gender = 'M' THEN salary ELSE 0 END) AS male_total,
SUM(CASE WHEN gender = 'F' THEN salary ELSE 0 END) AS female_total
FROM emp
GROUP BY department;Practical Examples:
1. Financial summary:
SELECT
DATE_TRUNC('quarter', order_date) AS quarter,
SUM(total_amount) AS revenue,
SUM(cost) AS expenses,
SUM(total_amount - cost) AS profit
FROM orders
GROUP BY DATE_TRUNC('quarter', order_date)
ORDER BY quarter;2. Inventory value:
SELECT
category,
SUM(quantity * unit_price) AS inventory_value
FROM inventory
GROUP BY category
ORDER BY inventory_value DESC;3. Running total (with window function):
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders
ORDER BY order_date;Description:
Average of values.
Syntax:
AVG(expr)
AVG(DISTINCT expr)
AVG(expr) OVER (PARTITION BY partition_expr, ...) --(since v0.14.0)Inputs:
expr- Numeric expressionDISTINCT expr- Average of distinct values only
Output:
DOUBLE- Average value
NULL Handling:
- NULL values are ignored
- If all values are NULL, returns NULL
- Empty set returns NULL
Examples:
Basic AVG:
-- Average salary
SELECT AVG(salary) AS avg_salary FROM emp;
-- Average with WHERE
SELECT AVG(salary) AS avg_it_salary
FROM emp
WHERE department = 'IT';
-- Average order value
SELECT AVG(total_amount) AS avg_order_value FROM orders;AVG with GROUP BY:
-- Average salary per department
SELECT
department,
AVG(salary) AS avg_salary
FROM emp
GROUP BY department;
-- Average rating per product
SELECT
product_id,
AVG(rating) AS avg_rating,
COUNT(*) AS review_count
FROM reviews
GROUP BY product_id;
-- Average daily sales
SELECT
DATE_TRUNC('day', order_date) AS day,
AVG(total_amount) AS avg_daily_order
FROM orders
GROUP BY DATE_TRUNC('day', order_date)
ORDER BY day;AVG DISTINCT:
-- Average of distinct salaries
SELECT AVG(DISTINCT salary) AS avg_distinct_salary FROM emp;
-- Average of unique prices
SELECT
category,
AVG(DISTINCT price) AS avg_unique_price
FROM products
GROUP BY category;AVG with Rounding:
-- Round average to 2 decimals
SELECT
department,
ROUND(AVG(salary), 2) AS avg_salary
FROM emp
GROUP BY department;
-- Format as currency
SELECT
CONCAT('$', ROUND(AVG(salary), 2)) AS avg_salary_formatted
FROM emp;AVG vs Manual Calculation:
-- Using AVG function
SELECT AVG(salary) AS avg_salary FROM emp;
-- Manual calculation (equivalent)
SELECT SUM(salary) / COUNT(salary) AS avg_salary FROM emp;
-- Difference with NULL handling
SELECT
AVG(bonus) AS avg_with_function,
SUM(bonus) / COUNT(*) AS avg_manual_all,
SUM(bonus) / COUNT(bonus) AS avg_manual_non_null
FROM emp;Practical Examples:
1. Performance metrics:
SELECT
employee_id,
AVG(sales_amount) AS avg_sale,
AVG(customer_rating) AS avg_rating,
COUNT(*) AS total_sales
FROM sales
WHERE sale_date >= DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH)
GROUP BY employee_id
HAVING AVG(sales_amount) > 1000;2. Grade analysis:
SELECT
course_id,
AVG(score) AS avg_score,
MIN(score) AS min_score,
MAX(score) AS max_score,
COUNT(*) AS student_count
FROM exam_results
GROUP BY course_id
ORDER BY avg_score DESC;3. Response time analysis:
SELECT
service_name,
AVG(response_time_ms) AS avg_response,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY response_time_ms) AS p95_response
FROM api_logs
WHERE log_date >= CURRENT_DATE - INTERVAL 1 DAY
GROUP BY service_name;Description:
Minimum value in group.
Syntax:
MIN(expr)
MIN(expr) OVER (PARTITION BY partition_expr, ...) --(since v0.14.0)Inputs:
expr- Any comparable type (numeric, string, date, etc.)
Output:
- Same type as input
NULL Handling:
- NULL values are ignored
- If all values are NULL, returns NULL
- Empty set returns NULL
Examples:
Basic MIN:
-- Minimum salary
SELECT MIN(salary) AS min_salary FROM emp;
-- Earliest hire date
SELECT MIN(hire_date) AS earliest FROM emp;
-- Lowest price
SELECT MIN(price) AS lowest_price FROM products;MIN with Different Types:
-- Numeric MIN
SELECT MIN(age) AS youngest FROM users;
-- Date MIN
SELECT MIN(order_date) AS first_order FROM orders;
-- String MIN (alphabetically first)
SELECT MIN(name) AS first_alphabetically FROM products;
-- Timestamp MIN
SELECT MIN(created_at) AS earliest_record FROM logs;MIN with GROUP BY:
-- Minimum salary per department
SELECT
department,
MIN(salary) AS min_salary
FROM emp
GROUP BY department;
-- Earliest order per customer
SELECT
customer_id,
MIN(order_date) AS first_order_date
FROM orders
GROUP BY customer_id;
-- Lowest price per category
SELECT
category,
MIN(price) AS min_price,
MAX(price) AS max_price
FROM products
GROUP BY category;MIN with WHERE:
-- Minimum salary in IT department
SELECT MIN(salary) AS min_it_salary
FROM emp
WHERE department = 'IT';
-- Earliest order in 2025
SELECT MIN(order_date) AS first_2025_order
FROM orders
WHERE YEAR(order_date) = 2025;Practical Examples:
1. Find oldest/newest records:
SELECT
customer_id,
MIN(order_date) AS first_order,
MAX(order_date) AS last_order,
DATEDIFF(MAX(order_date), MIN(order_date)) AS customer_lifetime_days
FROM orders
GROUP BY customer_id;2. Price range analysis:
SELECT
category,
MIN(price) AS min_price,
AVG(price) AS avg_price,
MAX(price) AS max_price,
MAX(price) - MIN(price) AS price_range
FROM products
GROUP BY category;3. Performance bounds:
SELECT
server_name,
MIN(response_time) AS best_response,
AVG(response_time) AS avg_response,
MAX(response_time) AS worst_response
FROM server_logs
WHERE log_date = CURRENT_DATE
GROUP BY server_name;Description:
Maximum value in group.
Syntax:
MAX(expr)
MAX(expr) OVER (PARTITION BY partition_expr, ...) --(since v0.14.0)Inputs:
expr- Any comparable type (numeric, string, date, etc.)
Output:
- Same type as input
NULL Handling:
- NULL values are ignored
- If all values are NULL, returns NULL
- Empty set returns NULL
Examples:
Basic MAX:
-- Maximum salary
SELECT MAX(salary) AS top_salary FROM emp;
-- Latest hire date
SELECT MAX(hire_date) AS most_recent FROM emp;
-- Highest price
SELECT MAX(price) AS highest_price FROM products;MAX with Different Types:
-- Numeric MAX
SELECT MAX(age) AS oldest FROM users;
-- Date MAX
SELECT MAX(order_date) AS last_order FROM orders;
-- String MAX (alphabetically last)
SELECT MAX(name) AS last_alphabetically FROM products;
-- Timestamp MAX
SELECT MAX(updated_at) AS latest_update FROM records;MAX with GROUP BY:
-- Maximum salary per department
SELECT
department,
MAX(salary) AS max_salary
FROM emp
GROUP BY department;
-- Latest order per customer
SELECT
customer_id,
MAX(order_date) AS last_order_date
FROM orders
GROUP BY customer_id;
-- Highest price per category
SELECT
category,
MAX(price) AS max_price
FROM products
GROUP BY category;MAX with WHERE:
-- Maximum salary in IT department
SELECT MAX(salary) AS max_it_salary
FROM emp
WHERE department = 'IT';
-- Latest order in 2025
SELECT MAX(order_date) AS last_2025_order
FROM orders
WHERE YEAR(order_date) = 2025;Practical Examples:
1. Find top performers:
SELECT
department,
MAX(salary) AS top_salary,
AVG(salary) AS avg_salary,
MAX(salary) - AVG(salary) AS gap_to_top
FROM emp
GROUP BY department;2. Latest activity:
SELECT
user_id,
MAX(login_date) AS last_login,
DATEDIFF(CURRENT_DATE, MAX(login_date)) AS days_since_login
FROM user_logins
GROUP BY user_id
HAVING DATEDIFF(CURRENT_DATE, MAX(login_date)) > 30;3. Peak values:
SELECT
DATE_TRUNC('day', timestamp) AS day,
MAX(cpu_usage) AS peak_cpu,
MAX(memory_usage) AS peak_memory,
MAX(active_connections) AS peak_connections
FROM system_metrics
WHERE timestamp >= DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY)
GROUP BY DATE_TRUNC('day', timestamp)
ORDER BY day;Description:
Window function: returns the first value in an ordered partition. Pushed as top_hits size=1 to Elasticsearch when possible.
Syntax:
FIRST_VALUE(expr) OVER (
[PARTITION BY partition_expr, ...]
[ORDER BY order_expr [ASC|DESC], ...]
)Inputs:
expr- Expression to returnPARTITION BY- Optional grouping columnsORDER BY- Ordering specification (if not provided, only expr column name is used for sorting)
Output:
- Same type as input expression
Behavior:
- Returns the first value based on
ORDER BYwithin each partition - If
OVERis not provided, only the expr column name is used for sorting - Optimized to Elasticsearch
top_hitsaggregation withsize=1
Examples:
Basic FIRST_VALUE:
-- First salary in each department (ordered by hire date)
SELECT
department,
name,
salary,
FIRST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY hire_date ASC
) AS first_salary
FROM emp;Without PARTITION BY:
-- First hire across entire company
SELECT
name,
hire_date,
FIRST_VALUE(name) OVER (ORDER BY hire_date ASC) AS first_hired_employee
FROM emp;Without OVER clause:
-- Uses expr column name for sorting
SELECT
department,
FIRST_VALUE(salary) AS first_salary_value
FROM emp;Multiple Partitions:
-- First employee hired in each department and job title
SELECT
department,
job_title,
name,
hire_date,
FIRST_VALUE(name) OVER (
PARTITION BY department, job_title
ORDER BY hire_date ASC
) AS first_in_role
FROM emp;Practical Examples:
1. First purchase per customer:
SELECT
customer_id,
order_id,
order_date,
total_amount,
FIRST_VALUE(total_amount) OVER (
PARTITION BY customer_id
ORDER BY order_date ASC
) AS first_order_amount
FROM orders;2. Initial stock price:
SELECT
stock_symbol,
trade_date,
closing_price,
FIRST_VALUE(closing_price) OVER (
PARTITION BY stock_symbol
ORDER BY trade_date ASC
) AS initial_price
FROM stock_prices;3. Baseline metrics:
SELECT
server_name,
timestamp,
cpu_usage,
FIRST_VALUE(cpu_usage) OVER (
PARTITION BY server_name
ORDER BY timestamp ASC
) AS baseline_cpu
FROM server_metrics
WHERE DATE(timestamp) = CURRENT_DATE;Comparison with MIN:
-- FIRST_VALUE (order-dependent)
SELECT
department,
FIRST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY hire_date ASC
) AS first_hired_salary
FROM emp;
-- MIN (value-dependent)
SELECT
department,
MIN(salary) AS lowest_salary
FROM emp
GROUP BY department;Description:
Window function: returns the last value in an ordered partition. Pushed to Elasticsearch by flipping sort order in top_hits.
Syntax:
LAST_VALUE(expr) OVER (
[PARTITION BY partition_expr, ...]
[ORDER BY order_expr [ASC|DESC], ...]
)Inputs:
expr- Expression to returnPARTITION BY- Optional grouping columnsORDER BY- Ordering specification (if not provided, only expr column name is used for sorting)
Output:
- Same type as input expression
Behavior:
- Returns the last value based on
ORDER BYwithin each partition - If
OVERis not provided, only the expr column name is used for sorting - Optimized to Elasticsearch
top_hitsby reversing sort order
Examples:
Basic LAST_VALUE:
-- Last salary in each department (ordered by hire date)
SELECT
department,
name,
salary,
LAST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY hire_date ASC
) AS last_salary
FROM emp;Without PARTITION BY:
-- Most recent hire across entire company
SELECT
name,
hire_date,
LAST_VALUE(name) OVER (ORDER BY hire_date ASC) AS last_hired_employee
FROM emp;Without OVER clause:
-- Uses expr column name for sorting
SELECT
department,
LAST_VALUE(salary) AS last_salary_value
FROM emp;Multiple Partitions:
-- Last employee hired in each department and job title
SELECT
department,
job_title,
name,
hire_date,
LAST_VALUE(name) OVER (
PARTITION BY department, job_title
ORDER BY hire_date ASC
) AS last_in_role
FROM emp;Practical Examples:
1. Most recent purchase per customer:
SELECT
customer_id,
order_id,
order_date,
total_amount,
LAST_VALUE(total_amount) OVER (
PARTITION BY customer_id
ORDER BY order_date ASC
) AS last_order_amount
FROM orders;2. Latest stock price:
SELECT
stock_symbol,
trade_date,
closing_price,
LAST_VALUE(closing_price) OVER (
PARTITION BY stock_symbol
ORDER BY trade_date ASC
) AS current_price
FROM stock_prices;3. Current status:
SELECT
user_id,
status_change_date,
status,
LAST_VALUE(status) OVER (
PARTITION BY user_id
ORDER BY status_change_date ASC
) AS current_status
FROM user_status_history;Comparison with MAX:
-- LAST_VALUE (order-dependent)
SELECT
department,
LAST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY hire_date ASC
) AS last_hired_salary
FROM emp;
-- MAX (value-dependent)
SELECT
department,
MAX(salary) AS highest_salary
FROM emp
GROUP BY department;FIRST_VALUE vs LAST_VALUE:
-- Compare first and last values
SELECT
department,
FIRST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY hire_date ASC
) AS first_hire_salary,
LAST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY hire_date ASC
) AS last_hire_salary,
LAST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY hire_date ASC
) - FIRST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY hire_date ASC
) AS salary_change
FROM emp;Description:
Collect values into an array for each partition. Implemented using OVER and pushed to Elasticsearch as top_hits. Post-processing converts hits to an array of scalars.
Syntax:
ARRAY_AGG(expr) OVER (
[PARTITION BY partition_expr, ...]
[ORDER BY order_expr [ASC|DESC], ...]
)Inputs:
expr- Expression to collectPARTITION BY- Optional grouping columnsORDER BY- Optional ordering (if not provided, only expr column name is used for sorting)
Output:
ARRAY<type_of_expr>- Array of collected values
Behavior:
- Collects all values of
exprwithin each partition into an array - If
OVERis not provided, only the expr column name is used for sorting - Optimized to Elasticsearch
top_hitsaggregation - Post-processing converts hits to array of scalars
Examples:
Basic ARRAY_AGG:
-- Collect employee names per department
SELECT
department,
ARRAY_AGG(name) OVER (
PARTITION BY department
ORDER BY hire_date ASC
LIMIT 100
) AS employees
FROM emp;
-- Result: employees as an array of name values per department (sorted and limited)Without PARTITION BY:
-- Collect all employee names (ordered)
SELECT
ARRAY_AGG(name) OVER (ORDER BY hire_date ASC) AS all_employees
FROM emp;Without OVER clause:
-- Uses expr column name for sorting
SELECT
department,
ARRAY_AGG(name) AS employee_list
FROM emp;With Multiple Columns:
-- Collect salaries per department
SELECT
department,
ARRAY_AGG(salary) OVER (
PARTITION BY department
ORDER BY salary DESC
) AS salary_list
FROM emp;Ordered Collection:
-- Collect products by price (high to low)
SELECT
category,
ARRAY_AGG(product_name) OVER (
PARTITION BY category
ORDER BY price DESC
) AS products_by_price
FROM products
LIMIT 100;Practical Examples:
1. Customer order history:
SELECT
customer_id,
ARRAY_AGG(order_id) OVER (
PARTITION BY customer_id
ORDER BY order_date DESC
) AS order_history,
ARRAY_AGG(total_amount) OVER (
PARTITION BY customer_id
ORDER BY order_date DESC
) AS amount_history
FROM orders
LIMIT 1000;2. Product tags:
SELECT
product_id,
product_name,
ARRAY_AGG(tag) OVER (
PARTITION BY product_id
ORDER BY tag ASC
) AS tags
FROM product_tags
LIMIT 500;3. Timeline of events:
SELECT
user_id,
ARRAY_AGG(event_type) OVER (
PARTITION BY user_id
ORDER BY event_timestamp ASC
) AS event_timeline,
ARRAY_AGG(event_timestamp) OVER (
PARTITION BY user_id
ORDER BY event_timestamp ASC
) AS timestamp_timeline
FROM user_events
WHERE event_timestamp >= DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY)
LIMIT 1000;4. Hierarchical data:
SELECT
manager_id,
ARRAY_AGG(employee_name) OVER (
PARTITION BY manager_id
ORDER BY hire_date ASC
) AS direct_reports
FROM employees
WHERE manager_id IS NOT NULL
LIMIT 100;LIMIT Consideration:
-- Always use LIMIT with ARRAY_AGG to prevent memory issues
SELECT
department,
ARRAY_AGG(name) OVER (
PARTITION BY department
ORDER BY hire_date ASC
LIMIT 100 -- Important: limits result set size
) AS employees
FROM emp;Description:
The six ANSI statistical aggregates compute the standard deviation and variance of a numeric column. All map to a single Elasticsearch extended_stats aggregation per call; each function projects the matching field from the response.
STDDEV is an alias for STDDEV_SAMP, and VARIANCE is an alias for VAR_SAMP — i.e. both default to the sample (Bessel-corrected) form, matching PostgreSQL, Snowflake, and MySQL 8.0+. (MySQL 5.5 and earlier defaulted STDDEV to population.)
Syntax:
STDDEV(expr) | STDDEV_SAMP(expr) | STDDEV_POP(expr)
VARIANCE(expr) | VAR_SAMP(expr) | VAR_POP(expr)
-- windowed form
STDDEV(expr) OVER (PARTITION BY partition_expr, ...)Inputs:
expr- Numeric columnPARTITION BY- Optional grouping columns (windowed form)
Output:
DOUBLE
Function → Elasticsearch extended_stats field:
| SQL | ES extended_stats field |
Min ES version |
|---|---|---|
STDDEV(x) |
std_deviation_sampling |
7.7+ |
STDDEV_SAMP(x) |
std_deviation_sampling |
7.7+ |
STDDEV_POP(x) |
std_deviation |
6+ |
VARIANCE(x) |
variance_sampling |
7.7+ |
VAR_SAMP(x) |
variance_sampling |
7.7+ |
VAR_POP(x) |
variance |
6+ |
Behavior:
NULLvalues are ignored.- The un-suffixed
std_deviation/variancekeys are the population values (present on Elasticsearch 6+); the_samplingkeys are the sample values (introduced in Elasticsearch 7.7). Consequently the sample variants — including the defaultSTDDEV/VARIANCE— require Elasticsearch 7.7+. On older clusters the column is returned asnulland a warning is logged. - Each call emits its own
extended_statsaggregation; two stat calls over the same column emit two aggregations.
Examples:
-- Per-group sample standard deviation and population variance
SELECT department,
STDDEV(salary) AS sd,
VAR_POP(salary) AS vp
FROM emp
GROUP BY department;
-- Windowed sample variance per partition
SELECT name, salary,
VARIANCE(salary) OVER (PARTITION BY department) AS v
FROM emp;Description:
Compute a percentile of a numeric column. PERCENTILE_CONT is continuous (interpolated); PERCENTILE_DISC is the discrete form. Both map to the Elasticsearch percentiles aggregation (TDigest). Elasticsearch has no native discrete percentile, so PERCENTILE_DISC is continuous-backed — it returns the same interpolated value as PERCENTILE_CONT rather than the nearest actual data point.
Syntax:
PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY column)
PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY column) OVER (PARTITION BY partition_expr, ...)
PERCENTILE_CONT(p) OVER (PARTITION BY partition_expr, ... ORDER BY column)
PERCENTILE_CONT(column, p)Inputs:
p- Percentile fraction, a literal in[0, 1](e.g.0.99for p99). Out-of-range values are rejected at parse time.column- Numeric value column. Supplied by theORDER BYclause (WITHIN GROUPorOVER), or the shorthand's first argument.- Grouping comes from
OVER (PARTITION BY ...)or a top-levelGROUP BY(or neither — one value over the whole result set).
Output:
DOUBLE
Behavior:
NULLvalues are ignored.- Multiple percentile calls on the same value column may be coalesced into a single Elasticsearch aggregation.
- Works on Elasticsearch 6+.
Examples:
-- p99 request latency per endpoint (SRE latency analysis)
SELECT endpoint,
PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY duration_ms) AS p99
FROM requests
GROUP BY endpoint;
-- OVER form: partition and value column both in OVER
SELECT name,
PERCENTILE_CONT(0.95) OVER (PARTITION BY department ORDER BY salary) AS p95
FROM emp;
-- Column-first shorthand (BI-tool friendly)
SELECT department, PERCENTILE_CONT(salary, 0.5) AS median
FROM emp
GROUP BY department;Description:
The three ANSI ranking window functions assign an ordinal to each row within a
partition, ordered by the OVER (... ORDER BY ...) clause:
ROW_NUMBER()— sequential 1-based ordinals (1, 2, 3, ...); no ties recognized.RANK()— ties share a rank and the next rank skips (1, 2, 2, 4, ...).DENSE_RANK()— ties share a rank and the next rank does not skip (1, 2, 2, 3, ...).
ORDER BY is required inside OVER; PARTITION BY is optional (absent → the
whole result set is one partition). A LIMIT N inside OVER is pushed into the
Elasticsearch top_hits sub-aggregation, returning only the top-N rows per partition.
Syntax:
ROW_NUMBER() OVER ([PARTITION BY ...] ORDER BY ... [LIMIT N])
RANK() OVER ([PARTITION BY ...] ORDER BY ... [LIMIT N])
DENSE_RANK() OVER ([PARTITION BY ...] ORDER BY ... [LIMIT N])Example:
SELECT name, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rk,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dr
FROM emp;Full ranking-window details and top-N push-down examples are in DQL statements — ranking windows.
| Function | Purpose | Input | Output | NULL Handling |
|---|---|---|---|---|
COUNT(*) |
Count all rows | Any | BIGINT |
Includes NULLs |
COUNT(expr) |
Count non-NULL values | Any | BIGINT |
Excludes NULLs |
COUNT(DISTINCT expr) |
Count distinct values | Any | BIGINT |
Excludes NULLs |
SUM(expr) |
Sum values | Numeric | Numeric | Ignores NULLs |
AVG(expr) |
Average values | Numeric | DOUBLE |
Ignores NULLs |
MIN(expr) |
Minimum value | Comparable | Same as input | Ignores NULLs |
MAX(expr) |
Maximum value | Comparable | Same as input | Ignores NULLs |
FIRST_VALUE(expr) |
First value (ordered) | Any | Same as input | Depends on ORDER |
LAST_VALUE(expr) |
Last value (ordered) | Any | Same as input | Depends on ORDER |
ARRAY_AGG(expr) |
Collect into array | Any | ARRAY<type> |
Includes NULLs |
STDDEV(expr) |
Sample std deviation | Numeric | DOUBLE |
Ignores NULLs |
STDDEV_SAMP(expr) |
Sample std deviation | Numeric | DOUBLE |
Ignores NULLs |
STDDEV_POP(expr) |
Population std dev | Numeric | DOUBLE |
Ignores NULLs |
VARIANCE(expr) |
Sample variance | Numeric | DOUBLE |
Ignores NULLs |
VAR_SAMP(expr) |
Sample variance | Numeric | DOUBLE |
Ignores NULLs |
VAR_POP(expr) |
Population variance | Numeric | DOUBLE |
Ignores NULLs |
PERCENTILE_CONT(p) |
Continuous percentile | Numeric | DOUBLE |
Ignores NULLs |
PERCENTILE_DISC(p) |
Discrete percentile | Numeric | DOUBLE |
Ignores NULLs |
ROW_NUMBER() |
Sequential ordinal | — | BIGINT |
n/a (window) |
RANK() |
Rank, ties skip | — | BIGINT |
n/a (window) |
DENSE_RANK() |
Rank, ties dense | — | BIGINT |
n/a (window) |