This cheat sheet covers practical SQL techniques I've been exploring, from ranking functions to date manipulation.
More sections coming soon!
Window functions let you perform calculations across a set of rows related to the current row, without collapsing them into a single output like GROUP BY does.
Think of it as running an aggregation while still keeping every individual row visible.
Scenario:
We have a table of transactions across different countries.
Our goal is to rank which countries have the highest average transaction amount.
SELECT
location,
ROUND(AVG(transaction_amount), 0) AS avg_transaction_amt,
ROW_NUMBER() OVER (ORDER BY ROUND(AVG(transaction_amount), 0) DESC) AS ranks
FROM retail_fraud_detection_100k
GROUP BY location;
ROW_NUMBER() assigns a unique sequential number to every row, ordered by the column you specify. Even when two rows have the same value (USA and India both at 121), they still get different numbers.
SELECT
location,
ROUND(AVG(transaction_amount), 0) AS avg_transaction_amt,
DENSE_RANK() OVER (ORDER BY ROUND(AVG(transaction_amount), 0) DESC) AS ranks
FROM retail_fraud_detection_100k
GROUP BY location;
DENSE_RANK() gives tied rows the same rank, and the next rank continues right after with no gaps. USA, India, and Germany all get rank 1, then UK gets rank 2 (not 4).
SELECT
location,
ROUND(AVG(transaction_amount), 0) AS avg_transaction_amt,
RANK() OVER (ORDER BY ROUND(AVG(transaction_amount), 0) DESC) AS ranks
FROM retail_fraud_detection_100k
GROUP BY location;
RANK() also gives tied rows the same rank, but skips numbers afterward to account for the tie. USA, Germany, and India all get rank 1, so the next rank jumps to 4, not 2.
SELECT
location,
ROUND(AVG(transaction_amount), 2) AS avg_transaction_amt,
PERCENT_RANK() OVER (ORDER BY ROUND(AVG(transaction_amount), 2) DESC) AS ranks
FROM retail_fraud_detection_100k
GROUP BY location;
PERCENT_RANK() shows each row's relative position as a value between 0 and 1, calculated as
(RANK - 1) / (total rows - 1).
The top row always gets 0.0 and the bottom always gets 1.0.
What if you only want to see the Top 3 countries?
SELECT *
FROM (
SELECT
location,
ROUND(AVG(transaction_amount), 2) AS avg_transaction_amt,
ROW_NUMBER() OVER (ORDER BY ROUND(AVG(transaction_amount), 2) DESC) AS ranks
FROM retail_fraud_detection_100k
GROUP BY location
) AS tempo
WHERE ranks < 4;
You can't filter on a window function directly in a WHERE clause, so we wrap the ranked query as a subquery first. The outer WHERE ranks < 4 then filters it down to the top 3.
SELECT
customer_id,
transaction_timestamp AS purchase_date,
transaction_amount,
LAG(transaction_amount) OVER (
PARTITION BY customer_id
ORDER BY transaction_timestamp DESC
) AS previous_spent
FROM retail_fraud_detection_100k;
LAG() pulls the value from the previous row within a partition. Here it shows each customer's prior transaction amount alongside their current one. The first transaction per customer returns NULL since there's nothing before it.
Date functions allow you to extract, name, truncate, and compare date/time values stored in your database. They're essential for time-series analysis, reporting by period, and calculating durations.
SELECT
transaction_timestamp,
DATEPART(MILLISECOND, transaction_timestamp) AS millisecond,
DATEPART(MINUTE, transaction_timestamp) AS minute,
DATEPART(HOUR, transaction_timestamp) AS hour,
DATEPART(DAY, transaction_timestamp) AS day,
DATEPART(DAYOFYEAR, transaction_timestamp) AS day_of_year,
DATEPART(MONTH, transaction_timestamp) AS month
FROM retail_fraud_detection_100k;
DATEPART() extracts a specific numeric component from a datetime, hour, day, month, etc. For example, DATEPART(MONTH, '2026-03-04') returns 3. Useful for grouping or filtering by a particular time unit.
SELECT
transaction_timestamp,
DATENAME(WEEKDAY, transaction_timestamp) AS day,
DATENAME(MONTH, transaction_timestamp) AS month
FROM retail_fraud_detection_100k;
DATENAME() works like DATEPART(), but returns a string instead of a number. So rather than getting 3 for March, you get "March", making outputs much more readable in reports.
SELECT
transaction_timestamp,
DATETRUNC(MINUTE, transaction_timestamp) AS time_by_minute,
DATETRUNC(HOUR, transaction_timestamp) AS time_by_hour,
DATETRUNC(DAY, transaction_timestamp) AS time_by_day
FROM retail_fraud_detection_100k;
DATETRUNC() snaps a datetime down to the start of the specified unit, minute, hour, day, etc. It returns a full datetime with the smaller units zeroed out, making it great for grouping transactions into time buckets.
New dataset: For this function, we use a patient appointment dataset that has two date columns:
booking_dateandappointment_date.
SELECT
booking_date,
appointment_date,
DATEDIFF(DAY, booking_date, appointment_date) AS difference_in_days,
DATEDIFF(MONTH, booking_date, appointment_date) AS difference_in_months
FROM patient_no_show_dataset;
DATEDIFF() returns the difference between two dates as an integer, in the unit you choose. Here it shows how many days (and months) each patient waited between booking and their appointment.
This cheat sheet is a work in progress! Next up:
- CTEs (Common Table Expressions)
- JOINS (INNER, LEFT, RIGHT, FULL OUTER)
- Subqueries & Correlated Subqueries
- Aggregate Functions (SUM, COUNT, AVG with GROUP BY)
- CASE WHEN statements
- String Functions