SQL Window Functions: Practical Examples for Data Engineers
Master RANK, LAG, LEAD, NTILE, and running totals with real-world SQL window function examples. Includes performance tips and common pitfalls.
# SQL Window Functions: Practical Examples for Data Engineers
Window functions are one of the most powerful features in modern SQL — and one of the most underused. Unlike aggregate functions that collapse rows into a single result, window functions perform calculations across a set of rows while keeping each row intact. This guide covers the most useful window functions with real-world examples.
What Is a Window Function?
A window function operates on a "window" of rows defined by the OVER() clause. The key difference from GROUP BY is that window functions do not reduce the number of rows in the result set.
-- GROUP BY collapses rows
SELECT department, SUM(salary) AS total
FROM employees
GROUP BY department;
-- Window function keeps all rows
SELECT name, department, salary,
SUM(salary) OVER (PARTITION BY department) AS dept_total
FROM employees;1. RANK and DENSE_RANK
Use RANK() when you need to assign a position to each row within a partition. DENSE_RANK() avoids gaps in the ranking sequence.
-- Top 3 salespeople per region SELECT name, region, total_sales, RANK() OVER (PARTITION BY region ORDER BY total_sales DESC) AS rank, DENSE_RANK() OVER (PARTITION BY region ORDER BY total_sales DESC) AS dense_rank FROM sales_summary;
When to use RANK vs DENSE_RANK:
| Scenario | Use |
|---|
|---|---|
| Top-N with gaps allowed (1, 2, 2, 4) | RANK() |
|---|---|
| Top-N without gaps (1, 2, 2, 3) | DENSE_RANK() |
| Unique sequential position | ROW_NUMBER() |
2. LAG and LEAD — Comparing Adjacent Rows
LAG() accesses the previous row; LEAD() accesses the next row. Essential for period-over-period comparisons.
-- Month-over-month revenue change
SELECT
month,
revenue,
LAG(revenue, 1) OVER (ORDER BY month) AS prev_month_revenue,
revenue - LAG(revenue, 1) OVER (ORDER BY month) AS revenue_change,
ROUND(
(revenue - LAG(revenue, 1) OVER (ORDER BY month)) * 100.0
/ NULLIF(LAG(revenue, 1) OVER (ORDER BY month), 0),
2
) AS pct_change
FROM monthly_revenue
ORDER BY month;Pro tip: Use NULLIF() in the denominator to avoid division-by-zero errors when the previous value is 0.
3. Running Totals with SUM OVER
Running totals are one of the most common window function use cases. The ROWS BETWEEN clause controls which rows are included in the window.
-- Cumulative sales by date
SELECT
sale_date,
daily_sales,
SUM(daily_sales) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total,
AVG(daily_sales) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS rolling_7day_avg
FROM daily_sales;Frame options:
| Frame | Meaning |
|---|
|---|---|
| ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW | From first row to current |
|---|---|
| ROWS BETWEEN 6 PRECEDING AND CURRENT ROW | Last 7 rows (rolling window) |
| ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING | From current to last row |
4. NTILE — Bucketing Rows into Percentiles
NTILE(n) divides rows into n equal buckets. Useful for quartile analysis, decile segmentation, and A/B test grouping.
-- Segment customers into quartiles by lifetime value
SELECT
customer_id,
lifetime_value,
NTILE(4) OVER (ORDER BY lifetime_value DESC) AS quartile,
CASE NTILE(4) OVER (ORDER BY lifetime_value DESC)
WHEN 1 THEN 'Top 25%'
WHEN 2 THEN 'Upper Mid'
WHEN 3 THEN 'Lower Mid'
WHEN 4 THEN 'Bottom 25%'
END AS segment
FROM customers;5. FIRST_VALUE and LAST_VALUE
Return the first or last value in a window. Useful for finding the initial state or most recent value per group.
-- Show each employee's salary vs the highest salary in their department
SELECT
name,
department,
salary,
FIRST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS dept_max_salary,
ROUND(salary * 100.0 / FIRST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
), 1) AS pct_of_max
FROM employees;Important: LAST_VALUE() requires an explicit frame clause (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) because the default frame only goes to the current row.
6. Deduplication with ROW_NUMBER
ROW_NUMBER() is the standard approach for removing duplicate rows while keeping the most recent record.
-- Keep only the latest record per customer
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY updated_at DESC
) AS rn
FROM customer_events
)
SELECT * FROM ranked WHERE rn = 1;7. Performance Tips
Window functions can be expensive on large datasets. Follow these guidelines:
Index the PARTITION BY and ORDER BY columns. The query engine needs to sort and group data for each window. An index on (department, salary DESC) dramatically speeds up PARTITION BY department ORDER BY salary DESC.
Avoid redundant window definitions. If you use the same window multiple times, define it once with a WINDOW clause:
-- Without WINDOW clause (redundant) SELECT name, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rank, DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dense_rank, ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS row_num FROM employees; -- With WINDOW clause (cleaner and potentially faster) SELECT name, RANK() OVER w AS rank, DENSE_RANK() OVER w AS dense_rank, ROW_NUMBER() OVER w AS row_num FROM employees WINDOW w AS (PARTITION BY dept ORDER BY salary DESC);
Materialise intermediate results. If you apply multiple window functions to the same CTE, the engine may re-scan the base table multiple times. Use a CTE or subquery to materialise the intermediate result first.
Common Pitfalls
Forgetting that WHERE filters run before window functions. If you filter rows with WHERE, the window function only sees the filtered rows. Use a subquery or CTE to apply window functions before filtering.
Confusing ROWS and RANGE frames. RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW includes all rows with the same ORDER BY value as the current row (ties). ROWS BETWEEN is strictly positional. For running totals, ROWS is almost always what you want.
NULL handling in LAG/LEAD. LAG() returns NULL for the first row (no previous row exists). Always handle NULLs explicitly with COALESCE or NULLIF.
Conclusion
Window functions unlock a class of analytical queries that are either impossible or extremely verbose without them. Mastering RANK, LAG/LEAD, running totals, and NTILE covers the vast majority of real-world analytical SQL requirements.
*SQL Querywise Advisor analyses your window function queries and suggests index optimisations, frame clause corrections, and performance improvements automatically.*
Try SQL Querywise on your own queries
3 free analyses — DocGen, Advisor, Reviewer, Explainer, Converter, and Analysis. No credit card required.
Try the live demo