Performance · 9 min read · Apr 2026

SQL Query Optimisation for BigQuery: 10 Techniques to Cut Costs and Boost Performance

BigQuery charges per byte scanned — not per query. Learn how partitioning, clustering, column pruning, and smarter JOIN patterns can reduce your bill by up to 90% while making queries run faster.

Tags: BigQuery · SQL Performance · GCP · Cost Optimisation · Partitioning · Clustering

Why BigQuery Optimisation Is Different

BigQuery is a serverless, columnar data warehouse built for petabyte-scale analytics. Unlike traditional RDBMS engines, it has no indexes, only a results cache that benefits exact repeats of the same query, and, critically, under on-demand pricing it charges per byte scanned rather than per query or per second. A single unoptimised query against a 10 TB table can cost more than an entire day of well-tuned workloads.

Optimising for BigQuery means thinking about three things simultaneously: I/O reduction (scan less data), computation efficiency (do less work per row), and shuffle minimisation (pass fewer bytes between execution stages). The 10 techniques below address all three dimensions.

1. Never Use SELECT *

This is the single most impactful change you can make. BigQuery is a columnar store — it only reads the columns you explicitly request. SELECT * forces a full scan of every column in the table, even if your query only needs two of them.

```sql
-- ❌ Scans every column — costs 10x more than necessary
SELECT * FROM `project.dataset.events`
WHERE event_date = '2026-04-01';

-- ✅ Reads only 3 columns — dramatically reduces bytes scanned
SELECT user_id, event_name, created_at
FROM `project.dataset.events`
WHERE event_date = '2026-04-01';
```

If you need most columns but want to exclude a few, use SELECT * EXCEPT (col1, col2). Note that it still scans every column you keep; the saving equals the size of the excluded columns, which is substantial when those columns are wide (raw payloads, large JSON blobs).
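
As a quick illustration (the excluded column names here are hypothetical), this keeps the convenience of the wildcard while trimming the scan by the width of two payload columns:

```sql
-- Excludes two wide columns (hypothetical names raw_payload, debug_context);
-- the scan shrinks by exactly the size of the excluded columns
SELECT * EXCEPT (raw_payload, debug_context)
FROM `project.dataset.events`
WHERE event_date = '2026-04-01';
```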

2. Partition Your Tables and Always Filter on the Partition Column

Partitioning divides a table into segments based on a date or integer column. BigQuery only reads the partitions that match your WHERE clause — all other partitions are skipped entirely, at zero cost.
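
As a sketch (schema inferred from the examples in this post), a daily-partitioned table can be declared like this; the optional require_partition_filter setting makes BigQuery reject any query that omits the partition filter:

```sql
-- Sketch: declare a table partitioned by a DATE column;
-- require_partition_filter rejects queries that forget the partition filter
CREATE TABLE `project.dataset.events`
(
  user_id    STRING,
  event_name STRING,
  event_date DATE,
  created_at TIMESTAMP,
  revenue    NUMERIC
)
PARTITION BY event_date
OPTIONS (require_partition_filter = TRUE);
```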

```sql
-- Table partitioned by event_date (DATE column)

-- ❌ No partition filter — scans the entire table
SELECT user_id, event_name
FROM `project.dataset.events`
WHERE event_name = 'purchase';

-- ✅ Partition pruning — only reads April 2026 data
SELECT user_id, event_name
FROM `project.dataset.events`
WHERE event_date BETWEEN '2026-04-01' AND '2026-04-30'
  AND event_name = 'purchase';
```

For ingestion-time partitioned tables, use the _PARTITIONTIME pseudocolumn. Always verify that partition pruning is active by checking the bytes processed estimate in the BigQuery console before running the query.
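
For example, a minimal sketch against a hypothetical ingestion-time partitioned table:

```sql
-- Sketch: prune an ingestion-time partitioned table
-- (`raw_events` is a hypothetical table name)
SELECT user_id, event_name
FROM `project.dataset.raw_events`
WHERE _PARTITIONTIME >= TIMESTAMP('2026-04-01')
  AND _PARTITIONTIME < TIMESTAMP('2026-05-01');
```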

3. Add Clustering to Complement Partitioning

Clustering sorts the data within each partition by one or more columns. When your WHERE clause filters on a clustered column, BigQuery skips entire blocks of data within the partition — a technique called block pruning.

```sql
-- Table partitioned by event_date, clustered by (country, event_name)

-- ✅ Both partition pruning and block pruning apply
SELECT user_id, revenue
FROM `project.dataset.events`
WHERE event_date = '2026-04-15'
  AND country = 'PT'
  AND event_name = 'purchase';
```

Order clustering columns by how often you filter on them: the most frequently filtered column goes first. Up to four clustering columns are supported. Clustering is free to enable, and BigQuery re-clusters automatically as new data arrives.
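
A sketch of the corresponding DDL, reusing the schema assumed earlier:

```sql
-- Sketch: partition by date, cluster by the two most frequently filtered columns
CREATE TABLE `project.dataset.events`
(
  user_id    STRING,
  event_name STRING,
  country    STRING,
  event_date DATE,
  revenue    NUMERIC
)
PARTITION BY event_date
CLUSTER BY country, event_name;
```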

4. Materialise Repeated Subqueries and CTEs

BigQuery's query optimiser does not guarantee that a CTE referenced multiple times is computed only once. Each reference may trigger a separate full evaluation, multiplying I/O and slot consumption.

```sql
-- ❌ CTE may be evaluated twice — double the cost
WITH active_users AS (
  SELECT user_id, signup_date
  FROM `project.dataset.users`
  WHERE status = 'active'
)
SELECT a.user_id, COUNT(e.event_id) AS events
FROM active_users a
JOIN `project.dataset.events` e ON a.user_id = e.user_id
GROUP BY a.user_id
HAVING a.user_id IN (SELECT user_id FROM active_users WHERE signup_date > '2025-01-01');
```

For CTEs used in more than one place, materialise the result into a temporary table using CREATE TEMP TABLE or a destination table with an expiry. The storage cost is negligible compared to the repeated scan cost.
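
A sketch of the rewrite (this form must run as a script or inside a session, since temp tables live only there): the CTE becomes a temp table scanned once, and the HAVING subquery collapses into a plain filter:

```sql
-- Sketch: materialise the filtered users once, then reuse the temp table
CREATE TEMP TABLE active_users AS
SELECT user_id, signup_date
FROM `project.dataset.users`
WHERE status = 'active';

SELECT a.user_id, COUNT(e.event_id) AS events
FROM active_users a
JOIN `project.dataset.events` e ON a.user_id = e.user_id
WHERE a.signup_date > '2025-01-01'
GROUP BY a.user_id;
```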

5. Place the Largest Table First in JOINs

BigQuery uses a broadcast join when one side of a join is much smaller than the other: it sends the entire small table to every worker processing the large table. The optimiser can reorder joins on its own, but the documented best practice is to place the largest table first (leftmost), followed by progressively smaller tables.

```sql
-- ✅ Large table (events, billions of rows) on the left;
--    small table (countries, ~200 rows) on the right
SELECT e.user_id, e.revenue, c.region
FROM `project.dataset.events` e  -- large
JOIN `project.dataset.countries` c  -- small
  ON e.country_code = c.code
WHERE e.event_date = '2026-04-15';
```

For two large tables, BigQuery uses a hash join (shuffle join). Reduce the size of both sides with filters and column selection before the join to minimise the shuffle volume.
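
A sketch of that pattern, with a hypothetical second large table `project.dataset.sessions`:

```sql
-- Sketch: shrink both sides of a large-to-large (shuffle) join before joining
SELECT e.user_id, s.session_id, e.revenue
FROM (
  SELECT user_id, revenue
  FROM `project.dataset.events`
  WHERE event_date = '2026-04-15'
    AND event_name = 'purchase'
) e
JOIN (
  SELECT user_id, session_id
  FROM `project.dataset.sessions`  -- hypothetical second large table
  WHERE session_date = '2026-04-15'
) s
  ON e.user_id = s.user_id;
```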

6. Use INT64 Instead of STRING for Join Keys

BigQuery does not use indexes. Every join involves comparing values across rows. STRING comparisons are significantly more expensive than INT64 comparisons because strings are variable-width and require byte-by-byte evaluation.

```sql
-- ❌ String join key — slower and more expensive
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_uuid = c.uuid;  -- UUID strings

-- ✅ Integer join key — faster comparison, less shuffle data
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;  -- INT64
```

If your schema uses UUIDs as primary keys, consider adding a surrogate integer key for join-heavy analytical queries.
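
One lightweight way to do that, as a sketch: derive the INT64 key from the UUID with FARM_FINGERPRINT when building the table. Hash collisions are theoretically possible, so treat this as an analytical convenience rather than a guaranteed unique key:

```sql
-- Sketch: persist an INT64 surrogate derived from the UUID
-- FARM_FINGERPRINT returns an INT64 hash (collisions unlikely but possible)
CREATE TABLE `project.dataset.customers_keyed` AS
SELECT
  FARM_FINGERPRINT(uuid) AS customer_key,
  c.*
FROM `project.dataset.customers` c;
```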

7. Filter Before Joining, Not After

Applying WHERE filters inside a subquery or CTE before the join reduces the number of rows that participate in the join, cutting both shuffle volume and computation.

```sql
-- ❌ Joins all events, then filters — shuffles billions of rows
SELECT e.user_id, u.country, COUNT(*) AS purchases
FROM `project.dataset.events` e
JOIN `project.dataset.users` u ON e.user_id = u.id
WHERE e.event_name = 'purchase'
  AND e.event_date >= '2026-01-01'
GROUP BY e.user_id, u.country;

-- ✅ Filters events first, then joins the smaller result set
SELECT e.user_id, u.country, COUNT(*) AS purchases
FROM (
  SELECT user_id
  FROM `project.dataset.events`
  WHERE event_name = 'purchase'
    AND event_date >= '2026-01-01'
) e
JOIN `project.dataset.users` u ON e.user_id = u.id
GROUP BY e.user_id, u.country;
```

8. Avoid ORDER BY Without LIMIT

Producing a globally ordered result forces BigQuery to concentrate all rows on a single worker for the final sort. On large tables this often fails with a "Resources exceeded during query execution" error or runs for an extremely long time.

```sql
-- ❌ Sorts millions of rows on a single node — often fails
SELECT user_id, revenue
FROM `project.dataset.events`
ORDER BY revenue DESC;

-- ✅ Only the top 1000 rows are sorted
SELECT user_id, revenue
FROM `project.dataset.events`
ORDER BY revenue DESC
LIMIT 1000;
```

For ranking use cases, prefer window functions with a pre-filtered dataset rather than a global ORDER BY.
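
For example, a sketch of a per-country top 10 using QUALIFY: each partition is sorted independently across workers instead of funnelling every row through one node:

```sql
-- Sketch: top 10 earners per country; sorting happens per partition,
-- not globally on a single worker
SELECT user_id, country, revenue
FROM `project.dataset.events`
WHERE event_date = '2026-04-15'
QUALIFY ROW_NUMBER() OVER (PARTITION BY country ORDER BY revenue DESC) <= 10;
```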

9. Use Approximate Aggregation Functions

BigQuery provides approximate versions of expensive aggregation functions that are orders of magnitude faster and use far less memory, with a typical error rate of less than 1%.

| Exact Function | Approximate Equivalent | Typical Speedup |
|---|---|---|
| `COUNT(DISTINCT col)` | `APPROX_COUNT_DISTINCT(col)` | 5–20× |
| `PERCENTILE_CONT(col, 0.5)` | `APPROX_QUANTILES(col, 100)[OFFSET(50)]` | 3–10× |
| `TOP(col, n)` (legacy SQL) | `APPROX_TOP_COUNT(col, n)` | 3–8× |
```sql
-- ❌ Exact distinct count — expensive on high-cardinality columns
SELECT COUNT(DISTINCT user_id) AS unique_users
FROM `project.dataset.events`;

-- ✅ Approximate — 99%+ accurate, dramatically faster
SELECT APPROX_COUNT_DISTINCT(user_id) AS unique_users
FROM `project.dataset.events`;
```
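
The same pattern extends to percentiles; a sketch of an approximate median:

```sql
-- Sketch: approximate median (50th percentile) of revenue
SELECT APPROX_QUANTILES(revenue, 100)[OFFSET(50)] AS approx_median_revenue
FROM `project.dataset.events`
WHERE event_date >= '2026-04-01';
```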

For dashboards and exploratory analysis, approximate functions are almost always the right choice.

10. Use the Query Plan Explanation Before Running Expensive Queries

BigQuery provides a query plan explanation (also called the execution graph) that shows the stages of your query, bytes read per stage, slot time consumed, and shuffle volume. You can access it in the BigQuery console after running a query, or estimate it using the dry-run API before execution.

Key signals to look for in the query plan:

- Stage with high bytes written relative to bytes read — indicates an expensive shuffle; consider filtering earlier.

- JOIN stage with more output rows than input rows — likely a cross join or missing filter; review your join conditions.

- Single stage consuming 90%+ of slot time — a bottleneck; consider splitting the query or materialising intermediate results.

The dry-run API returns the estimated bytes processed without actually running the query, which is invaluable for validating optimisations before they hit production.
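
The dry run itself goes through the API or `bq query --dry_run` rather than SQL, but you can audit what past queries actually scanned from SQL. A sketch against INFORMATION_SCHEMA (adjust the region qualifier to where your datasets live):

```sql
-- Sketch: the 20 most expensive queries of the last 7 days
SELECT
  user_email,
  query,
  total_bytes_processed,
  total_slot_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
ORDER BY total_bytes_processed DESC
LIMIT 20;
```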

Putting It All Together

The most impactful BigQuery optimisations follow a clear hierarchy:

1. **Column pruning** (avoid `SELECT *`) — always apply first.

2. **Partition pruning** — filter on the partition column in every query against partitioned tables.

3. **Clustering** — add for high-selectivity filter columns within partitions.

4. **Join optimisation** — large table first, filter before joining, use integer keys.

5. **Computation** — approximate aggregations, avoid repeated CTEs, split complex queries.

Applied consistently, these techniques routinely reduce BigQuery costs by 60–90% and cut query runtimes by a similar margin. The key insight is that BigQuery rewards queries that read less data — every optimisation that reduces bytes scanned or shuffled translates directly into lower costs and faster results.

SQL Querywise Advisor analyses your BigQuery SQL and automatically flags missing partition filters, SELECT * usage, suboptimal join patterns, and expensive aggregations — giving you actionable recommendations without manual query plan inspection.

