If you use Google Cloud, you’re likely using BigQuery. It’s a data warehouse I’ve been using for years because of its excellent ROI: you spend almost no time babysitting it, it scales effortlessly to petabytes, and it supports a huge range of workloads.
But there are no free lunches. Google needs a way to measure and charge for compute usage. BigQuery offers two models: on-demand, which simply bills you for the bytes processed, and the capacity model, which is a better fit if you want a predictable monthly spend or need more dedicated compute.
The primary problem with the capacity model is underutilised slot hours. While the autoscaler is great for typical data warehouse workloads, like long-running, interactive queries, it’s highly inefficient for most data pipelines and BI workloads.
This core limitation is likely by design. If you need slots on the fly, Google charges a 1-minute minimum for 50-slot increments to avoid wasting compute on its end, effectively offloading that inefficiency to the user. The key to reducing costs is a counter-intuitive strategy: intentionally slowing things down by taming the autoscaler. We’ll trade seconds or minutes of latency, often unnoticeable in batch pipelines, for significant dollar savings.
To illustrate this, let’s look at a fictional “6-second query.” The autoscaler provisions a large number of slots (e.g., 1,000) to finish it as fast as possible. But here’s the trap: you’re billed for that peak number of slots for a minimum of 60 seconds, even if the job is already done.
Let’s look at the math of waste. The job actually consumes 1,000 slots × 6 seconds = 6,000 slot-seconds of work, but the 1-minute minimum means you’re billed for 1,000 slots × 60 seconds = 60,000 slot-seconds.
In this common scenario, 90% of your spend is pure waste. You’re paying for idle capacity while the slots sit there, waiting for more work.
The fix is counter-intuitive: limit the autoscaler’s peak power by setting a lower max_slots value in your reservation’s configuration.
This introduces a concept called “slot contention”, which is typically something you want to avoid. In this case, however, it’s a deliberate strategy. When a query requests more slots than your lower cap allows, it doesn’t fail in most cases; BigQuery’s scheduler simply queues the work, forcing the job to run with fewer slots over a slightly longer duration.
Let’s revisit our example and cap max_slots at 50. A job that requires 6,000 slot-seconds will now take roughly 2 minutes to finish (6,000 ÷ 50 = 120 seconds).
You’ve cut the job cost by 90% by accepting a longer runtime and paying only for the compute you actually use. This is a fundamental trade-off: even this simple, fictional example shows a 20x runtime increase, which may not be acceptable in a real production workload. Ultimately, whether to accept more waste in exchange for a shorter duration comes down to your pipeline’s SLAs.
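As a sketch, here’s how you might apply the cap with BigQuery’s reservation DDL, run from the reservation administration project (the project and reservation names are placeholders; depending on your edition and the features enabled on the reservation, the cap option may be exposed as autoscale_max_slots rather than max_slots):
-- Sketch: lower the autoscaler's cap on an existing reservation.
-- Run in the reservation administration project; names are placeholders.
ALTER RESERVATION `your-reservation-admin-project.region-us.batch-reservation`
SET OPTIONS (
  -- Caps the slots the autoscaler can add. On some configurations this
  -- option is exposed as max_slots instead.
  autoscale_max_slots = 50
);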
Note: Be careful not to under-provision your reservation. Setting max_slots too low can cause queries to fail with resource errors or exhaust the number of jobs that can be queued. Significant slot contention can also increase shuffle, which inflates the job’s original slot-seconds. I recommend testing the optimal setting for each pipeline.
Don’t use a one-size-fits-all approach. Create separate reservations for different needs.
elt-reservation: For batch pipelines (like dbt or Airflow), assign your projects here and set a low max_slots cap. This value should be based on your pipeline’s SLA, ensuring jobs complete in an acceptable time without impacting downstream users.
bi-interactive-reservation: Use this for BI tools like Looker and Tableau. Assign analytics projects here to ensure good performance, and set a high max_slots cap, as slot contention is generally unacceptable for this workload. It’s one of the most challenging workload types for keeping waste down.
Note: For on-demand, be sure to add quotas to prevent costs from blowing up. As of September 1, 2025, the default on-demand quota for BigQuery’s QueryUsagePerDay is 200 TiB of data processed per project per day. Previously, this was unlimited for new projects.
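As a rough sketch of how this setup might look with BigQuery’s reservation DDL (all project names, assignment IDs, caps, and the edition are placeholder assumptions; adjust them to your environment, and note that on some configurations the cap option is max_slots rather than autoscale_max_slots):
-- Sketch: two reservations with different autoscaler caps, plus project assignments.
-- Run in the reservation administration project; every name and number here is a placeholder.
CREATE RESERVATION `your-reservation-admin-project.region-us.elt-reservation`
OPTIONS (
  edition = 'ENTERPRISE',
  slot_capacity = 0,          -- no baseline; pay only for autoscaled slots
  autoscale_max_slots = 50    -- low cap, tuned to the batch pipelines' SLA
);

CREATE RESERVATION `your-reservation-admin-project.region-us.bi-interactive-reservation`
OPTIONS (
  edition = 'ENTERPRISE',
  slot_capacity = 0,
  autoscale_max_slots = 1000  -- high cap; contention is unacceptable for BI
);

-- Route workloads: batch projects to the ELT reservation, analytics projects to the BI one.
CREATE ASSIGNMENT `your-reservation-admin-project.region-us.elt-reservation.dbt_assignment`
OPTIONS (assignee = 'projects/your-dbt-project', job_type = 'QUERY');

CREATE ASSIGNMENT `your-reservation-admin-project.region-us.bi-interactive-reservation.looker_assignment`
OPTIONS (assignee = 'projects/your-analytics-project', job_type = 'QUERY');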
Don’t just guess the right max_slots value. Use INFORMATION_SCHEMA.JOBS_TIMELINE and INFORMATION_SCHEMA.RESERVATIONS_TIMELINE data to model a cost-versus-performance curve.
JOBS_TIMELINE: The Record of Work Done. This view provides near real-time metadata for all jobs, broken down into one-second timeslices. The period_slot_ms column is the source of truth for actual slot utilisation.
RESERVATIONS_TIMELINE: The Record of Capacity Provided. This view provides a historical record of reservation metadata, including slots_assigned (baseline capacity) and autoscale.current_slots (the number of slots added by the autoscaler).
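Before combining the two, a quick sketch like the one below can help you eyeball per-second demand peaks per reservation (it assumes the per-project JOBS_TIMELINE view in the us region; swap in the _BY_ORGANIZATION variant for a wider picture):
-- Sketch: per-second slot demand per reservation over the last day.
SELECT
  reservation_id,
  period_start,
  SUM(period_slot_ms) / 1000 AS utilized_slots  -- slot-ms within this second, converted to slots
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE
WHERE
  job_creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND period_start >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND reservation_id IS NOT NULL
GROUP BY reservation_id, period_start
ORDER BY utilized_slots DESC
LIMIT 100;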
Combine the work done with the capacity provided to track your Waste KPI. A high percentage (e.g., > 50%) suggests a significant mismatch between workload patterns and autoscaler behaviour.
You can track it by running the sample query below:
-- ===================================================================================
-- Configuration Block: Adjust these parameters for your analysis
-- ===================================================================================
-- Note: BigQuery does not allow variables in table identifiers, so the two values below are
-- for reference only; edit the FROM clauses further down to match them.
DECLARE reservation_admin_project_id STRING DEFAULT 'your-reservation-admin-project';
DECLARE region STRING DEFAULT 'us';
DECLARE start_timestamp TIMESTAMP DEFAULT TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY);
DECLARE end_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP();
-- Note: Adjust the cost based on your BigQuery Edition and commitment level.
-- Example: Enterprise Edition Pay-as-you-go is ~$0.06 per slot-hour.
DECLARE cost_per_slot_hour FLOAT64 DEFAULT 0.06;
-- ===================================================================================
-- Query Logic: Do not modify below this line
-- ===================================================================================
WITH
-- Step 1: Calculate the total number of slots UTILIZED by all jobs within each
-- reservation, aggregated to a one-second granularity. This represents the
-- actual work done.
utilized_slots_per_second AS (
SELECT
period_start,
reservation_id,
-- Sum slot-milliseconds from all jobs in the same second and convert to slots.
SUM(period_slot_ms) / 1000 AS utilized_slots
FROM
-- Note: Use JOBS_TIMELINE_BY_PROJECT or JOBS_TIMELINE_BY_ORGANIZATION as needed.
`region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE
WHERE
job_creation_time BETWEEN start_timestamp AND end_timestamp
AND period_start BETWEEN start_timestamp AND end_timestamp
AND reservation_id IS NOT NULL
GROUP BY
1,
2
),
-- Step 2: Calculate the total number of slots BILLED for each reservation at a
-- one-second granularity. This is achieved by unnesting the per_second_details
-- struct from the RESERVATIONS_TIMELINE view.
billed_slots_per_second AS (
SELECT
details.start_time AS period_start,
res.reservation_id,
-- Baseline slots are a fixed cost, not part of autoscaler waste, but are included
-- for a complete picture of total provisioned capacity.
details.slots_assigned AS billed_baseline_slots,
-- These are the slots added by the autoscaler. This is the capacity we are
-- analyzing for waste.
details.autoscale_current_slots AS billed_autoscaled_slots
FROM
-- Note: Replace with your reservation admin project and region.
`your-reservation-admin-project.region-us`.INFORMATION_SCHEMA.RESERVATIONS_TIMELINE AS res,
UNNEST(per_second_details) AS details
WHERE
res.period_start BETWEEN TIMESTAMP_TRUNC(start_timestamp, MINUTE) AND end_timestamp
),
-- Step 3: Join the utilized and billed data. A FULL OUTER JOIN is crucial to
-- capture seconds where slots were billed but had zero utilization (pure waste).
waste_calculation_per_second AS (
SELECT
COALESCE(u.period_start, b.period_start) AS period_start,
COALESCE(u.reservation_id, b.reservation_id) AS reservation_id,
IFNULL(u.utilized_slots, 0) AS utilized_slots,
IFNULL(b.billed_baseline_slots, 0) AS billed_baseline_slots,
IFNULL(b.billed_autoscaled_slots, 0) AS billed_autoscaled_slots,
-- Calculate total billed capacity for context.
IFNULL(b.billed_baseline_slots, 0) + IFNULL(b.billed_autoscaled_slots, 0) AS total_billed_slots
FROM
utilized_slots_per_second AS u
FULL OUTER JOIN billed_slots_per_second AS b
ON u.period_start = b.period_start AND u.reservation_id = b.reservation_id
)
-- Step 4: Final aggregation to summarize waste over the entire analysis period
-- for each reservation.
SELECT
reservation_id,
-- Total utilized slot-hours: The sum of all work done by jobs.
ROUND(SUM(utilized_slots) / 3600, 2) AS total_utilized_slot_hours,
-- Total billed autoscaled slot-hours: The capacity added and billed by the autoscaler.
ROUND(SUM(billed_autoscaled_slots) / 3600, 2) AS total_billed_autoscaled_slot_hours,
-- Total wasted autoscaled slot-hours: The core metric. This is the billed autoscaled
-- capacity that was not used by any job.
ROUND(
SUM(
-- We only count waste against the autoscaled portion. If utilization exceeds
-- the autoscaled slots, it means baseline slots are being used, which is not waste.
GREATEST(0, billed_autoscaled_slots - GREATEST(0, utilized_slots - billed_baseline_slots))
) / 3600,
2
) AS total_wasted_autoscaled_slot_hours,
-- Waste Percentage: The proportion of billed autoscaled capacity that was wasted.
-- This is a key performance indicator (KPI) for autoscaler efficiency.
SAFE_DIVIDE(
SUM(
GREATEST(0, billed_autoscaled_slots - GREATEST(0, utilized_slots - billed_baseline_slots))
),
SUM(billed_autoscaled_slots)
) AS waste_percentage,
-- Potential Cost Savings: The financial impact of the calculated waste.
ROUND(
SUM(
GREATEST(0, billed_autoscaled_slots - GREATEST(0, utilized_slots - billed_baseline_slots))
) / 3600 * cost_per_slot_hour,
2
) AS potential_cost_savings_usd
FROM
waste_calculation_per_second
WHERE
reservation_id IS NOT NULL
GROUP BY
1
ORDER BY
potential_cost_savings_usd DESC;
Frame it as a business decision: look at the potential_cost_savings_usd metric and, if it justifies the work, check your pipeline’s current runtime and how far it is from its execution SLA, then aim to get the waste_percentage as low as possible.
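To sketch the cost-versus-performance side of that decision, you can estimate from historical jobs how long each one would have run under a given cap. The query below is a rough model that assumes the work parallelises perfectly across the cap; candidate_max_slots is a hypothetical parameter, and it uses the per-project JOBS view in the us region:
-- Sketch: estimate per-job runtime under a candidate max_slots cap (rough upper-bound model).
DECLARE candidate_max_slots INT64 DEFAULT 50;

SELECT
  job_id,
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS actual_runtime_seconds,
  ROUND(total_slot_ms / 1000, 1) AS slot_seconds,
  -- A capped job can't finish faster than it originally did, so take the larger of the
  -- actual runtime and the total work divided by the cap (assuming full parallelism).
  ROUND(GREATEST(
    total_slot_ms / 1000 / candidate_max_slots,
    TIMESTAMP_DIFF(end_time, start_time, SECOND)
  ), 1) AS estimated_runtime_seconds_at_cap
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
  AND reservation_id IS NOT NULL
ORDER BY slot_seconds DESC
LIMIT 100;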
Apply your modeled max_slots cap to the elt-reservation. Measure the impact on your next pipeline run’s cost and duration, and on the waste_percentage KPI. Iterate by adjusting the cap until you hit the optimal balance defined by your business needs.
A data environment is never static. Your jobs have variable demand throughout the day, week, and month. Data volumes grow, new models are created, and queries change. Small tweaks, like introducing new joins or an “innocent” UDF in the query projection, can significantly increase slot usage. That’s why monitoring and adjusting autoscaler capacity is a never-ending task. By watching your slot usage in real time, observing the job queue, and adjusting your capacity cap based on the time of day or day of the week, you can unlock even more savings while still meeting your pipeline’s SLAs.
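For the “observe the job queue” part, a quick sketch like this one counts pending and running jobs per reservation (assuming the per-project JOBS view in the us region):
-- Sketch: a snapshot of queued (PENDING) and RUNNING jobs per reservation over the last hour.
SELECT
  reservation_id,
  state,
  COUNT(*) AS job_count
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
  AND state IN ('PENDING', 'RUNNING')
  AND reservation_id IS NOT NULL
GROUP BY reservation_id, state
ORDER BY reservation_id, state;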
The takeaway is simple: the goal isn’t always maximum performance; it’s appropriate performance at the lowest possible cost. True cost mastery comes from actively managing your cloud resources, not accepting the default settings.
By understanding the economic model of the services you use, you can shape your workload’s demand profile to align with it, running it as efficiently as possible.
A static environment is easy to tune, but a real data environment is a beast of its own. Keeping a reservation optimally tuned while meeting your business SLAs is a problem that requires more than a one-time adjustment. That’s where automation can help: continuously tuning and optimising your reservations to keep your costs in check as your workloads evolve.
That’s where Alvin comes in. When manual isn’t enough, Alvin automates your reservation’s max_slots to minimise waste and slot contention. It applies the same principles you would, but with an automation layer that analyses historical and real-time slot usage. Since BI workloads are especially challenging to optimise, an automated solution is often the only practical option.