If you use Google Cloud, you’re likely using BigQuery. It’s a data warehouse I’ve been using for years because of its excellent ROI: you spend almost no time babysitting it, it scales effortlessly to petabytes, and it supports a huge range of workloads.
But there are no free lunches. Google needs a way to measure and charge for compute usage. BigQuery offers two models: on-demand, which simply bills you for the bytes processed, and the capacity model, which is a better fit if you want a predictable monthly spend or need more dedicated compute.
The primary problem with the capacity model is underutilised slot hours. While the autoscaler is great for typical data warehouse workloads, like long-running, interactive queries, it’s highly inefficient for most data pipelines and BI workloads.
This core limitation is likely by design. If you need slots on the fly, Google charges a 1-minute minimum for 50-slot increments to avoid wasting compute on its end, effectively offloading that inefficiency to the user. The key to reducing costs is a counter-intuitive strategy: intentionally slowing things down by taming the autoscaler. We’ll trade seconds or minutes of latency, often unnoticeable in batch pipelines, for significant dollar savings.
To illustrate this, let’s look at a fictional “6-second query.” The autoscaler provisions a large number of slots (e.g., 1,000) to finish it as fast as possible. But here’s the trap: you’re billed for that peak number of slots for a minimum of 60 seconds, even if the job is already done.
Let’s look at the math of waste. The job actually consumes 1,000 slots × 6 seconds = 6,000 slot-seconds of work, but the 1-minute minimum means you’re billed for 1,000 slots × 60 seconds = 60,000 slot-seconds.
In this common scenario, 90% of your spend is pure waste. You’re paying for idle capacity while the slots sit there, waiting for more work.
The fix is counter-intuitive: limit the autoscaler’s peak power by setting a lower max_slots value in your reservation’s configuration.
This introduces a concept called “slot contention”, which is typically something you want to avoid. In this case, however, it’s a deliberate strategy. When a query requests more slots than your lower cap allows, it doesn’t fail in most cases; BigQuery’s scheduler simply queues the work, forcing the job to run with fewer slots over a slightly longer duration.
Let’s revisit our example and cap max_slots at 50. A job that requires 6,000 slot-seconds will now take roughly 2 minutes to finish (6,000 ÷ 50 = 120 seconds).
You’ve cut the job cost by 90% by accepting a longer runtime and paying only for the compute you actually use. This is a fundamental trade-off: even this simple, fictional example shows a 20x runtime increase, which may not be acceptable in a real production workload. Ultimately, whether to accept more waste in exchange for a shorter duration comes down to your pipeline’s SLAs.
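As a sketch, here’s how you might apply the cap with BigQuery’s reservation DDL, run from the reservation administration project (the project and reservation names are placeholders; depending on your edition and the features enabled on the reservation, the cap option may be exposed as autoscale_max_slots rather than max_slots):
-- Sketch: lower the autoscaler's cap on an existing reservation.
-- Run in the reservation administration project; names are placeholders.
ALTER RESERVATION `your-reservation-admin-project.region-us.batch-reservation`
SET OPTIONS (
  -- Caps the slots the autoscaler can add. On some configurations this
  -- option is exposed as max_slots instead.
  autoscale_max_slots = 50
);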
Note: Be careful not to under-provision your reservation. Setting max_slots too low can cause queries to fail with resource errors or exhaust the number of jobs that can be queued. Significant slot contention can also increase shuffle, which inflates the job’s original slot-seconds. I recommend testing the optimal setting for each pipeline.
Don’t use a one-size-fits-all approach. Create separate reservations for different needs.
elt-reservation: For batch pipelines (like dbt or Airflow), assign your projects here and set a low max_slots cap. This value should be based on your pipeline’s SLA, ensuring jobs complete in an acceptable time without impacting downstream users.
bi-interactive-reservation: Use this for BI tools like Looker and Tableau. Assign analytics projects here to ensure good performance, and set a high max_slots cap, as slot contention is generally unacceptable for this workload. It’s one of the most challenging workload types for keeping waste down.
Note: For on-demand, be sure to add quotas to prevent costs from blowing up. As of September 1, 2025, the default on-demand quota for BigQuery’s QueryUsagePerDay is 200 TiB of data processed per project per day. Previously, this was unlimited for new projects.
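As a rough sketch of how this setup might look with BigQuery’s reservation DDL (all project names, assignment IDs, caps, and the edition are placeholder assumptions; adjust them to your environment, and note that on some configurations the cap option is max_slots rather than autoscale_max_slots):
-- Sketch: two reservations with different autoscaler caps, plus project assignments.
-- Run in the reservation administration project; every name and number here is a placeholder.
CREATE RESERVATION `your-reservation-admin-project.region-us.elt-reservation`
OPTIONS (
  edition = 'ENTERPRISE',
  slot_capacity = 0,          -- no baseline; pay only for autoscaled slots
  autoscale_max_slots = 50    -- low cap, tuned to the batch pipelines' SLA
);

CREATE RESERVATION `your-reservation-admin-project.region-us.bi-interactive-reservation`
OPTIONS (
  edition = 'ENTERPRISE',
  slot_capacity = 0,
  autoscale_max_slots = 1000  -- high cap; contention is unacceptable for BI
);

-- Route workloads: batch projects to the ELT reservation, analytics projects to the BI one.
CREATE ASSIGNMENT `your-reservation-admin-project.region-us.elt-reservation.dbt_assignment`
OPTIONS (assignee = 'projects/your-dbt-project', job_type = 'QUERY');

CREATE ASSIGNMENT `your-reservation-admin-project.region-us.bi-interactive-reservation.looker_assignment`
OPTIONS (assignee = 'projects/your-analytics-project', job_type = 'QUERY');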
Don’t just guess the right max_slots value. Use INFORMATION_SCHEMA.JOBS_TIMELINE and INFORMATION_SCHEMA.RESERVATIONS_TIMELINE data to model a cost-versus-performance curve.
JOBS_TIMELINE: The Record of Work Done. This view provides near real-time metadata for all jobs, broken down into one-second timeslices. The period_slot_ms column is the source of truth for actual slot utilisation.
RESERVATIONS_TIMELINE: The Record of Capacity Provided. This view provides a historical record of reservation metadata, including slots_assigned (baseline capacity) and autoscale.current_slots (the number of slots added by the autoscaler).
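Before combining the two, a quick sketch like the one below can help you eyeball per-second demand peaks per reservation (it assumes the per-project JOBS_TIMELINE view in the us region; swap in the _BY_ORGANIZATION variant for a wider picture):
-- Sketch: per-second slot demand per reservation over the last day.
SELECT
  reservation_id,
  period_start,
  SUM(period_slot_ms) / 1000 AS utilized_slots  -- slot-ms within this second, converted to slots
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE
WHERE
  job_creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND period_start >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND reservation_id IS NOT NULL
GROUP BY reservation_id, period_start
ORDER BY utilized_slots DESC
LIMIT 100;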
Combine the work done with the capacity provided to track your Waste KPI. A high percentage (e.g., > 50%) suggests a significant mismatch between workload patterns and autoscaler behaviour.
You can track it by running the sample query below:
-- ===================================================================================
-- Configuration Block: Adjust these parameters for your analysis
-- ===================================================================================
-- Note: BigQuery does not allow variables in table identifiers, so the two values below are
-- for reference only; edit the FROM clauses further down to match them.
DECLARE reservation_admin_project_id STRING DEFAULT 'your-reservation-admin-project';
DECLARE region STRING DEFAULT 'us';
DECLARE start_timestamp TIMESTAMP DEFAULT TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY);
DECLARE end_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP();
-- Note: Adjust the cost based on your BigQuery Edition and commitment level.
-- Example: Enterprise Edition Pay-as-you-go is ~$0.06 per slot-hour.
DECLARE cost_per_slot_hour FLOAT64 DEFAULT 0.06;
-- ===================================================================================
-- Query Logic: Do not modify below this line
-- ===================================================================================
WITH
-- Step 1: Calculate the total number of slots UTILIZED by all jobs within each
-- reservation, aggregated to a one-second granularity. This represents the
-- actual work done.
utilized_slots_per_second AS (
SELECT
period_start,
reservation_id,
-- Sum slot-milliseconds from all jobs in the same second and convert to slots.
SUM(period_slot_ms) / 1000 AS utilized_slots
FROM
-- Note: Use JOBS_TIMELINE_BY_PROJECT or JOBS_TIMELINE_BY_ORGANIZATION as needed.
`region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE
WHERE
job_creation_time BETWEEN start_timestamp AND end_timestamp
AND period_start BETWEEN start_timestamp AND end_timestamp
AND reservation_id IS NOT NULL
GROUP BY
1,
2
),
-- Step 2: Calculate the total number of slots BILLED for each reservation at a
-- one-second granularity. This is achieved by unnesting the per_second_details
-- struct from the RESERVATIONS_TIMELINE view.
billed_slots_per_second AS (
SELECT
details.start_time AS period_start,
res.reservation_id,
-- Baseline slots are a fixed cost, not part of autoscaler waste, but are included
-- for a complete picture of total provisioned capacity.
details.slots_assigned AS billed_baseline_slots,
-- These are the slots added by the autoscaler. This is the capacity we are
-- analyzing for waste.
details.autoscale_current_slots AS billed_autoscaled_slots
FROM
-- Note: Replace with your reservation admin project and region.
`your-reservation-admin-project.region-us`.INFORMATION_SCHEMA.RESERVATIONS_TIMELINE AS res,
UNNEST(per_second_details) AS details
WHERE
res.period_start BETWEEN TIMESTAMP_TRUNC(start_timestamp, MINUTE) AND end_timestamp
),
-- Step 3: Join the utilized and billed data. A FULL OUTER JOIN is crucial to
-- capture seconds where slots were billed but had zero utilization (pure waste).
waste_calculation_per_second AS (
SELECT
COALESCE(u.period_start, b.period_start) AS period_start,
COALESCE(u.reservation_id, b.reservation_id) AS reservation_id,
IFNULL(u.utilized_slots, 0) AS utilized_slots,
IFNULL(b.billed_baseline_slots, 0) AS billed_baseline_slots,
IFNULL(b.billed_autoscaled_slots, 0) AS billed_autoscaled_slots,
-- Calculate total billed capacity for context.
IFNULL(b.billed_baseline_slots, 0) + IFNULL(b.billed_autoscaled_slots, 0) AS total_billed_slots
FROM
utilized_slots_per_second AS u
FULL OUTER JOIN billed_slots_per_second AS b
ON u.period_start = b.period_start AND u.reservation_id = b.reservation_id
)
-- Step 4: Final aggregation to summarize waste over the entire analysis period
-- for each reservation.
SELECT
reservation_id,
-- Total utilized slot-hours: The sum of all work done by jobs.
ROUND(SUM(utilized_slots) / 3600, 2) AS total_utilized_slot_hours,
-- Total billed autoscaled slot-hours: The capacity added and billed by the autoscaler.
ROUND(SUM(billed_autoscaled_slots) / 3600, 2) AS total_billed_autoscaled_slot_hours,
-- Total wasted autoscaled slot-hours: The core metric. This is the billed autoscaled
-- capacity that was not used by any job.
ROUND(
SUM(
-- We only count waste against the autoscaled portion. If utilization exceeds
-- the autoscaled slots, it means baseline slots are being used, which is not waste.
GREATEST(0, billed_autoscaled_slots - GREATEST(0, utilized_slots - billed_baseline_slots))
) / 3600,
2
) AS total_wasted_autoscaled_slot_hours,
-- Waste Percentage: The proportion of billed autoscaled capacity that was wasted.
-- This is a key performance indicator (KPI) for autoscaler efficiency.
SAFE_DIVIDE(
SUM(
GREATEST(0, billed_autoscaled_slots - GREATEST(0, utilized_slots - billed_baseline_slots))
),
SUM(billed_autoscaled_slots)
) AS waste_percentage,
-- Potential Cost Savings: The financial impact of the calculated waste.
ROUND(
SUM(
GREATEST(0, billed_autoscaled_slots - GREATEST(0, utilized_slots - billed_baseline_slots))
) / 3600 * cost_per_slot_hour,
2
) AS potential_cost_savings_usd
FROM
waste_calculation_per_second
WHERE
reservation_id IS NOT NULL
GROUP BY
1
ORDER BY
potential_cost_savings_usd DESC;
Frame it as a business decision: look at the potential_cost_savings_usd metric and, if it justifies the work, check your pipeline’s current runtime and how far it is from its execution SLA, then aim to get the waste_percentage as low as possible.
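To sketch the cost-versus-performance side of that decision, you can estimate from historical jobs how long each one would have run under a given cap. The query below is a rough model that assumes the work parallelises perfectly across the cap; candidate_max_slots is a hypothetical parameter, and it uses the per-project JOBS view in the us region:
-- Sketch: estimate per-job runtime under a candidate max_slots cap (rough upper-bound model).
DECLARE candidate_max_slots INT64 DEFAULT 50;

SELECT
  job_id,
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS actual_runtime_seconds,
  ROUND(total_slot_ms / 1000, 1) AS slot_seconds,
  -- A capped job can't finish faster than it originally did, so take the larger of the
  -- actual runtime and the total work divided by the cap (assuming full parallelism).
  ROUND(GREATEST(
    total_slot_ms / 1000 / candidate_max_slots,
    TIMESTAMP_DIFF(end_time, start_time, SECOND)
  ), 1) AS estimated_runtime_seconds_at_cap
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
  AND reservation_id IS NOT NULL
ORDER BY slot_seconds DESC
LIMIT 100;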
Apply your modeled max_slots cap to the elt-reservation. Measure the impact on your next pipeline run’s cost and duration, and on the waste_percentage KPI. Iterate by adjusting the cap until you hit the optimal balance defined by your business needs.
A data environment is never static. Your jobs have variable demand throughout the day, week, and month. Data volumes grow, new models are created, and queries change. Small tweaks, like introducing new joins or an “innocent” UDF in the query projection, can significantly increase slot usage. That’s why monitoring and adjusting autoscaler capacity is a never-ending task. By watching your slot usage in real time, observing the job queue, and adjusting your capacity cap based on the time of day or day of the week, you can unlock even more savings while still meeting your pipeline’s SLAs.
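For the “observe the job queue” part, a quick sketch like this one counts pending and running jobs per reservation (assuming the per-project JOBS view in the us region):
-- Sketch: a snapshot of queued (PENDING) and RUNNING jobs per reservation over the last hour.
SELECT
  reservation_id,
  state,
  COUNT(*) AS job_count
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
  AND state IN ('PENDING', 'RUNNING')
  AND reservation_id IS NOT NULL
GROUP BY reservation_id, state
ORDER BY reservation_id, state;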
The takeaway is simple: the goal isn’t always maximum performance; it’s appropriate performance at the lowest possible cost. True cost mastery comes from actively managing your cloud resources, not accepting the default settings.
By understanding the economic model of the services you use, you can shape your workload’s demand profile to align with it, running it as efficiently as possible.
A static environment is easy to tune, but a real data environment is a beast of its own. Keeping a reservation optimally tuned while meeting your business SLAs is a problem that requires more than a one-time adjustment. That’s where automation can help: continuously tuning and optimising your reservations to keep your costs in check as your workloads evolve.
That’s where Alvin comes in. When manual isn’t enough, Alvin automates your reservation’s max_slots to minimise waste and slot contention. It applies the same principles you would, but with an automation layer that analyses historical and real-time slot usage. Since BI workloads are especially challenging to optimise, an automated solution is often the only practical option.