How GPU Rental Pricing Actually Works: On-demand vs Spot vs Reserved in 2026
On-demand typically costs 1.4-2x as much as spot. Reserved instances lock in 12-month rates at 40-50% discounts but kill flexibility. Here is how to pick the right model.
T. Camadan
AI infrastructure engineer who has spent $200K+ on GPU rentals across 8 production deployments. Former ML platform lead at a Series B startup.
Quick Answer
On-demand typically costs 1.4-2x as much as spot but guarantees availability. Spot instances offer 30-50% discounts with interruption risk. Reserved instances lock in 40-50% discounts but require 12-month commitments. Choose spot for training workloads you can checkpoint. Choose reserved for stable production inference you cannot afford to interrupt.
The Three Pricing Models Explained
GPU rental providers have settled on three pricing tiers that mirror cloud compute economics. Understanding how they work—and when each applies—requires dropping the marketing and looking at actual deployment patterns.
On-Demand: Pay Per Hour, No Strings Attached
On-demand is the default model. You spin up an instance, you pay by the hour, you shut it down, you stop paying. The simplicity is the value.
What you get:
- Guaranteed availability (provider maintains inventory)
- No interruption risk
- No commitment required
- Hourly or per-minute billing
What you pay (April 2026 examples):
- H100 80GB: $4.29-7.20/hr depending on provider
- A100 80GB: $2.49-4.50/hr
- RTX 4090 24GB: $0.50-2.10/hr
On-demand is the baseline reference point. All discounts are relative to these rates.
Spot: The Discounted Alternative
Spot instances are excess capacity offered at discounts because the provider needs to fill unused GPU time. The discount is real, but so is the interruption risk.
How spot pricing works: Providers maintain pools of idle GPUs. When demand is low, prices drop to fill capacity. When demand spikes, prices increase or capacity disappears entirely. Spot prices can change every 5-30 minutes.
Typical spot discounts:
- 30-50% off on-demand rates
- H100 spot: $1.89-3.80/hr
- A100 spot: $1.25-2.40/hr
- RTX 4090 spot: $0.35-1.19/hr
The deeper discount comes with a catch: providers can terminate spot instances with 30 seconds to 2 minutes notice when capacity is reclaimed for on-demand customers.
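A minimal sketch of reacting to a termination notice, assuming the provider delivers it to your process as a SIGTERM. That is an assumption: some providers expose the notice through a metadata endpoint instead, so check your provider's documentation. The names `TerminationWatcher`, `should_stop`, and `run_steps` are illustrative.

```python
import signal


class TerminationWatcher:
    """Sets a flag when the process receives SIGTERM (assumed notice mechanism)."""

    def __init__(self):
        self.should_stop = False
        # Register the handler; the work loop polls the flag between steps.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        self.should_stop = True


def run_steps(watcher, total_steps, do_step, save_checkpoint):
    """Run steps until finished or until a termination notice arrives.

    Returns the number of completed steps.
    """
    for step in range(total_steps):
        if watcher.should_stop:
            save_checkpoint(step)  # persist state before the instance disappears
            return step
        do_step(step)
    return total_steps
```

Polling a flag between steps keeps the handler trivial. The step duration has to be shorter than the notice window for the final save to complete.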
Reserved: The Commitment Discount
Reserved instances lock you into a term (typically 12 months) in exchange for significant discounts. Think of it as buying the bottle instead of paying by the glass.
Reserved discounts:
- 40-50% off on-demand rates
- 12-month terms are standard
- Month-to-month available at smaller discounts (10-15%)
The economics:
If you run an H100 continuously for 12 months (8,760 hours) at $6/hr on-demand, you pay about $52,560. At reserved rates ($3.30/hr), you pay about $28,900. That is roughly $23,650 in savings, but only if you actually use it continuously.
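A minimal sketch of the arithmetic behind these figures, assuming a 365-day year of continuous use (8,760 hours). The function name `annual_cost` is illustrative.

```python
def annual_cost(hourly_rate, hours_per_day=24.0, days=365):
    """Cost of running one GPU for `hours_per_day` hours on each of `days` days."""
    return hourly_rate * hours_per_day * days


on_demand = annual_cost(6.00)   # H100 on-demand rate from the example
reserved = annual_cost(3.30)    # H100 reserved rate from the example
savings = on_demand - reserved

print(f"on-demand ${on_demand:,.0f}, reserved ${reserved:,.0f}, savings ${savings:,.0f}")
```

Lowering `hours_per_day` for the on-demand side while leaving the reserved side at 24 shows how quickly the savings shrink when the GPU sits idle.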
When On-Demand Makes Sense
On-demand is overpriced for most use cases. But it is the right choice when:
Unpredictable or Variable Workloads
If your GPU needs vary day-to-day or week-to-week—experiment-heavy development phases, prototype testing, variable production traffic—on-demand is the practical choice. Locking into reserved terms for workloads that do not exist yet is a recipe for waste.
Short-Term Projects
A two-week training sprint? A one-day benchmark run? A weekend hackathon? On-demand at premium rates is cheaper than paying for reserved capacity you will not use.
Production Inference with SLAs
If your API serves customers with uptime guarantees, on-demand (or reserved) with SLA guarantees is non-negotiable. Spot interruptions on a customer-facing API are a support nightmare. The premium is the cost of reliability.
First-Time GPU Rental
If you are new to GPU cloud, on-demand is the right starting point. Learn your usage patterns before committing to terms. Spot requires infrastructure knowledge to handle interruptions gracefully.
When Spot Makes Sense
Spot instances are the most cost-effective option for teams that understand the trade-offs.
Batch Training Runs
Training jobs are inherently interruption-tolerant if you implement checkpointing. Save state every 100-500 steps depending on job length. When interrupted, resume from the last checkpoint. A 40% spot discount cuts cost per experiment by roughly a third once 5-10% checkpoint overhead is included.
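The checkpoint-and-resume pattern can be sketched with the standard library alone. This is a toy loop, not a real training job: the state is a small dictionary, the "training step" is a stand-in computation, and the names `save_checkpoint`, `load_checkpoint`, and `train` are illustrative. In a real pipeline the state would be model and optimizer weights written to persistent storage.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    # The rename is atomic on POSIX, so a kill mid-write leaves the old file intact.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss_sum": 0.0}


def train(path, total_steps, checkpoint_every=100, stop_at=None):
    """Resume from the last checkpoint; `stop_at` simulates a spot interruption."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if stop_at is not None and state["step"] == stop_at:
            return state  # simulated termination: nothing is saved here
        state["loss_sum"] += 1.0 / (state["step"] + 1)  # stand-in for a real step
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

Calling `train(path, 1000, stop_at=250)` and then `train(path, 1000)` resumes from step 200, so the interruption costs 50 steps of rework. A shorter checkpoint interval reduces that loss at the price of more write overhead.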
Distributed Training with Fault Tolerance
Modern training frameworks (PyTorch Elastic, NVIDIA FLARE, AWS Spot Checkpointing) handle spot interruptions natively. If your training pipeline is built for fault tolerance, spot is the obvious choice.
Development and Experimentation
Development workloads are typically idle more than they run. You write code, review outputs, iterate. GPU utilization during development is often 20-40%. Paying on-demand rates for idle time is wasteful. Use spot and accept the interruption risk.
Cost-Sensitive Research
Academic research budgets are finite. Spot discounts of 30-50% stretch compute grants significantly. The interruption risk is manageable with proper checkpointing and prioritization of shorter training runs.
When Reserved Makes Sense
Reserved instances are a strategic commitment, not a tactical choice.
Stable Production Inference at Scale
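If you have a production API serving consistent traffic 24/7, reserved terms make economic sense. The break-even calculation is simple: a 12-month term at a 40-50% discount costs the same as 6 to 7.2 months of continuous on-demand use, so if you need the GPU continuously for longer than about 6-7 months, reserved beats on-demand.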
Example: a production API needing an H100 for 10 hours/day over 12 months, assuming reserved capacity is billed only for the hours used:
- On-demand: $6.00 × 10hr × 365 = $21,900
- Reserved: $3.30 × 10hr × 365 = $12,045
- Savings: $9,855 (45%)
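The example above bills reserved capacity only for the hours used. If the reservation is instead billed around the clock (an assumption; billing terms vary by provider, so check the contract), the comparison depends on daily utilization. A minimal sketch with illustrative function names:

```python
def breakeven_hours_per_day(on_demand_rate, reserved_rate):
    """Daily usage above which a reservation billed 24/7 beats on-demand."""
    return 24.0 * reserved_rate / on_demand_rate


def cheaper_model(on_demand_rate, reserved_rate, hours_per_day):
    """Compare one day of on-demand usage against one day of a 24/7 reservation."""
    on_demand_daily = on_demand_rate * hours_per_day
    reserved_daily = reserved_rate * 24.0  # assumption: reserved bills every hour
    return "reserved" if reserved_daily < on_demand_daily else "on-demand"


print(f"{breakeven_hours_per_day(6.00, 3.30):.1f} hours/day")  # 13.2 hours/day
print(cheaper_model(6.00, 3.30, hours_per_day=10))              # on-demand
print(cheaper_model(6.00, 3.30, hours_per_day=24))              # reserved
```

Under that billing assumption, the $6.00 and $3.30 rates put the crossover at about 13 hours of use per day, so a 10 hours/day workload would be cheaper on-demand.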
Enterprise SLA Requirements
Enterprise customers often require contractual uptime guarantees that spot (no SLA) cannot provide. Reserved instances typically include SLA terms with financial remedies for downtime.
Predictable Team Expansion
If you are hiring ML engineers on 12-month plans and your infrastructure scales with headcount, reserved terms align with your growth trajectory. The 12-month commitment matches a planning horizon you already have.
The Hidden Math Nobody Talks About
Utilization Rate Matters More Than Hourly Price
The most common mistake: choosing a GPU based on hourly price without calculating utilization-adjusted cost.
Scenario: A100 at $2.40/hr spot vs H100 at $4.50/hr on-demand
At first glance, A100 looks cheaper. But if your workload runs 3x faster on H100:
- A100: 9 hours × $2.40 = $21.60
- H100: 3 hours × $4.50 = $13.50
H100 is actually cheaper because your total compute time is shorter.
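The same comparison as a small sketch. The 3x speedup is the scenario's assumption, not a benchmark result, and the function names are illustrative.

```python
def job_cost(hourly_rate, baseline_hours, speedup=1.0):
    """Total cost of a job that takes `baseline_hours` on the reference GPU."""
    return hourly_rate * baseline_hours / speedup


def breakeven_speedup(slow_rate, fast_rate):
    """Speedup at which the pricier GPU costs the same per job."""
    return fast_rate / slow_rate


a100 = job_cost(2.40, baseline_hours=9)             # A100 spot, reference GPU
h100 = job_cost(4.50, baseline_hours=9, speedup=3)  # H100 on-demand, 3x faster

print(f"A100: ${a100:.2f}, H100: ${h100:.2f}")                        # A100: $21.60, H100: $13.50
print(f"break-even speedup: {breakeven_speedup(2.40, 4.50):.3f}x")    # break-even speedup: 1.875x
```

At these rates the H100 wins whenever your workload runs more than about 1.9x faster on it, so it is worth measuring the real speedup before choosing on hourly price.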
The Interruption Recovery Cost
Spot savings are not pure profit. Interruption recovery has costs:
- Checkpoint overhead: Writing state to persistent storage adds 5-10% to wall clock time
- Rework on failure: If checkpointing fails or is too infrequent, you lose work and repeat training
- Engineering investment: Building interruption-tolerant pipelines requires 1-2 weeks of engineering time upfront
Calculate the true cost of interruption tolerance before claiming spot’s theoretical savings.
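One rough way to fold these costs into the comparison is sketched below. The model is a simplification: checkpoint overhead is applied as a flat percentage of wall clock time, and each interruption costs a fixed amount of rework. The interruption count and rework hours are hypothetical inputs you would replace with your own measurements.

```python
def effective_spot_cost(spot_rate, base_hours, checkpoint_overhead=0.10,
                        interruptions=0, rework_hours_each=0.0):
    """Spot cost including checkpoint overhead and repeated work after interruptions."""
    billed_hours = base_hours * (1 + checkpoint_overhead) + interruptions * rework_hours_each
    return spot_rate * billed_hours


base_hours = 100
on_demand = 5.50 * base_hours   # H100 on-demand at $5.50/hr
nominal_spot = 3.80 * base_hours  # H100 spot at $3.80/hr, no overhead
effective = effective_spot_cost(3.80, base_hours, checkpoint_overhead=0.10,
                                interruptions=5, rework_hours_each=0.5)

print(f"nominal savings:   {100 * (on_demand - nominal_spot) / on_demand:.0f}%")
print(f"effective savings: {100 * (on_demand - effective) / on_demand:.0f}%")
```

With these inputs the nominal discount of about 31% shrinks to about 22% once overhead and rework are counted, before any engineering time is included.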
Egress Fees Change the Math
Egress fees are small at modest data volumes but scale linearly with the data you move.
- Downloading 500GB of training data monthly from RunPod at $0.05/GB = $25/month
- Same data from Vast.ai at $0.01/GB = $5/month
- Same data from Lambda Labs (1TB free then $0.09/GB) = $0 while within the free tier, then ~$45/month once it is exhausted
For data-intensive training workloads, egress fees can add 15-25% to compute costs.
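A minimal sketch of the egress arithmetic, using the per-GB prices quoted in the list above. The function names and the `free_gb` parameter are illustrative, and the $300 compute figure in the last line is a hypothetical input.

```python
def monthly_egress_cost(gb_per_month, price_per_gb, free_gb=0.0):
    """Egress bill for one month after subtracting any remaining free allowance."""
    billable_gb = max(0.0, gb_per_month - free_gb)
    return billable_gb * price_per_gb


def egress_share(egress_cost, compute_cost):
    """Egress as a fraction of compute spend."""
    return egress_cost / compute_cost


print(f"${monthly_egress_cost(500, 0.05):.2f}")                 # $25.00
print(f"${monthly_egress_cost(500, 0.01):.2f}")                 # $5.00
print(f"${monthly_egress_cost(500, 0.09, free_gb=1000):.2f}")   # $0.00
print(f"${monthly_egress_cost(500, 0.09):.2f}")                 # $45.00
print(f"{100 * egress_share(45.0, 300.0):.0f}% of compute")     # 15% of compute
```

Running this against your own monthly transfer volume and compute bill shows quickly whether egress is a rounding error or a line item worth optimizing.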
Provider-Specific Nuances
Lambda Labs: Reserved-First Philosophy
Lambda heavily discounts reserved terms (40-50%) and charges premium rates for on-demand. Their business model incentivizes commitment. For stable production workloads, this works in your favor if you commit.
RunPod: Hybrid Flexibility
RunPod offers monthly commitments with pro-rated refunds—midway between pure on-demand and locked reserved terms. This is the most practical model for teams with growing but unpredictable GPU needs.
Vast.ai: Market-Based Pricing
Vast.ai’s marketplace model means prices fluctuate constantly based on supply and demand. During low-demand periods, you can find H100 spot instances for under $2/hour. During GPU shortages (new model releases, hype cycles), prices spike or availability disappears. Patience and timing matter on Vast.ai.
CoreWeave: Reserved Discounts Without 12-Month Lock
CoreWeave offers 1-month reserved terms at 15-25% discounts—a middle ground between Lambda’s aggressive 12-month discount and RunPod’s monthly commitment. For teams that want commitment discounts without full 12-month lock-in, CoreWeave is worth considering.
Building a Hybrid Strategy
Most production deployments use a mix of pricing models:
Production Layer: Reserved for Stability
Your production inference APIs should run on reserved instances with SLA guarantees. This is the non-negotiable foundation.
Training Layer: Spot for Efficiency
Batch training jobs can run on spot instances with checkpointing. Build your training pipelines to be interruption-tolerant from day one. The savings compound over months.
Development Layer: On-Demand for Flexibility
Developer workstations and experiment runners use on-demand instances. You need flexibility during the exploration phase before committing to reserved terms.
The Buffer Layer: Pre-Launched Spot
For latency-sensitive batch jobs that cannot wait for instance startup, maintain a pool of pre-launched spot instances that boot and wait. You pay for the idle time, but when a training run starts there is no cold-start delay.
How to Decide: A Practical Checklist
Choose On-Demand If:
- Your GPU needs vary significantly week-to-week
- You are in a development or experimentation phase
- You need GPUs for <4 months total
- You cannot handle interruption recovery in your pipelines
Choose Spot If:
- Your training jobs support checkpointing
- You are cost-sensitive and can tolerate uncertainty
- Your workloads are batch-oriented, not real-time
- You have engineering bandwidth to build interruption tolerance
Choose Reserved If:
- You have stable, predictable GPU needs >8hr/day
- You need SLA-backed uptime for production serving
- You can commit to 12-month terms
- Your usage is predictable enough to size capacity accurately
Choose Hybrid If:
- You have both production and training workloads
- Your team has DevOps capacity to manage multiple pricing tiers
- You want to optimize cost without sacrificing reliability
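One possible encoding of the checklist above as a rule function. The thresholds (8 hours/day, 12 months) come from the checklist; the field names and the order in which the rules fire are assumptions made for illustration. A hybrid strategy corresponds to calling the function once per workload.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    needs_sla: bool        # customer-facing uptime guarantee required
    checkpointable: bool   # can resume after an interruption
    predictable: bool      # usage is stable enough to size capacity
    hours_per_day: float
    months_needed: float


def fits_reserved(w: Workload) -> bool:
    """Stable usage above 8 hours/day with a 12-month horizon."""
    return w.predictable and w.hours_per_day > 8 and w.months_needed >= 12


def recommend(w: Workload) -> str:
    if w.needs_sla:
        # Spot carries no SLA, so only reserved or on-demand qualify.
        return "reserved" if fits_reserved(w) else "on-demand"
    if w.checkpointable:
        return "spot"
    return "reserved" if fits_reserved(w) else "on-demand"


api = Workload(needs_sla=True, checkpointable=False, predictable=True,
               hours_per_day=24, months_needed=12)
training = Workload(needs_sla=False, checkpointable=True, predictable=False,
                    hours_per_day=6, months_needed=3)
print(recommend(api), recommend(training))  # reserved spot
```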
The Decision Matrix
| Workload Type | Pricing Model | Why |
|---|---|---|
| Production API serving | Reserved | SLA + predictable cost |
| Batch training runs | Spot | Cost savings + checkpoint tolerance |
| Development / iteration | On-demand | Flexibility + no commitment |
| Prototype testing | On-demand or Spot | Depends on duration and pipeline maturity |
| R&D experiments | Spot | Cost-sensitive if checkpointing is built |
| Inference endpoints | Reserved (stable) or On-demand (variable) | Depends on traffic pattern |
The right model is not a one-time choice. Re-evaluate every quarter as your workloads stabilize or change.
Frequently Asked Questions
How much cheaper are spot instances compared to on-demand GPUs?
Spot instances typically run 30-50% cheaper than on-demand. For example, H100 on-demand at $5.50/hr drops to $3.80/hr on spot (about 31% off), and A100 goes from $3.40/hr to $2.40/hr (about 29% off). The discount varies by GPU type and provider.
What happens when spot instances are interrupted?
Spot instances can be terminated with 30 seconds to 2 minutes notice depending on provider. You lose all data in memory and any non-persisted state. Your training job must support checkpointing to resume from the last saved state.
Are reserved instances worth the 12-month commitment?
Reserved instances only make sense if you have predictable, stable GPU needs exceeding 8 hours/day for 12+ months. With a 40-50% discount, a 12-month term pays for itself after about 6-7 months of continuous use compared with on-demand.
Which pricing model fits which use case?
On-demand: Development, testing, unpredictable workloads. Spot: Batch training, interruption-tolerant workloads, cost-optimized teams. Reserved: Production inference, enterprise SLAs, predictable high-volume usage.
Can I switch between pricing models mid-project?
Yes, but with friction. Your infrastructure must support both checkpoint-based interruption recovery and different availability characteristics. Design for spot from day one if cost optimization is a priority.
How do providers determine spot prices?
Spot prices fluctuate based on supply-demand balance in each region. Prices update every 5-30 minutes depending on provider. When a new AI model releases, GPU demand spikes and spot prices jump 20-40% within hours.
What is the break-even point for reserved instance discounts?
With 40-50% reserved discounts on a 12-month term, the break-even vs on-demand is about 6-7 months of continuous usage. If you need GPUs for <4 months, on-demand is cheaper even at premium rates.
Do spot instances have SLAs?
No major provider offers SLA guarantees on spot instances. This is the fundamental trade-off: you save money but accept interruption risk. Enterprise deployments must plan for spot failures or pay for on-demand/reserved.
What are the minimum commitment requirements?
Lambda Labs: 12-month reserved minimum. RunPod: Monthly commitments with pro-rated refunds. Vast.ai: No commitment required. CoreWeave: 1-month reserved terms available at lower discounts than 12-month.