Cloud Cost Savings Playbook 2026: 8 strategies ranked by impact
Eight cloud cost optimisation strategies, ordered by typical realised impact. Each entry includes the levers, the saving range, and the concrete steps to execute. Most teams should sequence top to bottom, not run them all in parallel.
Achieve 60-75% commitment discount coverage on stable workloads
Typical saving: 30-50% on covered compute
AWS Compute Savings Plans, Azure Reserved Instances, GCP Committed Use Discounts
Commitment discounts (Savings Plans, RIs, CUDs) are the single biggest lever for stable production workloads. Most teams under-cover and leave significant savings on the table. Target 60-75% coverage of steady-state, leaving the spiky top on on-demand or Spot.
Steps to execute
1. Run a 30-day Cost Explorer / Cost Management coverage report
2. Identify the steady-state floor of each major workload
3. Buy 3-year all-upfront commitments on absolutely stable workloads
4. Buy 1-year commitments on the layer above that with some uncertainty
5. Review quarterly and adjust as workloads evolve
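The steady-state floor in step 2 can be estimated directly from hourly usage history. A minimal sketch, assuming percentile cut-offs that are illustrative rather than provider guidance (tune them to your own risk tolerance):

```python
# Sketch: layer commitment purchases from hourly vCPU usage history.
# The percentile cut-offs are illustrative assumptions, not provider
# guidance.

def commitment_layers(hourly_vcpus, floor_pct=5, steady_pct=60):
    """Split usage into 3-year / 1-year / on-demand layers.

    hourly_vcpus: vCPU counts sampled hourly over ~30 days.
    floor_pct:    percentile treated as the absolutely stable floor.
    steady_pct:   percentile treated as steady but less certain.
    """
    s = sorted(hourly_vcpus)

    def pct(p):
        return s[min(len(s) - 1, int(len(s) * p / 100))]

    three_year = pct(floor_pct)               # usage never dips below this
    one_year = pct(steady_pct) - three_year   # steady-ish layer above it
    on_demand = max(hourly_vcpus) - three_year - one_year
    return {"3yr": three_year, "1yr": one_year, "on_demand": on_demand}

# Example: a fleet oscillating between 40 (night) and 100 (day) vCPUs.
usage = [40] * 360 + [70] * 240 + [100] * 120
layers = commitment_layers(usage)
```

On this sample the function proposes a 3-year commitment at 40 vCPUs, a 1-year layer of 30, and leaves the remaining 30-vCPU daytime spike on on-demand or Spot, matching the 60-75% coverage target.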
Migrate compatible workloads to ARM-based instances
Typical saving: 20-40% on compute
AWS Graviton 4, Azure Cobalt 100, GCP Tau T2A
AWS Graviton, Azure Cobalt, and GCP Tau T2A all offer 20-40% better price-performance than x86 equivalents for compatible workloads. Most managed services and Linux containers run unmodified.
Steps to execute
1. Identify ARM-compatible workloads (most managed services, Linux containers, JIT languages)
2. Run a benchmark on a sample workload to validate performance
3. Update CI/CD to build multi-architecture images
4. Deploy to a small fleet and measure real-world performance and cost
5. Roll out across all compatible workloads
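The benchmark in step 2 should be reduced to cost per unit of work, not raw hourly price. A minimal sketch, where the prices and the throughput ratio are hypothetical placeholders to be replaced with your own measurements:

```python
# Sketch: compare cost per unit of work between an x86 fleet and an
# ARM candidate. The hourly prices and benchmark ratio below are
# hypothetical placeholders - substitute your measured numbers.

def cost_per_unit_work(hourly_price, relative_throughput):
    """Lower is better; throughput is normalised to the x86 baseline."""
    return hourly_price / relative_throughput

x86 = cost_per_unit_work(hourly_price=0.170, relative_throughput=1.00)
arm = cost_per_unit_work(hourly_price=0.136, relative_throughput=1.10)

saving = 1 - arm / x86
print(f"price-performance saving: {saving:.0%}")
```

With a 20% cheaper instance that also benchmarks 10% faster, the combined price-performance saving lands at roughly 27%, inside the 20-40% range quoted above.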
Move fault-tolerant workloads to Spot capacity
Typical saving: 60-90% on eligible compute
AWS Spot, Azure Spot VMs, GCP Spot VMs
AWS Spot Instances, Azure Spot VMs, and GCP Spot VMs (formerly Preemptible VMs) offer 60-90% off on-demand pricing. Suitable for batch processing, ML training, CI runners, and any stateless or checkpointed workload that tolerates interruption.
Steps to execute
1. Identify fault-tolerant workloads: batch, training, CI, EKS data plane
2. Use mixed-instance autoscaling groups or Karpenter for pool diversity
3. Implement graceful shutdown handlers (2-minute notice on AWS, 30-second on Azure / GCP)
4. Use capacity-optimised allocation strategies to minimise interruption
5. Monitor interruption rates and adjust pool diversity
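The shutdown handler in step 3 typically polls the instance metadata service: on AWS, `/latest/meta-data/spot/instance-action` returns HTTP 200 once the 2-minute notice is issued (404 otherwise). A sketch with the HTTP call injected so the logic is testable off-instance; `drain` is a placeholder for your own checkpoint / deregister logic:

```python
# Sketch: graceful-shutdown poller for Spot interruption notices.
# fetch_status is injected (a real handler would issue an HTTP GET to
# the metadata URL); drain() is a placeholder for checkpoint logic.
import time

IMDS_SPOT = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def watch_for_interruption(fetch_status, drain, poll_seconds=5, max_polls=None):
    """Poll until an interruption notice appears, then drain once."""
    polls = 0
    while max_polls is None or polls < max_polls:
        if fetch_status(IMDS_SPOT) == 200:  # notice issued: start draining
            drain()
            return True
        polls += 1
        time.sleep(poll_seconds)
    return False

# Off-instance usage example with a stubbed metadata service:
events = []
statuses = iter([404, 404, 200])
hit = watch_for_interruption(
    fetch_status=lambda url: next(statuses),
    drain=lambda: events.append("drained"),
    poll_seconds=0,
    max_polls=10,
)
```

Keeping the drain path idempotent matters: the same handler can then double as the node-termination hook for autoscaler-initiated shutdowns.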
Right-size with native recommendation tools
Typical saving: 10-25% on compute
All three providers, free native tools
Most production workloads run at 20-30% CPU utilisation. Native rightsizing tools surface candidates with actionable recommendations. Memory metrics often require an agent on the instance.
Steps to execute
1. Enable AWS Compute Optimizer / Azure Advisor / GCP Recommender
2. Install memory metric agents (CloudWatch agent, Azure Monitor agent, Ops Agent)
3. Wait 14+ days for sufficient metrics
4. Review recommendations sorted by potential saving
5. Apply changes in non-production first, validate, then roll to production
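The triage in step 4 is just a filter-and-sort over the exported recommendations. A sketch, where the record fields are an illustrative shape rather than the exact output of any of the three tools:

```python
# Sketch: rank rightsizing candidates by estimated monthly saving.
# The record shape is illustrative, not the exact export format of
# Compute Optimizer / Advisor / Recommender.

recs = [
    {"resource": "api-server", "current": "m5.2xlarge",
     "recommended": "m5.xlarge", "monthly_saving": 140.0},
    {"resource": "batch-worker", "current": "c5.4xlarge",
     "recommended": "c5.xlarge", "monthly_saving": 420.0},
    {"resource": "cache", "current": "r5.large",
     "recommended": "r5.large", "monthly_saving": 0.0},
]

# Largest savings first; drop resources that are already right-sized.
ranked = sorted(
    (r for r in recs if r["monthly_saving"] > 0),
    key=lambda r: r["monthly_saving"],
    reverse=True,
)
```

Working the list top-down keeps the effort proportional to the saving, which is what makes step 5's non-production-first rollout affordable.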
Storage lifecycle policies and intelligent tiering
Typical saving: 20-90% on cold data
AWS S3, Azure Blob, GCP Cloud Storage
Object storage tiering moves data through hot, cool, cold, and archive tiers based on access patterns. AWS S3 Intelligent-Tiering and GCP Autoclass do this automatically; Azure uses policy-based lifecycle rules.
Steps to execute
1. Enable S3 Intelligent-Tiering on general-purpose buckets (no retrieval fees)
2. Configure Azure Blob lifecycle rules based on last-access time
3. Enable GCP Autoclass on buckets with mixed access patterns
4. Add archive tier transitions for known cold data (180+ days idle)
5. Audit retrieval patterns quarterly to ensure tiers are correctly sized
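On AWS, the archive transition in step 4 is a lifecycle rule in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. A sketch; the bucket name and `logs/` prefix are placeholders, and the rule should only cover data you know stays idle:

```python
# Sketch: S3 lifecycle rule moving known-cold objects to an archive
# tier after 180 days. Bucket name and prefix are placeholders.

lifecycle = {
    "Rules": [
        {
            "ID": "archive-cold-data",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},  # placeholder prefix
            "Transitions": [
                # Move to Glacier Deep Archive once 180+ days old.
                {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}

# Applied with (requires credentials, so commented out here):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=lifecycle)
```

Deep Archive retrievals take hours and carry per-GB fees, which is why the quarterly retrieval audit in step 5 matters before widening the prefix.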
Eliminate hidden costs through bill audit
Typical saving: 10-20% of total bill
All three providers; see /hidden-costs for full catalogue
NAT Gateway, log ingestion, idle endpoints, and snapshot accumulation typically add 10-20% to the bill. A monthly bill audit grouped by usage type rather than service surfaces these line items.
Steps to execute
1. Run cost reports grouped by usage type for the past 90 days
2. Identify line items unrelated to active workloads
3. Apply VPC endpoints to reduce NAT Gateway processing charges
4. Add log exclusion filters to reduce CloudWatch / Log Analytics ingestion
5. Audit and delete unattached IPs, idle LBs, orphaned snapshots monthly
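The grouping in step 1 can be reproduced locally on a cost export. A sketch; the line items are made-up examples, while in practice the data comes from a Cost Explorer / Cost Management export:

```python
# Sketch: aggregate 90 days of cost line items by usage type so that
# hidden charges surface next to compute. Line items are made-up.
from collections import defaultdict

line_items = [
    {"usage_type": "BoxUsage:m5.xlarge", "cost": 2400.0},
    {"usage_type": "NatGateway-Bytes", "cost": 310.0},
    {"usage_type": "EBS:SnapshotUsage", "cost": 210.0},
    {"usage_type": "DataTransfer-Regional-Bytes", "cost": 120.0},
    {"usage_type": "NatGateway-Hours", "cost": 97.0},
]

totals = defaultdict(float)
for item in line_items:
    totals[item["usage_type"]] += item["cost"]

# Highest spend first: NAT and snapshot lines stand out beside compute.
by_spend = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Grouping by usage type rather than service is the point: "EC2" as a service hides the NAT, transfer, and snapshot lines that this view exposes.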
Apply Azure Hybrid Benefit on Microsoft estates
Typical saving: 40-80% on eligible Azure VMs
Azure VMs, AKS, App Service, SQL Database, SQL Managed Instance
Azure-specific lever for Windows Server and SQL Server workloads. Existing on-prem licences with Software Assurance can apply to Azure VMs and PaaS services.
Steps to execute
1. Inventory Windows Server and SQL Server licences with Software Assurance
2. Run the Azure Cost Management Hybrid Benefit calculator
3. Identify eligible VMs and database services
4. Enable Hybrid Benefit on each eligible resource
5. Stack with Reserved Instances for a compounding effect
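The saving per VM is simple arithmetic: Hybrid Benefit removes the Windows licence component, so the VM bills at the base compute rate. A sketch with hypothetical hourly prices standing in for the real rate card:

```python
# Sketch: estimate the monthly Azure Hybrid Benefit saving on one VM.
# Prices are hypothetical placeholders - use your region's rate card.

def ahb_monthly_saving(payg_hourly, base_hourly, hours=730):
    """Bring-your-own-licence saving: base rate replaces Windows PAYG."""
    return (payg_hourly - base_hourly) * hours

saving = ahb_monthly_saving(payg_hourly=0.376, base_hourly=0.192)
```

Because a Reserved Instance then discounts the remaining base rate, the two levers multiply rather than overlap, which is what step 5 relies on.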
Architecture-level efficiency
Typical saving: Variable, often largest gains long-term
Architecture and engineering practice
The biggest savings often come from architecture decisions: serverless for low-utilisation services, managed services where appropriate, multi-tenant patterns, and avoiding architectural mistakes (cross-AZ chatty services, oversized always-on infrastructure).
Steps to execute
1. Run pre-deployment cost reviews for new services
2. Audit cross-AZ traffic and apply topology-aware routing
3. Evaluate serverless options (Lambda, Container Apps, Cloud Run) for low-utilisation services
4. Consolidate dev / staging environments where possible
5. Add cost as a non-functional requirement in design docs
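The serverless evaluation in step 3 reduces to a breakeven calculation between per-request billing and an always-on instance. A sketch; all prices are illustrative assumptions, so check current provider pricing before deciding:

```python
# Sketch: breakeven check for moving a low-utilisation service to
# serverless. All prices are illustrative assumptions.

def serverless_monthly_cost(requests, gb_seconds_per_req,
                            per_million_req=0.20,
                            per_gb_second=0.0000166667):
    """Request charge plus duration charge for one month of traffic."""
    return ((requests / 1e6) * per_million_req
            + requests * gb_seconds_per_req * per_gb_second)

always_on = 0.0416 * 730  # one small always-on instance, ~$30/month

# A service handling 1M requests/month at 0.25 GB-seconds each:
fn = serverless_monthly_cost(1_000_000, 0.25)
```

At this traffic level the per-request cost comes to a few dollars a month against ~$30 always-on; the always-on instance only wins once sustained utilisation is high, which is exactly the "low-utilisation services" qualifier above.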
Common questions
FAQ
Which strategy delivers the fastest savings?
Commitment discount coverage and bill audit are the fastest. Commitments apply their 30-50% discount to covered compute from the moment of purchase. Bill audits routinely surface 5-10% of spend in idle resources and unused capacity that can be eliminated within a sprint. Both are achievable in the first 30 days of a serious FinOps effort.
What is the most overlooked cost lever?
Storage lifecycle policies. Most teams set up object storage at the Hot or Standard tier and never tier down, even though most data is rarely accessed after 30-90 days. Enabling intelligent tiering on general-purpose buckets typically saves 20-40% on object storage with no architecture change and no risk.
When should architecture changes be considered for cost?
When a single workload exceeds $10K/mo or represents more than 5% of total cloud spend. Below that, the optimisation effort exceeds the saving. Above that, architecture-level redesigns (serverless conversion, regional re-platforming, multi-tenant consolidation) often deliver the largest single-cut savings.
How much can a mature FinOps practice save overall?
Industry benchmarks show 20-40% reduction from baseline for most organisations that move from no FinOps practice to mature Run-stage capabilities. The first 10-15% comes from basic visibility and obvious wins. The next 10-25% comes from commitment optimisation, rightsizing, and architecture changes. Above 40% reduction is unusual and typically requires major workload re-platforming.
Should we apply all strategies at once?
No. Sequence matters. Right-size first (or you commit to oversized capacity). Then apply commitments on the right-sized footprint. Then layer Spot on fault-tolerant workloads. Then storage tiering. Then bill audit and hidden-cost cleanup. Architecture changes follow the data once you understand where the bill actually goes.