Multi-Cloud Is Not a Resilience Strategy
The pitch sounds airtight. Run workloads across AWS and GCP (or Azure), eliminate vendor lock-in, negotiate pricing leverage, and if one provider has an outage, you're already running elsewhere. It is the infrastructure equivalent of "we have a backup."
Except most teams that adopt multi-cloud don't get any of those benefits. They get the operational overhead without the payoff.
The Resilience Argument Doesn't Survive Contact with Reality
When AWS us-east-1 suffered its extended outages in December 2021, thousands of services went down. But the teams that stayed up weren't running multi-cloud. They were running in multiple regions within the same provider.
Cloud provider outages that take down an entire provider globally are extraordinarily rare. Single-region failures happen, but the fix (active multi-region on one cloud) is dramatically simpler than full multi-cloud. You don't need two different IAM systems, two different VPC models, two different Kubernetes distributions, and two different object storage APIs. You need one provider, multiple regions, and runbooks you've actually tested.
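The routing decision at the heart of a multi-region setup is small enough to sketch. This is a toy illustration, not production failover logic; the region names and health data are hypothetical:

```python
# Toy sketch of the routing decision multi-region automates: prefer the
# primary region, fail over to a secondary when its health check fails.
# In practice this logic lives in DNS failover or a load balancer, not
# application code.

def pick_region(health: dict[str, bool], preference: list[str]) -> str:
    """Return the first healthy region in preference order."""
    for region in preference:
        if health.get(region, False):
            return region
    raise RuntimeError("no healthy region; page someone")

prefs = ["us-east-1", "us-west-2"]
print(pick_region({"us-east-1": True, "us-west-2": True}, prefs))   # us-east-1
print(pick_region({"us-east-1": False, "us-west-2": True}, prefs))  # us-west-2
```

Note what the sketch does not contain: a second IAM model, a second VPC topology, a second storage API. That is the whole argument for multi-region over multi-cloud.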
The real causes of most production incidents are bad deploys, misconfigured infrastructure, application bugs, and dependency failures. None of those are solved by running on two clouds. If your config is wrong, it's wrong everywhere.
The Cost Optimisation Argument Has a Hidden Bill
Some teams justify multi-cloud by playing providers against each other on price. The theory: if AWS increases rates, you shift workloads to GCP. In practice, this only works if your workloads are genuinely portable and your team has deep operational expertise on both platforms.
What actually happens: you pay full rate on both clouds because your usage on each is too fragmented to hit commitment discount tiers. AWS and GCP both incentivise concentration. A 3-year compute savings plan on AWS or committed use discounts on GCP can reduce your bill by 40-60%. If your workload is split, you can't hit the thresholds that unlock those rates.
The result is you spend more, not less, and your FinOps analysis becomes twice as complicated.
What It Actually Costs Your Engineering Team
Running multi-cloud means your platform team maintains:
- Two networking models (VPCs, peering, transit gateways)
- Two IAM systems with different primitives and audit trails
- Two observability integrations, usually neither used well
- Two CI/CD deployment targets with separate credential management
- Developer cognitive load split across two ecosystems
For a platform team of five, that is not a doubled workload. It is a workload that never gets finished. Everything runs at 60% quality because there is not enough time to go deep on either platform.
The companies usually held up as examples here (Netflix, Spotify, Uber) all have dedicated infrastructure organisations with hundreds of engineers. They built abstraction layers that hide cloud-specific primitives from product engineers. If you don't have the headcount to build that abstraction layer, you're just exposing your product engineers to two different sets of footguns instead of one.
When Multi-Cloud Is Actually Justified
There are legitimate reasons to run on multiple providers:
- Regulatory or data residency requirements that mandate specific provider certifications in specific regions
- M&A integration where you're absorbing a company on a different cloud and migration isn't immediate
- Best-of-breed managed services where one provider is genuinely superior for a specific workload
That last case is worth separating out. Using BigQuery for analytics alongside AWS compute, or Azure OpenAI alongside GCP infrastructure, is not really a multi-cloud strategy. It is using the right tool for a specific job. That is reasonable. Running your entire application footprint across two general-purpose compute platforms is the expensive version.
Design for Portability, Not for Multi-Cloud
The right hedge against vendor lock-in is not running on two clouds simultaneously. It is building workloads that could move if they had to: containerised applications, IaC that abstracts provider specifics, storage interfaces that don't hard-code S3 API calls, and clean separation between compute and data.
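The storage point is the easiest to make concrete. Here is a minimal sketch of the "don't hard-code S3" idea: application code depends on a narrow interface, and each provider gets its own adapter. The names (`BlobStore`, `LocalStore`, `archive_report`) are illustrative, not a real library:

```python
# Minimal portability sketch: call sites depend on a two-method storage
# interface, not on boto3. Swapping providers means writing one adapter.
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class LocalStore:
    """Filesystem-backed adapter. An S3 or GCS adapter would implement
    the same two methods with boto3 / google-cloud-storage calls."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


def archive_report(store: BlobStore, report_id: str, body: bytes) -> None:
    # Application code sees only the interface, so a migration touches
    # the adapter, not every call site.
    store.put(f"reports/{report_id}.json", body)
```

The discipline is cheap precisely because you only ever run one adapter in production; the second one exists as an option, not as a live system you operate.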
That discipline costs you almost nothing operationally. Actively running two clouds costs 30-40% of your platform team's bandwidth and a bill that is higher than a single-provider commitment would be.
Most teams under 200 engineers should consolidate. Pick the provider that fits your team's expertise and your target market (AWS in most cases, GCP if you're ML-heavy or in gaming, Azure if your customers are enterprise). Go deep on one platform. Earn the commitment discounts. Build multi-region within it.
The engineers freed from maintaining dual-cloud infrastructure will do more for your actual reliability than any theoretical failover path.
If you are running multi-cloud and genuinely getting the benefits, I want to hear how you solved the operational overhead. And if you are planning a cloud strategy review, book a call and we can work through whether your current setup is serving your goals or just adding complexity.