Cloud provider marketing pages converge on a familiar script: enterprise-grade security, compliance certifications listed prominently, encryption everywhere. The sales engineer walks you through the shared responsibility model with a clean diagram showing exactly where the provider’s obligations end and yours begin. It looks tidy, reassuring, and almost entirely disconnected from how security actually works once workloads are running in production.

The gap between what a provider claims and what the contract guarantees catches teams repeatedly. Common gaps include: a “managed” database that leaves encryption key rotation entirely to the customer, a SOC 2 report whose scope excludes the exact services you depend on, or an incident response SLA that guarantees notification within 72 hours, which is an eternity when you’re holding customer financial data.

Evaluating provider security claims is a skill that compounds, and it follows the same framework whether you’re placing workloads in AWS or evaluating a crypto custodian for digital assets.

The Shared Responsibility Model in Practice

Every major cloud provider publishes a shared responsibility model. AWS, GCP, and Azure all have their own versions, and they all communicate the same fundamental idea: the provider secures the infrastructure, you secure what you put on it. The marketing version is a clean horizontal line. The reality is a jagged, overlapping mess where ownership is ambiguous in precisely the areas that matter most.

Take managed Kubernetes as a concrete example. EKS, GKE, and AKS all “manage” the control plane: API server availability, etcd backups, control plane upgrades. That’s real value. But workload security, RBAC configuration, network policies, secrets management, pod security standards, runtime monitoring, and container image provenance all remain your problem. Many teams discover this gradually, usually when something breaks.
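Those workload-security responsibilities can be made concrete with a small audit pass over pod specs. This is a hypothetical sketch, not a real tool: the field names follow the Kubernetes pod spec, but the check list is illustrative and far from exhaustive (real audits cover network policies, image provenance, and more).

```python
# Hypothetical sketch: auditing Kubernetes pod specs for workload-security
# settings that remain the customer's responsibility on EKS/GKE/AKS.
# Field names follow the Kubernetes pod spec; the checks are illustrative.

def audit_pod_spec(spec: dict) -> list[str]:
    """Return a list of workload-security findings for one pod spec."""
    findings = []
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext", {})
        if sc.get("privileged"):
            findings.append(f"{name}: runs privileged")
        if sc.get("runAsNonRoot") is not True:
            findings.append(f"{name}: not forced to run as non-root")
        if not sc.get("readOnlyRootFilesystem"):
            findings.append(f"{name}: writable root filesystem")
    if spec.get("hostNetwork"):
        findings.append("pod shares the host network namespace")
    return findings

risky = {
    "hostNetwork": True,
    "containers": [{"name": "app", "securityContext": {"privileged": True}}],
}
for finding in audit_pod_spec(risky):
    print(finding)
```

The point of the sketch is the ownership boundary: nothing in the managed control plane will flag any of these findings for you.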

The shared responsibility boundary also shifts depending on which service tier you consume. Running your own EC2 instances gives you more control and more responsibility. Using Lambda or Fargate pushes more operational security to the provider, albeit with less visibility into what’s happening underneath. Neither is inherently more secure; they’re different trade-off profiles, and the right choice depends on your team’s operational maturity and threat model.

Reading SOC 2 Reports for Real Information

SOC 2 Type II reports are the currency of cloud security trust, and most teams treat them as checkboxes: do you have one? Great, next question. That approach misses nearly everything useful these reports contain.

Three things I look for before anything else. First, scope exclusions. A SOC 2 report covers specific systems and services within a defined boundary. Providers regularly exclude newer services, acquired products, or specific infrastructure components from the audit scope. If the service you depend on falls outside the boundary, the report tells you nothing about it.

Second, Complementary User Entity Controls, usually buried in Section IV or an appendix. These are the controls the auditor assumes you are implementing. The provider’s controls only work if yours are in place. When a SOC 2 report says the provider “restricts access to production systems,” the CUEC section might reveal that this depends on you configuring federation correctly, rotating service account credentials, and monitoring access logs on your side. Skip the CUECs and you have a false sense of coverage.

A SOC 2 report with unread Complementary User Entity Controls tells you about the provider’s controls whilst leaving your own obligations unexamined.
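The CUEC review lends itself to a simple cross-check: list the controls the auditor assumes you implement, subtract the ones you actually have, and treat whatever remains as uncovered risk. The identifiers and descriptions below are invented for illustration; the real ones come from the report itself.

```python
# Hypothetical sketch: cross-checking a provider's Complementary User
# Entity Controls (CUECs) against the controls your team has implemented.
# The CUEC IDs and descriptions are invented; real ones come from the
# SOC 2 report's CUEC section.

cuecs = {
    "CUEC-01": "Customer configures identity federation and MFA",
    "CUEC-02": "Customer rotates service account credentials",
    "CUEC-03": "Customer reviews access logs for anomalies",
}

implemented = {"CUEC-01", "CUEC-03"}  # what your team actually does

gaps = {cid: desc for cid, desc in cuecs.items() if cid not in implemented}
for cid, desc in sorted(gaps.items()):
    print(f"UNCOVERED {cid}: {desc}")
```

Trivial as it looks, most teams never perform this subtraction, which is exactly how the false sense of coverage survives.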

Third, the testing period and any exceptions noted. A 12-month Type II report with noted exceptions in access review procedures is often more informative than a 3-month Type II with a clean opinion: a shorter window gives controls less opportunity to fail, so a clean opinion over three months is weak evidence. Pay attention to the testing window, and if a provider switched auditors recently, ask why.

Encryption Claims: What “Encrypted” Actually Means

Encryption is the most overloaded term in cloud security marketing. “Data encrypted at rest and in transit” appears on every provider’s compliance page, and it tells you almost nothing about your actual risk exposure.

At-rest encryption with provider-managed keys means the provider can decrypt your data. For most workloads, that’s acceptable, since you trust the provider with compute access anyway. But for regulated workloads or sensitive financial data, the question is whether you can bring your own keys (BYOK) or, better yet, hold your own keys (HYOK) in an external HSM that the provider never accesses. The difference matters enormously during a provider-side breach or a law enforcement request in a jurisdiction you didn’t anticipate.
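The custody question comes down to envelope encryption: a per-object data key encrypts the data, and a key-encryption key (KEK) wraps the data key. Whoever holds the KEK can recover everything. The sketch below illustrates that mechanic with a toy XOR-keystream "cipher" so it runs on the standard library alone; it is emphatically not real cryptography, just a stand-in for AES in a real KMS.

```python
# Toy sketch of envelope encryption, to show why KEK custody matters.
# With provider-managed keys, the provider holds the KEK and can always
# unwrap your data keys; with HYOK, the KEK never leaves your HSM.
# The XOR-keystream "cipher" is NOT real cryptography -- it only stands
# in for AES so this sketch runs with the standard library.

import hashlib
import os

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric 'cipher': XOR with a SHA-256 keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))

data_key = os.urandom(32)   # per-object data key
kek = os.urandom(32)        # key-encryption key: who holds this?

ciphertext = keystream_xor(data_key, b"customer financial record")
wrapped_key = keystream_xor(kek, data_key)  # stored alongside the data

# Whoever holds the KEK can unwrap the data key and read the data.
recovered_key = keystream_xor(kek, wrapped_key)
plaintext = keystream_xor(recovered_key, ciphertext)
print(plaintext)
```

In the provider-managed-key model, the last three lines are something the provider can execute without you; in the HYOK model, they can't.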

In-transit encryption is largely a solved problem with TLS, albeit with caveats around internal service mesh traffic and east-west communication within a VPC. The emerging frontier is in-use encryption: confidential computing via technologies like AMD SEV, Intel SGX/TDX, and ARM CCA. These protect data whilst it’s being processed, meaning even a compromised hypervisor can’t read your workload’s memory. If your threat model includes a compromised or coerced provider, confidential computing is the only technical control that addresses it, and the maturity varies significantly across providers.

Evaluating Provider Incident Response

Security incidents at cloud providers happen, and what matters is how you’ll find out and what access you’ll have to investigate.

Breach notification timelines vary wildly. Some providers commit to 24-hour notification, some to 72 hours, and some offer only “commercially reasonable” language that means whatever their legal team decides it means. For workloads handling financial data or PII, 72 hours of silence after a provider-side breach can turn a containable incident into a regulatory exposure.
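The arithmetic behind that concern is worth writing down explicitly. A hedged sketch, with illustrative numbers: the 72-hour regulator deadline mirrors GDPR Article 33, but when that clock legally starts is itself a question for counsel; the provider SLA and triage figures are invented.

```python
# Sketch: does the provider's breach-notification SLA leave any slack
# against your own regulatory deadline? The 72-hour regulator figure
# mirrors GDPR Art. 33; the provider SLA and triage times are
# illustrative, and when the regulatory clock starts is a legal question.

from datetime import timedelta

provider_notification_sla = timedelta(hours=72)  # provider's window to tell you
regulator_deadline = timedelta(hours=72)         # your window to notify the regulator
internal_triage = timedelta(hours=8)             # time you need to assess scope

slack = regulator_deadline - provider_notification_sla - internal_triage
print(slack)  # negative: the SLA alone can consume your entire window
```

A negative result means the contract, as written, makes your regulatory obligation unmeetable in the worst case, before any technical work begins.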

Beyond notification timing, ask about forensics access. When a provider-side incident affects your workloads, can you get detailed logs? Raw audit trails? Or do you get a sanitized summary that tells you an incident occurred without enough detail to assess your own exposure? Most providers default to the sanitized summary, and getting anything more requires contract negotiation before the incident happens.

Provider SLAs for incident response are also worth stress-testing against your own incident response plan. If your team’s runbook assumes you can isolate affected resources within 30 minutes, but the provider’s SLA gives them 4 hours to respond to a severity-1 support ticket, your runbook has a dependency it hasn’t accounted for.
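One way to surface that hidden dependency is to walk the runbook and add the provider's worst-case SLA to every step that waits on them. A hypothetical sketch; the step names, durations, and the 4-hour severity-1 SLA are all invented for illustration.

```python
# Hypothetical sketch: stress-testing a runbook's timing assumptions
# against provider support SLAs. Steps, durations, and SLA values are
# invented; real ones come from your runbook and your contract.

from datetime import timedelta

# (step name, time your team needs, provider SLA the step waits on or None)
runbook = [
    ("detect and page on-call",    timedelta(minutes=10), None),
    ("isolate affected resources", timedelta(minutes=20), None),
    ("request provider forensics", timedelta(minutes=5),  timedelta(hours=4)),
]

total = timedelta()
for step, own_time, provider_sla in runbook:
    total += own_time + (provider_sla or timedelta())
    if provider_sla and provider_sla > own_time:
        print(f"DEPENDENCY: '{step}' is gated on a {provider_sla} provider SLA")
print(f"worst-case runbook duration: {total}")
```

Runbooks tend to record only the team's own timings; summing the provider-gated worst case is what exposes the 30-minute assumption sitting on top of a 4-hour SLA.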

The Questions I Ask Before Placing Critical Workloads

Over time, I’ve converged on a set of questions that cut through marketing and reveal the operational reality of a provider’s security posture:

Data residency and sovereignty. Where does data physically reside, and can you contractually guarantee it stays there? This matters for GDPR, for financial regulations with data localization requirements, and for any workload where a government access request in an unexpected jurisdiction would create legal exposure.

Key management architecture. Can you bring external keys? Does the provider’s KMS support automatic rotation? What happens to encrypted data if you revoke the key, and how quickly does revocation propagate? The answer to that last question reveals whether the provider’s encryption is a genuine security boundary or a compliance checkbox.

Audit log retention and immutability. How long are control plane logs retained by default, and can you extend retention? More importantly, can the provider modify or delete those logs, or are they append-only? Immutable audit logs are the foundation of any serious forensic investigation.

Deletion guarantees. When you delete data, what actually happens? Is it a logical delete with physical media erasure on a schedule, or can you trigger immediate cryptographic erasure? For regulated workloads, the gap between logical and physical deletion creates compliance risk that most teams discover during an audit.

Deletion guarantees reveal a great deal about a provider’s operational maturity, because solving data deletion at scale is architecturally hard.

Dependency mapping. Which third-party sub-processors handle your data? Cloud providers use sub-contractors for everything from support tooling to CDN edge nodes, and each one extends the trust boundary in ways the primary provider’s SOC 2 may not cover.
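The audit-log immutability question above has a standard technical answer worth recognizing in a provider's design: hash-chained, append-only records, where each entry commits to the previous entry's hash. A minimal sketch of the verification logic, with invented event names:

```python
# Sketch: verifying an append-only audit log via hash chaining. Each
# record commits to the previous record's hash, so modifying or deleting
# any earlier entry breaks the chain. Event names are illustrative.

import hashlib
import json

def chain_hash(prev_hash: str, record: dict) -> str:
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def verify(log: list[dict]) -> bool:
    prev = "0" * 64  # genesis value
    for entry in log:
        if entry["hash"] != chain_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True

log, prev = [], "0" * 64
for event in ["login", "key-rotation", "delete-bucket"]:
    record = {"event": event}
    prev = chain_hash(prev, record)
    log.append({"record": record, "hash": prev})

print(verify(log))                  # intact chain verifies
log[1]["record"]["event"] = "noop"  # tamper with the middle entry
print(verify(log))                  # chain now fails verification
```

If a provider can silently rewrite history, no chain like this can exist on their side; asking how their logs would fail this kind of check is a revealing procurement question.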

The Crypto Custody Parallel

If this framework sounds familiar to anyone in the digital asset space, it should. Evaluating a crypto custodian for holding private keys uses identical reasoning. Who holds the keys, what’s the key ceremony process, what happens during a compromise, how do you verify that deletion is real, what does the insurance actually cover? I’ve written about this in the context of wallet security and key management, and the overlap is striking.

The custodian’s SOC 2 report has the same scope exclusion risks. Their “institutional-grade security” marketing has the same gap between claim and contract. The shared responsibility model, where the custodian secures the infrastructure whilst you manage access policies and operational procedures, maps directly onto the cloud provider relationship.

Trust, but Verify the Contract

Cloud providers are infrastructure partners, and trusting them is reasonable. Trusting their marketing without reading the contract, the SOC 2 report, the CUEC appendix, and the incident response SLA is how teams end up surprised during the worst possible moment.

The evaluation work is tedious, and it pays for itself the first time something goes wrong. Build the muscle: read the reports, ask the uncomfortable questions during procurement, and design your architecture so that provider-side failures don’t become your existential crises. The teams that treat provider security evaluation as a recurring discipline rather than a one-time checkbox are the ones that survive the incidents you can’t predict.