We adopted our first CSPM tool the way most teams do: after an audit finding made someone nervous enough to sign a purchase order. The pitch was compelling. One pane of glass across three cloud providers, continuous compliance checking, automated remediation for the most common misconfigurations. Within a quarter, compliance scores had climbed and the security team was presenting beautiful slide decks to the board. The dashboards looked excellent. The entire time, our IAM policies had holes wide enough to drive a breach through.

That experience reshaped how I think about cloud security tooling, and specifically about the gap between posture assessment and actual security.

What CSPM Gets Right

Credit where it’s earned: CSPM tools solve a real problem. Cloud environments drift. An engineer opens a port for debugging and forgets to close it. A storage bucket policy gets modified during an incident and never reverted. Compliance frameworks demand evidence that you’re checking for these things, and manually auditing hundreds of resources across multiple accounts is a losing game.

Configuration drift detection catches the easy wins: publicly accessible databases, unencrypted volumes, security groups that allow 0.0.0.0/0 ingress. These are real risks, and catching them automatically before they become audit findings has genuine value. Multi-cloud visibility matters too, because most organizations of any size run workloads in at least two providers, and each provider’s native tooling only sees its own estate.
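To make the "easy win" concrete, here is a minimal sketch of the open-ingress check. It assumes the input is a list of security groups shaped like the EC2 DescribeSecurityGroups response (GroupId, IpPermissions, IpRanges, Ipv6Ranges); the function name open_ingress_rules is my own, and a real scanner would also weigh which ports are exposed.

```python
from typing import Iterable, Optional

# CIDR blocks that mean "anyone on the internet".
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ingress_rules(
    security_groups: Iterable[dict],
) -> list[tuple[str, Optional[int], Optional[int]]]:
    """Return (group_id, from_port, to_port) for each world-open ingress rule."""
    findings = []
    for group in security_groups:
        for rule in group.get("IpPermissions", []):
            cidrs = {r.get("CidrIp") for r in rule.get("IpRanges", [])}
            cidrs |= {r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])}
            if cidrs & OPEN_CIDRS:
                # FromPort/ToPort are absent when the rule covers all protocols.
                findings.append(
                    (group["GroupId"], rule.get("FromPort"), rule.get("ToPort"))
                )
    return findings
```

A check like this is a few lines of logic, which is exactly why it is cheap to automate and why it says so little about the harder problems.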

The compliance baseline checks are useful for frameworks like SOC 2, PCI-DSS, and CIS benchmarks. Having automated evidence collection saves weeks of audit preparation. I still value that capability, and I wouldn’t run a cloud environment without some form of configuration scanning.

The trap is mistaking configuration compliance for security, and I fell into it for longer than I’d like to admit.

The Coverage Illusion

What the dashboard doesn’t show: a green compliance score and a vulnerable environment can coexist comfortably. CSPM tools check infrastructure configuration against known-good patterns, which means they catch deviations from baseline, misconfigurations that match documented vulnerability classes, and policy violations that someone thought to write a rule for.

What they don’t catch is the more dangerous class of problems: IAM policies that grant excessive permissions through subtle combinations, service account keys that have been rotated into new accounts but never revoked from old ones, cross-account role assumptions that create transitive trust chains no single team fully understands. The CSPM dashboard shows green checks for “MFA enabled on root account” and “CloudTrail logging active” whilst an overprivileged Lambda execution role can read every secret in your Secrets Manager.

A compliance dashboard full of green checks is a security artifact, a record of what you measured, with the things you failed to measure structurally absent.

The coverage illusion persists because the metrics look good. Stakeholders see improvement quarter over quarter. Security reviews focus on the checklist items that have automated evidence. Meanwhile, the actual attack surface, the IAM layer, gets reviewed annually at best, usually by someone who doesn’t have the full cross-account picture.

IAM as the Real Attack Surface

A disproportionate share of cloud breaches follow a consistent pattern: an attacker obtains valid credentials, then uses those credentials to move laterally through overprivileged access paths. The initial access vector varies: leaked keys in a public repository, a compromised CI/CD pipeline, a phishing attack against a developer with console access. What stays constant is the exploitation of excessive IAM permissions.

The IAM problems that recur in audits fall into predictable categories. Wildcard policies are the most obvious: "Action": "*" on "Resource": "*" shows up in production environments far more often than it should, usually inherited from a development environment policy that was copy-pasted during a deadline. Long-lived access keys are nearly as bad, particularly when they belong to service accounts that multiple teams share and no single team owns.
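The wildcard case is the one piece of this that is easy to check mechanically. The sketch below assumes the policy document has already been parsed from JSON into a dict; wildcard_statements is a hypothetical helper name, and it deliberately catches only the literal "*" on "*" pattern, so service-level wildcards such as "iam:*" and NotAction constructs pass through unflagged.

```python
def _as_list(value):
    """IAM allows a single value or a list in most fields; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_statements(policy: dict) -> list[dict]:
    """Return Allow statements that grant every action on every resource."""
    hits = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        if "*" in actions and "*" in resources:
            hits.append(stmt)
    return hits
```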

Cross-account role chaining creates the most subtle risk. Organization A trusts Organization B’s role, which trusts a third role in a development account, which has a policy attached that was last reviewed eighteen months ago. Each link in that chain made sense when it was created. The composite trust path creates an exposure that exists in the gaps between teams, between accounts, between the quarterly access reviews that only look at one account at a time.
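One way to see the composite path is to treat trust as a directed graph and walk it. This is a sketch under a strong assumption: that you have already flattened trust policies and sts:AssumeRole permissions from every account into a single mapping of "principal can assume these roles". Building that mapping is the hard part, and the names here (trust_edges, reachable_roles) are illustrative.

```python
from collections import deque


def reachable_roles(trust_edges: dict[str, set[str]], start: str) -> dict[str, list[str]]:
    """Breadth-first walk over 'can assume' edges.

    Returns each role reachable from `start`, mapped to the shortest
    chain of assumptions that gets there.
    """
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in sorted(trust_edges.get(current, ())):
            if nxt not in paths:  # also guards against trust cycles
                paths[nxt] = paths[current] + [nxt]
                queue.append(nxt)
    del paths[start]
    return paths
```

Any path longer than two hops is a candidate for the kind of exposure that no single-account review will surface.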

Modern CSPM platforms like Wiz, Orca, and Prisma Cloud have begun incorporating CIEM (Cloud Infrastructure Entitlement Management) capabilities that analyze IAM permissions, effective access paths, and unused entitlements. This convergence is a meaningful step, and teams evaluating CSPM tooling today should weight CIEM integration heavily in their selection criteria. The underlying challenge persists regardless of tooling improvements, though: the organizational habit of treating dashboard scores as security outcomes is a process problem, and bolting better IAM analysis onto an existing CSPM workflow changes what the dashboard can surface without necessarily changing how the organization responds to those findings.

Overprivileged service accounts deserve special attention. When a team needs a Lambda function to write to S3 and read from DynamoDB, the path of least resistance is attaching a managed policy that grants broader access than required. The function works, the ticket closes, and the excess permissions persist indefinitely because nothing breaks and the CSPM tool doesn’t flag managed policies as misconfigurations.
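For contrast, this is roughly what the scoped alternative looks like for that Lambda function. It is a sketch, not a recommendation for any particular workload: the function name is hypothetical, and I am assuming "read from DynamoDB" means GetItem and Query, which may not match what your code actually calls.

```python
def scoped_lambda_policy(bucket: str, table_arn: str) -> dict:
    """Build an identity policy limited to one bucket and one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "WriteObjects",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Object-level actions need the /* suffix on the bucket ARN.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "ReadTable",
                "Effect": "Allow",
                # Assumed read pattern; adjust to the calls the function makes.
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": table_arn,
            },
        ],
    }
```

The policy is a dozen lines longer than attaching a broad managed policy, and that difference is the whole reason the shortcut wins under deadline pressure.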

Policy-as-Code vs Runtime Reality

Terraform says one thing, the console says another, and the actual runtime permissions say a third. This gap between policy-as-code and enforcement is where I spent months building confidence in a system that was, in practice, divergent from its own documentation.

Infrastructure-as-code gives you a version-controlled, reviewable, auditable definition of what your environment should look like. That’s valuable, and I’d never advocate going back to click-ops provisioning. The problem emerges when teams treat the Terraform configuration and its state file as the source of truth for what exists. The configuration is an aspirational document, and the state file records only what Terraform saw the last time it ran; neither is guaranteed to reflect reality.

Console modifications happen during incidents, during debugging sessions, during that Friday afternoon when the deploy pipeline is broken and the fix needs to ship. Each manual change widens the gap between declared state and actual state. Terraform drift detection helps, but only if you run it frequently enough and actually remediate what it finds. Most teams run it weekly at best and treat the drift reports as informational rather than actionable.
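The comparison underneath drift detection is simple, which is part of why the reports are so easy to ignore. Here is a minimal sketch of the idea, assuming both the declared and the actual configuration of a resource have been flattened into plain dicts; diff_state is an illustrative name and has nothing to do with how Terraform computes its own plan.

```python
def diff_state(declared: dict, actual: dict) -> dict[str, tuple]:
    """Map each drifted attribute to (declared_value, actual_value).

    An attribute missing on one side shows up as None on that side.
    """
    drift = {}
    for key in sorted(declared.keys() | actual.keys()):
        want, have = declared.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift
```

In practice the same signal comes from running terraform plan with -detailed-exitcode on a schedule, where exit code 2 means the plan contains changes. What matters is that a non-empty result fails something or opens a ticket, rather than landing in a report nobody reads.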

The runtime enforcement gap goes deeper than configuration drift. A security group rule might match the Terraform definition perfectly whilst the application behind it has changed its network behavior in ways that make that rule insufficient. The infrastructure matches the code, the code matches the plan, and the plan no longer matches the threat model because the threat model was written when the application had half as many integration points.

Runtime Protection vs Posture Assessment

CSPM tells you what’s misconfigured. Cloud Workload Protection Platforms (CWPP) tell you what’s happening at runtime. Neither, in isolation, stops an attacker who has already obtained valid credentials and is operating within their granted permissions.

This distinction took me too long to internalize. Posture assessment is periodic, configuration-focused, and backward-looking: it tells you whether your environment matches a known-good state as of the last scan. Runtime protection is continuous, behavior-focused, and present-tense: it watches for anomalous activity, unexpected process execution, and suspicious API call patterns. You need both, and they answer fundamentally different questions.

The integration between these layers matters more than the individual tools. A CSPM finding about an overly permissive security group becomes urgent when the runtime layer shows active exploitation of that permissiveness. A runtime alert about unusual API calls becomes actionable when the posture layer can confirm which permissions made those calls possible. Disconnected tools generate noise; connected tools generate signal.
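The connection can start as nothing more than a join on resource identifier. The sketch below assumes both tools can export records carrying a shared resource_id; the field names (resource_id, rule, signal) are hypothetical and will differ between vendors.

```python
def correlate(posture_findings: list[dict], runtime_alerts: list[dict]) -> list[dict]:
    """Return posture findings whose resource also has runtime alerts."""
    alerts_by_resource: dict[str, list[dict]] = {}
    for alert in runtime_alerts:
        alerts_by_resource.setdefault(alert["resource_id"], []).append(alert)

    correlated = []
    for finding in posture_findings:
        alerts = alerts_by_resource.get(finding["resource_id"])
        if alerts:
            correlated.append(
                {
                    "resource_id": finding["resource_id"],
                    "finding": finding["rule"],
                    "alerts": [a["signal"] for a in alerts],
                }
            )
    return correlated
```

Findings that come out of this join are the ones worth paging someone about; the rest can wait for the backlog.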

What Actually Works

A few patterns have proven durable across the environments I’ve operated in.

Service Control Policies as guardrails. SCPs operate at the organization level and constrain what principals in any member account can do, regardless of the IAM policies within that account. Denying your own principals sts:AssumeRole into accounts outside the organization, preventing the creation of IAM users with console access in production, restricting which regions can run compute workloads: these are blunt instruments, and that’s their value. SCPs do not restrict external principals; inbound trust from other organizations is still governed by each role’s trust policy. What they do establish are boundaries that can’t be circumvented by a well-meaning engineer with admin access to a single account.
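As an illustration of how blunt these can be, here is a sketch that builds a region guardrail. The function name is hypothetical, and the two denied actions are an assumption about what "compute" means in a given estate; a production SCP would cover more services. The aws:RequestedRegion condition key is the standard way to express the region test.

```python
import json


def region_guardrail_scp(allowed_regions: list[str]) -> dict:
    """Build an SCP that denies launching compute outside the allowed regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyComputeOutsideAllowedRegions",
                "Effect": "Deny",
                # Assumed scope: EC2 instances and Lambda functions only.
                "Action": ["ec2:RunInstances", "lambda:CreateFunction"],
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": list(allowed_regions)}
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(region_guardrail_scp(["eu-west-1", "us-east-1"]), indent=2))
```

Because the statement is a Deny, no IAM policy inside a member account can override it.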

Permission boundaries on every operator and service identity. Permission boundaries cap the maximum permissions an IAM entity can have, regardless of what policies are attached. Combining permission boundaries with least-privilege policies creates a defense-in-depth approach to IAM that survives the inevitable policy drift.
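The mental model is an intersection: an action is allowed only if both the attached policies and the boundary allow it. The sketch below models just that intersection over action names, with shell-style matching standing in for IAM wildcards. It ignores resources, conditions, explicit denies, and SCPs, so treat it as an illustration of the rule rather than a policy evaluator; the function names are my own.

```python
from fnmatch import fnmatchcase


def _matches(action: str, patterns: list[str]) -> bool:
    """True if the action matches any pattern; IAM action names are case-insensitive."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def effectively_allowed(
    action: str, identity_actions: list[str], boundary_actions: list[str]
) -> bool:
    """An action is effective only when both the identity policy and the boundary allow it."""
    return _matches(action, identity_actions) and _matches(action, boundary_actions)
```

With a boundary of ["s3:*", "dynamodb:GetItem"], even an identity policy of ["*"] cannot reach iam:CreateUser, which is the drift-survival property described above.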

Credential rotation automation that doesn’t rely on manual intervention. If rotating a service account key requires someone to run a script, it won’t happen on schedule. Build credential rotation into the deployment pipeline so that long-lived keys become short-lived by default.
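A pipeline gate can be as small as the check below. It assumes key metadata shaped like the IAM ListAccessKeys response (AccessKeyId, Status, CreateDate as a timezone-aware datetime); stale_keys is a hypothetical name, and the age threshold is whatever your rotation policy says.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_keys(
    keys: list[dict], max_age: timedelta, now: Optional[datetime] = None
) -> list[str]:
    """Return IDs of active access keys older than max_age."""
    now = now or datetime.now(timezone.utc)
    return [
        key["AccessKeyId"]
        for key in keys
        if key.get("Status") == "Active" and now - key["CreateDate"] > max_age
    ]
```

Failing the deploy when this list is non-empty turns rotation from a calendar reminder into a precondition for shipping.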

CloudTrail analysis as a continuous security practice. CloudTrail tells you what actually happened, as opposed to what your policies say should be possible. Regular analysis of API call patterns, especially AssumeRole chains, iam:* operations, and cross-account activity, surfaces the gaps between intended and actual access patterns.
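A starting point for that analysis is pulling out every AssumeRole call that crosses an account boundary. The sketch assumes CloudTrail records already parsed from JSON and relies on the eventSource, eventName, userIdentity.accountId, and requestParameters.roleArn fields; the function name is illustrative, and calls made by AWS services, which carry no caller account ID, are skipped.

```python
def cross_account_assumptions(records: list[dict]) -> list[tuple[str, str]]:
    """Return (caller_account, role_arn) for AssumeRole calls into another account."""
    hits = []
    for rec in records:
        if rec.get("eventSource") != "sts.amazonaws.com":
            continue
        if rec.get("eventName") != "AssumeRole":
            continue
        caller = (rec.get("userIdentity") or {}).get("accountId")
        role_arn = (rec.get("requestParameters") or {}).get("roleArn", "")
        # Role ARN layout: arn:aws:iam::<account-id>:role/<name>
        parts = role_arn.split(":")
        target = parts[4] if len(parts) > 5 else None
        if caller and target and caller != target:
            hits.append((caller, role_arn))
    return hits
```

Feeding these pairs into a graph of observed assumptions shows which trust paths are actually exercised, as opposed to merely permitted.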

The security controls that survive organizational change are the ones that enforce themselves without depending on the team that created them remembering to maintain them.

Building a Security Posture That Outlasts You

The hardest part of cloud security architecture has little to do with tooling. Tools change, providers add capabilities, threat landscapes evolve. What persists is the organizational structure around security decisions.

A security posture that depends on a specific team’s vigilance will degrade the moment that team’s attention shifts, which happens every time there’s a reorg, a leadership change, or a sufficiently exciting new project. The controls that endure are the ones that enforce themselves: SCPs that prevent entire classes of misconfiguration, permission boundaries that cap privilege escalation, automated rotation that removes manual scheduling from the equation.

Documentation helps, but only when it’s coupled to enforcement. A security policy document that says “all service accounts must follow least privilege” is worth less than an SCP that denies attaching the AdministratorAccess managed policy in production in the first place. The SCP enforces what the document can only describe. When those two things diverge, enforcement wins, and building your security architecture around enforceable controls rather than documented intentions produces outcomes that hold up after the team’s attention moves elsewhere.