Terraform Plans That Fight Back: Catching Security Drift Before Apply

Alprina Security Team

August 20, 2024

Hook: The Terraform Plan That Quietly Opened SSH

Your team runs Terraform via GitHub Actions. A developer adds a bastion host and accidentally sets cidr_blocks = ["0.0.0.0/0"] on port 22. The plan output buries that change under 800 lines of drift, no one notices, and production exposes SSH to the internet for three days until a penetration test spots it. The developer assumed CI would fail dangerous plans, but your pipeline only runs terraform plan and posts the output as a comment.

This article shows how to wire security checks directly into Terraform workflows: policy-as-code, targeted diff scanners, unit tests for modules, and drift detection that developers run locally. We focus on tools you can own: Open Policy Agent (OPA), Checkov, Terratest, and custom scripts.

The Problem Deep Dive

Common gaps:

Human plan reviews. Developers skim plan output and miss security-sensitive diffs.
Late detection. Drift detection runs weekly; misconfigurations go live between scans.
Module regressions. Shared modules lack tests, so changes introduce insecure defaults.
No policy gating. Terraform apply succeeds even when resources violate org policies.

Typical pipeline:

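steps:
  - run: terraform init
  - run: terraform plan -out plan.tfplan
  - run: terraform show -no-color plan.tfplan > plan.txt
  - uses: actions/github-script@v6
    with:
      script: |
        const fs = require("fs");
        await github.rest.issues.createComment({
          owner: context.repo.owner,
          repo: context.repo.repo,
          issue_number: context.issue.number,
          body: fs.readFileSync("plan.txt", "utf8"),
        });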

No policy check, no plan parsing.

Technical Solutions

Quick Patch: Static Analysis with Checkov or tfsec

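Add checkov -d . or tfsec . to CI. Both scan the Terraform source directly, without needing a plan, and catch obvious misconfigurations such as an ingress rule open to 0.0.0.0/0. With soft_fail: false, a failed check fails the job instead of only being reported: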

- name: Checkov
  uses: bridgecrewio/checkov-action@v12
  with:
    directory: terraform
    soft_fail: false

Durable Fix: Policy-as-Code with OPA or Sentinel

Write OPA policies that evaluate plan JSON:

package policies.security

import rego.v1

# Flag ingress rules that open port 22 (directly or inside a port range) to the internet.
deny contains msg if {
  rc := input.resource_changes[_]
  rc.type == "aws_security_group_rule"
  rule := rc.change.after
  rule.type == "ingress"
  rule.cidr_blocks[_] == "0.0.0.0/0"
  rule.from_port <= 22
  rule.to_port >= 22
  msg := sprintf("SSH open to world: %s", [rc.address])
}

Pipeline step:

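terraform plan -out plan.tfplan
terraform show -json plan.tfplan > plan.json
opa eval --data policies --input plan.json --fail-defined --format pretty "data.policies.security.deny[msg]"

With --fail-defined, opa exits non-zero whenever the query returns a result, so a single deny message fails the build. Query deny[msg] rather than the bare deny set: the set is defined even when it is empty, so --fail-defined on it would fail every run. This policy covers standalone aws_security_group_rule resources only; inline ingress blocks on aws_security_group and the aws_vpc_security_group_ingress_rule resource need rules of their own.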
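For teams that prefer a small custom script, the same check can be written as a targeted diff scanner over the plan JSON. The sketch below is a hypothetical Go program (the name sshcheck and its one-argument interface are illustrative, not an existing tool) that reads the output of terraform show -json and exits non-zero when a planned aws_security_group_rule opens port 22 to 0.0.0.0/0. Like the Rego policy, it inspects standalone rule resources only and skips attributes Terraform cannot know until apply.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// planJSON models only the part of `terraform show -json` output this check needs.
type planJSON struct {
	ResourceChanges []struct {
		Address string `json:"address"`
		Type    string `json:"type"`
		Change  struct {
			After map[string]any `json:"after"`
		} `json:"change"`
	} `json:"resource_changes"`
}

// openSSHRules returns the addresses of aws_security_group_rule ingress rules
// whose planned state allows 0.0.0.0/0 on a port range that includes 22.
func openSSHRules(raw []byte) ([]string, error) {
	var plan planJSON
	if err := json.Unmarshal(raw, &plan); err != nil {
		return nil, err
	}
	var hits []string
	for _, rc := range plan.ResourceChanges {
		after := rc.Change.After // nil for deletions
		if rc.Type != "aws_security_group_rule" || after == nil || after["type"] != "ingress" {
			continue
		}
		// JSON numbers decode to float64 when the target is map[string]any.
		from, okFrom := after["from_port"].(float64)
		to, okTo := after["to_port"].(float64)
		if !okFrom || !okTo || from > 22 || to < 22 {
			continue
		}
		// Unknown-until-apply values are absent from "after", so this is nil then.
		cidrs, _ := after["cidr_blocks"].([]any)
		for _, c := range cidrs {
			if c == "0.0.0.0/0" {
				hits = append(hits, rc.Address)
				break
			}
		}
	}
	return hits, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: sshcheck plan.json")
		return
	}
	raw, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	hits, err := openSSHRules(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, addr := range hits {
		fmt.Println("SSH open to world:", addr)
	}
	if len(hits) > 0 {
		os.Exit(1) // fail the CI step
	}
}
```

Run it as a CI step right after terraform show -json, for example sshcheck plan.json. The OPA policy remains the better home for org-wide rules; a script like this is mainly useful for a one-off check or while a team is still adopting OPA.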

Module Unit Tests with Terratest

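Test that modules do not introduce insecure defaults. This test applies the bastion module with a private CIDR and asserts that the security group it creates does not also allow 0.0.0.0/0 (it assumes the module exposes an sg_cidr_blocks output):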

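package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/require"
)

// Applies the bastion module with a private CIDR and checks that the module
// does not add a world-open CIDR on its own.
func TestSecurityGroupDefaults(t *testing.T) {
	t.Parallel()
	terraformOptions := &terraform.Options{
		TerraformDir: "../modules/bastion",
		Vars:         map[string]interface{}{"cidr_blocks": []string{"10.0.0.0/16"}},
	}
	// Destroy runs even if an assertion fails, so test resources are cleaned up.
	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)

	sg := terraform.OutputList(t, terraformOptions, "sg_cidr_blocks")
	require.NotContains(t, sg, "0.0.0.0/0")
}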

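Run Terratest in CI against LocalStack or against real cloud accounts dedicated to tests. Because the test performs a real apply and destroy, never point it at production credentials.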

Drift Detection

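Run terraform plan -detailed-exitcode nightly against production state. Exit code 0 means no changes, 1 means the plan failed, and 2 means the plan contains changes, either out-of-band drift or configuration that was merged but never applied. Convert the plan with terraform show -json, pipe it into the same policies, and alert on violations. Avoid auto-apply; treat drift detection as read-only.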
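A nightly drift job can be sketched in Go as below. It assumes Terraform is on the PATH and that the directory passed as the first argument is already initialised against production state; the program name driftcheck and the plan file name drift.tfplan are arbitrary. It relies on the -detailed-exitcode convention (0, 1, 2) and on the resource_drift array that recent Terraform versions include in plan JSON for changes made outside Terraform. It never applies anything.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// driftStatus interprets `terraform plan -detailed-exitcode`:
// 0 = no changes, 2 = changes present, anything else = error.
func driftStatus(exitCode int) (bool, error) {
	switch exitCode {
	case 0:
		return false, nil
	case 2:
		return true, nil
	default:
		return false, fmt.Errorf("terraform plan failed with exit code %d", exitCode)
	}
}

// driftedAddresses lists the addresses in the plan JSON "resource_drift" array,
// i.e. objects Terraform saw change outside of Terraform during refresh.
func driftedAddresses(raw []byte) ([]string, error) {
	var plan struct {
		ResourceDrift []struct {
			Address string `json:"address"`
		} `json:"resource_drift"`
	}
	if err := json.Unmarshal(raw, &plan); err != nil {
		return nil, err
	}
	addrs := make([]string, 0, len(plan.ResourceDrift))
	for _, d := range plan.ResourceDrift {
		addrs = append(addrs, d.Address)
	}
	return addrs, nil
}

func fail(err error) {
	fmt.Fprintln(os.Stderr, err)
	os.Exit(1)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: driftcheck <terraform-dir>")
		return
	}
	dir := os.Args[1]

	plan := exec.Command("terraform", "plan", "-detailed-exitcode", "-input=false", "-out=drift.tfplan")
	plan.Dir, plan.Stdout, plan.Stderr = dir, os.Stdout, os.Stderr
	code := 0
	if err := plan.Run(); err != nil {
		var exitErr *exec.ExitError
		if !errors.As(err, &exitErr) {
			fail(err) // terraform could not be started at all
		}
		code = exitErr.ExitCode()
	}
	changed, err := driftStatus(code)
	if err != nil {
		fail(err)
	}
	if !changed {
		fmt.Println("no drift")
		return
	}

	show := exec.Command("terraform", "show", "-json", "drift.tfplan")
	show.Dir = dir
	raw, err := show.Output()
	if err != nil {
		fail(err)
	}
	addrs, err := driftedAddresses(raw)
	if err != nil {
		fail(err)
	}
	if len(addrs) == 0 {
		fmt.Println("plan has changes, none recorded as drift (likely unapplied config)")
	}
	for _, a := range addrs {
		fmt.Println("drifted:", a)
	}
	// Exit 2 so the scheduler can alert; the same JSON can also go through OPA.
	os.Exit(2)
}
```

Exiting with 2 mirrors Terraform's own convention, so a scheduled workflow can treat "changes found" differently from "job broken".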

Developer Tooling

Provide pre-commit hooks running terraform fmt, checkov, and opa eval.
Add a make plan-secure target that bundles the plan and policy checks.
Document how to update policies when exceptions are legitimate.
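A Makefile target would usually just list the commands; the sketch below expresses the same plan-secure sequence as a small Go runner so the order can be unit-tested. It assumes terraform, checkov and opa are on the PATH and that policies live in a policies directory, matching the opa eval command used earlier. The step type and function names are hypothetical. By default it only prints the commands; pass run to execute them.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// step is one command in the hypothetical plan-secure sequence. When
// stdoutFile is set, the command's standard output is written to that file.
type step struct {
	args       []string
	stdoutFile string
}

// planSecureSteps lists, in order, what a `make plan-secure` target could run.
func planSecureSteps(policyDir string) []step {
	return []step{
		{args: []string{"terraform", "fmt", "-check", "-recursive"}},
		{args: []string{"checkov", "-d", "."}},
		{args: []string{"terraform", "init", "-input=false"}},
		{args: []string{"terraform", "plan", "-input=false", "-out=plan.tfplan"}},
		{args: []string{"terraform", "show", "-json", "plan.tfplan"}, stdoutFile: "plan.json"},
		{args: []string{"opa", "eval", "--data", policyDir, "--input", "plan.json",
			"--fail-defined", "--format", "pretty", "data.policies.security.deny[msg]"}},
	}
}

// run executes one step, streaming output unless it is redirected to a file.
func run(s step) error {
	cmd := exec.Command(s.args[0], s.args[1:]...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if s.stdoutFile != "" {
		f, err := os.Create(s.stdoutFile)
		if err != nil {
			return err
		}
		defer f.Close()
		cmd.Stdout = f
	}
	return cmd.Run()
}

func main() {
	steps := planSecureSteps("policies")
	if len(os.Args) < 2 || os.Args[1] != "run" {
		// Dry run: print the sequence without executing it.
		for _, s := range steps {
			fmt.Println(strings.Join(s.args, " "))
		}
		return
	}
	for _, s := range steps {
		if err := run(s); err != nil {
			fmt.Fprintf(os.Stderr, "plan-secure failed at %q: %v\n", strings.Join(s.args, " "), err)
			os.Exit(1)
		}
	}
}
```

The runner stops at the first failing step, so a formatting or Checkov failure surfaces before anyone waits on a full plan.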

Approval Workflow

Surface policy results on pull requests as labels: for example, allow apply only when the policy:pass label is present. Use GitHub Checks to display violations inline on the pull request.
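The label gate can be sketched as a small Go check that an apply job runs first. It reads the pull_request event payload that GitHub Actions exposes through GITHUB_EVENT_PATH and looks for the policy:pass label; the function names are hypothetical, and only the name field of each label is decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// prEvent captures the label names from a GitHub pull_request event payload.
type prEvent struct {
	PullRequest struct {
		Labels []struct {
			Name string `json:"name"`
		} `json:"labels"`
	} `json:"pull_request"`
}

// applyAllowed reports whether the pull request carries the required label.
func applyAllowed(payload []byte, required string) (bool, error) {
	var ev prEvent
	if err := json.Unmarshal(payload, &ev); err != nil {
		return false, err
	}
	for _, l := range ev.PullRequest.Labels {
		if l.Name == required {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	path := os.Getenv("GITHUB_EVENT_PATH")
	if path == "" {
		fmt.Println("GITHUB_EVENT_PATH is not set; run this inside GitHub Actions")
		return
	}
	payload, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	ok, err := applyAllowed(payload, "policy:pass")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if !ok {
		fmt.Fprintln(os.Stderr, "apply blocked: policy:pass label missing")
		os.Exit(1)
	}
	fmt.Println("policy:pass present; apply allowed")
}
```

Anyone with triage rights can add a label by hand, so in this design the policy job itself should add and remove policy:pass, and the apply job should treat the label as a convenience signal rather than the only control.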

Alprina Integration

Alprina can ingest plan JSON and enforce centralized policies. Developers get consistent feedback locally and in CI.

Testing & Verification

Unit-test policies with OPA:

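package policies.security

import rego.v1

test_allow_private_ssh if {
  plan := {"resource_changes": [{"address": "aws_security_group_rule.ssh", "type": "aws_security_group_rule",
    "change": {"after": {"type": "ingress", "from_port": 22, "to_port": 22, "cidr_blocks": ["10.0.0.0/16"]}}}]}
  count(deny) == 0 with input as plan
}

Save the test next to the policy (for example policies/security_test.rego) and run opa test policies.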

Run integration tests in ephemeral accounts using Terraform Cloud workspaces or Terragrunt with --terragrunt-iam-role to assume test roles.
Validate the pipeline by intentionally creating a plan that opens port 22, and confirm that CI fails.
Track metrics: number of policy failures per week, time to remediate, drift events.

Common Questions & Edge Cases

What about intentional internet access? Allow exceptions via annotations in code (allow_internet = true). The policy reads the flag and requires a written justification alongside it.
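The article does not say where the allow_internet annotation lives. This Go sketch assumes, hypothetically, that it is expressed as resource tags (allow_internet = "true" plus a non-empty justification tag) on resources that support tags, such as aws_security_group, and that a scanner consults it before reporting a world-open rule. The function name exceptionFor is illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// exceptionFor inspects a resource's planned attributes ("after" in plan JSON)
// under the hypothetical tag convention allow_internet = "true" plus a
// justification tag. It returns whether the exception applies and why.
func exceptionFor(after map[string]any) (bool, string) {
	tags, _ := after["tags"].(map[string]any) // nil if the resource has no tags
	if tags["allow_internet"] != "true" {
		return false, "no exception declared"
	}
	just, _ := tags["justification"].(string)
	if strings.TrimSpace(just) == "" {
		return false, "allow_internet set without justification"
	}
	return true, just
}

func main() {
	after := map[string]any{"tags": map[string]any{"allow_internet": "true"}}
	ok, reason := exceptionFor(after)
	fmt.Println(ok, reason) // false allow_internet set without justification
}
```

Requiring the justification in the same change means the reason for the exception is reviewed in the pull request together with the rule it unlocks.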

Do policies slow down developers? Keep feedback fast. Run OPA locally via pre-commit and provide sample violation output. Document remediation steps.

How do we manage policy sprawl? Version policies in Git. Use semantic commits. Add policy unit tests to prevent regressions.

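Can we rely on Terraform Cloud run tasks? Yes. Terraform Cloud can enforce OPA policies natively through policy sets, and run tasks let it send each plan to an external scanning service before apply, which keeps enforcement central rather than per pipeline.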

What about multi-cloud? Build provider-agnostic policies when possible. OPA can evaluate any plan JSON, including Azure or GCP resources.
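A provider-agnostic check can keep one table of which attribute carries the source range for each resource type. The Go sketch below covers aws_security_group_rule (cidr_blocks), google_compute_firewall (source_ranges) and azurerm_network_security_rule (source_address_prefix). Treating "*" and "Internet" as world-open for Azure is an assumption about how your teams write rules, and the sketch deliberately ignores direction, ports, the separate IPv6 attributes and azurerm's plural source_address_prefixes.

```go
package main

import "fmt"

// sourceAttr maps a resource type to the attribute holding its source addresses.
var sourceAttr = map[string]string{
	"aws_security_group_rule":       "cidr_blocks",
	"google_compute_firewall":       "source_ranges",
	"azurerm_network_security_rule": "source_address_prefix",
}

// worldValues are source values treated as "anyone on the internet".
var worldValues = map[string]bool{"0.0.0.0/0": true, "::/0": true, "*": true, "Internet": true}

// opensToWorld reports whether planned attributes (decoded from plan JSON)
// expose a supported resource to any source address.
func opensToWorld(resourceType string, after map[string]any) bool {
	attr, ok := sourceAttr[resourceType]
	if !ok {
		return false // unsupported type: not judged here
	}
	switch v := after[attr].(type) {
	case string: // single-value attributes such as Azure's source_address_prefix
		return worldValues[v]
	case []any: // list attributes decoded from JSON arrays
		for _, item := range v {
			if s, ok := item.(string); ok && worldValues[s] {
				return true
			}
		}
	}
	return false
}

func main() {
	rule := map[string]any{"source_ranges": []any{"0.0.0.0/0"}}
	fmt.Println(opensToWorld("google_compute_firewall", rule)) // true
}
```

Adding a provider means adding a row to the table rather than a new policy, which is the main benefit of keeping the provider-specific knowledge in data.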

Conclusion

Terraform becomes safer when policy checks are first-class citizens in your workflow. Automate plan evaluation, test modules like code, and catch drift before apply. Your team keeps shipping infrastructure quickly without trading away security.

Alprina Blog