Terraform Plans That Fight Back: Catching Security Drift Before Apply



Hook: The Terraform Plan That Quietly Opened SSH
Your team runs Terraform via GitHub Actions. A developer adds a bastion host and accidentally sets cidr_blocks = ["0.0.0.0/0"] on port 22. The plan output buries that change under 800 lines of drift, no one notices, and production exposes SSH to the internet for three days until a penetration test spots it. The developer assumed CI would fail dangerous plans, but your pipeline only runs terraform plan and posts the output as a comment.
This article shows how to wire security checks directly into Terraform workflows: policy-as-code, targeted diff scanners, unit tests for modules, and drift detection that developers run locally. We focus on tools you can own: Open Policy Agent (OPA), Checkov, Terratest, and custom scripts.
The Problem Deep Dive
Common gaps:
- Human plan reviews. Developers skim plan output and miss security-sensitive diffs.
- Late detection. Drift detection runs weekly; misconfigurations go live between scans.
- Module regressions. Shared modules lack tests, so changes introduce insecure defaults.
- No policy gating. terraform apply succeeds even when resources violate org policies.
Typical pipeline:
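steps:
  - run: terraform init
  - run: terraform plan -out plan.tfplan
  # Render the saved plan as text so the next step has a file to read.
  - run: terraform show -no-color plan.tfplan > plan.txt
  - uses: actions/github-script@v6
    with:
      script: |
        const fs = require("fs");
        core.setOutput("plan", fs.readFileSync("plan.txt", "utf8"));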
No policy check, no plan parsing.
Technical Solutions
Quick Patch: Static Analysis with Checkov or tfsec
Add checkov -d . or tfsec . to CI. They catch obvious misconfigurations.
- name: Checkov
  uses: bridgecrewio/checkov-action@v12
  with:
    directory: terraform
    soft_fail: false
Durable Fix: Policy-as-Code with OPA or Sentinel
Write OPA policies that evaluate plan JSON:
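package policies.security

import rego.v1

# Deny any security group rule whose port range includes 22 and whose source is the whole internet.
deny contains msg if {
    # Bind one resource so that type, change, and address all refer to the same entry.
    rc := input.resource_changes[_]
    rc.type == "aws_security_group_rule"
    rc.change.after.cidr_blocks[_] == "0.0.0.0/0"
    rc.change.after.from_port <= 22
    rc.change.after.to_port >= 22
    msg := sprintf("SSH open to world: %s", [rc.address])
}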
Pipeline step:
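terraform plan -out plan.tfplan
terraform show -json plan.tfplan > plan.json
opa eval --data policies --input plan.json --fail-defined "data.policies.security.deny[msg]"
With --fail-defined, opa eval exits non-zero when the query returns a result, so any deny message fails the build. Query deny[msg] rather than deny: the deny set itself is always defined, even when it is empty, and would fail every run.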
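A policy gate answers "is this plan allowed?", but reviewers still face the full plan output. The following is a minimal Go sketch of a targeted diff scanner that lists only the changes worth a human look. It assumes the plan JSON produced by terraform show -json; the sensitiveTypes watch list and the function name sensitiveChanges are hypothetical, and the sample plan is inlined so the program runs without any files.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plan mirrors only the fields of `terraform show -json` output that the scanner needs.
type plan struct {
	ResourceChanges []struct {
		Address string `json:"address"`
		Type    string `json:"type"`
		Change  struct {
			Actions []string `json:"actions"`
		} `json:"change"`
	} `json:"resource_changes"`
}

// sensitiveTypes is a hypothetical watch list; extend it to match your own estate.
var sensitiveTypes = map[string]bool{
	"aws_security_group":      true,
	"aws_security_group_rule": true,
	"aws_iam_policy":          true,
}

// sensitiveChanges returns one line per watched resource that the plan would change.
func sensitiveChanges(planJSON []byte) ([]string, error) {
	var p plan
	if err := json.Unmarshal(planJSON, &p); err != nil {
		return nil, fmt.Errorf("parse plan JSON: %w", err)
	}
	var out []string
	for _, rc := range p.ResourceChanges {
		actions := rc.Change.Actions
		noOp := len(actions) == 1 && actions[0] == "no-op"
		if !sensitiveTypes[rc.Type] || noOp {
			continue // not on the watch list, or nothing would change
		}
		out = append(out, fmt.Sprintf("%s %v", rc.Address, actions))
	}
	return out, nil
}

func main() {
	// Inline sample standing in for the contents of plan.json.
	sample := []byte(`{"resource_changes": [
		{"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket", "change": {"actions": ["update"]}},
		{"address": "aws_security_group_rule.bastion_ssh", "type": "aws_security_group_rule", "change": {"actions": ["create"]}}
	]}`)
	findings, err := sensitiveChanges(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range findings {
		fmt.Println("REVIEW:", f)
	}
}
```

In CI, the same function could read plan.json from disk and post its short list as the pull request comment, so the security-relevant lines are no longer buried in the full plan.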
Module Unit Tests with Terratest
Test modules for secure defaults:
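package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/require"
)

// TestSecurityGroupDefaults applies the bastion module and checks that SSH is not open to the world.
func TestSecurityGroupDefaults(t *testing.T) {
	t.Parallel()
	terraformOptions := &terraform.Options{
		TerraformDir: "../modules/bastion",
		Vars:         map[string]interface{}{"cidr_blocks": []string{"10.0.0.0/16"}},
	}
	// Destroy is deferred so test resources are removed even if an assertion fails.
	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)
	sg := terraform.OutputList(t, terraformOptions, "sg_cidr_blocks")
	require.NotContains(t, sg, "0.0.0.0/0")
}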
Run Terratest in CI against LocalStack or against real cloud accounts dedicated to tests.
Drift Detection
Run terraform plan nightly against production state. Convert the saved plan to JSON with terraform show -json, evaluate it against the same policies, and alert on violations. Avoid auto-apply; treat drift detection as read-only.
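As a rough illustration of the read-only reporting step, the Go sketch below lists every resource that a nightly plan would change. It assumes the plan was exported with terraform show -json and that nothing in the code changed since the last apply, so any planned action points at drift. The names driftPlan and driftReport are hypothetical, and the sample JSON is inlined.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// driftPlan mirrors the subset of `terraform show -json` output used for the report.
type driftPlan struct {
	ResourceChanges []struct {
		Address string `json:"address"`
		Change  struct {
			Actions []string `json:"actions"`
		} `json:"change"`
	} `json:"resource_changes"`
}

// driftReport lists every resource the plan would change, sorted by address.
// It only reads the plan JSON and never touches state or infrastructure.
func driftReport(planJSON []byte) ([]string, error) {
	var p driftPlan
	if err := json.Unmarshal(planJSON, &p); err != nil {
		return nil, fmt.Errorf("parse plan JSON: %w", err)
	}
	var drifted []string
	for _, rc := range p.ResourceChanges {
		actions := strings.Join(rc.Change.Actions, "+")
		if actions == "" || actions == "no-op" || actions == "read" {
			continue // nothing would be modified for this resource
		}
		drifted = append(drifted, rc.Address+" ("+actions+")")
	}
	sort.Strings(drifted)
	return drifted, nil
}

func main() {
	// Inline sample standing in for the nightly plan.json.
	sample := []byte(`{"resource_changes": [
		{"address": "aws_instance.bastion", "change": {"actions": ["no-op"]}},
		{"address": "aws_security_group_rule.bastion_ssh", "change": {"actions": ["update"]}}
	]}`)
	drifted, err := driftReport(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if len(drifted) == 0 {
		fmt.Println("no drift detected")
		return
	}
	fmt.Printf("ALERT: %d resource(s) drifted\n", len(drifted))
	for _, d := range drifted {
		fmt.Println(" -", d)
	}
}
```

For a cheaper first signal, terraform plan -detailed-exitcode exits with code 2 when the plan contains changes, which lets the nightly job skip the JSON step entirely on clean runs.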
Developer Tooling
- Provide pre-commit hooks running terraform fmt, checkov, and opa eval.
- Add make plan-secure target bundling plan and policy checks.
- Document how to update policies when exceptions are legitimate.
Approval Workflow
Integrate policy results with pull request labels. Example: only allow apply when policy:pass label exists. Use GitHub Checks to display violations inline.
Alprina Integration
Alprina can ingest plan JSON and enforce centralized policies. Developers get consistent feedback locally and in CI.
Testing & Verification
- Unit-test policies with OPA:
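package policies.security
import rego.v1

test_allow_private_ssh if {
    # Private-range SSH rule; the fixture is inlined because Rego has no load_json builtin.
    plan := {"resource_changes": [{"address": "aws_security_group_rule.ssh", "type": "aws_security_group_rule", "change": {"after": {"cidr_blocks": ["10.0.0.0/16"], "from_port": 22, "to_port": 22}}}]}
    count(deny) == 0 with input as plan
}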
- Run integration tests in ephemeral accounts using Terraform Cloud workspaces or Terragrunt with --terragrunt-iam-role to assume test roles.
- Validate pipeline by intentionally creating a plan that opens port 22; ensure CI fails.
- Track metrics: number of policy failures per week, time to remediate, drift events.
Common Questions & Edge Cases
What about intentional internet access? Allow exceptions via an explicit flag in code (allow_internet = true). The policy reads the flag and still fails the plan unless a justification accompanies it.
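One way to wire up such an exception, sketched in Go under stated assumptions: the flag is a Terraform input variable named allow_internet, and the justification lives in a second variable, allow_internet_justification, which is a hypothetical name chosen for this example. Both are read from the variables section of the plan JSON. The function exceptionGranted is likewise illustrative.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// planVars mirrors the "variables" section of `terraform show -json` output.
type planVars struct {
	Variables map[string]struct {
		Value interface{} `json:"value"`
	} `json:"variables"`
}

// exceptionGranted reports whether a world-open finding may be waived.
// A set flag without a justification is an error, not a silent pass.
func exceptionGranted(planJSON []byte) (bool, error) {
	var p planVars
	if err := json.Unmarshal(planJSON, &p); err != nil {
		return false, fmt.Errorf("parse plan JSON: %w", err)
	}
	// A missing or non-boolean flag is treated as "no exception requested".
	allow, _ := p.Variables["allow_internet"].Value.(bool)
	if !allow {
		return false, nil
	}
	// allow_internet_justification is a hypothetical variable name.
	why, _ := p.Variables["allow_internet_justification"].Value.(string)
	if strings.TrimSpace(why) == "" {
		return false, errors.New("allow_internet = true requires allow_internet_justification")
	}
	return true, nil
}

func main() {
	// Inline sample: the flag is set but nobody explained why.
	sample := []byte(`{"variables": {"allow_internet": {"value": true}}}`)
	ok, err := exceptionGranted(sample)
	fmt.Println("granted:", ok)
	if err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Returning an error for a bare flag keeps the exception path auditable: the justification text ends up in version control next to the flag that needs it.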
Do policies slow down developers? Keep feedback fast. Run OPA locally via pre-commit and provide sample violation output. Document remediation steps.
How do we manage policy sprawl? Version policies in Git. Use semantic commits. Add policy unit tests to prevent regressions.
Can we rely on Terraform Cloud run tasks? Yes, integrate OPA or Checkov as run tasks to enforce policies centrally.
What about multi-cloud? Build provider-agnostic policies when possible. OPA can evaluate any plan JSON, including Azure or GCP resources.
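The provider-agnostic idea can be sketched as a small normalization layer: map each provider's resource type to the attribute that holds its allowed sources, then apply one shared check. The Go example below is an assumption-laden illustration, not a complete rule. It covers only three resource types, inspects only the source address attribute, and ignores ports, direction, and allow/deny semantics. The names sourceFields, worldOpen, and openToWorld are hypothetical.

```go
package main

import "fmt"

// sourceFields maps a resource type to the attribute holding its allowed sources.
// The selection of types is illustrative; add entries for the resources you use.
var sourceFields = map[string]string{
	"aws_security_group_rule":       "cidr_blocks",
	"azurerm_network_security_rule": "source_address_prefix",
	"google_compute_firewall":       "source_ranges",
}

// worldOpen lists source values treated as "anywhere" in this sketch.
var worldOpen = map[string]bool{"0.0.0.0/0": true, "*": true, "Internet": true}

// openToWorld reports whether the planned attributes (change.after in plan JSON)
// allow traffic from any source address. Unknown resource types return false.
func openToWorld(resourceType string, after map[string]interface{}) bool {
	field, ok := sourceFields[resourceType]
	if !ok {
		return false
	}
	switch v := after[field].(type) {
	case string: // single-value attributes
		return worldOpen[v]
	case []interface{}: // list attributes as decoded by encoding/json
		for _, item := range v {
			if s, ok := item.(string); ok && worldOpen[s] {
				return true
			}
		}
	}
	return false
}

func main() {
	aws := map[string]interface{}{"cidr_blocks": []interface{}{"0.0.0.0/0"}}
	azure := map[string]interface{}{"source_address_prefix": "10.1.0.0/16"}
	fmt.Println("aws rule open:", openToWorld("aws_security_group_rule", aws))
	fmt.Println("azure rule open:", openToWorld("azurerm_network_security_rule", azure))
}
```

The same mapping table could be expressed as data for an OPA policy, which keeps the per-provider knowledge in one place while the deny rule stays generic.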
Conclusion
Terraform becomes safer when policy checks are first-class citizens in your workflow. Automate plan evaluation, test modules like code, and catch drift before apply. Your team keeps shipping infrastructure quickly without trading away security.