Secure your AI stack with Alprina. Request access or email hello@alprina.com.

Alprina Blog

Safe Rollbacks: Securing SQL Migrations Before They Torch Production

Cover Image for Safe Rollbacks: Securing SQL Migrations Before They Torch Production
Alprina Security Team
Alprina Security Team

Hook: The Migration That Dropped a Column Faster Than You Could Say Rollback

A Rails migration removes users.phone_number after product deprecates SMS login. The migration runs fine in staging, but in production a legacy report still references the column. Queries start failing, dashboards go blank, and the on-call engineer attempts to rollback the migration. Unfortunately the down migration is empty. The only way back is restoring from backup and replaying hours of writes. The outage becomes a security incident when MFA fallback numbers vanish, locking users out.

Developers often treat migrations as simple schema changes, but in regulated systems they control access, retention, and auditing. This article covers how to build safe migrations: reversible scripts, shadow tables, feature flags, and automated tests that run before deploy. We will use examples in Rails, Django, and plain SQL.

The Problem Deep Dive

Common pain points:

  • Irreversible migrations. Removing columns or data without backups.
  • Mixed DDL and DML. Scripts mutate data while changing schema; rollbacks are complex.
  • Missing gating. Migrations deploy automatically without validation.
  • No drift detection. Manual hotfixes diverge schema between environments.

Example anti-pattern:

class DropPhoneNumber < ActiveRecord::Migration[7.0]
  def change
    remove_column :users, :phone_number, :string
  end
end

change cannot derive a rollback; down is blank.

Technical Solutions

Quick Patch: Always Provide up/down

Write explicit up/down methods:

class DropPhoneNumber < ActiveRecord::Migration[7.0]
  def up
    add_column :users, :phone_number_backup, :string
    execute "INSERT INTO users_backup(id, phone_number) SELECT id, phone_number FROM users"
    remove_column :users, :phone_number
  end

  def down
    add_column :users, :phone_number, :string
    execute "UPDATE users SET phone_number = b.phone_number FROM users_backup b WHERE users.id = b.id"
    drop_table :users_backup
  end
end

Even if you never roll back, documenting the path is essential.

Durable Fix: Two-Phase Migrations with Feature Flags

  1. Add new column or table.
  2. Backfill data in batches (idempotent script).
  3. Switch application code via feature flag.
  4. Remove old column only after validation.

SQL example (Postgres):

ALTER TABLE users ADD COLUMN phone_numbers jsonb DEFAULT '[]'::jsonb;

-- backfill script in application or offline job

-- After feature flag enabled
ALTER TABLE users DROP COLUMN phone_number;

Use ALTER TABLE ... SET NOT NULL only after verifying no nulls remain.

Migration Testing

  • Run migrations in CI against prod snapshots (masking sensitive data).
  • Use tools like strong_migrations (Rails) or django-migration-linter to warn about dangerous operations.
  • Write smoke tests verifying views, stored procedures, and downstream ETL still function.

Versioned Schema Contracts

  • Maintain schema dumps (structure.sql) per environment.
  • Compare dumps in CI to detect unauthorized changes.
  • Use tools like skeema or atlas to manage schema diffs.

Observability

  • Instrument migrations with duration, lock wait time, and rows affected metrics.
  • Emit events when migrations run, including author and ticket ID.

Alprina Policies

Scan migration files for remove_column without backup. Flag change methods removing columns. Ensure down migrations exist and are tested.

Testing & Verification

  • Run rails db:migrate and rails db:rollback STEP=1 in CI to ensure reversibility.
  • For Django, python manage.py migrate --plan and --database per environment.
  • Use pg_dump --schema-only before and after migration to compare.
  • Add integration tests hitting endpoints that depend on the modified schema.

Common Questions & Edge Cases

Do we have to keep backup tables forever? No. Keep them until confident nothing depends on old schema, then drop in later migration.

How to handle large tables? Use phased migrations (add column, backfill in batches, swap). Avoid locking statements like ALTER TABLE ... TYPE on large tables without downtime planning.

What about zero-downtime deployments? Coordinate app rollout with migrations. Use additive changes first, then deploy app, then clean up.

Can we use tools like Liquibase or Flyway? Yes. They provide checksums and history tables, making drift obvious.

How to handle data privacy? Maintain data retention policies in migrations. Do not delete regulated data without legal sign-off.

Conclusion

Safe migrations require discipline: explicit rollbacks, phased rollouts, and comprehensive testing. Treat schema changes as code with guardrails, and the next time you drop a column, you will know exactly how to recover if something goes sideways.