Faster, smaller, and sanitized MySQL database dumps


mysql drupal

Developers often need copies of production data for local development, and standard workflows for getting these dumps have a lot of downsides.

  1. Moving SQL dumps around means an increased risk of sensitive data being exposed.
  2. Most sanitization methods are a multi-step process.
  3. Large databases put strain on slow internet connections.
  4. Large databases take way too long to import.

The Solution

mtk-dump is an extremely powerful replacement for mysqldump. Define your sanitization and minification rules in a simple yaml file, and produce small and safe SQL dumps for development. Check out some configuration examples for the cool stuff it can do.

mtk-dump \
  --config=mtk.yml \
  --host=mysql.svc \
  --port=3306 \
  --user=foo \
  --password=bar \
  app_environment > db.sql

You can get mtk-dump from the releases page, or with the skpr/mtk-dump docker image.

Examples

These examples could all live in a single mtk.yml file and be included with the --config=mtk.yml flag.

Sanitize account credentials

The rewrite block allows you to set values for specific columns. These can be explicit strings or mysql expressions.

rewrite:
  # Table: users
  users:
    # Column: email
    # MySQL expression to give each account a unique email in the format 123@localhost
    email: concat(id, "@localhost")
    # Column: pass
    # Explicit string to overwrite the hashed password.
    pass: '"sanitized"'

Exclude orders older than 1 week

Using the where block, you can exclude rows that don’t match a given mysql WHERE clause.

where:
  # Table: orders
  # WHERE clause to find rows with recent created timestamp.
  orders: |-
      created >= DATE_SUB(NOW(), INTERVAL 7 DAY)

Exclude data from specific tables

Use nodata to keep the table structure, but no data.

nodata:
  - queue
  - sessions

Exclude entire tables

Use ignore to exclude tables entirely.

ignore:
  - __ACQUIA_MONITORING__

Exclude body field data for unpublished revisions in Drupal

This example uses subqueries to find the revision IDs of the currently published drupal entities, and only export body data for those revisions.

where:
  # Table: node_revision__body
  node_revision__body: |-
    revision_id IN (SELECT vid FROM node)
  # Table: paragraph_revision__body
  paragraph_revision__body: |-
    revision_id IN (SELECT revision_id FROM paragraphs_item)

Sanitizing SQL dumps

If you already have a SQL file (such as a production backup) and want to sanitize it, you can use a cool little script included in the skpr/mtk-dump docker image to do this.

SQL_INPUT=db.sql
SQL_OUTPUT=sanitized.sql
MTK_DUMP_CONFIG=mtk.yml
docker run --rm \
  -v $(pwd):/data \
  -w /data \
  -e MTK_DUMP_CONFIG skpr/mtk-dump database-sanitize ${SQL_INPUT} ${SQL_OUTPUT}

More Thoughts

Replace mysqldump

I haven’t done this myself, but as mtk-dump takes the same connection flags, you could hypothetically overwrite the mysqldump binary on your system. mtk configs could then be toggled with the MTK_DUMP_CONFIG environment variable. This might be a way to get around issues like drush hard-coding the mysqldump binary. This might also be a bad idea.

Autogenerate configs

I’m interested in autogenerating mtk configs for known data hogs, like the body field in drupal etc… Maybe a composer plugin or something like that.

Credit

Shout-out to Nick Schuch and CodeDrop who built and maintain the mtk project.