Backup Verification

Don't discover backup failure during a restore

A backup that's never verified isn't a backup. It's an illusion of security. Too many companies discover their backups failed at the precise moment they need them - during a server crash, ransomware attack, or catastrophic human error.

The nightmare scenario is classic: the production server goes down. The technical team calmly initiates a restore from the latest backup. And then, surprise: the backup is 3 weeks old because the script had been silently failing all that time. Or worse: the file exists but it's corrupted.

Heartbeat monitoring for backups eliminates this risk. By actively verifying that each backup runs, succeeds, and produces a valid file, you can be confident your data is protected. And if something goes wrong, you know immediately.

Risks of Unmonitored Backups

Without proactive monitoring, many problems can go unnoticed:

  • Silent failure: The backup script crashes with an error, but nobody is notified. The backup simply doesn't exist.
  • Disk space saturated: The destination disk is full. The backup starts but can't write. No alert is generated.
  • Corrupted files: The backup writes but a disk or network error corrupts the data. The file exists, but it's unusable.
  • Modified script: Someone "temporarily" modified the backup script and forgot to restore it. The backup no longer does what it should.

What to Verify in a Backup

A reliable backup must be verified at multiple levels:

  • Execution: Did the backup process start and finish? This is the minimum - verifiable via heartbeat.
  • File existence: Does the backup file exist at the expected location? A backup that "succeeds" without creating a file is useless.
  • Consistent size: Is the file a reasonable size? A 10 MB backup when it should be 5 GB indicates a problem.
  • Integrity (checksum): Is the file not corrupted? An MD5 or SHA256 checksum can verify this.
  • Restorability: Can the backup actually be restored? Only a real restore test confirms this.

Types of Backups to Monitor

Each backup type has monitoring particularities:

Database Backups

MySQL, PostgreSQL, MongoDB - database dumps are critical. Verify the dump completed without error, the file isn't empty, and its size is consistent with data volume. For large databases, also monitor execution duration.

File Backups

Directory backups, configurations, media. Use tools like rsync or tar. Verify the number of files copied, total size, and absence of errors. Watch for permissions that may block certain files.

Cloud Backups (S3, GCS, Azure)

Cloud uploads can fail for multiple reasons: expired credentials, exceeded quota, network issues. Verify the file is present in the bucket and accessible.

Complete Monitored Backup Script

Here's an example bash script integrating all recommended verifications:

#!/bin/bash
# backup_mysql.sh - MySQL backup with complete monitoring

HEARTBEAT_URL="https://monitao.com/ping/YOUR_TOKEN"
BACKUP_DIR="/backups/mysql"
DB_NAME="production"
MIN_SIZE_MB=100  # Expected minimum size in MB

# Startup ping
curl -s "$HEARTBEAT_URL/start" > /dev/null

# Execute backup (pipefail: don't let gzip mask a mysqldump failure)
set -o pipefail
BACKUP_FILE="$BACKUP_DIR/${DB_NAME}_$(date +%Y%m%d_%H%M%S).sql.gz"
if ! mysqldump -u backup_user -p"$DB_PASSWORD" "$DB_NAME" 2>/tmp/backup_error.log | gzip > "$BACKUP_FILE"; then
    curl -s "$HEARTBEAT_URL/fail?error=dump_failed" > /dev/null
    exit 1
fi

# Verification 1: File exists
if [ ! -f "$BACKUP_FILE" ]; then
    curl -s "$HEARTBEAT_URL/fail?error=file_not_created" > /dev/null
    exit 1
fi

# Verification 2: Size is consistent (stat flags differ: -f%z on macOS/BSD, -c%s on Linux)
FILE_SIZE_MB=$(($(stat -f%z "$BACKUP_FILE" 2>/dev/null || stat -c%s "$BACKUP_FILE") / 1024 / 1024))
if [ "$FILE_SIZE_MB" -lt "$MIN_SIZE_MB" ]; then
    curl -s "$HEARTBEAT_URL/fail?error=file_too_small&size=${FILE_SIZE_MB}MB" > /dev/null
    exit 1
fi

# Verification 3: Gzip integrity test
if ! gzip -t "$BACKUP_FILE" 2>/dev/null; then
    curl -s "$HEARTBEAT_URL/fail?error=corrupted_gzip" > /dev/null
    exit 1
fi

# All good - success ping
curl -s "$HEARTBEAT_URL?size=${FILE_SIZE_MB}MB" > /dev/null
echo "Backup completed: $BACKUP_FILE (${FILE_SIZE_MB} MB)"

This script verifies each critical step: file creation, minimum size, compression integrity. The success ping is only sent if all verifications pass.

Step-by-Step Implementation

Here's how to set up monitoring for your backups:

  1. Identify critical backups
    List all your backups: databases, configuration files, user data, logs. Prioritize by criticality.
  2. Create heartbeats
    In MoniTao, create one heartbeat per backup with an explicit name (backup-mysql-prod, backup-files-media). Configure the timeout based on normal backup duration plus a margin.
  3. Integrate pings
    Modify your backup scripts to send a startup ping and an end ping (success or failure depending on the result).
  4. Add verifications
    Before sending the success ping, verify that the file exists, its size is consistent, and ideally its integrity.
  5. Configure alerts
    Enable email and SMS alerts for critical backups. A failed nightly backup should wake you up.

Alert Configuration

Good alerts notify you at the right time:

  • Backup not started: The backup was supposed to run at 3 AM but no startup ping was received. Possible cause: disabled cron, stopped server.
  • Backup too long: The backup started but hasn't completed after the configured timeout. Possible cause: locked database, network issue.
  • Backup failed: The script explicitly reported a failure via fail ping. The alert contains error details.
  • Invalid file: The backup completed but the file is too small or corrupted. The script must include this verification.

Common Mistakes to Avoid

These mistakes reduce your backup monitoring effectiveness:

  • Ping before verification: Sending the success ping before verifying the file exists and is valid. The backup may have "succeeded" without producing anything.
  • Ignoring size: Not checking that the file is a consistent size. A 0-byte backup or 1 MB instead of 10 GB goes unnoticed.
  • Timeout too short: Configuring the timeout based on average duration. The day the backup takes longer (more data than usual), you get a false alert.
  • Never testing restore: Having a verified backup but never testing that you can actually restore it. The backup may be structurally invalid.

Backup Monitoring Checklist

  • Inventory of all critical backups completed
  • Heartbeat created for each backup with appropriate timeout
  • Scripts modified to send start and end pings
  • Post-backup verifications (existence, size, integrity) implemented
  • Alerts configured with proper channels (email, SMS)
  • Restore test scheduled regularly

Frequently Asked Questions

How do I verify my backup isn't corrupted?

For compressed files (gzip, zip), use built-in verification tools (gzip -t, unzip -t). For SQL dumps, you can attempt a dry-run restore. For complete verification, calculate a checksum (sha256sum) and store it with the backup.

My backup takes 4 hours. How do I monitor it effectively?

Configure a 5-6h timeout (normal duration + 50%). Send a startup ping and an end ping. For very long backups, add progress pings (e.g., every hour) to detect blockages before the global timeout.
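Progress pings can be sent from a small background loop while the backup runs. A sketch, assuming the heartbeat URL from earlier; the `/progress` endpoint name and the backup script path are illustrative:

```shell
#!/bin/bash
# Sketch: hourly progress pings while a long backup runs, so a stall is
# visible before the global timeout. Endpoint and script path are illustrative.
HEARTBEAT_URL="https://monitao.com/ping/YOUR_TOKEN"

curl -s "$HEARTBEAT_URL/start" > /dev/null

# Background loop: one progress ping per hour until the backup finishes
( while true; do sleep 3600; curl -s "$HEARTBEAT_URL/progress" > /dev/null; done ) &
PINGER_PID=$!

/usr/local/bin/long_backup.sh
STATUS=$?

kill "$PINGER_PID" 2>/dev/null
if [ $STATUS -eq 0 ]; then
    curl -s "$HEARTBEAT_URL" > /dev/null
else
    curl -s "$HEARTBEAT_URL/fail?error=exit_$STATUS" > /dev/null
fi
```

Killing the pinger before sending the final ping keeps a stray progress ping from arriving after the success or failure report.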

What if the backup succeeds but the file is abnormally small?

Integrate size verification in your script. Define an acceptable minimum size based on history. If the file is below threshold, send a fail ping with "error=file_too_small&size=XXX".

Should I monitor cloud backups (S3, GCS, Azure)?

Absolutely! Cloud isn't magic. Credentials can expire, quotas can be reached, regions can have incidents. Verify the upload succeeded (return code from aws s3 cp or gsutil cp) and the file is present and accessible.

How do I monitor backups managed by third-party tools (cPanel, Plesk)?

These tools often generate logs or hooks. Create a cron script that checks for the latest backup presence, its size, and sends the ping accordingly. You can also use webhooks from these tools if available.

Should I monitor incremental backups differently from full backups?

Incremental backups produce files of variable size (sometimes very small if few changes). Adjust your thresholds or disable size verification for incrementals. The important thing is verifying the complete chain (full + incrementals) remains consistent.

Protect Your Critical Data

Your backups are your last line of defense against data loss. They deserve monitoring commensurate with their importance. An unverified backup is as dangerous as no backup - you think you're protected when you're not.

With MoniTao, you can set up complete backup monitoring in minutes. Execution, verification, alerts - everything is covered. Never again discover a backup failure at the moment you need it.

Ready to Sleep Soundly?

Start free, no credit card required.