Data Sync Monitoring
Ensure data consistency between systems
Data synchronization between systems is a cornerstone of modern IT infrastructure. CRM to ERP, production database to data warehouse, master server to replicas: these data flows are essential to business operations.
Yet synchronizations are often neglected when it comes to monitoring. We assume they work because no visible errors appear. But a silently failing sync can create data inconsistencies that propagate for days before being detected.
Heartbeat monitoring gives synchronizations the visibility they lack. Each sync reports its execution, result, and metrics. If a signal is missing or a metric is abnormal, an alert is triggered.
Risks of Unmonitored Synchronization
Without proactive monitoring, several problems can go unnoticed:
- Desynchronized data: The CRM shows a modified customer, but the ERP still has the old version. Sales and accounting work with different data.
- Decisions based on stale data: The management dashboard shows yesterday's figures because the sync to the data warehouse failed last night.
- Data loss: A sync that fails repeatedly can fall so far behind that it never catches up, and changes that were never transferred can end up permanently lost.
- Cascading problems: Sync A feeds sync B. If A fails, B works on incomplete data and propagates the error.
Concrete Use Cases
Sync monitoring applies to many scenarios:
Database Replication
Master → replica replication is critical for high availability and distributed reads. Monitor replication lag (how many seconds behind), pending transaction count, and replication errors. Growing lag may indicate a performance issue.
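As a rough sketch of the checks described above, the function below evaluates one status row from a replica. It assumes MySQL 8.0.22 or later, where `SHOW REPLICA STATUS` exposes the `Replica_IO_Running`, `Replica_SQL_Running`, and `Seconds_Behind_Source` columns; other databases expose lag differently. The function name and the sample rows are illustrative, and fetching the row (for example through PDO) is left out.

```php
<?php
declare(strict_types=1);

/**
 * Evaluate one row of MySQL `SHOW REPLICA STATUS` output (8.0.22+ column names).
 * Returns a list of problems; an empty list means the replica looks healthy.
 */
function evaluateReplicaStatus(array $row, int $maxLagSeconds): array
{
    $problems = [];

    if (($row['Replica_IO_Running'] ?? null) !== 'Yes') {
        $problems[] = 'I/O thread is not running';
    }
    if (($row['Replica_SQL_Running'] ?? null) !== 'Yes') {
        $problems[] = 'SQL thread is not running';
    }

    // Seconds_Behind_Source is NULL when replication is stopped or broken.
    $lag = $row['Seconds_Behind_Source'] ?? null;
    if ($lag === null) {
        $problems[] = 'lag is unknown';
    } elseif ((int) $lag > $maxLagSeconds) {
        $problems[] = sprintf('lag %ds exceeds %ds', (int) $lag, $maxLagSeconds);
    }

    return $problems;
}

// Made-up rows, shaped like the associative arrays a database driver returns.
$healthy = ['Replica_IO_Running' => 'Yes', 'Replica_SQL_Running' => 'Yes', 'Seconds_Behind_Source' => '12'];
$lagging = ['Replica_IO_Running' => 'Yes', 'Replica_SQL_Running' => 'Yes', 'Seconds_Behind_Source' => '900'];
$stopped = ['Replica_IO_Running' => 'Yes', 'Replica_SQL_Running' => 'No', 'Seconds_Behind_Source' => null];
```

An empty result can be reported as a normal ping, and a non-empty one as a fail ping with the problems attached.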
CRM ↔ ERP Synchronization
Customer data, orders, and invoices must be consistent between Salesforce, SAP, or your internal systems. Monitor the number of records synchronized, mapping errors, and processing time. API rate limits can slow the sync, so a rising duration is worth watching.
Export to Data Warehouse
BigQuery, Snowflake, Redshift: feeding the data warehouse is often critical for reports and dashboards. Verify that the export ran, that the row count matches what the source produced, and that dependent queries can execute.
Multi-Site E-commerce Sync
Stock, prices, catalog: this data must stay synchronized between the central platform and the various sales sites. An out-of-date price can lead to sales at a loss.
What to Monitor in a Synchronization
A reliable sync should be verified on several aspects:
- Execution: Did the sync run as expected? The heartbeat confirms the process started and finished.
- Volume processed: How many items were synchronized? An abnormally low volume may indicate a problem in source data.
- Errors: How many items failed? Even a small error percentage can be critical depending on the data involved.
- Lag/Delay: What's the delay between source and destination? Growing lag indicates the sync can't keep up.
- Consistency: Are source and destination data identical? Compare counts or checksums to verify.
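Several of the aspects above (volume, errors, lag) reduce to comparing one run's metrics with thresholds chosen per sync. The sketch below shows one possible shape for that comparison. The function name, array keys, and threshold values are assumptions made for illustration, not part of any particular monitoring API.

```php
<?php
declare(strict_types=1);

/**
 * Compare one run's metrics with per-sync thresholds.
 * Returns human-readable violations; an empty array means the run passed.
 */
function checkSyncRun(array $metrics, array $thresholds): array
{
    $violations = [];
    $total = $metrics['synced'] + $metrics['errors'];

    if ($metrics['synced'] < $thresholds['min_synced']) {
        $violations[] = 'volume below minimum';
    }

    // Guard against division by zero when nothing was processed.
    $errorRate = $total > 0 ? $metrics['errors'] / $total : 0.0;
    if ($errorRate > $thresholds['max_error_rate']) {
        $violations[] = 'error rate too high';
    }

    if ($metrics['lag'] > $thresholds['max_lag_seconds']) {
        $violations[] = 'lag too high';
    }

    return $violations;
}

// Illustrative thresholds: at least 100 items, at most 1% errors, at most 5 minutes of lag.
$thresholds = ['min_synced' => 100, 'max_error_rate' => 0.01, 'max_lag_seconds' => 300];

$good = checkSyncRun(['synced' => 1500, 'errors' => 3, 'lag' => 40], $thresholds);
$bad  = checkSyncRun(['synced' => 20, 'errors' => 5, 'lag' => 900], $thresholds);
```

Here `$good` is empty, while `$bad` lists all three violations, which could be passed along as ping parameters.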
Monitored Sync Script
Here's an example of monitoring integration in a sync script:
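    <?php
    // Assumed setup, not shown here: $crmApi and $erpApi are your own API clients,
    // $heartbeatUrl is the ping URL of this sync's heartbeat, and
    // $lastSyncTimestamp is loaded from wherever you persist it.
    $startTime = microtime(true);
    $synced = 0;
    $errors = 0;

    try {
        $changes = $crmApi->getChangesSince($lastSyncTimestamp);

        foreach ($changes as $record) {
            try {
                $erpApi->upsertCustomer($record);
                $synced++;
            } catch (ApiException $e) {
                // Log and count the failure, then continue with the next record
                logSyncError($record->id, $e->getMessage());
                $errors++;
            }
        }

        // Calculate lag from the last change; stays at 0 when there was nothing to sync
        $lag = 0;
        if (count($changes) > 0) {
            $lag = time() - strtotime($changes[count($changes) - 1]->updated_at);
        }

        // Send metrics
        $duration = round(microtime(true) - $startTime);
        $params = http_build_query([
            'synced' => $synced,
            'errors' => $errors,
            'lag' => $lag,
            'duration' => $duration
        ]);

        if ($errors > 0) {
            // At least one record failed: report the run as failed
            file_get_contents($heartbeatUrl . "/fail?" . $params);
        } else {
            file_get_contents($heartbeatUrl . "?" . $params);
        }
    } catch (Exception $e) {
        // The sync itself crashed: send a fail ping with the error message
        file_get_contents($heartbeatUrl . "/fail?error=" . urlencode($e->getMessage()));
    }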
This script captures essential metrics: sync count, errors, lag, duration. This data is sent with the ping for historical analysis.
Step-by-Step Implementation
Here's how to set up monitoring for your synchronizations:
1. Map your syncs
   List all your synchronizations: source, destination, frequency, criticality. Identify dependencies between syncs.
2. Define metrics
   For each sync, determine what constitutes success: minimum item count, acceptable error rate, maximum lag.
3. Create heartbeats
   In MoniTao, create one heartbeat per sync with an explicit name. Configure timeout based on normal duration + margin.
4. Instrument scripts
   Add startup and end pings with relevant metrics. Send a fail ping if thresholds are exceeded.
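The instrumentation step can be sketched as a small wrapper that surrounds any sync with a start ping and an end ping. This is only one way to structure it. The `runMonitoredSync` name and the `'start'`, `'success'`, and `'fail'` state labels are assumptions, and the `$ping` callable is where you would map each state to your actual heartbeat URLs. The demo uses a recording callable, so it makes no HTTP calls.

```php
<?php
declare(strict_types=1);

/**
 * Run a sync between a start ping and an end ping.
 * $sync returns an array of metrics; $ping receives a state and the metrics.
 */
function runMonitoredSync(callable $sync, callable $ping): bool
{
    $ping('start', []);

    try {
        $metrics = $sync();
    } catch (Throwable $e) {
        // The sync crashed: report the failure instead of staying silent.
        $ping('fail', ['error' => $e->getMessage()]);
        return false;
    }

    $state = ($metrics['errors'] ?? 0) > 0 ? 'fail' : 'success';
    $ping($state, $metrics);

    return $state === 'success';
}

// Demo with a recording ping instead of real HTTP calls.
$sent = [];
$record = function (string $state, array $metrics) use (&$sent): void {
    $sent[] = $state;
};

$ok = runMonitoredSync(fn () => ['synced' => 42, 'errors' => 0], $record);
$ko = runMonitoredSync(function () {
    throw new RuntimeException('CRM unreachable');
}, $record);
```

Keeping the ping as an injected callable makes the wrapper easy to test and keeps URL details in one place.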
Alert Configuration
Configure alerts tailored to each problem type:
- Sync not executed: No ping received within the expected time. The sync either never started (cron or server problem) or stopped before it could send its ping.
- Sync failed: Fail ping received. Check error details in ping parameters.
- Excessive lag: Delay between source and destination exceeds acceptable threshold. Sync can't keep up with the pace.
- Abnormal volume: Far fewer items synchronized than usual. Possible problem in source data.
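For the abnormal-volume case, "far fewer than usual" needs a baseline. One simple option, shown here as a sketch, is to compare the current run with the median of recent runs. The function names, the 50% ratio, and the minimum of three historical runs are arbitrary assumptions to tune for each sync.

```php
<?php
declare(strict_types=1);

/** Median of a non-empty list of numbers. */
function median(array $values): float
{
    sort($values);
    $n = count($values);
    $mid = intdiv($n, 2);

    return $n % 2 === 1
        ? (float) $values[$mid]
        : ($values[$mid - 1] + $values[$mid]) / 2.0;
}

/**
 * Flag a run whose volume is far below the median of recent runs.
 * $minRatio = 0.5 means "alert below 50% of the usual volume".
 */
function isVolumeAbnormal(array $recentVolumes, int $current, float $minRatio = 0.5): bool
{
    if (count($recentVolumes) < 3) {
        return false; // not enough history to judge
    }

    return $current < median($recentVolumes) * $minRatio;
}

// Made-up item counts from the last five runs.
$history = [1480, 1510, 1495, 1620, 1502];

$lowRun    = isVolumeAbnormal($history, 300);   // well under half the usual volume
$normalRun = isVolumeAbnormal($history, 1400);  // within the usual range
```

The median is used rather than the mean so that one unusually large run does not shift the baseline.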
Sync Monitoring Checklist
- Complete inventory of all synchronizations
- Success metrics defined for each sync
- Heartbeats created with appropriate timeouts
- Scripts instrumented with pings and metrics
- Alert thresholds calibrated (lag, errors, volume)
- Resync procedure documented in case of failure
Frequently Asked Questions
How do I detect growing replication lag?
Send lag as a metric with each ping. Configure an alert if lag exceeds a threshold (e.g., 5 minutes). Analyze history to detect upward trends.
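Trend detection can be as simple as fitting a straight line through recent lag values. The sketch below computes a least-squares slope over samples taken at regular intervals and combines it with the fixed threshold. The function names and the sample values are illustrative, and the slope limit is an assumption to calibrate against your own history.

```php
<?php
declare(strict_types=1);

/**
 * Least-squares slope of lag samples taken at regular intervals.
 * The result is "seconds of lag gained per run"; positive means growing.
 */
function lagSlope(array $lags): float
{
    $n = count($lags);
    if ($n < 2) {
        return 0.0;
    }

    $meanX = ($n - 1) / 2;
    $meanY = array_sum($lags) / $n;
    $num = 0.0;
    $den = 0.0;

    foreach (array_values($lags) as $x => $y) {
        $num += ($x - $meanX) * ($y - $meanY);
        $den += ($x - $meanX) ** 2;
    }

    return $num / $den;
}

/** Returns 'threshold', 'trend', or null when nothing needs attention. */
function lagAlert(array $lags, int $maxLag, float $maxSlope): ?string
{
    if ($lags !== [] && end($lags) > $maxLag) {
        return 'threshold';
    }

    return lagSlope($lags) > $maxSlope ? 'trend' : null;
}

// Made-up lag histories, in seconds, oldest first.
$growing = lagAlert([10, 20, 30, 40], 300, 5.0);
$spiking = lagAlert([10, 12, 9, 400], 300, 5.0);
$stable  = lagAlert([30, 31, 29, 30], 300, 5.0);
```

The trend check catches a sync that is slowly falling behind before it ever crosses the fixed threshold.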
My sync fails partially (some items OK, others not). How should I handle this?
Define an acceptable error threshold (e.g., 1%). Send error count with the ping. If threshold is exceeded, send a fail ping. Store failed item IDs for reprocessing.
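A minimal sketch of that approach: process every record, collect the IDs that failed, and decide afterwards whether the error rate justifies a fail ping. The function name, the result keys, and the fake `$push` callable are assumptions for illustration. In a real script `$push` would call your destination API, and `failed_ids` would be persisted for a later retry.

```php
<?php
declare(strict_types=1);

/**
 * Push each record, collecting the IDs that failed so they can be retried.
 * $push is any callable that throws when a record cannot be synchronized.
 */
function syncCollectingFailures(array $records, callable $push, float $maxErrorRate): array
{
    $failedIds = [];

    foreach ($records as $record) {
        try {
            $push($record);
        } catch (Exception $e) {
            $failedIds[] = $record['id'];
        }
    }

    $total = count($records);
    $errorRate = $total > 0 ? count($failedIds) / $total : 0.0;

    return [
        'synced'     => $total - count($failedIds),
        'errors'     => count($failedIds),
        'failed_ids' => $failedIds,                 // persist these for reprocessing
        'fail_ping'  => $errorRate > $maxErrorRate, // true when the threshold is exceeded
    ];
}

// Demo: a fake push that rejects records without an email address.
$records = [
    ['id' => 1, 'email' => 'a@example.com'],
    ['id' => 2, 'email' => ''],
    ['id' => 3, 'email' => 'c@example.com'],
    ['id' => 4, 'email' => 'd@example.com'],
];
$push = function (array $record): void {
    if ($record['email'] === '') {
        throw new InvalidArgumentException('missing email');
    }
};

$result = syncCollectingFailures($records, $push, 0.01);
```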
How do I verify data is truly consistent between source and destination?
After sync, compare aggregate metrics: COUNT(*), SUM(amount), MAX(updated_at). If values differ, there's a consistency problem. You can also calculate a checksum on a sample.
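The aggregate comparison could look like the sketch below, which takes the results of the same query run on both sides and reports which values differ. The function name, the `orders` table, the column aliases, and the sample figures are all made up for illustration. The optional tolerance is there because sums over decimal columns may differ by rounding between systems.

```php
<?php
declare(strict_types=1);

/**
 * Compare aggregate values fetched from the source and the destination.
 * Returns the names of the aggregates that differ.
 */
function diffAggregates(array $source, array $destination, float $tolerance = 0.0): array
{
    $mismatches = [];

    foreach ($source as $name => $expected) {
        $actual = $destination[$name] ?? null;

        if (is_numeric($expected) && is_numeric($actual)) {
            // Numeric aggregates are compared within the given tolerance.
            $same = abs((float) $expected - (float) $actual) <= $tolerance;
        } else {
            $same = $expected === $actual;
        }

        if (!$same) {
            $mismatches[] = $name;
        }
    }

    return $mismatches;
}

// The same query would be run on both systems (illustrative names):
//   SELECT COUNT(*) AS row_count, SUM(amount) AS total_amount,
//          MAX(updated_at) AS last_update FROM orders;
$source      = ['row_count' => 1200, 'total_amount' => 98450.50, 'last_update' => '2024-05-01 23:58:12'];
$destination = ['row_count' => 1198, 'total_amount' => 98320.00, 'last_update' => '2024-05-01 23:58:12'];

$mismatches = diffAggregates($source, $destination);
```

In this made-up case two rows are missing at the destination, so both the count and the sum are reported.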
Should I monitor real-time syncs (CDC, Change Data Capture)?
Yes, but the model is different. Instead of monitoring point-in-time executions, continuously monitor the lag and the pending event count. Alert if the lag exceeds a threshold or the queue grows abnormally.
How do I handle syncs that depend on each other?
Create one heartbeat per sync and document dependencies. If sync A must finish before sync B, configure B to verify A succeeded. Use schedulers that support dependencies (Airflow, Luigi).
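When no scheduler handles dependencies for you, the "B verifies A succeeded" check can be a small guard at the start of B. The sketch below assumes each sync records its last status and finish time somewhere B can read it. The function name, the array shape, the sync names, and the one-hour freshness limit are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * List the upstream syncs that prevent a sync from starting.
 * $lastRuns maps a sync name to ['status' => 'success'|'fail', 'finished_at' => unix timestamp].
 */
function blockingDependencies(array $dependsOn, array $lastRuns, int $now, int $maxAgeSeconds): array
{
    $blocking = [];

    foreach ($dependsOn as $name) {
        $run = $lastRuns[$name] ?? null;

        if ($run === null || $run['status'] !== 'success') {
            $blocking[] = $name; // never ran, or its last run failed
        } elseif ($now - $run['finished_at'] > $maxAgeSeconds) {
            $blocking[] = $name; // succeeded, but too long ago to trust
        }
    }

    return $blocking;
}

// Made-up state: one upstream sync succeeded 10 minutes ago, another failed.
$now = 1700000000;
$lastRuns = [
    'crm_to_erp'       => ['status' => 'success', 'finished_at' => $now - 600],
    'erp_to_warehouse' => ['status' => 'fail',    'finished_at' => $now - 300],
];

$readyToRun = blockingDependencies(['crm_to_erp'], $lastRuns, $now, 3600);
$blocked    = blockingDependencies(['crm_to_erp', 'erp_to_warehouse'], $lastRuns, $now, 3600);
```

If the returned list is not empty, B can skip its run and send a fail ping naming the upstream sync, which makes the root cause visible in the alert.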
My sync processes millions of records. How do I avoid overloading monitoring?
Send a single ping at sync end with aggregate metrics (total processed, errors, duration). Don't send one ping per record. For very long syncs (> 1h), add progress pings every 15-30 minutes.
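Progress pings can be rate-limited with a small helper that is called for every record but only sends when enough time has passed. This is a sketch under assumptions: the `ProgressPinger` class and its `tick` method are hypothetical names, the sender is an injected callable, and the clock is passed in explicitly so the demo can simulate a long run without waiting.

```php
<?php
declare(strict_types=1);

/**
 * Sends at most one progress ping per interval, however many records are processed.
 */
final class ProgressPinger
{
    /** @var callable */
    private $send;
    private int $intervalSeconds;
    private int $lastPingAt;

    public function __construct(callable $send, int $intervalSeconds, int $startedAt)
    {
        $this->send = $send;
        $this->intervalSeconds = $intervalSeconds;
        $this->lastPingAt = $startedAt;
    }

    /** Call once per record; returns true when a ping was actually sent. */
    public function tick(int $processed, int $now): bool
    {
        if ($now - $this->lastPingAt < $this->intervalSeconds) {
            return false;
        }

        ($this->send)(['processed' => $processed]);
        $this->lastPingAt = $now;

        return true;
    }
}

// Demo: 4000 records at one simulated second each, with a 15-minute interval.
$pings = [];
$pinger = new ProgressPinger(function (array $metrics) use (&$pings): void {
    $pings[] = $metrics['processed'];
}, 900, 0);

for ($i = 1; $i <= 4000; $i++) {
    $pinger->tick($i, $i); // in a real script, pass time() as the second argument
}
```

In the simulated run, 4000 records produce only four progress pings, at 900, 1800, 2700, and 3600 records.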
Guarantee Your Data Consistency
Data synchronizations are often the most critical and least monitored processes in an infrastructure. A silently failing sync can create inconsistencies impacting the entire business: wrong decisions based on stale data, unhappy customers, billing errors.
With MoniTao, you can set up complete sync monitoring: execution, metrics, alerts. You know in real time whether your data is consistent across all your systems.