Server Error Diagnosis and Resolution

Practical guides to quickly identify and resolve common problems.

When an alert arrives, every minute counts. The time between detecting an incident and resolving it determines the impact on your users and business. Quick, methodical diagnosis can make the difference between a micro-interruption and a catastrophic outage.

Server errors are often intimidating: cryptic messages, incomprehensible stack traces, logs scattered across multiple machines. Yet, with the right methodology, most problems can be identified and resolved in minutes.

These diagnostic guides were created by production engineers who have managed thousands of incidents. They follow a systematic approach: identify symptoms, isolate the cause, apply the solution, and prevent recurrence. Combine proactive monitoring with MoniTao and structured diagnosis to minimize the impact of every incident.

The Challenges of Incident Diagnosis

Effectively diagnosing a web incident presents several challenges:

  • Time pressure: Every minute of downtime costs money and reputation. The temptation is to "restart and hope" rather than understand. But without diagnosis, the problem will return.
  • Scattered information: Logs are spread across the web server, the application, the database, and external services. Correlating events between these sources requires method and tools.
  • Misleading symptoms: A 503 error can have dozens of different causes: overload, application crash, database saturation, network issue. The symptom doesn't directly indicate the cause.
  • Cascade effect: One problem often leads to others. A database slowdown causes timeouts, which exhaust the connection pool, which crashes the application. Identifying the root cause requires tracing the chain back to its origin.
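
As a minimal sketch of the correlation work described above, the following shell function pulls every line containing the same hour-and-minute fragment out of several logs at once. The function name `lines_at` and the example fragment are illustrative. The approach assumes that all logs print the time as HH:MM:SS in the same time zone, which is worth verifying on your own stack.

```shell
#!/usr/bin/env bash
# lines_at FRAGMENT FILE...
# Print the lines of each file that contain FRAGMENT (a literal string such
# as "13:55:"), prefixed with the file name so sources can be compared.
# Assumption: every log prints the time as HH:MM:SS in the same time zone.
lines_at() {
  local fragment="$1"
  shift
  # -H: always show the file name, -F: treat the fragment as a literal string
  grep -H -F -- "$fragment" "$@"
}

# Example (paths taken from this guide, adjust to your setup):
# lines_at "13:55:" /var/log/nginx/access.log /var/log/nginx/error.log /var/log/mysql/error.log
```

Because the match is purely textual, it also returns the same minute from other days if the files span more than one day, so narrow the fragment or the files accordingly.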

Common HTTP Errors

Here are the most frequent server errors and their typical causes:

  • 500 Internal Server Error: Internal error generated by the application. Code bug, unhandled exception, configuration problem. Check application logs.
  • 502 Bad Gateway: The reverse proxy (for example Nginx) received no valid response from the backend (PHP-FPM, Node). The backend service is probably down or saturated.
  • 503 Service Unavailable: Service temporarily unavailable. Worker overload, maintenance in progress, or rate limiting activated. Often transient.
  • 504 Gateway Timeout: The backend didn't respond within the allotted time. Slow request, saturated database, or external service not responding.
  • Connection Timeout: Not an HTTP status code: the connection couldn't be established in time, so no response arrived at all. Server down, DNS issue, blocking firewall, or saturated network.
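
The list above can be condensed into a small lookup that turns a status code into a first place to look. This is only a sketch: `first_check` is a hypothetical helper, the hints simply restate the list, and the `000` case relies on curl printing `000` as `%{http_code}` when no HTTP response was received.

```shell
#!/usr/bin/env bash
# first_check STATUS
# Map an HTTP status code to the first thing to inspect, following the
# list of common errors in this guide.
first_check() {
  case "$1" in
    500) echo "application logs (exceptions, configuration errors)" ;;
    502) echo "backend service status (is PHP-FPM or Node running?)" ;;
    503) echo "worker load, maintenance mode, rate limiting" ;;
    504) echo "slow requests, database load, external services" ;;
    000) echo "server reachability, DNS, firewall, network" ;;
    *)   echo "no specific hint for status $1" ;;
  esac
}

# Example with a placeholder URL (replace example.com with your own site):
# first_check "$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' https://example.com/)"
```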

Diagnostic Methodology

Follow this systematic approach for any incident:

  • 1. Qualify the incident: Is the error global or limited? All users or some? All pages or one? This initial assessment guides the diagnosis.
  • 2. Check the logs: Web server (access.log, error.log), application, database. Look for errors matching the incident timestamp. Correlate between sources.
  • 3. Check the metrics: CPU, memory, disk, network, database connections. A load spike that coincides with the incident often points toward the cause.
  • 4. Test hypotheses: Based on symptoms and data, formulate hypotheses and test them. Change one thing at a time and measure the effect before moving on.
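
To qualify the incident (step 1), it can help to see whether 5xx responses cluster on one page or one client. The sketch below assumes the default Nginx "combined" access log format, where field 1 is the client address, field 7 the request path, and field 9 the status code. `count_5xx_by` is a hypothetical helper name, and a custom log format would need different field numbers.

```shell
#!/usr/bin/env bash
# count_5xx_by FILE FIELD
# Count 5xx responses grouped by the given whitespace-separated field,
# most affected value first.
# Assumption: Nginx "combined" format (field 9 = status code).
count_5xx_by() {
  local file="$1" field="$2"
  awk -v f="$field" '
    $9 ~ /^5[0-9][0-9]$/ { count[$f]++ }
    END { for (k in count) print count[k], k }
  ' "$file" | sort -rn
}

# Examples (path taken from this guide):
# count_5xx_by /var/log/nginx/access.log 7   # errors per request path
# count_5xx_by /var/log/nginx/access.log 1   # errors per client address
```

If one path or one client dominates the output, the incident is probably limited in scope. An even spread suggests a global problem.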

Essential Diagnostic Commands

Here are the most useful commands for diagnosing an incident:

# Check recent web server errors
tail -n 100 /var/log/nginx/error.log

# Search for 5xx errors in access logs
grep ' 5[0-9][0-9] ' /var/log/nginx/access.log | tail -n 50

# Check PHP-FPM service status and recent errors
# (unit and log names vary by distribution, e.g. php8.2-fpm on Debian/Ubuntu)
systemctl status php-fpm
tail -n 50 /var/log/php-fpm/error.log

# Check system resources: processes, memory, disk space
top -bn1 | head -n 20
free -h
df -h

# Check network connections: listening sockets, then the number of established TCP connections
ss -tuln
ss -Htan state established | wc -l

# Database logs (MySQL)
tail -n 50 /var/log/mysql/error.log

These commands quickly assess the state of the web server, application, and system resources. Start with error logs, then expand your investigation based on clues found.

Incident Diagnostic Checklist

  • Qualify the incident: scope, affected users, affected pages
  • Check web server logs (Nginx/Apache)
  • Check application logs (errors, exceptions)
  • Check system metrics (CPU, RAM, disk)
  • Check service status (PHP-FPM, database)
  • Document diagnosis and resolution for post-mortem

Frequently Asked Questions

Where should I start when diagnosing a server error?

Start by qualifying the incident (scope, impact), then check the web server error logs. They usually contain the first clues: error code, timestamp, and often a descriptive message.

What are the most useful diagnostic tools?

Server logs (access.log, error.log) are your first source. Then: top/htop for resources, ss/netstat for network, and specific logs for each service (PHP-FPM, MySQL, etc.).

How can I reduce incident resolution time?

Prepare runbooks documenting diagnosis and resolution for common errors. Train the team to use them. Automate what can be automated (diagnostic scripts, enriched alerts).

Can MoniTao help with diagnosis?

MoniTao detects the problem and provides initial context: error code, response time, exact timestamp. Deep diagnosis requires access to server logs, which MoniTao cannot view.

How do I tell an application problem from an infrastructure problem?

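First check system metrics (CPU, memory). If they're normal, the problem is probably application-level. If they're saturated, suspect infrastructure or capacity first, keeping in mind that an application bug (memory leak, runaway loop) can also exhaust resources. Application logs then give more details.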
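
As a rough illustration of this first check, the sketch below compares a load average with the number of CPU cores. The threshold (load greater than core count) and the function name `load_state` are assumptions for illustration, not a universal definition of saturation, and memory would need a similar check of its own.

```shell
#!/usr/bin/env bash
# load_state LOAD CORES
# Print "saturated" when the load average exceeds the core count,
# otherwise "normal".
# Assumption: load above the core count is treated as CPU saturation.
load_state() {
  awk -v avg="$1" -v cores="$2" 'BEGIN {
    if (avg > cores) print "saturated"; else print "normal"
  }'
}

# On Linux, feed it live values (1-minute load average and core count):
# load_state "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```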

Should I restart before diagnosing?

No. Restarting can erase information that is valuable for diagnosis. First capture logs, metrics, and system state. Restart only after collecting that data, or sooner if urgency requires it.
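
One way to make that capture fast is a small script that saves the output of the commands from this guide into a timestamped directory. This is a sketch under assumptions: the helper names `save_output` and `capture_state`, the default `/tmp/incident-...` location, and the choice of files are all illustrative, and the Nginx path is the one used earlier in this guide.

```shell
#!/usr/bin/env bash
# save_output FILE COMMAND...
# Run COMMAND and store stdout and stderr in FILE; note failures instead of aborting.
save_output() {
  local out="$1"
  shift
  "$@" > "$out" 2>&1 || echo "(command failed: $*)" >> "$out"
}

# capture_state [DIR]
# Snapshot system state and recent logs into DIR, then print DIR.
capture_state() {
  local dir="${1:-/tmp/incident-$(date +%Y%m%d-%H%M%S)}"
  mkdir -p "$dir" || return 1
  save_output "$dir/top.txt"         top -bn1
  save_output "$dir/free.txt"        free -h
  save_output "$dir/df.txt"          df -h
  save_output "$dir/sockets.txt"     ss -tan
  save_output "$dir/nginx-error.txt" tail -n 500 /var/log/nginx/error.log
  echo "$dir"
}

# Example: capture_state    # prints the directory holding the snapshot
```

A command that is missing or fails leaves a note in its output file rather than stopping the capture, so the snapshot still completes under pressure.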

Conclusion

Effective diagnosis relies on structured methodology and good knowledge of your tech stack. Minutes invested in understanding a problem are always worthwhile: they prevent recurrence and enrich team documentation.

MoniTao helps you detect incidents quickly and provides initial context. Combined with these diagnostic guides, you have all the tools to minimize the impact of every incident on your users.
