Kubernetes CronJob Monitoring

Effectively monitor your scheduled jobs in Kubernetes

Kubernetes CronJobs run Jobs, and therefore pods, on a defined schedule, much like classic Unix cron. In modern cloud-native environments, they are used for maintenance tasks, backups, synchronization, and batch processing. Monitoring them is critical: in a busy cluster with hundreds of pods, a failed CronJob easily goes unnoticed.

The complexity of Kubernetes introduces many potential failure points: an image that cannot be pulled, insufficient resources to schedule the pod, a container stuck in CrashLoopBackOff, an exceeded deadline, or simply a CronJob that was accidentally suspended. These problems are often silent: Kubernetes records them as events and statuses, but it has no native alerting for CronJobs.

MoniTao complements the Kubernetes ecosystem by adding an external heartbeat monitoring layer. By adding a simple curl call to your containers, you are alerted whenever a CronJob fails to report in on time, regardless of the underlying cause.

Essential Kubernetes concepts

Understanding K8s CronJob architecture is essential for configuring effective monitoring.

  • CronJob: resource that defines the schedule (cron format) and the Job template to create. The CronJob itself only orchestrates periodic Job creation.
  • Job: execution instance created by the CronJob at each trigger. A Job can create one or more Pods and ensures they complete successfully.
  • Pod: execution unit containing one or more containers. This is where your script actually runs.
  • Completion: state indicating a Job completed successfully (exit code 0). This is the signal Kubernetes uses to mark a Job as successful.

Kubernetes-specific challenges

Kubernetes CronJobs present unique challenges compared to traditional crons.

  • CrashLoopBackOff: the container restarts in a loop after repeated failures. The Job may appear active but never progresses.
  • Image not found: the pod reports ImagePullBackOff if the registry is unreachable or the image doesn't exist. It stays stuck until the image can be pulled.
  • Indefinite pending: the cluster lacks the resources to schedule the pod. The pod can stay in Pending indefinitely without triggering any alert.
  • Concurrency policy: with concurrencyPolicy: Forbid, a new Job isn't created if the previous one is still running. Skipped runs can hide the fact that a job takes longer than its schedule interval.
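
Several of these failure modes can be bounded with standard fields of the CronJob and Job specs. The fragment below is a sketch rather than a recommendation: the field names belong to the batch/v1 API, but every value is an illustrative assumption to tune for your workload.

```yaml
# Fragment of a CronJob spec; all numeric values are illustrative assumptions.
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid        # skip a run while the previous Job is still active
  startingDeadlineSeconds: 600     # treat a run as missed if it cannot start within 10 minutes
  jobTemplate:
    spec:
      backoffLimit: 3              # mark the Job as failed after 3 retries
      activeDeadlineSeconds: 3600  # fail the Job if it has not finished after 1 hour
```

With a deadline in place, a pod stuck in Pending or ImagePullBackOff should eventually surface as a failed Job instead of waiting forever. The missing heartbeat covers the window before that happens.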

Monitoring strategies

Several approaches allow integrating heartbeat monitoring into your Kubernetes CronJobs.

  • Curl at script end: add curl as the last command in the container, chained with &&. Simple and effective: the ping is only sent if every preceding step succeeded.
  • Init container: for complex cases, an init container can send a start ping before main execution.
  • Sidecar pattern: a sidecar container can monitor the main container's state and send pings accordingly.
  • Custom operator: for advanced environments, a Kubernetes Operator can automate monitoring of all CronJobs.
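
As a rough illustration of the init container approach, the fragment below sends a start signal before the main container runs. It rests on several assumptions: YOUR_START_PING_URL is a placeholder for whatever start URL your monitoring setup provides (if it provides one), and curlimages/curl stands in for any small image that ships curl and a shell.

```yaml
# Pod template fragment (goes under jobTemplate.spec.template.spec).
# YOUR_START_PING_URL is a placeholder, not a documented endpoint.
initContainers:
- name: heartbeat-start
  image: curlimages/curl:latest   # assumption: any image with curl and sh works
  command:
  - /bin/sh
  - -c
  - |
    # "|| true" keeps a monitoring outage from blocking the real job
    curl -fsS --max-time 10 "YOUR_START_PING_URL" || true
containers:
- name: backup
  image: my-backup-image:latest
  command:
  - /bin/sh
  - -c
  - |
    /scripts/backup.sh && \
    curl -fsS --max-time 10 "https://api.monitao.com/ping/YOUR_TOKEN"
```

An init container must exit successfully before the main container starts, which is why the start ping is allowed to fail quietly here. The completion ping at the end remains the signal that the work actually succeeded.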

CronJob example with heartbeat

Here's an example Kubernetes CronJob manifest with integrated heartbeat monitoring:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-daily
spec:
  schedule: "0 2 * * *"  # 2:00 AM daily
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: my-backup-image:latest
            command:
            - /bin/sh
            - -c
            - |
              /scripts/backup.sh && \
              curl -fsS "https://api.monitao.com/ping/YOUR_TOKEN"
            resources:
              requests:
                memory: "256Mi"
                cpu: "100m"
              limits:
                memory: "512Mi"
                cpu: "500m"

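The && ensures curl only runs if the backup succeeds. With restartPolicy: OnFailure, Kubernetes restarts the container when it fails (up to the Job's backoffLimit, 6 by default), and MoniTao alerts you if no ping arrives within the expected window. Note that a failed curl also makes the container exit with a non-zero code, so the backup would be retried.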
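
One possible refinement is to keep the ping URL out of the manifest by reading it from a Secret, and to give curl a timeout and a few retries so a brief network glitch does not trigger a full backup rerun. This is a sketch under stated assumptions: the Secret name monitao-heartbeat, its key ping-url, and the variable MONITAO_PING_URL are hypothetical names chosen for the example.

```yaml
# Container fragment; the Secret name, key, and variable name are hypothetical.
# Create the Secret first, for example:
#   kubectl create secret generic monitao-heartbeat \
#     --from-literal=ping-url="https://api.monitao.com/ping/YOUR_TOKEN"
containers:
- name: backup
  image: my-backup-image:latest
  env:
  - name: MONITAO_PING_URL
    valueFrom:
      secretKeyRef:
        name: monitao-heartbeat
        key: ping-url
  command:
  - /bin/sh
  - -c
  - |
    /scripts/backup.sh && \
    curl -fsS --max-time 10 --retry 3 "$MONITAO_PING_URL"
```

The image still needs to include curl. The --max-time and --retry values are illustrative and can be adjusted to your network conditions.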

Alert configuration

Configure your MoniTao alerts to cover different Kubernetes failure scenarios.

  • Job not created: no Job was created at the scheduled time. Check that the CronJob isn't suspended and the schedule is correct.
  • Pod in error: the pod has failed, or one of its containers is in CrashLoopBackOff. Check logs with kubectl logs.
  • Timeout exceeded: the Job didn't complete in the expected time. May indicate a performance issue or blocking.
  • Missing heartbeat: no ping received within configured delay. Most common cause: silent script failure.

Kubernetes CronJob checklist

  • CronJob deployed and not suspended (suspend: false)
  • Schedule verified with correct timezone
  • Resource requests/limits defined
  • MoniTao heartbeat integrated into container
  • Manual test with kubectl create job <job-name> --from=cronjob/<cronjob-name>
  • K8s event monitoring in place

Frequently asked questions about Kubernetes CronJobs

My CronJob doesn't create Jobs at the scheduled time. Why?

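Check several things: suspend isn't set to true, the schedule is valid, and, with concurrencyPolicy: Forbid, the previous Job isn't still running. If startingDeadlineSeconds is set, a run that cannot start within that window is skipped, and if more than 100 schedules have been missed, the controller refuses to start the Job and logs an error. Note that failedJobsHistoryLimit only controls how many failed Jobs are kept; it does not block new ones. Use kubectl describe cronjob to see events and the last schedule time.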

How do I test a CronJob without waiting for the schedule?

Create a manual Job from the CronJob: kubectl create job test-run --from=cronjob/my-cronjob. This immediately runs the Job with the same configuration.

My pod stays in Pending state indefinitely. What should I do?

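Most often, the cluster doesn't have enough free resources to schedule the pod; node selectors, taints, or an unbound volume can have the same effect. Run kubectl describe pod pod-name and read the events. Depending on the cause, reduce the requests or add nodes.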

How do I handle timezones in K8s CronJobs?

Since Kubernetes 1.27, spec.timeZone is a stable field: set it to an IANA time zone name such as "Europe/Paris". On earlier versions without this field, the schedule is interpreted in the kube-controller-manager's time zone (often UTC).
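
A minimal fragment showing where the field goes, assuming a cluster that supports spec.timeZone. The schedule value is carried over from the example manifest purely for illustration.

```yaml
# CronJob spec fragment
spec:
  schedule: "0 2 * * *"      # 2:00 AM in the zone named below
  timeZone: "Europe/Paris"   # IANA time zone name
```

Without this field, a 2:00 AM Paris run on a UTC controller has to be written as 1:00 UTC in winter and 0:00 UTC in summer, which means editing the schedule at each daylight saving change.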

How do I prevent concurrent executions of my CronJob?

Set concurrencyPolicy: Forbid in the spec. This prevents creating a new Job if the previous one is still running. Caution: this can mask performance problems.

How do I see logs for a K8s Job that failed?

List the Job's pods with kubectl get pods -l job-name=my-job, then read the logs with kubectl logs pod-name. If the container restarted, add --previous to see the output of the previous attempt.

Conclusion

Kubernetes CronJobs are powerful but their native monitoring remains limited. In an active cluster, silent failures can go unnoticed for days, affecting your backups, synchronizations, and batch processing.

By combining Kubernetes best practices with MoniTao heartbeat monitoring, you gain complete visibility into your scheduled jobs. Start with your critical CronJobs and gradually extend coverage across your entire cluster.
