How To Debug or Troubleshoot a Kubernetes Cluster?

In this post, we will explore how to debug or troubleshoot a Kubernetes cluster. Kubernetes issues can be difficult to detect and identify. We assume that the kubectl CLI is installed and configured for your Kubernetes environment. Let's start with some basic sanity checks.

Preliminary Checks:

First, categorize your issue using the questions below. They do not cover every possible problem, but the troubleshooting steps that follow will help narrow down the rest.

  • Can you find the node using kubectl?
  • Does kubectl have the required access to all relevant resources?
  • Is the node accessible using its external IP?
  • Is the pod running and responding?
  • Is any pod in the Pending, CrashLoopBackOff, or Waiting state?
  • Does the PVC status show Pending?
  • Does your pod status show ImagePullBackOff or ErrImagePull?
  • Can you resolve the Kubernetes service names using DNS?
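
Several of the pod-related questions above can be answered with a single filter over the pod list. The following is a minimal sketch, assuming the default column layout of `kubectl get pods --all-namespaces --no-headers` (namespace, name, ready, status, ...). The function name `problem_pods` is hypothetical and only used for illustration.

```shell
# problem_pods: read `kubectl get pods --all-namespaces --no-headers` output
# on stdin and print "namespace/pod status" for pods in a problem state.
# Assumption: the STATUS value is the 4th whitespace-separated column.
problem_pods() {
  awk '$4 ~ /Pending|CrashLoopBackOff|ImagePullBackOff|ErrImagePull/ { print $1 "/" $2, $4 }'
}

# Usage against a live cluster (not executed here):
#   kubectl get pods --all-namespaces --no-headers | problem_pods
```

The pattern is deliberately unanchored so that init-container states such as Init:CrashLoopBackOff are also reported.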

Basic Cluster Checks:

  • Check whether all the nodes are registered correctly. Cross-check that all the nodes are present and in the Ready state.

kubectl get nodes

kube-1     NotReady     <none>    1h      v1.23.3
kube-2     Ready        <none>    1h      v1.23.3
kube-3     Ready        <none>    1h      v1.23.3
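
On a larger cluster it can help to list only the nodes that are not Ready. This is a small sketch, assuming the node name and status are the first two columns of `kubectl get nodes --no-headers`. The function name `not_ready_nodes` is hypothetical.

```shell
# not_ready_nodes: read `kubectl get nodes --no-headers` output on stdin and
# print the names of nodes whose STATUS does not start with "Ready".
# A status such as "Ready,SchedulingDisabled" still counts as Ready here.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage against a live cluster (not executed here):
#   kubectl get nodes --no-headers | not_ready_nodes
```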

To get more detailed information about the overall health of the cluster, use:

kubectl cluster-info dump
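
The dump is large, so writing it to files is usually easier than reading it in the terminal. Here is a minimal sketch using the `--output-directory` and `--all-namespaces` options of `kubectl cluster-info dump`. The function name `dump_cluster_state` and the default directory `./cluster-state` are assumptions made for this example.

```shell
# dump_cluster_state: save the cluster dump as files under a directory.
# The default directory name is an arbitrary choice for this sketch.
dump_cluster_state() {
  out_dir="${1:-./cluster-state}"
  kubectl cluster-info dump --all-namespaces --output-directory="$out_dir"
}

# Usage against a live cluster (not executed here):
#   dump_cluster_state /tmp/cluster-state
```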

  • As a next step, get more detailed information about the node that is in the NotReady state (possible causes include an inactive kubelet or a node disconnected from the network).
Get the node details

kubectl describe node kube-1

Get the node details in YAML format

kubectl get node kube-1 -o yaml
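
The full YAML is long. If only the reason behind the Ready condition is of interest, a JSONPath filter can pull it out. This is a sketch, assuming the node reports a condition of type Ready under `.status.conditions`. The function name `node_ready_message` is hypothetical.

```shell
# node_ready_message: print the message attached to the node's Ready condition.
# $1 is the node name, for example kube-1.
node_ready_message() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
}

# Usage against a live cluster (not executed here):
#   node_ready_message kube-1
```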

Once that is done, let's move on to the next set of checks.

Node Sanity Checks:

We will run some basic kubectl commands to get a high-level overview of what's going on.

  • Get the nodes info & status

kubectl get nodes -o wide

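  • Check the Persistent Volume and Persistent Volume Claim info. Cross-check those with your deployment YAML files. Persistent Volumes are cluster-scoped, so only the claim takes a namespace.

kubectl describe pv <pv_name>
kubectl describe pvc <pvc_name> -n <namespace>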

  • Fetch the info about the pods after the secure proxy containers are deployed.

kubectl get pods -n <namespace> -o wide

  • Fetch the pod logs

kubectl logs -f <pod_name> -n <namespace>

  • Get pod details

kubectl describe pods <pod_name> -n <namespace>

Look for anything unusual in the output above.
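
The pod commands above can be grouped into one helper for a pod that keeps failing. This is a sketch only. The function name `pod_triage`, the `default` namespace fallback, and the 100-line tail are assumptions for illustration. The `--tail` and `--previous` options of `kubectl logs` are used to limit the output and to show logs from the last terminated container.

```shell
# pod_triage: describe a pod, then show recent logs from the current and the
# previously terminated container. $1 is the pod name, $2 the namespace.
pod_triage() {
  pod="$1"
  ns="${2:-default}"
  kubectl describe pod "$pod" -n "$ns"
  kubectl logs "$pod" -n "$ns" --tail=100
  # --previous fails when no container has terminated yet, so ignore errors.
  kubectl logs "$pod" -n "$ns" --tail=100 --previous || true
}

# Usage against a live cluster (not executed here):
#   pod_triage <pod_name> <namespace>
```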

Log Checks:

Look at the log files. The way kubelet and container runtime write logs depends on the operating system that the node uses.

  • Linux - On Linux nodes that use systemd, the kubelet and container runtime write to journald by default. To read the systemd journal, use:

journalctl -u kubelet
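
The journal can be noisy, so filtering for errors is often a useful first pass. The sketch below assumes the kubelet emits the classic klog text format, where each line starts with a severity letter (I, W, E, F) followed by the month and day. If the kubelet is configured for JSON logging, this filter will not match. The function name `kubelet_errors` is hypothetical.

```shell
# kubelet_errors: read kubelet log lines on stdin and keep only klog lines
# with severity E (error) or F (fatal), e.g. "E0512 10:00:01.000001 ...".
kubelet_errors() {
  grep -E '^[EF][0-9]{4} '
}

# Usage on a systemd node (not executed here); `-o cat` prints only the
# message so the klog prefix is at the start of each line:
#   journalctl -u kubelet -o cat --no-pager --since "1 hour ago" | kubelet_errors
```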

If systemd is absent, the kubelet and container runtime write to .log files in the /var/log directory. To write logs to a custom directory, you can run the kubelet indirectly through a helper tool, kube-log-runner, and use that tool to redirect the kubelet logs to a directory of your choice.

You can also set a logging directory using the deprecated kubelet command line argument --log-dir. However, the kubelet always directs your container runtime to write logs into directories within /var/log/pods.  

  • Windows - By default, the kubelet writes logs to files within the directory C:\var\logs (note that this is not C:\var\log).
Some cluster deployment tools set up Windows nodes to log to C:\var\log\kubelet instead. To write logs to a custom directory, you can run the kubelet indirectly through a helper tool, kube-log-runner, and use that tool to redirect the kubelet logs to a directory of your choice.

However, the kubelet always directs the container runtime to write logs within the directory C:\var\log\pods.

Note that Kubernetes does not provide cluster-level logging out of the box, although you can build it with your own approach, such as running a node-level logging agent.

Some of the common node logs are:

  • /var/log/kubelet.log - logs from the kubelet
  • /var/log/kube-proxy.log - logs from kube-proxy
  • /var/log/kube-apiserver.log - API Server
  • /var/log/kube-scheduler.log - Scheduler
  • /var/log/kube-controller-manager.log - Controller Manager
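
Not every node has all of these files: the API server, scheduler, and controller manager logs only exist on control plane nodes, and nodes that use systemd may log to journald instead. The following sketch reports which of the files listed above are present. The function name `list_component_logs` is hypothetical, and the directory argument defaults to /var/log.

```shell
# list_component_logs: print the component log files that exist in a directory.
# $1 is the log directory (defaults to /var/log).
list_component_logs() {
  log_dir="${1:-/var/log}"
  for name in kubelet kube-proxy kube-apiserver kube-scheduler kube-controller-manager; do
    f="$log_dir/$name.log"
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Usage on a node:
#   list_component_logs
```
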
Hope this helps you fix the issue.

