




How To Debug or Troubleshoot a Kubernetes Cluster?



In this post, we will explore how to debug or troubleshoot a Kubernetes cluster. Kubernetes issues can be difficult to detect and identify. We assume that the kubectl CLI is installed and configured for your Kubernetes environment. Let's start with some basic sanity checks.

Preliminary Checks:

Start by categorizing your issue using the questions below. There may be other issues that are not covered here, but we will try to address those as we drill down in the subsequent troubleshooting steps.

  • Can you find the node using kubectl?
  • Does kubectl have the required access to all relevant resources?
  • Is the node accessible using its external IP?
  • Is the pod running and responding?
  • Does any pod have a Pending, CrashLoopBackOff, or Waiting state?
  • Does the PVC status show Pending?
  • Does your pod status show ImagePullBackOff or ErrImagePull?
  • Can you resolve the Kubernetes service names using DNS?
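
Several of these questions can be answered at once by scanning pod statuses across all namespaces. The following is a minimal sketch rather than a definitive tool: `problem_pods` is a hypothetical helper name, and it assumes the default table layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, and so on).

```shell
# problem_pods: hypothetical helper (not part of kubectl).
# Reads the table printed by `kubectl get pods -A` on stdin and prints
# "namespace/name status" for every pod that is neither Running nor Completed.
# Assumes the default column order: NAMESPACE NAME READY STATUS RESTARTS AGE.
problem_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}
```

Assuming kubectl is configured, it could be used as `kubectl get pods -A | problem_pods`. Pods in states such as Pending, CrashLoopBackOff, ImagePullBackOff, or ErrImagePull would show up in the result. A pod that is Running but not ready is not flagged by this sketch.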
 

Basic Cluster Checks:

  • Check whether all the nodes are registered correctly. Cross-check that all the nodes are present and in the Ready state.


kubectl get nodes




NAME       STATUS       ROLES     AGE     VERSION
kube-1     NotReady     <none>    1h      v1.23.3
kube-2     Ready        <none>    1h      v1.23.3
kube-3     Ready        <none>    1h      v1.23.3
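
On a larger cluster it can help to list only the nodes that need attention. This is a small sketch under the assumption that the output uses the default table layout shown above; `not_ready_nodes` is a hypothetical helper name, not a kubectl feature.

```shell
# not_ready_nodes: hypothetical helper (not part of kubectl).
# Reads `kubectl get nodes` table output on stdin and prints the name of
# every node whose STATUS column is anything other than exactly "Ready".
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}
```

Assuming kubectl is configured, `kubectl get nodes | not_ready_nodes` would print kube-1 for the sample output above. Because the comparison is exact, a cordoned node (status "Ready,SchedulingDisabled") is listed as well.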


To get even more detailed information about the overall health of the cluster, use the command below:



kubectl cluster-info dump



  • As a next step, get more detailed information about the node that is in the NotReady state (for example, the kubelet is not active or the node is disconnected from the network).
Get the node details:



kubectl describe node kube-1


Get the node details in YAML format:



kubectl get node kube-1 -o yaml
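
The conditions in the node status usually indicate why a node is NotReady. As a sketch, the conditions can be printed as `Type=Status` lines with a JSONPath query and then filtered. `unhealthy_conditions` is a hypothetical helper, and it assumes the standard node conditions, where Ready should be True and the others should be False.

```shell
# unhealthy_conditions: hypothetical helper (not part of kubectl).
# Reads "Type=Status" lines on stdin and prints the conditions that look wrong:
# Ready is expected to be True; every other condition (MemoryPressure,
# DiskPressure, PIDPressure, NetworkUnavailable) is expected to be False.
unhealthy_conditions() {
  awk -F= '
    NF == 0                        { next }         # skip blank lines
    $1 == "Ready" && $2 != "True"  { print; next }  # Ready must be True
    $1 != "Ready" && $2 != "False" { print }        # the rest must be False
  '
}

# Example, assuming kubectl is configured and the node is named kube-1:
#   kubectl get node kube-1 \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | unhealthy_conditions
```

An empty result suggests that the node conditions are healthy. A line such as `Ready=Unknown` typically means the control plane has stopped hearing from the kubelet on that node.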


Once that is done, let's move on to the next set of checks.

Node Sanity Checks:

We will run some basic kubectl commands to get a high-level overview of what's going on.

  • Get the nodes info & status


kubectl get nodes -o wide


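  • Check the Persistent Volume and Persistent Volume Claim info. Cross-check those with your deployment YAML files. Persistent Volumes are cluster-scoped, so only the claim takes a namespace.


kubectl describe pv <pv_name>
kubectl describe pvc <pvc_name> -n <namespace>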


  • Fetch info about the pods after the secure proxy containers are deployed.


kubectl get pods -n <namespace> -o wide


  • Fetch the pod logs


kubectl logs -f <pod_name> -n <namespace>
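
When a pod keeps restarting, the logs of the current container are often empty, and the logs of the previous container instance are more useful. The sketch below filters log text for likely failure lines. `error_lines` is a hypothetical helper, and the keywords it searches for are an assumption about what the application logs look like.

```shell
# error_lines: hypothetical helper (not part of kubectl).
# Reads log text on stdin and keeps only lines that mention
# "error", "fatal", or "panic" (case-insensitive), prefixed with line numbers.
error_lines() {
  grep -n -i -E 'error|fatal|panic'
}

# Example, assuming kubectl is configured; --previous shows the logs of the
# previous (crashed) container instance and --tail limits the output:
#   kubectl logs <pod_name> -n <namespace> --previous --tail=200 | error_lines
```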


  • Get pod details


kubectl describe pods <pod_name> -n <namespace>
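
The Events section at the end of the describe output is often the most informative part. As a sketch, the events for a single pod can also be listed on their own, in chronological order. `pod_events` is a hypothetical wrapper name; the flags it passes are standard `kubectl get` options.

```shell
# pod_events: hypothetical wrapper around kubectl.
# $1: pod name, $2: namespace.
# Lists only the events whose involved object has the given name,
# sorted by their last timestamp (oldest first).
pod_events() {
  kubectl get events -n "$2" \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}

# Example, assuming kubectl is configured:
#   pod_events <pod_name> <namespace>
```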


Look for anything unusual in the output above.

Log Checks:

Look at the log files. The way the kubelet and the container runtime write logs depends on the operating system that the node uses.

  • Linux - On Linux nodes that use systemd, the kubelet and container runtime write to journald by default. To read the systemd journal, use the command below:


journalctl -u kubelet
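
The full journal can be long, so it may help to narrow it to recent error lines. This is a sketch under two assumptions: the kubelet uses its default text log format, in which error and fatal lines start with `E` or `F` followed by the month and day, and `kubelet_errors` is a hypothetical wrapper name.

```shell
# kubelet_errors: hypothetical wrapper around journalctl.
# $1 is an optional time window understood by `journalctl --since`
# (defaults to "1 hour ago").
# -o cat prints only the message text, so each line starts with the
# kubelet's own severity prefix; grep then keeps error (E) and fatal (F) lines.
kubelet_errors() {
  journalctl -u kubelet --no-pager -o cat --since "${1:-1 hour ago}" \
    | grep -E '^[EF][0-9]{4} '
}

# Example, run on the affected node:
#   kubelet_errors "30 min ago"
```

If the kubelet is configured for a different log format (for example JSON), the pattern will not match and would need to be adjusted.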


If systemd is not present, the kubelet and container runtime write to .log files in the /var/log directory. To have logs written elsewhere, you can run the kubelet indirectly through the helper tool below and use it to redirect the kubelet logs to a directory of your choice.



kube-log-runner


You can also set a logging directory using the deprecated kubelet command line argument --log-dir. However, the kubelet always directs your container runtime to write logs into directories within /var/log/pods.  

  • Windows - By default, the kubelet writes logs to files within the directory C:\var\logs (note that this is not C:\var\log).
Some cluster deployment tools set up Windows nodes to log to C:\var\log\kubelet instead. To have logs written elsewhere, you can run the kubelet indirectly through the helper tool below and use it to redirect the kubelet logs to a directory of your choice.



kube-log-runner


However, the kubelet always directs the container runtime to write logs within the directory C:\var\log\pods.

Note that Kubernetes does not provide cluster-level logging out of the box, although it can be achieved with several common approaches.

Some of the common log files on Linux nodes are:

  • /var/log/kubelet.log - logs from the kubelet
  • /var/log/kube-proxy.log - logs from kube-proxy
  • /var/log/kube-apiserver.log - logs from the API server
  • /var/log/kube-scheduler.log - logs from the scheduler
  • /var/log/kube-controller-manager.log - logs from the controller manager
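
To glance at the end of each of these files in one pass, a small loop is enough. This is a sketch only: `tail_node_logs` is a hypothetical helper, it assumes the file names listed above, and reading /var/log may require root privileges. The API server, scheduler, and controller manager logs are normally present only on control plane nodes, so missing files are skipped.

```shell
# tail_node_logs: hypothetical helper.
# $1: directory holding the log files (default /var/log)
# $2: number of trailing lines to show per file (default 20)
# Files that do not exist on this node are skipped silently.
tail_node_logs() {
  dir="${1:-/var/log}"
  lines="${2:-20}"
  for name in kubelet kube-proxy kube-apiserver kube-scheduler kube-controller-manager; do
    file="$dir/$name.log"
    if [ -f "$file" ]; then
      printf '==> %s <==\n' "$file"
      tail -n "$lines" "$file"
    fi
  done
}

# Example, run on the node itself:
#   tail_node_logs /var/log 50
```
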

Hope this helps you fix the issue.
