Kubernetes Metrics
- Anand Nerurkar
- Jun 6, 2024
To understand which metrics to monitor, let’s first break down what types of objects you have in a Kubernetes cluster.
You have two node types: worker nodes, which run your containerized workloads, and one or more Control Plane nodes.
The Control Plane provides centralized APIs and internal services for cluster management. It also maintains a record of cluster state in an etcd key-value store.
Worker nodes are virtual or physical machines. Each node runs a kubelet process that monitors the node and acts as the connection between the Control Plane and the worker node. It tells the worker node’s container runtime to create and manage the pods that run the workloads.
Kubernetes Cluster & Node Metrics
You first need to monitor the health of your entire Kubernetes cluster. It will help to know how many resources your entire cluster uses, how many applications are running on each node, and if your nodes are working properly and at what capacity.
Here are some of the most useful metrics for each:
Node resource usage metrics like disk and memory utilization, CPU, network bandwidth, and many more, enable you to decide if you need to increase or decrease the number and size of the nodes in the cluster.
Keeping an eye on memory and disk usage at the node level can provide important insight into your cluster’s performance and ability to successfully run workloads. When containers exceed their memory limits, they are terminated. If a node runs low on available memory or disk space, the kubelet flags the node as being under pressure and begins to reclaim resources by evicting pods.
The number of nodes available shows you what a cluster is used for and what you’re paying for if you’re using cloud providers.
The number of running pods per node shows you if the nodes available are large enough and if they could handle the pod workload in case a node fails. This is crucial in case you’re using node affinity, which allows you to constrain which nodes your pods are eligible to be scheduled on, based on labels on the node.
Memory and CPU requests and limits define the minimum and maximum resources that a node’s kubelet can allocate to containers. Allocatable memory reflects the amount of memory on a node that is available for pods. Specifically, it takes the overall capacity and subtracts memory requirements for OS and Kubernetes system processes to ensure they will not fight user pods for resources. These metrics will inform you if your nodes have enough capacity to meet the memory requirements of all current pods and whether the Control Plane is able to schedule new ones.
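The allocatable calculation described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not kubelet code: the Node class, its field names, and the numbers are hypothetical, and the eviction threshold term is an addition that follows how Kubernetes documents node allocatable rather than anything stated in the text above.

```python
from dataclasses import dataclass
from typing import List

MIB = 1024 * 1024


@dataclass
class Node:
    """Hypothetical view of one node's memory figures, in bytes."""
    name: str
    capacity: int            # total memory on the machine
    system_reserved: int     # assumed reservation for OS daemons
    kube_reserved: int       # assumed reservation for Kubernetes daemons
    eviction_threshold: int  # assumed hard-eviction threshold

    @property
    def allocatable(self) -> int:
        # Memory left for pods once the reservations are subtracted.
        return (self.capacity - self.system_reserved
                - self.kube_reserved - self.eviction_threshold)


def free_for_scheduling(node: Node, pod_requests: List[int]) -> int:
    """Allocatable memory not yet claimed by the requests of current pods."""
    return node.allocatable - sum(pod_requests)


def can_schedule(node: Node, pod_requests: List[int], new_request: int) -> bool:
    """True if a new pod's memory request still fits on the node."""
    return free_for_scheduling(node, pod_requests) >= new_request


node = Node("worker-1", capacity=8192 * MIB, system_reserved=512 * MIB,
            kube_reserved=512 * MIB, eviction_threshold=100 * MIB)
current = [4096 * MIB, 2048 * MIB]  # requests of pods already on the node

print(node.allocatable // MIB)                 # allocatable memory in MiB
print(can_schedule(node, current, 512 * MIB))  # does a 512 MiB pod fit?
```

Note that the check uses requests, not actual usage: scheduling decisions are based on what pods ask for, which is why a node can look idle and still be unable to accept new pods.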
Kubernetes Deployments & Pod Metrics
Pod-level monitoring involves looking at three types of metrics: Kubernetes metrics, container metrics, and application metrics.
Kubernetes Metrics
Kubernetes metrics help you ensure all pods in a deployment are running and healthy. They provide information such as how many instances of a pod are currently running and how many were expected. If the running count is lower than expected, your cluster may have run out of resources. It’s also important to know how a deployment rollout is progressing and to track the network throughput of your pods.
Here are some of the most important Kubernetes metrics you should keep track of:
Current Deployment and DaemonSet metrics keep track of two important types of controllers in your Kubernetes cluster. Several similar but distinct metrics are available, depending on what type of controller manages those objects. Deployments create a specified number of pods, while DaemonSets ensure that a particular pod is running on every node.
Missing and failed pods show if pods are running and how many pods are dying.
Pod restarts show how many times pods restarted.
Pods in the CrashLoopBackOff state can signal a few different issues, such as an application that keeps crashing inside the container or a faulty configuration that causes the pod to crash.
Running vs. desired pods show how many instances of each service are actually ready and how many you expect to be ready.
Pod resource usage vs. requests and limits shows whether pod limits are set and what the actual CPU and memory usage is.
Available and unavailable pods are crucial to track as a pod may be running but not available, meaning it is not ready and able to accept traffic. If you see spikes in the number of unavailable pods, or pods that are consistently unavailable, it might indicate a problem with their configuration.
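Several of the checks in this list can be derived from the status that Kubernetes already reports on Deployment and Pod objects. The sketch below is illustrative only: it works on plain dictionaries shaped like the JSON of those objects (for example, as returned by `kubectl get -o json`), the function names are hypothetical, and the sample data is made up.

```python
from typing import List


def deployment_health(deployment: dict) -> dict:
    """Compare desired replicas with ready and available ones."""
    # Kubernetes defaults spec.replicas to 1 when it is not set.
    desired = deployment.get("spec", {}).get("replicas", 1)
    status = deployment.get("status", {})
    # Status counters are omitted from the object when they are zero.
    ready = status.get("readyReplicas", 0)
    available = status.get("availableReplicas", 0)
    return {
        "desired": desired,
        "ready": ready,
        "available": available,
        "unavailable": max(desired - available, 0),
        "healthy": available >= desired,
    }


def crash_looping_containers(pod: dict) -> List[str]:
    """Names of containers currently waiting in CrashLoopBackOff."""
    names = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") == "CrashLoopBackOff":
            names.append(cs["name"])
    return names


def total_restarts(pod: dict) -> int:
    """Sum of restart counts across all containers in the pod."""
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return sum(cs.get("restartCount", 0) for cs in statuses)


# Made-up sample objects, trimmed to the fields used above.
deployment = {"spec": {"replicas": 3},
              "status": {"readyReplicas": 2, "availableReplicas": 1}}
pod = {"status": {"containerStatuses": [
    {"name": "app", "restartCount": 7,
     "state": {"waiting": {"reason": "CrashLoopBackOff"}}},
    {"name": "sidecar", "restartCount": 0, "state": {"running": {}}},
]}}

print(deployment_health(deployment))
print(crash_looping_containers(pod), total_restarts(pod))
```

In practice these values are usually scraped and alerted on by a monitoring system rather than computed by hand; the point here is only to show which status fields the metrics correspond to.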
Container Metrics
Container metrics help you determine how close you are to the resource limits you’ve configured and also allow you to detect pods stuck in CrashLoopBackOff. You’re interested in monitoring metrics such as:
Container CPU usage helps you see how much CPU your containers are using versus the pod limits you set.
Container memory utilization helps you see how much memory is utilized by your containers versus the pod limits you set.
Network usage shows you the sent and received data packets and how much bandwidth you are using.
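Comparing usage against limits means converting Kubernetes resource quantities such as `500m` or `256Mi` into plain numbers first. The sketch below shows one way to do that. It is deliberately simplified: the function names are hypothetical, only the common suffixes are handled, and the 90% alert threshold is an arbitrary assumption.

```python
from typing import Optional

# Binary (power-of-two) and decimal suffixes used for memory quantities.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity to cores: '500m' -> 0.5, '2' -> 2.0."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity to bytes (common suffixes only)."""
    for suffix, factor in {**_BINARY, **_DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(float(quantity[:-len(suffix)]) * factor)
    return int(float(quantity))  # plain number of bytes


def usage_ratio(usage: float, limit: float) -> Optional[float]:
    """Fraction of the limit in use, or None when no limit is set."""
    if not limit:
        return None
    return usage / limit


def near_limit(usage: float, limit: float, threshold: float = 0.9) -> bool:
    """True when usage has reached the (assumed) alert threshold."""
    ratio = usage_ratio(usage, limit)
    return ratio is not None and ratio >= threshold


cpu = usage_ratio(parse_cpu("250m"), parse_cpu("500m"))
mem_alert = near_limit(parse_memory("240Mi"), parse_memory("256Mi"))
print(cpu, mem_alert)
```

A container with no limit returns None rather than a ratio, which is worth surfacing on its own: a missing limit means the container is bounded only by what the node can provide.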
Application Metrics
Application metrics measure the performance and availability of applications running inside your Kubernetes pods and are usually exposed by the applications themselves. The available metrics depend on the business scope of each application. Below are some of the most common application metrics you should monitor:
Application availability measures the uptime and response times of your application. This is crucial to measure for optimal performance and user experience.
Application health and performance show performance issues, responsiveness, latency, and all the usual horrors you do not want your users to go through. It also surfaces any errors you need to fix in the application layer.
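As a rough illustration of the two metrics above, the sketch below computes availability and latency percentiles from request data. The definitions are assumptions, since the right ones depend on the application: availability is taken here as the share of requests that succeeded, and percentiles use the nearest-rank method. The function names and sample latencies are made up.

```python
import math
from typing import Sequence


def availability(total_requests: int, failed_requests: int) -> float:
    """Share of requests that succeeded, between 0.0 and 1.0."""
    if total_requests == 0:
        raise ValueError("no requests observed")
    return (total_requests - failed_requests) / total_requests


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample covering pct% of the data."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up response times in milliseconds.
latencies_ms = [120, 80, 95, 300, 110, 105, 90, 100, 85, 1000]

print(availability(1000, 5))
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))
```

Percentiles are used instead of the average because a few very slow requests, like the 1000 ms sample here, barely move the median but are exactly what some users experience.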