Troubleshooting Kubernetes Node Disk Usage When a Pod Reports Low Disk Space

After running Kubernetes for a while, I noticed that some systems running inside my Kubernetes cluster started reporting low disk space warnings.

At first, this looked like an application-level disk issue. However, after checking the pod filesystem, I found that the warning was actually showing the disk usage of the Kubernetes node root filesystem.

In my case, the key clue was this line from inside the pod:

/dev/mapper/ubuntu--vg-ubuntu--lv   38G   27G  8.9G  76% /config

The application was checking disk space from a mounted path inside the container. That path was not application data. It was a Kubernetes-managed volume mount backed by the node filesystem.

So the warning was not caused by data inside a PVC. It was caused by the Kubernetes node root disk becoming full.

After checking the node directly, I found that most of the space was used by containerd under /var/lib/containerd.

This post shows how I traced the warning from inside the pod back to the Kubernetes node, how I confirmed that containerd was using most of the space, and how to safely clean unused container images with crictl.


Environment

This issue happened in my existing 3-node bare-metal Kubernetes cluster, which was built in a previous post.

The cluster uses:

Ubuntu Server 24.04
3 bare-metal Kubernetes nodes
All nodes acting as both control-plane and worker nodes
Kubernetes installed with kubeadm
containerd as the container runtime
kube-vip for the Kubernetes API VIP
MetalLB for bare-metal LoadBalancer services
Traefik as the Ingress controller
Calico as the CNI

The important part for this post is that the cluster uses containerd, not Docker.

With containerd, container images, unpacked layers, and container snapshots are stored on each node under:

/var/lib/containerd

So even if application data is stored on PVCs, the Kubernetes node root filesystem can still run out of space because of container runtime data.


Why a Pod Can Report Node Disk Usage

The reason I noticed this issue was that a system running inside Kubernetes reported a low disk space warning.

To understand what the application was seeing, I checked the filesystem from inside the pod:

kubectl -n thingsboard exec -it tb-node-0 -- df -h

The important lines were:

Filesystem                         Size  Used Avail Use% Mounted on
overlay                             38G   27G  8.9G  76% /
/dev/mapper/ubuntu--vg-ubuntu--lv   38G   27G  8.9G  76% /config

The /config line was the key clue.

At first glance, this can be confusing. It looks like the pod is seeing the node disk directly. But this does not mean the whole Kubernetes node root directory is mounted into the container.

It means /config is a mounted path inside the container, and the storage behind that mount comes from the Kubernetes node filesystem.

This is common in Kubernetes.

Some paths inside a pod are not normal application data directories. They may be Kubernetes-managed volume mounts, such as:

ConfigMap volumes
projected volumes
emptyDir volumes
application configuration mounts

Container writable layers are different from Kubernetes volume mounts, but they are also stored on the node filesystem by the container runtime. So both Kubernetes-managed mounts and container runtime storage can make disk usage inside a pod reflect the node filesystem instead of a PVC.

For example, when a ConfigMap is mounted into a pod, the ConfigMap data originally comes from the Kubernetes API, but kubelet prepares the actual files on the node under a path like:

/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~configmap/<volume-name>/

Then Kubernetes mounts that directory into the container, for example:

/config

If /var/lib/kubelet is stored on the node root filesystem, then df -h /config inside the container can show the usage of the node root filesystem.
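You can verify this directly on the node: df resolves a path to the filesystem that backs it. A small sketch (the fallback to /var/lib is only so it also runs on machines that are not Kubernetes nodes):

```shell
# On which filesystem does the kubelet state directory live?
# If the "Mounted on" column shows /, kubelet-managed volumes
# consume the node root filesystem.
# /var/lib/kubelet exists only on a node; fall back to /var/lib elsewhere.
dir=/var/lib/kubelet
[ -d "$dir" ] || dir=/var/lib
df -h "$dir"
```

If the last column is /, then every df run against a kubelet-backed mount inside a pod will report root-filesystem usage.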

That does not mean /config itself contains 27 GB of files.

It means the filesystem behind /config is 76% full.

To check the actual size of files inside /config, use du, not df:

kubectl -n thingsboard exec -it tb-node-0 -- du -sh /config

Output:

16K     /config

This proves the point:

df -h /config  -> shows the filesystem usage behind /config
du -sh /config -> shows the actual size of files inside /config
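The distinction is easy to reproduce anywhere, not just inside a pod; a throwaway directory shows the same split between filesystem usage and actual file size:

```shell
# df vs du on the same path: df reports the whole filesystem behind it,
# du reports only the bytes stored under it.
demo=$(mktemp -d)
echo hello > "$demo/file"
df -h "$demo"    # size and Use% of the filesystem backing $demo
du -sh "$demo"   # just the files under $demo (a few KB)
```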

So in this case, the warning came from inside the pod, but the real problem was the Kubernetes node root filesystem.


Check Which Node the Pod Was Running On

After confirming that the pod was showing node filesystem usage, I checked which Kubernetes node the pod was running on:

kubectl -n thingsboard get pod tb-node-0 -o wide

Example output:

NAME        READY   STATUS    RESTARTS   AGE   IP              NODE
tb-node-0   1/1     Running   0          8h    10.244.173.40   k8s-1.maksonlee.com

The pod was running on:

k8s-1.maksonlee.com

So the next step was to check disk usage directly on that node.


Confirm What /config Is

To confirm what /config is inside the pod, check the pod volume mounts:

kubectl -n thingsboard get pod tb-node-0 \
  -o jsonpath='{range .spec.containers[*].volumeMounts[*]}{.mountPath}{" <- "}{.name}{"\n"}{end}'

Output:

/config <- tb-node-config
/var/log/thingsboard <- tb-node-logs
/var/run/secrets/kubernetes.io/serviceaccount <- kube-api-access-mp4gk

Then check the volume definitions:

kubectl -n thingsboard get pod tb-node-0 \
  -o jsonpath='{range .spec.volumes[*]}{.name}{" : configMap="}{.configMap.name}{" secret="}{.secret.secretName}{" pvc="}{.persistentVolumeClaim.claimName}{" hostPath="}{.hostPath.path}{" emptyDir="}{.emptyDir}{"\n"}{end}'

Output:

tb-node-config : configMap=tb-node-config secret= pvc= hostPath= emptyDir=
tb-node-logs : configMap= secret= pvc= hostPath= emptyDir={}
kube-api-access-mp4gk : configMap= secret= pvc= hostPath= emptyDir=

This confirms that /config is a ConfigMap volume mounted into the container.

It is not a PVC, and it is not a hostPath mount of the whole node root filesystem.

That explains why df -h /config shows node root filesystem usage, while du -sh /config only shows 16K.


Check Disk Usage on the Kubernetes Node

On the affected node, I checked the root filesystem:

df -h

Then I looked for large top-level directories:

sudo du -xh --max-depth=1 / 2>/dev/null | sort -h
sudo du -xh --max-depth=1 /var 2>/dev/null | sort -h
sudo du -xh --max-depth=1 /var/lib 2>/dev/null | sort -h

In my case, /var was the largest directory:

20G     /var
27G     /

Then I checked the most likely Kubernetes-related paths:

sudo du -xh --max-depth=1 /var/lib/containerd /var/lib/kubelet /var/log 2>/dev/null | sort -h

The result showed that containerd was using most of the space:

29M     /var/lib/kubelet
594M    /var/log
19G     /var/lib/containerd

Inside /var/lib/containerd, most of the usage came from image layers and snapshots:

5.1G    /var/lib/containerd/io.containerd.content.v1.content
14G     /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs
19G     /var/lib/containerd

So the problem was clear.

The node root disk was filling up because of container images and container snapshots stored by containerd.


Why containerd Uses Disk Space

On a Kubernetes node using containerd, images are pulled and stored locally.

The main path is:

/var/lib/containerd

This directory includes:

image content store
unpacked image layers
overlayfs snapshots
container writable layers
container runtime metadata

It can grow over time because of:

Kubernetes upgrades
application image upgrades
old unused images
temporary workloads
test deployments
large application images
failed or exited containers

For example, if Kubernetes was upgraded several times, old images such as previous kube-apiserver, kube-controller-manager, kube-scheduler, and kube-proxy versions may still remain on the node.

Application images can also be large. Some images may be hundreds of MB or even close to 1 GB.


What is crictl?

crictl is a command-line tool for interacting with a Kubernetes container runtime through the Container Runtime Interface (CRI).

In a Kubernetes cluster using containerd, crictl can be used to inspect containers, images, pods, and runtime information.

It is similar in purpose to some Docker commands, but it talks to the CRI runtime used by Kubernetes.

For example:

Docker environment:
  docker images
  docker ps
  docker rmi

Kubernetes with containerd:
  crictl images
  crictl ps
  crictl rmi
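crictl needs to know which runtime socket to talk to. It usually reads this from /etc/crictl.yaml; on a kubeadm node running containerd, a typical configuration looks like this (paths may differ on your distribution):

```yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
```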

Where Does crictl Come From?

The crictl command is provided by the cri-tools package.

You can verify it with:

which crictl
dpkg -S $(which crictl)
apt list --installed | grep cri-tools

Example output:

/usr/bin/crictl
cri-tools: /usr/bin/crictl
cri-tools/unknown,now 1.35.0-1.1 amd64 [installed,automatic]

In my case, cri-tools was installed automatically.

To check which installed package depends on cri-tools:

apt-cache rdepends --installed cri-tools

Example output:

cri-tools
Reverse Depends:
  kubeadm
  kubeadm
  kubeadm
  kubeadm
  kubeadm

So the relationship is:

kubeadm
  -> depends on cri-tools
      -> provides crictl

This means crictl was installed automatically because kubeadm depends on cri-tools.


List Images on the Node

List images stored by the container runtime:

sudo crictl images

Example output:

IMAGE                                                   TAG                 IMAGE ID            SIZE
docker.io/library/cassandra                             5.0.4               b59644e362a23       177MB
docker.io/library/postgres                              16                  b8c80b87c813a       160MB
docker.io/thingsboard/tb-node                           4.3.1.1             151155f480d54       934MB
quay.io/cephcsi/cephcsi                                 v3.16.1             23949dfd39865       860MB
registry.k8s.io/kube-apiserver                          v1.35.2             66108468ce512       27.7MB
registry.k8s.io/kube-apiserver                          v1.35.3             0f2b96c93465f       27.6MB
registry.k8s.io/kube-proxy                              v1.35.1             6521110cdb017       25.7MB
registry.k8s.io/kube-proxy                              v1.35.2             3c471cf273e44       25.7MB
registry.k8s.io/kube-proxy                              v1.35.3             53ed370019059       25.7MB

This shows images from:

Kubernetes system components
CNI components
CSI drivers
Ingress controllers
application workloads
old application versions
old Kubernetes versions

Some old images may no longer be used by any running container.
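To see which repositories account for most of that space, the SIZE column can be summed per repository. A small awk sketch, fed sample rows here; on a real node you would pipe sudo crictl images into the function instead:

```shell
# Sum image sizes per repository from `crictl images`-style output.
# On a node: sudo crictl images | sum_by_repo
sum_by_repo() {
  awk 'NR > 1 {
    n = $4;    sub(/[A-Za-z]+$/, "", n)     # numeric part, e.g. 25.7
    unit = $4; sub(/^[0-9.]+/, "", unit)    # unit, e.g. MB or GB
    mb = (unit == "GB") ? n * 1024 : n
    total[$1] += mb
  } END {
    for (repo in total) printf "%-40s %8.1f MB\n", repo, total[repo]
  }'
}
# Sample rows standing in for real `crictl images` output:
sum_by_repo <<'EOF'
IMAGE                           TAG      IMAGE ID        SIZE
registry.k8s.io/kube-proxy      v1.35.1  6521110cdb017   25.7MB
registry.k8s.io/kube-proxy      v1.35.2  3c471cf273e44   25.7MB
docker.io/thingsboard/tb-node   4.3.1.1  151155f480d54   934MB
EOF
```

Repositories with many retained tags quickly rise to the top of such a summary.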


Remove Exited Containers First

Before pruning unused images, check for exited containers:

sudo crictl ps -a --state Exited

Remove exited containers:

sudo crictl ps -a --state Exited -q | xargs -r sudo crictl rm

This removes exited containers, so the images they reference are no longer counted as in use and become eligible for pruning.


Remove Unused Images

To remove images that are not currently used by any container:

sudo crictl rmi --prune

Then check disk usage again:

df -h
sudo du -xh --max-depth=1 /var/lib/containerd 2>/dev/null | sort -h

This is safer than manually deleting files under /var/lib/containerd.
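Note that kubelet already runs its own image garbage collection: by default it starts removing unused images when disk usage on the image filesystem crosses 85% and keeps deleting until usage drops below 80%. If warnings at 76% arrive too late for your setup, those thresholds can be lowered in the KubeletConfiguration (the values below are illustrative, not recommendations):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Start image GC when image filesystem usage exceeds this percent (default 85)
imageGCHighThresholdPercent: 75
# Delete images until usage falls below this percent (default 80)
imageGCLowThresholdPercent: 65
```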
