NimTechnology

Explaining CLOUD technologies in an easy-to-understand way.


When K8s pods are stuck mounting large volumes

Posted on April 7, 2023 by nim

Refer: https://blog.devgenius.io/when-k8s-pods-are-stuck-mounting-large-volumes-2915e6656cb8

Recently we ran into the following problem with our Loki deployment on AWS/EKS: on every deployment or restart of a Loki Pod, mounting the persistent volume took longer and longer. It started with a delay of a few minutes and ended up at nearly 25 minutes on our production cluster. With no solution at hand we avoided new deployments where possible, knowing this was not an acceptable workaround.

Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Normal   Scheduled               23m50s               default-scheduler        Successfully assigned default/filecr34t0r-0 to ip-100-64-8-204.eu-central-1.compute.internal
  Normal   SuccessfulAttachVolume  23m48s               attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-ef3366b8-464c-11ed-b878-0242ac120002"
  Warning  FailedMount             5m43s (x6 over 18m)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[vol], unattached volumes=[vol kube-api-access-7wzcs]: timed out waiting for the condition
  Normal   Pulled                  106s                 kubelet                  Container image "grafana/loki:2.6.1" already present on machine
  Normal   Created                 106s                 kubelet                  Created container loki
  Normal   Started                 106s                 kubelet                  Started container loki

Then I began to investigate the matter. On test and prod we use automatically provisioned gp3 volumes. The AWS volume monitoring showed heavy I/O activity during the mount time. The volume on test had about 1.3 million files and the mount took about 7 minutes. On prod the volume had 4.3 million files and needed 24 minutes to mount. OK, it seems to correlate with the number of files. With gp3's baseline of 3,000 IOPS we can do the following calculation:

  • Test: 1,300,000 / 3,000 / 60 ≈ 7.2 minutes
  • Prod: 4,300,000 / 3,000 / 60 ≈ 23.9 minutes
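The back-of-the-envelope estimate above can be sketched as follows; the file counts and the 3,000 IOPS gp3 baseline are taken from the measurements above, and `est_minutes` is just an illustrative helper name:

```python
# Rough mount-delay estimate: roughly one metadata operation (chown/chmod)
# per file, throttled by the volume's IOPS limit.

def est_minutes(files: int, iops: int = 3000) -> float:
    """Estimated duration (in minutes) of the recursive permission walk."""
    return files / iops / 60

print(f"test: {est_minutes(1_300_000):.1f} min")  # ~7.2
print(f"prod: {est_minutes(4_300_000):.1f} min")  # ~23.9
```

The estimate matches the observed 7 and 24 minutes closely enough to support the hypothesis.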

By searching K8s docs and blogs I found the solution: Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod’s securityContext when that volume is mounted. For large volumes, checking and changing ownership and permissions can take a lot of time, slowing Pod startup.
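To illustrate why this cost scales with the number of files, here is a much-simplified, hypothetical sketch of such a recursive walk. The real logic lives in the kubelet's volume manager and also adjusts permission bits; `recursive_fsgroup` is our own illustrative name, and `os.lchown` requires a POSIX system:

```python
import os

def recursive_fsgroup(path: str, gid: int) -> int:
    """Walk `path`, change the group of every entry to `gid`,
    and return the number of entries touched."""
    touched = 0
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            full = os.path.join(root, name)
            os.lchown(full, -1, gid)  # -1 leaves the owning uid unchanged
            touched += 1
    return touched
```

Every single file costs at least one metadata I/O, which is why millions of files on an IOPS-limited volume add up to many minutes.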

With the fsGroupChangePolicy field inside a securityContext you can control the way that Kubernetes checks and manages ownership and permissions for a volume. Possible values:

  • OnRootMismatch: only change permissions and ownership if the permissions and ownership of the root directory do not match the volume's expected values. This can drastically shorten the time it takes to change ownership and permissions of a volume.
  • Always: always change permissions and ownership of the volume when the volume is mounted.
template:
  spec:
    containers:
      ...
    securityContext:
      fsGroup: 10001
      runAsGroup: 10001
      runAsNonRoot: true
      runAsUser: 10001
      fsGroupChangePolicy: "OnRootMismatch"

With this modification the startup of our Loki instance went back to under two minutes.

This all only applies if your Deployment or StatefulSet has configured a securityContext, which you hopefully have done. 😉

Addendum: the huge number of files resulted from Loki producing many chunks, which in turn was caused by an overly liberal use of custom labels. We have since reduced the number of labels, and as the number of files shrinks, queries in Grafana are getting faster too.
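The article doesn't show how the labels were reduced, but as a hypothetical illustration: if the logs are shipped with promtail, Prometheus-style relabeling can cap the label set before streams reach Loki. The job name and label names below are assumptions, not the actual config:

```yaml
# Hypothetical promtail scrape_configs fragment: keep only a small, bounded
# label set so Loki creates fewer streams, and therefore fewer chunk files.
scrape_configs:
  - job_name: kubernetes-pods
    relabel_configs:
      - action: labelkeep
        regex: (namespace|app|container)
```

Fewer, lower-cardinality labels mean fewer streams, fewer chunks on the volume, and faster queries.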



Copyright © 2026 NimTechnology.