Skip to content

NimTechnology

Trình bày các công nghệ CLOUD một cách dễ hiểu.

  • Kubernetes & Container
    • Docker
    • Kubernetes
      • Ingress
      • Pod
    • Helm Chart
    • OAuth2 Proxy
    • Isito-EnvoyFilter
    • Apache Kafka
      • Kafka
      • Kafka Connect
      • Lenses
    • Vault
    • Longhorn – Storage
    • VictoriaMetrics
    • MetalLB
    • Kong Gateway
  • CI/CD
    • ArgoCD
    • ArgoWorkflows
    • Argo Events
    • Spinnaker
    • Jenkins
    • Harbor
    • TeamCity
    • Git
      • Bitbucket
  • Coding
    • DevSecOps
    • Terraform
      • GCP – Google Cloud
      • AWS – Amazon Web Service
      • Azure Cloud
    • Golang
    • Laravel
    • Python
    • Jquery & JavaScript
    • Selenium
  • Log, Monitor & Tracing
    • DataDog
    • Prometheus
    • Grafana
    • ELK
      • Kibana
      • Logstash
  • BareMetal
    • NextCloud
  • Toggle search form

[Monitoring] How to monitor EBS on AWS via Prometheus

Posted on July 11, 2023July 18, 2023 By nim No Comments on [Monitoring] How to monitor EBS on AWS via Prometheus

Như các bạn cũng đã biết thì Prometheus không thể collect trực tiếp metrics của Cloudwatch.

Mình sẽ cài 1 thằng là cloudwatch-exporter để collect metrics.
Bạn có thể sử dụng bài viết bên dưới

[DataDog] How does DataDog collect metrics from the Prometheus exporter endpoint

Contents

Toggle
  • Look into the helpful metrics.
    • Monitoring Latency
    • Find out IOPS and Throughput
  • Dashboard Grafana to monitoring EBS-AWS
  • Look into Durability of EBS.
  • What is Amazon EBS Multi-attach

Look into the helpful metrics.

Bài này mình sẽ tập trung nhiều vào việc là tìm ra các metrics có ích cho việc monitor EBS

VolumeReadOps: This metric gives you the total number of read operations from an EBS volume in a specified period of time. Sudden changes could indicate a change in the behavior of the applications using the volume.

VolumeWriteOps: This provides the total number of write operations to an EBS volume. An increase in this metric might suggest higher usage or possibly an issue with your application writing more data than expected.

VolumeReadBytes and VolumeWriteBytes: These metrics represent the total bytes read or written from an EBS volume. These metrics help you understand the amount of data being read or written over time.
 
VolumeTotalReadTime and VolumeTotalWriteTime: These metrics give you the total time taken for read and write operations. If these values are increasing, it could indicate a performance issue.
 
VolumeQueueLength: This is an important metric that provides the number of read and write operation requests waiting to be completed. A high queue length might indicate that your volume’s performance is being maxed out.
 
VolumeIdleTime: This metric gives you the total time that the volume was idle. This can help you identify volumes that are being under-utilized.

Monitoring Latency

We can measure Amazon EBS storage latency using the metrics VolumeTotalReadTime and VolumeTotalWriteTime. We use a formula to plot the total IO time spent to see changes, especially peaks to isolate the cause of latencies.

The formula: (VolumeTotalReadTime + VolumeTotalWriteTime) / (VolumeReadOps + VolumeWriteOps)

The result of this calculation would give you the average time spent per operation, effectively measuring latency. If you see peaks or fluctuations in this value over time, it could indicate potential issues causing latency in your EBS storage.

Also, remember to take the unit of these metrics into account. VolumeTotalReadTime and VolumeTotalWriteTime are measured in seconds, while VolumeReadOps and VolumeWriteOps are count metrics. So, the calculated average latency will be in seconds per operation.

Một cách dễ hiểu hơn:

“Seconds per operation” is a way of measuring how long, on average, each individual operation (like a read or write operation) takes to complete.

Think of it like this: Imagine you’re a postman delivering letters. You have a bag of 100 letters that you need to deliver. If it takes you 200 seconds to deliver all 100 letters, then the average time it took you to deliver one letter (one “operation”) would be 200 seconds / 100 letters = 2 seconds per letter.

In the context of your EBS storage, each “letter” is a read or write operation, and the “time to deliver a letter” is the time it takes to complete that operation. So if the result of the formula is, say, 0.005 seconds per operation, that means each read or write operation takes, on average, 0.005 seconds to complete.

High values mean operations are taking longer to complete, which can be a sign of latency or performance issues. If this value is low, it means your operations are completing quickly, which generally indicates good performance.

Find out IOPS and Throughput

Bides:

IOPS = (VolumeReadOps + VolumeWriteOps) / Time [seconds]
Throughput KBps = (VolumeReadBytes + VolumeWriteBytes ) / Time [seconds]

thường ở đây thì time thì sẽ thay bằng 60

STG306_Best-practices-for-EBS-volumes-and-performance-monitoring-using-CloudWatch-

Dashboard Grafana to monitoring EBS-AWS

Đây là Grafana chart để monitor EBS trên AWS

https://gist.github.com/mrnim94/c3e7684fb2435267f21a9caed65e0946

Look into Durability of EBS.

Durability in the context of Amazon Elastic Block Store (EBS) refers to how well your data is protected from being lost. When you store data on an EBS volume, Amazon automatically creates multiple copies of that data behind the scenes. These copies are all stored within the same Amazon “availability zone” (think of it as a data center location), to help ensure that if any single piece of hardware fails, your data is still safe and accessible.

The durability of EBS volumes is mainly due to the fact that each EBS volume is automatically replicated within its Availability Zone to protect you from component failure, offering high availability and durability. That said, EBS volumes are not immune to all forms of failure and AWS recommends regular snapshots for even better backup redundancy.

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html

Nếu tôi sài GP3 có làm giảm Ability của my product

Amazon EBS volumes are designed to be highly available and reliable. As per the data you provided, the durability of EBS General Purpose SSD (gp3) volumes is between 99.8% and 99.9%. This means that if you have 10,000 EBS volumes running continuously for one year, you should expect a failure in around 10 to 20 volumes (0.1% to 0.2%).

  • A durability of 99.8% suggests an annual failure rate of 0.2%. This means that, statistically speaking, 2 out of every 1000 gp3 volumes could be expected to fail in a year.
  • Similarly, a durability of 99.9% suggests an annual failure rate of 0.1%. In this case, statistically, 1 out of every 1000 gp3 volumes could be expected to fail in a year.

How to recognize the issues regarding the durability

a sudden change in read/write latency or IOPS could indicate a problem with the volume. If an EBS volume fails, it would likely become unresponsive or its attached EC2 instance would experience issues.

What is Amazon EBS Multi-attach

Amazon EBS Multi-Attach is a feature that allows you to attach a single Provisioned IOPS SSD (io1 or io2) volume to multiple EC2 instances in the same Availability Zone. This means that the same EBS volume can be attached to up to 16 EC2 instances.

EBS Multi-Attach can be useful in scenarios where you want to achieve higher availability or need concurrent read/write operations from multiple instances to your data.

However, there are some important considerations while using EBS Multi-Attach:

  1. Your instances and EBS volumes must be within the same Availability Zone.
  2. Only certain EC2 instance types support Multi-Attach.
  3. The operating system of the instances needs to have a cluster-aware file system (like GFS2 for Linux) that manages concurrent access well. Traditional file systems like ext4 or XFS are not designed for multiple servers to write data simultaneously and may corrupt the data.
  4. While Multi-Attach allows you to attach a volume to multiple instances, it doesn’t provide built-in data replication across Availability Zones or regions.
  5. You can’t create snapshots of a volume that’s enabled for Multi-Attach while it’s attached to more than one instance.
  6. Multi-Attach currently only works with Provisioned IOPS SSD (io1 or io2) volumes.

As always, it’s a good practice to review the AWS documentation for the latest information and detailed instructions when working with EBS Multi-Attach.

AWS - Amazon Web Service, Prometheus

Post navigation

Previous Post: [Zeus] Retention Project
Next Post: [Grafana] How to publish Grafana Dashboard for the community.

More Related Articles

[Golang / EKS] Accessing AWS EKS with Go: A Comprehensive Guide to Interacting with Kubernetes APIs AWS - Amazon Web Service
[Kaniko/Bitbucket/ECR] Accomplish the workflow: CI by bitbucket pipeline, Kaniko build image and push image to ECR AWS - Amazon Web Service
[Ec2] How to reset the forgotten password on EC2 AWS - Amazon Web Service
[Prometheus] Relabelling – Đưa thông tin từ Discovered Label sang target label Log, Monitor & Tracing
[Assume Role/KMS] Using Assume Role to make the other AWS Account access all KMS that/this AWS Account. AWS - Amazon Web Service
[AWS] Configure Ingress SSL and SSL Redirect on Load Balancer Controller AWS - Amazon Web Service

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Tham Gia Group DevOps nhé!
Để Nim có nhiều động lực ra nhiều bài viết.
Để nhận được những thông báo mới nhất.

Recent Posts

  • [Azure] The subscription is not registered to use namespace ‘Microsoft.ContainerService’ May 8, 2025
  • [Azure] Insufficient regional vcpu quota left May 8, 2025
  • [WordPress] How to add a Dynamic watermark on WordPress. May 6, 2025
  • [vnet/Azure] VNet provisioning via Terraform. April 28, 2025
  • [tracetcp] How to perform a tracert command using a specific port. April 3, 2025

Archives

  • May 2025
  • April 2025
  • March 2025
  • February 2025
  • January 2025
  • December 2024
  • November 2024
  • October 2024
  • September 2024
  • August 2024
  • July 2024
  • June 2024
  • May 2024
  • April 2024
  • March 2024
  • February 2024
  • January 2024
  • December 2023
  • November 2023
  • October 2023
  • September 2023
  • August 2023
  • July 2023
  • June 2023
  • May 2023
  • April 2023
  • March 2023
  • February 2023
  • January 2023
  • December 2022
  • November 2022
  • October 2022
  • September 2022
  • August 2022
  • July 2022
  • June 2022
  • May 2022
  • April 2022
  • March 2022
  • February 2022
  • January 2022
  • December 2021
  • November 2021
  • October 2021
  • September 2021
  • August 2021
  • July 2021
  • June 2021

Categories

  • BareMetal
    • NextCloud
  • CI/CD
    • Argo Events
    • ArgoCD
    • ArgoWorkflows
    • Git
      • Bitbucket
    • Harbor
    • Jenkins
    • Spinnaker
    • TeamCity
  • Coding
    • DevSecOps
    • Golang
    • Jquery & JavaScript
    • Laravel
    • NextJS 14 & ReactJS & Type Script
    • Python
    • Selenium
    • Terraform
      • AWS – Amazon Web Service
      • Azure Cloud
      • GCP – Google Cloud
  • Kubernetes & Container
    • Apache Kafka
      • Kafka
      • Kafka Connect
      • Lenses
    • Docker
    • Helm Chart
    • Isito-EnvoyFilter
    • Kong Gateway
    • Kubernetes
      • Ingress
      • Pod
    • Longhorn – Storage
    • MetalLB
    • OAuth2 Proxy
    • Vault
    • VictoriaMetrics
  • Log, Monitor & Tracing
    • DataDog
    • ELK
      • Kibana
      • Logstash
    • Fluent
    • Grafana
    • Prometheus
  • Uncategorized
  • Admin

Copyright © 2025 NimTechnology.