NimTechnology

[AWS] Using FSx Lustre to enhance disk performance for large-scale applications.

Posted on December 8, 2024 (updated December 17, 2024) by nim

Contents

  • 1) Understand FSx Lustre to support EKS.
  • 2) Provision Dynamic FSx Lustre Volume.
    • 2.1) Install FSx Lustre for EKS by Terraform.
    • 2.2) Install FSx Lustre by Terraform module:
  • Predict Cost
    • 1. Storage Cost
    • 2. Throughput Cost
      • Cost for 1000 MB/s throughput:
    • Total Monthly Cost
    • Key Notes:
  • How do we increase the throughput?

1) Understand FSx Lustre to support EKS.

Amazon FSx for Lustre is designed for high performance, and its throughput and IOPS scale with the storage capacity you provision. Unlike EFS, it has no single fixed ceiling: the more storage you provision, the more throughput and IOPS you get.

Like EFS, FSx for Lustre supports dynamic volume provisioning and the ReadWriteMany access mode on EKS.

You need an existing subnet in which to provision the FSx for Lustre file system.

Next, create a security group for FSx for Lustre.
The required rules are described on this page:
https://docs.aws.amazon.com/fsx/latest/LustreGuide/limit-access-security-groups.html

2) Provision Dynamic FSx Lustre Volume.

2.1) Install FSx Lustre for EKS by Terraform.

data "aws_vpc" "example" {
  id = var.vpc_id  # replace with your VPC ID
}

resource "aws_security_group" "fsx_sg" {
  count = var.fsx_security_group_ids == "" ? 1 : 0

  name        = "${local.name}fsx-lustre-sg"
  description = "Security group for FSx Lustre file system"
  vpc_id      = var.vpc_id  # Ensure the correct VPC ID is passed
}

# Ingress Rules - Allow FSx Lustre traffic (port 988 and 1018-1023)
resource "aws_security_group_rule" "fsx_ingress" {
  count = var.fsx_security_group_ids == "" ? 1 : 0

  type        = "ingress"
  from_port   = 988
  to_port     = 988
  protocol    = "tcp"
  cidr_blocks = [data.aws_vpc.example.cidr_block]
  security_group_id = aws_security_group.fsx_sg[0].id

  description = "Allow FSx Lustre traffic on port 988"
}

resource "aws_security_group_rule" "fsx_ingress_1018_1023" {
  count = var.fsx_security_group_ids == "" ? 1 : 0

  type        = "ingress"
  from_port   = 1018
  to_port     = 1023
  protocol    = "tcp"
  cidr_blocks = [data.aws_vpc.example.cidr_block]
  security_group_id = aws_security_group.fsx_sg[0].id

  description = "Allow FSx Lustre traffic on ports 1018-1023"
}

# Default Egress Rule - Allow all outbound traffic (recommended for most use cases)
resource "aws_security_group_rule" "fsx_egress" {
  count = var.fsx_security_group_ids == "" ? 1 : 0

  type        = "egress"
  from_port   = 0
  to_port     = 0
  protocol    = "-1"
  cidr_blocks = ["0.0.0.0/0"]
  security_group_id = aws_security_group.fsx_sg[0].id

  description = "Allow all outbound traffic"
}

Next, create an IAM role for the FSx for Lustre CSI controller running on EKS.

resource "aws_iam_role" "fsx_csi_driver_role" {
  name = "${local.name}-fsx-lustre-csi-iam-role"

  # Terraform's "jsonencode" function converts a Terraform expression result to valid JSON syntax.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRoleWithWebIdentity"
        Effect = "Allow"
        Sid    = ""
        Principal = {
          Federated = "arn:aws:iam::1111111111111:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/0AF3F4F3F1111111102737E3"
        }
        Condition = {
          StringEquals = {            
            "oidc.eks.us-west-2.amazonaws.com/id/0AF3F4F3F1111111102737E3:sub": "system:serviceaccount:kube-system:fsx-csi-controller-sa"
          }
        }        

      },
    ]
  })
}

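resource "aws_iam_role_policy_attachment" "fsx_full_access" {
  # Non-exclusive attachment: unlike aws_iam_policy_attachment, this does not
  # detach the policy from other roles, users, or groups that already use it.
  policy_arn = "arn:aws:iam::aws:policy/AmazonFSxFullAccess"
  role       = aws_iam_role.fsx_csi_driver_role.name
}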

output "fsx_lustre_csi_iam_role_arn" {
  description = "FSx Lustre CSI IAM Role ARN"
  value       = aws_iam_role.fsx_csi_driver_role.arn
}

Here we configure IRSA (IAM Roles for Service Accounts) so that the deployment using the service account fsx-csi-controller-sa in the kube-system namespace receives the AmazonFSxFullAccess permissions.
-> The goal is that when you create a PVC, the FSx controller creates an FSx for Lustre file system for you and mounts it into your pod.
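
To make the trust relationship above easier to see, here is a minimal Python sketch that assembles the same kind of IRSA trust policy document. The function name build_irsa_trust_policy and its parameters are hypothetical and exist only for illustration; the account ID and OIDC provider path are the placeholder values from the Terraform above.

```python
import json


def build_irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Return a trust policy that lets one Kubernetes service account assume a role.

    oidc_provider is the cluster's OIDC issuer without the https:// prefix.
    (Hypothetical helper, not part of any AWS SDK.)
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRoleWithWebIdentity",
                # The role trusts the cluster's OIDC identity provider...
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                # ...but only for tokens issued to this exact service account.
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:sub": (
                            f"system:serviceaccount:{namespace}:{service_account}"
                        )
                    }
                },
            }
        ],
    }


policy = build_irsa_trust_policy(
    account_id="1111111111111",  # placeholder from the article
    oidc_provider="oidc.eks.us-west-2.amazonaws.com/id/0AF3F4F3F1111111102737E3",
    namespace="kube-system",
    service_account="fsx-csi-controller-sa",
)
print(json.dumps(policy, indent=2))
```

The important part is the Condition: if the namespace or service account name does not match what the Helm chart actually creates, the controller will not be able to assume the role.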

Next, install the FSx for Lustre CSI driver through its Helm chart:

# Install FSx for Lustre CSI Driver using HELM
# Resource: Helm Release 
resource "helm_release" "fsx_lustre_csi_driver" {
  depends_on = [aws_iam_role.fsx_csi_driver_role]
  name       = "fsx-csi-driver-role"
  repository = "https://kubernetes-sigs.github.io/aws-fsx-csi-driver"
  chart      = "aws-fsx-csi-driver"
  namespace = "kube-system"     
      
  set {
    name  = "controller.serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = aws_iam_role.fsx_csi_driver_role.arn
  }
}

Note that you must reference the role you created in the previous step (its ARN is passed in the service account annotation).

After a successful installation you will see two main components:
the FSx Lustre controller

and the FSx Lustre CSI driver

Next, apply a StorageClass:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fsx-sc
provisioner: fsx.csi.aws.com
parameters:
  subnetId: subnet-0dba56be4dbe20f22
  securityGroupIds: sg-09d12a22b71cb7af5
  deploymentType: PERSISTENT_2
  automaticBackupRetentionDays: "1"
  dailyAutomaticBackupStartTime: "00:00"
  copyTagsToBackups: "true"
  perUnitStorageThroughput: "1000"
  dataCompressionType: "NONE"
  weeklyMaintenanceStartTime: "7:09:00"
  fileSystemTypeVersion: "2.15"
  extraTags: "Service=FSX-Lustre"
mountOptions:
  - flock

Let's look at some of the important options under parameters:

deploymentType: Defines the type of file system deployment. Options include:

  • PERSISTENT_1 or PERSISTENT_2: Persistent storage for longer-term use with data replication.
  • SCRATCH_1 or SCRATCH_2: Temporary storage for high-performance, short-term use.

You can find more details in the documentation here:
https://docs.aws.amazon.com/fsx/latest/LustreGuide/using-fsx-lustre.html

automaticBackupRetentionDays: Specifies the number of days automatic backups should be retained.
If you do not want automatic backups, set the value to 0.

copyTagsToBackups: Determines whether the file system’s tags are copied to its backups
– true: Tags are copied.
– false: Tags are not copied

perUnitStorageThroughput: Specifies the throughput in MB/s per TiB of storage. Options are typically 125, 250, 500, or 1000 MB/s per TiB.
In other words, throughput is tied to capacity: at the minimum storage size of about 1 TiB you get roughly the per-TiB figure you selected.
If you want more total throughput at a given setting, you have to increase the volume size!
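
As a rough illustration of that relationship, the sketch below multiplies the provisioned size by the per-TiB setting. The helper name aggregate_throughput_mbps is hypothetical, and the calculation deliberately ignores details such as the storage increments the service actually allows.

```python
def aggregate_throughput_mbps(storage_tib, per_unit_throughput):
    """Baseline throughput in MB/s: provisioned TiB times the MB/s-per-TiB setting."""
    return storage_tib * per_unit_throughput


# Same per-TiB setting as the StorageClass above, at a few example sizes.
for size_tib in (1, 2, 4):
    total = aggregate_throughput_mbps(size_tib, 1000)
    print(f"{size_tib} TiB at 1000 MB/s per TiB -> {total} MB/s")
```

Doubling the volume size doubles the total, which is why the storage size and the throughput setting have to be chosen together.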

dataCompressionType: Specifies the compression type for data stored in the file system. Options:

  • NONE: No compression.
  • LZ4: Lightweight LZ4 compression.

Next, create a PersistentVolumeClaim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsx-claim
  namespace: default
spec:
  accessModes:
    - ReadWriteMany  # Or your desired access mode
  resources:
    requests:
      storage: 1000Gi  # The storage size you're requesting
  storageClassName: fsx-sc

Then you can create a pod to check whether the PersistentVolumeClaim created on Kubernetes actually works with FSx Lustre:

apiVersion: v1
kind: Pod
metadata:
  name: fsx-lustre-write-app
spec:
  containers:
    - name: fsx-lustre-write-app
      image: busybox
      command:
        - "/bin/sh"
      args:
        - "-c"
        - "while true; do echo FSx Lustre Kubernetes Dynamic Provisioning Test $(date -u) >> /data/fsx-lustre-static.txt; sleep 5; done"
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: fsx-claim

2.2) Install FSx Lustre by Terraform module:

To make deploying FSx Lustre simpler, you can use a Terraform module, as in the example below:

data "aws_eks_cluster" "eks" {
  name = var.cluster_id
}

data "aws_vpc" "selected" {
  tags = {
    Name = "vpc_name" # Replace with your VPC's tag name
  }
}

data "aws_subnets" "private_networks" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.selected.id]
  }

  filter {
    name   = "tag:Name"
    values = ["staging-nim-engine-private-us-west-2a"]
  }
}

module "eks-fsx-lustre-csi" {
  source  = "aws-terraform-module/eks-fsx-lustre-csi/aws"
  version = "0.0.1"

  aws_region    = var.aws_region
  environment   = var.environment
  product_name  = var.product_name
  vpc_id        = data.aws_vpc.selected.id
  fsx_subnet_id = data.aws_subnets.private_networks.ids[0]

  eks_cluster_name                       = var.cluster_id
  eks_cluster_endpoint                   = data.aws_eks_cluster.eks.endpoint
  eks_cluster_certificate_authority_data = data.aws_eks_cluster.eks.certificate_authority[0].data

  # OIDC provider ARN = account ID (5th field of the cluster ARN) + issuer URL without "https://"
  aws_iam_openid_connect_provider_arn = "arn:aws:iam::${element(split(":", data.aws_eks_cluster.eks.arn), 4)}:oidc-provider/${element(split("//", data.aws_eks_cluster.eks.identity[0].oidc[0].issuer), 1)}"
}
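
The last argument is the hardest one to read, so here is a small Python sketch of the same string manipulation. The function name oidc_provider_arn and both input values are made up for illustration; only the split-and-join logic mirrors the Terraform expression above.

```python
def oidc_provider_arn(cluster_arn, issuer_url):
    """Rebuild the IAM OIDC provider ARN from an EKS cluster ARN and its issuer URL."""
    # arn:aws:eks:<region>:<account-id>:cluster/<name> -> the account ID is field 4
    account_id = cluster_arn.split(":")[4]
    # https://oidc.eks.<region>.amazonaws.com/id/<id> -> drop the scheme
    issuer_path = issuer_url.split("//")[1]
    return f"arn:aws:iam::{account_id}:oidc-provider/{issuer_path}"


# Hypothetical example inputs, not real resources.
example = oidc_provider_arn(
    "arn:aws:eks:us-west-2:111111111111:cluster/example",
    "https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE",
)
print(example)
```

Like the Terraform expression, this hardcodes the aws partition, so it would need adjusting for other partitions.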

Predict Cost

To estimate the cost of running an FSx Lustre file system with 1 TiB of storage and 1000 MB/s throughput, you need to consider two key cost components:

  1. Storage cost
  2. Throughput cost

1. Storage Cost

Amazon FSx for Lustre charges for storage based on the capacity you provision, priced per GB-month.

  • FSx Lustre storage cost (persistent storage): this estimate assumes approximately $0.13 per GB-month for persistent storage (PERSISTENT_2 deployment type).
  • 1 TiB = 1024 GiB

So, for 1 TiB:
$$1 \text{ TiB} = 1024 \text{ GiB} = 1024 \times 1024 \text{ MiB} = 1048576 \text{ MiB}$$

The storage cost for 1 TiB (treating each GiB as one billed GB):
$$\text{Storage cost} = 1024 \times 0.13 \text{ USD} = 133.12 \text{ USD/month}$$

2. Throughput Cost

This estimate also includes a separate line item for provisioned throughput. In this case, you are requesting 1000 MB/s throughput.

  • The throughput rate assumed here is around $0.30 per MB/s per day.

Cost for 1000 MB/s throughput:

$$\text{Throughput cost/day} = 1000 \text{ MB/s} \times 0.30 \text{ USD per MB/s per day} = 300 \text{ USD/day}$$

Since there are approximately 30 days in a month:
$$\text{Throughput cost/month} = 300 \text{ USD/day} \times 30 = 9000 \text{ USD/month}$$

Total Monthly Cost

Adding up both the storage and throughput costs:

  • Storage cost: $133.12 per month
  • Throughput cost: $9000 per month

Thus, the total estimated monthly cost would be:
$$\text{Total cost/month} = 133.12 \text{ USD} + 9000 \text{ USD} = 9133.12 \text{ USD/month}$$
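
The arithmetic above can be wrapped in a small Python sketch so the rates are easy to swap out. The function estimate_monthly_cost is a hypothetical helper, and its default rates are simply the assumed figures used in this section, not values read from the AWS price list.

```python
GIB_PER_TIB = 1024


def estimate_monthly_cost(
    storage_tib,
    throughput_mbps,
    storage_rate_per_gb_month=0.13,     # assumed rate from this article
    throughput_rate_per_mbps_day=0.30,  # assumed rate from this article
    days_per_month=30,
):
    """Reproduce the two-part estimate: storage cost plus throughput cost."""
    storage = storage_tib * GIB_PER_TIB * storage_rate_per_gb_month
    throughput = throughput_mbps * throughput_rate_per_mbps_day * days_per_month
    return {
        "storage": round(storage, 2),
        "throughput": round(throughput, 2),
        "total": round(storage + throughput, 2),
    }


estimate = estimate_monthly_cost(storage_tib=1, throughput_mbps=1000)
print(estimate)
```

To re-run the estimate for your region, replace the two rate arguments with the current figures from the AWS FSx pricing page.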

Key Notes:

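  • These are rough estimates based on the assumed rates above; actual costs can vary depending on factors like region, discounts, or specific pricing agreements with AWS.
  • AWS lists FSx for Lustre persistent storage as a per GB-month price that varies with the selected throughput tier, so treat the separate throughput line above as a rough approximation.
  • FSx Lustre’s pricing might change over time, so always check the AWS FSx Pricing page for the latest rates.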

How do we increase the throughput?

With perUnitStorageThroughput set to 1000, FSx for Lustre offers 1000 MB/s of throughput per 1 TiB.
This means that to increase the throughput of FSx Lustre you have to buy more storage.

Below is the monitoring graph from when I provisioned 1 TiB of FSx Lustre storage.

And this is after I provisioned 2 TiB of FSx Lustre storage.

From this we can assume that FSx Lustre attached one more disk to increase the performance of the file system.
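
Going the other way, you can estimate how much storage a target throughput implies. This is a minimal sketch with a hypothetical helper, storage_needed_tib, which assumes storage is bought in whole TiB steps; the real service has its own allowed increments, so treat the result as a lower bound.

```python
import math


def storage_needed_tib(target_mbps, per_unit_throughput=1000):
    """Smallest whole number of TiB whose baseline throughput meets the target."""
    return math.ceil(target_mbps / per_unit_throughput)


for target in (1000, 2000, 3500):
    print(f"{target} MB/s needs at least {storage_needed_tib(target)} TiB")
```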
