NimTechnology

Presenting CLOUD technologies in an easy-to-understand way.

[Script] Create a large ZIP file with 50 levels of nested folders and 100,000 child files; the max size is 1 GB.

Posted on December 11, 2023 by nim

Producing a ZIP file of approximately 1 GB is harder than it sounds, because ZIP compression can shrink files with repetitive or simple content (such as files filled with zeros) to a small fraction of their original size. To end up with an archive close to 1 GB, fill the files with data that compresses poorly. Random data is a good choice, since it typically doesn’t compress well.
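To make the difference concrete, here is a minimal sketch that compresses 1 MiB of zeros and 1 MiB of random bytes with zlib (the DEFLATE implementation that Python's zipfile module uses for ZIP_DEFLATED). The helper name compressed_ratio and the 1 MiB sample size are assumptions chosen for illustration only.

```python
import os
import zlib

def compressed_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (DEFLATE, default level)."""
    return len(zlib.compress(data)) / len(data)

size = 1024 * 1024  # 1 MiB sample, an arbitrary size chosen for illustration

zeros_ratio = compressed_ratio(bytes(size))        # highly repetitive content
random_ratio = compressed_ratio(os.urandom(size))  # no patterns to exploit

print(f"zeros:  {zeros_ratio:.4f}")
print(f"random: {random_ratio:.4f}")
```

The zeros should shrink to a tiny fraction of their size, while the random sample should stay at roughly its original size, which is why the script relies on random content.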

Here’s a script that uses random data for the file contents:

####### python code
################

import os
import zipfile
import math

def create_nested_directories(base_path, depth):
    # Build base_path/folder_0/folder_1/.../folder_{depth-1}.
    # makedirs creates every intermediate level, so one call is enough.
    if depth > 0:
        deepest = os.path.join(base_path, *[f"folder_{j}" for j in range(depth)])
        os.makedirs(deepest, exist_ok=True)

def create_files_with_random_data(base_path, depth, total_files, total_size_gb):
    total_size_bytes = total_size_gb * 1024 * 1024 * 1024
    # Rounded up, so the total payload ends up slightly above the target size.
    file_size = math.ceil(total_size_bytes / total_files)
    # Spread the files evenly; the first `extra` folders take one more file
    # each so nothing is lost when total_files is not a multiple of depth.
    files_per_folder, extra = divmod(total_files, depth)
    file_count = 0

    for i in range(depth):
        current_path = os.path.join(base_path, *[f"folder_{j}" for j in range(i + 1)])
        num_files_in_current_folder = files_per_folder + (1 if i < extra else 0)

        for _ in range(num_files_in_current_folder):
            file_path = os.path.join(current_path, f"file_{file_count}.txt")
            with open(file_path, 'wb') as f:
                # os.urandom is much faster than building the data one byte at a time.
                f.write(os.urandom(file_size))
            file_count += 1

def zip_directory(zip_filename, dir_name):
    # Store paths relative to the parent of dir_name so the archive keeps
    # dir_name itself as its top-level folder.
    parent = os.path.join(dir_name, '..')
    with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, _dirs, files in os.walk(dir_name):
            for file in files:
                full_path = os.path.join(root, file)
                zipf.write(full_path, os.path.relpath(full_path, parent))

# Parameters
base_dir = "nested_folders"
depth = 50
total_files = 100000
total_size_gb = 1

# Create nested directories
create_nested_directories(base_dir, depth)
# Create files with random data
create_files_with_random_data(base_dir, depth, total_files, total_size_gb)
# Compress into a ZIP file
zip_filename = "large_nested_structure.zip"
zip_directory(zip_filename, base_dir)

# Optional: Clean up by removing the directory structure
# import shutil
# shutil.rmtree(base_dir)

In this script, the create_files_with_random_data function fills each file with random bytes, which barely compress, so the resulting ZIP file should come out close to 1 GB. Because the archive also stores headers and the full path of every entry, it can end up somewhat larger than the raw file data. Note that generating random data for this many files can be time-consuming.

Make sure you have enough free disk space: the script first writes about 1 GB of files and then a ZIP file of similar size. The cleanup step is commented out; uncomment it if you want to delete the files and directories after the ZIP file is created.
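If disk space or run time is a concern, one possible variation (not part of the script above) is to write the random entries straight into the archive and skip the temporary files. The sketch below is only an illustration under stated assumptions: the function name write_random_zip, the demo file name, and the small demo parameters are all made up here, and ZIP_STORED is used because compressing random data gains nothing.

```python
import os
import zipfile

def write_random_zip(zip_filename, depth, total_files, file_size, root="nested_folders"):
    """Write random-data entries directly into a ZIP, spread over `depth` nested folders."""
    files_per_folder, extra = divmod(total_files, depth)
    file_count = 0
    # ZIP_STORED skips compression, which would not shrink random data anyway.
    with zipfile.ZipFile(zip_filename, "w", zipfile.ZIP_STORED) as zipf:
        folder = root
        for i in range(depth):
            # ZIP entry names always use forward slashes, whatever the OS.
            folder = f"{folder}/folder_{i}"
            for _ in range(files_per_folder + (1 if i < extra else 0)):
                zipf.writestr(f"{folder}/file_{file_count}.txt", os.urandom(file_size))
                file_count += 1
    return file_count

# Small demo values so the sketch runs quickly; scale them up for the real archive.
written = write_random_zip("demo_nested.zip", depth=5, total_files=23, file_size=1024)
print(written, os.path.getsize("demo_nested.zip"))
```

With this approach only the ZIP file itself ever touches the disk, at the cost of no longer having the folder tree available for other tests.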

How to check information about the ZIP file:

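You can check the size, the depth of nested folders, and the number of child files in a ZIP file without actually unzipping it by combining the unzip command with other shell commands. The last line of the unzip -l listing is a summary, so the following command prints the total uncompressed size of all entries in bytes, followed by the number of files. (For the size of the ZIP file itself, use ls -lh your_zip_file.zip.)

unzip -l your_zip_file.zip | tail -n 1 | awk '{ print $1, $2 }'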

To count the depth of nested folders and the number of child files at each depth, list only the entry names and count the slashes in each path:

unzip -Z1 your_zip_file.zip | awk -F/ '{ print NF-1 }' | sort -n | uniq -c

Each output line shows a number of files followed by the number of slashes in their paths, that is, how many folders deep those files sit. The largest value in the second column is the maximum nesting depth within the ZIP archive.
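The same checks can also be sketched in Python with the standard zipfile module, which reads only the archive's directory and does not extract anything. The function name summarize_zip and the tiny demo archive are assumptions made for this illustration; point the function at your own ZIP file instead.

```python
import os
import zipfile
from collections import Counter

def summarize_zip(zip_filename):
    """Return archive size on disk, file count, uncompressed size, and files per depth."""
    with zipfile.ZipFile(zip_filename) as zipf:
        # Skip explicit directory entries, whose names end with "/".
        files = [info for info in zipf.infolist() if not info.is_dir()]
    # Depth is measured as the number of slashes in the entry path.
    depth_counts = Counter(info.filename.count("/") for info in files)
    return {
        "zip_size_bytes": os.path.getsize(zip_filename),
        "file_count": len(files),
        "uncompressed_bytes": sum(info.file_size for info in files),
        "files_per_depth": dict(sorted(depth_counts.items())),
    }

# Tiny self-made archive so the sketch can run without the 1 GB file.
with zipfile.ZipFile("summary_demo.zip", "w") as demo:
    demo.writestr("top/a.txt", b"12345")
    demo.writestr("top/sub/b.txt", b"123")
    demo.writestr("top/sub/c.txt", b"")

summary = summarize_zip("summary_demo.zip")
print(summary)
```

The files_per_depth mapping carries the same information as the uniq -c output: the key is the slash count and the value is the number of files at that depth.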

Copyright © 2025 NimTechnology.