Achieving a ZIP file of approximately 1 GB after compression is a bit tricky, because DEFLATE (the algorithm ZIP uses by default) can dramatically shrink files with repetitive or simple content (like files filled with zeros). To get a ZIP file close to 1 GB, you need data that doesn't compress well; random bytes are a natural choice, since they are essentially incompressible.
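The difference is easy to demonstrate with zlib, which implements the same DEFLATE algorithm ZIP uses by default:

```python
import os
import zlib

zeros = b"\x00" * 1_000_000   # 1 MB of zeros
noise = os.urandom(1_000_000)  # 1 MB of random bytes

# A megabyte of zeros deflates to a tiny fraction of its size;
# random bytes barely compress at all.
print(len(zlib.compress(zeros)))
print(len(zlib.compress(noise)))
```

On a typical run the zeros compress to well under 1% of their original size, while the random buffer stays at essentially 1 MB.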
Here’s an updated version of the script that uses random data for the file contents:
```python
import os
import zipfile
import math

def create_nested_directories(base_path, depth):
    # Build a chain of folders: base/folder_0/folder_0/folder_1/...
    for i in range(depth):
        current_path = os.path.join(base_path, *[f"folder_{j}" for j in range(i + 1)])
        os.makedirs(current_path, exist_ok=True)

def create_files_with_random_data(base_path, depth, total_files, total_size_gb):
    total_size_bytes = total_size_gb * 1024 * 1024 * 1024
    file_size = math.ceil(total_size_bytes / total_files)
    file_count = 0
    for i in range(depth):
        current_path = os.path.join(base_path, *[f"folder_{j}" for j in range(i + 1)])
        num_files_in_current_folder = total_files // depth
        for _ in range(num_files_in_current_folder):
            if file_count >= total_files:
                break
            file_path = os.path.join(current_path, f"file_{file_count}.txt")
            with open(file_path, 'wb') as f:
                # os.urandom is far faster than generating one random byte at a time
                f.write(os.urandom(file_size))
            file_count += 1

def zip_directory(zip_filename, dir_name):
    with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(dir_name):
            for file in files:
                file_path = os.path.join(root, file)
                zipf.write(file_path,
                           os.path.relpath(file_path, os.path.join(dir_name, '..')))

# Parameters
base_dir = "nested_folders"
depth = 50
total_files = 100000
total_size_gb = 1

# Create nested directories
create_nested_directories(base_dir, depth)

# Create files with random data
create_files_with_random_data(base_dir, depth, total_files, total_size_gb)

# Compress into a ZIP file
zip_filename = "large_nested_structure.zip"
zip_directory(zip_filename, base_dir)

# Optional: Clean up by removing the directory structure
# import shutil
# shutil.rmtree(base_dir)
```
In this script, the create_files_with_random_data function fills each file with random bytes, which are less compressible. This should make the resulting ZIP file closer to 1 GB in size. However, note that generating random data for a large number of files can be time-consuming and computationally intensive.
Again, ensure you have enough disk space for this operation and remember that the cleanup step is commented out. You can uncomment it if you wish to delete the files and directories after the ZIP file is created.
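Once the script finishes, a quick way to check that the archive landed near the 1 GB target is to read its on-disk size. A minimal sketch (the helper name is illustrative; the filename matches the script above):

```python
import os

def file_size_gb(path):
    # Size on disk in gigabytes (1 GB = 1024**3 bytes)
    return os.path.getsize(path) / 1024**3

# e.g. print(f"{file_size_gb('large_nested_structure.zip'):.2f} GB")
```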
How to inspect the ZIP file without unzipping it:
To check the total size, the depth of nested folders, and the number of files within a ZIP archive without actually extracting it, you can use the unzip command along with other shell commands. To print the total uncompressed size in bytes (the first field of the summary line at the bottom of the listing):

unzip -l your_zip_file.zip | tail -n 1 | awk '{print $1}'
To count the nested-folder depth and the number of files at each level, list just the entry names with zipinfo and count the path separators:

zipinfo -1 your_zip_file.zip | awk -F/ '{print NF-1}' | sort -n | uniq -c

Each output line shows a count of entries followed by their depth (the number of / separators in the path), so you can read off both the maximum nesting depth and how many files sit at each level.
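If you'd rather stay in Python, the standard zipfile module can report the same information without extracting anything. A minimal sketch (inspect_zip is a hypothetical helper name; the filename matches the script above):

```python
import zipfile
from collections import Counter

def inspect_zip(path):
    """Return (total uncompressed size, entry count, entries per depth)."""
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        total_size = sum(info.file_size for info in infos)
        # Depth = number of '/' separators in the archived path
        depths = Counter(info.filename.count("/") for info in infos)
        return total_size, len(infos), depths

# total, count, depths = inspect_zip("large_nested_structure.zip")
# print(total, count, dict(sorted(depths.items())))
```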