Reaching a ZIP file of approximately 1 GB is trickier than it sounds, because DEFLATE (ZIP's default compression algorithm) can shrink files with repetitive or simple content, such as files filled with zeros, to a tiny fraction of their original size. To get a ZIP file closer to 1 GB, use data that compresses poorly. Random bytes are the standard choice: they are essentially incompressible.
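A quick way to see the difference: Python's zlib module implements the same DEFLATE algorithm ZIP uses by default, so you can compare how 1 MiB of zeros and 1 MiB of random bytes compress. This is a small illustrative sketch, separate from the script below:

```python
import os
import zlib

zeros = bytes(1024 * 1024)             # 1 MiB of zero bytes
random_data = os.urandom(1024 * 1024)  # 1 MiB of random bytes

# Zero-filled data collapses to roughly a kilobyte...
print(len(zlib.compress(zeros)))
# ...while random data stays at (or slightly above) its original size.
print(len(zlib.compress(random_data)))
```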
Here’s an updated version of the script that uses random data for the file contents:
```python
import os
import zipfile
import math


def create_nested_directories(base_path, depth):
    # Build base/folder_0/folder_0/folder_1/.../folder_{depth-1}
    for i in range(depth):
        current_path = os.path.join(base_path, *[f"folder_{j}" for j in range(i + 1)])
        os.makedirs(current_path, exist_ok=True)


def create_files_with_random_data(base_path, depth, total_files, total_size_gb):
    total_size_bytes = total_size_gb * 1024 * 1024 * 1024
    file_size = math.ceil(total_size_bytes / total_files)
    file_count = 0
    for i in range(depth):
        current_path = os.path.join(base_path, *[f"folder_{j}" for j in range(i + 1)])
        num_files_in_current_folder = total_files // depth
        for _ in range(num_files_in_current_folder):
            if file_count >= total_files:
                break
            file_path = os.path.join(current_path, f"file_{file_count}.txt")
            with open(file_path, "wb") as f:
                # os.urandom is far faster than generating bytes one at a
                # time with random.getrandbits, and just as incompressible.
                f.write(os.urandom(file_size))
            file_count += 1


def zip_directory(zip_filename, dir_name):
    with zipfile.ZipFile(zip_filename, "w", zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(dir_name):
            for file in files:
                full_path = os.path.join(root, file)
                zipf.write(full_path,
                           os.path.relpath(full_path, os.path.join(dir_name, "..")))


# Parameters
base_dir = "nested_folders"
depth = 50
total_files = 100000
total_size_gb = 1

# Create nested directories
create_nested_directories(base_dir, depth)

# Create files with random data
create_files_with_random_data(base_dir, depth, total_files, total_size_gb)

# Compress into a ZIP file
zip_filename = "large_nested_structure.zip"
zip_directory(zip_filename, base_dir)

# Optional: Clean up by removing the directory structure
# import shutil
# shutil.rmtree(base_dir)
```
In this script, the create_files_with_random_data function fills each file with random bytes, which are essentially incompressible, so the resulting ZIP file should land close to 1 GB. Note, however, that generating and writing a gigabyte of random data across this many files can be time-consuming.
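With the parameters used in the script above (1 GiB spread over 100,000 files), each file works out to only about 10.5 KiB, which is part of why the run takes a while: the cost is dominated by creating and writing many small files. The arithmetic mirrors the script's file_size calculation:

```python
import math

total_size_bytes = 1 * 1024 ** 3  # 1 GiB, as in the script above
total_files = 100000
file_size = math.ceil(total_size_bytes / total_files)
print(file_size)  # 10738 bytes, roughly 10.5 KiB per file
```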
Again, make sure you have enough disk space for this operation, and remember that the cleanup step at the end of the script is commented out; uncomment it if you want to remove the generated files and directories once the ZIP file has been created.
How to inspect the ZIP file without unzipping it:
To check the total size, the folder nesting depth, and the number of files inside a ZIP archive without extracting it, you can use the unzip command together with a few standard shell tools. To print the total uncompressed size in bytes (the first field of the summary line at the bottom of the listing):

unzip -l your_zip_file.zip | tail -n 1 | awk '{print $1}'

Note that awk's default field splitting ignores leading whitespace, so $1 here is the size column; an explicit FS regex such as "[ \t\n]+" would treat the leading spaces as a separator and make $1 empty.
To count how many entries sit at each nesting depth, list just the archive paths (unzip -Z1 switches unzip into zipinfo mode and prints one path per line, avoiding the header and footer lines of unzip -l) and count the path separators:

unzip -Z1 your_zip_file.zip | awk -F/ '{print NF-1}' | sort -n | uniq -c
Each output line shows a count of entries followed by their depth (the number of / separators in the path), so the largest value in the second column is the maximum folder nesting depth in the archive.
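If you prefer to stay in Python, the zipfile module can report the same figures by reading only the archive's central directory, without extracting anything. A minimal sketch (summarize_zip is a hypothetical helper name; pass it the path to your archive):

```python
import zipfile


def summarize_zip(zip_path):
    """Report file count, total uncompressed size, and maximum folder depth
    without extracting anything (only the central directory is read)."""
    with zipfile.ZipFile(zip_path) as zf:
        infos = zf.infolist()
        files = [i for i in infos if not i.is_dir()]
        total_size = sum(i.file_size for i in files)  # uncompressed bytes
        max_depth = max((i.filename.count("/") for i in infos), default=0)
        return len(files), total_size, max_depth
```

For the script above, summarize_zip("large_nested_structure.zip") should report on the order of 100,000 files and about 1 GiB of uncompressed data.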