Skip to content

Script to copy files between places in AWS s3 concurrently

License

Notifications You must be signed in to change notification settings

ReagentX/s3-copy-concurrent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

s3-copy-concurrent

This is a Python 3 script that provides a function to concurrenrly copy files from one location in AWS S3 to another. Concurrent copy operations on multiple directories expedites copy times. When running, it prints:

364_1 -> 37303
Folder doesnt exist
original/364/1 moves to new/37303
Copying 23490 items using <function copy at 0x11eeb2c80> in 32 processes.
23490 copy operations in 1:19

The first line means that we are checking the files under the prefix 364_1 to ensure they are all in 37303. Since 37303 doesn't exist, we copy from original/364/1 to new/37303. This runs in 32 processes, i.e. 32 concurrent copy operations, and completes in 1 minute and 19 seconds.

The only dependency is the AWS s3 library boto3 which can be installed into your venv with pip install boto3.

Analysis

You can analyze the printout by piping the output to a file and running analyze.py to get some interesting numbers:

Total copies:   10,130,334
Total seconds:  117,735
Total minutes:  1962.25
Total hours:    32.70
Total days:     1.36

About

Script to copy files between places in AWS s3 concurrently

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages