Synchronises S3 buckets with local directories
Greas3
Greas3 is a Python package and command line application for uploading files to Amazon Web Services S3 like a greased thing.
Rather than rely on timestamps to detect changes, Greas3 uses SHA256 checksums to avoid re-uploading identical files. Since Git does not record file modification timestamps, this makes Greas3 ideal for CI/CD pipelines that pull large files to upload to S3.
Installation
Greas3 requires Python 3.9 or later and can be installed from PyPI.
pip install greas3
Command line usage
To upload a local file or directory to S3, run:
greas3 LOCAL-PATH S3-URI
For example:
$ greas3 ./clowns.jpg s3://circus/clowns.jpg
/home/cariad/ s3://circus/
clowns.jpg = clowns.jpg
To keep the original filename in S3, provide only the key prefix to upload to:
$ greas3 ./clowns.jpg s3://circus/
/home/cariad/ s3://circus/
clowns.jpg = clowns.jpg
Pass a directory instead of a file to recursively upload its contents:
$ greas3 ./party-time/ s3://circus/inbox/
/home/cariad/party-time/ s3://circus/inbox/
good-clowns/steve.jpg = good-clowns/steve.jpg
evil-clowns/jacob.jpg = evil-clowns/jacob.jpg
group-hug.jpg > group-hug.jpg
= indicates that a file hasn't changed and so won't be uploaded. > indicates that the file will be uploaded.
Pass --debug to enable debug logging.
Pass --dry-run to view the enqueued uploads without performing them.
Python usage
The put() function accepts a path to a local file or directory and a destination S3 URI to upload to.
The function returns a list of PutOperation objects, each of which describes the source file, the destination URI and a flag that indicates whether the files are the same, in which case the upload is not performed.
To gather the operations without performing them, pass dry_run=True.
Authorisation
Greas3 authenticates with Amazon Web Services using the first set of credentials that the SDK finds.
Generally, in order of precedence, credentials are found in:
- (When using the Python package) A Boto3 session with custom credentials can be passed to functions.
- The AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN environment variables.
- A credentials or config file created by the AWS CLI. The AWS_PROFILE environment variable can prescribe any non-default profile.
Greas3 requires permission to perform the following actions:
- s3:GetObjectAttributes
- s3:PutObject
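As a rough illustration, the two actions listed above could be granted with an IAM policy document like the one this sketch generates. The helper name minimal_policy and the bucket and prefix values are hypothetical (borrowed from the examples earlier), and the sketch assumes that these two actions are sufficient for your bucket configuration.

```python
import json


def minimal_policy(bucket: str, prefix: str = "") -> str:
    """Hypothetical helper: build a policy that allows only the two listed actions."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObjectAttributes", "s3:PutObject"],
                # Both actions apply to objects, so the resource is an object ARN pattern.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Example values only: the "circus" bucket and "inbox/" prefix come from the usage examples.
print(minimal_policy("circus", "inbox/"))
```

Scoping the resource to a key prefix keeps the credentials limited to the part of the bucket that Greas3 uploads to.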
How does it work?
To check if a file should be uploaded, Greas3 calls the S3 GetObjectAttributes API to gather the existing object's file size and SHA256 checksum (both the file's checksum and each individual uploaded chunk's checksum).
If the file doesn't exist in S3, or if the file size is different from the local file, then the files are considered different.
If the file was previously uploaded to S3 in chunks, and if each chunk has a SHA256 checksum, then each chunk's checksum is compared to the local file's equivalent byte range. If the checksums don't match then the files are considered different.
If the object was previously uploaded to S3 in a single chunk then its checksum is compared to the local file's checksum. If the checksums don't match then the files are considered different.
Note that S3 will record checksums only if directed to during upload; other tools might not request this but Greas3 does. If the checksums aren't present then Greas3 will have to perform the upload even if the files are identical; however, subsequent use of Greas3 will use the checksums to perform only uploads that are strictly necessary.
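The decision procedure described above can be sketched in plain Python. This is not Greas3's actual implementation: needs_upload and sha256_b64 are hypothetical names, and the attributes argument is assumed to be a dictionary shaped like the GetObjectAttributes response (ObjectSize, Checksum.ChecksumSHA256 and ObjectParts.Parts), with every part present and None standing in for an object that doesn't exist.

```python
import base64
import hashlib
from pathlib import Path
from typing import Any, Mapping, Optional


def sha256_b64(data: bytes) -> str:
    """Base64-encoded SHA256 digest, the encoding S3 uses for checksums."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def needs_upload(local: Path, attributes: Optional[Mapping[str, Any]]) -> bool:
    """Hypothetical helper: True if the local file should be uploaded."""
    # None stands in for "the object doesn't exist in S3".
    if attributes is None:
        return True

    # A different size means a different file, so no hashing is needed.
    if attributes.get("ObjectSize") != local.stat().st_size:
        return True

    parts = attributes.get("ObjectParts", {}).get("Parts", [])
    if parts:
        # Uploaded in chunks: compare each chunk to the same local byte range.
        with local.open("rb") as stream:
            for part in sorted(parts, key=lambda p: p["PartNumber"]):
                expected = part.get("ChecksumSHA256")
                if expected is None:
                    return True  # No recorded checksum, so equality can't be proven.
                if sha256_b64(stream.read(part["Size"])) != expected:
                    return True
        return False

    # Uploaded in a single chunk: compare the whole-file checksum.
    # (Reads the whole file into memory, which is fine for a sketch.)
    expected = attributes.get("Checksum", {}).get("ChecksumSHA256")
    if expected is None:
        return True
    return sha256_b64(local.read_bytes()) != expected
```

The size check comes first because it is cheap and rules out most changed files before any bytes are hashed. A missing checksum is treated as "different", which matches the note above that files without recorded checksums have to be uploaded once.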
Time comparisons
I created a directory containing 10 files with a total of 117 MB, then -- pitching the AWS CLI and Greas3 against each other -- I:
- Synchronised that directory with an S3 bucket.
- Synchronised that directory again without making any changes.
- Synchronised that directory again after touching the local files' modification timestamps.
Upload | AWS CLI (seconds) | Greas3 (seconds) | Greas3 vs AWS CLI |
---|---|---|---|
Initial | ~51 | ~56 | ~5 seconds slower |
No changes | ~1 | ~2 | ~1 second slower |
Touched modification dates | ~51 | ~2 | ~49 seconds faster |
The AWS CLI is marginally more performant during the initial upload and when timestamps can be trusted.
However, in an environment where timestamps can't be trusted -- for example, in CI/CD pipelines that pull files from Git repositories -- Greas3 is clearly the more performant uploader.
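The third scenario can be reproduced in miniature. The sketch below uses only the standard library, and the helper names sha256_hex and touch_and_compare are hypothetical. It shows that touching a file changes its modification time but not its SHA256 checksum, which is why a checksum-based comparison can skip the upload.

```python
import hashlib
import os
from pathlib import Path


def sha256_hex(path: Path) -> str:
    """Hex SHA256 digest of a file's content, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as stream:
        for block in iter(lambda: stream.read(1024 * 1024), b""):
            digest.update(block)
    return digest.hexdigest()


def touch_and_compare(path: Path) -> tuple[bool, bool]:
    """Move the modification time forward; report (mtime changed, checksum changed)."""
    before_mtime = path.stat().st_mtime
    before_sum = sha256_hex(path)
    # Set the access and modification times one minute later, like `touch` would.
    os.utime(path, (before_mtime + 60, before_mtime + 60))
    return path.stat().st_mtime != before_mtime, sha256_hex(path) != before_sum
```

For any readable file, touch_and_compare should report that the timestamp changed while the checksum did not.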
Support
Please submit all your questions, feature requests and bug reports at github.com/cariad/greas3/issues. Thank you!
Licence
Greas3 is open-source and published under the MIT License.
You don't have to give attribution in your project, but -- as a freelance developer with rent to pay -- I appreciate it!
The Author
Hello! 👋 I'm Cariad Eccleston, and I'm a freelance Amazon Web Services architect, DevOps evangelist, CI/CD deployer and backend developer.
You can find me at cariad.earth, github/cariad, linkedin/cariad and on Mastodon at @cariad@tech.lgbt.