Continuously Sync local files to/from S3
s3sync
- Overview
- Features
- Requirements
- Installation
- Usage
- Configuration
- Performance Tests for aws sync command
Overview
s3sync.py is a utility that syncs files to and from S3 as a continuously running process, so you do not have to manage the sync manually. Internally it uses the aws s3 sync command to do the sync, and the Python module watchdog to listen for filesystem events on the monitored path and push changes to S3. For pull, no listener is implemented; it simply pulls on a fixed interval. For pull it is therefore recommended to use s3fs instead: just mount the S3 bucket on your filesystem.
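As a rough illustration of the push/pull mechanics described above, the sketch below assembles the argument list for a single aws s3 sync run. The helper name build_sync_command and the bucket path are hypothetical and not taken from the utility's source; only the AWS CLI options --profile, --endpoint-url, --include and --exclude are real.

```python
import shlex
from typing import List, Optional, Sequence


def build_sync_command(
    source: str,
    destination: str,
    profile: str = "s3sync",
    endpoint_url: Optional[str] = None,
    include_patterns: Sequence[str] = (),
    exclude_patterns: Sequence[str] = (),
) -> List[str]:
    """Return the argv list for one `aws s3 sync` run (hypothetical helper)."""
    cmd = ["aws", "s3", "sync", source, destination, "--profile", profile]
    if endpoint_url:
        cmd += ["--endpoint-url", endpoint_url]
    # The AWS CLI gives precedence to filters that appear later, so
    # excludes go first and includes after them re-admit specific paths.
    for pattern in exclude_patterns:
        cmd += ["--exclude", pattern]
    for pattern in include_patterns:
        cmd += ["--include", pattern]
    return cmd


if __name__ == "__main__":
    # Push: local directory is the source, the S3 path is the destination.
    push = build_sync_command("./", "s3://my-bucket/data", exclude_patterns=["*.git/*"])
    print(shlex.join(push))
    # To actually run it: subprocess.run(push, check=True)
```

For a pull, the same helper would be called with the S3 path as the source and the local directory as the destination.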
Features
- Rate limiting using the Python token-bucket module. You can set max_syncs_per_minute in the config YAML, and pushes triggered by the file system watcher will be throttled to that limit.
- Optional reporting of runtime stats for the sync operation using the pyformance module.
- Ability to filter by include_patterns and exclude_patterns, to exclude_directories completely, or to make the filter case_sensitive.
- Automated setup of AWS CLI config by creating a separate named profile for the utility, with the ability to tune performance by setting max_concurrent_requests, max_queue_size, etc.
- Setuptools integration and a Python click based command line interface.
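To make the rate-limiting feature concrete, here is a minimal standard-library sketch of the token bucket idea behind max_syncs_per_minute. It does not use the token-bucket module itself; the class name, method name and injectable clock are assumptions made for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow at most `max_per_minute` operations per minute on average."""

    def __init__(self, max_per_minute: int, clock: Callable[[], float] = time.monotonic):
        self.max_per_minute = max_per_minute
        self.capacity = float(max_per_minute)  # burst size
        self.tokens = self.capacity            # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_consume(self) -> bool:
        """Take one token if available; return False when throttled."""
        now = self.clock()
        # Refill in proportion to the time elapsed since the last call.
        refill = (now - self.last) * self.max_per_minute / 60.0
        self.tokens = min(self.capacity, self.tokens + refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(max_per_minute=10)
    for event in range(12):
        action = "sync" if bucket.try_consume() else "skip (throttled)"
        print(f"filesystem event {event}: {action}")
```

In a watcher callback, a push would be triggered only when try_consume() returns True, so bursts of filesystem events collapse into a bounded number of sync runs.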
Requirements
Requires AWS CLI version 2 to be installed and available in the path
Installation
pip install pys3sync
Usage
s3sync --help
Usage: s3sync.py [OPTIONS] COMMAND [ARGS]...
A utility created to sync files to/from S3 as a continuously running
process, without having to manually take care of managing the sync. It
internally uses the aws s3 sync command to do the sync and uses python's
watchdog listener to get notified of any changes to the watched folder.
Options:
--config PATH Path to the config.yaml file containing configuration
params for this utility [required]
-v, --verbosity LVL Either CRITICAL, ERROR, WARNING, INFO or DEBUG
--help Show this message and exit.
Commands:
init Initial setup.
pull One-way continuous sync from s3 path to local path (based on
polling...
push One-way continuous sync from localpath to s3 path (uses a file...
s3sync --config config.yaml push --help
Usage: s3sync.py push [OPTIONS]
One-way continuous sync from localpath to s3 path (uses a file watcher
called watchdog)
Options:
--s3path PATH Full s3 path to sync to/from [required]
--localpath PATH Local directory path which you want to sync [required]
--help Show this message and exit.
--url Endpoint url
s3sync --config config.yaml pull --help
Usage: s3sync.py pull [OPTIONS]
One-way continuous sync from s3 path to local path (based on polling on an
interval)
Options:
--s3path PATH Full s3 path to sync to/from [required]
--localpath PATH Local directory path which you want to sync [required]
--interval INTEGER S3 polling interval in seconds [required]
--help Show this message and exit.
--url Endpoint url
First run/setup
s3sync --config config.yaml init
This utility creates a named profile for your AWS CLI so that the S3 CLI parameters it needs are isolated from your regular AWS CLI profile. The first time, you need to run the init command, which creates the named profile s3sync in your local AWS config (~/.aws/config), with the parameters configured in config.yaml and credentials copied from your default AWS credentials file.
Push
Run one instance of this utility per localpath<>s3path combination that you want to sync continuously.
s3sync --config config.yaml -v DEBUG push --s3path s3://<bucket>/<path> --localpath ./ --url https://[endpoint-url]
Pull
s3sync --config config.yaml -v DEBUG pull --s3path s3://<bucket>/<path> --localpath ./sync --interval 2 --url https://[endpoint-url]
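Since pull is interval based rather than event driven, its control flow can be sketched as a simple loop. This is an illustrative sketch, not the utility's code: the function name poll, the max_runs argument and the injectable sleep are assumptions added so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Optional


def poll(
    sync_once: Callable[[], None],
    interval: float,
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `sync_once` every `interval` seconds; return the number of runs.

    With max_runs=None the loop runs until the process is interrupted.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        sync_once()
        runs += 1
        # Sleep only when another run is going to follow.
        if max_runs is None or runs < max_runs:
            sleep(interval)
    return runs


if __name__ == "__main__":
    # Stand-in for running `aws s3 sync s3://<bucket>/<path> ./sync`.
    poll(lambda: print("pulling from S3..."), interval=2, max_runs=3)
```

A short interval such as 2 seconds means the sync command is launched very often, which is one reason mounting the bucket may be preferable for pull-heavy use.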
Configuration
global:
  max_syncs_per_minute: 10
  report_stats: False
watcher:
  include_patterns:
  exclude_patterns: ["*.git/*"]
  exclude_directories: False
  case_sensitive: False
s3:
  max_concurrent_requests: 20
  max_queue_size: 2000
  multipart_threshold: 8MB
  multipart_chunksize: 8MB
  max_bandwidth:
  use_accelerate_endpoint: "false"
  region: ap-south-1
Include/exclude patterns
Include/exclude patterns are implemented using pathtools.match_any_path, which ultimately supports Unix glob pattern syntax. You can test your patterns using the provided script patternhelper.py. These patterns are passed to watchdog as well as to the AWS CLI, which uses the same syntax. Both properties accept a list of patterns.
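To get a feel for how such glob patterns behave, the sketch below approximates the include/exclude decision with the standard library's fnmatch module. It is a stand-in for pathtools and patternhelper.py, not their actual code; the function name path_is_watched and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional, Sequence


def path_is_watched(
    path: str,
    include_patterns: Optional[Sequence[str]] = None,
    exclude_patterns: Sequence[str] = (),
    case_sensitive: bool = False,
) -> bool:
    """Return True if `path` passes the include/exclude glob filters."""
    includes = list(include_patterns or [])
    excludes = list(exclude_patterns)
    if not case_sensitive:
        # Compare everything in lower case for a case-insensitive match.
        path = path.lower()
        includes = [p.lower() for p in includes]
        excludes = [p.lower() for p in excludes]
    # An exclude match always wins.
    if any(fnmatchcase(path, p) for p in excludes):
        return False
    # No include patterns means "include everything".
    if not includes:
        return True
    return any(fnmatchcase(path, p) for p in includes)


if __name__ == "__main__":
    for candidate in ["repo/.git/config", "repo/src/main.py"]:
        print(candidate, path_is_watched(candidate, exclude_patterns=["*.git/*"]))
```

Note that in fnmatch a `*` also matches path separators, which is why `*.git/*` catches files at any depth below a .git directory.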
Advanced Configuration
Please change these values carefully. They depend on your machine and your internet connection. Read more about improving s3 sync transfer speeds here
max_concurrent_requests
Passed through to your ~/.aws/config via the aws configure set default.s3.max_concurrent_requests command. The parameter is described in the AWS CLI S3 configuration documentation.
max_queue_size
Passed through to your ~/.aws/config via the aws configure set default.s3.max_queue_size command. The parameter is described in the AWS CLI S3 configuration documentation.
multipart_threshold
Passed through to your ~/.aws/config via the aws configure set default.s3.multipart_threshold command. The parameter is described in the AWS CLI S3 configuration documentation.
multipart_chunksize
Passed through to your ~/.aws/config via the aws configure set default.s3.multipart_chunksize command. The parameter is described in the AWS CLI S3 configuration documentation.
max_bandwidth
Passed through to your ~/.aws/config via the aws configure set default.s3.max_bandwidth command. The parameter is described in the AWS CLI S3 configuration documentation.
use_accelerate_endpoint
Passed through to your ~/.aws/config via the aws configure set default.s3.use_accelerate_endpoint command. The parameter is described in the AWS CLI S3 configuration documentation.
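The pass-through described above can be sketched as a mapping from the s3 section of config.yaml to one aws configure set command per key. This is an illustration only: the helper name build_configure_commands is hypothetical, and skipping empty values (such as an unset max_bandwidth) is an assumption about how blanks might be handled, not documented behavior.

```python
from typing import Dict, List, Optional


def build_configure_commands(s3_settings: Dict[str, Optional[object]]) -> List[List[str]]:
    """Build one `aws configure set default.s3.<key> <value>` argv per setting."""
    commands = []
    for key, value in s3_settings.items():
        if value is None or value == "":
            # Assumption: settings left blank in config.yaml are not written.
            continue
        commands.append(["aws", "configure", "set", f"default.s3.{key}", str(value)])
    return commands


if __name__ == "__main__":
    settings = {
        "max_concurrent_requests": 20,
        "max_queue_size": 2000,
        "multipart_threshold": "8MB",
        "multipart_chunksize": "8MB",
        "max_bandwidth": None,
        "use_accelerate_endpoint": "false",
    }
    for argv in build_configure_commands(settings):
        print(" ".join(argv))
        # To apply a setting: subprocess.run(argv, check=True)
```

The sample dictionary mirrors the values from the Configuration section; region is left out because it is not one of the s3.* settings listed here.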