Continuously Sync local files to/from S3

Project description

s3sync

Overview

s3sync.py is a utility that syncs files to/from S3 as a continuously running process, so you don't have to manage the sync manually. Internally it uses the aws s3 sync command to perform the sync. For push, it uses the Python module watchdog to listen for filesystem events on the monitored path and push changes to S3. Pull has no listener; it simply polls S3 on a fixed interval. For pull it is therefore recommended to use s3fs instead: just mount the S3 bucket on your filesystem.
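
As a rough illustration of the "uses the aws s3 sync command" part, the sketch below assembles the argument list for a single sync run. The helper name build_sync_command, passing --profile s3sync, and mapping the utility's --url option to the CLI's --endpoint-url are assumptions for illustration, not the utility's actual code. A push would pass (localpath, s3path) and a pull would swap them.

```python
import shlex
from typing import List, Optional, Sequence


def build_sync_command(
    source: str,
    destination: str,
    profile: str = "s3sync",  # assumed: the named profile created by `init`
    exclude_patterns: Sequence[str] = (),
    include_patterns: Sequence[str] = (),
    endpoint_url: Optional[str] = None,  # assumed to correspond to the --url option
) -> List[str]:
    """Build an argv list for one `aws s3 sync` run (hypothetical helper)."""
    cmd = ["aws", "s3", "sync", source, destination, "--profile", profile]
    # The AWS CLI gives later filters precedence over earlier ones, so excludes
    # are emitted first and includes can re-admit paths they would drop.
    for pattern in exclude_patterns:
        cmd += ["--exclude", pattern]
    for pattern in include_patterns:
        cmd += ["--include", pattern]
    if endpoint_url:
        cmd += ["--endpoint-url", endpoint_url]
    return cmd


if __name__ == "__main__":
    argv = build_sync_command("./", "s3://my-bucket/path", exclude_patterns=["*.git/*"])
    # Prints: aws s3 sync ./ s3://my-bucket/path --profile s3sync --exclude '*.git/*'
    print(shlex.join(argv))
    # subprocess.run(argv, check=True) would actually run the sync.
```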

Features

  • Rate limiting using the Python token-bucket module. You can set max_syncs_per_minute in the config YAML, and pushes triggered by the filesystem watcher will be throttled to that limit.
  • Optional reporting of runtime stats for the sync operation using the pyformance module
  • Ability to filter with include_patterns and exclude_patterns, to exclude directories completely (exclude_directories), or to make the filter case-sensitive (case_sensitive)
  • Automated setup of the AWS CLI config by creating a separate named profile for the utility, with the ability to tune performance by setting max_concurrent_requests, max_queue_size, etc.
  • Setuptools integration and a Click-based command-line interface
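
The max_syncs_per_minute throttle described above behaves like a token bucket. The sketch below is a hand-written, stdlib-only token bucket meant to show the idea. It is not the API of the token-bucket package the utility uses, and the class name and capacity parameter are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow on average `rate_per_minute` syncs per minute, with bursts up to `capacity`.

    Hypothetical stand-in for illustration, not the token-bucket package API.
    """

    def __init__(self, rate_per_minute: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_minute / 60.0  # tokens added per second
        self.capacity = capacity
        self.tokens = capacity  # start full so the first sync can run immediately
        self.clock = clock
        self.last = clock()

    def try_consume(self) -> bool:
        """Return True (and spend one token) if a sync may run now."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With max_syncs_per_minute: 10 and a capacity of 1, one sync is allowed roughly every 6 seconds. A watcher callback that gets False back would skip or defer that sync instead of blocking the event thread.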

Requirements

Requires AWS CLI version 2 to be installed and available on the PATH.

Installation

pip install pys3sync

Usage

s3sync --help

Usage: s3sync.py [OPTIONS] COMMAND [ARGS]...

  A utility created to sync files to/from S3 as a continuously running
  process, without having to manually take care of managing the sync.  It
  internally uses the aws s3 sync command to do the sync and uses python's
  watchdog listener to get notified of any changes to the watched folder.

Options:
  --config PATH        Path to the config.yaml file containing configuration
                       params for this utility  [required]

  -v, --verbosity LVL  Either CRITICAL, ERROR, WARNING, INFO or DEBUG
  --help               Show this message and exit.

Commands:
  init  Initial setup.
  pull  One-way continuous sync from s3 path to local path (based on
        polling...

  push  One-way continuous sync from localpath to s3 path (uses a file...

s3sync --config config.yaml push --help

Usage: s3sync.py push [OPTIONS]

  One-way continuous sync from localpath to s3 path (uses a file watcher
  called watchdog)

Options:
  --s3path PATH     Full s3 path to sync to/from  [required]
  --localpath PATH  Local directory path which you want to sync  [required]
  --url             Endpoint url
  --help            Show this message and exit.

s3sync --config config.yaml pull --help

Usage: s3sync.py pull [OPTIONS]

  One-way continuous sync from s3 path to local path (based on polling on an
  interval)

Options:
  --s3path PATH       Full s3 path to sync to/from  [required]
  --localpath PATH    Local directory path which you want to sync  [required]
  --interval INTEGER  S3 polling interval in seconds  [required]
  --url               Endpoint url
  --help              Show this message and exit.

First run/setup

s3sync --config config.yaml init

This utility creates a named profile for your AWS CLI so that the S3 CLI parameters it needs are isolated from your regular AWS CLI profile. The first time, you need to run the init command. It creates the named profile s3sync in your local AWS config (~/.aws/config) with the parameters configured in config.yaml, and copies the credentials from your default AWS credentials file.
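
The credential-copying half of init can be pictured with the standard-library configparser, since ~/.aws/credentials is an INI file. This is a hypothetical helper that works on file contents as strings, not the utility's init code. Note that in the credentials file a named profile is a plain [s3sync] section, whereas ~/.aws/config uses [profile s3sync].

```python
import configparser
import io


def copy_default_credentials(credentials_text: str, profile: str = "s3sync") -> str:
    """Return credentials-file text with the [default] keys copied into [profile].

    Hypothetical helper for illustration; the real init command may differ.
    """
    # interpolation=None so secrets containing '%' are copied verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    if not parser.has_section("default"):
        raise ValueError("no [default] profile in credentials file")
    if not parser.has_section(profile):
        parser.add_section(profile)
    for key, value in parser.items("default"):
        parser.set(profile, key, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```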

Push

Run one instance of this utility for each localpath<>s3path combination that you want to sync continuously.

s3sync --config config.yaml -v DEBUG push --s3path s3://<bucket>/<path> --localpath ./ --url https://[endpoint-url]

Pull

s3sync --config config.yaml -v DEBUG pull --s3path s3://<bucket>/<path> --localpath ./sync --interval 2 --url https://[endpoint-url]
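
Interval-based pull amounts to running one sync per polling period. Below is a minimal sketch, assuming the command is an aws s3 sync from the S3 path to the local path. The poll_pull name and its injectable run/sleep hooks are hypothetical and exist only to make the loop easy to test.

```python
import subprocess
import time
from typing import Callable, List, Optional


def poll_pull(
    argv: List[str],
    interval: float,
    iterations: Optional[int] = None,
    run: Callable[[List[str]], int] = lambda a: subprocess.run(a).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `argv` (e.g. `aws s3 sync s3://bucket/path ./sync`) every `interval` seconds.

    iterations=None loops forever; otherwise returns the number of failed runs.
    """
    failures = 0
    count = 0
    while iterations is None or count < iterations:
        if run(argv) != 0:
            failures += 1  # keep polling: a transient network error should not stop the loop
        count += 1
        sleep(interval)
    return failures
```

In this sketch the interval is counted from the end of one sync to the start of the next. A slow sync therefore stretches the effective period instead of letting runs overlap.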

Configuration

global:
  max_syncs_per_minute: 10
  report_stats: False
watcher:
  include_patterns: 
  exclude_patterns: ["*.git/*"]
  exclude_directories: False
  case_sensitive: False
s3:
  max_concurrent_requests: 20
  max_queue_size: 2000
  multipart_threshold: 8MB
  multipart_chunksize: 8MB
  max_bandwidth: 
  use_accelerate_endpoint: "false"
  region: ap-south-1
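
Once config.yaml is parsed (for example with PyYAML's yaml.safe_load), the three sections map naturally onto typed objects. The dataclasses below are an illustrative sketch. The class names are hypothetical, and the defaults simply mirror the sample values above rather than necessarily the utility's own defaults. Empty YAML values such as max_bandwidth: load as None.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class GlobalConfig:
    max_syncs_per_minute: int = 10
    report_stats: bool = False


@dataclass
class WatcherConfig:
    include_patterns: Optional[List[str]] = None  # an empty YAML value loads as None
    exclude_patterns: List[str] = field(default_factory=list)
    exclude_directories: bool = False
    case_sensitive: bool = False


@dataclass
class S3Config:
    max_concurrent_requests: int = 20
    max_queue_size: int = 2000
    multipart_threshold: str = "8MB"
    multipart_chunksize: str = "8MB"
    max_bandwidth: Optional[str] = None
    use_accelerate_endpoint: str = "false"
    region: Optional[str] = None


@dataclass
class Config:
    global_: GlobalConfig  # trailing underscore because `global` is a Python keyword
    watcher: WatcherConfig
    s3: S3Config


def load_config(raw: Dict[str, Any]) -> Config:
    """Build typed config from the dict yaml.safe_load returns for config.yaml.

    Unknown keys raise TypeError, which surfaces typos in the YAML early.
    """
    return Config(
        global_=GlobalConfig(**(raw.get("global") or {})),
        watcher=WatcherConfig(**(raw.get("watcher") or {})),
        s3=S3Config(**(raw.get("s3") or {})),
    )
```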

Include/exclude patterns

Include/exclude patterns are implemented using pathtools.match_any_path, which supports Unix glob pattern syntax. You can test your patterns with the provided script patternhelper.py. The patterns are passed to watchdog as well as to the AWS CLI, which uses the same syntax. Both properties accept a list of patterns.
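
To get a feel for how a pattern list filters paths, the sketch below uses the standard-library fnmatch module as a stand-in for pathtools.match_any_path. It is not patternhelper.py, and its handling of includes versus excludes (an empty include list means "include everything", excludes win) is an assumption. Note that in fnmatch, * also matches /, so *.git/* matches .git directories at any depth.

```python
import fnmatch
from typing import Optional, Sequence


def path_is_selected(
    path: str,
    include_patterns: Optional[Sequence[str]],
    exclude_patterns: Sequence[str],
    case_sensitive: bool = False,
) -> bool:
    """Approximate include/exclude filtering with fnmatch (hypothetical helper)."""

    def matches(patterns: Sequence[str]) -> bool:
        if case_sensitive:
            return any(fnmatch.fnmatchcase(path, p) for p in patterns)
        # Lower-case both sides so matching ignores case on every platform.
        return any(fnmatch.fnmatchcase(path.lower(), p.lower()) for p in patterns)

    if include_patterns and not matches(include_patterns):
        return False
    return not matches(exclude_patterns)
```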

Advanced Configuration

Change these values with care: the right settings depend on your machine and your internet connection. Read more about improving s3 sync transfer speeds here.

max_concurrent_requests

Passed through to your ~/.aws/config via the aws configure set default.s3.max_concurrent_requests command. Read about the parameter here.

max_queue_size

Passed through to your ~/.aws/config via the aws configure set default.s3.max_queue_size command. Read about the parameter here.

multipart_threshold

Passed through to your ~/.aws/config via the aws configure set default.s3.multipart_threshold command. Read about the parameter here.

multipart_chunksize

Passed through to your ~/.aws/config via the aws configure set default.s3.multipart_chunksize command. Read about the parameter here.

max_bandwidth

Passed through to your ~/.aws/config via the aws configure set default.s3.max_bandwidth command. Read about the parameter here.

use_accelerate_endpoint

Passed through to your ~/.aws/config via the aws configure set default.s3.use_accelerate_endpoint command. Read about the parameter here.
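
The pass-through of the six settings above can be sketched as one aws configure set call per key, using the default.s3.<key> form quoted in each section. The helper configure_commands is hypothetical. It skips keys left empty in config.yaml (such as max_bandwidth in the sample) so the CLI keeps its own default for them.

```python
from typing import Any, Dict, List

# The advanced S3 settings documented above, in the same order.
S3_KEYS = (
    "max_concurrent_requests",
    "max_queue_size",
    "multipart_threshold",
    "multipart_chunksize",
    "max_bandwidth",
    "use_accelerate_endpoint",
)


def configure_commands(s3_settings: Dict[str, Any]) -> List[List[str]]:
    """Build one `aws configure set default.s3.<key> <value>` argv per configured key."""
    commands = []
    for key in S3_KEYS:
        value = s3_settings.get(key)
        if value is None or value == "":
            continue  # left empty in config.yaml: keep the CLI default
        commands.append(["aws", "configure", "set", f"default.s3.{key}", str(value)])
    return commands
```

Each argv could then be run with subprocess.run(argv, check=True).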



Download files

Source Distribution

pys3nfssync-0.0.1.tar.gz (7.6 kB)

Built Distribution

pys3nfssync-0.0.1-py3-none-any.whl (8.0 kB)

File details

Details for the file pys3nfssync-0.0.1.tar.gz.

File metadata

  • Download URL: pys3nfssync-0.0.1.tar.gz
  • Upload date:
  • Size: 7.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.5

File hashes

Hashes for pys3nfssync-0.0.1.tar.gz
Algorithm Hash digest
SHA256 d987ffd9d23be08e648b2ba539658ed45477e2bf0edace75f46d44315c827d3b
MD5 265168dee546a590180ca4401702e1a0
BLAKE2b-256 7d496d7b713f3fe49774145b278fefcb7b05d4bf4f22889c59199e04db60d54d


File details

Details for the file pys3nfssync-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: pys3nfssync-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 8.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.5

File hashes

Hashes for pys3nfssync-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b6388e11999ee8e8511fbf40c28eb13064d47337c9928886b89e65ad96e108ed
MD5 0f04238fbcd43234fdb8e514427e0b1b
BLAKE2b-256 cb33fd4a56f168601fba95393fa6998ebd91d74e0d6a3a98a292bf7a32e72cd1

