
S3 File synchronization service

Project description


s3synchrony

Synchronizing data folders across all team members.

Installation

pip install kabbes_s3synchrony

Usage

s3synchrony.main

To run s3synchrony within a command prompt, perform the following steps:

  1. Navigate to the repository you would like to synchronize

cd C:/Path/to/repo

  2. Make sure you have an "s3synchrony.json" file containing information on how to sync

  3. Run the package from the command prompt

python -m s3synchrony

Call from a Python script

import s3synchrony
s3synchrony.run()

Comprehensive Overview

The Data Folder

When using S3Synchrony, you are synchronizing all of the data stored in a local directory with the data stored remotely, for example, in an AWS S3 bucket. The S3 directory is referenced through an AWS bucket, an AWS prefix, and the necessary credentials to access that prefix. The local directory to be used can be a relative or full path, and by default will be a subdirectory named "Data" stored in the same working directory.

  • Project Folder
    • Data -> make sure you place your "Data" folder in your .gitignore
    • code, etc.
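The exact schema of "s3synchrony.json" is not documented here; a hypothetical minimal file might look like the following, where "aws_bkt" and "aws_prfx" are the argument names used later in this document, and "platform" and "data_path" are assumed keys for illustration only:

```json
{
  "platform": "S3",
  "data_path": "Data",
  "aws_bkt": "my-example-bucket",
  "aws_prfx": "team/project-data"
}
```

Consult the package's own documentation or docstrings for the real key names.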

smart_sync

The smart_sync function is the centerpiece of this package and performs all of the data synchronization for you. It checks the passed platform name against a self-contained list of supported platforms and instantiates the proper class. This list of supported platforms can be accessed via a call to get_supported_platforms().

Each connection type will require a different set of keyword arguments. For S3, the minimum arguments are "aws_bkt" and "aws_prfx". Please check the class docstrings for each connection type for more information.
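The name-based dispatch described above can be sketched as follows. This is a hypothetical, self-contained illustration of the registry pattern, not s3synchrony's actual internals; the class and variable names are assumptions:

```python
# Hypothetical sketch of smart_sync's platform dispatch.

class DataPlatformConnection:
    """Interface that all platform classes are expected to implement."""
    def smart_sync(self):
        raise NotImplementedError


class S3Connection(DataPlatformConnection):
    def __init__(self, aws_bkt, aws_prfx, **kwargs):
        self.aws_bkt = aws_bkt    # S3 bucket name (required for S3)
        self.aws_prfx = aws_prfx  # prefix ("folder") within the bucket


# Self-contained list of supported platforms, mapping name -> class.
_PLATFORMS = {"S3": S3Connection}


def get_supported_platforms():
    return list(_PLATFORMS)


def smart_sync(platform, **kwargs):
    # Check the passed platform name and instantiate the proper class.
    if platform not in _PLATFORMS:
        raise ValueError(f"Unsupported platform: {platform!r}")
    return _PLATFORMS[platform](**kwargs)
```

An unknown platform name fails fast with a ValueError rather than silently doing nothing, which matches the "reference a list of supported platforms" behavior described above.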

All platform classes should be children of the DataPlatformConnection class, which is an interface with all necessary public functions. For S3, a folder named .S3 will be created within your data folder. This .S3 folder will contain CSVs used for monitoring data changes and text files for storing small bits of information.

  • versions_remote.csv: Contains the state of data stored remotely
  • versions_local.csv: Contains the state of data stored locally
  • deleted_remote.csv: Contains all files deleted remotely
  • deleted_local.csv: Contains all files deleted locally
  • ignore_remote.txt: Contains a list of file paths to be ignored entirely

Using these CSVs, S3Synchrony can determine which files you have newly created, deleted, or modified. It will then prompt you to upload these changes to S3 and, once you have done so, upload new CSVs as needed. After downloading those new CSVs, your collaborators will be prompted to download your changes as well as upload their own.
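One way the version CSVs could drive this change detection is sketched below. The two-column path/checksum schema and the comparison logic are assumptions for illustration, not the package's actual file format:

```python
import csv
import io


def load_versions(csv_text):
    """Parse a versions CSV into {path: checksum} (assumed two-column schema)."""
    return {row["path"]: row["checksum"]
            for row in csv.DictReader(io.StringIO(csv_text))}


def diff_versions(local, remote):
    """Classify files as new, modified, or deleted relative to the remote state."""
    new = [p for p in local if p not in remote]
    modified = [p for p in local if p in remote and local[p] != remote[p]]
    deleted = [p for p in remote if p not in local]
    return new, modified, deleted
```

For example, a file present only in versions_local.csv is reported as new, a file whose checksum differs between the two CSVs as modified, and a file present only in versions_remote.csv as deleted locally.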

In addition, a tmp folder will be utilized within the .S3 folder. This tmp folder contains files downloaded from S3 that are used to compute certain CSVs.

Deletions

When deleting files, the user will be prompted to confirm the deletions. Files deleted locally are simply removed. Files deleted from S3, however, are moved into a "deleted" subfolder of the .S3 folder on S3.

reset_all

Resetting all S3Synchrony services is as simple as deleting the .S3 folders contained locally and on S3. Once these are deleted, synchronization cannot occur until they are recreated, which can be done by simply making a new call to S3Synchrony.

Before resetting, however, a call to reset_confirm must occur. The user will then be prompted to confirm that they would like their .S3 folders removed.
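The confirm-then-delete flow for the local side could be sketched as follows; the function signatures are illustrative assumptions (the remote .S3 folder would additionally need an S3 client to remove, which is omitted here):

```python
import shutil
from pathlib import Path


def reset_confirm(prompt=input):
    """Ask the user to confirm that their .S3 folders should be removed."""
    return prompt("Remove local and remote .S3 folders? [y/n] ").strip().lower() == "y"


def reset_all(data_path, confirm=reset_confirm):
    """Delete the local .S3 folder after confirmation.

    Synchronization cannot occur again until the folder is recreated,
    which happens on the next S3Synchrony call.
    """
    if not confirm():
        return False
    shutil.rmtree(Path(data_path) / ".S3", ignore_errors=True)
    return True
```

Passing the confirmation step as a callable keeps the destructive operation testable without an interactive prompt.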

License

GNU GPLv3

Author(s)

Created by
Sevan Brodjian - Ameren Innovation Center Intern

Modified by
James Kabbes - Data Scientist: Ameren Innovation Center

Project details


Download files

Download the file for your platform.

Source Distribution

kabbes_s3synchrony-0.4.0.tar.gz (23.1 kB)

Built Distribution


kabbes_s3synchrony-0.4.0-py3-none-any.whl (26.0 kB)

File details

Details for the file kabbes_s3synchrony-0.4.0.tar.gz.

File metadata

  • Download URL: kabbes_s3synchrony-0.4.0.tar.gz
  • Size: 23.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.26.0 requests-toolbelt/0.9.1 urllib3/1.26.7 tqdm/4.63.1 importlib-metadata/4.8.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.7

File hashes

Hashes for kabbes_s3synchrony-0.4.0.tar.gz

  • SHA256: 892958c3f3a38c773038781c1b1d9232b07fc043f7a75c723288fcf802e5c314
  • MD5: e1de1c0ce43c6be14dd4f15cd77b6d23
  • BLAKE2b-256: 12477126e5d53c38288146aff7725fd80e2bdfc3122e6a589d3e92197d8ed96e


File details

Details for the file kabbes_s3synchrony-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: kabbes_s3synchrony-0.4.0-py3-none-any.whl
  • Size: 26.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.26.0 requests-toolbelt/0.9.1 urllib3/1.26.7 tqdm/4.63.1 importlib-metadata/4.8.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.7

File hashes

Hashes for kabbes_s3synchrony-0.4.0-py3-none-any.whl

  • SHA256: f7c8f889121284fda33fe1a2389468a91edfd7ce59c3b09068750e1e48995891
  • MD5: b75748974120968f7fdb80b6f5504d1b
  • BLAKE2b-256: 29f432fc2382684e184dc0018b1551f5a39be076ced67435dc714c320ab3171e

