Skip to main content

A tool to sync data across data sources

Project description

pyFission

Description

pyFission syncs tables/schemas across databases as defined in the configs/fission.py file. If src_table and src_schema args are not provided, it will sync all tables and schemas specified in the fission.py config file by summoning multiple bash commands in parallel.

The pyfission module has 2 methods of syncing tables -

  1. full: truncate + full sync of table, as name suggests
  2. incremental: syncs changes based on primary key and replication key provided in fission.py config file

pyFission automatically detects and builds the schema definition for tables, including nested and repeated fields for JSON-like schemas e.g:- BigQuery. Also, pyFission patches tables automatically if any new fields are added.

Getting started

Usage

  • Modify configs/custom_config.py and configs/fission.py with DB creds and sync config respectively
  • Add service account private key files/secrets to secrets_storage dir if needed
$ python -m pyfission --src [] --dest [] sync --help
usage: __main__.py pyfission [-h] [--src_table SRC_TABLE]
                          [--src_schema SRC_SCHEMA] [--src_db SRC_DB]
                          [--dest_table DEST_TABLE]
                          [--dest_schema DEST_SCHEMA] [--dest_db DEST_DB]
                          [--method {full,incremental}]
                          [--out_format {csv,json}]

optional arguments:
  -h, --help            show this help message and exit
  --src_table SRC_TABLE
                        Overrides table definition from pyfission configs
  --src_schema SRC_SCHEMA
                        Overrides schema definition from pyfission configs
  --src_db SRC_DB       Overrides database definition from pyfission configs
  --dest_table DEST_TABLE
                        Overrides table definition from pyfission configs
  --dest_schema DEST_SCHEMA
                        Overrides schema definition from pyfission configs
  --dest_db DEST_DB     Overrides database definition from pyfission configs
  --method {full,incremental}
                        Method of Replication
  --out_format {csv,json}
                        Format of output files
  • Recommended out_format:

    • json for BigQuery
    • csv for others
  • To sync a particular table - specify the src_schema and src_table args

    • dest_table will have same name as src_table unless explicitly specified
    • dest_schema will be set from fission.py config file's fission_dest dict
    • src_db and dest_db will derived from src and dest
python -m pyfission --src [] --dest [] sync --src_table [] --src_schema [] --out_format []
  • To sync an entire DB - only provide src and dest args
python -m pyfission --src [] --dest [] sync --out_format []

Installation

pip install pyfission

Contribution/Local installation

  1. Clone the repo
$ git clone https://github.com/nishantnath/pyfission.git
  1. Change the working directory to fission
$ cd pyfission
  1. Install python3 and the requirements if they don't exist
$ pip install -r requirements.txt

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyFission-1.0.2.tar.gz (24.1 kB view hashes)

Uploaded Source

Built Distribution

pyFission-1.0.2-py3-none-any.whl (33.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page