Tools for working with Redshift Spectrum.
Spectrify
A simple yet powerful tool to move your data from Redshift to Redshift Spectrum.
Free software: MIT license
Documentation: https://spectrify.readthedocs.io.
Features
One-liners to:
Export a Redshift table to S3 (CSV)
Convert exported CSVs to Parquet files in parallel
Create the Spectrum table on your Redshift cluster
Perform all 3 steps in sequence, essentially “copying” a Redshift table to Spectrum in one command.
S3 credentials are picked up by boto3 from its standard configuration sources. See http://boto3.readthedocs.io/en/latest/guide/configuration.html
Redshift credentials are supplied via environment variables, command-line parameters, or interactive prompt.
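A fallback across those three Redshift credential sources might look like the following. This is a minimal sketch, not Spectrify's implementation. The precedence order (command-line value, then environment variable, then prompt) and the variable name REDSHIFT_PASSWORD are assumptions for illustration.

```python
import getpass
import os
from typing import Callable, Mapping, Optional


def resolve_redshift_password(
    cli_value: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    env_var: str = "REDSHIFT_PASSWORD",  # hypothetical variable name, not Spectrify's
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return a password from the CLI, the environment, or an interactive prompt.

    The precedence order is an assumption for illustration.
    """
    # 1. An explicit command-line parameter wins.
    if cli_value:
        return cli_value
    # 2. Otherwise fall back to an environment variable.
    from_env = env.get(env_var)
    if from_env:
        return from_env
    # 3. As a last resort, ask interactively (getpass does not echo the input).
    return prompt("Redshift password: ")
```

The env and prompt parameters are injectable, so the fallback can be exercised without a real terminal or environment.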
Install
$ pip install spectrify
Command-line Usage
Export Redshift table my_table to a folder of CSV files on S3:
$ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb export my_table \
's3://example-bucket/my_table'
Convert exported CSVs to Parquet:
$ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb convert my_table \
's3://example-bucket/my_table'
Create Spectrum table from S3 folder:
$ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb create_table \
's3://example-bucket/my_table' my_table my_spectrum_table
Transform Redshift table by performing all 3 steps in sequence:
$ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb transform my_table \
's3://example-bucket/my_table'
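Because each step is a single command, the steps are easy to drive from a script. The sketch below builds the argument lists in Python, for example to pass to subprocess.run. spectrify_command is a hypothetical helper, and it uses only the --host, --user and --db options shown in the examples above.

```python
import shlex
from typing import List


def spectrify_command(subcommand: str, *args: str, host: str, user: str, db: str) -> List[str]:
    """Build an argv list for one spectrify subcommand (hypothetical helper)."""
    return ["spectrify", f"--host={host}", f"--user={user}", f"--db={db}", subcommand, *args]


conn = dict(host="example-url.redshift.aws.com", user="myuser", db="mydb")
s3_path = "s3://example-bucket/my_table"

# export -> convert -> create_table, the same sequence described for transform.
steps = [
    spectrify_command("export", "my_table", s3_path, **conn),
    spectrify_command("convert", "my_table", s3_path, **conn),
    spectrify_command("create_table", s3_path, "my_table", "my_spectrum_table", **conn),
]

for argv in steps:
    # Print a shell-ready line; to run it instead, use subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Each invocation is a separate process that needs its own Redshift credentials. Supplying the password through an environment variable rather than the interactive prompt avoids being asked for it three times.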
Python Usage
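Export to S3 (sa_engine is a SQLAlchemy engine connected to your Redshift cluster, and s3_config describes the S3 locations Spectrify reads from and writes to):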
from spectrify.export import RedshiftDataExporter
RedshiftDataExporter(sa_engine, s3_config).export_to_csv('my_table')
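The converter's name suggests that it works from a manifest listing the exported files. Assuming the export step writes a standard Redshift UNLOAD ... MANIFEST file (a JSON document whose entries list holds one url per file), the sketch below reads such a manifest with the standard library. It then splits each URL into the bucket and key that boto3's S3 calls expect. manifest_urls and split_s3_url are hypothetical helpers, not part of Spectrify, and the manifest contents are made up.

```python
import json
from typing import List, Tuple
from urllib.parse import urlparse


def manifest_urls(manifest_text: str) -> List[str]:
    """Return the S3 URLs listed in a Redshift UNLOAD manifest."""
    return [entry["url"] for entry in json.loads(manifest_text)["entries"]]


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URL: {url}")
    return parsed.netloc, parsed.path.lstrip("/")


# Illustrative manifest in the shape UNLOAD ... MANIFEST writes (file names made up).
example = """{"entries": [
  {"url": "s3://example-bucket/my_table/0000_part_00"},
  {"url": "s3://example-bucket/my_table/0001_part_00"}
]}"""

for url in manifest_urls(example):
    print(split_s3_url(url))
```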
Convert exported CSVs to Parquet:
from spectrify.convert import ConcurrentManifestConverter
from spectrify.utils.schema import SqlAlchemySchemaReader
sa_table = SqlAlchemySchemaReader(sa_engine).get_table_schema('my_table')
ConcurrentManifestConverter(sa_table, s3_config).convert_manifest()
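Conceptually, conversion fans the exported files out to a pool of workers. Each worker pivots CSV rows into columns, which is the layout Parquet stores on disk. Judging by the history below, Spectrify does this with a multiprocessing pool and PyArrow. The sketch below substitutes a thread pool and a plain-Python pivot so that it runs with the standard library alone. csv_to_columns and convert_all are hypothetical names.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def csv_to_columns(csv_text: str, column_names: List[str]) -> Dict[str, List[str]]:
    """Pivot CSV rows into one list per column (the columnar layout Parquet uses)."""
    columns: Dict[str, List[str]] = {name: [] for name in column_names}
    for row in csv.reader(io.StringIO(csv_text)):
        for name, value in zip(column_names, row):
            columns[name].append(value)
    return columns


def convert_all(csv_files: List[str], column_names: List[str], workers: int = 4) -> List[Dict[str, List[str]]]:
    """Convert every file concurrently; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda text: csv_to_columns(text, column_names), csv_files))


# Two made-up CSV parts standing in for exported files.
parts = ["1,alice\n2,bob\n", "3,carol\n"]
print(convert_all(parts, ["id", "name"]))
```

A real converter would also cast each column to its Redshift type before writing Parquet. The history mentions DATE and DECIMAL support, for example. In this sketch every value stays a string.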
Create Spectrum table from S3 parquet folder:
from spectrify.create import SpectrumTableCreator
from spectrify.utils.schema import SqlAlchemySchemaReader
sa_table = SqlAlchemySchemaReader(sa_engine).get_table_schema('my_table')
SpectrumTableCreator(sa_engine, dest_schema, dest_table_name, sa_table, s3_config).create()
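A Spectrum table is defined with Redshift's CREATE EXTERNAL TABLE statement, which points at the Parquet folder. The sketch below shows one rough way to build that DDL from a list of column names and types. external_table_ddl is a hypothetical helper, the S3 location is illustrative, and Spectrify's own statement may differ. Redshift creates external tables inside an external schema (made with CREATE EXTERNAL SCHEMA), so dest_schema presumably has to be one.

```python
from typing import List, Tuple


def external_table_ddl(schema: str, table: str, columns: List[Tuple[str, str]], location: str) -> str:
    """Build a Redshift Spectrum CREATE EXTERNAL TABLE statement for Parquet data.

    `columns` is a list of (name, Redshift type) pairs. No identifier escaping
    is done, so pass trusted names only.
    """
    cols = ",\n".join(f"    {name} {type_}" for name, type_ in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        f"{cols}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}';"
    )


ddl = external_table_ddl(
    "spectrum",
    "my_spectrum_table",
    [("id", "INTEGER"), ("created", "DATE"), ("amount", "DECIMAL(12,2)")],
    "s3://example-bucket/my_table/parquet/",  # illustrative path
)
print(ddl)
```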
Transform Redshift table by performing all 3 steps in sequence:
from spectrify.transform import TableTransformer
transformer = TableTransformer(sa_engine, 'my_table', s3_config, dest_schema, dest_table_name)
transformer.transform()
Contribute
Contributions are always welcome! Read our contributing guide here: http://spectrify.readthedocs.io/en/latest/contributing.html
License
MIT License. Copyright (c) 2017, The Narrativ Company, Inc.
History
1.0.1 (2018-07-12)
Loosen version requirement for PyArrow
Add example script
Update documentation
1.0.0 (2018-04-20)
Move functionality into classes to make customizing behavior easier
Add support for DATE columns
Add support for DECIMAL/NUMERIC columns
Upgrade to pyarrow v0.9.0
0.4.1 (2018-03-25)
Fix exception when source table is not in the public schema
0.4.0 (2018-02-25)
Upgrade to pyarrow v0.8.0
Verify Redshift column types are supported before attempting conversion
Bugfix: Properly clean up multiprocessing.pool resource
0.3.0 (2017-10-30)
Support 16- and 32-bit integers
Packaging updates
0.2.1 (2017-09-27)
Fix Readme
0.2.0 (2017-09-27)
First release on PyPI.
0.1.0 (2017-09-13)
Didn’t even make it to PyPI.