spectrify

Tools for working with Redshift Spectrum.

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Natural Language
- English

Project description

Spectrify

A simple yet powerful tool to move your data from Redshift to Redshift Spectrum.

Free software: MIT license
Documentation: https://spectrify.readthedocs.io.

Features

One-liners to:

Export a Redshift table to S3 (CSV)
Convert exported CSVs to Parquet files in parallel
Create the Spectrum table on your Redshift cluster
Perform all 3 steps in sequence, essentially “copying” a Redshift table to Spectrum in one command.

S3 credentials are specified using boto3. See http://boto3.readthedocs.io/en/latest/guide/configuration.html
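As a quick sanity check before running an export, the sketch below reports which of the standard boto3 credential environment variables are set. It is only a partial check: boto3 also reads shared credential files and instance roles, so a missing variable does not necessarily mean credentials are unavailable. The helper name `boto3_env_status` is made up for this illustration and is not part of spectrify or boto3.

```python
import os

# Environment variables that boto3 consults when resolving credentials.
BOTO3_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",  # only needed for temporary credentials
)


def boto3_env_status(environ=None):
    """Map each variable name to True if it is set and non-empty."""
    environ = os.environ if environ is None else environ
    return {name: bool(environ.get(name)) for name in BOTO3_ENV_VARS}


if __name__ == "__main__":
    for name, is_set in boto3_env_status().items():
        print("{}: {}".format(name, "set" if is_set else "not set"))
```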

Redshift credentials are supplied via environment variables, command-line parameters, or interactive prompt.
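A minimal sketch of how the three credential sources could be combined, assuming the order command-line value, then environment variable, then interactive prompt. The precedence, the function name `resolve_setting`, and the variable name `REDSHIFT_PASSWORD` are assumptions made for illustration; check the documentation for the names spectrify actually reads.

```python
import getpass
import os


def resolve_setting(cli_value, env_name, prompt_text, environ=None,
                    prompt=getpass.getpass):
    """Pick a value from the CLI, then the environment, then a prompt."""
    environ = os.environ if environ is None else environ
    if cli_value:
        return cli_value
    env_value = environ.get(env_name)
    if env_value:
        return env_value
    # Fall back to asking the user; getpass hides the typed input.
    return prompt(prompt_text)


# Example (hypothetical variable name):
# password = resolve_setting(None, "REDSHIFT_PASSWORD", "Redshift password: ")
```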

Install

$ pip install psycopg2  # or psycopg2-binary
$ pip install spectrify

Command-line Usage

Export Redshift table my_table to a folder of CSV files on S3:

$ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb export my_table \
    's3://example-bucket/my_table'
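The changelog refers to UNLOAD operations, so the export step presumably issues a Redshift UNLOAD statement. The sketch below builds a comparable statement by hand to show roughly what such an export looks like. It is not spectrify's actual SQL: the function name, the IAM_ROLE authorization, and the chosen options are assumptions, and the inputs are interpolated directly, so they must come from a trusted source.

```python
def build_unload_sql(table, s3_prefix, iam_role, region=None):
    """Build an UNLOAD statement that writes `table` under `s3_prefix`."""
    # Single quotes inside the quoted UNLOAD query must be doubled.
    query = "SELECT * FROM {}".format(table).replace("'", "''")
    parts = [
        "UNLOAD ('{}')".format(query),
        "TO '{}'".format(s3_prefix),
        "IAM_ROLE '{}'".format(iam_role),
        "DELIMITER ','",
        "ESCAPE",
        "MANIFEST",  # also write a manifest listing the output files
    ]
    if region:
        # Needed when the bucket is in a different region than the cluster.
        parts.append("REGION '{}'".format(region))
    return "\n".join(parts)


print(build_unload_sql(
    "my_table",
    "s3://example-bucket/my_table/",
    "arn:aws:iam::123456789012:role/example-role",  # placeholder ARN
))
```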

Convert exported CSVs to Parquet:

$ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb convert my_table \
    's3://example-bucket/my_table'

Create Spectrum table from S3 folder:

$ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb create_table \
    's3://example-bucket/my_table' my_table my_spectrum_table

Transform Redshift table by performing all 3 steps in sequence:

$ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb transform my_table \
    's3://example-bucket/my_table'
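When the commands above need to be driven from a script, one option is to assemble the argument list in Python and pass it to `subprocess`. The sketch below uses only the options and subcommands shown in this section; the helper name `spectrify_argv` is made up for illustration.

```python
import shlex


def spectrify_argv(host, user, db, subcommand, *args):
    """Return the argument list for one spectrify invocation."""
    return [
        "spectrify",
        "--host={}".format(host),
        "--user={}".format(user),
        "--db={}".format(db),
        subcommand,
    ] + list(args)


argv = spectrify_argv(
    "example-url.redshift.aws.com", "myuser", "mydb",
    "transform", "my_table", "s3://example-bucket/my_table",
)
# Print a shell-safe rendering of the command for logging.
print(" ".join(shlex.quote(arg) for arg in argv))
# To run it for real: subprocess.run(argv, check=True)
```

Passing a list rather than a single string avoids shell quoting problems with table names or S3 paths.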

Python Usage

Export to S3:

from spectrify.export import RedshiftDataExporter
# sa_engine (a SQLAlchemy engine) and s3_config must be created beforehand
RedshiftDataExporter(sa_engine, s3_config).export_to_csv('my_table')
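The snippets in this section take an engine object without showing how to build one. Assuming it is a SQLAlchemy engine connected through psycopg2, the sketch below builds a suitable connection URL. The helper name `redshift_url` is hypothetical, and 5439 is Redshift's default port; adjust it if your cluster differs.

```python
from urllib.parse import quote_plus


def redshift_url(user, password, host, db, port=5439):
    """Build a SQLAlchemy-style URL for a Redshift cluster via psycopg2."""
    # Percent-encode user and password so characters such as '@' or '/'
    # do not break URL parsing.
    return "postgresql+psycopg2://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, db
    )


url = redshift_url("myuser", "p@ss/word", "example-url.redshift.aws.com", "mydb")
# With SQLAlchemy installed:
#   import sqlalchemy
#   sa_engine = sqlalchemy.create_engine(url)
```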

Convert exported CSVs to Parquet:

from spectrify.convert import ConcurrentManifestConverter
from spectrify.utils.schema import SqlAlchemySchemaReader
sa_table = SqlAlchemySchemaReader(sa_engine).get_table_schema('my_table')
ConcurrentManifestConverter(sa_table, s3_config).convert_manifest()
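To illustrate the general idea of converting the files listed in a manifest concurrently, here is a stdlib-only sketch. It is not how `ConcurrentManifestConverter` is implemented: it uses a thread pool, the function names are invented, and `parquet_name` is a placeholder that only derives an output name instead of performing a real CSV-to-Parquet conversion. The manifest layout (an `entries` list of objects with a `url` key) follows the format Redshift UNLOAD writes.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def manifest_urls(manifest_text):
    """Extract the file URLs from an UNLOAD manifest document."""
    return [entry["url"] for entry in json.loads(manifest_text)["entries"]]


def parquet_name(csv_url):
    """Placeholder for the real conversion: derive the output name only."""
    return csv_url + ".parquet"


def convert_all(manifest_text, convert=parquet_name, workers=4):
    """Apply `convert` to every file in the manifest, several at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in manifest order.
        return list(pool.map(convert, manifest_urls(manifest_text)))


manifest = json.dumps({"entries": [
    {"url": "s3://example-bucket/my_table/0000_part_00"},
    {"url": "s3://example-bucket/my_table/0001_part_00"},
]})
print(convert_all(manifest))
```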

Create Spectrum table from S3 parquet folder:

from spectrify.create import SpectrumTableCreator
from spectrify.utils.schema import SqlAlchemySchemaReader
sa_table = SqlAlchemySchemaReader(sa_engine).get_table_schema('my_table')
SpectrumTableCreator(sa_engine, dest_schema, dest_table_name, sa_table, s3_config).create()
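For orientation, a Spectrum table over Parquet files is declared with a `CREATE EXTERNAL TABLE` statement. The sketch below assembles such a statement from a list of column definitions. It is an illustration and not the SQL that `SpectrumTableCreator` emits: the function name, the schema name `spectrum`, and the columns are hypothetical, and the external schema must already exist on the cluster.

```python
def create_external_table_sql(schema, table, columns, s3_location):
    """Build CREATE EXTERNAL TABLE DDL for Parquet files under `s3_location`."""
    column_lines = ",\n".join(
        "    {} {}".format(name, sql_type) for name, sql_type in columns
    )
    return (
        "CREATE EXTERNAL TABLE {}.{} (\n{}\n)\n"
        "STORED AS PARQUET\n"
        "LOCATION '{}'"
    ).format(schema, table, column_lines, s3_location)


ddl = create_external_table_sql(
    "spectrum", "my_spectrum_table",
    [("id", "INTEGER"), ("created", "DATE"), ("amount", "DECIMAL(12,2)")],
    "s3://example-bucket/my_table/",
)
print(ddl)
```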

Transform Redshift table by performing all 3 steps in sequence:

from spectrify.transform import TableTransformer
transformer = TableTransformer(sa_engine, 'my_table', s3_config, dest_schema, dest_table_name)
transformer.transform()

Contribute

Contributions are always welcome! Read our guide on contributing here: http://spectrify.readthedocs.io/en/latest/contributing.html

License

History

3.1.0 (2020-01-18)

Remove psycopg2 requirement (allows use of either psycopg2 or psycopg2-binary)

3.0.1 (2019-11-26)

Fix changelog

3.0.0 (2019-11-26)

Backwards incompatible changes:

Add REGION parameter to UNLOAD operations
Bugfix: Correctly construct path for S3 bucket in “create-table” command

Other Changes:

Support for obtaining credentials with AWS session token
Upgrade to pytest v4.6.6
Fix Flake8 errors

2.0.0 (2019-03-09)

Default to 256MB files
Flag for unicode support on Python 2.7 (performance implications)
Drop support for Python 3.4
Support for additional CSV format parameters
Support for REAL data type

1.0.1 (2018-07-12)

Loosen version requirement for PyArrow
Add example script
Update documentation

1.0.0 (2018-04-20)

Move functionality into classes to make customizing behavior easier
Add support for DATE columns
Add support for DECIMAL/NUMERIC columns
Upgrade to pyarrow v0.9.0

0.4.1 (2018-03-25)

Fix exception when source table is not in schema public

0.4.0 (2018-02-25)

Upgrade to pyarrow v0.8.0
Verify Redshift column types are supported before attempting conversion
Bugfix: Properly clean up multiprocessing.pool resource

0.3.0 (2017-10-30)

Support 16- and 32-bit integers
Packaging updates

0.2.1 (2017-09-27)

Fix Readme

0.2.0 (2017-09-27)

First release on PyPI.

0.1.0 (2017-09-13)

Didn’t even make it to PyPI.

Release history

3.1.0 (Jan 18, 2020), this version
3.0.1 (Nov 26, 2019)
3.0.0 (Nov 26, 2019)
2.0.0 (Mar 9, 2019)
1.0.1 (Jul 12, 2018)
1.0.0 (Apr 20, 2018)
0.4.1 (Mar 25, 2018)
0.4.0 (Feb 25, 2018)
0.3.0 (Oct 31, 2017)
0.2.1 (Sep 27, 2017)
0.2.0 (Sep 27, 2017)

Download files

Source Distribution

spectrify-3.1.0.tar.gz (28.9 kB)

Uploaded Jan 18, 2020 Source

Built Distribution

spectrify-3.1.0-py2.py3-none-any.whl (17.7 kB)

Uploaded Jan 18, 2020 Python 2, Python 3

File details

Details for the file spectrify-3.1.0.tar.gz.

File metadata

Download URL: spectrify-3.1.0.tar.gz
Upload date: Jan 18, 2020
Size: 28.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/2.7.15

File hashes

Hashes for spectrify-3.1.0.tar.gz
Algorithm	Hash digest
SHA256	`e7121d745fdea8d5a4941664b425a4da08d52f63c7247d3aa364e2ecb98050cf`
MD5	`51602c46a20f3e1771f2a536efbe261e`
BLAKE2b-256	`826049d6f91b7568d60b182d246dadac4b10fdab12a09fafe2b8debbb093b8fc`

File details

Details for the file spectrify-3.1.0-py2.py3-none-any.whl.

File metadata

Download URL: spectrify-3.1.0-py2.py3-none-any.whl
Upload date: Jan 18, 2020
Size: 17.7 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/2.7.15

File hashes

Hashes for spectrify-3.1.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`26d6631dd5d81499da0630159353ec1d19bb797cb1773a8318b410c864e31fd9`
MD5	`1270f66ac9d1bd6f9ed6f91bbe0ff074`
BLAKE2b-256	`0cd37de2a0b55f10ae2cb469ca347e90f6382773f67108982769081cfec2d435`
