Spectrify

Tools for working with Redshift Spectrum.


A simple yet powerful tool to move your data from Redshift to Redshift Spectrum.

Features

One-liners to:

  • Export a Redshift table to S3 (CSV)

  • Convert exported CSVs to Parquet files in parallel

  • Create the Spectrum table on your Redshift cluster

  • Perform all 3 steps in sequence, essentially “copying” a Redshift table to Spectrum in one command.

S3 credentials are specified using boto3. See http://boto3.readthedocs.io/en/latest/guide/configuration.html
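
In practice this means the standard boto3 credential chain applies: environment variables (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY), a ~/.aws/credentials profile, or an instance role. A minimal sketch for checking which credentials and region boto3 (and therefore Spectrify's S3 operations) will pick up:

import boto3

# boto3 resolves credentials from env vars, ~/.aws/credentials, or an instance role;
# whichever source wins here is what Spectrify will use for S3 access.
session = boto3.session.Session()
creds = session.get_credentials()
print(creds.access_key if creds else 'No AWS credentials found')
print(session.region_name)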

Redshift credentials are supplied via environment variables, command-line parameters, or interactive prompt.

Install

$ pip install psycopg2  # or psycopg2-binary
$ pip install spectrify

Command-line Usage

Export Redshift table my_table to a folder of CSV files on S3:

$ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb export my_table \
    's3://example-bucket/my_table'

Convert exported CSVs to Parquet:

$ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb convert my_table \
    's3://example-bucket/my_table'

Create Spectrum table from an S3 folder of Parquet files:

$ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb create_table \
    's3://example-bucket/my_table' my_table my_spectrum_table

Transform Redshift table by performing all 3 steps in sequence:

$ spectrify --host=example-url.redshift.aws.com --user=myuser --db=mydb transform my_table \
    's3://example-bucket/my_table'

Python Usage
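
The snippets below assume a SQLAlchemy engine (sa_engine), an S3 configuration object (s3_config), and a destination schema and table name for the Spectrum table have already been set up. A minimal setup sketch follows; the SimpleS3Config helper mirrors the project's example script, so treat the exact import and constructor as an assumption and check the documentation for your installed version.

from sqlalchemy import create_engine
from spectrify.utils.s3 import SimpleS3Config  # assumed helper, as used in the project's example script

# Redshift connection via SQLAlchemy + psycopg2; substitute real credentials.
# (The redshift+psycopg2 dialect from sqlalchemy-redshift also works if installed.)
sa_engine = create_engine(
    'postgresql+psycopg2://myuser:mypassword@example-url.redshift.aws.com:5439/mydb'
)

# S3 locations for the exported CSVs and the converted Parquet files (placeholder paths).
csv_path = 's3://example-bucket/my_table/csv'
spectrum_path = 's3://example-bucket/my_table/spectrum'
s3_config = SimpleS3Config(csv_path, spectrum_path)

# Destination for the external (Spectrum) table; placeholder names.
dest_schema = 'spectrum'
dest_table_name = 'my_spectrum_table'
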

Export to S3:

from spectrify.export import RedshiftDataExporter
# Exports via Redshift UNLOAD to the CSV path in s3_config
RedshiftDataExporter(sa_engine, s3_config).export_to_csv('my_table')

Convert exported CSVs to Parquet:

from spectrify.convert import ConcurrentManifestConverter
from spectrify.utils.schema import SqlAlchemySchemaReader
# Read the source table's column types, then convert the exported CSVs to Parquet in parallel
sa_table = SqlAlchemySchemaReader(sa_engine).get_table_schema('my_table')
ConcurrentManifestConverter(sa_table, s3_config).convert_manifest()

Create Spectrum table from an S3 Parquet folder:

from spectrify.create import SpectrumTableCreator
from spectrify.utils.schema import SqlAlchemySchemaReader
# Define the external (Spectrum) table over the Parquet files referenced by s3_config
sa_table = SqlAlchemySchemaReader(sa_engine).get_table_schema('my_table')
SpectrumTableCreator(sa_engine, dest_schema, dest_table_name, sa_table, s3_config).create()

Transform Redshift table by performing all 3 steps in sequence:

from spectrify.transform import TableTransformer
# Export, convert, and create the Spectrum table in one call
transformer = TableTransformer(sa_engine, 'my_table', s3_config, dest_schema, dest_table_name)
transformer.transform()
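
Once transform() finishes, the external table is queryable from Redshift like any other table. A quick check, using the placeholder names from the setup sketch above:

from sqlalchemy import text

# Count rows in the newly created Spectrum table (schema/table names are the placeholders above)
with sa_engine.connect() as conn:
    row_count = conn.execute(text('SELECT COUNT(*) FROM spectrum.my_spectrum_table')).scalar()
print(row_count)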

Contribute

Contributions are always welcome! Read our guide on contributing here: http://spectrify.readthedocs.io/en/latest/contributing.html

License

MIT License. Copyright (c) 2017, The Narrativ Company, Inc.

History

3.1.0 (2020-01-18)

  • Remove psycopg2 requirement (allows use of either psycopg2 or psycopg2-binary)

3.0.1 (2019-11-26)

  • Fix changelog

3.0.0 (2019-11-26)

Backwards incompatible changes:

  • Add REGION parameter to UNLOAD operations

  • Bugfix: Correctly construct path for S3 bucket in “create-table” command

Other Changes:

  • Support for obtaining credentials with AWS session token

  • Upgrade to pytest v4.6.6

  • Fix Flake8 errors

2.0.0 (2019-03-09)

  • Default to 256MB files

  • Flag for unicode support on Python 2.7 (performance implications)

  • Drop support for Python 3.4

  • Support for additional CSV format parameters

  • Support for REAL data type

1.0.1 (2018-07-12)

  • Loosen version requirement for PyArrow

  • Add example script

  • Update documentation

1.0.0 (2018-04-20)

  • Move functionality into classes to make customizing behavior easier

  • Add support for DATE columns

  • Add support for DECIMAL/NUMERIC columns

  • Upgrade to pyarrow v0.9.0

0.4.1 (2018-03-25)

  • Fix exception when source table is not in schema public

0.4.0 (2018-02-25)

  • Upgrade to pyarrow v0.8.0

  • Verify Redshift column types are supported before attempting conversion

  • Bugfix: Properly clean up multiprocessing.pool resource

0.3.0 (2017-10-30)

  • Support 16- and 32-bit integers

  • Packaging updates

0.2.1 (2017-09-27)

  • Fix Readme

0.2.0 (2017-09-27)

  • First release on PyPI.

0.1.0 (2017-09-13)

  • Didn’t even make it to PyPI.
