Auto-generate Redshift schemas from flat files

Redshift Auto Schema

Redshift Auto Schema is a Python library that parses a delimited flat file or Parquet file and provides methods for creating and validating the corresponding tables in Amazon Redshift. For each field, the appropriate Redshift data type is inferred from the contents of the file.
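To make the idea of type inference concrete, here is a minimal sketch of how a Redshift type could be guessed from the string values of a delimited file. It is not the library's implementation: the function names `infer_redshift_type` and `infer_columns`, the type-precedence order, and the fallback `VARCHAR(256)` are all assumptions chosen for illustration.

```python
import csv
import io
from datetime import datetime


def infer_redshift_type(values):
    """Guess a Redshift column type from an iterable of raw string values.

    Hypothetical helper for illustration; the library's own rules may differ.
    """
    non_empty = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not non_empty:
        return "VARCHAR(256)"  # nothing to go on; assumed default width

    def all_match(parser):
        # True when every value can be parsed without raising ValueError.
        for v in non_empty:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all_match(int):
        biggest = max(abs(int(v)) for v in non_empty)
        # INTEGER is 32-bit; anything larger is promoted to BIGINT.
        # Values beyond the 64-bit range are not handled in this sketch.
        return "INTEGER" if biggest < 2**31 else "BIGINT"
    if all_match(float):
        return "DOUBLE PRECISION"
    if all_match(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "DATE"
    # Redshift VARCHAR lengths are measured in bytes, not characters.
    longest = max(len(v.encode("utf-8")) for v in non_empty)
    return f"VARCHAR({longest})"


def infer_columns(text, delimiter=","):
    """Map each header name of a delimited text to an inferred Redshift type."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows = list(reader)
    return {
        name: infer_redshift_type(row[i] for row in rows if i < len(row))
        for i, name in enumerate(header)
    }


sample = "id,price,signup_date,name\n1,9.99,2020-01-31,Bob\n2,12.50,2020-02-01,Alice\n"
print(infer_columns(sample))
```

Trying the narrowest type first (integer, then float, then date) and falling back to a sized VARCHAR keeps the guess conservative. A real implementation would also need to handle timestamps, booleans, and Parquet's native types.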

Installation

Use the package manager pip to install Redshift Auto Schema.

pip install redshift-auto-schema

Usage

from redshift_auto_schema import RedshiftAutoSchema
import psycopg2 as pg

# Pass your cluster's connection parameters here (host, port, dbname, user, password).
redshift_conn = pg.connect()

new_table = RedshiftAutoSchema(file='sample_file.parquet',
                               schema='test_schema',
                               table='test_table',
                               conn=redshift_conn)

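if not new_table.check_table_existence():
    ddl = new_table.generate_table_ddl()

    with redshift_conn.cursor() as redshift_cursor:
        redshift_cursor.execute(ddl)

    # psycopg2 opens a transaction implicitly; commit so the new table persists.
    redshift_conn.commit()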
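The usage example calls `pg.connect()` without arguments. One way to supply connection settings is sketched below, reading them from environment variables. The helper `redshift_connect_kwargs` and the `REDSHIFT_*` variable names are hypothetical; only the keyword names (`host`, `port`, `dbname`, `user`, `password`) are standard `psycopg2.connect()` parameters.

```python
import os


def redshift_connect_kwargs(env=os.environ):
    """Build keyword arguments for psycopg2.connect() from environment variables.

    The REDSHIFT_* variable names are an assumption made for this sketch.
    """
    return {
        "host": env["REDSHIFT_HOST"],
        "port": int(env.get("REDSHIFT_PORT", "5439")),  # 5439 is Redshift's default port
        "dbname": env["REDSHIFT_DB"],
        "user": env["REDSHIFT_USER"],
        "password": env["REDSHIFT_PASSWORD"],
    }


# Usage (requires psycopg2 and a reachable cluster):
#   import psycopg2 as pg
#   redshift_conn = pg.connect(**redshift_connect_kwargs())
```

Keeping credentials out of the source file this way also makes it easier to point the same script at different clusters.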

Methods

NAME                         DESCRIPTION
get_column_list              Returns the column list based on the header of the file.
check_schema_existence       Checks Redshift for the existence of a schema.
check_table_existence        Checks Redshift for the existence of a table.
generate_schema_ddl          Returns a SQL statement that creates a Redshift schema.
generate_schema_permissions  Returns a SQL statement that grants schema usage to the default group.
generate_table_ddl           Returns a SQL statement that creates a Redshift table.
generate_column_ddl          Returns SQL statement(s) that add missing column(s) to a Redshift table.
generate_table_permissions   Returns a SQL statement that grants table read access to the default group.
evaluate_table_ddl_diffs     Returns a dataframe containing differences between generated and existing table DDL.
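The methods above can be combined into a single "create if missing" step. The sketch below assumes that each method takes no arguments (as `check_table_existence()` and `generate_table_ddl()` do in the usage example) and that each `generate_*` method returns one SQL string; neither assumption is confirmed by this page. The helper name `ensure_schema_and_table` is hypothetical.

```python
def ensure_schema_and_table(auto_schema, conn):
    """Create the schema and table if missing; return the statements executed.

    `auto_schema` is expected to behave like a RedshiftAutoSchema instance and
    `conn` like a psycopg2 connection. Argument-free method calls and
    single-string return values are assumptions of this sketch.
    """
    statements = []
    if not auto_schema.check_schema_existence():
        statements.append(auto_schema.generate_schema_ddl())
        statements.append(auto_schema.generate_schema_permissions())
    if not auto_schema.check_table_existence():
        statements.append(auto_schema.generate_table_ddl())
        statements.append(auto_schema.generate_table_permissions())

    # Run everything in one transaction so a failure leaves nothing half-created.
    with conn.cursor() as cursor:
        for sql in statements:
            cursor.execute(sql)
    conn.commit()
    return statements
```

Returning the executed statements makes the helper easy to log or to dry-run against a stub connection before pointing it at a real cluster.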

Contributing

Pull requests are welcome.

License

Apache License 2.0
