AWS Redshift Spectrum utilities.

[WIP] spectron

Generate AWS Athena and Spectrum DDL from JSON

Install:

pip install spectron[json]

CLI Usage:

spectron nested_big_data.json > nested_big_data.sql

positional arguments:
  infile                JSON to convert

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -c, --case_map        disable case insensitivity and map fields with
                        uppercase chars to lowercase
  -l, --lowercase       DDL: enable case insensitivity and force all fields to
                        lowercase - applied before field lookup in mapping
  -r, --retain_hyphens  disable automatic conversion of hyphens to underscores
  -e, --error_nested_arrarys
                        raise exception for nested arrays
  -f IGNORE_FIELDS, --ignore_fields IGNORE_FIELDS
                        Comma-separated fields to ignore
  -j, --ignore_malformed_json
                        DDL: ignore malformed json
  -m MAPPING_FILE, --mapping MAPPING_FILE
                        JSON filepath to use for mapping field names e.g.
                        {field_name: new_field_name}
  -y TYPE_MAP_FILE, --type_map TYPE_MAP_FILE
                        JSON filepath to use for mapping field names to known
                        data types e.g. {column: dtype}
  -p PARTITIONS_FILE, --partitions_file PARTITIONS_FILE
                        DDL: JSON filepath to map partition column(s) e.g.
                        {column: dtype}
  -s SCHEMA, --schema SCHEMA
                        DDL: schema name
  -t TABLE, --table TABLE
                        DDL: table name
  --s3 S3_KEY           DDL: S3 Key prefix e.g. bucket/dir
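
For example, a fuller invocation might supply a schema, table, field mapping, type map, partition columns, and S3 prefix (the file names and values below are hypothetical; each flag is described in the help text above):

spectron -s analytics -t events \
    -m mapping.json -y type_map.json -p partitions.json \
    --s3 my-bucket/events \
    nested_big_data.json > events.sql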

Options:

TODO


Programmatic Usage:

In [1]: from spectron import ddl                                                

In [2]: %paste                                                                  
d = {
    "uuid": 1234567,
    "events": [
        {"ts": 0, "status": True, "avg": 0.123},
        {"ts": 1, "status": False, "avg": 1.234}
    ]
}

In [3]: sql = ddl.from_dict(d)                                                  

In [4]: print(sql)                                                              
CREATE EXTERNAL TABLE {schema}.{table} (
    uuid INT,
    events array<
        struct<
            ts: SMALLINT,
            status: BOOL,
            "avg": FLOAT4
        >
    >
)
ROW FORMAT SERDE
    'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
    'case.insensitive'='FALSE',
    'ignore.malformed.json'='TRUE'
)
STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 's3://{bucket}/{prefix}';
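
The generated statement leaves {schema}, {table}, {bucket}, and {prefix} as Python-style placeholders, so one way to finish it is with str.format. A minimal sketch continuing the session above (the schema, table, and S3 values are made up):

# fill in the DDL placeholders and write the statement to a file
create_stmt = sql.format(
    schema="analytics",
    table="events",
    bucket="my-bucket",
    prefix="events",
)

with open("events.sql", "w") as f:
    f.write(create_stmt)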
