AWS Redshift Spectrum utilities.

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Upload Python Package pip

[WIP] spectron

Generate AWS Athena and Spectrum DDL from JSON

Install:

pip install spectron[json]

CLI Usage:

# single input file:
spectron nested_big_data.json > nested_big_data.sql

# multiple input files to summarize all key structures seen:
spectron dataz/*.json > big_data.sql

usage: spectron [-h] [-V] [-v] [-c | -l] [-n] [-d] [-r] [-e]
                [-f col1,col2,...] [-m filepath] [-y filepath] [-p filepath]
                [-j] [-s schema] [-t table] [--s3 s3://bucket/key]
                infile [infile ...]

Generate Athena and Spectrum DDL from JSON

positional arguments:
  infile                JSON file(s) to convert

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -v, --verbose         increase logging level
  -c, --case_map        disable case insensitivity and map field with
                        uppercase chars to lowercase
  -l, --lowercase       DDL: enable case insensitivity and force all fields to
                        lowercase - applied before field lookup in mapping
  -n, --numeric_overflow
                        raise exception on numeric overflow
  -d, --infer_date      infer date string types - supports ISO 8601 for date,
                        datetime[TZ]
  -r, --retain_hyphens  disable auto convert hypens to underscores
  -e, --error_nested_arrarys
                        raise exception for nested arrays
  -f col1,col2,..., --ignore_fields col1,col2,...
                        Comma separated fields to ignore
  -m filepath, --mapping filepath
                        JSON filepath to use for mapping field names e.g.
                        {field_name: new_field_name}
  -y filepath, --type_map filepath
                        JSON filepath to use for mapping field names to known
                        data types e.g. {key: value}
  -p filepath, --partitions_file filepath
                        DDL: JSON filepath to map parition column(s) e.g.
                        {column: dtype}
  -j, --ignore_malformed_json
                        DDL: ignore malformed json
  -s schema, --schema schema
                        DDL: schema name
  -t table, --table table
                        DDL: table name
  --s3 s3://bucket/key  DDL: S3 Key prefix

Options:

TODO

Programmatic Usage:

In [1]: from spectron import ddl                                                

In [2]: %paste                                                                  
d = {
    "uuid": 1234567,
    "events": [
        {"ts": 0, "status": True, "avg": 0.123},
        {"ts": 1, "status": False, "avg": 1.234}
    ]
}

In [3]: sql = ddl.from_dict(d)                                                  

In [4]: print(sql)                                                              
CREATE EXTERNAL TABLE {schema}.{table} (
    uuid INT,
    events array<
        struct<
            ts: SMALLINT,
            status: BOOL,
            "avg": FLOAT4
        >
    >
)
ROW FORMAT SERDE
    'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
    'case.insensitive'='FALSE',
    'ignore.malformed.json'='TRUE'
)
STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 's3://{bucket}/{prefix}';

Project details

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

This version

0.4.8

Sep 25, 2020

0.4.7

Aug 10, 2020

0.4.6

Aug 8, 2020

0.4.5

Aug 2, 2020

0.4.4

Jul 25, 2020

0.4.3

Jul 23, 2020

0.4.2

Jul 20, 2020

0.4.1

Jul 12, 2020

0.3.1

Jun 4, 2020

0.3.0

May 29, 2020

0.2.4

May 24, 2020

0.2.3

May 23, 2020

0.1.7

May 17, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

spectron-0.4.8.tar.gz (22.3 kB view hashes)

Uploaded Sep 25, 2020 Source

Built Distribution

spectron-0.4.8-py3-none-any.whl (24.5 kB view hashes)

Uploaded Sep 25, 2020 Python 3

Hashes for spectron-0.4.8.tar.gz

Hashes for spectron-0.4.8.tar.gz
Algorithm	Hash digest
SHA256	`1813b3168668b737b3ff8d905e72605d9cd8b4bec390d41aa9a23a20cda78a2a`
MD5	`c15e04b15f4298117ea8f3b5fb2239a4`
BLAKE2b-256	`82f7038637e86345cecd9beb4b95707173784886e4d56f1037771e61c6f1123c`

Hashes for spectron-0.4.8-py3-none-any.whl

Hashes for spectron-0.4.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7c52bb88d772a915049c225a2acdc66253a254106801bbfbe140ba42d94e0367`
MD5	`2182fc6315bbb38e5b01d4f9e13cbbb5`
BLAKE2b-256	`4b204c69b7d387a3ca1059b40ccb4b4e3fd3832c94fbdccf47ce6b1e1f8a61d6`