Skip to main content

AWS Redshift Spectrum utilities.

Project description

Upload Python Package pip

[WIP] spectron

Generate AWS Athena and Spectrum DDL from JSON

Install:

pip install spectron[json]

CLI Usage:

# single input file:
spectron nested_big_data.json > nested_big_data.sql

# multiple input files to summarize all key structures seen:
spectron dataz/*.json > big_data.sql

usage: spectron [-h] [-V] [-v] [-c | -l] [-n] [-d] [-r] [-e]
                [-f col1,col2,...] [-m filepath] [-y filepath] [-p filepath]
                [-j] [-s schema] [-t table] [--s3 s3://bucket/key]
                infile [infile ...]

Generate Athena and Spectrum DDL from JSON

positional arguments:
  infile                JSON file(s) to convert

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -v, --verbose         increase logging level
  -c, --case_map        disable case insensitivity and map field with
                        uppercase chars to lowercase
  -l, --lowercase       DDL: enable case insensitivity and force all fields to
                        lowercase - applied before field lookup in mapping
  -n, --numeric_overflow
                        raise exception on numeric overflow
  -d, --infer_date      infer date string types - supports ISO 8601 for date,
                        datetime[TZ]
  -r, --retain_hyphens  disable auto convert hypens to underscores
  -e, --error_nested_arrarys
                        raise exception for nested arrays
  -f col1,col2,..., --ignore_fields col1,col2,...
                        Comma separated fields to ignore
  -m filepath, --mapping filepath
                        JSON filepath to use for mapping field names e.g.
                        {field_name: new_field_name}
  -y filepath, --type_map filepath
                        JSON filepath to use for mapping field names to known
                        data types e.g. {key: value}
  -p filepath, --partitions_file filepath
                        DDL: JSON filepath to map parition column(s) e.g.
                        {column: dtype}
  -j, --ignore_malformed_json
                        DDL: ignore malformed json
  -s schema, --schema schema
                        DDL: schema name
  -t table, --table table
                        DDL: table name
  --s3 s3://bucket/key  DDL: S3 Key prefix

Options:

TODO


Programmatic Usage:

In [1]: from spectron import ddl                                                

In [2]: %paste                                                                  
d = {
    "uuid": 1234567,
    "events": [
        {"ts": 0, "status": True, "avg": 0.123},
        {"ts": 1, "status": False, "avg": 1.234}
    ]
}

In [3]: sql = ddl.from_dict(d)                                                  

In [4]: print(sql)                                                              
CREATE EXTERNAL TABLE {schema}.{table} (
    uuid INT,
    events array<
        struct<
            ts: SMALLINT,
            status: BOOL,
            "avg": FLOAT4
        >
    >
)
ROW FORMAT SERDE
    'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
    'case.insensitive'='FALSE',
    'ignore.malformed.json'='TRUE'
)
STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 's3://{bucket}/{prefix}';

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

spectron-0.4.8.tar.gz (22.3 kB view details)

Uploaded Source

Built Distribution

spectron-0.4.8-py3-none-any.whl (24.5 kB view details)

Uploaded Python 3

File details

Details for the file spectron-0.4.8.tar.gz.

File metadata

  • Download URL: spectron-0.4.8.tar.gz
  • Upload date:
  • Size: 22.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.7.6

File hashes

Hashes for spectron-0.4.8.tar.gz
Algorithm Hash digest
SHA256 1813b3168668b737b3ff8d905e72605d9cd8b4bec390d41aa9a23a20cda78a2a
MD5 c15e04b15f4298117ea8f3b5fb2239a4
BLAKE2b-256 82f7038637e86345cecd9beb4b95707173784886e4d56f1037771e61c6f1123c

See more details on using hashes here.

File details

Details for the file spectron-0.4.8-py3-none-any.whl.

File metadata

  • Download URL: spectron-0.4.8-py3-none-any.whl
  • Upload date:
  • Size: 24.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.7.6

File hashes

Hashes for spectron-0.4.8-py3-none-any.whl
Algorithm Hash digest
SHA256 7c52bb88d772a915049c225a2acdc66253a254106801bbfbe140ba42d94e0367
MD5 2182fc6315bbb38e5b01d4f9e13cbbb5
BLAKE2b-256 4b204c69b7d387a3ca1059b40ccb4b4e3fd3832c94fbdccf47ce6b1e1f8a61d6

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page