AWS Redshift Spectrum utilities.
Project description
[WIP] spectron
Generate AWS Athena and Spectrum DDL from JSON
Install:
pip install spectron[json]
CLI Usage:
# single input file:
spectron nested_big_data.json > nested_big_data.sql
# multiple input files to summarize all key structures seen:
spectron dataz/*.json > big_data.sql
usage: spectron [-h] [-V] [-v] [-c | -l] [-n] [-d] [-r] [-e]
[-f col1,col2,...] [-m filepath] [-y filepath] [-p filepath]
[-j] [-s schema] [-t table] [--s3 s3://bucket/key]
infile [infile ...]
Generate Athena and Spectrum DDL from JSON
positional arguments:
infile JSON file(s) to convert
optional arguments:
-h, --help show this help message and exit
-V, --version show program's version number and exit
-v, --verbose increase logging level
-c, --case_map disable case insensitivity and map field with
uppercase chars to lowercase
-l, --lowercase DDL: enable case insensitivity and force all fields to
lowercase - applied before field lookup in mapping
-n, --numeric_overflow
raise exception on numeric overflow
-d, --infer_date infer date string types - supports ISO 8601 for date,
datetime[TZ]
-r, --retain_hyphens disable auto convert hypens to underscores
-e, --error_nested_arrarys
raise exception for nested arrays
-f col1,col2,..., --ignore_fields col1,col2,...
Comma separated fields to ignore
-m filepath, --mapping filepath
JSON filepath to use for mapping field names e.g.
{field_name: new_field_name}
-y filepath, --type_map filepath
JSON filepath to use for mapping field names to known
data types e.g. {key: value}
-p filepath, --partitions_file filepath
DDL: JSON filepath to map parition column(s) e.g.
{column: dtype}
-j, --ignore_malformed_json
DDL: ignore malformed json
-s schema, --schema schema
DDL: schema name
-t table, --table table
DDL: table name
--s3 s3://bucket/key DDL: S3 Key prefix
Options:
TODO
Programmatic Usage:
In [1]: from spectron import ddl
In [2]: %paste
d = {
"uuid": 1234567,
"events": [
{"ts": 0, "status": True, "avg": 0.123},
{"ts": 1, "status": False, "avg": 1.234}
]
}
In [3]: sql = ddl.from_dict(d)
In [4]: print(sql)
CREATE EXTERNAL TABLE {schema}.{table} (
uuid INT,
events array<
struct<
ts: SMALLINT,
status: BOOL,
"avg": FLOAT4
>
>
)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'case.insensitive'='FALSE',
'ignore.malformed.json'='TRUE'
)
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 's3://{bucket}/{prefix}';
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
spectron-0.4.8.tar.gz
(22.3 kB
view details)
Built Distribution
spectron-0.4.8-py3-none-any.whl
(24.5 kB
view details)
File details
Details for the file spectron-0.4.8.tar.gz
.
File metadata
- Download URL: spectron-0.4.8.tar.gz
- Upload date:
- Size: 22.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.7.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1813b3168668b737b3ff8d905e72605d9cd8b4bec390d41aa9a23a20cda78a2a |
|
MD5 | c15e04b15f4298117ea8f3b5fb2239a4 |
|
BLAKE2b-256 | 82f7038637e86345cecd9beb4b95707173784886e4d56f1037771e61c6f1123c |
File details
Details for the file spectron-0.4.8-py3-none-any.whl
.
File metadata
- Download URL: spectron-0.4.8-py3-none-any.whl
- Upload date:
- Size: 24.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.7.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 7c52bb88d772a915049c225a2acdc66253a254106801bbfbe140ba42d94e0367 |
|
MD5 | 2182fc6315bbb38e5b01d4f9e13cbbb5 |
|
BLAKE2b-256 | 4b204c69b7d387a3ca1059b40ccb4b4e3fd3832c94fbdccf47ce6b1e1f8a61d6 |