AWS Redshift Spectrum utilities.
[WIP] spectron
Generate AWS Athena and Spectrum DDL from JSON
Install:
pip install spectron[json]
CLI Usage:
spectron nested_big_data.json > nested_big_data.sql
positional arguments:
infile JSON to convert
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-c, --case_map disable case insensitivity and map fields with
uppercase chars to lowercase
-l, --lowercase DDL: enable case insensitivity and force all fields to
lowercase - applied before field lookup in mapping
-d, --infer_date infer date string types - supports ISO 8601 for date,
datetime[TZ]
-r, --retain_hyphens disable automatic conversion of hyphens to underscores
-e, --error_nested_arrarys
raise exception for nested arrays
-f col1,col2,..., --ignore_fields col1,col2,...
Comma-separated fields to ignore
-m filepath, --mapping filepath
JSON filepath to use for mapping field names e.g.
{field_name: new_field_name}
-y filepath, --type_map filepath
JSON filepath to use for mapping field names to known
data types e.g. {key: value}
-p filepath, --partitions_file filepath
DDL: JSON filepath to map partition column(s) e.g.
{column: dtype}
-j, --ignore_malformed_json
DDL: ignore malformed json
-s schema, --schema schema
DDL: schema name
-t table, --table table
DDL: table name
--s3 s3://bucket/key DDL: S3 Key prefix
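The -m, -y and -p options above each take a path to a small JSON file. As a rough sketch of what those files could look like, the script below writes one of each and assembles (but does not run) a matching command line. The field names, data types, schema, table and bucket are made-up placeholders, and the exact keys spectron expects in each file are an assumption based only on the help text above.

```python
import json
import tempfile
from pathlib import Path

# Scratch directory for the helper files (any writable location would do).
workdir = Path(tempfile.mkdtemp())

# Hypothetical contents, following the shapes given in the help text:
mapping = {"userId": "user_id"}   # -m: {field_name: new_field_name}
type_map = {"uuid": "BIGINT"}     # -y: {field name: known data type}
partitions = {"dt": "DATE"}       # -p: {column: dtype}

paths = {}
for name, content in [("mapping", mapping),
                      ("type_map", type_map),
                      ("partitions", partitions)]:
    path = workdir / f"{name}.json"
    path.write_text(json.dumps(content, indent=2))
    paths[name] = path

# Build the command from flags listed in the help output; it is only printed here.
cmd = [
    "spectron", "nested_big_data.json",
    "-m", str(paths["mapping"]),
    "-y", str(paths["type_map"]),
    "-p", str(paths["partitions"]),
    "-s", "my_schema",
    "-t", "my_table",
    "--s3", "s3://my-bucket/my/prefix",
]
print(" ".join(cmd))
```

Redirecting the output of the printed command to a .sql file, as in the CLI usage line above, would capture the generated DDL.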
Options:
TODO
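As one illustration, the -d/--infer_date option is described above as inferring date string types from ISO 8601 date and datetime[TZ] values. The sketch below shows one way such a check could work using only the standard library. It is not spectron's implementation: the helper name infer_date_type and the returned type names DATE and TIMESTAMP are assumptions made for this example.

```python
from datetime import date, datetime


def infer_date_type(value):
    """Return "DATE", "TIMESTAMP", or None for a JSON value (hypothetical helper)."""
    if not isinstance(value, str):
        return None

    # A plain calendar date such as 2024-01-31.
    try:
        date.fromisoformat(value)
        return "DATE"
    except ValueError:
        pass

    # Older Python versions reject a trailing "Z", so rewrite it as an
    # explicit UTC offset before parsing.
    candidate = value[:-1] + "+00:00" if value.endswith("Z") else value
    try:
        datetime.fromisoformat(candidate)
        return "TIMESTAMP"
    except ValueError:
        return None


for sample in ["2024-01-31", "2024-01-31T12:30:00+02:00", "not a date"]:
    print(sample, "->", infer_date_type(sample))
```

Strings that match neither form fall through to None, so a caller could keep treating them as ordinary strings.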
Programmatic Usage:
In [1]: from spectron import ddl
In [2]: %paste
d = {
"uuid": 1234567,
"events": [
{"ts": 0, "status": True, "avg": 0.123},
{"ts": 1, "status": False, "avg": 1.234}
]
}
In [3]: sql = ddl.from_dict(d)
In [4]: print(sql)
CREATE EXTERNAL TABLE {schema}.{table} (
uuid INT,
events array<
struct<
ts: SMALLINT,
status: BOOL,
"avg": FLOAT4
>
>
)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'case.insensitive'='FALSE',
'ignore.malformed.json'='TRUE'
)
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 's3://{bucket}/{prefix}';
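In the output above, uuid (1234567) becomes INT, ts (0 and 1) becomes SMALLINT, status becomes BOOL and avg becomes FLOAT4. The sketch below reproduces that mapping for scalar values by choosing the narrowest integer type that fits. It is an illustration consistent with this one example, not spectron's own code: the function name infer_scalar_type, the BIGINT fallback, VARCHAR for strings, and always returning FLOAT4 for floats are assumptions.

```python
def infer_scalar_type(value):
    """Guess a DDL type for a scalar JSON value (hypothetical helper)."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOL"
    if isinstance(value, int):
        if -2**15 <= value < 2**15:
            return "SMALLINT"   # fits in 16 bits
        if -2**31 <= value < 2**31:
            return "INT"        # fits in 32 bits
        return "BIGINT"         # assumed fallback for larger values
    if isinstance(value, float):
        return "FLOAT4"         # simplification: no attempt to pick FLOAT8
    if isinstance(value, str):
        return "VARCHAR"        # assumed type for strings
    raise TypeError(f"unsupported scalar: {value!r}")


d = {"uuid": 1234567, "ts": 0, "status": True, "avg": 0.123}
for key, value in d.items():
    print(key, infer_scalar_type(value))
```

Nested structs and arrays, which the real output also handles, are left out of this sketch.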
Download files
Source Distribution
spectron-0.4.6.tar.gz (21.7 kB)
Built Distribution
spectron-0.4.6-py3-none-any.whl (24.1 kB)