Hive date ranges simplified.
Project description
When your Hive partitions are YYYY, MM, DD, HH and not YYYYMMDDHH, composing date ranges with them can get out of control.
hpdr solves this problem.
>>> from hpdr import api
>>> rng = api.build('2016102612', '2017122612').partition_range
>>> print(rng.build_display())
((YYYY=2016 AND MM=10 AND DD=26 AND HH>=12) OR (YYYY=2016 AND MM=10 AND DD>26)
OR (YYYY=2016 AND MM>10) OR (YYYY=2017 AND MM<12) OR (YYYY=2017 AND MM=12 AND DD<26)
OR (YYYY=2017 AND MM=12 AND DD=26 AND HH<12))
>>> print(rng.build_display(pretty=True))
(
(YYYY=2016 AND MM=10 AND DD=26 AND HH>=12)
OR (YYYY=2016 AND MM=10 AND DD>26)
OR (YYYY=2016 AND MM>10)
OR (YYYY=2017 AND MM<12)
OR (YYYY=2017 AND MM=12 AND DD<26)
OR (YYYY=2017 AND MM=12 AND DD=26 AND HH<12)
)
Maybe you think in local time but store your data in UTC?
>>> from hpdr import api
>>> rng = api.build('2016102612', '2017122612',
... dzone='America/Los_Angeles',
... qzone='UTC').partition_range
>>> print(rng.build_display(pretty=True))
(
(YYYY=2016 AND MM=10 AND DD=26 AND HH>=19)
OR (YYYY=2016 AND MM=10 AND DD>26)
OR (YYYY=2016 AND MM>10)
OR (YYYY=2017 AND MM<12)
OR (YYYY=2017 AND MM=12 AND DD<26)
OR (YYYY=2017 AND MM=12 AND DD=26 AND HH<20)
Or maybe your date range is too large to run in one query, and it’s a pain to break it down?
import subprocess, os, os.path, tempfile, datetime
from hpdr import api
QUERY_FILE = 'myquery.hql'
OUT_FILE = 'out.txt'
begin = datetime.datetime(2016, 11, 1)
end = datetime.datetime(2016, 11, 30)
step = '5days'
with open(QUERY_FILE) as f:
template = f.read()
specs = api.build_with_steps(begin=begin, end=end, step=step)
if os.path.isfile(OUT_FILE):
os.remove(OUT_FILE)
for spec in specs:
query = spec.substitute(template)
with tempfile.NamedTemporaryFile() as f:
f.write(query)
f.flush()
cmd = ['/usr/bin/hive', '-f', f.name]
print(spec.partition_range.build_display())
with open(OUT_FILE, 'a') as outfile:
subprocess.check_call(cmd, stdout=outfile)
This prints:
(YYYY=2016 AND MM=11 AND DD>=01 AND DD<06) (YYYY=2016 AND MM=11 AND DD>=06 AND DD<11) (YYYY=2016 AND MM=11 AND DD>=11 AND DD<16) (YYYY=2016 AND MM=11 AND DD>=16 AND DD<21) (YYYY=2016 AND MM=11 AND DD>=21 AND DD<26) (YYYY=2016 AND MM=11 AND DD>=26 AND DD<30)
It runs 6 Hive queries built from a template containing HPDR_ variables. Something like this:
SELECT YEAR(event_timestamp), MONTH(event_timestamp), DAY(event_timestamp), FROM my_table WHERE event_timestamp >= '${HPDR_begin_ts}' AND event_timestamp < '${HPDR_end_ts}' AND ${HPDR_range}
The first query looks like this.
SELECT YEAR(event_timestamp), MONTH(event_timestamp), DAY(event_timestamp), FROM my_table WHERE event_timestamp >= '2016-11-01 00:00:00' AND event_timestamp < '2016-11-06 00:00:00' AND (YYYY=2016 AND MM=11 AND DD>=01 AND DD<06)
The full list of HPDR_ variables available for that first query is:
variable value --------------------------- ------------------- HPDR_dzone UTC HPDR_qzone UTC HPDR_begin_ts 2016-11-01 00:00:00 HPDR_end_ts 2016-11-06 00:00:00 HPDR_slop_begin_ts 2016-11-01 00:00:00 HPDR_slop_end_ts 2016-11-06 00:00:00 HPDR_begin_unixtime 1477983600 HPDR_begin_unixtime_ms 1477983600000 HPDR_begin_yyyymmdd 20161101 HPDR_begin_yyyy 2016 HPDR_begin_mm 11 HPDR_begin_dd 01 HPDR_begin_hh 00 HPDR_begin_min 00 HPDR_begin_sec 00 HPDR_end_unixtime 1478415600 HPDR_end_unixtime_ms 1478415600000 HPDR_end_yyyymmdd 20161106 HPDR_end_yyyy 2016 HPDR_end_mm 11 HPDR_end_dd 06 HPDR_end_hh 00 HPDR_end_min 00 HPDR_end_sec 00 HPDR_slop_begin_unixtime 1477983600 HPDR_slop_begin_unixtime_ms 1477983600000 HPDR_slop_begin_yyyymmdd 20161101 HPDR_slop_begin_yyyy 2016 HPDR_slop_begin_mm 11 HPDR_slop_begin_dd 01 HPDR_slop_begin_hh 00 HPDR_slop_begin_min 00 HPDR_slop_begin_sec 00 HPDR_slop_end_unixtime 1478415600 HPDR_slop_end_unixtime_ms 1478415600000 HPDR_slop_end_yyyymmdd 20161106 HPDR_slop_end_yyyy 2016 HPDR_slop_end_mm 11 HPDR_slop_end_dd 06 HPDR_slop_end_hh 00 HPDR_slop_end_min 00 HPDR_slop_end_sec 00
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
hpdr-1.0.0.tar.gz
(13.0 kB
view details)
File details
Details for the file hpdr-1.0.0.tar.gz
.
File metadata
- Download URL: hpdr-1.0.0.tar.gz
- Upload date:
- Size: 13.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | c0680e463d45291dd949d28eb1901bc6644bf680ff817c2daa422e6a6438f791 |
|
MD5 | 53e3f2db8b736fa496ed47d03ab8bb55 |
|
BLAKE2b-256 | a2abd3e7c1a6ff25f9697e89d85889473dd6ab178babb02d62fd9574d51074b4 |