An ETL library for Google BigQuery
Project description
Popelines
This is a simple ETL tool for BigQuery, named for the author's surname.
Popelines provides basic functionality that is often needed when writing an ETL, such as writing line-delimited JSON, loading data into BigQuery, and chunking date ranges. It's sparse now, but I plan to expand it to cover other Google Cloud services.
Install
To install popelines:
$ pip install popelines
Usage
To get started:
import popelines
pope = popelines.popeline(dataset_id='', service_key_file_loc=None, directory='.', verbose=False)
Providing a dataset_id is required. Everything else is optional - a service key will be inferred from your GOOGLE_ACCOUNT_CREDENTIALS env variable if not provided in service_key_file_loc, and directory defaults to the current directory if not provided.
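The fallback described above can be pictured with a small sketch. This is not popelines' actual code: resolve_service_key is a hypothetical helper, and the environment is passed in as a plain mapping so the behaviour is easy to see. It only assumes what the paragraph says, namely that an explicit service_key_file_loc wins and the GOOGLE_ACCOUNT_CREDENTIALS variable is used otherwise.

```python
import os


def resolve_service_key(service_key_file_loc=None, environ=os.environ):
    """Hypothetical helper: pick the key file path the way the text describes."""
    # An explicitly passed path always takes priority.
    if service_key_file_loc:
        return service_key_file_loc
    # Otherwise fall back to the environment variable named in the README.
    # Returns None when the variable is not set.
    return environ.get("GOOGLE_ACCOUNT_CREDENTIALS")


# Illustrative environment; the paths are made up.
fake_env = {"GOOGLE_ACCOUNT_CREDENTIALS": "/path/to/key.json"}
explicit = resolve_service_key("/secrets/other.json", environ=fake_env)
inferred = resolve_service_key(environ=fake_env)
missing = resolve_service_key(environ={})
```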
Popelines does the big, handy things you might expect:
# write a dict to line-delimited JSON, perfect for uploading to BQ
pope.write_to_json(file_name=file_name, jayson=your_dict, mode='w')
# then you can turn around and upload that line-delimited JSON...
pope.write_to_bq(table_name=table_name, file_name=file_name, append=True,
                 ignore_unknown_values=False, bq_schema_autodetect=False)
# or you can write it to GCS! leave bucket_name=None and popelines
# will try to upload to a bucket with the dataset_id you gave when you
# first initialized your pope object!
pope.write_to_gcs(gcs_path='folder/file.py', file_name='file.py', bucket_name=None)
# you can even call your API endpoints! This method returns a dict of data.
data = pope.call_api(url=url, method='GET', headers=None, params=None, data=None)
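If line-delimited JSON is new to you, here is a rough standard-library sketch of the format that write_to_json targets: one JSON object per line, which is what BigQuery load jobs expect for JSON input. It is not how popelines implements it; write_ndjson and the choice to accept either one dict or a list of dicts are assumptions made for the example.

```python
import json
import os
import tempfile


def write_ndjson(file_name, records, mode="w"):
    """Hypothetical helper: write one JSON object per line."""
    # Accept a single dict as well as a list of dicts (an assumption of this sketch).
    if isinstance(records, dict):
        records = [records]
    with open(file_name, mode, encoding="utf-8") as fh:
        for record in records:
            # json.dumps never emits raw newlines, so each record stays on one line.
            fh.write(json.dumps(record) + "\n")
    return len(records)


# Write two rows to a temporary file, then read them back line by line.
path = os.path.join(tempfile.mkdtemp(), "rows.json")
write_ndjson(path, [{"day": "2018-03-01", "clicks": 3},
                    {"day": "2018-03-02", "clicks": 5}])
with open(path, encoding="utf-8") as fh:
    rows = [json.loads(line) for line in fh]
```

Opening the file with mode='a' instead appends new lines after the existing ones, which is handy when you build a file up in batches.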
Popelines also does small handy things:
# get a logger at your chosen verbosity and use it to log things
log = pope.log
log.info('Does the code get to this point?')
# chunk a date range into chunks of n days
import datetime
start_datetime = datetime.datetime(2018, 3, 1)
end_datetime = datetime.datetime(2018, 9, 1)
for day in pope.chunk_date_range(start_datetime=start_datetime, end_datetime=end_datetime, chunk_size=1):
    print(f"I think I may have been drunk on {day}, can you name another date?")
# find the last entry in a table - basically, query for the MAX() of a column
latest_day = pope.find_last_entry(table_name='my_table', date_column='day')
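To make the date-chunking idea concrete, here is a minimal sketch of one way to split a range into n-day pieces. What pope.chunk_date_range actually yields for each chunk isn't spelled out here, so this version, chunk_dates, is a hypothetical stand-in that yields (chunk_start, chunk_end) tuples and clips the last chunk at the end of the range.

```python
import datetime


def chunk_dates(start_datetime, end_datetime, chunk_size=1):
    """Hypothetical stand-in: yield (chunk_start, chunk_end) pairs of chunk_size days."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1 day")
    step = datetime.timedelta(days=chunk_size)
    chunk_start = start_datetime
    while chunk_start < end_datetime:
        # Clip the final chunk so it never runs past the end of the range.
        chunk_end = min(chunk_start + step, end_datetime)
        yield chunk_start, chunk_end
        chunk_start = chunk_end


# One week split into 3-day chunks: two full chunks and a 1-day remainder.
chunks = list(chunk_dates(datetime.datetime(2018, 3, 1),
                          datetime.datetime(2018, 3, 8),
                          chunk_size=3))
```

Chunking like this keeps each API call or query small, and pairs naturally with find_last_entry: start the range at the latest day already loaded and only fetch what's missing.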
Finally, Popelines even does weird experimental things:
# messed up JSON keys? fix_json_keys takes your dict obj and a callback
# function and applies the callback to each key recursively!
my_good_json = pope.fix_json_keys(obj=my_bad_json, callback=key_fixing_function)
# if your JSON values are messed up, have no fear! There is a similar
# function for that!
my_good_json = pope.fix_json_values(obj=my_bad_json, callback=value_fixing_function)
Note that key_fixing_function should take one argument (the key), while value_fixing_function must handle both a value and a key as arguments.
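Here is a rough sketch of what the recursive callbacks could look like in plain Python. It is only an illustration of the idea, not the library's code: fix_keys, fix_values, snake and blank_to_none are all made-up names, and passing the value first and the key second is my reading of the note above.

```python
def fix_keys(obj, callback):
    """Hypothetical: apply callback(key) to every dict key, recursing into dicts and lists."""
    if isinstance(obj, dict):
        return {callback(k): fix_keys(v, callback) for k, v in obj.items()}
    if isinstance(obj, list):
        return [fix_keys(item, callback) for item in obj]
    return obj


def fix_values(obj, callback, key=None):
    """Hypothetical: apply callback(value, key) to every leaf value."""
    if isinstance(obj, dict):
        return {k: fix_values(v, callback, key=k) for k, v in obj.items()}
    if isinstance(obj, list):
        # List items inherit the key of the list that holds them.
        return [fix_values(item, callback, key=key) for item in obj]
    return callback(obj, key)


def snake(key):
    # One argument: the key. Makes it BigQuery-friendly.
    return key.strip().lower().replace(" ", "_").replace("-", "_")


def blank_to_none(value, key):
    # Two arguments: the value and the key it lives under.
    return None if value == "" else value


bad = {"User Name": "ada", "Home-Address": {"Zip Code": "", "tags": [{"Tag ID": 1}]}}
good = fix_values(fix_keys(bad, snake), blank_to_none)
```

Both helpers build new objects rather than changing the input in place, so the original messy dict is still around if you need to compare.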
File details
Details for the file popelines-0.1.22.tar.gz.
File metadata
- Download URL: popelines-0.1.22.tar.gz
- Upload date:
- Size: 7.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.37.0 CPython/3.8.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | e4a8ea0301a3b5f7d79d0156b37da4ffb27945de39769f1feda84c298ab883d7
MD5 | d28c5aad30f932be6ac065e378426094
BLAKE2b-256 | ebcf005bf12c10636f1c2b1b3e65a390f00815db7c4087d788ff6928419f22c6
File details
Details for the file popelines-0.1.22-py3-none-any.whl.
File metadata
- Download URL: popelines-0.1.22-py3-none-any.whl
- Upload date:
- Size: 8.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.37.0 CPython/3.8.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | fe2d3a44f4018677462abdf3789ed1a9a419afd95dfbf1abb7d60cc57d3c837d
MD5 | 56b106812d38ae5e25109d867af59f16
BLAKE2b-256 | 463107b465e97cb589b3e1a08dd9d353cc90b90eef0937bb5cba317023c33ff0