An ETL library for Google BigQuery

Popelines

This is a simple ETL tool for BigQuery, named for the author's surname.

Popelines provides basic functionality that is often needed when writing an ETL job: writing line-delimited JSON, loading data into BigQuery, chunking date ranges, and a few other helpers. It's sparse now, but I plan to expand it to cover other Google Cloud functionality.

Install

To install popelines:

$ pip install popelines

Usage

To get started:

import popelines

pope = popelines.popeline(dataset_id='', service_key_file_loc=None, directory='.', verbose=False)

dataset_id is required. Everything else is optional: if service_key_file_loc is not provided, the service key is inferred from your GOOGLE_ACCOUNT_CREDENTIALS environment variable, and directory defaults to the current directory.

Popelines does the big, handy things you might expect:

# write a dict to line-delimited JSON, perfect for uploading to BQ
pope.write_to_json(file_name=file_name, jayson=your_dict, mode='w')

# then you can turn around and upload that line-delimited JSON...
pope.write_to_bq(table_name=table_name, file_name=file_name, append=True,
    ignore_unknown_values=False, bq_schema_autodetect=False)

# or you can write it to GCS! leave bucket_name=None and popelines
# will try to upload to a bucket with the dataset_id you gave when you
# first initialized your pope object!
pope.write_to_gcs(gcs_path='folder/file.py', file_name='file.py', bucket_name=None)

# you can even call your API endpoints! This method returns a dict of data.
data = pope.call_api(url=url, method='GET', headers=None, params=None, data=None)
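
For readers unfamiliar with the format, line-delimited JSON puts one JSON object on each line, which is the shape BigQuery load jobs accept for JSON input. The sketch below is not Popelines' implementation; it only illustrates the format with the standard library, and the helper name to_ndjson is hypothetical.

```python
import json


def to_ndjson(records):
    """Serialize an iterable of dicts as line-delimited JSON.

    Hypothetical helper for illustration only; not part of popelines.
    """
    # One compact JSON object per line, each terminated by a newline.
    return "".join(json.dumps(record) + "\n" for record in records)


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
text = to_ndjson(rows)
print(text, end="")
```

Each line can be parsed on its own with json.loads, which is what makes the format convenient for streaming and bulk loads.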

Popelines also does small handy things:

# get a logger at your chosen verbosity and use it to log things
log = pope.log
log.info('Does the code get to this point?')

# chunk a date range into chunks n days long
import datetime

start_datetime = datetime.datetime(2018, 3, 1)
end_datetime = datetime.datetime(2018, 9, 1)
for day in pope.chunk_date_range(start_datetime=start_datetime, end_datetime=end_datetime, chunk_size=1):
    print(f"I think I may have been drunk on {day}, can you name another date?")

# find the last entry in a table - basically, query for the MAX() of a column
latest_day = pope.find_last_entry(table_name='my_table', date_column='day')
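
To make the idea of date chunking concrete, here is a minimal standard-library sketch. It is an assumption about the general technique, not Popelines' code: the function name chunk_dates is hypothetical, and this version yields (start, end) pairs, which may differ from what chunk_date_range actually returns.

```python
import datetime


def chunk_dates(start, end, chunk_size=1):
    """Yield (chunk_start, chunk_end) pairs covering [start, end).

    Hypothetical helper for illustration only; not part of popelines.
    """
    step = datetime.timedelta(days=chunk_size)
    current = start
    while current < end:
        # Clamp the final chunk so it never runs past the end of the range.
        yield current, min(current + step, end)
        current += step


chunks = list(
    chunk_dates(
        datetime.datetime(2018, 3, 1),
        datetime.datetime(2018, 3, 8),
        chunk_size=3,
    )
)
for chunk_start, chunk_end in chunks:
    print(chunk_start.date(), "to", chunk_end.date())
```

Chunking like this is handy when an API or query only accepts a limited date window per request.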

Finally, Popelines even does weird experimental things:

# messed up JSON keys? fix_json_keys takes your dict obj and a callback
# function and applies the callback to each key recursively!
my_good_json = pope.fix_json_keys(obj=my_bad_json, callback=key_fixing_function)

# if your JSON values are messed up, have no fear! There is a similar 
# function for that!
my_good_json = pope.fix_json_values(obj=my_bad_json, callback=value_fixing_function)

Note that key_fixing_function should take one argument (the key) while value_fixing_function must handle both a value and a key as arguments.
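
As a rough illustration of how recursive key and value fixing can work, here is a plain-Python sketch. It does not reproduce Popelines' internals: apply_to_keys, apply_to_values, snake_key, and blank_to_none are hypothetical names, and the (value, key) argument order of the value callback is an assumption based on the note above.

```python
def apply_to_keys(obj, callback):
    """Return a copy of obj with callback applied to every dict key, recursively."""
    if isinstance(obj, dict):
        return {callback(k): apply_to_keys(v, callback) for k, v in obj.items()}
    if isinstance(obj, list):
        return [apply_to_keys(item, callback) for item in obj]
    return obj


def apply_to_values(obj, callback, key=None):
    """Return a copy of obj with callback(value, key) applied to every leaf value."""
    if isinstance(obj, dict):
        return {k: apply_to_values(v, callback, key=k) for k, v in obj.items()}
    if isinstance(obj, list):
        # List items inherit the key of the list that contains them.
        return [apply_to_values(item, callback, key=key) for item in obj]
    return callback(obj, key)


def snake_key(key):
    """Example key callback: takes one argument, the key."""
    return key.strip().lower().replace(" ", "_").replace("-", "_")


def blank_to_none(value, key):
    """Example value callback: takes the value and its key."""
    return None if value == "" else value


raw = {"User Name": "ada", "Contact-Info": [{"E Mail": ""}]}
cleaned = apply_to_values(apply_to_keys(raw, snake_key), blank_to_none)
print(cleaned)
```

Passing the key to the value callback lets a fix depend on which field it is looking at, for example parsing dates only in columns whose names end in _date.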

Download files

Source Distribution

popelines-0.1.22.tar.gz (7.7 kB view details)

Uploaded Source

Built Distribution

popelines-0.1.22-py3-none-any.whl (8.4 kB view details)

Uploaded Python 3

File details

Details for the file popelines-0.1.22.tar.gz.

File metadata

  • Download URL: popelines-0.1.22.tar.gz
  • Size: 7.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.37.0 CPython/3.8.0

File hashes

Hashes for popelines-0.1.22.tar.gz
Algorithm Hash digest
SHA256 e4a8ea0301a3b5f7d79d0156b37da4ffb27945de39769f1feda84c298ab883d7
MD5 d28c5aad30f932be6ac065e378426094
BLAKE2b-256 ebcf005bf12c10636f1c2b1b3e65a390f00815db7c4087d788ff6928419f22c6

File details

Details for the file popelines-0.1.22-py3-none-any.whl.

File metadata

  • Download URL: popelines-0.1.22-py3-none-any.whl
  • Size: 8.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.37.0 CPython/3.8.0

File hashes

Hashes for popelines-0.1.22-py3-none-any.whl
Algorithm Hash digest
SHA256 fe2d3a44f4018677462abdf3789ed1a9a419afd95dfbf1abb7d60cc57d3c837d
MD5 56b106812d38ae5e25109d867af59f16
BLAKE2b-256 463107b465e97cb589b3e1a08dd9d353cc90b90eef0937bb5cba317023c33ff0
