A highly-opinionated Python library for creating ELT data pipelines on Google Cloud.

konekuta

konekuta is a highly-opinionated Python library for creating ELT data pipelines with Google Cloud.

It was developed to reduce boilerplate code and improve maintainability across multiple data pipelines that share the same data architecture. Although efforts are being made to improve reusability, the library currently supports only a single, highly-opinionated data architecture and is compatible only with Google Cloud Platform.

konekuta supports the following Python versions:

  • Python 3.7
  • Python 3.8

Documentation

Documentation is available at https://konekuta.readthedocs.io.

Changelog

All notable changes to this project will be documented in this file.

0.6.0 - 2020-09-02

Added

  • persist_state decorator for persisting state across pipeline stages (#163)

Fixed

  • Outdated example not using updated schema format (#164)

0.5.0 - 2020-08-07

Added

  • parse_timestamp utility function for parsing raw timestamps (#136)
  • Support for advanced backfilling via scheduled_timestamp parameter (#136)
  • Support for advanced date ranges via offset parameter (#155)

Fixed

  • Incompatible sed expression on macOS environments (#138) (#156)

Removed

  • Dropped Python 3.6 support (#137)

0.4.0 - 2020-06-08

Added

  • format_dates flag to standardise date format in data files (#92)
  • File sizes will now be logged (#93)
  • get_unix_timestamp utility function for converting dates to unix time (#94)
  • remove_leading_rows and remove_trailing_rows transformer functions (#97) (#120)
  • get_row_matching_prefix function to return row number with matching prefix (#98) (#124)
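
The helpers added in this release are easiest to picture on a small example. The following is a minimal sketch of what such helpers might do, not konekuta's implementation: the signatures, the list-of-lines input, the zero-based row index, and the UTC interpretation of dates are all assumptions made for illustration. See the documentation for the actual API.

```python
from datetime import datetime, timezone
from typing import List, Optional


def get_unix_timestamp(date_string: str, fmt: str = "%Y-%m-%d") -> int:
    """Convert a date string to Unix time, treating it as UTC (assumption)."""
    parsed = datetime.strptime(date_string, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def remove_leading_rows(rows: List[str], count: int) -> List[str]:
    """Return a copy of rows without the first `count` entries."""
    return rows[count:]


def remove_trailing_rows(rows: List[str], count: int) -> List[str]:
    """Return a copy of rows without the last `count` entries."""
    return rows[: max(len(rows) - count, 0)]


def get_row_matching_prefix(rows: List[str], prefix: str) -> Optional[int]:
    """Return the zero-based index of the first row starting with `prefix`."""
    for index, row in enumerate(rows):
        if row.startswith(prefix):
            return index
    return None  # no row matched


# Hypothetical raw export with a banner line on top and a totals line at the end.
raw = ["Report generated 2020-06-08", "id,amount", "1,10", "2,20", "Total,30"]
header_index = get_row_matching_prefix(raw, "id,")
cleaned = remove_trailing_rows(remove_leading_rows(raw, header_index), 1)
```

Here the prefix lookup locates the header row, so the banner above it and the totals row below the data can be trimmed without hard-coding how many lines the banner occupies.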

0.3.0 - 2020-05-18

Added

  • Calling the Pipeline.run() method automatically cleans up files when done (#83)
  • Pipeline.extract() method now supports multiple raw file uploads (#85)
  • split_data() function will now log an error if no files were generated (#86)
  • Added strict and debug runtime modes (#87)

Fixed

  • Fixed empty variable in log message (#82)

0.2.0 - 2020-04-28

Added

  • Added schema helper functions (#51)
  • Added function for updating BigQuery table attributes (#52)
  • Added function for checking Google Compute instance (#57)
  • Added JSON log formatter for GKE (#62)
  • Added dynamic schema feature (#66)
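
To illustrate the idea behind a JSON log formatter for GKE, here is a minimal sketch built only on the standard library. It is not konekuta's formatter: the class name, the helper function, and the choice of fields are assumptions. The sketch relies on the usual Cloud Logging convention that single-line JSON written to standard output is parsed as a structured entry, with the "severity" and "message" keys given special treatment.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object (hypothetical)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "severity": record.levelname,    # e.g. INFO, WARNING, ERROR
            "message": record.getMessage(),  # message with arguments applied
            "logger": record.name,
        }
        if record.exc_info:
            # Keep the traceback inside the same JSON object.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str = "pipeline") -> logging.Logger:
    """Return a logger that writes JSON lines to standard output."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # replace any existing handlers
    logger.setLevel(logging.INFO)
    logger.propagate = False     # avoid duplicate output via the root logger
    return logger


logger = build_logger()
logger.info("loaded %d files", 3)
```

Emitting one JSON object per line keeps multi-line tracebacks inside a single log entry rather than having each line ingested as a separate entry.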

Changed

  • Changed Google API authentication helpers (#49)
  • Improved date range derivation logic (#61)
  • Changed Cloud Storage directory names and structure (#68)
  • Changed supported schema format (#75) (#77)

Fixed

  • Fixed deprecation warnings from BigQuery Python SDK (#52)
  • Fixed silenced Read the Docs build errors (#55)

0.1.0 - 2020-03-09

Initial release

Files for konekuta, version 0.6.0

  • konekuta-0.6.0-py3-none-any.whl (23.8 kB), wheel, Python version py3
  • konekuta-0.6.0.tar.gz (21.6 kB), source distribution