Pyplatform provides wrapper functions for using Google BigQuery as a data warehouse and for creating data pipelines that use Google Cloud, Microsoft Azure, O365, and Tableau Server as sources and destinations.
Project description
The platform architecture:
- enables a fast and scalable SQL data warehousing service
- abstracts away the infrastructure by building data pipelines with serverless compute solutions in Python runtime environments
- simplifies the development environment by using JupyterLab as the main tool
Installation
pip install pyplatform
Setting up the development environment
git clone https://github.com/mhadi813/pyplatform
cd pyplatform
conda env create -f pyplatform_dev.yml
Environment variables
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/default_service_account.json'
os.environ['DATASET'] = 'default_bigquery_dataset_name'
os.environ['STORAGE_BUCKET'] = 'default_storage_bucket_id'
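Because a missing variable usually only surfaces later as an authentication or lookup failure, it can help to check the three variables up front. The following is a minimal sketch, not part of pyplatform: the helper names `missing_env_vars` and `check_environment` are hypothetical, and only the variable names come from the section above.

```python
import os

# Variable names taken from the "Environment variables" section above.
REQUIRED_VARS = ("GOOGLE_APPLICATION_CREDENTIALS", "DATASET", "STORAGE_BUCKET")


def missing_env_vars(environ=os.environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


def check_environment(environ=os.environ):
    """Raise a descriptive error if any required variable is missing."""
    missing = missing_env_vars(environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `check_environment()` at the top of a notebook or script fails fast with the names of the variables that still need to be set.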
Usage
Common data pipeline architectures:
- HTTP sources
- On-prem servers
- BigQuery integration with Azure Logic Apps
- Event-driven ETL process
- Streaming pipelines
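As an illustration of the first architecture in the list, the transform step of an HTTP-source pipeline often turns a JSON response body into newline-delimited JSON, a format BigQuery load jobs accept. The sketch below uses only the standard library and does not call pyplatform; the function name `payload_to_ndjson` and the `records_key` parameter are assumptions made for this example.

```python
import json


def payload_to_ndjson(payload_text, records_key=None):
    """Convert a JSON response body into newline-delimited JSON.

    payload_text: raw JSON text, e.g. the body of an HTTP response.
    records_key:  optional key under which the list of records is nested.
    """
    payload = json.loads(payload_text)
    records = payload[records_key] if records_key else payload
    # A single JSON object is treated as a one-row result.
    if isinstance(records, dict):
        records = [records]
    # One JSON object per line; sorted keys keep the output deterministic.
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)
```

The resulting text could then be written to a file or a storage bucket and loaded into a table by whichever loader the pipeline uses.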
Exploring modules
import pyplatform as pyp
pyp.show_me()
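If a plain list of submodule names is also useful, the standard library can enumerate them for any installed package. This is a generic sketch and not a description of what `show_me()` does internally; `list_submodules` is a hypothetical helper.

```python
import importlib
import pkgutil


def list_submodules(package_name):
    """Return the sorted names of the direct submodules of a package."""
    package = importlib.import_module(package_name)
    # Only packages have __path__; plain modules have no submodules.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []
    return sorted(info.name for info in pkgutil.iter_modules(search_path))
```

With pyplatform installed, `list_submodules("pyplatform")` would list its top-level modules.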
Download files
Source Distribution
pyplatform-2020.7.2.tar.gz (2.5 kB)
Built Distribution
Hashes for pyplatform-2020.7.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 3c6054d451d3dbf84a9cad1db82a9aa93981a7aaef0cf64a4ab4bf8cfe428339
MD5 | 70bde897d8b00ffb7281fa327156aa52
BLAKE2b-256 | 40271f94e0faccee3a90cc4d15e17787fea10f1a39a6fa7d77f091b850aa8fcf