A package to ease interaction with cloud services, DB connections and commonly used functionalities in data analytics.
Instackup
This Python library is an open-source way to standardize and simplify connections with cloud-based tools, databases and commonly used tools in data manipulation and analysis. It can help BI teams by providing a unified code base for local development and testing as well as for remote production (automated scheduled run) environments.
This package is compatible with Google Cloud Composer image composer-1.11.1-airflow-1.10.9.
Index
- Current release
- Prerequisites
- Installation
- Documentation
- Version logs
Current release
Version 0.1.0 (beta)
Prerequisites
- Have Python 3.6 or a later version installed;
- Create a YAML (or JSON) file with credentials information;
- [Optional but recommended] Configure an Environment Variable that points to where the credentials file is.
1. Have Python 3.6 or a later version installed
Go to this link and download the most recent version that is compatible with this package.
2. Create a YAML (or JSON) file with credentials information
Use the files secret_template.yml or secret_blank.yml as a base, or copy and paste the code below and modify its values to match your credentials/projects:
#################################################################
# #
# ACCOUNTS CREDENTIALS. DO NOT SHARE THIS FILE. #
# #
# Specifications: #
# - For the credentials you don't have, leave it blank. #
# - Keep Google's secret file in the same folder as this file. #
# - BigQuery project_ids must be strings, i.e., inside quotes. #
# #
# Recommendations: #
# - YAML specification: https://yaml.org/spec/1.2/spec.html #
# - Keep this file in a static path like a folder within the #
# Desktop. Ex.: C:\Users\USER\Desktop\Credentials\secret.yml #
# #
#################################################################
Location: local

Google:
  secret_filename: file.json

BigQuery:
  project_id:
    project_name: "000000000000"

AWS:
  access_key: AWSAWSAWSAWSAWSAWSAWS
  secret_key: CcasldUYkfsadcSDadskfDSDAsdUYalf

RedShift:
  cluster_credentials:
    dbname: db
    user: masteruser
    host: blablabla.random.us-east-2.redshift.amazonaws.com
    cluster_id: cluster
    port: 5439
  master_password:
    dbname: db
    user: masteruser
    host: blablabla.random.us-east-2.redshift.amazonaws.com
    password: masterpassword
    port: 5439

PostgreSQL:
  default:
    dbname: postgres
    user: postgres
    host: localhost
    password:
    port: 5432
Save this file with the .yml extension in a folder where you know the path won't be modified, like the Desktop folder (example: C:\Users\USER\Desktop\Credentials\secret.yml).
If you prefer, you can follow this step using a JSON file instead. Follow the same instructions, but use the .json extension instead of .yml.
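If you go the JSON route, a possible equivalent of the template above could look like the sketch below (same keys and placeholder values as the YAML example; this is an illustration, not an official template):

```json
{
  "Location": "local",
  "Google": {
    "secret_filename": "file.json"
  },
  "BigQuery": {
    "project_id": {
      "project_name": "000000000000"
    }
  },
  "AWS": {
    "access_key": "AWSAWSAWSAWSAWSAWSAWS",
    "secret_key": "CcasldUYkfsadcSDadskfDSDAsdUYalf"
  },
  "RedShift": {
    "cluster_credentials": {
      "dbname": "db",
      "user": "masteruser",
      "host": "blablabla.random.us-east-2.redshift.amazonaws.com",
      "cluster_id": "cluster",
      "port": 5439
    },
    "master_password": {
      "dbname": "db",
      "user": "masteruser",
      "host": "blablabla.random.us-east-2.redshift.amazonaws.com",
      "password": "masterpassword",
      "port": 5439
    }
  },
  "PostgreSQL": {
    "default": {
      "dbname": "postgres",
      "user": "postgres",
      "host": "localhost",
      "password": "",
      "port": 5432
    }
  }
}
```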
3. [Optional but recommended] Configure an Environment Variable that points to where the credentials file is.
To configure the Environment Variable, follow the instructions below, based on your Operating System.
Windows
- Place the YAML (or JSON) file in a folder whose name or path you won't change later;
- In Windows Search, type Environment Variables and click on the Control Panel result;
- Click on the Environment Variables... button;
- In Environment Variables, click on the New button;
- In Variable name, type CREDENTIALS_HOME and, in Variable value, paste the full path to the recently created YAML (or JSON) file;
- Click OK in the 3 open windows.
Linux/MacOS
- Place the YAML (or JSON) file in a folder whose name or path you won't change later;
- Open the .bashrc file. If it doesn't exist, create one in the HOME directory. If you don't know how to get there, open the Terminal, type cd and then press ENTER;
- Inside the file, on a new line, add the command export CREDENTIALS_HOME="/path/to/file", replacing the content inside the quotes with the full path to the recently created YAML (or JSON) file;
- Save the file and restart all open Terminal windows.
Note: if you don't follow this last prerequisite, you need to set the environment variable manually inside the code. To do that, in your Python code, after the imports, add the following command (replacing the content inside the quotes with the full path to the recently created YAML (or JSON) file):
os.environ["CREDENTIALS_HOME"] = "/path/to/file"
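For example, a minimal setup could look like the sketch below (the path is a placeholder and the BigQueryTool import is purely illustrative; the variable just needs to be set before any instackup module reads the credentials file):

```python
import os

# Must be set before instackup looks up the credentials file
os.environ["CREDENTIALS_HOME"] = "/path/to/file"

from instackup.bigquery_tools import BigQueryTool  # illustrative import

bq = BigQueryTool()
```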
Installation
Go to the Terminal and type:
pip install instackup
Documentation
Check the documentation by clicking on each topic. A short usage sketch follows the index below.
- bigquery_tools
- Global Variables
- BigQueryTool
- __init__(self, authenticate=True)
- query(self, sql_query)
- query_and_save_results(self, sql_query, dest_dataset, dest_table, writing_mode="TRUNCATE", create_table_if_needed=False)
- list_datasets(self)
- create_dataset(self, dataset, location="US")
- list_tables_in_dataset(self, dataset, get=None, return_type="dict")
- get_table_schema(self, dataset, table)
- convert_postgresql_table_schema(self, dataframe, parse_json_columns=True)
- convert_multiple_postgresql_tables_schema(self, dataframe, parse_json_columns=True)
- convert_dataframe_to_numeric(dataframe, exclude_columns=[], **kwargs)
- clean_dataframe_column_names(dataframe, allowed_chars="abcdefghijklmnopqrstuvwxyz0123456789", special_treatment={})
- upload(self, dataframe, dataset, table, **kwargs)
- create_empty_table(self, dataset, table, schema)
- upload_from_gcs(self, dataset, table, gs_path, file_format="CSV", header_rows=1, delimiter=",", encoding="UTF-8", writing_mode="APPEND", create_table_if_needed=False, schema=None)
- upload_from_file(self, dataset, table, file_location, file_format="CSV", header_rows=1, delimiter=",", encoding="UTF-8", writing_mode="APPEND", create_table_if_needed=False, schema=None)
- start_transfer(self, project_path=None, project_name=None, transfer_name=None)
- gcloudstorage_tools
- GCloudStorageTool
- __init__(self, gs_path=None, bucket=None, subfolder="", filename=None, authenticate=True)
- bucket(self) @property
- blob(self) @property
- set_bucket(self, bucket)
- set_subfolder(self, subfolder)
- select_file(self, filename)
- set_by_path(self, gs_path)
- get_gs_path(self)
- list_all_buckets(self)
- get_bucket_info(self, bucket=None)
- get_file_info(self, filename=None, info=None)
- list_contents(self, yield_results=False)
- rename_file(self, new_filename, old_filename) (Not Yet Implemented)
- rename_subfolder(self, new_subfolder) (Not Yet Implemented)
- upload_file(self, filename, remote_path=None)
- upload_subfolder(self, folder_path) (Not Yet Implemented)
- upload_from_dataframe(self, dataframe, file_format='CSV', filename=None, overwrite=False, **kwargs)
- download_file(self, download_to=None, remote_filename=None, replace=False)
- download_subfolder(self) (Not Yet Implemented)
- download_on_dataframe(self, **kwargs)
- download_as_string(self, remote_filename=None, encoding="UTF-8")
- delete_file(self) (Not Yet Implemented)
- delete_subfolder(self) (Not Yet Implemented)
- general_tools
- gsheets_tools
- GSheetsTool
- __init__(self, sheet_url=None, sheet_key=None, sheet_gid=None, auth_mode='secret_key', read_only=False, scopes=['https://www.googleapis.com/auth/spreadsheets', 'https://www.googleapis.com/auth/drive'])
- set_spreadsheet_by_url(self, sheet_url)
- set_spreadsheet_by_key(self, sheet_key)
- set_worksheet_by_id(self, sheet_gid)
- download(self)
- upload(self, dataframe, write_mode="TRUNCATE") (Not Yet Implemented)
- heroku_tools
- redshift_tools
- RedShiftTool
- __init__(self, connect_by_cluster=True)
- connect(self, fail_silently=False)
- commit(self)
- rollback(self)
- close_connection(self)
- execute_sql(self, command, fail_silently=False)
- query(self, sql_query, fetch_through_pandas=True, fail_silently=False)
- unload_to_S3(self, redshift_query, s3_path, filename, unload_options="MANIFEST GZIP ALLOWOVERWRITE REGION 'us-east-2'")
- s3_tools
- S3Tool
- __init__(self, bucket=None, subfolder="", s3_path=None)
- bucket(self) @property
- set_bucket(self, bucket)
- set_subfolder(self, subfolder)
- set_by_path(self, s3_path)
- get_s3_path(self)
- rename_file(self, new_filename, old_filename)
- rename_subfolder(self, new_subfolder)
- list_all_buckets(self)
- list_contents(self, yield_results=False)
- upload_file(self, filename, remote_path=None)
- upload_subfolder(self, folder_path) (Not Yet Implemented)
- download_file(self, remote_path, filename=None)
- download_subfolder(self) (Not Yet Implemented)
- delete_file(self, filename, fail_silently=False)
- delete_subfolder(self)
- sql_tools
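The signatures above give an idea of how the classes are meant to be used. As an illustration, the sketch below strings a couple of them together; module paths are assumed to mirror the file names in the index, the bucket, file and query names are placeholders, and the exact return types should be checked in each method's documentation:

```python
from instackup.bigquery_tools import BigQueryTool
from instackup.s3_tools import S3Tool

# BigQuery: run a query (signature from the index: query(self, sql_query))
bq = BigQueryTool()
results = bq.query("SELECT 1 AS test_column")

# S3: point to a bucket/subfolder, upload a local file and list the contents
s3 = S3Tool(bucket="my-bucket", subfolder="exports/")
s3.upload_file("report.csv")
for item in s3.list_contents():
    print(item)
```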
Version logs
See what changed in every version.
- Beta releases
- Version 0.1.0 (current release)
- Alpha releases
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for instackup-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | bb147b9a61b6e20eee24a329759cf5d4d345ee5561755df414ef1ba2a037e76e
MD5 | 5c93594e48a8d28870699f59818a4afd
BLAKE2b-256 | bbfe8db519029c6a9998c38287fe938b043332549013cc413b3fba64b22c3c59