Accessor for Google Cloud Storage and BigQuery
This software is released under the MIT License, see LICENSE.txt.
gcp-accessor is a wrapper around the Google Cloud Storage API and the Google BigQuery API that simplifies access to both services. Google provides the google-cloud-storage and google-cloud-bigquery libraries, which offer many sophisticated features, but Python application development often needs only a limited subset of them. gcp-accessor focuses on a simple usage pattern and includes the selected features described below.
- bq_accessor
  - get_dataset
  - get_table_name
  - check_if_dataset_exists
  - check_if_table_exists
  - create_table_from_json
  - load_data_from_gcs
  - execute_query
- gcs_accessor
  - get_blob
  - get_blob_list
  - get_uris_list
  - upload_csv_gzip
  - download_csv_gzip
Setup
Installation
pip install gcp-accessor
Set GOOGLE_APPLICATION_CREDENTIALS
export GOOGLE_APPLICATION_CREDENTIALS='full path to credential key json file'
Usage of Google Cloud Storage Accessor
Import gcp_accessor and create a Google Cloud Storage client
import gcp_accessor
gcs = gcp_accessor.GoogleCloudStorageAccessor()
Get a binary large object (blob) from Google Cloud Storage (GCS)
gcs.get_blob('bucket_name', 'full path to file on gcs')
Get a list of blobs as an HTTPIterator object from GCS
gcs.get_blob_list('bucket_name', prefix=None, delimiter=None)
Get a list of URIs
gcs.get_uris_list('bucket_name', prefix=None, delimiter=None)
Upload a gzip-compressed CSV object to GCS
gcs.upload_csv_gzip('bucket_name', 'full path to file on gcs', 'texts')
Download a gzip file from GCS and load it in memory
gcs.download_csv_gzip('bucket_name', 'full path to file on gcs')
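These two helpers wrap gzip compression and decompression of CSV text. The round-trip they perform can be sketched with the standard library alone (the bucket interaction is omitted here, and `csv_text` is a hypothetical payload):

```python
import gzip

# Hypothetical CSV payload; upload_csv_gzip compresses text along these lines
csv_text = "id,name\n1,alice\n2,bob\n"

# Compress the CSV text to gzip bytes (the form stored on GCS)
compressed = gzip.compress(csv_text.encode("utf-8"))

# Decompress in memory (what download_csv_gzip does after fetching the blob)
restored = gzip.decompress(compressed).decode("utf-8")
assert restored == csv_text
```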
Usage of BigQuery Accessor
Import gcp_accessor and create a BigQuery client
import gcp_accessor
bq = gcp_accessor.BigQueryAccessor()
Get datasets; if no datasets exist, an empty list is returned
bq.get_dataset()
Get table names; if no tables exist, an exception error message is returned
bq.get_table_name('dataset_name')
Check whether a dataset exists in BigQuery; returns True or False
bq.check_if_dataset_exists('dataset_name')
Check whether a table exists in BigQuery; returns True or False
bq.check_if_table_exists('table_name')
Create a table in BigQuery based on a schema JSON file
bq.create_table_from_json('path_schema_file', 'dataset', 'table_name')
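The schema file is assumed to follow BigQuery's standard JSON schema format: a list of field definitions with a name, type, and mode. A minimal hypothetical example, parsed with the standard library to show its shape:

```python
import json

# Hypothetical schema in BigQuery's standard JSON schema format;
# field names and types are placeholders for illustration.
schema_json = """
[
  {"name": "id",         "type": "INTEGER",   "mode": "REQUIRED"},
  {"name": "name",       "type": "STRING",    "mode": "NULLABLE"},
  {"name": "created_at", "type": "TIMESTAMP", "mode": "NULLABLE"}
]
"""

schema = json.loads(schema_json)
field_names = [field["name"] for field in schema]
print(field_names)  # ['id', 'name', 'created_at']
```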
Load data from GCS (the file must already be uploaded to GCS).
bq.load_data_from_gcs(
'dataset_name',
'uris',
'table_name',
location="US",
skip_leading_rows=0,
source_format="CSV",
create_disposition="CREATE_NEVER",
write_disposition="WRITE_EMPTY",
)
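The uris argument is expected to identify the source object on GCS in the gs:// form that BigQuery load jobs use. For illustration, with a hypothetical bucket and object path:

```python
bucket_name = "my-bucket"            # hypothetical bucket name
object_path = "exports/data.csv.gz"  # hypothetical object path on GCS

# BigQuery load jobs address GCS objects with gs:// URIs
uri = f"gs://{bucket_name}/{object_path}"
print(uri)  # gs://my-bucket/exports/data.csv.gz
```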
Execute a simple query, or a query with the options below
bq.execute_query(
'query',
location="US",
timeout=30,
page_size=0,
project=None,
allow_large_results=False,
destination=None,
destination_encryption_configuration=None,
dry_run=False,
labels=None,
priority=None,
query_parameters=None,
schema_update_options=None,
table_definitions=None,
time_partitioning=None,
udf_resources=None,
use_legacy_sql=False,
use_query_cache=False,
write_disposition=None
)
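In the simplest case only the query string is needed and the remaining keyword arguments keep the defaults shown above. A hypothetical standard-SQL query (project, dataset, and table names are placeholders):

```python
# Hypothetical standard-SQL query; use_legacy_sql defaults to False above
query = """
SELECT name, COUNT(*) AS n
FROM `my_project.my_dataset.my_table`
GROUP BY name
ORDER BY n DESC
LIMIT 10
"""

# rows = bq.execute_query(query)  # would run against BigQuery with credentials set
print("SELECT" in query)  # True
```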
Note
Some argument names and their descriptions are cited from the documentation of the Google Cloud Client Libraries for Python. Explanations of each argument are provided in the code.