A package for working with GCS and BigQuery

Project description

MOOVAI TOOLBOX

This repository contains reusable code to expedite development. To use this repository, ensure you are using Python 3.6.

Installation:

To use this package, install it with pip:

pip install moovai

Google Cloud:

This folder contains code for working with GCP resources. Before using the methods here, you must be authenticated to GCP. If you are using a service account and have its JSON key file, set the GOOGLE_APPLICATION_CREDENTIALS environment variable:

  • On Linux or macOS:
export GOOGLE_APPLICATION_CREDENTIALS="[PATH]"
  • On Windows:
set GOOGLE_APPLICATION_CREDENTIALS=[PATH]
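Alternatively, the same variable can be set from Python before the client libraries are imported (an illustrative option, not specific to this package; the key path below is a placeholder):

```python
import os

# Point the Google client libraries at a service-account key file.
# Replace the placeholder path with the location of your own key.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/key.json"
```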

cloud-storage

Before using this code, make sure you have a bucket on GCS. If you don't, create one.

upload_to_gcs:

Uploads a local file to a folder on Google Cloud Storage. Note: the local file is DELETED after the upload.

Parameters:
  • file: REQUIRED. [STRING] Path of the local file to upload to Google Cloud Storage.
  • bucket_name: REQUIRED. [STRING] Name of your bucket.
  • folder: REQUIRED. [STRING] Folder path on GCS to upload the file to.

returns: GCS URI where the file was uploaded

Sample usage:
from moovai.google_cloud import cloud_storage

file = "/path/to/my/file.txt"
bucket_name = "my_bucket"
folder = "Folder/Subfolder/"

cloud_storage.upload_to_gcs(file, bucket_name, folder)
# returns "gs://my_bucket/Folder/Subfolder/file.txt"
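For reference, the returned URI follows the usual gs://&lt;bucket&gt;/&lt;folder&gt;&lt;filename&gt; pattern. A quick sketch of how such a URI is composed (illustrative only, not the package's code):

```python
import os

def make_gcs_uri(local_file, bucket_name, folder):
    # Join the bucket, the folder path (with trailing slash) and the
    # file's base name into a gs:// URI.
    return "gs://{}/{}{}".format(bucket_name, folder, os.path.basename(local_file))

make_gcs_uri("/path/to/my/file.txt", "my_bucket", "Folder/Subfolder/")
# returns "gs://my_bucket/Folder/Subfolder/file.txt"
```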
download_file_gcs:

Downloads a file from Google Cloud Storage to local disk.

Parameters:
  • gcs_uri: REQUIRED. [STRING] URI of the file on Google Cloud Storage to download.

returns: None. The file is downloaded locally.

Sample usage:
from moovai.google_cloud import cloud_storage

gcs_uri = "gs://my_bucket/Folder/Subfolder/my_file.txt"

cloud_storage.download_file_gcs(gcs_uri)
# file "my_file.txt" downloaded locally
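A gs:// URI breaks down into a bucket name and an object path; the sketch below (not part of the package's API) shows that decomposition:

```python
def split_gcs_uri(gcs_uri):
    # Split "gs://bucket/path/to/blob" into (bucket, blob_path).
    path = gcs_uri[len("gs://"):]
    bucket, _, blob = path.partition("/")
    return bucket, blob

split_gcs_uri("gs://my_bucket/Folder/Subfolder/my_file.txt")
# returns ("my_bucket", "Folder/Subfolder/my_file.txt")
```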

bigquery

get_schema_from_json:

Takes a schema.json file and converts it into a schema compatible with BigQuery.

Parameters:
  • schema_path: REQUIRED. [STRING] Path to your_schema.json file.

returns: schema to plug into BigQuery upload job.

Sample usage:
from moovai.google_cloud import bigquery

schema_file = "/path/to/my/schema.json"
bigquery.get_schema_from_json(schema_file)
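The schema.json layout is presumably the standard BigQuery JSON schema: a list of field objects, each with a name, type, and mode. A minimal example (the field names here are made up):

```python
import json

# A sample BigQuery-style JSON schema; field names are illustrative.
schema_json = """
[
  {"name": "Date", "type": "TIMESTAMP", "mode": "NULLABLE"},
  {"name": "col1", "type": "STRING",    "mode": "REQUIRED"},
  {"name": "col2", "type": "FLOAT",     "mode": "NULLABLE"}
]
"""

fields = json.loads(schema_json)
[f["name"] for f in fields]
# returns ['Date', 'col1', 'col2']
```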
get_schema_from_csv:

Takes a CSV file containing your data and extracts a schema from it. It is recommended to simply use BigQuery's schema auto-detect feature; use this as a last resort.

Parameters:
  • csv_file_path: REQUIRED. [STRING] Path to csv file.

returns: schema to plug into BigQuery upload job.

Sample usage:
from moovai.google_cloud import bigquery

csv_file = "/path/to/my/file.csv"
bigquery.get_schema_from_csv(csv_file)
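A rough illustration of what schema extraction from a CSV might look like: read the header row, then guess each column's type from the first data row. This is an assumption-laden sketch, not the package's implementation:

```python
import csv
import io

def sketch_schema_from_csv(csv_text):
    # Read the header row and guess a type for each column
    # from the first data row (numeric -> FLOAT, else STRING).
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    first_row = next(reader)

    def guess(value):
        try:
            float(value)
            return "FLOAT"
        except ValueError:
            return "STRING"

    return [{"name": name, "type": guess(value), "mode": "NULLABLE"}
            for name, value in zip(header, first_row)]

sketch_schema_from_csv("col1,col2\nabc,1.5\n")
```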
upload_local_file_to_bq:

Uploads a local file to BigQuery. schema_path and schema are optional and mutually exclusive: provide at most one of them.

Parameters:
  • file: REQUIRED. [STRING] Path to local CSV file to upload
  • dataset_id: REQUIRED. [STRING] Name of your BigQuery dataset.
  • table_id: REQUIRED. [STRING] Name of the BigQuery table to upload to.
  • schema_path: OPTIONAL. [STRING] Path to schema.json file.
  • schema: OPTIONAL. [STRING] schema compatible with BigQuery.
  • overwrite: OPTIONAL. [BOOL] Defaults to False. If True, BigQuery overwrites the table; otherwise, new data is appended to it.

returns: None.

Sample usage:
from moovai.google_cloud import bigquery

file = "/path/to/my_file.csv"
dataset_id = "my_dataset_id"
table_id = "my_table_id"
bigquery.upload_local_file_to_bq(file, dataset_id, table_id)
upload_gcs_file_to_bq:

Uploads a file from Cloud Storage to BigQuery. schema_path and schema are optional and mutually exclusive: provide at most one of them.

Parameters:
  • gcs_uri: REQUIRED. [STRING] Path to CSV file located on cloud storage to upload to BigQuery.
  • dataset_id: REQUIRED. [STRING] Name of your BigQuery dataset.
  • table_id: REQUIRED. [STRING] Name of the BigQuery table to upload to.
  • schema_path: OPTIONAL. [STRING] Path to schema.json file.
  • schema: OPTIONAL. [STRING] schema compatible with BigQuery.
  • overwrite: OPTIONAL. [BOOL] Defaults to False. If True, BigQuery overwrites the table; otherwise, new data is appended to it.

returns: None.

Sample usage:
from moovai.google_cloud import bigquery

gcs_uri = "gs://my_bucket/Path/To/my_file.csv"
dataset_id = "my_dataset_id"
table_id = "my_table_id"
bigquery.upload_gcs_file_to_bq(gcs_uri, dataset_id, table_id)
generate_sql_query:

Generates a SQL query string to use to query a BigQuery table.

Parameters:
  • project: REQUIRED. [STRING] Project ID
  • dataset_id: REQUIRED. [STRING] Name of your BigQuery dataset.
  • table_id: REQUIRED. [STRING] Name of your BigQuery table to query.
  • columns: OPTIONAL. [ARRAY] List of column names you want to select.
  • conditions: OPTIONAL. [ARRAY] List of conditions to satisfy.

returns: STRING. Standard SQL query string.

Sample usage:
from moovai.google_cloud import bigquery

project = "my_project_id"
dataset_id = "my_dataset_id"
table_id = "my_table_id"
bigquery.generate_sql_query(project, dataset_id, table_id)
# returns "SELECT * FROM `project.dataset_id.table_id`" (returns everything from the table)

columns = ["col1", "col2"]
conditions = ["Date >= TIMESTAMP('2019-05-01')", "col3 < 3"]

bigquery.generate_sql_query(project, dataset_id, table_id, columns=columns, conditions=conditions)
# returns "SELECT col1, col2 FROM `project.dataset_id.table_id` WHERE Date >= TIMESTAMP('2019-05-01') AND col3 < 3" (returns col1 and col2 for rows that meet the specified conditions)
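The documented return values suggest the query string is assembled roughly as follows; this is an illustrative re-implementation for reference, not the package's code:

```python
def sketch_sql_query(project, dataset_id, table_id, columns=None, conditions=None):
    # Mirror the documented output:
    # SELECT <cols or *> FROM `project.dataset.table` [WHERE cond1 AND cond2 ...]
    select = ", ".join(columns) if columns else "*"
    query = "SELECT {} FROM `{}.{}.{}`".format(select, project, dataset_id, table_id)
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query

sketch_sql_query("project", "dataset_id", "table_id")
# returns "SELECT * FROM `project.dataset_id.table_id`"
```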
get_data_from_bq:

Takes a Standard SQL query string and returns a pandas DataFrame.

Parameters:
  • sql_query: REQUIRED. [STRING] string representing the query you want to make.

returns: pandas DataFrame containing the result of your query.

Sample usage:
from moovai.google_cloud import bigquery

sql_query = "SELECT * FROM `my_project.my_dataset.my_table`"
bigquery.get_data_from_bq(sql_query)

