
Functionalities to interact with Google and Azure, and clean data


do-data-utils

This package provides utilities for connecting to different cloud sources, along with data cleaning functions. It is published on PyPI as do-data-utils.

Installation

Commands

To install the latest released version from PyPI, use the following command:

pip install do-data-utils

To install a specific version instead, pin it explicitly, for example:

pip install do-data-utils==2.0.0

Install in requirements.txt

You can also list the package, pinned to a version, in your requirements.txt.

# requirements.txt

do-data-utils==2.0.0
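
After installing, it can be useful to confirm which version actually ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical and is not part of do-data-utils.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("do-data-utils")
    if found is None:
        print("do-data-utils is not installed in this environment")
    else:
        print(f"do-data-utils {found} is installed")
```

If the printed version differs from the one pinned in requirements.txt, the environment was probably not rebuilt after the pin changed.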

Available Subpackages

  • google – Utilities for Google Cloud Platform.
  • azure – Utilities for Azure services.

For a full list of functions, see the overview documentation.

Example Usage

The package is built around the following workflow:

  1. Keep the service account JSON secrets for your cloud services in GCP Secret Manager.
  2. Keep a local JSON key file that grants access to GCP Secret Manager.
  3. Retrieve the secret for the cloud platform you want to interact with from GCP Secret Manager.
  4. Use that secret with the utility functions.
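
Every example in this document repeats steps 2 and 3, so one way to reduce the repetition is a small wrapper. This is only a sketch: load_key_file and fetch_secret are hypothetical helpers, not part of do-data-utils, and the secret getter is passed in as an argument so the sketch does not depend on the package being installed. It assumes the getter accepts the same arguments as the get_secret calls shown in the examples (secret_info, project_id, secret_id).

```python
import json
from typing import Any, Callable


def load_key_file(path: str) -> dict:
    """Step 2: read the local JSON key file used to reach GCP Secret Manager."""
    with open(path, "r") as f:
        return json.load(f)


def fetch_secret(
    key_path: str,
    project_id: str,
    secret_id: str,
    get_secret: Callable[..., Any],
) -> Any:
    """Step 3: fetch one secret using the local key file.

    `get_secret` is assumed to be called as
    get_secret(secret_info, project_id=..., secret_id=...),
    matching the calls in the examples of this document.
    """
    secret_info = load_key_file(key_path)
    return get_secret(secret_info, project_id=project_id, secret_id=secret_id)
```

With the package installed, the call would look like fetch_secret('secrets/secret-manager-key.json', 'my-secret-project-id', 'gcs-secret-id-dev', get_secret), where get_secret is imported from do_data_utils.google.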

Google

GCS

Download

import json

from do_data_utils.google import get_secret, gcs_to_df


# Load secret key and get the secret to access GCS
with open('secrets/secret-manager-key.json', 'r') as f:
    secret_info = json.load(f)

secret = get_secret(secret_info, project_id='my-secret-project-id', secret_id='gcs-secret-id-dev')

# Download a csv file to DataFrame
gcspath = 'gs://my-ai-bucket/my-path-to-csv.csv'
df = gcs_to_df(gcspath, secret, polars=False)

To load a JSON file into a dictionary instead:

import json

from do_data_utils.google import get_secret, gcs_to_dict


# Load secret key and get the secret to access GCS
with open('secrets/secret-manager-key.json', 'r') as f:
    secret_info = json.load(f)

secret = get_secret(secret_info, project_id='my-secret-project-id', secret_id='gcs-secret-id-dev')

# Download the content from GCS
gcspath = 'gs://my-ai-bucket/my-path-to-json.json'
my_dict = gcs_to_dict(gcspath, secret=secret)

Upload

import json

from do_data_utils.google import get_secret, dict_to_json_gcs


# Load secret key and get the secret to access GCS
with open('secrets/secret-manager-key.json', 'r') as f:
    secret_info = json.load(f)

secret = get_secret(secret_info, project_id='my-secret-project-id', secret_id='gcs-secret-id-dev')

my_setting_dict = {
    'param1': 'abc',
    'param2': 'xyz',
}

gcspath = 'gs://my-bucket/my-path-to-json.json'
dict_to_json_gcs(dict_data=my_setting_dict, gcspath=gcspath, secret=secret)
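
The GCS examples all pass a path of the form gs://bucket/object. As an illustration of that format, here is a small sketch that splits such a path into its bucket and object name and rejects anything else before it reaches a download or upload call. The helper split_gcs_path is hypothetical and does not come from do-data-utils; the package may validate paths differently.

```python
from typing import Tuple


def split_gcs_path(gcspath: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    prefix = "gs://"
    if not gcspath.startswith(prefix):
        raise ValueError(f"Expected a path starting with {prefix!r}: {gcspath!r}")
    # Everything up to the first '/' after the prefix is the bucket name.
    bucket, _, object_name = gcspath[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"Missing bucket name in: {gcspath!r}")
    return bucket, object_name


if __name__ == "__main__":
    print(split_gcs_path("gs://my-ai-bucket/my-path-to-csv.csv"))
```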

GBQ

import json

from do_data_utils.google import get_secret, gbq_to_df


# Load secret key and get the secret to access BigQuery
with open('secrets/secret-manager-key.json', 'r') as f:
    secret_info = json.load(f)

secret = get_secret(secret_info, project_id='my-secret-project-id', secret_id='gbq-secret-id-dev')

# Query
# Backticks quote the table name, which is needed when it contains hyphens
query = 'select * from `my-project.my-dataset.my-table`'
df = gbq_to_df(query, secret, polars=False)
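
In BigQuery's GoogleSQL, table names that contain hyphens generally need to be quoted with backticks. A minimal sketch of building the query string that way is shown below; qualified_table is a hypothetical helper, not part of do-data-utils, and the project, dataset, and table names are the placeholders from the example.

```python
def qualified_table(project: str, dataset: str, table: str) -> str:
    """Return a backtick-quoted `project.dataset.table` reference."""
    parts = (project, dataset, table)
    for part in parts:
        # Reject empty parts and embedded backticks, which would break the quoting.
        if not part or "`" in part:
            raise ValueError(f"Invalid identifier part: {part!r}")
    return "`" + ".".join(parts) + "`"


table_ref = qualified_table("my-project", "my-dataset", "my-table")
query = f"select * from {table_ref}"
```

The resulting query string can then be passed to gbq_to_df in place of a hand-written one.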

Azure/Databricks

import json

from do_data_utils.azure import databricks_to_df
from do_data_utils.google import get_secret


# Load secret key and get the secret to access Databricks
with open('secrets/secret-manager-key.json', 'r') as f:
    secret_info = json.load(f)

secret = get_secret(secret_info, project_id='my-secret-project-id', secret_id='databricks-secret-id-dev')

# Download from Databricks sql
query = 'select * from datadev.dsplayground.my_table'
df = databricks_to_df(query, secret, polars=False)
