
Utilities for interacting with Google Cloud and Azure, and for cleaning data

Project description

datautils

This package provides functions for connecting to different cloud sources, along with data-cleaning utilities.

Installation

Commands

To install the latest version from the main branch, use the following command:

pip install "git+https://github.com/anuponwa/datautils.git"

You can install a specific version like so:

pip install "git+https://github.com/anuponwa/datautils.git@<version>"

For example,

pip install "git+https://github.com/anuponwa/datautils.git@1.1.0"

Optional extras are listed in setup.py under the extras_require option.

Install in requirements.txt

You can also reference this source directly in your requirements.txt:

# requirements.txt
git+https://github.com/anuponwa/datautils.git@1.1.0

Available Subpackages

  • google – Utilities for Google Cloud Platform.
  • azure – Utilities for Azure services.

For a full list of functions, see the overview documentation.

Example Usage

Google

GCS

import json

from datautils.google import get_secret, gcs_to_file


# Load the Secret Manager key and retrieve the secret used to access GCS
with open('secrets/secret-manager-key.json', 'r') as f:
    secret_info = json.load(f)

secret = get_secret(secret_info, project_id='my-secret-project-id', secret_id='gcs-secret-id-dev')

# Download the content from GCS
gcspath = 'gs://my-ai-bucket/my-path-to-json.json'
f = gcs_to_file(gcspath, secret=secret)
my_dict = json.load(f)
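For context, a gs:// URI is just a bucket name followed by an object path. A minimal stdlib sketch of splitting one (illustrative only; this helper is ours, not part of this package's API):

```python
def split_gcs_uri(gcs_path: str) -> tuple[str, str]:
    """Split a gs:// URI into (bucket, object path)."""
    if not gcs_path.startswith("gs://"):
        raise ValueError(f"Not a GCS URI: {gcs_path!r}")
    # Everything up to the first "/" after the scheme is the bucket;
    # the rest is the object (blob) path.
    bucket, _, blob = gcs_path[len("gs://"):].partition("/")
    return bucket, blob


bucket, blob = split_gcs_uri("gs://my-ai-bucket/my-path-to-json.json")
# bucket == "my-ai-bucket", blob == "my-path-to-json.json"
```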

GBQ

import json

from datautils.google import get_secret, gbq_to_df


# Load the Secret Manager key and retrieve the secret used to access GBQ
with open('secrets/secret-manager-key.json', 'r') as f:
    secret_info = json.load(f)

secret = get_secret(secret_info, project_id='my-secret-project-id', secret_id='gbq-secret-id-dev')

# Query (backticks are required because the identifiers contain hyphens)
query = 'select * from `my-project.my-dataset.my-table`'
df = gbq_to_df(query, secret, polars=False)

Azure/Databricks

import json

from datautils.azure import databricks_to_df
from datautils.google import get_secret


# Load the Secret Manager key and retrieve the secret used to access Databricks
with open('secrets/secret-manager-key.json', 'r') as f:
    secret_info = json.load(f)

secret = get_secret(secret_info, project_id='my-secret-project-id', secret_id='databricks-secret-id-dev')

# Query Databricks SQL
query = 'select * from datadev.dsplayground.my_table'
df = databricks_to_df(query, secret, polars=False)

Project details


Download files

Download the file for your platform.

Source Distribution

do_data_utils-1.1.1.tar.gz (8.9 kB)


Built Distribution


do_data_utils-1.1.1-py3-none-any.whl (9.8 kB)


File details

Details for the file do_data_utils-1.1.1.tar.gz.

File metadata

  • Download URL: do_data_utils-1.1.1.tar.gz
  • Upload date:
  • Size: 8.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for do_data_utils-1.1.1.tar.gz

  • SHA256: 6732c1a490ca105c14c130d65f0ca38ea8869fd90223a077976713c14bc85ab1
  • MD5: fc3e846665bf46459cdb1964e11fdc12
  • BLAKE2b-256: d543aee52c4eb2705e7da723d1fbcac9ee01cf2ffcac72ebe39bf2bbffd72b46
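To check a downloaded distribution against the digests above, a minimal stdlib sketch (the helper name is ours, for illustration):

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Compare the result against the published SHA256 digest, e.g.:
# sha256_of("do_data_utils-1.1.1.tar.gz") == "6732c1a4...85ab1"
```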


File details

Details for the file do_data_utils-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: do_data_utils-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 9.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for do_data_utils-1.1.1-py3-none-any.whl

  • SHA256: 7714de06cf288319f50ad80825eea64e5b1a6f03e98ecdc37f17bc42bf6731ad
  • MD5: af44ca7e7345300271fec4c22d8aaf51
  • BLAKE2b-256: deec97f8b7005f8a928cc9f797228e7080561b72f22ae177dc1d7edd8620abe3

