
Utilities for interacting with Google Cloud and Azure, and for cleaning data

Project description

do-data-utils

This package provides utilities for connecting to different cloud sources, along with data-cleaning functions.

Installation

Commands

To install the latest version from the main branch, use the following command:

pip install do-data-utils

You can install a specific version like so:

pip install do-data-utils==<version>

For example,

pip install do-data-utils==1.1.2

Optional extras are listed in setup.py under the extras_require option.
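Extras are installed with pip's bracket syntax. A sketch only: the extra names below (google, azure) are assumptions based on the subpackage names, so check extras_require in setup.py for the real ones.

```shell
# Hypothetical extra names -- verify against extras_require in setup.py
pip install "do-data-utils[google]"
pip install "do-data-utils[google,azure]==1.1.2"
```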

Install in requirements.txt

You can also pin the package in your requirements.txt:

# requirements.txt
do-data-utils==1.1.2

Available Subpackages

  • google – Utilities for Google Cloud Platform.
  • azure – Utilities for Azure services.

For a full list of functions, see the overview documentation.

Example Usage

Google

GCS

import json

from do_data_utils.google import get_secret, gcs_to_file


# Load secret key and get the secret to access GCS
with open('secrets/secret-manager-key.json', 'r') as f:
    secret_info = json.load(f)

secret = get_secret(secret_info, project_id='my-secret-project-id', secret_id='gcs-secret-id-dev')

# Download the content from GCS
gcspath = 'gs://my-ai-bucket/my-path-to-json.json'
f = gcs_to_file(gcspath, secret=secret)
my_dict = json.load(f)
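Because gcs_to_file returns a file-like object, any parser that accepts one (such as json.load above) works on it directly. A minimal, runnable sketch using an in-memory io.StringIO as a stand-in for the real GCS download:

```python
import io
import json

# io.StringIO stands in for the file-like object gcs_to_file returns
f = io.StringIO('{"bucket": "my-ai-bucket", "version": 2}')

my_dict = json.load(f)
print(my_dict["version"])  # -> 2
```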

GBQ

import json

from do_data_utils.google import get_secret, gbq_to_df


# Load secret key and get the secret to access GBQ
with open('secrets/secret-manager-key.json', 'r') as f:
    secret_info = json.load(f)

secret = get_secret(secret_info, project_id='my-secret-project-id', secret_id='gbq-secret-id-dev')

# Query
query = 'select * from `my-project.my-dataset.my-table`'  # backticks required for hyphenated names in BigQuery
df = gbq_to_df(query, secret, polars=False)

Azure/Databricks

import json

from do_data_utils.azure import databricks_to_df
from do_data_utils.google import get_secret


# Load secret key and get the secret to access Databricks
with open('secrets/secret-manager-key.json', 'r') as f:
    secret_info = json.load(f)

secret = get_secret(secret_info, project_id='my-secret-project-id', secret_id='databricks-secret-id-dev')

# Download from Databricks sql
query = 'select * from datadev.dsplayground.my_table'
df = databricks_to_df(query, secret, polars=False)

Download files

Download the file for your platform.

Source Distribution

do_data_utils-1.1.2.tar.gz (8.8 kB, Source)

Built Distribution

do_data_utils-1.1.2-py3-none-any.whl (9.8 kB, Python 3)

File details

Details for the file do_data_utils-1.1.2.tar.gz.

File metadata

  • Download URL: do_data_utils-1.1.2.tar.gz
  • Size: 8.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for do_data_utils-1.1.2.tar.gz:

  • SHA256: d2f1ace0dc949095150a12e36f81464ea01434cacfbc7d97e3800da78c446399
  • MD5: 4c4bcb4f28a903e32ac2a01fff19b81b
  • BLAKE2b-256: e5eff168ad951c3e616ae55eb9d2b1c772109e4b40b6fec6acc50b80fcd068dd
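A published digest can be checked locally with Python's standard hashlib. A sketch; the file path passed in is a hypothetical example of a downloaded archive:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 8192) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the published digest, e.g.:
# sha256_of("do_data_utils-1.1.2.tar.gz") == "d2f1ace0..."
```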


File details

Details for the file do_data_utils-1.1.2-py3-none-any.whl.

File metadata

  • Download URL: do_data_utils-1.1.2-py3-none-any.whl
  • Size: 9.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for do_data_utils-1.1.2-py3-none-any.whl:

  • SHA256: 90f8e7fc06396789c8aafe3ef8e699a31c22d9b669a5832c81624a4e335cd875
  • MD5: a33456d66f6cbfe867bb8f82d55ff1f3
  • BLAKE2b-256: 6a6563c89cf8197ee46bee09a0b4cab7096e3fac6c39bf4c4b32a2f3b84f4a1c

