Functionalities to interact with Google and Azure, and clean data
Project description
datautils
This package provides functions to connect to different cloud sources and to clean data.
Installation
Commands
To install the latest version from the main branch, use the following command:
pip install do-data-utils
You can install a specific version like so:
pip install do-data-utils==<version>
For example,
pip install do-data-utils==1.1.2
Optional extras can be found in setup.py under the extras_require option.
Install in requirements.txt
You can also pin the package in your requirements.txt:
# requirements.txt
do-data-utils==1.1.2
Available Subpackages
- google – Utilities for Google Cloud Platform.
- azure – Utilities for Azure services.
For a full list of functions, see the overview documentation.
Example Usage
GCS
import json

from do_data_utils.google import get_secret, gcs_to_file

# Load the Secret Manager key and fetch the secret used to access GCS
with open('secrets/secret-manager-key.json', 'r') as f:
    secret_info = json.load(f)

secret = get_secret(secret_info, project_id='my-secret-project-id', secret_id='gcs-secret-id-dev')

# Download the content from GCS
gcspath = 'gs://my-ai-bucket/my-path-to-json.json'
f = gcs_to_file(gcspath, secret=secret)
my_dict = json.load(f)
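Note that gcs_to_file hands back a file-like object, so anything that reads from a file handle works on it. A minimal stdlib-only sketch of that last step, with an in-memory buffer standing in for the object the real call would return (the content here is hypothetical):

```python
import io
import json

# Hypothetical stand-in for the file-like object gcs_to_file returns;
# the real call streams the object's bytes from the bucket.
f = io.BytesIO(b'{"model": "churn-v2", "threshold": 0.75}')

my_dict = json.load(f)
print(my_dict["threshold"])  # -> 0.75
```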
GBQ
import json

from do_data_utils.google import get_secret, gbq_to_df

# Load the Secret Manager key and fetch the secret used to access BigQuery
with open('secrets/secret-manager-key.json', 'r') as f:
    secret_info = json.load(f)

secret = get_secret(secret_info, project_id='my-secret-project-id', secret_id='gbq-secret-id-dev')

# Query BigQuery; hyphenated identifiers must be backtick-quoted in standard SQL
query = 'select * from `my-project.my-dataset.my-table`'
df = gbq_to_df(query, secret, polars=False)
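The polars flag suggests gbq_to_df can return either a pandas or a polars DataFrame. A hedged sketch of that dispatch pattern (a hypothetical helper, not the library's actual internals):

```python
import pandas as pd

def rows_to_df(rows, columns, polars=False):
    # Hypothetical helper: return a polars DataFrame when polars=True,
    # a pandas DataFrame otherwise (mirrors the flag's apparent contract).
    if polars:
        import polars as pl  # imported lazily so pandas users don't need polars
        return pl.DataFrame({c: list(v) for c, v in zip(columns, zip(*rows))})
    return pd.DataFrame(rows, columns=columns)

df = rows_to_df([(1, 'a'), (2, 'b')], ['id', 'label'], polars=False)
print(df.shape)  # -> (2, 2)
```

Deferring the polars import keeps the optional dependency out of the default pandas path, which matches how such flags are usually wired.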
Azure/Databricks
import json

from do_data_utils.azure import databricks_to_df
from do_data_utils.google import get_secret

# Load the Secret Manager key and fetch the secret used to access Databricks
with open('secrets/secret-manager-key.json', 'r') as f:
    secret_info = json.load(f)

secret = get_secret(secret_info, project_id='my-secret-project-id', secret_id='databricks-secret-id-dev')

# Run a query against Databricks SQL
query = 'select * from datadev.dsplayground.my_table'
df = databricks_to_df(query, secret, polars=False)
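All three examples repeat the same secret-loading boilerplate. A small stdlib-only helper (hypothetical, not part of the package) can factor it out; the temporary file below merely stands in for the real key file:

```python
import json
import os
import tempfile
from pathlib import Path

def load_secret_info(path='secrets/secret-manager-key.json'):
    # Read the Secret Manager key file into a dict, ready for get_secret
    return json.loads(Path(path).read_text())

# Usage, with a temporary file standing in for the real key:
tmp = tempfile.NamedTemporaryFile('w', suffix='.json', delete=False)
json.dump({'type': 'service_account', 'project_id': 'my-secret-project-id'}, tmp)
tmp.close()

info = load_secret_info(tmp.name)
print(info['project_id'])  # -> my-secret-project-id
os.unlink(tmp.name)
```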
File details
Details for the file do_data_utils-1.1.2.tar.gz.
File metadata
- Download URL: do_data_utils-1.1.2.tar.gz
- Upload date:
- Size: 8.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d2f1ace0dc949095150a12e36f81464ea01434cacfbc7d97e3800da78c446399 |
| MD5 | 4c4bcb4f28a903e32ac2a01fff19b81b |
| BLAKE2b-256 | e5eff168ad951c3e616ae55eb9d2b1c772109e4b40b6fec6acc50b80fcd068dd |
File details
Details for the file do_data_utils-1.1.2-py3-none-any.whl.
File metadata
- Download URL: do_data_utils-1.1.2-py3-none-any.whl
- Upload date:
- Size: 9.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 90f8e7fc06396789c8aafe3ef8e699a31c22d9b669a5832c81624a4e335cd875 |
| MD5 | a33456d66f6cbfe867bb8f82d55ff1f3 |
| BLAKE2b-256 | 6a6563c89cf8197ee46bee09a0b4cab7096e3fac6c39bf4c4b32a2f3b84f4a1c |