neoval-py-utils
Python Utilities
Development
All development must take place on a feature branch, and a pull request is required; a user is not allowed to commit directly to main. The automated workflow in this repo (using python-semantic-release) requires Angular-style commit messages to update the package version and CHANGELOG. All commits must follow this format before a PR can be merged; a user who wants to develop without using this format for every commit can squash the non-Angular commits into Angular-style ones before merging. A PR may only be merged with the rebase and merge method, which ensures that only Angular-style commits end up on main.
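As a rough illustration of the commit format, the sketch below checks whether a commit header follows the Angular style <type>(<scope>): <subject>. The accepted type list and the helper names (is_angular_commit, non_angular) are assumptions for illustration only; python-semantic-release's own parser and the repo's configuration remain the source of truth.

```python
import re
from typing import List

# Assumed set of Angular commit types; adjust to match the repo's
# python-semantic-release configuration.
ANGULAR_TYPES = (
    "build", "chore", "ci", "docs", "feat",
    "fix", "perf", "refactor", "style", "test",
)

# Header line: <type>(<optional scope>): <subject>
HEADER_RE = re.compile(
    r"^(?:" + "|".join(ANGULAR_TYPES) + r")"
    r"(?:\([^()\r\n]+\))?"
    r": \S.*$"
)


def is_angular_commit(message: str) -> bool:
    """Return True if the first line of the message is an Angular-style header."""
    lines = message.splitlines()
    return bool(lines) and HEADER_RE.match(lines[0]) is not None


def non_angular(messages: List[str]) -> List[str]:
    """Return the commit messages that would need squashing or rewording before merge."""
    return [m for m in messages if not is_angular_commit(m)]


if __name__ == "__main__":
    commits = [
        "feat(exporter): cache query results on disk",
        "fix: handle empty result sets",
        "wip",
        "Update README",
    ]
    print(non_angular(commits))  # ['wip', 'Update README']
```

Only the first line is checked, so a longer body or footer below the header does not affect the result.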
Upon merge to main, the deploy workflow will:
- bump the version in pyproject.toml
- update the CHANGELOG using all commits added
- tag and release, if required
- publish to PyPI
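To show how Angular-style commits can drive the version bump, here is a simplified sketch. It assumes the common mapping (feat gives a minor bump, fix and perf give a patch bump, a "BREAKING CHANGE:" footer gives a major bump); bump_level and next_version are hypothetical helpers, not the deploy workflow's actual code, and they ignore pre-1.0 and pre-release special cases.

```python
import re
from typing import List

# Assumed release rules (the usual Angular/semantic-release mapping):
# "feat" -> minor, "fix"/"perf" -> patch, "BREAKING CHANGE:" in the body -> major.
MINOR_TYPES = {"feat"}
PATCH_TYPES = {"fix", "perf"}
TYPE_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([^()\r\n]+\))?: ")

NO_RELEASE, PATCH, MINOR, MAJOR = range(4)


def bump_level(message: str) -> int:
    """Classify one commit message by the version bump it requires."""
    if "BREAKING CHANGE:" in message:
        return MAJOR
    match = TYPE_RE.match(message)
    if match is None:
        return NO_RELEASE
    if match.group("type") in MINOR_TYPES:
        return MINOR
    if match.group("type") in PATCH_TYPES:
        return PATCH
    return NO_RELEASE


def next_version(current: str, messages: List[str]) -> str:
    """Return the version to write to pyproject.toml after these commits.

    Simplification: ignores pre-1.0 special cases and pre-release versions.
    """
    major, minor, patch = (int(part) for part in current.split("."))
    level = max((bump_level(m) for m in messages), default=NO_RELEASE)
    if level == MAJOR:
        return f"{major + 1}.0.0"
    if level == MINOR:
        return f"{major}.{minor + 1}.0"
    if level == PATCH:
        return f"{major}.{minor}.{patch + 1}"
    return current  # nothing releasable, so no tag or release


print(next_version("0.3.0", ["fix: handle empty result sets", "docs: typo"]))  # 0.3.1
```

The highest bump across all added commits wins, which is why a single feat commit among many fixes still yields a minor release.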
Getting Started
Prerequisites
TODO
Tests
For the integration tests to pass, you need to be authenticated with a Google Cloud project that grants Storage Admin and BigQuery job permissions.
You can authenticate by setting the GOOGLE_APPLICATION_CREDENTIALS environment variable or by running gcloud auth application-default login.
Specify the GCP project with gcloud config set project <project-id>.
Run the unit and integration tests with poetry run task test.
To run the tests with coverage, use poetry run task test-with-coverage.
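Before running the integration tests, a quick local check for credentials can save a failed run. The sketch below is a hypothetical helper that only looks for the key file named by GOOGLE_APPLICATION_CREDENTIALS, or for the file that gcloud auth application-default login writes (honouring CLOUDSDK_CONFIG). It does not verify the project or the Storage/BigQuery permissions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def adc_path(env: Mapping[str, str], home: Path) -> Path:
    """Location of the file written by `gcloud auth application-default login`.

    CLOUDSDK_CONFIG overrides gcloud's config directory; otherwise it is
    %APPDATA%\\gcloud on Windows and ~/.config/gcloud elsewhere.
    """
    if env.get("CLOUDSDK_CONFIG"):
        config_dir = Path(env["CLOUDSDK_CONFIG"])
    elif os.name == "nt" and env.get("APPDATA"):
        config_dir = Path(env["APPDATA"]) / "gcloud"
    else:
        config_dir = home / ".config" / "gcloud"
    return config_dir / "application_default_credentials.json"


def has_gcp_credentials(
    env: Optional[Mapping[str, str]] = None, home: Optional[Path] = None
) -> bool:
    """True if a service-account key file or an ADC file is present.

    Presence only: this does not check the project or the Storage/BigQuery
    permissions the integration tests need.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    key_file = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_file:
        return Path(key_file).is_file()
    return adc_path(env, home).is_file()
```

A helper like this could back a pytest.mark.skipif marker so that integration tests are skipped, rather than failing, on machines without credentials.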
Usage
TODO: installation via PyPI
The examples below assume that neoval-py-utils is installed as a dependency and that you have permissions for GCP Storage and BigQuery.
Examples of usage
Export BigQuery Tables or Queries to a DataFrame or GCS
from neoval_py_utils.exporter import Exporter
# Query a BigQuery table and return a Polars DataFrame. Results are cached and kept for 12 hours by default.
exporter = Exporter()  # To use a cache, pass a path to the constructor, e.g. Exporter(cache_dir="./cache")
pl_df = exporter.export("SELECT word FROM `bigquery-public-data.samples.shakespeare` GROUP BY word ORDER BY word DESC LIMIT 3")
# `export` is aliased by the `<` operator, so this gives the same result as above.
pl_df = exporter < "SELECT word FROM `bigquery-public-data.samples.shakespeare` GROUP BY word ORDER BY word DESC LIMIT 3"
# Export a whole table
all_pl_df = exporter.export("bigquery-public-data.samples.shakespeare")
# Export a BigQuery table to Parquet in a GCS bucket. Returns a list of blobs.
blobs = exporter.bq_to_gcs("my-dataset.my-table")
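The caching and the `<` alias described above can be pictured with a toy class. This is not the library's Exporter: ToyExporter, its in-memory cache (the README's Exporter takes a cache_dir instead), the injected run_query backend and the table-ID heuristic are all assumptions made so the example runs without BigQuery.

```python
import hashlib
import re
import time
from typing import Callable, Dict, List, Tuple

Rows = List[dict]

# Assumed heuristic: "project.dataset.table" or "dataset.table" means "export the whole table".
TABLE_ID_RE = re.compile(r"^[\w-]+(?:\.[\w-]+){1,2}$")
TWELVE_HOURS = 12 * 60 * 60


class ToyExporter:
    """Illustration only: a TTL cache in front of a query function, plus a `<` alias."""

    def __init__(
        self,
        run_query: Callable[[str], Rows],
        ttl_seconds: float = TWELVE_HOURS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._run_query = run_query  # hypothetical backend, e.g. a BigQuery call
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Rows]] = {}  # in memory, not on disk

    @staticmethod
    def to_sql(query_or_table: str) -> str:
        """Wrap a bare table ID in SELECT *; pass SQL through unchanged."""
        text = query_or_table.strip()
        if TABLE_ID_RE.match(text):
            return f"SELECT * FROM `{text}`"
        return text

    def export(self, query_or_table: str) -> Rows:
        sql = self.to_sql(query_or_table)
        key = hashlib.sha256(sql.encode("utf-8")).hexdigest()
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the backend
        rows = self._run_query(sql)
        self._cache[key] = (now, rows)
        return rows

    def __lt__(self, query_or_table: str) -> Rows:
        # `exporter < "SELECT ..."` is evaluated as exporter.__lt__("SELECT ...").
        return self.export(query_or_table)
```

Keying the cache on a hash of the final SQL means a bare table ID and the equivalent SELECT * query share one cache entry.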
Create In-Process (Embedded) Databases
# Python CLI example to build an in-process DB
poetry run python -m neoval_py_utils.ipdb build <DBT_DATASET> <GCLOUD_PROJECT_ID> <DB_PATH> <CONFIG_PATH> --upload-bucket <UPLOAD_BUCKET>
# --upload-bucket is optional; if given, the in-process DB is uploaded to the specified bucket.
# To run it locally in this repo:
poetry run python neoval_py_utils/ipdb.py build samples bigquery-public-data tests/artifacts/in_process_db tests/resources/good.config.yaml
# To apply SQL templates after the in-process DB is built
poetry run python -m neoval_py_utils.ipdb prepare <DBT_DATASET> <GCLOUD_PROJECT_ID> <DB_PATH> <TEMPLATES_PATH>
# To run it locally in this repo:
poetry run python neoval_py_utils/ipdb.py prepare samples bigquery-public-data tests/artifacts/in_process_db tests/resources/templates
# For more info you can run
poetry run python neoval_py_utils/ipdb.py --help # which will return
Usage: ipdb.py [OPTIONS] COMMAND [ARGS]...
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────╮
│ --install-completion  Install completion for the current shell.                                        │
│ --show-completion     Show completion for the current shell, to copy it or customize the installation. │
│ --help                Show this message and exit.                                                      │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ─────────────────────────────────────────────────────────────────────────────────────────────╮
│ build                 Build the in process database(s).                                                │
│ make-config           Prints a default configuration to be used with the build command.                │
│ prepare               Run scripts to add views/virtual tables/etc. to the database(s).                 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯
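The build and prepare steps can be sketched with Python's built-in sqlite3 module. This assumes the in-process database is SQLite (the mention of virtual tables suggests it, but the README does not say so) and that templates are plain .sql files run in name order. build_db and prepare_db are hypothetical helpers, and the dict of rows stands in for data exported from BigQuery.

```python
import sqlite3
from pathlib import Path
from typing import List, Mapping, Sequence


def build_db(db_path: Path, tables: Mapping[str, Sequence[dict]]) -> None:
    """Sketch of 'build': load each table's rows into an SQLite file.

    Table and column names are assumed trusted (they are interpolated into SQL).
    """
    conn = sqlite3.connect(str(db_path))
    try:
        with conn:  # commit on success, roll back on error
            for name, rows in tables.items():
                if not rows:
                    continue
                columns = list(rows[0])
                col_sql = ", ".join(f'"{c}"' for c in columns)
                marks = ", ".join("?" for _ in columns)
                conn.execute(f'DROP TABLE IF EXISTS "{name}"')
                conn.execute(f'CREATE TABLE "{name}" ({col_sql})')
                conn.executemany(
                    f'INSERT INTO "{name}" ({col_sql}) VALUES ({marks})',
                    [tuple(row[c] for c in columns) for row in rows],
                )
    finally:
        conn.close()


def prepare_db(db_path: Path, templates_dir: Path) -> List[str]:
    """Sketch of 'prepare': run each *.sql template against the DB, in name order."""
    applied = []
    conn = sqlite3.connect(str(db_path))
    try:
        for script in sorted(templates_dir.glob("*.sql")):
            conn.executescript(script.read_text(encoding="utf-8"))
            applied.append(script.name)
        conn.commit()
    finally:
        conn.close()
    return applied
```

For example, build_db(Path("samples.db"), {"shakespeare": rows}) followed by prepare_db(Path("samples.db"), Path("templates")) would create the tables first and then layer views on top; numeric filename prefixes (01_, 02_) keep the template order explicit.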
Hashes for neoval_py_utils-0.3.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 84f885b1323517fda1aac3193f5ae126c464f6c9199011d0fb47ef76eb411e65
MD5 | 3c31834f109fb6139fb68d12dbf68a85
BLAKE2b-256 | fd1ab454144e92b29b8eedf901f0596ed1031c19392311c0a46b94cd8343def2
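The listed digests can be checked against a downloaded wheel with hashlib. file_digests and verify_sha256 are illustrative helpers, not part of the package; BLAKE2b-256 corresponds to hashlib.blake2b with digest_size=32.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_digests(path: Path, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Compute the SHA256, MD5 and BLAKE2b-256 digests of a file, reading it in chunks."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 256-bit BLAKE2b
    }
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path: Path, expected: str) -> bool:
    """Compare the file's SHA256 against an expected hex digest (case-insensitive)."""
    return file_digests(path)["SHA256"] == expected.strip().lower()
```

For example, verify_sha256(Path("neoval_py_utils-0.3.0-py3-none-any.whl"), "84f885b1323517fda1aac3193f5ae126c464f6c9199011d0fb47ef76eb411e65") should return True for an intact download. pip can enforce the same check at install time through its hash-checking mode (--hash=sha256:... entries in a requirements file).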