dapla-toolbelt

Python module for use within JupyterLab notebooks, specifically aimed at Statistics Norway's data platform, Dapla. It contains support for authenticated access to Google services such as Google Cloud Storage (GCS) and custom Dapla services such as Maskinporten Guardian. The authentication process is based on the TokenExchangeAuthenticator for JupyterHub.
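
Authentication happens behind the scenes, so none of the examples below require an explicit login step. If you need the underlying Google credentials directly (for example to construct your own Google API client), the library exposes an AuthClient. A minimal sketch, assuming the fetch_google_credentials method of recent versions; verify the exact name against your installed version:

from dapla import AuthClient

# Assumed API: returns Google credentials obtained through the token exchange
credentials = AuthClient.fetch_google_credentials()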

These operations are supported:

  • List contents of a bucket
  • Open a file in GCS
  • Copy a file from GCS to the local file system
  • Load a file (CSV, JSON or XML) from GCS into a pandas dataframe
  • Save contents of a data frame into a file (CSV, JSON, XML) in GCS

When giving the path to a resource, users do not need the full GCS URI, only the path; that is, paths do not have to be prefixed with "gs://". It is implicitly understood that all resources accessed with this tool are located in GCS, with the first level of the path being a GCS bucket name.
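
In other words, the mapping from a dapla-toolbelt path to a full GCS URI is mechanical. A minimal sketch of the idea (ensure_gcs_uri is a hypothetical helper for illustration, not part of the library):

def ensure_gcs_uri(path: str) -> str:
    """Prefix a plain bucket path with the GCS scheme if it is missing."""
    return path if path.startswith("gs://") else f"gs://{path}"

assert ensure_gcs_uri("bucket-name/folder/data.json") == "gs://bucket-name/folder/data.json"
assert ensure_gcs_uri("gs://bucket-name/folder/data.json") == "gs://bucket-name/folder/data.json"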

Installation

pip install dapla-toolbelt

Usage Examples

from dapla import FileClient
from dapla import GuardianClient
import pandas as pd

# Load data using the Maskinporten Guardian client
response = GuardianClient.call_api("https://data.udir.no/api/kag", "88ace991-7871-4ccc-aaec-8fb6d78ed04e", "udir:datatilssb")
data_json = response.json()

raw_data_df = pd.DataFrame(data_json)  # create pandas data frame from json
raw_data_df.head()  # show first rows of data frame

FileClient.ls("bucket-name/folder")  # list contents of given folder

# Save data into different formats
path_base = "bucket-name/folder/raw_data"
FileClient.save_pandas_to_json(raw_data_df, f"{path_base}.json")  # generate json from data frame, and save to given path
FileClient.save_pandas_to_csv(raw_data_df, f"{path_base}.csv")  # generate csv from data frame, and save to given path
FileClient.save_pandas_to_xml(raw_data_df, f"{path_base}.xml")  # generate xml from data frame, and save to given path

FileClient.cat(f"{path_base}.json")  # print contents of file

# Load data from different formats
# All these data frames should contain the same data:
df = FileClient.load_json_to_pandas(f"{path_base}.json")  # read json from path and load into pandas data frame
df.head()  # show first rows of data frame
df = FileClient.load_csv_to_pandas(f"{path_base}.csv")  # read csv from path and load into pandas data frame
df.head()  # show first rows of data frame
df = FileClient.load_xml_to_pandas(f"{path_base}.xml")  # read xml from path and load into pandas data frame
df.head()  # show first rows of data frame
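
The remaining operations from the feature list, opening a file in GCS and copying a file to the local file system, go through the underlying file system object. A hedged sketch, assuming FileClient.get_gcs_file_system() returns an fsspec/gcsfs-compatible file system:

fs = FileClient.get_gcs_file_system()

with fs.open(f"{path_base}.json", "r") as f:  # open (stream) a file in GCS without downloading it first
    print(f.read())

fs.get(f"{path_base}.csv", "raw_data_local.csv")  # copy a file from GCS to the local file system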

Development

Prerequisites

  • Python >3.8 (3.10 is preferred)
  • Poetry, install via curl -sSL https://install.python-poetry.org | python3 -

You can also execute make in the project folder to see available make commands.

Dependency Management

Poetry is used for dependency management, and Poe the Poet is used for running poe tasks within Poetry's virtualenv. Upon cloning this project, first install the necessary dependencies, then run the tests to verify that everything is working.

Install all dependencies

poetry install

Add dependencies

Main

poetry add <python package name>

Dev

poetry add --group dev <python package name>

Run tests

poetry run poe test

Run project locally in Jupyter

To run the project locally in Jupyter, run:

poetry run poe jupyter

A Jupyter instance should open in your browser. Open and run the cells in the demo.ipynb file.

Bumping version

Use make to bump the patch, minor, or major version before creating a pull request to the main Git branch, or run a poe task like this:

poetry run poe bump-version-patch

You can use either bump-version-patch, bump-version-minor, or bump-version-major. Bumping must be done with a clean git working tree, and it automatically creates a commit with the new version number.

Then just run git push origin --tags to push the changes and trigger the release process.

Building and releasing

Before merging your changes into the main branch, make sure you have bumped the version as outlined above.

An automatic release process will build dapla-toolbelt upon pull request creation, merges, and direct commits to the main Git branch. It will also release a new version of the package to pypi.org automatically when a commit is tagged, for example by a GitHub release.

Building and releasing manually

Run make build to build a wheel and a source distribution.

Run make release-validate to do all that AND validate it for release.

Run this (replacing <SEMVER> with your current version number) to check the contents of your source distribution: tar tzf dist/dapla-toolbelt-<SEMVER>.tar.gz
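
A wheel is a standard zip archive, so its contents can be listed with unzip (file name pattern assumed from the build output; adjust to your version):

unzip -l dist/dapla_toolbelt-<SEMVER>-py3-none-any.whl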

Test release

You have to bump the version of the package (see "Bumping version" above) before releasing, because not even test.pypi.org allows re-releasing a previously released version.

Run the following command in order to build, validate, and test package publication by uploading to TestPyPI: make release-test

You will have to manually enter a username and password for a user registered to test.pypi.org in order for this to work.

Production release

NB: A manual production release should only be done as a last resort, if the regular CI/CD pipeline does not work and a release is nevertheless necessary.

You have to bump the version of the package (see documentation on "Bumping version" above) to something different from the last release before releasing.

In order to publish a new version of the package to PyPI for real, run make release. Authenticate by manually entering your pypi.org username and password.
