
Document dataset metadata. For use in Statistics Norway's metadata system.


Datadoc


Document datasets in Statistics Norway

Usage

[Screenshot: DataDoc in use]

From Jupyter

  1. Open https://jupyter.dapla-staging.ssb.no or another Jupyter Lab environment
  2. Run pip install ssb-datadoc[gcs] in the terminal
  3. Upload a dataset to your Jupyter server (e.g. https://github.com/statisticsnorway/datadoc/blob/master/klargjorte_data/person_data_v1.parquet)
  4. Run from datadoc import main; main("./person_data_v1.parquet") in a notebook
  5. Datadoc will open in a new tab
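Step 4 can be wrapped in a small guard so that a mistyped path fails with a clear message before Datadoc starts. This is only a sketch: open_in_datadoc is a hypothetical helper name, not part of the datadoc package, and it assumes the dataset is a local file. The only Datadoc call it uses is main, taken from step 4 above.

```python
from pathlib import Path


def open_in_datadoc(dataset_path: str) -> None:
    """Check that a local dataset file exists, then hand it to Datadoc.

    Hypothetical helper for illustration; not part of the datadoc package.
    """
    path = Path(dataset_path)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found: {path.resolve()}")

    # Imported inside the function so the path check still works
    # in an environment where ssb-datadoc is not installed yet.
    from datadoc import main

    main(str(path))
```

In a notebook cell, open_in_datadoc("./person_data_v1.parquet") would then replace the call in step 4.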

Contributing

Prerequisites

  • Python >3.8 (3.10 is preferred)
  • Poetry, install via curl -sSL https://install.python-poetry.org | python3 -
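The prerequisites can be checked from Python before installing anything else. The sketch below makes two assumptions: it reads "Python >3.8" as any release strictly newer than 3.8.0, and it treats Poetry as installed if a poetry executable is found on PATH. The function names are illustrative only.

```python
import shutil
import sys


def python_ok(version=None, minimum=(3, 8, 0)):
    """Return True if `version` is strictly newer than `minimum`.

    Assumption: "Python >3.8" means newer than 3.8.0.
    """
    if version is None:
        version = sys.version_info[:3]
    return tuple(version) > tuple(minimum)


def poetry_on_path():
    """Return True if a `poetry` executable can be found on PATH."""
    return shutil.which("poetry") is not None


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Poetry found on PATH:", poetry_on_path())
```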

Dependency Management

Poetry is used for dependency management. Poe the Poet is used to run tasks inside Poetry's virtualenv. After cloning this project, first install the necessary dependencies, then run the tests to verify that everything is working.

Install all dependencies

poetry install --all-extras

Add dependencies

Main

poetry add <python package name>

Dev

poetry add --group dev <python package name>

Run tests

poetry run poe test

Run project locally

To run the project locally:

poetry run poe datadoc "gs://ssb-staging-dapla-felles-data-delt/datadoc/klargjorte_data/person_data_v1.parquet"
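The path in the command above is a Google Cloud Storage URI: the part after gs:// up to the first slash is the bucket name, and the rest is the object path within that bucket. As a rough illustration using only the standard library (split_gcs_uri is a hypothetical helper, not something Datadoc provides), the two parts can be separated like this:

```python
from urllib.parse import urlsplit


def split_gcs_uri(uri):
    """Split a gs:// URI into (bucket, object_path).

    Hypothetical helper for illustration; not part of the datadoc package.
    """
    parts = urlsplit(uri)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError(f"Not a gs:// URI: {uri}")
    # The path component starts with "/", which is not part of the object name.
    return parts.netloc, parts.path.lstrip("/")


bucket, object_path = split_gcs_uri(
    "gs://ssb-staging-dapla-felles-data-delt/datadoc/klargjorte_data/person_data_v1.parquet"
)
print(bucket)
print(object_path)
```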

Run project locally in Jupyter

To run the project locally in Jupyter, run:

poetry run poe jupyter

A Jupyter instance should open in your browser. Open and run the cells in the .ipynb file to demo datadoc.

Bump version

poetry run poe bump-patch-version

Warning: Run this on the default branch.

This command will:

  1. Increment version strings in files
  2. Commit the changes
  3. Tag the commit with the new version

Then run git push origin --tags to push the changes and trigger the release process.
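For readers unfamiliar with the term, a patch bump increments the last of the three numbers in a MAJOR.MINOR.PATCH version string. The sketch below only illustrates that arithmetic. It is not how the poe task is implemented, and bump_patch is a hypothetical name.

```python
def bump_patch(version: str) -> str:
    """Increment the patch component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


print(bump_patch("0.2.1"))  # prints 0.2.2
```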

