SDK for Origo Dataplatform

okdata-sdk: Python SDK for Origo Dataplatform

okdata-sdk is on PyPI: pip install okdata-sdk

Configuration

If you instantiate any class that interacts with the Origo Dataplatform API without passing a config to the constructor, a config object is automatically created for you from environment variables.

Environment variables

By default, configuration is picked up from the current environment. Unless you set a specific Auth config, credentials are resolved automatically in the following order:

  1. Client Credentials: used if you have added client_id / client_secret to the config, or set the equivalent environment variables OKDATA_CLIENT_ID / OKDATA_CLIENT_SECRET.
  2. Username And Password: used if you have added username / password to the config, or set the equivalent environment variables OKDATA_USERNAME / OKDATA_PASSWORD.

# keycloak user
export OKDATA_USERNAME=my-user

# keycloak password for OKDATA_USERNAME
export OKDATA_PASSWORD=my-password

# keycloak client
export OKDATA_CLIENT_ID=my-machine-client

# keycloak secret for OKDATA_CLIENT_ID
export OKDATA_CLIENT_SECRET=some-generated-secure-string


# overrides the default environment (dev), but is itself overridden by --env=<environment> on the command line
export OKDATA_ENVIRONMENT=dev  # or: prod

# If you are sending events and have been assigned an API key
export OKDATA_API_KEY=your-api-key
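
The resolution order described above can be sketched in plain Python. This is not the SDK's internal code: `resolve_auth_method` is a hypothetical helper that only mirrors the documented order (client credentials first, then username and password), using the environment variable names listed above. It can be handy for checking which credentials a given shell would end up using.

```python
import os


def resolve_auth_method(env=None):
    """Return which credential type the documented order would select.

    Hypothetical helper for illustration; not part of okdata-sdk.
    """
    env = os.environ if env is None else env
    # 1. Client credentials take precedence when both values are set.
    if env.get("OKDATA_CLIENT_ID") and env.get("OKDATA_CLIENT_SECRET"):
        return "client_credentials"
    # 2. Otherwise fall back to username and password.
    if env.get("OKDATA_USERNAME") and env.get("OKDATA_PASSWORD"):
        return "username_password"
    # Neither pair is complete.
    return None
```

Calling `resolve_auth_method()` without arguments inspects the current process environment.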

Getting Credentials:

Usernames and passwords are synced with the City of Oslo's Active Directory, so any user with an association can use their personal account to access the SDK.

For client credentials, please contact the data platform team: dataplattform[at]oslo.kommune.no

TODO: Named profiles

If environment variables are not available, the system will try to load configuration from a default profile located in ~/.okdata/configuration.

Usage

Upload data

When uploading data, you need to refer to an existing dataset that you own, along with a version and an edition. If these do not exist, you can create them yourself, either with the SDK or with our command line interface.

from okdata.sdk.data.upload import Upload
from okdata.sdk.config import Config

okdata_config = Config()

# If necessary you can override default values
okdata_config.config["cacheCredentials"] = False

data_uploader = Upload(config=okdata_config)

# Upload file 'data.json' to dataset-id/version/edition
dataset_id = "my-dataset-id"
version = "my-version"  # example value: 1
edition = "my-edition"  # example value: 20200618T114038

filename = "/path-to-file/data.json"

# Note! filename must be pointing to an existing file on your disk
upload_response = data_uploader.upload(filename, dataset_id, version, edition)
print(upload_response)
# {
#     "result": True,
#     "trace_id": "my-dataset-id-54a3c78e-86a3-4631-8f28-0252fe1c7c13"
# }
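
Since the filename must point to an existing file, a small pre-flight check can give a clearer error before any request is made. The sketch below is an illustration only; `existing_file` is a hypothetical helper and not part of okdata-sdk.

```python
from pathlib import Path


def existing_file(filename):
    """Return the path as a string, or raise if it is not a regular file.

    Hypothetical helper for illustration; not part of okdata-sdk.
    """
    path = Path(filename).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No such file to upload: {path}")
    return str(path)
```

It could then wrap the argument passed to the uploader, for example `data_uploader.upload(existing_file(filename), dataset_id, version, edition)`.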

The trace_id returned by the upload method can be used to "trace" the steps involved in the upload process:

from okdata.sdk.status import Status
...
status = Status(config=okdata_config)
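# trace_id is taken from the response returned by the upload call
trace_id = upload_response["trace_id"]
trace_events = status.get_status(trace_id)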
print(trace_events)
# [
#     {
#         "trace_id": "my-dataset-1a2bc345-6789-1234-567d-8912ef34a567",
#         "trace_status": "STARTED",
#         "trace_event_id": "1a2b3cd4-eef5-6aa7-bccd-e889912334f5",
#         "trace_event_status": "OK",
#         "component": "data-uploader",
#         ...
#     },
#     {
#         "trace_id": "my-dataset-1a2bc345-6789-1234-567d-8912ef34a567",
#         "trace_status": "CONTINUE",
#         ...
#     },
#     {
#         "trace_id": "my-dataset-1a2bc345-6789-1234-567d-8912ef34a567",
#         "trace_event_id": "1aa2b345-678c-9de1-f2a3-4566bcd78912",
#         "trace_status": "FINISHED",
#         "trace_event_status": "OK",
#         ...
#     }
# ]
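
Because an upload passes through several steps, it can be useful to poll the status until it is done. The following is a minimal sketch, assuming the response is a list of dicts with a `trace_status` key as in the sample output above, and that `FINISHED` marks completion. `wait_for_trace` is a hypothetical helper, not part of okdata-sdk, and any failure statuses the service may report are not handled here.

```python
import time


def wait_for_trace(get_status, trace_id, timeout=60.0, interval=2.0):
    """Poll get_status(trace_id) until an event reports FINISHED.

    get_status is any callable returning a list of trace event dicts,
    for example the get_status method of a Status instance.
    Hypothetical helper for illustration; not part of okdata-sdk.
    """
    deadline = time.monotonic() + timeout
    while True:
        events = get_status(trace_id)
        if any(e.get("trace_status") == "FINISHED" for e in events):
            return events
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Trace {trace_id} did not finish in {timeout}s")
        time.sleep(interval)
```

With the objects from the snippets above, this would be called as `wait_for_trace(status.get_status, trace_id)`. Passing the callable in keeps the helper independent of the SDK and easy to test with a stub.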

Download data

To download data you need to refer to a dataset that you have access to. This could be a public dataset, a restricted dataset you've been given access to, or a dataset that you own yourself. If the dataset is public, authenticating yourself is not necessary.

You will also need to refer to the specific version and edition of the dataset that you want to download. If this is your own dataset, make sure to create a version and edition before attempting to download it.

from okdata.sdk.data.download import Download
from okdata.sdk.config import Config

okdata_config = Config(env="dev")

# If necessary you can override default config values
okdata_config.config["cacheCredentials"] = False

data_downloader = Download(config=okdata_config)

dataset_id = "your-dataset-id"
version = "1"
edition = "latest"

# Downloading a file
res1 = data_downloader.download(dataset_id, version, edition, "my/preferred/output/path")
print(res1)
# {
#     "downloaded_files": ["my/preferred/output/path/file_name.csv"]
# }

Creating datasets with versions and editions

from okdata.sdk.data.dataset import Dataset
from okdata.sdk.config import Config

okdata_config = Config()

# If necessary you can override default values
okdata_config.config["cacheCredentials"] = False

# Create a new dataset
dataset = Dataset(config=okdata_config)

dataset_metadata = {
    "title": "Precise Descriptive Title",
    "description": "Describe your dataset here",
    "keywords": ["some-keyword"],
    "accessRights": "public",
    "objective": "Exemplify how to create a new dataset",
    "contactPoint": {
        "name": "Your name",
        "email": "your_email@domain.com",
        "phone": "999555111"
    },
    "publisher": "name of organization or person responsible for publishing the data"
}

new_dataset = dataset.create_dataset(data=dataset_metadata)

# new_dataset:
# { 'Id': 'precise-descriptive-title',
#   'Type': 'Dataset',
#   '_links': {'self': {'href': '/datasets/precise-descriptive-title'}},
#   'accessRights': 'public',
#   'contactPoint': { 'email': 'your_email@domain.com',
#                     'name': 'Your name',
#                     'phone': '999555111'},
#   'description': 'Describe your dataset here',
#   'keywords': ['some-keyword'],
#   'objective': 'Exemplify how to create a new dataset',
#   'publisher': 'name of organization or person responsible for publishing the '
#                'data',
#   'title': 'Precise Descriptive Title'}


# create version for new dataset:
version_data = {"version": "1"}
new_version = dataset.create_version(new_dataset["Id"], data=version_data)

# new_version:
# { 'Id': 'precise-descriptive-title/1',
#   'Type': 'Version',
#   '_links': { 'self': { 'href': '/datasets/precise-descriptive-title/versions/1'}},
#   'version': '1'}

# create edition for new_dataset/new_version:
import datetime

# Note! The edition field must be an ISO 8601 timestamp with a UTC offset
edition_data = {
    "edition": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "description": "My edition description",
    "startTime": "2019-01-01",
    "endTime": "2019-12-31"
}
new_edition = dataset.create_edition(new_dataset["Id"], new_version["version"], data=edition_data)

# new_edition
# { 'Id': 'precise-descriptive-title/1/20200115T130439',
#   'Type': 'Edition',
#   '_links': { 'self': { 'href': '/datasets/precise-descriptive-title/versions/1/editions/20200115T130439'}},
#   'description': 'My edition description',
#   'edition': '2020-01-15T13:04:39.041778+00:00',
#   'endTime': '2019-12-31',
#   'startTime': '2019-01-01'}
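
In the sample output above, the edition timestamp `2020-01-15T13:04:39.041778+00:00` appears in the edition Id as `20200115T130439`. The sketch below shows one way to produce a valid timestamp and to reproduce that compact form locally. Both functions are hypothetical helpers, and the compact format is inferred from the sample output only; the Id returned by the API remains the authoritative value.

```python
from datetime import datetime, timezone


def edition_timestamp(now=None):
    """Return an ISO 8601 timestamp with an explicit UTC offset."""
    now = now or datetime.now(timezone.utc)
    return now.isoformat()


def compact_edition_id(edition):
    """Reproduce the compact form seen in the sample output.

    Assumption: the format is inferred from one example and may not
    match what the service does for other offsets.
    """
    parsed = datetime.fromisoformat(edition)
    return parsed.strftime("%Y%m%dT%H%M%S")
```

In practice it is safer to read the edition from `new_edition["Id"]` than to rebuild it.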

Updating dataset metadata

As with creating datasets, the metadata for a dataset, version, edition or distribution can be updated using the methods listed below. Each method accepts an updated version of the JSON document that was posted when the resource was created:

dataset.update_dataset(datasetid, data={ ... })
dataset.update_version(datasetid, versionid, data={ ... })
dataset.update_edition(datasetid, versionid, editionid, data={ ... })
dataset.update_distribution(datasetid, versionid, editionid, distributionid, data={ ... })

# Example: Update dataset metadata
dataset.update_dataset(
    datasetid="precise-descriptive-title",
    data={
        "title": "Precise Descriptive Title",
        "description": "Describe your dataset here",
        "keywords": ["some-keyword", "another-keyword"], # Add another keyword
        "accessRights": "public",
        "license": "http://data.norge.no/nlod/", # Add licensing information
        "objective": "Exemplify how to update an existing dataset", # Update objective text
        "contactPoint": {
            "name": "Your name",
            "email": "your_email@domain.com",
            "phone": "999555111"
        },
        "publisher": "name of organization or person responsible for publishing the data"
    }
)

The update_dataset method also supports an optional partial keyword argument, which enables partial updates when set to True:

dataset.update_dataset(
    "my-dataset-id", {"description": "Only update description"}, partial=True
)
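
For the other update methods, only the full-document form is described here. One way to prepare such a document is to copy the existing metadata and apply your changes on top. The sketch below assumes a shallow, top-level merge is sufficient; `merged_metadata` is a hypothetical helper and not part of okdata-sdk. Whether server-generated fields such as Id or _links may be sent back is not covered here.

```python
def merged_metadata(current, changes):
    """Return a copy of current with top-level keys from changes applied.

    Hypothetical helper for illustration; not part of okdata-sdk.
    Nested dicts are replaced as a whole, not merged.
    """
    updated = dict(current)
    updated.update(changes)
    return updated
```

The input dicts are left untouched, so the original metadata can still be compared against the updated document.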
