Utilities for editing CKAN using its API.

Introduction

This library helps CKAN editors make batch edits, and it pairs well with a library like pandas.

Installation

pip install ckan-editor-utils

The requests package is used for all the underlying API calls.
The boto3 AWS SDK package is used for accessing and uploading files from S3.

Features and examples

>>> import os
>>> import ckan_editor_utils

Simple API commands

For the basic API commands, most of the requests boilerplate is handled for you. However, the URL you pass in must already end with the /api/action/ path.

>>> url = 'https://horizon.uat.gsq.digital/api/action/'
>>> api_key = os.environ.get('CKAN_API_KEY')
>>> dataset_id = 'my-test-dataset'

>>> res_create = ckan_editor_utils.package_create(
...            url, 
...            api_key, 
...            {
...                'name': dataset_id,
...                'extra:identifier': dataset_id,
...                'notes': 'my description',
...                'owner_org': 'geological-survey-of-queensland',
...            })
>>> res_create
<Response [200]>
>>> # The requests response can be inspected with the .text attribute or the .json() method
>>> res_create.json()
{'help': 'https://uat-external.dnrme-qld.links.com.au/api/3/action/help_show?name=package_create', 'success': True, 'result': {...

The response shows the entire package as CKAN has recorded it. CKAN populates additional items, such as the Organisation description, automatically.

We can use package_show to get the metadata for an existing dataset:

>>> res_show = ckan_editor_utils.package_show(url, api_key, dataset_id)
>>> res_show.json()
{'help': 'https://uat-external.dnrme-qld.links.com.au/api/3/action/help_show?name=package_show', 'success': True, 'result': {'extra:theme': []...

Always check the HTTP status before working with the data payload. For example, CKAN returns a 409 if the dataset already exists or if we did not provide enough information for the type of dataset we want to create, among other reasons. If we have the dataset ID wrong, CKAN returns a 404. This is the default requests response:

>>> res_missing = ckan_editor_utils.package_show(url, api_key, 'missingdataset')
>>> res_missing
<Response [404]>
>>> res_missing.json()
{'help': 'https://uat-external.dnrme-qld.links.com.au/api/3/action/help_show?name=package_show', 'success': False, 'error': {'message': 'Not found', '__type': 'Not Found Error'}}
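As a minimal sketch of the "check the status first" advice, the helper below only reads the payload when the call succeeded. The name result_if_ok and the FakeResponse stand-in are hypothetical and not part of this library; FakeResponse only mimics the status_code, ok and json() members of a requests response so the example runs offline.

```python
from typing import Any, Optional


def result_if_ok(res: Any) -> Optional[dict]:
    """Return the 'result' payload only when the HTTP call succeeded.

    Hypothetical helper for illustration; not part of ckan_editor_utils.
    """
    # requests.Response.ok is True for status codes below 400.
    if not res.ok:
        print(f"Request failed with HTTP {res.status_code}")
        return None
    return res.json().get("result")


class FakeResponse:
    """Offline stand-in for requests.Response (assumption for this sketch)."""

    def __init__(self, status_code: int, payload: dict):
        self.status_code = status_code
        self._payload = payload

    @property
    def ok(self) -> bool:
        return self.status_code < 400

    def json(self) -> dict:
        return self._payload


found = FakeResponse(200, {"success": True, "result": {"name": "my-test-dataset"}})
missing = FakeResponse(404, {"success": False, "error": {"message": "Not found"}})

print(result_if_ok(found))
print(result_if_ok(missing))
```

With a real response, such as res_show from the earlier example, the same helper could be called directly in place of the stand-in.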

The next section helps simplify the response using a CKANResponse object, which is particularly useful when errors occur.

More examples of basic API usage can be found on the GSQ Open Data API GitHub page and in the official documentation at docs.ckan.org.

Simplified CKAN Responses

When interacting with the CKAN API, it can be difficult to get a consistent result. Some errors are plain text rather than JSON, and the JSON errors sometimes contain different attributes depending on the context. Managing the variety of these responses takes a lot of extra logic, which clutters up your script.
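To give a feel for that extra logic, here is a rough sketch of the kind of hand-written normalisation a script might otherwise need. The function name simplify and its return shape are assumptions made for this illustration; it is not how this library is implemented.

```python
import json
from typing import Any, Tuple


def simplify(status_code: int, body: str) -> Tuple[bool, Any]:
    """Reduce a raw HTTP status and body to (ok, result).

    Hypothetical helper for illustration only.
    """
    ok = 200 <= status_code < 300
    try:
        payload = json.loads(body)
    except ValueError:
        # The body was plain text (or HTML), not JSON.
        return ok, {"message": body.strip()}
    if not isinstance(payload, dict):
        return ok, payload
    if ok:
        return ok, payload.get("result")
    error = payload.get("error", {})
    if isinstance(error, dict):
        # Drop the internal '__type' marker and keep the readable parts.
        return ok, {k: v for k, v in error.items() if k != "__type"}
    return ok, {"message": str(error)}


json_error = '{"success": false, "error": {"message": "Not found", "__type": "Not Found Error"}}'
print(simplify(404, json_error))
print(simplify(500, "Internal Server Error"))
```

Both kinds of error end up as a small dictionary with a message key, which is the sort of consistency the rest of this section is about.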

This library offers a new CKANResponse object that converts requests responses from CKAN into something more consistent and manageable. To use it, simply pass it a CKAN response you received when using requests.

>>> check_res_show = ckan_editor_utils.CKANResponse(res_show)  # (response from earlier example)
>>> print(check_res_show)
Response 200 OK
>>> check_res_show.ok
True
>>> check_res_show.result
{'extra:theme': [], 'license_title': None, 'maintainer': None, ...

A JSON response will always be present in the result attribute of the CKAN response, so you can reliably use result to capture output. The API action is also logged to stdout (the console), so you can easily track progress.

Continuing the 404 example from above, the response can be changed to something easier to manage:

>>> cr = ckan_editor_utils.CKANResponse(res_missing)
Response 404 not OK: {"message": "Not found"}
>>> cr.result
{'message': 'Not found'}
>>> cr.ok
False

These simplified CKAN responses are included in the managed actions described in the next section.

Managed API actions

Some common workflows have been packaged as managed actions to make simple tasks easier.

Requirements

  • A sysadmin user API key is required
  • The CKAN instance must have the latest ckanext-cloudstorage plugin installed

Introduction

The following managed actions are available via the CKANEditorSession context manager class:

  • put_dataset (create or update)
  • delete_dataset (delete and purge)
  • put_resource_from_s3 (automatically does multipart uploads)

Additionally, the CKANEditorSession will fix up the provided CKAN URL if it is missing the required api/action/ path.

with ckan_editor_utils.CKANEditorSession(url, api_key) as ckaneu:
    # Delete and purge the dataset, then keep the simplified result
    result = ckaneu.delete_dataset(dataset_id).result
    print(result)

Here we are able to get the result attribute without any extra logic or coding because the response object has been simplified.
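The URL fix-up mentioned above could look roughly like the following. This is only a sketch under the assumption that a missing api/action/ path is simply appended; the function name ensure_action_url is hypothetical and the session's actual logic may differ.

```python
def ensure_action_url(url: str) -> str:
    """Return the URL with a trailing 'api/action/' path.

    Hypothetical helper for illustration; only the exact 'api/action/'
    suffix is recognised, versioned paths are not handled.
    """
    suffix = "api/action/"
    # Normalise to exactly one trailing slash before comparing.
    base = url.rstrip("/") + "/"
    if base.endswith("/" + suffix):
        return base
    return base + suffix


print(ensure_action_url("https://ckan.example.org"))
print(ensure_action_url("https://ckan.example.org/api/action"))
```

Both calls produce the same URL, so a script can accept either form from its configuration.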

Adding a dataset using put_dataset()

As an editor doing bulk changes, you might not know whether every package already exists, so you cannot safely call package_update() on all of them. Instead, you can call put_dataset(), and the managed session will either create or update the dataset depending on what it finds.

data = {
    'name': 'ds000001',
    'extra:identifier': 'DS000001',
    'notes': 'Some description about this dataset.',
    'owner_org': 'my-organisation-lowercase-with-dashes',
    # include any other required and known fields
}
with ckan_editor_utils.CKANEditorSession(url, api_key) as ckaneu:
    res = ckaneu.put_dataset(data, skip_existing=True)
    print(res.result)

Including skip_existing=True means if a dataset exists, it will not be modified. Passing False will update the existing dataset with any attributes you pass in, leaving all others intact.
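The create-or-update behaviour described above can be pictured with a small in-memory model. This is a sketch of the semantics only: the dictionary store stands in for CKAN, and put_dataset_sketch is a hypothetical name, not the library's function.

```python
from typing import Dict


def put_dataset_sketch(store: Dict[str, dict], data: dict, skip_existing: bool = True) -> str:
    """Model the create/skip/update decision against an in-memory store."""
    name = data["name"]
    if name not in store:
        # Nothing recorded yet: create the dataset.
        store[name] = dict(data)
        return "created"
    if skip_existing:
        # Dataset exists and we were asked to leave it alone.
        return "skipped"
    # Merge the new attributes, leaving all other attributes intact.
    store[name].update(data)
    return "updated"


store: Dict[str, dict] = {}
first = put_dataset_sketch(store, {"name": "ds000001", "notes": "Original", "owner_org": "my-org"})
second = put_dataset_sketch(store, {"name": "ds000001", "notes": "Changed"}, skip_existing=True)
third = put_dataset_sketch(store, {"name": "ds000001", "notes": "Changed"}, skip_existing=False)
print(first, second, third)
print(store["ds000001"])
```

After the third call only notes has changed; owner_org is untouched because it was not passed in.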

Adding a resource from S3 using put_resource_from_s3()

This tool helps you upload a data object located in S3 to CKAN. The following fields are required:

resource = {
    'name': 'ds000001',
    'resource:name': 'My New Resource to Share',
    'resource:description': 'Some description about the particular resource.'
}

The size and format are automatically calculated for you. We use resource:name because a common workflow is to load data from a CSV that includes both the dataset name and the resource name, which have the same label in CKAN.

You also need an s3_path value to pass in, like so:

s3_path = 's3://mybucket/myprefix/myfile1.zip'
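For illustration only, an s3:// path like this breaks down into a bucket and an object key, and a format can be guessed from the file extension. How the library derives the format internally is not shown in this document, so split_s3_path is a hypothetical helper and the upper-cased extension is an assumption.

```python
import os
from typing import Tuple
from urllib.parse import urlparse


def split_s3_path(s3_path: str) -> Tuple[str, str, str]:
    """Split 's3://bucket/key' into (bucket, key, guessed_format)."""
    parsed = urlparse(s3_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an s3:// path: {s3_path!r}")
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    # Assumption: the format is the upper-cased file extension, if any.
    guessed_format = os.path.splitext(key)[1].lstrip(".").upper()
    return bucket, key, guessed_format


parts = split_s3_path("s3://mybucket/myprefix/myfile1.zip")
print(parts)
```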

Then you can call the function using the same context manager session:

with ckan_editor_utils.CKANEditorSession(url, api_key) as ckaneu:
    res = ckaneu.put_resource_from_s3(resource, s3_path, skip_existing=True)
    print(res.result)

Including skip_existing=True means if a resource exists, it will not be modified. Passing False will update the existing resource with any attributes and data objects you pass in, leaving all others intact.
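Since a common workflow is to load these fields from a CSV, a batch run might be prepared as in the sketch below. The s3_path column name and the sample rows are assumptions made for this example; the call into the library is left as a comment so the sketch runs without a CKAN instance.

```python
import csv
import io
from typing import List, Tuple

# Hypothetical CSV content; in practice this would come from a file.
CSV_TEXT = """name,resource:name,resource:description,s3_path
ds000001,My New Resource to Share,Some description.,s3://mybucket/myprefix/myfile1.zip
ds000002,Another Resource,Second description.,s3://mybucket/myprefix/myfile2.zip
"""


def rows_to_jobs(csv_text: str) -> List[Tuple[dict, str]]:
    """Turn CSV rows into (resource, s3_path) pairs."""
    jobs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        resource = dict(row)
        # Separate the S3 location from the resource metadata.
        s3_path = resource.pop("s3_path")
        jobs.append((resource, s3_path))
    return jobs


jobs = rows_to_jobs(CSV_TEXT)
for resource, s3_path in jobs:
    print(resource["name"], "<-", s3_path)

# With a live CKAN instance, each job could then be passed to the session:
# with ckan_editor_utils.CKANEditorSession(url, api_key) as ckaneu:
#     for resource, s3_path in jobs:
#         res = ckaneu.put_resource_from_s3(resource, s3_path, skip_existing=True)
#         print(res.result)
```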

