Converts a dataset based on a specific schema

ckanext-transmute

The extension helps to validate and convert a dataset based on a specific schema.

Working with transmute

ckanext-transmute provides an action, tsm_transmute, that transmutes data according to the provided conversion schema. The action doesn't change the original data; it creates a new data dict. There are two mandatory arguments: data and schema. data is the data dict you have, and schema describes how to validate or change the data in it.
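A minimal sketch of calling the action, assuming the usual CKAN convention that an action is called with a context and a data dict. The helper name run_transmute is hypothetical. Inside a running CKAN process you would pass ckan.plugins.toolkit.get_action as get_action; a stand-in is used here so the snippet runs without CKAN.

```python
from typing import Any, Callable, Dict

Action = Callable[[Dict[str, Any], Dict[str, Any]], Any]


def run_transmute(
    get_action: Callable[[str], Action],
    data: Dict[str, Any],
    schema: Dict[str, Any],
) -> Any:
    """Call tsm_transmute with its two mandatory arguments (hypothetical helper)."""
    action = get_action("tsm_transmute")
    # Empty context for illustration; a real call may need a populated one.
    return action({}, {"data": data, "schema": schema})


# Stand-in for toolkit.get_action (assumption: it only records what it was given).
def fake_get_action(name: str) -> Action:
    def action(context: Dict[str, Any], data_dict: Dict[str, Any]) -> Any:
        return {"called": name, "keys": sorted(data_dict)}

    return action


result = run_transmute(
    fake_get_action,
    {"title": "Test-dataset"},
    {"root": "Dataset", "types": {}},
)
```

The stand-in only confirms which action name and which keys were passed; the real action returns the transmuted data dict.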

Example: We have a data dict:

{
            "title": "Test-dataset",
            "email": "test@test.ua",
            "metadata_created": "",
            "metadata_modified": "",
            "metadata_reviewed": "",
            "resources": [
                {
                    "title": "test-res",
                    "extension": "xml",
                    "web": "https://stackoverflow.com/",
                    "sub-resources": [
                        {
                            "title": "sub-res",
                            "extension": "csv",
                            "extra": "should-be-removed",
                        }
                    ],
                },
                {
                    "title": "test-res2",
                    "extension": "csv",
                    "web": "https://stackoverflow.com/",
                },
            ],
        }

And we want to achieve this:

{
            "name": "test-dataset",
            "email": "test@test.ua",
            "metadata_created": datetime.datetime(2022, 2, 3, 15, 54, 26, 359453),
            "metadata_modified": datetime.datetime(2022, 2, 3, 15, 54, 26, 359453),
            "metadata_reviewed": datetime.datetime(2022, 2, 3, 15, 54, 26, 359453),
            "attachments": [
                {
                    "name": "test-res",
                    "format": "XML",
                    "url": "https://stackoverflow.com/",
                    "sub-resources": [{"name": "SUB-RES", "format": "CSV"}],
                },
                {
                    "name": "test-res2",
                    "format": "CSV",
                    "url": "https://stackoverflow.com/",
                },
            ],
        }

Then our schema should look something like this:

{
        "root": "Dataset",
        "types": {
            "Dataset": {
                "fields": {
                    "title": {
                        "validators": [
                            "tsm_string_only",
                            "tsm_to_lowercase",
                            "tsm_name_validator",
                        ],
                        "map": "name",
                    },
                    "resources": {
                        "type": "Resource",
                        "multiple": True,
                        "map": "attachments",
                    },
                    "metadata_created": {
                        "validators": ["tsm_isodate"],
                        "default": "2022-02-03T15:54:26.359453",
                    },
                    "metadata_modified": {
                        "validators": ["tsm_isodate"],
                        "default_from": "metadata_created",
                    },
                    "metadata_reviewed": {
                        "validators": ["tsm_isodate"],
                        "replace_from": "metadata_modified",
                    },
                }
            },
            "Resource": {
                "fields": {
                    "title": {
                        "validators": ["tsm_string_only"],
                        "map": "name",
                    },
                    "extension": {
                        "validators": ["tsm_string_only", "tsm_to_uppercase"],
                        "map": "format",
                    },
                    "web": {
                        "validators": ["tsm_string_only"],
                        "map": "url",
                    },
                    "sub-resources": {
                        "type": "Sub-Resource",
                        "multiple": True,
                    },
                },
            },
            "Sub-Resource": {
                "fields": {
                    "title": {
                        "validators": ["tsm_string_only", "tsm_to_uppercase"],
                        "map": "name",
                    },
                    "extension": {
                        "validators": ["tsm_string_only", "tsm_to_uppercase"],
                        "map": "format",
                    },
                    "extra": {
                        "remove": True,
                    },
                }
            },
        },
    }

This is an example of a schema with nested types. The root field is mandatory; it must contain the name of the main type, from which the schema starts. As you can see, the Dataset type contains the Resource type, which contains Sub-Resource.
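To make the nesting concrete, here is a small sketch in plain Python that follows the "type" references from the root and lists the types it reaches. It is not part of the extension; the function name nested_type_chain is hypothetical, and the schema is a trimmed version of the one above.

```python
from typing import Any, Dict, List


def nested_type_chain(schema: Dict[str, Any]) -> List[str]:
    """Return the type names reachable from the root, in discovery order."""
    root = schema.get("root")
    types = schema.get("types", {})
    if root is None or root not in types:
        raise ValueError("schema needs a 'root' naming one of its 'types'")

    seen: List[str] = []
    queue = [root]
    while queue:
        name = queue.pop(0)
        if name in seen:
            continue
        seen.append(name)
        # A field with a "type" key points at another type definition.
        for field in types[name].get("fields", {}).values():
            child = field.get("type")
            if child is not None:
                if child not in types:
                    raise ValueError(f"unknown type: {child}")
                queue.append(child)
    return seen


schema = {
    "root": "Dataset",
    "types": {
        "Dataset": {"fields": {"resources": {"type": "Resource", "multiple": True}}},
        "Resource": {"fields": {"sub-resources": {"type": "Sub-Resource", "multiple": True}}},
        "Sub-Resource": {"fields": {"extra": {"remove": True}}},
    },
}

chain = nested_type_chain(schema)
```

For this schema, chain holds Dataset, Resource and Sub-Resource, in that order.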

Transmutators

There are a few default transmutators you can use in your schema. Of course, you can also define a custom transmutator with the CKAN IValidators interface.

  • tsm_name_validator - Wrapper over the CKAN default name_validator validator
  • tsm_to_lowercase - Casts a string value to lowercase
  • tsm_to_uppercase - Casts a string value to uppercase
  • tsm_string_only - Validates that field.value is a string
  • tsm_isodate - Wrapper over the CKAN default isodate validator. Converts an ISO-like string to a datetime object
  • tsm_to_string - Casts field.value to str
  • tsm_get_nested - Allows you to pick a value from a nested structure. Example:
data = {
    "title_translated": [
        {"nested_field": {"en": "en title", "ar": "العنوان ar"}},
    ],
}

schema = ...
    "title": {
        "replace_from": "title_translated",
        "validators": [
            ["tsm_get_nested", 0, "nested_field", "en"],
            "tsm_to_uppercase",
        ],
    },
    ...

This takes the value for the title field from the title_translated field. Because title_translated is an array with nested objects, we use the tsm_get_nested transmutator to extract the value from it.

A default transmutator must receive at least one mandatory argument: the field object. The field has a few properties: field_name, value and type.
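A rough sketch of what a custom transmutator could look like. The Field class below is a hypothetical stand-in that only mirrors the three properties named above; it is not the extension's own class. The sketch also assumes that a transmutator changes field.value and returns the field. In a real plugin, the function would be exposed through the get_validators method of the CKAN IValidators interface.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Field:
    """Hypothetical stand-in with the properties a transmutator receives."""

    field_name: str
    value: Any
    type: str


def strip_whitespace(field: Field) -> Field:
    """Example custom transmutator: trim surrounding whitespace from strings."""
    if isinstance(field.value, str):
        field.value = field.value.strip()
    # Assumption: the (possibly modified) field is handed back to the caller.
    return field


field = strip_whitespace(
    Field(field_name="title", value="  Test-dataset ", type="Dataset")
)
```

Non-string values pass through unchanged, so the transmutator can be combined with tsm_string_only or used on its own.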

It is possible to pass additional arguments to a transmutator, as with tsm_get_nested. To do this, use a nested array whose first item is the transmutator name and whose remaining items are its arguments.
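The nested-array form can be illustrated in plain Python. This is only a sketch of the convention, not the extension's implementation; split_validator and get_nested are hypothetical helper names.

```python
from typing import Any, List, Sequence, Tuple, Union

ValidatorEntry = Union[str, Sequence[Any]]


def split_validator(entry: ValidatorEntry) -> Tuple[str, List[Any]]:
    """Split a schema entry into (transmutator name, extra arguments)."""
    if isinstance(entry, str):
        return entry, []
    name, *args = entry
    return name, list(args)


def get_nested(value: Any, *path: Any) -> Any:
    """Follow a path of list indexes and dict keys into a nested structure."""
    for key in path:
        value = value[key]
    return value


validators = [["tsm_get_nested", 0, "nested_field", "en"], "tsm_to_uppercase"]
parsed = [split_validator(v) for v in validators]

title_translated = [{"nested_field": {"en": "en title", "ar": "العنوان ar"}}]
# Apply the extra arguments of the first entry as the lookup path.
picked = get_nested(title_translated, *parsed[0][1])
```

Here picked is "en title", the value that tsm_to_uppercase would then receive in the schema above.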

Installation

To install ckanext-transmute:

  1. Activate your CKAN virtual environment, for example:

    . /usr/lib/ckan/default/bin/activate

  2. Clone the source and install it in the virtualenv:

    git clone https://github.com/mutantsan/ckanext-transmute.git
    cd ckanext-transmute
    pip install -e .
    pip install -r requirements.txt

  3. Add transmute to the ckan.plugins setting in your CKAN config file (by default the config file is located at /etc/ckan/default/ckan.ini).

  4. Restart CKAN. For example if you've deployed CKAN with Apache on Ubuntu:

    sudo service apache2 reload

Developer installation

To install ckanext-transmute for development, activate your CKAN virtualenv and do:

git clone https://github.com/mutantsan/ckanext-transmute.git
cd ckanext-transmute
python setup.py develop
pip install -r dev-requirements.txt

Tests

I've used TDD to write this extension, so if you change something, make sure that all the tests still pass. To run the tests, do:

pytest --ckan-ini=test.ini

License

AGPL
