
Apply stricter validation to CKAN resource formats

ckanext-resource-type-validation

Overview

A CKAN extension that performs stricter validation of resource formats for uploaded files, ensuring that the file extension, file contents, and selected resource format are all compatible with each other.

  1. Reduces the workload on back-of-house staff, who would otherwise have to correct the format selection on miscategorised files.
  2. Tightens restrictions on allowed formats by also running uploads through magic/type sniffing systems. This ensures that an invalid file can't be uploaded simply by selecting an arbitrary format and changing the file extension.

It is also possible to specify whitelists of allowed file extensions and/or allowed MIME types. Future development may allow a blacklist, but this is harder to make reliable.

This affects only uploaded resources. URL resources are not validated.
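
The three-way check described above can be illustrated with a minimal sketch. This is not the extension's implementation: the lookup tables, `sniff` and `is_consistent` are hypothetical names, the magic-number table covers only two types, and the comparison is a strict equality that ignores the overrides and equal types the extension supports.

```python
import mimetypes

# Hypothetical lookup tables, for illustration only.
MAGIC_PREFIXES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
}
FORMAT_TO_MIME = {"PDF": "application/pdf", "PNG": "image/png"}


def sniff(content: bytes) -> str:
    """Guess a MIME type from the leading bytes, falling back to a generic type."""
    for prefix, mime in MAGIC_PREFIXES.items():
        if content.startswith(prefix):
            return mime
    return "application/octet-stream"


def is_consistent(filename: str, content: bytes, resource_format: str) -> bool:
    """Return True only if extension, contents and format agree on one type."""
    from_extension, _ = mimetypes.guess_type(filename)
    from_content = sniff(content)
    from_format = FORMAT_TO_MIME.get(resource_format.upper())
    return from_format is not None and from_extension == from_content == from_format
```

With this sketch, a PNG image renamed to `report.pdf` and labelled "PDF" is rejected, because its contents do not match the other two signals.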

See the configuration file for more details.

Installation

To install ckanext-resource-type-validation:

  1. Install CKAN 2.9+.

  2. Activate your CKAN virtual environment, eg:

    . /usr/lib/ckan/default/bin/activate
    
  3. Install the extension into your virtual environment:

    git clone https://github.com/qld-gov-au/ckanext-resource-type-validation.git
    cd ckanext-resource-type-validation
    pip install -e .
    pip install -r requirements.txt
    
  4. Add resource_type_validation to the ckan.plugins setting in your CKAN config file (by default the config file is located at /etc/ckan/default/production.ini).

  5. Restart CKAN. Eg if you've deployed CKAN with Apache on Ubuntu:

    sudo service apache2 reload

Configuration

ckan.plugins = resource_type_validation

Optional

# Path to the configuration file for specifying file types and their
# relationships. Defaults to built-in
# ckanext/resource_type_validation/resources/resource_types.json
ckanext.resource_validation.types_file = /path/to/file.json

# Support contact to list in any error messages
ckanext.resource_validation.support_contact = webmaster@example.com

# Whitelist of allowed mimetypes
ckan.mimetypes_allowed = application/pdf,text/plain,text/xml
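
As a rough illustration of how a comma-separated whitelist such as the one above could be interpreted, the following sketch splits the value into a set of MIME types. The function name `parse_allowed_mimetypes` is hypothetical, and the trimming and lower-casing are assumptions; the extension may parse the setting differently.

```python
def parse_allowed_mimetypes(raw: str) -> set[str]:
    """Split a comma-separated setting into a set of lower-cased MIME types."""
    return {item.strip().lower() for item in raw.split(",") if item.strip()}


allowed = parse_allowed_mimetypes("application/pdf,text/plain,text/xml")
# A type is then permitted only if it appears in the set.
pdf_is_allowed = "application/pdf" in allowed
```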

The configuration file can contain the following, all optional and in any order:

  • allowed_extensions: A list of allowed file extensions, case-insensitive. If this is not specified, any extension is allowed.

  • allowed_overrides: A dictionary specifying which MIME types are treated as subtypes of others, eg application/xml is a subtype of text/plain, and anything is a subtype of application/octet-stream. So, a file named example.xml with content that looks like text/plain, and a specified resource format of "XML", would be accepted. The format of each entry is "parent-type": ["sub-type1", "sub-type2"]. Wildcards are partially supported; an override can be a single asterisk to allow any other type to be a subtype (typically used for application/octet-stream), or it can have the form prefix/* to allow any type with that prefix to be a subtype (eg text/* can override text/plain).

  • equal_types: A list of lists of types that are interchangeable, eg text/xml is the same as application/xml. This can be used in a similar manner to allowed_overrides, but is bidirectional, and will affect the resulting displayed format. Overrides will attempt to use the most specific subtype, whereas equal types take whichever is encountered first. For example, a file named example.rdf and containing XML data, with application/rdf+xml as an override for application/xml, would have a resource mimetype of application/rdf+xml, but if application/xml and application/rdf+xml are configured as equal types, then the resource mimetype might be simply application/xml.

  • archive_types: A list of types that are archives and require special handling, eg application/zip. Archives can specify any resource format (since the format might refer to the archive contents), so long as the archive is well-formed (file extension and contents match).

  • generic_types: A list of types that are 'generic', ie supertypes of many others (eg text/plain and application/octet-stream). File contents of these types can be overridden with a subtype, but if the file extension or format matches them, then that cannot be overridden. Eg a file with text/plain content could specify a CSV extension and format, but a file with a .txt extension could not specify a "CSV" format. Similarly, a resource with "TXT" format could not have a .xml extension. This is intended to prevent browser-based content-sniffing attacks, where a file with an innocuous extension like .txt may be handled in a different way by the browser based on the apparent type of its contents.

  • extra_mimetypes: A dictionary of additional mappings to add to the Python mimetypes library for guessing types based on file extensions. The format of each entry is ".extension": "mime-type". For example, a site that expects to upload Quartus Tabular Text Files might define the .ttf extension to have text/plain MIME type:

    "extra_mimetypes": {
      ".ttf": "text/plain"
    }
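
To make the allowed_overrides wildcards and the extra_mimetypes mapping more concrete, here is a minimal sketch. The JSON content and the helper `is_allowed_override` are hypothetical and only follow the rules described above; the extension's own matching logic may differ. The extra mappings are registered with the standard library's `mimetypes.MimeTypes.add_type`.

```python
import json
import mimetypes

# Hypothetical types file content, using the keys described above.
TYPES_JSON = """
{
  "allowed_overrides": {
    "application/octet-stream": ["*"],
    "text/plain": ["text/*", "application/xml"]
  },
  "extra_mimetypes": {
    ".ttf": "text/plain"
  }
}
"""


def is_allowed_override(parent: str, candidate: str, overrides: dict) -> bool:
    """Return True if `candidate` is listed as a subtype of `parent`."""
    for pattern in overrides.get(parent, []):
        if pattern == "*":
            return True  # any type may override this parent
        if pattern.endswith("/*") and candidate.startswith(pattern[:-1]):
            return True  # prefix wildcard, eg text/*
        if pattern == candidate:
            return True  # exact match
    return False


config = json.loads(TYPES_JSON)
overrides = config["allowed_overrides"]

# Use a private MimeTypes instance so the global registry is left untouched.
type_db = mimetypes.MimeTypes()
for extension, mime_type in config["extra_mimetypes"].items():
    type_db.add_type(mime_type, extension)

ttf_type, _ = type_db.guess_type("pins.ttf")
```

In this sketch, `ttf_type` is `text/plain`, and `text/csv` is accepted as an override for `text/plain` through the `text/*` wildcard, while `image/png` is not.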
    

Testing

To run the tests:

  1. Activate your CKAN virtual environment, eg . /usr/lib/ckan/default/bin/activate

  2. Switch to the extension directory, eg cd /usr/lib/ckan/default/src/ckanext-resource-type-validation

  3. Install test requirements: pip install -r dev-requirements.txt

  4. Run the tests. This can be done in multiple ways.

    1. Execute the test class directly:

      python ckanext/resource_type_validation/test_mime_type_validation.py
      
    2. Run pytest from the extension directory.

Alternative testing with Docker

The Docker-based test environment currently relies on *nix shell scripts.

  1. Install Docker Compose and Ahoy.

  2. Build the test containers: CKAN_VERSION=<version eg 2.11> bin/build.sh

  3. Run unit tests: ahoy test-unit

  4. Set up test data: ahoy install-site

  5. Run scenario tests: ahoy test-bdd

