Apply stricter validation to CKAN resource formats
ckanext-resource-type-validation
Overview
A CKAN extension that performs stricter validation of resource formats for uploaded files, ensuring that the file extension, file contents, and selected resource format are all compatible with each other.
- Reduces the workload on back-of-house staff who would otherwise have to correct the format selection on miscategorised files.
- Enforces stricter restrictions on allowed formats by also running file contents through magic-number/type-sniffing checks. This ensures that an invalid file can't be uploaded simply by selecting an arbitrary format and changing the file extension.
It is also possible to specify whitelists of allowed file extensions and/or allowed MIME types. Future development may allow a blacklist, but this is harder to make reliable.
This affects only uploaded resources. URL resources are not validated.
See the configuration file for more details.
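The core idea described in the overview, that extension, contents and selected format must all agree, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the extension's implementation: MAGIC_SIGNATURES, FORMAT_TO_MIME, sniff_content and types_agree are hypothetical names, the byte-prefix sniffer is a crude stand-in for a real magic/type-sniffing library, and strict equality ignores the overrides and equal types that the types file allows.

```python
import mimetypes

# Hypothetical magic-number table; a crude stand-in for a real content
# sniffer. Only two signatures are included for illustration.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]

# Hypothetical mapping from a CKAN resource format label to a MIME type.
FORMAT_TO_MIME = {
    "PDF": "application/pdf",
    "ZIP": "application/zip",
    "TXT": "text/plain",
}


def sniff_content(data):
    """Guess a MIME type from the leading bytes of an uploaded file."""
    for signature, mime in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mime
    try:
        data.decode("utf-8")
        return "text/plain"  # decodes cleanly as text
    except UnicodeDecodeError:
        return "application/octet-stream"  # unknown binary data


def types_agree(filename, data, resource_format):
    """True only if extension, contents and format all give the same type."""
    by_extension, _ = mimetypes.guess_type(filename)
    by_content = sniff_content(data)
    by_format = FORMAT_TO_MIME.get(resource_format.upper())
    return by_extension == by_content == by_format
```

For example, types_agree("report.pdf", b"plain words", "PDF") is False because the contents look like plain text, which is exactly the renamed-file case the extension is meant to catch.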
Installation
To install ckanext-resource-type-validation:
- Install CKAN 2.9+.
- Activate your CKAN virtual environment, e.g.:

      . /usr/lib/ckan/default/bin/activate

- Install the extension into your virtual environment:

      git clone https://github.com/qld-gov-au/ckanext-resource-type-validation.git
      cd ckanext-resource-type-validation
      pip install -e .
      pip install -r requirements.txt

- Add resource_type_validation to the ckan.plugins setting in your CKAN config file (by default, the config file is located at /etc/ckan/default/production.ini).
- Restart CKAN. For example, if you've deployed CKAN with Apache on Ubuntu:

      sudo service apache2 reload
Configuration
    ckan.plugins = resource_type_validation

Optional

    # Path to the configuration file for specifying file types and their
    # relationships. Defaults to the built-in
    # ckanext/resource_type_validation/resources/resource_types.json
    ckanext.resource_validation.types_file = /path/to/file.json
    # Support contact to list in any error messages
    ckanext.resource_validation.support_contact = webmaster@example.com
    # Whitelist of allowed MIME types
    ckan.mimetypes_allowed = application/pdf,text/plain,text/xml
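The ckan.mimetypes_allowed whitelist above is a comma-separated list. As a rough sketch of how such a setting might be read and applied, the following Python uses the standard configparser module on a hypothetical excerpt of a CKAN .ini file. The helper names parse_whitelist and is_allowed are hypothetical, and both the case-insensitive comparison and the rule that an empty whitelist allows everything are assumptions, not documented behaviour of the extension.

```python
import configparser

# Hypothetical excerpt of a CKAN config file containing the settings above.
SAMPLE_INI = """
[app:main]
ckan.plugins = resource_type_validation
ckan.mimetypes_allowed = application/pdf,text/plain,text/xml
"""


def parse_whitelist(raw):
    """Split a comma-separated setting into a set of lower-cased MIME types."""
    return {item.strip().lower() for item in raw.split(",") if item.strip()}


def is_allowed(mime_type, whitelist):
    """Assumption: an empty whitelist places no restriction on MIME types."""
    return not whitelist or mime_type.lower() in whitelist


config = configparser.ConfigParser()
config.read_string(SAMPLE_INI)
ALLOWED = parse_whitelist(
    config.get("app:main", "ckan.mimetypes_allowed", fallback="")
)
```

With this configuration, is_allowed("image/png", ALLOWED) is False, while is_allowed("Text/Plain", ALLOWED) is True because entries are normalised to lower case.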
The types file (set via ckanext.resource_validation.types_file) is a JSON file that can contain the following keys, all optional and in any order:
- allowed_extensions: A list of allowed file extensions, case-insensitive. If this is not specified, any extension is allowed.
- allowed_overrides: A dictionary specifying which MIME types are treated as subtypes of others, e.g. application/xml is a subtype of text/plain, and anything is a subtype of application/octet-stream. So a file named example.xml, with content that looks like text/plain and a specified resource format of "XML", would be accepted. The format of each entry is "parent-type": ["sub-type1", "sub-type2"]. Wildcards are partially supported: an override can be a single asterisk to allow any other type to be a subtype (typically used for application/octet-stream), or it can have the form prefix/* to allow any type with that prefix to be a subtype (e.g. text/* can override text/plain).
- equal_types: A list of lists of types that are interchangeable, e.g. text/xml is the same as application/xml. This can be used in a similar manner to allowed_overrides, but is bidirectional and will affect the resulting displayed format. Overrides will attempt to use the most specific subtype, whereas equal types take whichever is encountered first. For example, a file named example.rdf containing XML data, with application/rdf+xml as an override for application/xml, would have a resource MIME type of application/rdf+xml; but if application/xml and application/rdf+xml are configured as equal types, then the resource MIME type might simply be application/xml.
- archive_types: A list of types that are archives and require special handling, e.g. application/zip. Archives can specify any resource format (since the format might refer to the archive contents), so long as the archive is well-formed (file extension and contents match).
- generic_types: A list of types that are 'generic', i.e. supertypes of many others (e.g. text/plain and application/octet-stream). File contents of these types can be overridden with a subtype, but if the file extension or format matches them, that cannot be overridden. For example, a file with text/plain content could specify a CSV extension and format, but a file with a .txt extension could not specify a "CSV" format. Similarly, a resource with "TXT" format could not have a .xml extension. This is intended to prevent browser-based content-sniffing attacks, where a file with an innocuous extension like .txt may be handled differently by the browser based on the apparent type of its contents.
- extra_mimetypes: A dictionary of additional mappings to add to the Python mimetypes library for guessing types based on file extensions. The format of each entry is ".extension": "mime-type". For example, a site that expects to upload Quartus Tabular Text Files might define the .ttf extension to have the text/plain MIME type:

      "extra_mimetypes": { ".ttf": "text/plain" }
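To make the interaction between these keys more concrete, here is a hedged Python sketch of how a checker might apply allowed_overrides (including the * and prefix/* wildcards), equal_types, generic_types and extra_mimetypes. The JSON values and the helper names equivalents, can_override and is_acceptable are hypothetical, and the rule that a non-generic extension and format must be equal types or overrides of one another is a simplifying assumption. allowed_extensions, archive_types and the choice of the displayed MIME type are not modelled.

```python
import json
import mimetypes

# Hypothetical types file; the keys follow the list above, the values are
# illustrative only.
TYPES_JSON = """
{
  "allowed_overrides": {
    "application/octet-stream": ["*"],
    "text/plain": ["text/*", "application/xml"]
  },
  "equal_types": [["text/xml", "application/xml"]],
  "generic_types": ["text/plain", "application/octet-stream"],
  "extra_mimetypes": {".ttf": "text/plain"}
}
"""
TYPES = json.loads(TYPES_JSON)

# extra_mimetypes: register additional extension -> MIME type mappings.
for extension, mime in TYPES.get("extra_mimetypes", {}).items():
    mimetypes.add_type(mime, extension)


def equivalents(mime):
    """Return the equal_types group containing `mime`, or just `mime`."""
    for group in TYPES.get("equal_types", []):
        if mime in group:
            return set(group)
    return {mime}


def can_override(parent, sub):
    """True if `sub` may be treated as `parent` or as one of its subtypes."""
    if sub in equivalents(parent):
        return True
    for parent_alias in equivalents(parent):
        for pattern in TYPES.get("allowed_overrides", {}).get(parent_alias, []):
            if pattern == "*":
                return True  # any type is a subtype
            if pattern.endswith("/*") and sub.startswith(pattern[:-1]):
                return True  # prefix wildcard, e.g. text/*
            if sub in equivalents(pattern):
                return True  # explicitly listed subtype
    return False


def is_acceptable(extension_type, content_type, format_type):
    """Sketch of the three-way check using the keys above."""
    generic = set(TYPES.get("generic_types", []))
    if extension_type in generic or format_type in generic:
        # A generic extension or format cannot be overridden.
        if format_type not in equivalents(extension_type):
            return False
    elif not (can_override(extension_type, format_type)
              or can_override(format_type, extension_type)):
        return False
    # Contents may be the declared type or a parent type that allows it.
    return can_override(content_type, format_type)
```

Under this sketch, is_acceptable("text/csv", "text/plain", "text/csv") accepts plain-text contents declared as CSV, while is_acceptable("text/plain", "text/plain", "text/csv") rejects a .txt file declared as CSV, mirroring the generic_types examples above.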
Testing
To run the tests:
- Activate your CKAN virtual environment, e.g.:

      . /usr/lib/ckan/default/bin/activate

- Switch to the extension directory, e.g.:

      cd /usr/lib/ckan/default/src/ckanext-resource-type-validation

- Install the test requirements:

      pip install -r dev-requirements.txt

- Run the tests. This can be done in multiple ways:
  - Execute the test class directly:

        python ckanext/resource_type_validation/test_mime_type_validation.py

  - Run pytest:

        pytest
Alternative testing with Docker
The Docker-based test environment currently relies on *nix shell scripts.
- Install Docker Compose and Ahoy.
- Build the test containers:

      CKAN_VERSION=<version eg 2.11> bin/build.sh

- Run unit tests:

      ahoy test-unit

- Set up test data:

      ahoy install-site

- Run scenario tests:

      ahoy test-bdd
Download files
File details
Details for the file ckanext_resource_type_validation-1.0.11.tar.gz.
File metadata
- Download URL: ckanext_resource_type_validation-1.0.11.tar.gz
- Size: 31.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 10bd1442a73d68fa42fd68e74f16912fdee15d7fa77076329b2e9efb80d8ba72 |
| MD5 | 8d381885cca8b8276e73fa8ec7f9f2c5 |
| BLAKE2b-256 | e271a7ebb151c1a9931206a330ef91ebb8bda305b260bb5612a3f06dbf2fefbe |
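A downloaded archive can be checked against the SHA256 digest in the table above. The sketch below uses Python's standard hashlib module; the helper names sha256_of and verify are illustrative, and it assumes the archive has been saved under its published filename.

```python
import hashlib

# SHA256 digest published above for ckanext_resource_type_validation-1.0.11.tar.gz
EXPECTED_SHA256 = "10bd1442a73d68fa42fd68e74f16912fdee15d7fa77076329b2e9efb80d8ba72"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

Usage: verify("ckanext_resource_type_validation-1.0.11.tar.gz"). pip can perform the same check itself in hash-checking mode, by adding --hash=sha256:<digest> to the package's line in a requirements file.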
Provenance
The following attestation bundles were made for ckanext_resource_type_validation-1.0.11.tar.gz:
Publisher: publish.yml on qld-gov-au/ckanext-resource-type-validation
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ckanext_resource_type_validation-1.0.11.tar.gz
- Subject digest: 10bd1442a73d68fa42fd68e74f16912fdee15d7fa77076329b2e9efb80d8ba72
- Sigstore transparency entry: 197806593
- Permalink: qld-gov-au/ckanext-resource-type-validation@c4585067e2babd19e737116907422ec36c2d2aa4
- Branch / Tag: refs/tags/1.0.11
- Owner: https://github.com/qld-gov-au
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c4585067e2babd19e737116907422ec36c2d2aa4
- Trigger Event: push
File details
Details for the file ckanext_resource_type_validation-1.0.11-py3-none-any.whl.
File metadata
- Download URL: ckanext_resource_type_validation-1.0.11-py3-none-any.whl
- Size: 30.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c134e60054b8d801323465066baa411da74be21ec08284dbc2719c1160b540dc |
| MD5 | 6e3b643fb13e8dbfcacc1accd035c2ee |
| BLAKE2b-256 | 0fec3b64525a86be7ae5a1058d4c53eca1750a54def6ee3431a5f68038e1a0b9 |
Provenance
The following attestation bundles were made for ckanext_resource_type_validation-1.0.11-py3-none-any.whl:
Publisher: publish.yml on qld-gov-au/ckanext-resource-type-validation
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ckanext_resource_type_validation-1.0.11-py3-none-any.whl
- Subject digest: c134e60054b8d801323465066baa411da74be21ec08284dbc2719c1160b540dc
- Sigstore transparency entry: 197806594
- Permalink: qld-gov-au/ckanext-resource-type-validation@c4585067e2babd19e737116907422ec36c2d2aa4
- Branch / Tag: refs/tags/1.0.11
- Owner: https://github.com/qld-gov-au
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c4585067e2babd19e737116907422ec36c2d2aa4
- Trigger Event: push