Validate Geopackage files according to PDOK requirements and recommendations
Project description
PDOK geopackage-validator
Geopackages are a data format that have a deliberately broad application, so many of the requirements are dependend on your use.
The PDOK geopackage validator is used by PDOK. PDOK is part of the Dutch government. This geopackage validator is used to validate a set of requirements to make sure geopackages adhere to our standardized ETL pipeline. It is possible to use this for your own purposes as described here. The validations will not change (except for bugfixes); new validations are always added to the list.
Table of Contents
TL;DR Commands
Either run through docker or locally.
Docker
Validate a GeoPackage with the default set of validation rules:
gpkg_path=relative/path/to/the.gpkg
docker run -v "$(pwd)":/gpkg --rm pdok/geopackage-validator validate --gpkg-path "/gpkg/${gpkg_path}"
Validate a GeoPackage with the default set of validation rules including a schema:
schema_path=relative/path/to/the/schema.json
gpkg_path=relative/path/to/the.gpkg
docker run -v "$(pwd)":/gpkg --rm pdok/geopackage-validator validate --gpkg-path "/gpkg/${gpkg_path}" --table-definitions-path "/gpkg/${schema_path}"
Generate a schema:
schema_path=relative/path/to/the/schema.json
gpkg_path=relative/path/to/the.gpkg
docker run -v "$(pwd)":/gpkg --rm pdok/geopackage-validator generate-definitions --gpkg-path "/gpkg/${gpkg_path}" > "$schema_path"
Local
For a local setup we require/tested against python > 3.6 and gdal = 3.4.
gpkg_path=relative/path/to/the.gpkg
geopackage-validator validate --gpkg-path "/gpkg/${gpkg_path}"
Validate a GeoPackage with the default set of validation rules including a schema:
schema_path=relative/path/to/the/schema.json
gpkg_path=relative/path/to/the.gpkg
geopackage-validator validate --gpkg-path "/gpkg/${gpkg_path}" --table-definitions-path "/gpkg/${schema_path}"
Generate a schema:
schema_path=relative/path/to/the/schema.json
gpkg_path=relative/path/to/the.gpkg
geopackage-validator generate-definitions --gpkg-path "/gpkg/${gpkg_path}" > "$schema_path"
What does it do
The Geopackage validator can validate .gkpg files to see if they conform to a set of standards. The current checks are (see also the 'show-validations' command):
Validation code** | Description |
---|---|
UNKNOWN_ERROR | No unexpected (GDAL) errors must occur. |
RQ0 | LEGACY: use RQ8 * Geopackage must conform to table names in the given JSON or YAML definitions. |
RQ1 | Layer names must start with a letter, and valid characters are lowercase a-z, numbers or underscores. |
RQ2 | Layers must have at least one feature. |
RQ3 | LEGACY: use RQ14 * Layer features should have an allowed geometry_type (one of POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, or MULTIPOLYGON). |
RQ4 | The geopackage should have no views defined. |
RQ5 | LEGACY: use RQ23 * Geometry should be valid. |
RQ6 | Column names must start with a letter, and valid characters are lowercase a-z, numbers or underscores. |
RQ7 | Tables should have a feature id column with unique index. |
RQ8 | Geopackage must conform to given JSON or YAML definitions. |
RQ9 | All geometry tables must have an rtree index. |
RQ10 | All geometry table rtree indexes must be valid. |
RQ11 | OGR indexed feature counts must be up to date. |
RQ12 | LEGACY: use RQ22 * Only the following EPSG spatial reference systems are allowed: 28992, 3034, 3035, 3038, 3039, 3040, 3041, 3042, 3043, 3044, 3045, 3046, 3047, 3048, 3049, 3050, 3051, 4258, 4936, 4937, 5730, 7409. |
RQ13 | It is required to give all GEOMETRY features the same default spatial reference system. |
RQ14 | The geometry_type_name from the gpkg_geometry_columns table must be one of POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, or MULTIPOLYGON. |
RQ15 | All table geometries must match the geometry_type_name from the gpkg_geometry_columns table. |
RQ16 | LEGACY: use RQ21 * All layer and column names shall not be longer than 53 characters. |
RQ21 | All layer and column names shall not be longer than 57 characters. |
RQ22 | Only the following EPSG spatial reference systems are allowed: 28992, 3034, 3035, 3040, 3041, 3042, 3043, 3044, 3045, 3046, 3047, 3048, 3049, 3857, 4258, 4326, 4936, 4937, 5730, 7409. |
RQ23 | Geometry should be valid and simple. |
RC17 | It is recommended to name all GEOMETRY type columns 'geom'. |
RC18 | It is recommended to give all GEOMETRY type columns the same name. |
RC19 | It is recommended to only use multidimensional geometry coordinates (elevation and measurement) when necessary. |
RC20 | It is recommended that all (MULTI)POLYGON geometries have a counter-clockwise orientation for their exterior ring, and a clockwise direction for all interior rings. |
UNKNOWN_WARNINGS | It is recommended that the unexpected (GDAL) warnings are looked into. |
* Legacy requirements are only executed with the validate command when explicitly requested in the validation set.
** Since version 0.8.0 the recommendations are part of the same sequence as the requirements. From now on a check will always maintain the integer part of the code. Even if at a later time the validation type can shift between requirement and recommendation.
An explanation in Dutch with a reason for each rule can be found here.
Geopackage versions
The Geopackage validator support the following Geopackage versions:
- 1.3.1
- 1.3
- 1.2.1
Installation
This package requires:
- GDAL version >= 3.2.1.
- Spatialite version >= 5.0.0
- And python >= 3.8 to run.
We recommend using the docker image. When above requirements are met the package can be installed using pip (pip install pdok-geopackage-validator
).
Docker Installation
Pull the latest version of the Docker image (only once needed, or after an update)
docker pull pdok/geopackage-validator:latest
Or build the Docker image from source:
docker build -t pdok/geopackage-validator .
The command is directly called so subcommands can be run in the container directly:
docker run -v ${PWD}:/gpkg --rm pdok/geopackage-validator validate -t /path/to/generated_definitions.json --gpkg-path /gpkg/tests/data/test_allcorrect.gpkg
Usage
RQ8 Validation
To validate RQ8 you have to generate definitions first.
docker run -v ${PWD}:/gpkg --rm pdok/geopackage-validator geopackage-validator generate-definitions --gpkg-path /path/to/file.gpkg
Validate
Usage: geopackage-validator validate [OPTIONS]
Geopackage validator validating a local file or a file from S3 storage.
When the filepath is preceded with '/vsis3' or '/vsicurl' the gdal virtual
file system will be used to access the file on S3 and will not be directly
downloaded. See https://gdal.org/user/virtual_file_systems.html for
further explanation how to use gdal virtual file systems. For convenience
the gdal vsi environment parameters and optional parameters are provided
with an S3_ instead of an AWS_ prefix. The AWS_ environment parameters
will also work.
Examples:
viscurl:
geopackage-validator validate --gpkg-path /vsicurl/http://minio-
url.nl/bucketname/key/to/public.gpkg
vsis3:
geopackage-validator validate --gpkg-path
/vsis3/bucketname/key/to/public.gpkg --s3-signing-region eu-central-1
--s3-secret-key secret --s3-access-key acces-key --s3-secure=false
--s3-virtual-hosting false --s3-endpoint-no-protocol minio-url.nl
S3_SECRET_KEY=secret S3_ACCESS_KEY=acces-key S3_SIGNING_REGION=eu-
central-1 S3_SECURE=false S3_VIRTUAL_HOSTING=false
S3_ENDPOINT_NO_PROTOCOL=minio-url.nl geopackage-validator validate --gpkg-
path /vsis3/bucketname/key/to/public.gpkg
AWS_SECRET_ACCESS_KEY=secret AWS_ACCESS_KEY_ID=acces-key
AWS_DEFAULT_REGION=eu-central-1 AWS_HTTPS=NO AWS_VIRTUAL_HOSTING=FALSE
AWS_S3_ENDPOINT=minio-url.nl geopackage-validator validate --gpkg-path
/vsis3/bucketname/key/to/public.gpkg
Options:
--gpkg-path FILE Path pointing to the geopackage.gpkg file
[env var: GPKG_PATH]
-t, --table-definitions-path FILE
Path pointing to the table-definitions JSON
or YAML file (generate this file by calling
the generate-definitions command)
--validations-path FILE Path pointing to the set of validations to
run. If validations-path and validations are
not given, validate runs all validations
[env var: VALIDATIONS_FILE]
--validations TEXT Comma-separated list of validations to run
(e.g. --validations RQ1,RQ2,RQ3). If
validations-path and validations are not
given, validate runs all validations [env
var: VALIDATIONS]
--exit-on-fail Exit with code 1 when validation success is
false.
--yaml Output yaml.
--s3-endpoint-no-protocol TEXT Endpoint for the s3 service without protocol
[env var: S3_ENDPOINT_NO_PROTOCOL]
--s3-access-key TEXT Access key for the s3 service [env var:
S3_ACCESS_KEY]
--s3-secret-key TEXT Secret key for the s3 service [env var:
S3_SECRET_KEY]
--s3-bucket TEXT Bucket where the geopackage is on the s3
service [env var: S3_BUCKET]
--s3-key TEXT Key where the geopackage is in the bucket
[env var: S3_KEY]
--s3-secure BOOLEAN Use a secure TLS connection for S3. [env
var: S3_SECURE]
--s3-virtual-hosting TEXT TRUE value, identifies the bucket via a
virtual bucket host name, e.g.:
mybucket.cname.domain.com - FALSE value,
identifies the bucket as the top-level
directory in the URI, e.g.:
cname.domain.com/mybucket. Convenience
parameter, same as gdal AWS_VIRTUAL_HOSTING.
[env var: S3_VIRTUAL_HOSTING]
--s3-signing-region TEXT S3 signing region. Convenience parameter,
same as gdal AWS_DEFAULT_REGION. [env var:
S3_SIGNING_REGION]
--s3-no-sign-request TEXT When set, request signing is disabled. This
option might be used for buckets with public
access rights. Convenience parameter, same
as gdal AWS_NO_SIGN_REQUEST. [env var:
S3_NO_SIGN_REQUEST]
-v, --verbosity LVL Either CRITICAL, ERROR, WARNING, INFO or
DEBUG
--help Show this message and exit.
Examples:
docker run -v ${PWD}:/gpkg --rm pdok/geopackage-validator validate -t /path/to/generated_definitions.json --gpkg-path /gpkg/tests/data/test_allcorrect.gpkg
Run with specific validations only
Specified in file:
docker run -v ${PWD}:/gpkg --rm pdok/geopackage-validator validate --gpkg-path tests/data/test_allcorrect.gpkg --validations-path tests/validationsets/example-validation-set.json
Or specified on command line:
docker run -v ${PWD}:/gpkg --rm pdok/geopackage-validator validate --gpkg-path tests/data/test_allcorrect.gpkg --validations RQ1,RQ2,RQ3
Show validations
Show all the possible validations that are executed in the validate command.
Usage: geopackage-validator show-validations [OPTIONS]
Show all the possible validations that are executed in the validate
command.
Options:
-v, --verbosity LVL Either CRITICAL, ERROR, WARNING, INFO or DEBUG
--help Show this message and exit.
Generate table definitions
Usage: geopackage-validator generate-definitions [OPTIONS]
Generate table definition for a geopackage on local or S3 storage. Use the
generated definition JSON or YAML in the validation step by providing the
table definitions with the --table-definitions-path parameter. When the
filepath is preceded with '/vsi' the gdal virtual file system method will
be used to access the file on S3 and will not be directly downloaded. See
https://gdal.org/user/virtual_file_systems.html for further explanation.
For convenience the gdal vsi environment parameters and optional
parameters are provided with an S3_ instead of an AWS_ prefix. The AWS_
environment parameters will also work.
Examples:
viscurl:
geopackage-validator validate --gpkg-path /vsicurl/http://minio-
url.nl/bucketname/key/to/public.gpkg
vsis3:
geopackage-validator generate-definitions --gpkg-path
/vsis3/bucketname/key/to/public.gpkg --s3-signing-region eu-central-1
--s3-secret-key secret --s3-access-key acces-key --s3-secure=false
--s3-virtual-hosting false --s3-endpoint-no-protocol minio-url.nl
S3_SECRET_KEY=secret S3_ACCESS_KEY=acces-key S3_SIGNING_REGION=eu-
central-1 S3_SECURE=false S3_VIRTUAL_HOSTING=false
S3_ENDPOINT_NO_PROTOCOL=minio-url.nl geopackage-validator generate-definitions --gpkg-
path /vsis3/bucketname/key/to/public.gpkg
AWS_SECRET_ACCESS_KEY=secret AWS_ACCESS_KEY_ID=acces-key
AWS_DEFAULT_REGION=eu-central-1 AWS_HTTPS=NO AWS_VIRTUAL_HOSTING=FALSE
AWS_S3_ENDPOINT=minio-url.nl geopackage-validator generate-definitions --gpkg-path
/vsis3/bucketname/key/to/public.gpkg
Options:
--gpkg-path FILE Path pointing to the geopackage.gpkg file
[env var: GPKG_PATH]
--yaml Output yaml
--s3-endpoint-no-protocol TEXT Endpoint for the s3 service without protocol
[env var: S3_ENDPOINT_NO_PROTOCOL]
--s3-access-key TEXT Access key for the s3 service [env var:
S3_ACCESS_KEY]
--s3-secret-key TEXT Secret key for the s3 service [env var:
S3_SECRET_KEY]
--s3-bucket TEXT Bucket where the geopackage is on the s3
service [env var: S3_BUCKET]
--s3-key TEXT Key where the geopackage is in the bucket
[env var: S3_KEY]
--s3-secure BOOLEAN Use a secure TLS connection for S3. [env
var: S3_SECURE]
--s3-virtual-hosting TEXT TRUE value, identifies the bucket via a
virtual bucket host name, e.g.:
mybucket.cname.domain.com - FALSE value,
identifies the bucket as the top-level
directory in the URI, e.g.:
cname.domain.com/mybucket. Convenience
parameter, same as gdal AWS_VIRTUAL_HOSTING.
[env var: S3_VIRTUAL_HOSTING]
--s3-signing-region TEXT S3 signing region. Convenience parameter,
same as gdal AWS_DEFAULT_REGION. [env var:
S3_SIGNING_REGION]
--s3-no-sign-request TEXT When set, request signing is disabled. This
option might be used for buckets with public
access rights. Convenience parameter, same
as gdal AWS_NO_SIGN_REQUEST. [env var:
S3_NO_SIGN_REQUEST]
-v, --verbosity LVL Either CRITICAL, ERROR, WARNING, INFO or
DEBUG
--help Show this message and exit.
Local development
We advise using docker-compose for local development. This allows live editing and testing code with the correct gdal/ogr version with spatialite 5.0.0. First build the local image with your machines user id and group id:
docker-compose build --build-arg USER_ID=`id -u` --build-arg GROUP_ID=`id -g`
Usage
There will be a script you can run like this:
docker-compose run --rm validator geopackage-validator
This command has direct access to the files found in this directory. In case you want
to point the docker-compose to other files, you can add or edit the volumes in the docker-compose.yaml
Python console
Ipython is available in the docker:
docker-compose run --rm validator ipython
Code style
In order to get nicely formatted python files without having to spend manual work on it, run the following command periodically:
docker-compose run --rm validator black .
Tests
Run the tests regularly. This also checks with pyflakes and black:
docker-compose run --rm validator pytest
Releasing
Release in github by creating a new release in github.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file pdok_geopackage_validator-0.9.4rc1.tar.gz
.
File metadata
- Download URL: pdok_geopackage_validator-0.9.4rc1.tar.gz
- Upload date:
- Size: 30.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.9.19
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6cf65fb95b0a1fbd5923424795a97046ea5b19bdad9256c72559ed1c6db95b5b |
|
MD5 | 7cdd677d95d220981b30c535a04a5cee |
|
BLAKE2b-256 | 4f085e5111ad6c185777df7797d7b0115e49020d5a13ff33c9aba6a3a63d95ba |
File details
Details for the file pdok_geopackage_validator-0.9.4rc1-py3-none-any.whl
.
File metadata
- Download URL: pdok_geopackage_validator-0.9.4rc1-py3-none-any.whl
- Upload date:
- Size: 33.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.9.19
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 46037da166ce267554700026faf2d142ff7b9645dabeacbe8dfb7f0d6c9992c7 |
|
MD5 | 4564f4efbbf07a639122eea23a55c5e6 |
|
BLAKE2b-256 | bb0344b9f889a6efee731c622d18c1bf095429f91131b4d19674f5a4b7a42aaa |