Tools to work with Amsterdam schema.
amsterdam-schema-tools
Set of libraries and tools to work with Amsterdam schema.
Install the package with: pip install amsterdam-schema-tools
Currently, the following CLI commands are available:
- schema import events
- schema import ndjson
- schema show schema
- schema show tablenames
- schema introspect db
- schema introspect geojson *.geojson
- schema validate
- schema permissions apply
The tools expect either a DATABASE_URL environment variable or a command-line option --db-url with a DSN.
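As a minimal sketch of the environment-variable route, the snippet below calls the CLI from Python with DATABASE_URL set for that one call. The helper names schema_env and list_tablenames are hypothetical and not part of schematools, and the sketch assumes that `schema show tablenames` prints one table name per line.

```python
import os
import subprocess


def schema_env(db_url, base_env=None):
    """Return a copy of the environment with DATABASE_URL set to the given DSN."""
    env = dict(os.environ if base_env is None else base_env)
    env["DATABASE_URL"] = db_url
    return env


def list_tablenames(db_url):
    """Run `schema show tablenames` and return its output lines.

    Assumes the `schema` command is installed and on PATH.
    """
    result = subprocess.run(
        ["schema", "show", "tablenames"],
        env=schema_env(db_url),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Passing the DSN through the environment of a single subprocess call keeps it out of the parent shell's environment.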
The output is a JSON schema, following the Amsterdam schema definition, for the tables that are being processed.
Generate amsterdam schema from existing database tables
The --prefix argument controls whether table prefixes are removed in the schema, because that is required for Django models.
As an example, we can generate a BAG schema. Point DATABASE_URL to the bag_v11 database and then run:
schema show tablenames | sort | awk '/^bag_/{print}' | xargs schema introspect db bag --prefix bag_ | jq
The jq formats it nicely and it can be redirected to the correct directory in the schemas repository directly.
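For readers who prefer to script this step, here is a rough Python equivalent of the sort/awk/xargs part of the pipeline: it keeps the table names that start with the prefix and appends them to the introspect command. The function name build_introspect_command and the table names in the usage example are made up for illustration.

```python
def build_introspect_command(tablenames, dataset_id, prefix):
    """Build the argv for `schema introspect db` from a list of table names.

    Mirrors the shell pipeline: keep names starting with `prefix` (the awk
    filter), sort them, and append them as arguments (the xargs step).
    """
    selected = sorted(name for name in tablenames if name.startswith(prefix))
    return ["schema", "introspect", "db", dataset_id, "--prefix", prefix, *selected]


# Example with hypothetical table names:
command = build_introspect_command(
    ["bag_woonplaats", "auth_user", "bag_buurt"], "bag", "bag_"
)
print(" ".join(command))
```

The resulting list can be handed to subprocess.run, with the output piped to jq or written to a file as described above.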
Express amsterdam schema information in relational tables
Amsterdam schema is expressed as jsonschema. However, to make it easier for people with a more relational mind- or toolset it is possible to express amsterdam schema as a set of relational tables. These tables are meta_dataset, meta_table and meta_field.
It is possible to convert a jsonschema into the relational table structure and vice-versa.
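To give an idea of what such a conversion involves, the sketch below flattens a small dataset definition into rows for the three tables. It is an illustration only: the column names (id, title, dataset_id, table_id, name, type) are assumptions and not the actual layout used by schematools, and the sample dataset is invented.

```python
def flatten_dataset(dataset):
    """Flatten a dataset dict into rows for meta_dataset, meta_table and meta_field.

    The column names are hypothetical; the real tables may differ.
    """
    dataset_rows = [{"id": dataset["id"], "title": dataset.get("title")}]
    table_rows = []
    field_rows = []
    for table in dataset.get("tables", []):
        table_rows.append({"id": table["id"], "dataset_id": dataset["id"]})
        properties = table.get("schema", {}).get("properties", {})
        for name, spec in properties.items():
            field_rows.append(
                {
                    "name": name,
                    "table_id": table["id"],
                    "dataset_id": dataset["id"],
                    "type": spec.get("type"),
                }
            )
    return {
        "meta_dataset": dataset_rows,
        "meta_table": table_rows,
        "meta_field": field_rows,
    }


# Invented sample dataset with one table and two fields:
sample = {
    "id": "bag",
    "title": "BAG",
    "tables": [
        {
            "id": "buurt",
            "schema": {
                "properties": {
                    "id": {"type": "string"},
                    "naam": {"type": "string"},
                }
            },
        }
    ],
}
rows = flatten_dataset(sample)
```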
This command converts an existing dataset in jsonschema format into the relational tables:
schema import schema <id of dataset>
To convert from relational tables back to jsonschema:
schema show schema <id of dataset>
Generating amsterdam schema from existing GeoJSON files
The following command can be used to inspect and import the GeoJSON files:
schema introspect geojson <dataset-id> *.geojson > schema.json
edit schema.json # fine-tune the table names
schema import geojson schema.json <table1> file1.geojson
schema import geojson schema.json <table2> file2.geojson
Importing GOB events
The schematools library has a module that reads GOB events into database tables that are defined by an Amsterdam schema. This module can be used to read GOB events from a Kafka stream. It is also possible to read GOB events from a batch file with line-separated events using:
schema import events <path-to-dataset> <path-to-file-with-events>
Schema Tools as a pre-commit hook
Included in the project is a pre-commit hook that can validate schema files in a project such as amsterdam-schema.
To configure it, extend the .pre-commit-config.yaml in the project with the schema file definitions as follows:
  - repo: https://github.com/Amsterdam/schema-tools
    rev: v0.20.2
    hooks:
      - id: validate-schema
        args: ['https://schemas.data.amsterdam.nl/schema@v1.1.1#']
        exclude: |
            (?x)^(
                schema.+|             # exclude meta schemas
                datasets/index.json
            )$
args is a one-element list containing the URL to the Amsterdam Meta Schema.
validate-schema will only process json files. However, not all json files are Amsterdam schema files. To exclude files or directories, use exclude with a pattern.
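To check which files an exclude pattern skips before committing the configuration, the pattern can be tried out in Python. This sketch assumes pre-commit applies the pattern to each repository-relative path with Python's re.search; the helper is_excluded and the example paths are hypothetical.

```python
import re

# The exclude pattern from the configuration above. (?x) turns on verbose
# mode, so whitespace and the trailing "# ..." comment are ignored.
EXCLUDE = r"""(?x)^(
    schema.+|             # exclude meta schemas
    datasets/index.json
)$
"""


def is_excluded(path, pattern=EXCLUDE):
    """Return True if the path matches the exclude pattern."""
    return re.search(pattern, path) is not None


# Hypothetical paths relative to the repository root:
for path in ["schema@v1.1.1/meta.json", "datasets/index.json", "datasets/bag/dataset.json"]:
    print(path, "excluded" if is_excluded(path) else "validated")
```

Because the pattern is anchored with ^ and $, only paths that start with schema, or that are exactly datasets/index.json, are skipped; other json files under datasets/ are still validated.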
pre-commit depends on properly tagged revisions of its hooks.
Hence we should take care not only to bump version numbers on updates to this package, but also to commit a tag with the version number.
This is automated by means of the tbump tool.
Bumping a version from 0.18.1 to 0.18.2 and generating the appropriate git commits/tags is as easy as running:
$ tbump 0.18.2