Spec and validator for directories, files and metadata based on JSON Schema and regexes.

dirschema

A directory structure and metadata linter based on JSON Schema.

JSON Schema is great for validating (files containing) JSON objects that, for example, contain metadata, but these are only the smallest pieces in the organization of a whole directory structure, e.g. of some dataset or project. Datasets of a certain kind may contain various types of data, with each file requiring different accompanying metadata depending on its file type and/or location.

DirSchema combines JSON Schemas and regexes into a solution to enforce structural dependencies and metadata requirements in directories and directory-like archives. With it you can for example check that:

  • only files of a certain type are in a location (e.g. only jpg files in directory img)
  • for each data file there exists a metadata file (e.g. test.jpg has test.jpg_meta.json)
  • each metadata file is valid according to some JSON Schema

If validating these kinds of constraints looks appealing to you, this tool is for you!
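To make the first two kinds of constraints concrete, here is a minimal standard-library sketch of what such checks amount to. It is not DirSchema's implementation and does not use its API or schema syntax: the function name `check_paths`, the hard-coded `img` directory, and the `_meta.json` suffix are illustrative assumptions taken from the examples above. The third kind of check (validating metadata against a JSON Schema) is left out.

```python
import re
from pathlib import PurePosixPath

# Hypothetical rule: img may hold jpg files and their *_meta.json companions.
ALLOWED_IN_IMG = re.compile(r".*\.jpg(_meta\.json)?")


def check_paths(paths):
    """Return a dict mapping each offending path to a list of error messages."""
    present = set(paths)
    errors = {}
    for p in paths:
        path = PurePosixPath(p)
        # Constraint 1: only files of a certain type are in a location.
        if path.parts[:1] == ("img",) and not ALLOWED_IN_IMG.fullmatch(path.name):
            errors.setdefault(p, []).append("only jpg files are allowed in img")
        # Constraint 2: for each data file there exists a metadata file.
        if path.suffix == ".jpg" and f"{p}_meta.json" not in present:
            errors.setdefault(p, []).append("missing metadata file")
    return errors
```

A DirSchema specification expresses rules like these declaratively, so they do not have to be hand-coded per project.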

DirSchema features:

  • Built-in support for schemas and metadata stored as JSON or YAML
  • Built-in support for checking contents of ZIP and HDF5 archives
  • Extensible validation interface for advanced needs beyond JSON Schema
  • Both a Python library and a CLI tool to perform the validation

Installation

pip install dirschema

Getting Started

The dirschema tool needs as input:

  • a DirSchema YAML file (containing a specification), and
  • a path to a directory or file (e.g. zip file) that should be checked.

You can run it like this:

dirschema my_dirschema.yaml DIRECTORY_OR_ARCHIVE_PATH

If the validation was successful, there will be no output. Otherwise, the tool will output a list of errors (e.g. invalid metadata, missing files, etc.).
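If you want to call the CLI from a script, the "no output means success" behaviour can be wrapped as sketched below. The helper name `run_dirschema` and its `command` parameter are hypothetical. Whether errors are written to stdout or stderr is not stated here, so the sketch captures both, and it makes no assumption about the exit code.

```python
import subprocess


def run_dirschema(schema_path, target_path, command=("dirschema",)):
    """Run the CLI and return (succeeded, output).

    Success is judged only by the absence of output, as described above.
    """
    result = subprocess.run(
        [*command, str(schema_path), str(target_path)],
        capture_output=True,
        text=True,
    )
    # Assumption: errors may appear on either stream, so combine them.
    output = (result.stdout + result.stderr).strip()
    return output == "", output
```

For example, `ok, report = run_dirschema("my_dirschema.yaml", "DIRECTORY_OR_ARCHIVE_PATH")` leaves any reported errors in `report`.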

You can also use dirschema from other Python code as a library:

from dirschema.validate import DSValidator

# Load the DirSchema specification and validate a dataset against it
errors = DSValidator("/path/to/dirschema").validate("/dataset/path")

As with the CLI, errors are reported only on failure: the validate method returns an error dict, which is empty if the validation succeeded.
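A returned error dict can be turned into a readable report with a small helper like the one below. The exact structure of the dict is not described here, so this sketch assumes only that keys identify the checked items and that each value is either a single message or a list of messages. `format_errors` is a hypothetical name, not part of the dirschema API.

```python
def format_errors(errors):
    """Turn an error dict into printable lines, one per message."""
    lines = []
    for key in sorted(errors, key=str):
        value = errors[key]
        # Assumption: a value is one message or a list/tuple of messages.
        messages = value if isinstance(value, (list, tuple)) else [value]
        for message in messages:
            lines.append(f"{key}: {message}")
    return lines
```

With dirschema installed, you could pass the result of `DSValidator(...).validate(...)` to `format_errors` and print the lines whenever the dict is non-empty.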

You can find more information on using and contributing to this repository in the documentation.

How to Cite

If you want to cite this project in your scientific work, please use the citation file in the repository.

Acknowledgements

We kindly thank all authors and contributors.

This project was developed at the Institute for Materials Data Science and Informatics (IAS-9) of the Jülich Research Center and funded by the Helmholtz Metadata Collaboration (HMC), an incubator-platform of the Helmholtz Association within the framework of the Information and Data Science strategic initiative.
