
OArepo Validate library for record metadata validation


oarepo-validate


OArepo Validate library for model-level metadata validation

Installation

    pip install oarepo-validate

Usage

The library provides mixins for enforcing JSON schema and marshmallow validation.

JSON schema validation

If $schema is present on metadata, Invenio performs JSON schema validation inside the validate() method. The problem is that $schema can be set or removed via the REST API, so an ill-written client can completely bypass the validation.

To mitigate this issue, create your own Record implementation:

from oarepo_validate import SchemaKeepingRecordMixin
from invenio_records import Record

class MyRecord(SchemaKeepingRecordMixin, Record):
    ALLOWED_SCHEMAS = ('records/record-v1.0.0.json', 'records/record-v2.0.0.json')
    PREFERRED_SCHEMA = 'records/record-v2.0.0.json'

And register the record in REST endpoints in configuration:

RECORD_PID = 'pid(recid,record_class="my:MyRecord")'

RECORDS_REST_ENDPOINTS = {
    'records': dict(
        pid_type='recid',
        pid_minter='recid',
        pid_fetcher='recid',
        record_class='my:MyRecord',
        item_route='/records/<{0}:pid_value>'.format(RECORD_PID),
        # ...
    )
}

Create record

When creating a new record, if $schema is not set, MyRecord.PREFERRED_SCHEMA is added automatically. If $schema is set, it is validated against MyRecord.ALLOWED_SCHEMAS and an exception is raised if the schema is not present in ALLOWED_SCHEMAS.
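
The create-time rule described above can be sketched in plain Python. This is only an illustration, not the mixin's code: apply_schema_rule is a hypothetical helper, ValueError stands in for whatever exception the library raises, and the sketch ignores any resolution of schema paths to URLs.

```python
ALLOWED_SCHEMAS = ("records/record-v1.0.0.json", "records/record-v2.0.0.json")
PREFERRED_SCHEMA = "records/record-v2.0.0.json"


def apply_schema_rule(metadata, allowed=ALLOWED_SCHEMAS, preferred=PREFERRED_SCHEMA):
    """Return a copy of metadata whose "$schema" has been added or checked.

    Hypothetical helper for illustration; the exception type is an assumption.
    """
    result = dict(metadata)
    schema = result.get("$schema")
    if schema is None:
        # No schema sent by the client: fall back to the preferred one.
        result["$schema"] = preferred
    elif schema not in allowed:
        # A schema was sent, but it is not on the allow list.
        raise ValueError("Schema %r is not allowed" % schema)
    return result
```

With this sketch, metadata without $schema comes back with the v2.0.0 schema, metadata with the v1.0.0 schema is left unchanged, and any other schema is rejected.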

PUT / PATCH record

Before the result of the operation is committed, $schema is checked again.

Marshmallow validation

In Invenio, the REST create operation uses the following sequence:

<flask>
<invenio_records_rest.views.RecordsListResource:post>
   <loader>
      <marshmallow>
   <permission factory>
   <pid minter>
   <record_class.create>
      <record.commit>
         <record.validate>

The REST PUT operation then uses:

<flask>
<invenio_records_rest.views.RecordResource:put>
   <permission factory>
   <loader>
      <marshmallow>
   <record.update>
   <record.commit>
      <record.validate>

The REST PATCH operation uses:

<flask>
<invenio_records_rest.views.RecordResource:patch>
   <permission factory>
   <simple json loader>
   <record.patch>
   <record.commit>
      <record.validate>

As you can see, validation code placed in the loader's marshmallow schema is not executed for PATCH, which goes through a simple JSON loader. An alternative is to put the validation code in validate and handle all validations there. This library does exactly that: it provides a record mixin that calls the marshmallow schema's load method inside its validate method.
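
The idea can be illustrated with toy classes that do not depend on Invenio or marshmallow. Everything below is a stand-in invented for this sketch (ToyRecord, load_name_required, ValidatedMixin); it only shows why a check placed in validate is reached on every path that ends in commit, including a patch that never goes through a loader.

```python
class ToyRecord(dict):
    """Simplified stand-in for a record class (not invenio_records.Record)."""

    def validate(self):
        """Base validation hook; does nothing in this toy."""

    def commit(self):
        # Create, PUT and PATCH all end in commit, which calls validate.
        self.validate()
        return self

    def patch(self, changes):
        # Stand-in for a JSON patch: no loader is involved here.
        patched = type(self)(self)
        patched.update(changes)
        return patched


def load_name_required(data):
    """Stand-in for a schema's load(): 'name' must be a string."""
    if not isinstance(data.get("name"), str):
        raise ValueError("name: missing or not a string")
    return data


class ValidatedMixin:
    """Runs the loader inside validate(), then defers to the base class."""

    LOADER = staticmethod(load_name_required)

    def validate(self):
        self.LOADER(dict(self))
        super().validate()


class MyToyRecord(ValidatedMixin, ToyRecord):
    pass
```

Here MyToyRecord(name="first").commit() succeeds, while committing the result of patch({"name": None}) raises, even though the patch itself bypassed any loader.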

Usage

Create your own record and inherit from the mixin:

from oarepo_validate import MarshmallowValidatedRecordMixin
from invenio_records import Record
from marshmallow import Schema, fields

class TestSchema(Schema):
    name = fields.Str(required=True)

class MyRecord(MarshmallowValidatedRecordMixin, Record):
    MARSHMALLOW_SCHEMA = TestSchema

Do not forget to register it as in the previous example.

The marshmallow schema is now processed on every commit.

What about marshmallow in loader?

In most cases, the marshmallow schema in the loader can be removed and a simple JSON loader used instead. However, if you need custom processing of input data that is independent of validation, you can keep both marshmallow schemas. To use the simple JSON loader, set oarepo_validate.json_loader as the record loader:

RECORDS_REST_ENDPOINTS = {
    'recid': dict(
        record_loaders={
            'application/json': 'oarepo_validate:json_loader',
        },
        # ...
    )
}

A special case is when the loader's marshmallow schema already includes the validation rules. In that case, use the loader's marshmallow schema for create / replace and the validation marshmallow schema only for the patch operation, so that the same rules are not run twice. To accomplish this, set:

class MyRecord(MarshmallowValidatedRecordMixin, Record):
    MARSHMALLOW_SCHEMA = TestSchema

    VALIDATE_MARSHMALLOW = False
    VALIDATE_PATCH = True

Setting VALIDATE_MARSHMALLOW to False switches off marshmallow validation in the validate method, and setting VALIDATE_PATCH to True switches on marshmallow validation in the patch method.
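
How the two flags divide the work can be sketched with a toy class. This is not the mixin's implementation: ToyRecord and check_name are invented for the sketch, the default flag values are an assumption, and treating None as "remove the key" is a simplification of JSON patch.

```python
def check_name(data):
    """Stand-in for marshmallow validation (hypothetical rule)."""
    if "name" not in data:
        raise ValueError("name is required")


class ToyRecord(dict):
    VALIDATE_MARSHMALLOW = True   # assumed default: check inside validate()
    VALIDATE_PATCH = False        # assumed default: no check inside patch()

    def validate(self):
        if self.VALIDATE_MARSHMALLOW:
            check_name(self)

    def patch(self, changes):
        patched = type(self)(self)
        patched.update(changes)
        # In this toy, a None value means "remove the key".
        for key in [k for k, v in changes.items() if v is None]:
            del patched[key]
        if self.VALIDATE_PATCH:
            check_name(patched)
        return patched


class PatchOnlyRecord(ToyRecord):
    # The loader is assumed to validate create / replace already.
    VALIDATE_MARSHMALLOW = False
    VALIDATE_PATCH = True
```

With PatchOnlyRecord, validate no longer runs the check, but a patch that removes name is still rejected.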

Context

Marshmallow validation is called with a context that contains:

  • record
  • pid if it is known
  • Any **kwargs passed to Record.create or Record.commit
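
As a rough picture of what such a context might look like, the sketch below assembles one from the three sources listed above. The function name build_context, the exact key names "record" and "pid", and the merge order are assumptions made for illustration.

```python
def build_context(record, pid=None, **kwargs):
    """Assemble a validation context (hypothetical helper, assumed keys)."""
    context = {"record": record}
    if pid is not None:
        # The pid is only included when it is already known.
        context["pid"] = pid
    # Extra keyword arguments are passed through unchanged.
    context.update(kwargs)
    return context
```

A schema-level validator could then read, for example, context.get("pid") to tell a new record from an existing one.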

Signals

The library provides the following signals:

before_marshmallow_validate = signal('oarepo_before_marshmallow_validate')
"""
Signal invoked before record metadata are validated (loaded by marshmallow schema)
inside Record.validate

:param source:  the record being validated
:param record:  the record being validated
:param context: marshmallow context
:param **kwargs: kwargs passed to Record.create or Record.commit (or Record.validate)
"""

after_marshmallow_validate = signal('oarepo_after_marshmallow_validate')
"""
Signal invoked after record metadata are validated (loaded by marshmallow schema)
inside Record.validate

:param source:  the record being validated
:param record:  the record that was successfully validated
:param context: marshmallow context
:param result:  result of load that will be used to update record's metadata.
                Signal handler can modify it.
:param **kwargs: kwargs passed to Record.create or Record.commit (or Record.validate)
"""

Serializers

If marshmallow.dump is not required for metadata serialization, oarepo_validate.json_search and oarepo_validate.json_response are faster replacements for marshmallow-based serializers:

RECORDS_REST_ENDPOINTS = {
    'recid': dict(
        record_serializers={
            'application/json': 'oarepo_validate:json_response',
        },
        search_serializers={
            'application/json': 'oarepo_validate:json_search',
        }
    )
}

Changes

Version 1.2.3 (released 2020-08-30)

  • Handling pid field in search hit serialization

Version 1.2.2 (released 2020-08-25)

  • Handling pid field in record serialization

Version 1.2.1 (released 2020-08-25)

  • Keeping schema in Record.__init__ (useful mostly for tests)

Version 1.2.0 (released 2020-08-25)

  • Added marshmallow-less loaders and serializers

Version 1.1.0 (released 2020-08-18)

  • Added before and after validation signals.

Version 1.0.0 (released 2020-08-16)

  • Initial public release.
