vtjson
A lightweight package for validating JSON-like Python objects.
Schemas
Validation of JSON-like Python objects is done according to a schema, which is loosely inspired by TypeScript types. The schema format is largely self-explanatory, as the following example shows.
Example
Below is a simplified version of the schema of the run object in the MongoDB database underlying the Fishtest web application (https://tests.stockfishchess.org/tests).
import math
from datetime import datetime
from bson.objectid import ObjectId
from vtjson import glob, ip_address, regex, union, url
net_name = regex("nn-[a-z0-9]{12}.nnue", name="net_name")
tc = regex(r"([1-9]\d*/)?\d+(\.\d+)?(\+\d+(\.\d+)?)?", name="tc")
str_int = regex(r"[1-9]\d*", name="str_int")
sha = regex(r"[a-f0-9]{40}", name="sha")
country_code = regex(r"[A-Z][A-Z]", name="country_code")
run_id = regex(r"[a-f0-9]{24}", name="run_id")
uuid = regex(r"[0-9a-zA-Z]{2,}(-[a-f0-9]{4}){3}-[a-f0-9]{12}", name="uuid")
epd_file = glob("*.epd", name="epd_file")
pgn_file = glob("*.pgn", name="pgn_file")
worker_info_schema = {
    "uname": str,
    "architecture": [str, str],
    "concurrency": int,
    "max_memory": int,
    "min_threads": int,
    "username": str,
    "version": int,
    "python_version": [int, int, int],
    "gcc_version": [int, int, int],
    "compiler": union("clang++", "g++"),
    "unique_key": uuid,
    "modified": bool,
    "ARCH": str,
    "nps": float,
    "near_github_api_limit": bool,
    "remote_addr": ip_address,
    "country_code": union(country_code, "?"),
}
results_schema = {
    "wins": int,
    "losses": int,
    "draws": int,
    "crashes": int,
    "time_losses": int,
    "pentanomial": [int, int, int, int, int],
}
schema = {
    "_id?": ObjectId,
    "start_time": datetime,
    "last_updated": datetime,
    "tc_base": float,
    "base_same_as_master": bool,
    "rescheduled_from?": run_id,
    "approved": bool,
    "approver": str,
    "finished": bool,
    "deleted": bool,
    "failed": bool,
    "is_green": bool,
    "is_yellow": bool,
    "workers": int,
    "cores": int,
    "results": results_schema,
    "results_info?": {
        "style": str,
        "info": [str, ...],
    },
    "args": {
        "base_tag": str,
        "new_tag": str,
        "base_nets": [net_name, ...],
        "new_nets": [net_name, ...],
        "num_games": int,
        "tc": tc,
        "new_tc": tc,
        "book": union(epd_file, pgn_file),
        "book_depth": str_int,
        "threads": int,
        "resolved_base": sha,
        "resolved_new": sha,
        "msg_base": str,
        "msg_new": str,
        "base_options": str,
        "new_options": str,
        "info": str,
        "base_signature": str_int,
        "new_signature": str_int,
        "username": str,
        "tests_repo": url,
        "auto_purge": bool,
        "throughput": float,
        "itp": float,
        "priority": float,
        "adjudication": bool,
        "sprt?": {
            "alpha": 0.05,
            "beta": 0.05,
            "elo0": float,
            "elo1": float,
            "elo_model": "normalized",
            "state": union("", "accepted", "rejected"),
            "llr": float,
            "batch_size": int,
            "lower_bound": -math.log(19),
            "upper_bound": math.log(19),
            "lost_samples?": int,
            "illegal_update?": int,
            "overshoot?": {
                "last_update": int,
                "skipped_updates": int,
                "ref0": float,
                "m0": float,
                "sq0": float,
                "ref1": float,
                "m1": float,
                "sq1": float,
            },
        },
        "spsa?": {
            "A": float,
            "alpha": float,
            "gamma": float,
            "raw_params": str,
            "iter": int,
            "num_iter": int,
            "params": [
                {
                    "name": str,
                    "start": float,
                    "min": float,
                    "max": float,
                    "c_end": float,
                    "r_end": float,
                    "c": float,
                    "a_end": float,
                    "a": float,
                    "theta": float,
                },
                ...,
            ],
            "param_history?": [
                [{"theta": float, "R": float, "c": float}, ...],
                ...,
            ],
        },
    },
    "tasks": [
        {
            "num_games": int,
            "active": bool,
            "last_updated": datetime,
            "start": int,
            "residual?": float,
            "residual_color?": str,
            "bad?": True,
            "stats": results_schema,
            "worker_info": worker_info_schema,
        },
        ...,
    ],
    "bad_tasks?": [
        {
            "num_games": int,
            "active": False,
            "last_updated": datetime,
            "start": int,
            "residual": float,
            "residual_color": str,
            "bad": True,
            "task_id": int,
            "stats": results_schema,
            "worker_info": worker_info_schema,
        },
        ...,
    ],
}
Conventions
- As in TypeScript, a (string) key ending in `?` represents an optional key. The corresponding schema (the item the key points to) will only be used for validation when the key is present in the object that should be validated. A key can also be made optional by wrapping it as `optional_key(key)`.
- If in a list/tuple the last entry is `...` (ellipsis) it means that the next-to-last entry will be repeated zero or more times. In this way generic types can be created. For example the schema `[str, ...]` represents a list of strings.
Usage
To validate an object against a schema one can simply do
validate(schema, object)
If the validation fails this raises a `ValidationError` whose message explains what went wrong. The full signature of `validate` is
validate(schema, object, name="object", strict=True, subs={})
- The optional argument `name` is used to refer to the object being validated in the returned message.
- The optional argument `strict` indicates whether or not the object being validated is allowed to have keys/entries which are not in the schema.
- The optional argument `subs` is a dictionary whose keys are labels (see below) and whose values are substitution schemas for schemas with those labels.
Wrappers
A wrapper takes one or more schemas as arguments and produces a new schema.
- An object matches the schema `union(schema1, ..., schemaN)` if it matches one of the schemas `schema1, ..., schemaN`.
- An object matches the schema `intersect(schema1, ..., schemaN)` if it matches all the schemas `schema1, ..., schemaN`.
- An object matches the schema `complement(schema)` if it does not match `schema`.
- An object matches the schema `lax(schema)` if it matches `schema` when validated with `strict=False`.
- An object matches the schema `strict(schema)` if it matches `schema` when validated with `strict=True`.
- An object matches the schema `set_name(schema, name)` if it matches `schema`, but the `name` argument will be used in non-validation messages.
- An object matches the schema `quote(schema)` if it is equal to `schema`. For example the schema `str` matches strings but the schema `quote(str)` matches the object `str`.
- An object matches the schema `set_label(schema, label1, ..., labelN, debug=False)` if it matches `schema`, unless the schema is replaced by a different one via the `subs` argument to `validate`. If the optional argument `debug` is `True` then a message will be printed on the console if the schema was changed.
Built-ins
Some built-ins take arguments. If no arguments are given then the parentheses can be omitted. So `email` is equivalent to `email()`. Some built-ins have an optional `name` argument. This is used in non-validation messages.
- `regex(pattern, name=None, fullmatch=True, flags=0)`. This matches the strings which match the given pattern. By default the entire string is matched, but this can be overruled via the `fullmatch` argument. The `flags` argument has the usual meaning.
- `glob(pattern, name=None)`. Unix style filename matching. This is implemented using `pathlib.PurePath().match()`.
- `div(divisor, remainder=0, name=None)`. This matches the integers `x` such that `(x - remainder) % divisor == 0`.
- `close_to(x, abs_tol=None, rel_tol=None)`. This matches the floats that are close to `x` in the sense of `math.isclose`.
- `email`. Checks if the object is a valid email address. This uses the package `email_validator`. The `email` schema accepts the same options as `validate_email` in that package.
- `ip_address(version=None)`. Matches IP addresses of the specified version which can be 4, 6 or None.
- `url`. Matches valid URLs.
- `domain_name(ascii_only=True, resolve=False)`. Checks if the object is a valid domain name. If `ascii_only=False` then allow IDNA domain names. If `resolve=True` check if the domain name resolves.
- `date_time(format=None)`. Without argument this represents an ISO 8601 date-time. The `format` argument represents a format string for `strftime`.
- `date` and `time`. These represent an ISO 8601 date and an ISO 8601 time.
- `anything`. Matches anything. This is functionally the same as just `object`.
- `nothing`. Matches nothing.
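Several of these built-ins are described in terms of standard library calls. The sketch below uses only the standard library to show what those underlying checks do; it is not vtjson's implementation, the helper `div_matches` is hypothetical, and using `re.match` for the non-full case is an assumption.

```python
import math
import re
from pathlib import PurePath

# regex: by default the entire string has to match the pattern.
sha_pattern = r"[a-f0-9]{40}"
full = re.fullmatch(sha_pattern, "a" * 40) is not None
too_long_full = re.fullmatch(sha_pattern, "a" * 41) is not None
too_long_prefix = re.match(sha_pattern, "a" * 41) is not None  # non-full match

# glob: Unix style filename matching via PurePath.match.
is_epd = PurePath("book.epd").match("*.epd")
is_not_epd = PurePath("book.pgn").match("*.epd")


# div: integers x with (x - remainder) % divisor == 0.
def div_matches(x, divisor, remainder=0):
    return isinstance(x, int) and (x - remainder) % divisor == 0


# close_to: closeness in the sense of math.isclose.
close_default = math.isclose(1.0 + 1e-12, 1.0)
not_close = math.isclose(1.001, 1.0)
close_with_abs_tol = math.isclose(1.001, 1.0, abs_tol=0.01)
```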
Mixins
Mixins are built-ins that are usually combined with other schemas using `intersect`.
- `one_of(key1, ..., keyN)`. This represents a dictionary with exactly one key among `key1, ..., keyN`.
- `at_least_one_of(key1, ..., keyN)`. This represents a dictionary with at least one key among `key1, ..., keyN`.
- `at_most_one_of(key1, ..., keyN)`. This represents a dictionary with at most one key among `key1, ..., keyN`.
- `keys(key1, ..., keyN)`. This represents a dictionary containing all the keys in `key1, ..., keyN`.
- `interval(lb, ub, strict_lb=False, strict_ub=False)`. This checks if `lb <= object <= ub`, provided the comparisons make sense. An upper/lower bound `...` (ellipsis) means that the corresponding inequality is not checked. The optional arguments `strict_lb`, `strict_ub` indicate whether the corresponding inequalities should be strict.
- `gt(lb)`. This checks if `object > lb`.
- `ge(lb)`. This checks if `object >= lb`.
- `lt(ub)`. This checks if `object < ub`.
- `le(ub)`. This checks if `object <= ub`.
- `size(lb, ub=None)`. Matches the objects (which support `len()`, such as strings or lists) whose length is in the interval `[lb, ub]`. The value of `ub` can be `...` (ellipsis). If `ub=None` then `ub` is set to `lb`.
- `fields({field1: schema1, field2: schema2, ..., fieldN: schemaN})`. Matches Python objects with attributes `field1, field2, ..., fieldN` whose corresponding values should validate against `schema1, schema2, ..., schemaN` respectively.
- `magic(mime_type, name=None)`. Checks if a buffer (for example a string or a byte array) has the given mime type. This is implemented using the `python-magic` package.
- `filter(callable, schema, filter_name=None)`. Applies `callable` to the object and validates the result with `schema`. If the callable throws an exception then validation fails. The optional argument `filter_name` is used in non-validation messages.
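The bound handling of `interval` can be read as follows. This is a pure-Python sketch of the semantics described above, not vtjson's code; the function name `in_interval` is hypothetical, and treating a failed comparison as a non-match is an assumption about what "provided the comparisons make sense" means.

```python
def in_interval(obj, lb, ub, strict_lb=False, strict_ub=False):
    """Check lb <= obj <= ub, where a bound of `...` is not checked."""
    try:
        if lb is not ...:
            if not (lb < obj if strict_lb else lb <= obj):
                return False
        if ub is not ...:
            if not (obj < ub if strict_ub else obj <= ub):
                return False
    except TypeError:
        # The comparison does not make sense for this object (assumption:
        # this counts as a failed match).
        return False
    return True


inside = in_interval(5, 0, 10)
open_upper = in_interval(10**9, 0, ...)
strict_lower = in_interval(0, 0, ..., strict_lb=True)
strict_upper = in_interval(10, 0, 10, strict_ub=True)
incomparable = in_interval("a", 0, 10)
```

In this reading `gt(lb)` corresponds to a strict lower bound with an unchecked upper bound, and `le(ub)` to a non-strict upper bound with an unchecked lower bound.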
Conditional schemas
- `ifthen(if_schema, then_schema, else_schema=None)`. If the object matches the `if_schema` then it should also match the `then_schema`. If the object does not match the `if_schema` then it should match the `else_schema`, if present.
- `cond((if_schema1, then_schema1), ... , (if_schemaN, then_schemaN))`. An object is successively validated against `if_schema1`, `if_schema2`, ... until a validation succeeds. When this happens the object should match the corresponding `then_schema`. If no `if_schema` succeeds then the object is considered to have been validated. If one sets `if_schemaN` equal to `anything` then this serves as a catch all.
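The control flow of the two conditional schemas can be sketched with plain predicates (functions returning `True` or `False`) standing in for schemas. The names `ifthen_pred` and `cond_pred` and the sample predicates are hypothetical; this mirrors the description above rather than vtjson's implementation.

```python
def ifthen_pred(if_pred, then_pred, else_pred=None):
    """If if_pred holds, then_pred decides; otherwise else_pred, if given."""
    def check(obj):
        if if_pred(obj):
            return then_pred(obj)
        return True if else_pred is None else else_pred(obj)
    return check


def cond_pred(*pairs):
    """The first if_pred that holds selects the then_pred that decides."""
    def check(obj):
        for if_pred, then_pred in pairs:
            if if_pred(obj):
                return then_pred(obj)
        return True  # no if_pred matched: considered validated
    return check


def is_int(obj):
    return isinstance(obj, int)


def is_str(obj):
    return isinstance(obj, str)


def non_negative(obj):
    return obj >= 0


def non_empty(obj):
    return len(obj) > 0


int_rule = ifthen_pred(is_int, non_negative, is_str)
multi_rule = cond_pred((is_int, non_negative), (is_str, non_empty))
```

Here `int_rule` rejects `1.5` because it is neither an int nor a string, while `multi_rule` accepts it because no condition applies.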
Pre-compiling a schema
An object matches the schema `compile(schema)` if it matches `schema`. `vtjson` compiles a schema before using it for validation, so pre-compiling is not necessary. However, for large schemas it may improve performance, since the compilation then needs to be done only once. Compiling is an idempotent operation: it does nothing for an already compiled schema.
The full signature of `compile()` is
compile(schema)
Schema format
A schema can be, in order of precedence:
- An instance of the class `compiled_schema`.
  The class `compiled_schema` defines a single method with signature `__validate__(self, object, name, strict, subs)`.
  The parameters of `__validate__()` have the same semantics as those of `validate()`. The return value of `__validate__()` should be the empty string if validation succeeds, and otherwise it should be an explanation about what went wrong.
- A subclass of `compiled_schema` with a no-argument constructor.
- An object having a `__validate__` attribute with signature `__validate__(object, name, strict, subs)` as above.
- An object having a `__compile__` attribute with signature `__compile__(_deferred_compiles=None)`.
  This is an advanced feature which is used for the implementation of wrapper schemas. The function `compile`, which was discussed above, internally invokes `_compile(schema, _deferred_compiles=None)` where the optional argument `_deferred_compiles` is an opaque data structure used for handling recursive schemas. If appropriate, the function `_compile` internally invokes the method `schema.__compile__` and this should produce an instance of the class `compiled_schema`. The method `__compile__` may invoke the function `_compile` again. If this happens then the optional argument `_deferred_compiles` should be passed unmodified. Please consult the source code of `vtjson` for more details.
- A Python type hint such as `list[str]`. This is discussed further below.
- A Python type. In that case validation is done by checking membership. By convention the schema `float` matches both ints and floats. Similarly the schema `complex` matches ints and floats besides of course complex numbers.
- A callable. Validation is done by applying the callable to the object. If applying the callable throws an exception then the corresponding message will be part of the non-validation message.
- A `list` or a `tuple`. Validation is done by first checking membership of the corresponding types, and then performing validation for each of the entries of the object being validated against the corresponding entries of the schema.
- A dictionary. Validation is done by first checking membership of the `dict` type, and then performing validation for each of the values of the object being validated against the corresponding values of the schema. Keys are themselves considered as schemas. E.g. `{str: str}` represents a dictionary whose keys and values are both strings. A more elaborate discussion of validation of dictionaries is given below.
- A `set`. A set validates an object if the object is a set and the elements of the object are validated by an element of the schema.
- An arbitrary Python object. Validation is done by checking equality of the schema and the object, except when the schema is a `float`, in which case `math.isclose` is used. Below we call such an object a const schema.
Validating dictionaries
For a dictionary schema containing only const keys (i.e. keys corresponding to a const schema) the interpretation is obvious (see the introductory example above). Below we discuss the validation of an object against a dictionary schema in the general case.
- First we verify that the object is also a dictionary. If not then validation fails.
- We verify that all non-optional const keys of the schema are also keys of the object. If this is not the case then validation fails.
- Now we make a list of all the keys of the schema (both optional and non-optional). The result will be called the key list below.
- The object will pass validation if all its keys pass validation. We next discuss how to validate a particular key of the object.
  - If none of the entries of the key list validate the given key and `strict==True` (the default) then the key fails validation. If on the other hand `strict==False` then the key passes.
  - Assuming the fate of the given key hasn't been decided yet, we now match it against all entries of the key list. If it matches an entry and the corresponding value also validates then the key is validated. Otherwise we keep going through the key list.
  - If the entire key list is consumed then the key fails validation.
A consequence of this algorithm is that non-const keys are automatically optional. So applying the wrapper `optional_key` to them is meaningless and has no effect.
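The steps above can be sketched in plain Python. This is a simplified model, not vtjson's code: the stand-in `matches` only understands types (checked with `isinstance`) and constants (checked with `==`), optional keys are recognized only through the `?` suffix, and the names `matches` and `validate_dict` are hypothetical.

```python
def matches(schema, value):
    """Stand-in for schema matching: types via isinstance, the rest via ==."""
    if isinstance(schema, type):
        return isinstance(value, schema)
    return schema == value


def validate_dict(schema, obj, strict=True):
    # The object must itself be a dictionary.
    if not isinstance(obj, dict):
        return False

    # Check the non-optional const keys and build the key list.
    key_list = []
    for key, value_schema in schema.items():
        optional = isinstance(key, str) and key.endswith("?")
        if optional:
            key = key[:-1]
        is_const = not isinstance(key, type)
        if is_const and not optional and key not in obj:
            return False
        key_list.append((key, value_schema))

    # Every key of the object must pass validation.
    for obj_key, obj_value in obj.items():
        candidates = [vs for ks, vs in key_list if matches(ks, obj_key)]
        if not candidates:
            # No entry of the key list validates this key.
            if strict:
                return False
            continue
        # Some matching entry must also validate the value.
        if not any(matches(vs, obj_value) for vs in candidates):
            return False
    return True


schema = {"id": int, "note?": str, str: float}
```

With this schema, `{"id": 1, "x": 2.5}` passes because `"x"` is covered by the non-const key `str`, whereas `{"x": 2.5}` fails because the required const key `"id"` is missing.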
Type hints integration
Type hints as schemas
`vtjson` recognizes the following type hints as schemas.
Annotated, dict[...], Dict[...], list[...], List[...], tuple[...], Tuple[...], Protocol, Literal, NewType, TypedDict, Union (or the equivalent operator |).
For example `dict[str, str]` is translated internally into the schema `{str: str}`. See below for more information.
Annotated
- More general vtjson schemas can work alongside Python type hints by using the `typing.Annotated` construct. The most naive way to do this is via
  `Annotated[type_hint, vtjson_schema, skip_first]`
  For example
  `Annotated[list[object], [int, str, float], skip_first]`
  A type checker such as `mypy` will only see the type hint (`list[object]` in the example), whereas vtjson will only see the vtjson schema (`[int, str, float]` in the example). `skip_first` is a built-in shorthand for `Apply(skip_first=True)` (see below) which directs vtjson to ignore the first argument of an `Annotated` schema.
- In some use cases a vtjson schema will meaningfully refine a Python type or type hint. In that case one should not use `skip_first`. For example:
  `Annotated[datetime, fields({"tzinfo": timezone.utc})]`
  defines a `datetime` object whose time zone is `utc`.
  The built-in schemas already check that an object has the correct type. So for those one should use `skip_first`. For example:
  `Annotated[int, div(2), skip_first]`
  matches even integers.
- If one wants to pre-compile a schema and still use it as a type hint (assuming it is valid as such) then one can do:
  schema = <schema definition>
  Schema = Annotated[schema, compile(schema), skip_first]
Supported type hints
Note that Python imposes strong restrictions on what constitutes a valid type hint but `vtjson` is much more lax about this. Enforcing the restrictions is left to the type checkers or the Python interpreter.
- `TypedDict`. A TypedDict type hint is translated into a `dict` schema. E.g.
  class Movie(TypedDict):
      title: str
      price: float
  internally becomes `{"title": str, "price": float}`. `vtjson` supports the `total` option to `TypedDict` as well as the `Required` and `NotRequired` annotations of fields, if they are compatible with the Python version being used.
- `Protocol`. A class implementing a protocol is translated into a `fields` schema. E.g.
  class Movie(Protocol):
      title: str
      price: float
  internally becomes `fields({"title": str, "price": float})`.
- `Annotated` has already been discussed. It is translated into a suitable `intersect` schema. The handling of `Annotated` schemas can be influenced by `Apply` objects (see below).
- `NewType` is translated into a `set_name` schema. E.g. `NewType('Movie', str)` becomes `set_name(str, 'Movie')`.
- `dict[...]` and `Dict[...]` are translated into the equivalent `dict` schemas. E.g. `dict[str, str]` becomes `{str: str}`.
- `tuple[...]` and `Tuple[...]` are translated into the equivalent `tuple` schemas.
- `list[...]` and `List[...]` are translated into the equivalent `list` schemas.
- `Union` and the `|` operator are translated into `union`.
- `Literal` is also translated into `union`.
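The information needed to turn a `TypedDict` into a dictionary schema is available at run time through the standard `typing` module. The sketch below only shows that introspection (Python 3.9 or later); how vtjson performs the translation internally is not shown here, and the class `Draft` is made up for illustration.

```python
from typing import TypedDict, get_type_hints


class Movie(TypedDict):
    title: str
    price: float


class Draft(TypedDict, total=False):
    title: str


# The field annotations have the same shape as the schema
# {"title": str, "price": float} mentioned above.
movie_fields = get_type_hints(Movie)

# total=True (the default) makes every key required;
# total=False makes every key optional.
required_keys = Movie.__required_keys__
optional_keys = Draft.__optional_keys__
```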
Apply objects
- If the list of arguments of an `Annotated` schema includes `Apply` objects then those modify the treatment of the arguments that come before them. We already encountered `skip_first` which is a built-in alias for `Apply(skip_first=True)`. The full signature of `Apply` is
  `Apply(skip_first=False, name=None, labels=None)`
  The optional `name` argument indicates that the corresponding `set_name` command should be applied to the previous arguments. The optional `labels` argument (a list if present) indicates that the corresponding `set_label` command should be applied to the previous arguments.
- Multiple `Apply` objects are allowed. E.g. the following contrived schema
  `Annotated[int, str, skip_first, float, skip_first]`
  is equivalent to `float`.
Safe cast
`vtjson` includes the command
safe_cast(schema, object)
(where `schema` should be a valid type hint) that functions exactly like `cast` except that it also verifies at run time that the given object matches the given schema.
Creating types
A cool feature of `vtjson` is that one can transform a schema into a genuine Python type via
t = make_type(schema)
so that validation can be done via
isinstance(object, t)
The drawback, compared to using `validate` directly, is that there is no feedback when validation fails. You can get it back as a console debug message via the optional `debug` argument to `make_type`.
The full signature of `make_type` is
make_type(schema, name=None, strict=True, debug=False, subs={})
The optional `name` argument is used to set the `__name__` attribute of the type. If it is not supplied then `vtjson` tries to make an educated guess.
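For readers wondering how a schema can back an `isinstance` check at all, Python lets a metaclass override `__instancecheck__`. The sketch below shows that general mechanism in plain Python with a predicate in place of a schema; whether vtjson builds its types this way is an assumption, and `make_checked_type` is a hypothetical name.

```python
def make_checked_type(predicate, name):
    """Build a type whose isinstance() check is delegated to predicate."""

    class CheckedMeta(type):
        def __instancecheck__(cls, obj):
            return bool(predicate(obj))

    # The type's __name__ is set from the name argument.
    return CheckedMeta(name, (), {})


def is_even_int(obj):
    return isinstance(obj, int) and obj % 2 == 0


Even = make_checked_type(is_even_int, "Even")

four_is_even = isinstance(4, Even)
three_is_even = isinstance(3, Even)
type_name = Even.__name__
```

As with `make_type`, the check yields only `True` or `False`, which is why the explanation of a failure is lost.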
Examples
>>> from vtjson import set_name, union, validate
>>> schema = {"fruit" : union("apple", "pear", "strawberry"), "price" : float}
>>> object = {"fruit" : "dog", "price": 1.0 }
>>> validate(schema, object)
...
vtjson.ValidationError: object['fruit'] (value:'dog') is not equal to 'pear' and object['fruit'] (value:'dog') is not equal to 'strawberry' and object['fruit'] (value:'dog') is not equal to 'apple'
>>> fruit = set_name(union("apple", "pear", "strawberry"), "fruit")
>>> schema = {"fruit" : fruit, "price" : float}
>>> validate(schema, object)
...
vtjson.ValidationError: object['fruit'] (value:'dog') is not of type 'fruit'
>>> object = {"fruit" : "apple"}
>>> validate(schema, object)
...
vtjson.ValidationError: object['price'] is missing
A good source of more advanced examples is the file `schemas.py` in the source distribution of Fishtest. Another source of examples is the file `test_validate.py` in the source distribution of `vtjson`.
FAQ
Q: Why not just use the Python implementation of JSON schema (see https://pypi.org/project/jsonschema/)?
A: Various reasons.
- A `vtjson` schema is much more concise than a JSON schema!
- `vtjson` can validate objects which are more general than strictly JSON. See the introductory example above.
- More fundamentally, the design philosophy of `vtjson` is different. A JSON schema is language independent and fully declarative. These are very nice properties but, this being said, declarative languages have a tendency to suffer from feature creep as they try to deal with more and more exotic use cases (e.g. CSS). A `vtjson` schema on the other hand leverages the versatility of the Python language. It is generally declarative, with a limited but easily extendable set of primitives. But if more functionality is needed then it can be extended by using appropriate bits of Python code (as the `ordered_pair` example below illustrates). In practice this is what you will need in any case since a purely declarative language will never be able to deal with every possible validation scenario.
Q: Why yet another Python validation framework?
A: Good question! Initially `vtjson` consisted of home-grown code for validating API calls and database accesses in the Fishtest framework. However, the clear and concise schema format seemed to be of independent interest and so the code was refactored into the current self-contained package.
Q: Why are there no variables in `vtjson` (see https://opis.io/json-schema/2.x/variables.html)?
A: They did not seem to be essential yet. In our use cases conditional schemas were sufficient to achieve the required functionality. See for example the `action_schema` in `schemas.py`. More importantly `vtjson` has a strict separation between the definition of a schema and its subsequent use for validation. By allowing a schema to refer directly to the object being validated this separation would become blurred. This being said, I am still thinking about a good way to introduce variables.
Q: Does `vtjson` support recursive schemas?
A: Yes. But it requires a bit of Python gymnastics to create them. Here is an example
person={}
person["mother"]=union(person, None)
person["father"]=union(person, None)
which matches e.g.
{"father": {"father": None, "mother": None}, "mother": {"father": None, "mother": None}}
Note that you can create an infinite recursion by validating a recursive object against a recursive schema.
Q: How to combine validations?
A: Use `intersect` (or `Annotated` if applicable). For example the following schema validates non-negative integers but rejects non-negative floats.
schema = intersect(int, interval(0, ...))
More generally one may use the pattern `intersect(schema, more_validations)` where the first argument makes sure that the object to be validated has the required layout to be an acceptable input for the later arguments. For example an ordered pair of integers can be validated using the schema
def ordered_pair(o):
    return o[0] <= o[1]
schema = intersect((int, int), ordered_pair)
Or in a one-liner
schema = intersect((int, int), set_name(lambda o: o[0] <= o[1], "ordered_pair"))
The following also works if you are content with less nice output on validation failure (try it)
schema = intersect((int, int), lambda o: o[0] <= o[1])