Compact notation for JSON Schemas
Project description
JsonSchema Compact Notation
JSON Schema is very useful to document and validate inputs and outputs of JSON-based REST APIs. If you check everything that goes into your program (requests, configuration files…) and everything that goes out of it (responses), you'll catch a lot of errors and regressions early. Moreover, the schema also acts has a documentation for develpers and users alike, and a documentation that has to stay up-to-date: otherwise you get exceptions thrown around!
Unfortunately, IMO JSON Schema is neither as concise nor as human-readable as it should. Not as bad as XML schemas, but not great either. As a demonstration, here's a schema for a tiny subset of GeoJSON:
{ "type": "object",
"required": ["type", "geometry"],
"properties": {
"type": {"const": "Feature"},
"geometry": {
"anyOf": [
{"$ref": "#/definitions/point"},
{"$ref": "#/definitions/lineString"}]}},
"definitions": {
"coord": {
"type": "array",
"items": {"type": "number"},
"minItems": 2, "maxItems": 2},
"point": {
"type": "object",
"required": ["type", "coordinates"],
"properties": {
"type": {"const": "Point"},
"coordinates": {"$ref": "#/definitions/coord"}}},
"lineString": {
"type": "object",
"required": ["type", "coordinates"],
"properties": {
"type": {"const": "LineString"},
"coordinates": {
"type": "array",
"items": {"$ref": "#/definitions/coord"}}}}}}
Enter JSCN, a.k.a. "JSON Schema Compact Notation", which expresses the exact same schema as follows:
{
type: "Feature",
geometry: <point> | <lineString>
}
where coord = [number*]{2}
and point = {type: "Point", coordinates: <coord>}
and lineString = {type: "LineString", coordinates: [<coord>*]}
Not only is it much shorter; hopefully it reads as well as an informal documentation written for fellow developers.
JSCN allows to express most of JSON Schema, in a language which isn't a subset of JSON, but is much more compact and human-readable. This Python library implements a parser which translates JSCN into JSON Schema, and encapsulates the result in a Python object allowing actual validation through the JSON Schema library.
Below is an informal description of the grammar. Fluency with JSON is expected; familiarity with JSON Schema is probably not mandatory, but won't hurt.
Informal grammar
simple types
Litteral JSON types are accessible as keywords: boolean
, string
,
integer
, number
, null
.
Regular expression strings are represented by r
-prefixed litteral
strings, similar to Python's litterals: r"^[0-9]+$"
converts into
{"type": "string", "pattern": "^[0-9]+$"}
and matches strings which
conform to the regular expression. Beware that for JSON Schema, partial
matches are OK, for instance r"[0-9]+"
does match "foo123bar"
. Add
"^…$"
markers to match the whole string.
Predefined string formats are represented by f
-prefixed litteral
strings: f"uri"
converts into {"type": "string", "format": "uri"}
. The list of currently predefined formats can be found for
instance at
[https://json-schema.org/understanding-json-schema/reference/string.html#format].
JSON constants are introduced between back-quotes: `123`
converts to {"const": 123}
and will only match the number 123. If
several constants are joined with an |
operator, they are translated
into a JSON Schema enum: `1`|`2`
converts to {"enum": [1, 2]}
. For litteral constants (strings, numbers, booleans, null),
backquotes aren't mandatory, i.e. `"foo"`
and "foo"
are
equivalent.
Arrays
Arrays are described between square brackets:
[]
matches every possible array, and can also be writtenarray
.- An homogeneous, non-empty array of integers is denoted
[integer+]
- An homogeneous, possibly empty array of integers is denoted
[integer*]
- An array starting with two booleans is denoted
[boolean, boolean]
. It can also contain additional items after those two booleans, and those items don't have to be booleans. - In order to forbid additional items, add an
only
keyword at the beginning of the array:[only boolean, boolean]
will reject[true, false, 1]
, whereas[boolean, boolean]
would have accepted it. - Arrays support cardinal suffix between braces:
[]{7}
is an array with exactly 7 elements,[integer*]{3,8}
is an array with between 3 and 8 integers (inclusive),[]{_, 9}
an array with at most 9 elements,[string*]{4, _}
an array with at least 4 strings. - A uniqueness constraint can be added with the
unique
prefix, as in[unique integer+]
, which accepts[1, 2, 3]
but not[1, 2, 1]
since1
occurs more than once.
Strings and integers also support cardinal suffixes, e.g. string{16}
(a string of 16 characters), integer{_, 0xFFFF}
(an integer between
0 and 65533). Ranges and sizes are inclusive.
Objects
Objects are described between curly braces:
{ }
matches every possible object, and can also be writtenobject
.{"bar": integer}
is an object with one field"bar"
of type integer, and possibly other fields.- Quotes are optional around property names, if they are are valid
JavaScript identifiers other than
"_"
or"only"
: it's legal to write{bar: integer}
. - To prevent non-listed property names from being accepted, use a
prefix
only
just after the opening brace, as in{only "bar": integer}
. - Property names can be forced to comply with a regex, by an
only r"regex"
prefix, which can also be a reference to a definition:{only r"^[a-z]+$"}
, or the equivalent{only <word>} where word=r"^[a-z]$"+
. Beware that according to JSON Schema, even explicitly listed property names must comply with the regex. For instance, nothing can satisfy the schema{only r"^[0-9]+$", "except_this": _}
, becauser"^[0-9]+$"
doesn't match"except_this"
. To circumvent this limitation, you need to widen the regex with an"|"
, e.g.{only r"^([0-9]+|except_this)$"}
, or{only <key>} where key = `"except_this"` | r"^[0-9]+$"
. - In addition to enforcing a regex on property names, one can also
enforce a type constraint on the associated values:
{only <word>: integer}
. If you want a constraint on the type but not on the name, the name can be replaced by an underscore wildcard:{only _: integer}
. - A special type
forbidden
, equivalent to JSON Schema'sfalse
, can be used to specifically forbid a property name:{reserved_name?: forbidden}
. Notice that the question mark is mandatory: otherwise, it would both expect the property to exist, and accept no value in it.
Definitions
Definitions can be used in the schema, and given with a suffix where name0 = def0 and ... and nameX=defX
. References to definitions are
put between angles, for instance {author: <user_name>} where user_name = r"^\w+$"
. When dumping the schema into actual JSON Schema,
unused definitions are pruned, and missing definitions cause an error.
Definitions can only occur at top-level, i.e.
{foo: <bar>} where bar=number
is legal, but
{foo: (<bar> where bar=number)}
is not.
Operations between types
Types can be combined:
- With infix operator
&
:A & B
is the type of objects which respect both schemasA
andB
. - With infix operator
|
:A | B
is the type of objects which respect at least one of the schemasA
orB
.&
takes precedence over|
, i.e.A & B | C & D
is to be read as(A&B) | (C&D)
. - With conditional expressions:
if A then B elif C then D else E
will enforce constraintB
if constraintA
is met, enforceD
ifC
is met, or enforceE
if neitherA
norC
are met.elif
andelse
parts are optional. For instance,if {country: "USA"} then {postcode: r"\d{5}(-\d{4})?"} else {postcode: string}
will only check the postcode with the regex if the country is"USA"
. - Parentheses can be added to enforce precedences , e.g.
A & (B|C) & D
- There is also an
not
operator:{foo: not boolean}
.
From Python
Combinations can also be performed on Python objects, e.g. the
following Python expression is OK: Schema("{foo: number}") | Schema("{bar: number}")
, and produces a schema equivalent to
Schema("{foo: number}|{bar: number}")
. When definitions are merged
in Python with |
or &
, their definitions are merged as needed. If
a definition appears on both sides, it must be equal, i.e. one can
merge {foo: <n>} where n=number
with {bar: <n>} where n=number
but
not with {foo: <n>} where n=integer
.
More formally
schema ::= type («where» definitions)?
definitions ::= identifier «=» type («and» identifier «=» type)*
type ::= type «&» type # allOf those types; takes precedence over «|».
| type «|» type # anyOf those types.
| «(» type «)» # parentheses to enforce precedence.
| «not» type # anything but this type.
| «`»json_litteral«`» # just this JSON constant value.
| «<»identifier«>» # identifier refering to the matching top-level definition.
| r"regular_expression" # String matched by this regex.
| f"format" # JSON Schema draft7 string format.
| «string» cardinal? # a string, with this cardinal constraint on number of chars.
| «integer» cardinal? # an integer within the range described by cardinal.
| «integer» «/» int # an integer which must be multiple of that int.
| «object» # any object.
| «array» # any array.
| «boolean» # any boolean.
| «null» # the null value.
| «number» # any number.
| «forbidden» # empty type (used mostly to disallow a property name).
| object # structurally described object.
| array # structurally described array.
| conditional # conditional if/then/else rule
cardinal ::= «{» int «}» # Exactly that number of chars / items / properties.
| «{» «_», int «}» # At most that number of chars / items / properties.
| «{» int, «_» «}» # At least that number of chars / items / properties.
| «{» int, int «}» # A number of chars / items / properties within this range.
object ::= «{» object_restriction? (object_key «?»? «:» type «,»...)* «}» cardinal?
# if «only» occurs without a regex, no extra property is allowed.
# if «only» occurs with a regex, all extra property names must match that regex.
# if «?» occurs, the preceding property is optional, otherwise it's required.
object_restriction ::= ø
# Only explicitly listed property names are accepted:
| «only»
# every property name must conform to regex/reference:
| «only» (r"regex" | «<»identifier«>»)
# non-listed property names must conform to regex, values to type:
| «only» (r"regex" | «<»identifier«>» | «_»)«:» type
object_key ::= identifier # Litteral property name.
| «"»string«"» # Properties which aren't identifiers must be quoted.
array ::= «[» «only»? «unique»? (type «,»)* («*»|«+»|ø) «]» cardinal?
# if «only» occurs, no extra item is allowed.
# if «unique» occurs, each array item must be different from every other.
# if «*» occurs, the last type can be repeated from 0 to any times.
# Every extra item must be of that type.
# if «+» occurs, the last type can be repeated from 1 to any times.
# Every extra item must be of that type.
conditional ::= «if» type «then» type («elif» type «then» type)* («else» type)?
int ::= /0x[0-9a-fA-F]+/ | /[0-9]+/
identifier ::= /[A-Za-z_][A-Za-z_0-9]*/
TODO
Some things that may be added in future versions:
- on numbers:
- ranges over floats (reusing cardinal grammar with float boundaries)
- modulus constraints on floats
number/0.25
. - exclusive ranges in addition to inclusive ones. May use returned
braces, e.g.
integer{0,0x100{
as an equivalent forinteger{0,0xFF}
? - ranges alone are treated as integer ranges, i.e.
{1, 5}
is a shortcut forinteger{1, 5}
? Not sure whether it enhances readability, and there would be a need for float support in ranges then.
- combine string constraints: regex, format, cardinals... This can
already be achieved with operator
&
. - try to embedded
#
-comments as"$comment"
- Implementation:
- bubble up
?
markers in grammar to the top level.
- bubble up
- Syntax sugar:
- optional marker:
foobar?
is equivalent tofoobar|null
. Not sure whether it's worth it, the difference between a missing field and a field holdingnull
is most commonly not significant. - check that references as
propertyNames
indeed point at string types. - make keyword case-insensitive?
- treat
{foo: forbidden}
as{foo?: forbidden}
as it's the only thing that would make sense?
- optional marker:
- better error messages, on incorrect grammars, and on non-validating JSON data.
- reciprocal feature: try and translate a JSON Schema into a shorter and more readable JSCN source.
Usage
From command line
$ echo -n '[integer*]' | jscn -
{ "type": "array",
"items": {"type": "integer"},
"$schema": "http://json-schema.org/draft-07/schema#"
}
$ jscn --help
usage: jscn [-h] [-o OUTPUT] [-v] [--version] [filename]
Convert from a compact DSL into full JSON Schema.
positional arguments:
filename Input file; use '-' to read from stdin.
optional arguments:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
Output file; defaults to stdout
-v, --verbose Verbose output
--version Display version and exit
From Python API
Python's jsonschema_cn
package exports two main constructors:
Schema()
, which compiles a source string into a schema object;Definitions()
, which compiles a source string (a sequence of definitions separated by keywordand
, as in ruledefinitions
of the formal grammar.
Schema objects have a jsonschema
property, which contains the Python
dict of the corresponding JSON Schema.
Schemas can be combined with Python operators &
("allOf"
) and |
("anyOf"
). When they have definitions, those definition sets are
merged, and definition names must not overlap.
Schemas can also be combined with definitions through |
, and
definitions can be combined together also with |
.
>>> from jsonschema import Schema, Definitions
>>> defs = Definitions("""
>>> id = r"[a-z]+" and
>>> byte = integer{0,0xff}
>>> """)
>>> s = Schema("{only <id>: <byte>}") | defs
>>> s.jsonschema
ValueError: Missing definition for byte
>>> s = s | defs
>>> s.jsonschema
{"$schema": "http://json-schema.org/draft-07/schema#"
"type": "object",
"propertyNames": {"$ref": "#/definitions/id"},
"additionalProperties": {"$ref": "#/definitions/byte"},
"definitions": {
"id": {"type": "string", "pattern": "[a-z]+"},
"byte": {"type": "integer", "minimum": 0, "maximum": 255}
}
}
>>> Schema("[integer, boolean+]{4}").jsonschema
{ "$schema": "http://json-schema.org/draft-07/schema#",
"type": "array",
"minItems": 4, "maxItems": 4,
"items": [{"type": "integer"}],
"additionalItems": {"type": "boolean"},
}
See also
If you spend a lot of time dealing with complex JSON data structures, you might also want to try jsview, a smarter JSON formatter, which tries to effectively use both your screen's width and height, by only inserting q carriage returns when it makes sense:
$ cat >schema.cn <<EOF
{ only codes: [<byte>+], id: r"[a-z]+", issued: f"date"}
where byte = integer{0, 0xFF}
EOF
$ jscn schema.cn
{"type": "object", "required": ["codes", "id", "issued"], "properties": {
"codes": {"type": "array", "items": [{"$ref": "#/definitions/byte"}], "ad
ditionalItems": {"$ref": "#/definitions/byte"}}, "id": {"type": "string",
"pattern": "[a-z]+"}, "issued": {"type": "string", "format": "date"}}, "a
dditionalProperties": false, "definitions": {"byte": {"type": "integer",
"minimum": 0, "maximum": 255}}, "$schema": "http://json-schema.org/draft-
07/schema#"}
$ cat schema.cn | jscn - | jsview -
{
"type": "object",
"required": ["codes", "id", "issued"],
"properties": {
"codes": {
"type": "array",
"items": [{"$ref": "#/definitions/byte"}],
"additionalItems": {"$ref": "#/definitions/byte"}
},
"id": {"type": "string", "pattern": "[a-z]+"},
"issued": {"type": "string", "format": "date"}
},
"additionalProperties": false,
"definitions": {"byte": {"type": "integer", "minimum": 0, "maximum": 255}},
"$schema": "http://json-schema.org/draft-07/schema#"
}
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file jsonschema_cn-0.28.tar.gz
.
File metadata
- Download URL: jsonschema_cn-0.28.tar.gz
- Upload date:
- Size: 29.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.8
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 52541e3427196f01fc8933abfdf1d03d7591b04cfa1445cf2720efd84b7a0bbe |
|
MD5 | c5c1629d185894345fb1b0f63bd6b8dc |
|
BLAKE2b-256 | fec45096b92da3b5fc541901845876c2b589b84dc94e5f25443aacae1e2c8858 |