Compact notation for JSON Schemas

JsonSchema Compact Notation

JSON Schema is very useful for documenting and validating the inputs and outputs of JSON-based REST APIs. Unfortunately, schemas are more verbose and less human-readable than one may wish. This library defines a more compact syntax to describe JSON schemas, as well as a parser to convert such specifications into actual JSON schemas.

At some point in the future, this library may also offer the way back, from JSON schemas to the compact notation.

Informal grammar

Basic JSON types are available as keywords: boolean, string, integer, number, null.

Regular expression strings are represented by r-prefixed literal strings, similar to Python's raw string literals: r"^[0-9]+$" converts into {"type": "string", "pattern": "^[0-9]+$"}.

Predefined formats are represented by f-prefixed literal strings: f"uri" converts into {"type": "string", "format": "uri"}.

JSON constants are written between backquotes: `123` converts to {"const": 123}. If several constants are joined with the | operator, they are translated into an enum: `1`|`2` converts to {"enum": [1, 2]}.
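The leaf translations above are mechanical. The sketch below reproduces them in plain Python for already-tokenized input; leaf_schema and constants_schema are hypothetical helpers written for illustration, not part of jsonschema_cn's API, and they do not handle escapes inside the quotes.

```python
import json
import re


# Hypothetical helper: translate a r"..." (regex) or f"..." (format) token.
# Escapes inside the quotes are not processed in this sketch.
def leaf_schema(token: str) -> dict:
    m = re.fullmatch(r'([rf])"(.*)"', token)
    if m is None:
        raise ValueError(f"not a regex or format literal: {token}")
    prefix, body = m.groups()
    if prefix == "r":
        return {"type": "string", "pattern": body}
    return {"type": "string", "format": body}


# Hypothetical helper: back-quoted JSON constants joined by |.
# A single constant gives "const"; several give "enum".
def constants_schema(sources: list) -> dict:
    values = [json.loads(s.strip("`")) for s in sources]
    if len(values) == 1:
        return {"const": values[0]}
    return {"enum": values}


print(leaf_schema('r"^[0-9]+$"'))        # {'type': 'string', 'pattern': '^[0-9]+$'}
print(leaf_schema('f"uri"'))             # {'type': 'string', 'format': 'uri'}
print(constants_schema(["`123`"]))       # {'const': 123}
print(constants_schema(["`1`", "`2`"]))  # {'enum': [1, 2]}
```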

Arrays are described between square brackets:

  • [] describes every possible array, and can also be written array.
  • a homogeneous, non-empty array of integers is denoted [integer+]
  • an homogeneous, possibly empty array of integers is denoted [integer*]
  • an array starting with two booleans is denoted [boolean, boolean]. It can also contain additional items after those two booleans.
  • To forbid additional items, add an only keyword at the beginning of the array: [only boolean, boolean] will reject [true, false, 1], whereas [boolean, boolean] would have validated it.
  • arrays support a cardinal suffix between braces: []{7} is an array of 7 elements, [integer*]{3,8} is an array of between 3 and 8 integers (inclusive), []{_, 9} an array of at most 9 elements, [string*]{4, _} an array of at least 4 strings.
  • a uniqueness constraint can be added with the unique prefix, as in [unique integer+], which will allow [1, 2, 3] but not [1, 2, 1] since 1 occurs more than once.
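To make the array rules concrete, here is a pure-Python sketch of the validation semantics they describe. It checks values directly rather than emitting a JSON schema, and it follows the prose literally (so [integer+] rejects an empty array); it is not derived from jscn's code. check_array and its parameters are hypothetical, and item types are modeled as Python predicates.

```python
# Hypothetical sketch of the array semantics described above.
#   prefix: predicates for the listed item types, e.g. [is_bool, is_bool]
#   repeat: None, "*" or "+"; applies to the last predicate of prefix
#   only:   reject items beyond the listed ones
#   unique: reject arrays containing duplicated items
#   card:   inclusive (min, max) item count; None means unbounded
def check_array(value, prefix, repeat=None, only=False, unique=False,
                card=(None, None)):
    if not isinstance(value, list):
        return False
    if repeat is not None:
        # The last type is repeatable: the fixed part is everything before it.
        fixed, last = prefix[:-1], prefix[-1]
        minimum = len(fixed) + (1 if repeat == "+" else 0)
    else:
        fixed, last = prefix, None
        minimum = len(fixed)
    if len(value) < minimum:
        return False
    if not all(ok(item) for ok, item in zip(fixed, value)):
        return False
    extra = value[len(fixed):]
    if last is not None:
        if not all(last(item) for item in extra):
            return False  # every extra item must be of the repeated type
    elif only and extra:
        return False      # "only" forbids additional items
    if unique and any(value[i] == value[j]
                      for i in range(len(value)) for j in range(i)):
        return False
    lo, hi = card
    if lo is not None and len(value) < lo:
        return False
    if hi is not None and len(value) > hi:
        return False
    return True


def is_bool(v):
    return isinstance(v, bool)


def is_int(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool)


print(check_array([True, False, 1], [is_bool, is_bool]))             # True
print(check_array([True, False, 1], [is_bool, is_bool], only=True))  # False
print(check_array([1, 2, 1], [is_int], repeat="+", unique=True))     # False
```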

Strings and integers also support cardinal suffixes, e.g. string{16}, integer{_, 0xFFFF}. Integer ranges as well as sizes are inclusive.
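Because both ranges and sizes are inclusive, a cardinal suffix maps directly onto JSON Schema's inclusive keywords. The sketch below assumes the pairing minLength/maxLength for strings, minimum/maximum for integers, minItems/maxItems for arrays and minProperties/maxProperties for objects; cardinal_keywords is a hypothetical helper, and None stands for the _ wildcard.

```python
# Hypothetical mapping from the kind of value to its inclusive bound keywords.
KEYWORDS = {
    "string": ("minLength", "maxLength"),
    "integer": ("minimum", "maximum"),
    "array": ("minItems", "maxItems"),
    "object": ("minProperties", "maxProperties"),
}


def cardinal_keywords(kind: str, lo, hi) -> dict:
    """lo/hi are inclusive bounds; None means that side is unbounded."""
    low_key, high_key = KEYWORDS[kind]
    result = {}
    if lo is not None:
        result[low_key] = lo
    if hi is not None:
        result[high_key] = hi
    return result


print(cardinal_keywords("string", 16, 16))         # {'minLength': 16, 'maxLength': 16}
print(cardinal_keywords("integer", None, 0xFFFF))  # {'maximum': 65535}
print(cardinal_keywords("array", 3, 8))            # {'minItems': 3, 'maxItems': 8}
```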

Objects are described between curly braces:

  • { } describes every possible object, and can also be written object.
  • {"bar": integer} is an object with one field "bar" of type integer, and possibly other fields.
  • Quotes are optional around property names, if they are identifiers other than "_" or "only": it's legal to write {bar: integer}.
  • To prevent non-listed property names from being accepted, use the only prefix, as in {only "bar": integer}.
  • property names can be forced to comply with a regex through an only r"regex" prefix; the regex can also be a reference to a definition: {only r"^[a-z]+$"}, or the equivalent {only <word>} where word = r"^[a-z]+$". Beware that, according to JSON Schema, even explicitly listed property names must comply with the regex: for instance, nothing can satisfy the schema {only r"^[0-9]+$", "except_this": _}. You can circumvent this limitation in several ways, e.g. {only r"^([0-9]+|except_this)$"}, or {only <key>} where key = `"except_this"` | r"^[0-9]+$".
  • In addition to enforcing a regex on property names, one can also enforce a type constraint on the associated values: {only <word>: integer}. If no naming constraint is desired, the name can be replaced by an underscore wildcard: {only _: integer}.
  • A special type forbidden, equivalent to JSONSchema's false, can be used to specifically forbid a property name: {reserved_name?: forbidden}. Notice that the question mark is mandatory: otherwise, it would both expect the property to exist, and accept no value in it.
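The object rules correspond to a handful of JSON Schema keywords (properties, required, additionalProperties, propertyNames), which are the ones visible in the sample outputs of this README. The following is a hypothetical sketch of that mapping, not jsonschema_cn's actual code; object_schema and its parameters are illustrative names.

```python
# Hypothetical sketch: map an object description onto JSON Schema keywords.
def object_schema(fields, only=False, names=None, extra=None):
    """fields: list of (name, optional, schema) triples.
    only:  forbid non-listed properties (when no names/extra constraint).
    names: schema every property name must satisfy, e.g. a regex or $ref.
    extra: schema for the values of non-listed properties."""
    result = {"type": "object"}
    if fields:
        result["properties"] = {name: schema for name, _, schema in fields}
        required = [name for name, optional, _ in fields if not optional]
        if required:
            result["required"] = required
    if names is not None:
        result["propertyNames"] = names
    if extra is not None:
        result["additionalProperties"] = extra
    elif only and names is None:
        result["additionalProperties"] = False
    return result


INT = {"type": "integer"}
WORD = {"type": "string", "pattern": "^[a-z]+$"}

closed = object_schema([("bar", False, INT)], only=True)     # {only "bar": integer}
reserved = object_schema([("reserved_name", True, False)])   # {reserved_name?: forbidden}
typed = object_schema([], only=True, names=WORD, extra=INT)  # {only <word>: integer}
```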

Definitions can be used in the schema; they are given with a suffix where name0 = def0 and ... and nameX = defX. References to definitions are put between angle brackets, for instance {author: <user_name>} where user_name = r"^\w+$". When dumping the schema into actual JSON Schema, unused definitions are pruned, and missing definitions cause an error. Definitions can only occur at top level, i.e. {foo: <bar>} where bar = number is legal, but {foo: (<bar> where bar = number)} is not.
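Pruning unused definitions and rejecting missing ones can be done with a reachability walk over "$ref" links. This is a minimal sketch assuming references take the form "#/definitions/<name>", as in the sample outputs below; refs_in and attach_definitions are hypothetical helpers, and the walk also follows references made from one definition to another.

```python
PREFIX = "#/definitions/"


def refs_in(node):
    """Yield every definition name referenced anywhere inside node."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith(PREFIX):
            yield ref[len(PREFIX):]
        for value in node.values():
            yield from refs_in(value)
    elif isinstance(node, list):
        for value in node:
            yield from refs_in(value)


def attach_definitions(schema, definitions):
    """Keep only reachable definitions; raise on a missing one."""
    used, todo = {}, list(refs_in(schema))
    while todo:
        name = todo.pop()
        if name in used:
            continue
        if name not in definitions:
            raise ValueError(f"Missing definition for {name}")
        used[name] = definitions[name]
        todo.extend(refs_in(definitions[name]))  # definitions may use others
    return {**schema, "definitions": used} if used else dict(schema)


schema = {"type": "object", "properties": {"author": {"$ref": PREFIX + "user_name"}}}
defs = {"user_name": {"type": "string", "pattern": r"^\w+$"}, "unused": {"type": "null"}}
print(sorted(attach_definitions(schema, defs)["definitions"]))  # ['user_name']
```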

Types can be combined:

  • With infix operator &: A & B is the type of values which satisfy both schemas A and B.
  • With infix operator |: A | B is the type of values which satisfy at least one of the schemas A or B. & takes precedence over |, i.e. A & B | C & D is to be read as (A & B) | (C & D).
  • With conditional expressions: if A then B elif C then D else E will enforce constraint B if constraint A is met, enforce D if C is met, or enforce E if neither A nor C is met. The elif and else parts are optional. For instance, if {country: `"USA"`} then {postcode: r"\d{5}(-\d{4})?"} else {postcode: string} will only check the postcode with the regex if the country is "USA".
  • Parentheses can be added to enforce precedence, e.g. A & (B | C) & D.
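These combinators have direct JSON Schema counterparts: allOf for &, anyOf for |, and if/then/else for conditionals. JSON Schema has no elif, so one natural encoding, assumed here rather than taken from jscn's source, nests each elif inside the previous else. all_of, any_of and conditional are hypothetical helpers.

```python
def all_of(*schemas):  # A & B
    return {"allOf": list(schemas)}


def any_of(*schemas):  # A | B
    return {"anyOf": list(schemas)}


def conditional(branches, otherwise=None):
    """branches: [(condition, consequence), ...] for if/elif; otherwise: else part.
    Each elif becomes a nested if/then/else placed in the "else" slot."""
    condition, consequence = branches[0]
    result = {"if": condition, "then": consequence}
    if len(branches) > 1:
        result["else"] = conditional(branches[1:], otherwise)
    elif otherwise is not None:
        result["else"] = otherwise
    return result


A, B, C, D, E = ({"const": n} for n in range(5))
precedence = any_of(all_of(A, B), all_of(C, D))  # A & B | C & D
chain = conditional([(A, B), (C, D)], E)         # if A then B elif C then D else E
```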

Combinations can also be performed on Python objects, e.g. the Python expression Schema("{foo: number}") | Schema("{bar: number}") is OK, and produces a schema equivalent to Schema("{foo: number} | {bar: number}"). When schemas with definitions are combined in Python with | or &, their definitions are merged as needed. If a definition appears on both sides, it must be the same on both, i.e. one can merge {foo: <n>} where n = number with {bar: <n>} where n = number, but not with {foo: <n>} where n = integer.
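One way to support | and & on Python objects while merging definitions is operator overloading with a conflict check. The class below is a hypothetical MiniSchema written for illustration; it is not jsonschema_cn's Schema class, and its attribute names (body, definitions) are assumptions.

```python
class MiniSchema:
    """Hypothetical stand-in for a schema object carrying its definitions."""

    def __init__(self, body, definitions=None):
        self.body = body
        self.definitions = dict(definitions or {})

    def _merge(self, other):
        merged = dict(self.definitions)
        for name, value in other.definitions.items():
            # A name defined on both sides must have the same definition.
            if name in merged and merged[name] != value:
                raise ValueError(f"Conflicting definitions for {name}")
            merged[name] = value
        return merged

    def __or__(self, other):
        return MiniSchema({"anyOf": [self.body, other.body]}, self._merge(other))

    def __and__(self, other):
        return MiniSchema({"allOf": [self.body, other.body]}, self._merge(other))


N = {"type": "number"}
REF = {"$ref": "#/definitions/n"}
left = MiniSchema({"properties": {"foo": REF}}, {"n": N})
right = MiniSchema({"properties": {"bar": REF}}, {"n": N})
bad = MiniSchema({"properties": {"foo": REF}}, {"n": {"type": "integer"}})
combined = left | right  # accepted: "n" is identical on both sides
# left | bad would raise ValueError: "n" differs between the two sides.
```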

More formally

schema ::= type («where» definitions)?

definitions ::= identifier «=» type («and» identifier «=» type)*

type ::= type «&» type          # allOf those types; takes precedence over «|».
       | type «|» type          # anyOf those types.
       | «(» type «)»           # parentheses to enforce precedence.
       | «not» type             # anything but this type.
       | «`»json_literal«`»     # just this JSON constant value.
       | «<»identifier«>»       # identifier referring to the matching top-level definition.
       | r"regular_expression"  # String matched by this regex.
       | f"format"              # JSON Schema draft-07 string format.
       | «string» cardinal?     # a string, with this cardinal constraint on number of chars.
       | «integer» cardinal?    # an integer within the range described by cardinal.
       | «integer» «/» int      # an integer which must be multiple of that int.
       | «object»               # any object.
       | «array»                # any array.
       | «boolean»              # any boolean.
       | «null»                 # the null value.
       | «number»               # any number.
       | «forbidden»            # empty type (used mostly to disallow a property name).
       | object                 # structurally described object.
       | array                  # structurally described array.
       | conditional            # conditional if/then/else rule

cardinal ::= «{» int «}»             # Exactly that number of chars / items / properties.
           | «{» «_» «,» int «}»     # At most that number of chars / items / properties.
           | «{» int «,» «_» «}»     # At least that number of chars / items / properties.
           | «{» int «,» int «}»     # A number of chars / items / properties within this range.

object ::= «{» object_restriction? (object_key «?»? «:» type «,»...)* «}» cardinal?
         # if «only» occurs without a regex, no extra property is allowed.
         # if «only» occurs with a regex, all property names, including listed ones, must match that regex.
         # if «?» occurs, the preceding property is optional, otherwise it's required.

object_restriction ::= ø
                     # Only explicitly listed property names are accepted:
                     | «only»
                     # all property names must conform to regex/reference:
                     | «only» (r"regex" | «<»identifier«>»)
                     # all property names must conform to regex, non-listed property values to type:
                     | «only» (r"regex" | «<»identifier«>» | «_»)«:» type

object_key ::= identifier    # Litteral property name.
             | «"»string«"»  # Properties which aren't identifiers must be quoted.

array ::= «[» «only»? «unique»? (type «,»)* («*»|«+»|ø) «]» cardinal?
        # if «only» occurs, no extra item is allowed.
        # if «unique» occurs, each array item must be different from every other.
        # if «*» occurs, the last type can be repeated zero or more times.
        # Every extra item must be of that type.
        # if «+» occurs, the last type can be repeated one or more times.
        # Every extra item must be of that type.

conditional ::= «if» type «then» type («elif» type «then» type)* («else» type)?

int ::= /0x[0-9a-fA-F]+/ | /[0-9]+/
identifier ::= /[A-Za-z_][A-Za-z_0-9]*/
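The cardinal and int rules are regular enough to parse with a single regular expression. This is a hypothetical sketch of such a parser (parse_int and parse_cardinal are illustrative names, not jsonschema_cn functions); it returns inclusive (min, max) bounds, with None standing for _.

```python
import re

# The int rule of the grammar: hexadecimal with a 0x prefix, or decimal.
INT = r"0x[0-9a-fA-F]+|[0-9]+"
CARDINAL = re.compile(
    r"\{\s*(?:(?P<exact>" + INT + r")"
    r"|(?P<lo>" + INT + r"|_)\s*,\s*(?P<hi>" + INT + r"|_))\s*\}"
)


def parse_int(text):
    return int(text, 16) if text.startswith("0x") else int(text)


def parse_cardinal(text):
    """Return inclusive (min, max) bounds; None means unbounded ("_")."""
    m = CARDINAL.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"invalid cardinal: {text!r}")
    if m["exact"] is not None:
        n = parse_int(m["exact"])
        return n, n
    lo, hi = m["lo"], m["hi"]
    if lo == "_" and hi == "_":
        raise ValueError("a cardinal needs at least one bound")
    return (None if lo == "_" else parse_int(lo),
            None if hi == "_" else parse_int(hi))


print(parse_cardinal("{_, 9}"))     # (None, 9)
print(parse_cardinal("{0, 0xFF}"))  # (0, 255)
```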


Some things that may be added in future versions:

  • on numbers:
    • ranges over floats (reusing cardinal grammar with float boundaries)
    • modulus constraints on floats, e.g. number/0.25.
    • exclusive ranges in addition to inclusive ones. May use reversed braces, e.g. integer{0,0x100{ as an equivalent for integer{0,0xFF}?
    • treat ranges alone as integer ranges, i.e. {1, 5} as a shortcut for integer{1, 5}? Not sure whether it enhances readability, and ranges would then need float support.
  • combine string constraints: regex, format, cardinals... This can already be achieved with operator &.
  • try to embed #-comments as "$comment"
  • Implementation:
    • bubble up ? markers in grammar to the top level.
  • Syntax sugar:
    • optional marker: foobar? is equivalent to foobar|null. Not sure whether it's worth it, the difference between a missing field and a field holding null is most commonly not significant.
    • check that references as propertyNames indeed point at string types.
    • make keywords case-insensitive?
    • treat {foo: forbidden} as {foo?: forbidden} as it's the only thing that would make sense?
  • better error messages, on incorrect grammars, and on non-validating JSON data.
  • reciprocal feature: try to translate a JSON schema into a shorter and more readable JSCN source.


From the command line

$ echo -n '[integer*]' | jscn -
{ "type": "array",
  "items": {"type": "integer"},
  "$schema": ""}

$ jscn --help

usage: jscn [-h] [-o OUTPUT] [-v] [--version] [filename]

Convert from a compact DSL into full JSON schema.

positional arguments:
  filename              Input file; use '-' to read from stdin.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output file; defaults to stdout
  -v, --verbose         Verbose output
  --version             Display version and exit

From the Python API

Python's jsonschema_cn package exports two main constructors:

  • Schema(), which compiles a source string into a schema object;
  • Definitions(), which compiles a source string made of a sequence of definitions separated by the keyword and, as in the definitions rule of the formal grammar.

Schema objects have a jsonschema property, which contains the Python dict of the corresponding JSON schema.

Schemas can be combined with the Python operators & ("allOf") and | ("anyOf"). When they have definitions, those definition sets are merged; a definition name appearing on both sides must have the same definition on each side.

Schemas can also be combined with definitions through |, and definitions can also be combined with each other through |.

>>> from jsonschema_cn import Schema, Definitions

>>> defs = Definitions("""
...     id = r"[a-z]+" and
...     byte = integer{0,0xff}
... """)

>>> s = Schema("{only <id>: <byte>}")
>>> s.jsonschema
ValueError: Missing definition for byte

>>> s = s | defs
>>> s.jsonschema
{"$schema": "",
  "type": "object",
  "propertyNames": {"$ref": "#/definitions/id"},
  "additionalProperties": {"$ref": "#/definitions/byte"},
  "definitions": {
    "id":   {"type": "string", "pattern": "[a-z]+"},
    "byte": {"type": "integer", "minimum": 0, "maximum": 255}}}

>>> Schema("[integer, boolean+]{4}").jsonschema
{ "$schema": "",
  "type": "array",
  "minItems": 4, "maxItems": 4,
  "items": [{"type": "integer"}],
  "additionalItems": {"type": "boolean"}}

See also

If you spend a lot of time dealing with complex JSON data structures, you might also want to try jsview, a smarter JSON formatter, which tries to make effective use of both your screen's width and height by inserting carriage returns only when it makes sense:

$ cat > <<EOF

{ only codes: [<byte>+], id: r"[a-z]+", issued: f"date"}
where byte = integer{0, 0xFF}
EOF

$ jscn

{"type": "object", "required": ["codes", "id", "issued"], "properties": {"codes": {"type": "array", "items": [{"$ref": "#/definitions/byte"}], "additionalItems": {"$ref": "#/definitions/byte"}}, "id": {"type": "string", "pattern": "[a-z]+"}, "issued": {"type": "string", "format": "date"}}, "additionalProperties": false, "definitions": {"byte": {"type": "integer", "minimum": 0, "maximum": 255}}, "$schema": ""}

$ cat | jscn - | jsview -

{
  "type": "object",
  "required": ["codes", "id", "issued"],
  "properties": {
    "codes": {
      "type": "array",
      "items": [{"$ref": "#/definitions/byte"}],
      "additionalItems": {"$ref": "#/definitions/byte"}
    },
    "id": {"type": "string", "pattern": "[a-z]+"},
    "issued": {"type": "string", "format": "date"}
  },
  "additionalProperties": false,
  "definitions": {"byte": {"type": "integer", "minimum": 0, "maximum": 255}},
  "$schema": ""
}
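The idea of inserting line breaks only when needed can be illustrated with a small width-aware layout: a value stays on one line if it fits in the remaining width, otherwise its members are split one per line. This is a hypothetical sketch of that idea, not jsview's code; layout and its width parameter are illustrative, and the trailing comma is not counted in the width check.

```python
import json


def layout(value, width=80, indent=0, used=0):
    """Format value whose first line starts at column indent + used."""
    flat = json.dumps(value)
    if (indent + used + len(flat) <= width
            or not isinstance(value, (dict, list)) or not value):
        return flat  # fits, or cannot be split any further
    inner = indent + 2
    pad = " " * inner
    if isinstance(value, dict):
        lines = []
        for key, item in value.items():
            head = json.dumps(key) + ": "
            lines.append(pad + head + layout(item, width, inner, len(head)))
        opening, closing = "{", "}"
    else:
        lines = [pad + layout(item, width, inner) for item in value]
        opening, closing = "[", "]"
    return opening + "\n" + ",\n".join(lines) + "\n" + " " * indent + closing


print(layout({"a": [1, 2], "b": "x"}, width=10))
```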

