Data specification and normalization toolkit
Data Spec
What are Specs?
Specs are declarative data specifications written in pure Python code. Specs can be
created using the Spec utility function s. Specs provide two useful and related
functions. The first is to evaluate whether an arbitrary data structure satisfies
the specification. The second function is to conform (or normalize) valid data
structures into a canonical format.
The simplest Specs are based on common predicate functions, such as
lambda x: isinstance(x, str), which asks "Is the object x an instance of str?".
Fortunately, Specs are not limited to being created from single predicates. Specs can
also be created from groups of predicates, composed in a variety of useful ways, and
even defined for complex data structures. Because Specs are ultimately backed by
pure Python code, any question that you can answer about your data in code can be
encoded in a Spec.
How to Use
To begin using the spec library, you can simply import the s object:
from dataspec import s
Nearly all of the useful functionality in spec is packed into s.
Spec API
s is a generic Spec constructor, which can be called to generate new Specs from
a variety of sources:
- Enumeration specs:
  - Using a Python set or frozenset: s({"a", "b", ...}), or
  - Using a Python Enum like State: s(State).
- Collection specs:
  - Using a Python list: s([State])
- Mapping type specs:
  - Using a Python dict: s({"name": s.is_str})
- Tuple type specs:
  - Using a Python tuple: s((s.is_str, s.is_num))
- Specs based on functions:
  - Using a standard Python predicate: s(lambda x: x > 0)
  - Using a Python function yielding ErrorDetails
Specs are designed to be composed, so each of the above spec types can serve as the
base for more complex data definitions. For collection, mapping, and tuple type Specs,
Specs will be recursively created for child elements if they are types understood
by s.
Specs may also optionally be created with "tags", which are just string names provided
in the ErrorDetails objects emitted by a Spec instance's validate method. Every Spec
is required to have a tag, so all builtin Spec factories will supply a default tag if
one is not given.
The s API also includes several Spec factories for common Python types such as
bool, bytes, date, datetime (via s.inst), float (via s.num), int (via s.num), str,
time, and uuid.
s also includes several pre-built Specs for basic types which are useful if you
only want to verify that a value is of a specific type. All the pre-built Specs
are supplied as s.is_{type} on s.
All Specs provide the following API:
- Spec.is_valid(x) returns a bool indicating if x is valid according to the Spec definition.
- Spec.validate(x) yields consecutive ErrorDetails describing every spec violation for x. By definition, if the generator returned by Spec.validate(x) yields nothing, then x satisfies the Spec.
- Spec.validate_ex(x) raises a ValidationError containing the full list of ErrorDetails for the errors that occurred while validating x, if any errors are encountered. Otherwise, returns None.
- Spec.conform(x) attempts to conform x according to the Spec conformer if and only if x is valid according to the Spec. Otherwise returns INVALID.
- Spec.conform_valid(x) conforms x using the Spec conformer, without checking first if x is valid. Useful if you wish to check your data for validity and conform it in separate steps without incurring validation costs twice.
- Spec.with_conformer(c) returns a new Spec instance with the Conformer c. The old Spec instance is not modified.
- Spec.with_tag(t) returns a new Spec instance with the Tag t. The old Spec instance is not modified.
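The relationship between validate, is_valid, and validate_ex can be illustrated with a small standalone sketch. ToySpec, ToyValidationError, and check_port are hypothetical names, plain strings stand in for ErrorDetails, and none of this is meant to mirror the library's actual implementation.

```python
from typing import Any, Callable, Iterator, List


class ToyValidationError(Exception):
    """Hypothetical stand-in for ValidationError; carries every error found."""

    def __init__(self, errors: List[str]) -> None:
        super().__init__(f"{len(errors)} validation error(s)")
        self.errors = errors


class ToySpec:
    """Hypothetical stand-in for a Spec; error details are plain strings."""

    def __init__(self, validator: Callable[[Any], Iterator[str]]) -> None:
        self._validator = validator

    def validate(self, x: Any) -> Iterator[str]:
        # Lazily yield one message per violation.
        yield from self._validator(x)

    def is_valid(self, x: Any) -> bool:
        # Valid means the generator finishes without yielding anything.
        return next(self.validate(x), None) is None

    def validate_ex(self, x: Any) -> None:
        # Collect everything first so the exception reports all violations.
        errors = list(self.validate(x))
        if errors:
            raise ToyValidationError(errors)


def check_port(x: Any) -> Iterator[str]:
    if not isinstance(x, int):
        yield f"{x!r} is not an int"
        return
    if x < 1:
        yield f"{x} is below 1"
    if x > 65535:
        yield f"{x} is above 65535"


port = ToySpec(check_port)
```

Note that is_valid passes a default to next, so an exhausted generator produces None instead of raising StopIteration.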
Scalar Specs
The simplest data specs are those which evaluate Python's builtin scalar types: strings, integers, floats, and booleans.
You can create a spec which validates strings with s.str(). Common string
validations can be specified as keyword arguments, such as the min/max length or a
matching regex. If you are only interested in validating that a value is a string
without any further validations, spec features the predefined spec s.is_str (note
no function call required).
Likewise, numeric specs can be created using s.num(), with several builtin
validations available as keyword arguments such as min/max value and narrowing down
the specific numeric types. If you are only interested in validating that a value is
numeric, you can use the builtin s.is_num or s.is_int or s.is_float specs.
Predicate Specs
Not every valid state of your data can be specified using existing specs, so you can
define a spec from any simple predicate you may have by passing the predicate
directly to the s function.
import uuid
spec = s(lambda id_: uuid.UUID(id_).version == 4)
spec.is_valid("4716df50-0aa0-4b7d-98a4-1f2b2bcb1c6b") # True
spec.is_valid("b4e9735a-ee8c-11e9-8708-4c327592fea9") # False
UUID Specs
In the previous section, we used a simple predicate to check that a UUID was a certain
version of an RFC 4122 variant UUID. However, spec includes builtin UUID specs which
can simplify the logic here:
spec = s.uuid(versions={4})
spec.is_valid("4716df50-0aa0-4b7d-98a4-1f2b2bcb1c6b") # True
spec.is_valid("b4e9735a-ee8c-11e9-8708-4c327592fea9") # False
Additionally, if you are only interested in validating that a value is a UUID, the
builtin spec s.is_uuid is available.
Date Specs
spec includes some builtin Specs for Python's datetime, date, and time classes.
With the builtin specs, you can validate that any of these three class types are before
or after a given value. Suppose you want to verify that someone is at least 18 by
checking their date of birth:
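from datetime import date, timedelta
# timedelta has no "years" argument, so years are approximated as 365 days here
spec = s.date(before=date.today() - timedelta(days=18 * 365))
spec.is_valid(date.today() - timedelta(days=21 * 365)) # True
spec.is_valid(date.today() - timedelta(days=12 * 365)) # False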
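Because datetime.timedelta has no notion of calendar years, an exact age cutoff has to be computed another way. The following is a minimal standard-library sketch; years_before and is_adult are hypothetical helper names and are not part of the library.

```python
from datetime import date


def years_before(day: date, years: int) -> date:
    """Return the calendar date `years` years before `day`."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # `day` is Feb 29 and the target year is not a leap year.
        return day.replace(year=day.year - years, day=28)


def is_adult(date_of_birth: date, today: date) -> bool:
    # Adult means the 18th birthday is today or already in the past.
    return date_of_birth <= years_before(today, 18)
```

A cutoff computed this way could then be passed as the bound of a date spec in place of a day-count approximation.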
For datetimes (instants) and times, you can also use is_aware=True to specify that
the instance be timezone-aware (i.e. not naive).
You can use the builtins s.is_date, s.is_inst, and s.is_time if you only want to
validate that a value is an instance of any of those classes.
Set (Enum) Specs
Commonly, you may be interested in validating that a value is one of a constrained set
of known values. In Python code, you would use an Enum type to model these values.
To define an enumeration spec, you can pass an existing Enum type into your spec:
from enum import Enum

class YesNo(Enum):
    YES = "Yes"
    NO = "No"

s(YesNo).is_valid("Yes") # True
s(YesNo).is_valid("Maybe") # False
Any valid representation of the Enum value would satisfy the spec, including the
value, alias, and actual Enum value (like YesNo.NO).
Additionally, for simpler cases you can specify an enum using Python sets (or
frozensets):
s({"Yes", "No"}).is_valid("Yes") # True
s({"Yes", "No"}).is_valid("Maybe") # False
Collection Specs
Specs can be defined for values in homogeneous collections as well. Define a spec for
a homogeneous collection as a list passed to s with the first element as the Spec
for collection elements:
s([s.num(min_=0)]).is_valid([1, 2, 3, 4]) # True
s([s.num(min_=0)]).is_valid([-11, 2, 3]) # False
You may also want to assert certain conditions that apply to the collection as a whole. Spec allows you to specify an optional dictionary as the second element of the list with a few possible rules applying to the collection as a whole, such as length and collection type.
s([s.num(min_=0), {"kind": list}]).is_valid([1, 2, 3, 4]) # True
s([s.num(min_=0), {"kind": list}]).is_valid({1, 2, 3, 4}) # False
Collection specs conform input collections by applying the element conformer(s) to each
element of the input collection. Callers can specify an "into" key in the collection
options dictionary as part of the spec to specify which type of collection is emitted
by the collection spec default conformer. Collection specs which do not specify the
"into" collection type will conform collections into the same type as the input
collection.
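The conformation rule described above can be sketched in plain Python. conform_collection is a hypothetical helper written only for illustration; it assumes the input is a list, tuple, set, or frozenset and is not the library's own code.

```python
from typing import Any, Callable, Iterable, Optional


def conform_collection(
    items: Iterable[Any],
    conform_item: Callable[[Any], Any],
    into: Optional[type] = None,
) -> Any:
    """Conform every element, then build the output collection."""
    # Without an explicit target type, mirror the type of the input.
    target = into if into is not None else type(items)
    return target(conform_item(item) for item in items)


same_type = conform_collection(("1", "2"), int)            # stays a tuple
as_set = conform_collection(["1", "2", "2"], int, into=set)  # becomes a set
```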
Tuple Specs
Specs can be defined for heterogeneous collections of elements, which is often the use
case for Python's tuple type. To define a spec for a tuple, pass a tuple of specs for
each element in the collection at the corresponding tuple index:
s(
    (
        s.str("id", format_="uuid"),
        s.str("first_name"),
        s.str("last_name"),
        s.str("date_of_birth", format_="iso-date"),
        s("gender", {"M", "F"}),
    )
)
Tuple specs conform input tuples by applying each field's conformer(s) to the fields of
the input tuple to return a new tuple. If each field in the tuple spec has a unique tag
and the tuple has a custom tag specified, the default conformer will yield a
namedtuple with the tuple spec tag as the type name and the field spec tags as each
field name. The type name and field names will be munged to be valid Python
identifiers.
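To make the namedtuple behavior concrete, here is a rough standard-library sketch. The munge rule shown (replace every non-word character with an underscore) is an assumption made for illustration, and conform_tuple is a hypothetical helper; the library's actual munging rules may differ.

```python
import re
from collections import namedtuple
from typing import Any, Callable, Sequence, Tuple


def munge(tag: str) -> str:
    """Assumed rule: replace characters that cannot appear in an identifier."""
    return re.sub(r"\W", "_", tag)


def conform_tuple(
    type_tag: str,
    fields: Sequence[Tuple[str, Callable[[Any], Any]]],
    values: Sequence[Any],
) -> Any:
    """`fields` holds one (tag, conformer) pair per tuple position."""
    tuple_type = namedtuple(munge(type_tag), [munge(tag) for tag, _ in fields])
    return tuple_type(*(conform(v) for (_, conform), v in zip(fields, values)))


row = conform_tuple(
    "user-profile",
    [("first-name", str.strip), ("age", int)],
    ("  Ada ", "36"),
)
```

Here row.first_name is "Ada" and row.age is 36, while the result still compares equal to the plain tuple ("Ada", 36).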
Mapping Specs
Specs can be defined for mapping/associative types and objects. To define a spec for a
mapping type, pass a dictionary of specs to s. The keys should be the expected key
value (most often a string) and the value should be the spec for values located in that
key. If a mapping spec contains a key, the spec considers that key required. To
specify an optional key in the spec, wrap the key in s.opt. Optional keys will
be validated if they are present, but allow the map to exclude those keys without
being considered invalid.
s(
    {
        "id": s.str("id", format_="uuid"),
        "first_name": s.str("first_name"),
        "last_name": s.str("last_name"),
        "date_of_birth": s.str("date_of_birth", format_="iso-date"),
        "gender": s("gender", {"M", "F"}),
        s.opt("state"): s("state", {"CA", "GA", "NY"}),
    }
)
Above, the key "state" is optional in tested values, but if it is provided it must
be one of "CA", "GA", or "NY".
Note: Mapping specs do not validate that input values only contain the expected set of keys. Extra keys will be ignored. This is intentional behavior.
Mapping specs conform input dictionaries by applying each field's conformer(s) to the fields of the input map to return a new dictionary. As a consequence, the value returned by the mapping spec default conformer will not include any extra keys included in the input. Optional keys will be included in the conformed value if they appear in the input map.
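The effect on extra and optional keys can be sketched without the library. conform_mapping is a hypothetical helper that assumes the input has already been validated, so any spec key missing from the input must be an optional one.

```python
from typing import Any, Callable, Dict, Mapping


def conform_mapping(
    conformers: Mapping[str, Callable[[Any], Any]],
    data: Mapping[str, Any],
) -> Dict[str, Any]:
    """Build a new dict holding only the keys named in the spec."""
    # Keys in `data` that the spec does not mention are never copied over;
    # spec keys absent from `data` (optional ones) are simply skipped.
    return {
        key: conform(data[key])
        for key, conform in conformers.items()
        if key in data
    }


conformed = conform_mapping(
    {"name": str.title, "age": int, "state": str.upper},
    {"name": "marie curie", "age": "66", "occupation": "Chemist"},
)
```

The result contains only "name" and "age": the extra "occupation" key is dropped and the absent optional "state" key is not invented.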
Combination Specs
In most of the previous examples, we used basic builtin Specs. However, real world
data often requires more nuanced specifications. Fortunately, Specs were designed
to be composed. In particular, Specs can be composed using standard boolean logic.
To specify an or spec, you can use s.any(...) with any n specs:
spec = s.any(s.str(format_="uuid"), s.str(maxlength=0))
spec.is_valid("4716df50-0aa0-4b7d-98a4-1f2b2bcb1c6b") # True
spec.is_valid("") # True
spec.is_valid("3837273723") # False
Similarly, to specify an and spec, you can use s.all(...) with any n specs:
spec = s.all(s.str(format_="uuid"), s(lambda id_: uuid.UUID(id_).version == 4))
spec.is_valid("4716df50-0aa0-4b7d-98a4-1f2b2bcb1c6b") # True
spec.is_valid("b4e9735a-ee8c-11e9-8708-4c327592fea9") # False
and Specs apply each child Spec's conformer to the value during validation,
so you may assume the output of the previous Spec's conformer in subsequent
Specs.
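The chaining behavior can be pictured with a small standalone sketch in which each step is a (predicate, conformer) pair. conform_all, is_uuid_str, and the INVALID sentinel defined here are hypothetical stand-ins, not the library's objects.

```python
import uuid
from typing import Any, Callable, Sequence, Tuple

INVALID = object()  # hypothetical sentinel for "could not be conformed"

Step = Tuple[Callable[[Any], bool], Callable[[Any], Any]]


def conform_all(steps: Sequence[Step], value: Any) -> Any:
    """Validate and conform step by step, feeding each output to the next."""
    for is_valid, conform in steps:
        if not is_valid(value):
            return INVALID
        value = conform(value)
    return value


def is_uuid_str(x: Any) -> bool:
    try:
        uuid.UUID(x)
    except (AttributeError, TypeError, ValueError):
        return False
    return True


steps = [
    (is_uuid_str, uuid.UUID),                   # str -> uuid.UUID
    (lambda u: u.version == 4, lambda u: u),    # relies on receiving a UUID
]
```

The second predicate can read u.version directly only because the first step already converted the string into a uuid.UUID.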
Examples
Suppose you'd like to define a Spec for validating that a string is at least 10 characters long (ignoring encoding nuances). You could define that as follows:
spec = s.str(minlength=10)
spec.is_valid("a string") # False
spec.is_valid("London, England") # True
Or perhaps you'd like to check that every number in a list is at least a certain value:
spec = s([s.num(min_=70), {"kind": list}])
spec.is_valid([70, 83, 92, 99]) # True
spec.is_valid({70, 83, 92, 99}) # False, as the input collection is a set
spec.is_valid([43, 66, 80, 93]) # False, not all numbers are at least 70
A more realistic case for a Spec is validating incoming data at the application boundaries. Suppose you're accepting a user profile submission as a JSON object over an HTTP endpoint. You could validate the data like so:
spec = s(
    "user-profile",
    {
        "id": s.str("id", format_="uuid"),
        "first_name": s.str("first_name"),
        "last_name": s.str("last_name"),
        "date_of_birth": s.str("date_of_birth", format_="iso-date"),
        "gender": s("gender", {"M", "F"}),
        s.opt("state"): s.str(minlength=2, maxlength=2),
    }
)
spec.is_valid( # True
    {
        "id": "e1bc9fb2-a4d3-4683-bfef-3acc61b0edcc",
        "first_name": "Carl",
        "last_name": "Sagan",
        "date_of_birth": "1996-12-20",
        "gender": "M",
        "state": "CA",
    }
)
spec.is_valid( # True; note that extra keys _are ignored_
    {
        "id": "958e2f55-5fdf-4b84-a522-a0765299ba4b",
        "first_name": "Marie",
        "last_name": "Curie",
        "date_of_birth": "1867-11-07",
        "gender": "F",
        "occupation": "Chemist",
    }
)
spec.is_valid( # False; missing "gender" key
    {
        "id": "958e2f55-5fdf-4b84-a522-a0765299ba4b",
        "first_name": "Marie",
        "last_name": "Curie",
        "date_of_birth": "1867-11-07",
    }
)
Concepts
Predicates
Predicates are functions of one argument which return a boolean. Predicates answer
questions such as "is x an instance of str?" or "is n greater than 0?".
Frequently in Python, predicates are simply expressions used in an if statement.
In functional programming languages (and particularly in Lisps), it is more common
to encode these predicates in functions which can be combined using lambdas or
partials to be reused. Spec encourages that functional paradigm and benefits
directly from it.
Predicate functions should satisfy the PredicateFn type and can be wrapped in the
PredicateSpec spec type.
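As an illustration of building reusable predicates from partials, consider the following standard-library sketch. The Predicate alias and the helper names are hypothetical and only approximate the shape of a one-argument predicate.

```python
from functools import partial
from typing import Any, Callable

Predicate = Callable[[Any], bool]  # assumed shape: one argument in, bool out


def is_instance_of(type_: type, x: Any) -> bool:
    return isinstance(x, type_)


def greater_than(bound: float, x: Any) -> bool:
    return x > bound


def all_of(*preds: Predicate) -> Predicate:
    """Combine predicates; later ones run only if earlier ones pass."""
    return lambda x: all(pred(x) for pred in preds)


# Fixing the first argument turns each helper into a one-argument predicate.
is_int = partial(is_instance_of, int)
is_positive = partial(greater_than, 0)
is_positive_int = all_of(is_int, is_positive)
```

Because all short-circuits, is_positive_int("3") returns False at the type check and never attempts the comparison.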
Validators
Validators are like predicates in that they answer the same fundamental questions about
data that predicates do. However, Validators are a Spec concept that allow us to
retrieve richer error data from Spec failures than we can natively with a simple
predicate. Validators are functions of one argument which return 0 or more ErrorDetails
instances (typically yielded as a generator) describing the error.
Validator functions should satisfy the ValidatorFn type and can be wrapped in the
ValidatorSpec spec type.
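The difference in richness between a predicate and a validator can be shown with a standalone sketch. Problem is a hypothetical, simplified stand-in for ErrorDetails (whose real fields are not described here), and the username rules are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass(frozen=True)
class Problem:
    """Hypothetical, simplified stand-in for ErrorDetails."""

    message: str
    value: Any


def is_username(x: Any) -> bool:
    # Predicate form: a single yes/no answer with no explanation.
    return isinstance(x, str) and 3 <= len(x) <= 12 and x.isalnum()


def validate_username(x: Any) -> Iterator[Problem]:
    # Validator form: one Problem per violated rule, or nothing if valid.
    if not isinstance(x, str):
        yield Problem("expected a string", x)
        return
    if not 3 <= len(x) <= 12:
        yield Problem("length must be between 3 and 12", x)
    if not x.isalnum():
        yield Problem("only letters and digits are allowed", x)
```

For the input "a!" the predicate only reports False, whereas the validator yields two separate problems, one for length and one for the disallowed character.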
Conformers
Conformers are functions of one argument, x, that return either a conformed value,
which may be x itself, a new value based on x, or the special Spec value
INVALID if the value cannot be conformed.
All specs may include conformers. Scalar spec types such as PredicateSpec and
ValidatorSpec simply return their argument if it satisfies the spec. Specs for
more complex data structures supply a default conformer which produces new data
structures after applying any child conformation functions to the data structure
elements.
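The three possible outcomes of a conformer (the input itself, a new value, or INVALID) can be sketched as follows. The INVALID object defined here is a local, hypothetical sentinel standing in for the library's special value, and both functions are illustrative only.

```python
from datetime import date
from typing import Any

INVALID = object()  # local stand-in for the special INVALID value


def identity(x: Any) -> Any:
    """Conformer that returns its argument unchanged."""
    return x


def conform_iso_date(x: Any) -> Any:
    """Conformer that returns a new value (a date) or INVALID."""
    if not isinstance(x, str):
        return INVALID
    try:
        return date.fromisoformat(x)
    except ValueError:
        return INVALID
```

Callers would compare against the sentinel by identity (result is INVALID), since a conformed value may legitimately be falsy.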
Tags
All Specs can be created with optional tags, specified as a string in the first positional argument of any spec creation function. Tags give specs meaningful names in debugging and validation messages.
TODOs
- in dict specs, default child spec tag from corresponding dictionary key
- break out conformers into separate object? main value would be to propagate
  .conform_valid() calls all the way through; currently they don't propagate past
  collection, dict, and tuple specs