Helper to create a compatibility layer between inputs in different formats and other parts of an application.
Schema Overseer – Local
This is the local version of Schema Overseer, intended for use in a single repository.
For the multi-repository service, see schema-overseer-service.
Schema Overseer enforces strict adherence to defined data formats and raises an exception when asked to process input with an unsupported schema.
In more technical terms, it is an adapter[^1] between inputs with different schemas and other application components.
Why is it important?
- Data formats evolve over time
- Developers need to simultaneously support both legacy and new data formats
- Mismatches between input data format and the corresponding code can lead to unexpected and hard-to-debug runtime errors
- As the number of supported data formats increases, application code often becomes less maintainable
Features
- Straightforward extensibility
- Static analysis checks via type checking
- Detailed runtime checks
- Incoming data validation with pydantic
Use Cases and Tutorials
Installation
pip install schema-overseer-local
Quick Start
- Create a file adapter.py to define the adapter logic.
  For the quick start we will use a single file, but in a real application it is better to use multiple files.

- Output. Define the output schema you plan to work with.
  The output schema can be any object; for this tutorial we will use a dataclass. Its attributes can be any Python objects, including non-serializable ones. The output can behave like the original input object or completely differently. Here is an example with different behavior.

      from dataclasses import dataclass
      from typing import Callable

      @dataclass
      class Output:
          value: int
          function: Callable

- Registry. Create the SchemaRegistry instance for Output.

      schema_registry = SchemaRegistry(Output)

- Input schemas. Define the input schemas using pydantic and register them in schema_registry.

      from pydantic import BaseModel

      @schema_registry.add_schema
      class OldInputFormat(BaseModel):
          value: str

      @schema_registry.add_schema
      class NewInputFormat(BaseModel):
          renamed_value: int

- Builders. Implement functions to convert each registered input to Output.
  Builders require type hints to link input formats and Output.

      @schema_registry.add_builder
      def old_builder(data: OldInputFormat) -> Output:
          return Output(
              value=int(data.value),  # the old format stores the value as a string
              function=my_function,
          )

      @schema_registry.add_builder
      def new_builder(data: NewInputFormat) -> Output:
          return Output(
              value=data.renamed_value,
              function=my_other_function,
          )

- Finally, use schema_registry inside the application to get validated output or handle the exception.

      from typing import Any

      schema_registry.setup()  # see "Discovery" chapter in documentation

      def my_service(raw_data: dict[str, Any]):
          try:
              output = schema_registry.build(source_dict=raw_data)  # build output object
          except BuildError as error:
              raise MyApplicationError() from error  # handle the exception

          # use output object
          output.function()
          return output.value
The full quickstart example is available in the repository (the tutorial.quickstart package).
To run it:
git clone git@github.com:Schema-Overseer/schema-overseer-local.git
cd schema-overseer-local
poetry install
poetry run python -m tutorial.quickstart.app
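To make the mechanics of the quick start more concrete, here is a library-free sketch of the same idea: builders are linked to input formats through their type hints, and the registry picks the builder whose input format matches the raw data. It is an illustration only, not the SchemaRegistry implementation. `ToyRegistry` and `ToyBuildError` are hypothetical names, plain dataclasses stand in for pydantic models, and matching is done on field names alone, which is far weaker than real validation.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, get_type_hints


@dataclass
class Output:
    value: int


@dataclass
class OldInputFormat:
    value: str


@dataclass
class NewInputFormat:
    renamed_value: int


class ToyBuildError(Exception):
    """Hypothetical stand-in for the library's BuildError."""


class ToyRegistry:
    """Illustrative registry; not how SchemaRegistry is implemented."""

    def __init__(self) -> None:
        self._builders: dict[type, Callable[[Any], Output]] = {}

    def add_builder(self, func: Callable[[Any], Output]) -> Callable[[Any], Output]:
        # Link the builder to its input format through the argument's type hint.
        hints = get_type_hints(func)
        hints.pop("return", None)
        (input_type,) = hints.values()
        self._builders[input_type] = func
        return func

    def build(self, source_dict: dict[str, Any]) -> Output:
        # Assumption: a format "matches" when its field names equal the dict keys.
        for input_type, builder in self._builders.items():
            expected = {f.name for f in fields(input_type)}
            if expected == set(source_dict):
                return builder(input_type(**source_dict))
        raise ToyBuildError(f"no schema matches keys {sorted(source_dict)}")


registry = ToyRegistry()


@registry.add_builder
def old_builder(data: OldInputFormat) -> Output:
    return Output(value=int(data.value))


@registry.add_builder
def new_builder(data: NewInputFormat) -> Output:
    return Output(value=data.renamed_value)
```

Calling `registry.build({"value": "41"})` routes through `old_builder`, while `registry.build({"renamed_value": 7})` routes through `new_builder`; the rest of the application only ever sees `Output`.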
Usage
Using multiple Python files
While you can define the registry, models, and builders in one or two files, it is usually better to split them into separate Python modules.
There are different ways to organize the files. We recommend one of the following:
- Minimal: start with this one while you are still figuring out the best way to work
- Expanded builders: useful when each builder involves a lot of code
- Detached output: useful when the output is a large or complex entity
Minimal | Expanded builders | Detached output
Models (i.e., input data formats) are decoupled first for two reasons:
- If models contain nested models, it becomes harder to tell which nested models belong to which root model (see the FAQ entry on re-using inner pydantic models).
- If you transition to schema-overseer-service, the models are sourced from outside your code, so this split will come naturally.
Load modules automatically
[!NOTE] Python will not load modules automatically, unless they are explicitly imported.
SchemaRegistry has a discovery_paths: Sequence[str] argument to load all required models.
The specified modules and packages are loaded at SchemaRegistry.setup().
Definition (SchemaRegistry(...)) is decoupled from loading (SchemaRegistry.setup()) to prevent circular imports, which is why calling setup() is required.
The discovery_paths argument takes a sequence of strings in absolute import format. Entries can be either Python modules (single files) or Python packages (folders with __init__.py and other *.py files inside).
For example, this works for the minimal layout mentioned above:
schema_registry = SchemaRegistry(
    Output,
    discovery_paths=[
        'example_project.payload.models',  # loaded as package
        'example_project.payload.builders',  # loaded as module
    ],
)
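For intuition about what loading a list of dotted paths involves, here is a minimal sketch using only the standard library. It assumes that a package entry means "import the package and every submodule under it"; the helper name `load_discovery_paths` is hypothetical, and the library's own loader may behave differently. Standard-library names stand in for project paths such as 'example_project.payload.models'.

```python
import importlib
import pkgutil
from types import ModuleType
from typing import Sequence


def load_discovery_paths(discovery_paths: Sequence[str]) -> list[ModuleType]:
    """Import each listed module, plus every submodule of each listed package."""
    loaded: list[ModuleType] = []
    for path in discovery_paths:
        module = importlib.import_module(path)
        loaded.append(module)
        # Only packages define __path__; a plain module is done after one import.
        package_path = getattr(module, "__path__", None)
        if package_path is not None:
            for info in pkgutil.walk_packages(package_path, prefix=f"{path}."):
                loaded.append(importlib.import_module(info.name))
    return loaded


# "string" is a single-file module, "email.mime" is a package with submodules.
modules = load_discovery_paths(["string", "email.mime"])
module_names = [module.__name__ for module in modules]
```

Importing is what triggers decorators such as add_schema and add_builder to run, which is why nothing is registered until the modules are actually loaded.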
Runtime safety and strict self-checks
In addition to static type hint checks, schema-overseer-local performs runtime checks to ensure:
- Each registered model has only one corresponding builder.
- All builders have a proper call signature, which includes:
  - One argument for the input data
  - No additional non-default arguments
- All builders have proper type hints
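The checks listed above can be sketched with `inspect` and `typing.get_type_hints`. This is an illustration of the kind of validation described, not the library's code; `check_builder`, `register`, and `BuilderSignatureError` are hypothetical names introduced for the example.

```python
import inspect
from typing import Any, Callable, get_type_hints


class BuilderSignatureError(TypeError):
    """Hypothetical error type used only in this sketch."""


def check_builder(func: Callable[..., Any]) -> tuple[Any, Any]:
    """Return (input_type, output_type) or raise if the signature is unsuitable."""
    params = list(inspect.signature(func).parameters.values())
    if not params:
        raise BuilderSignatureError(f"{func.__name__}: needs one argument for the input data")

    first, *rest = params
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    # Every parameter after the first must have a default (or be *args/**kwargs).
    required_extra = [
        p.name for p in rest
        if p.default is inspect.Parameter.empty and p.kind not in variadic
    ]
    if required_extra:
        raise BuilderSignatureError(f"{func.__name__}: extra required arguments {required_extra}")

    hints = get_type_hints(func)
    if first.name not in hints or "return" not in hints:
        raise BuilderSignatureError(f"{func.__name__}: input and return type hints are required")
    return hints[first.name], hints["return"]


def register(builders: dict[Any, Callable[..., Any]], func: Callable[..., Any]) -> None:
    """Store the builder under its input type, rejecting a second builder for it."""
    input_type, _ = check_builder(func)
    if input_type in builders:
        raise BuilderSignatureError(f"{func.__name__}: {input_type} already has a builder")
    builders[input_type] = func


def good(data: int, scale: int = 1) -> str:
    return str(data * scale)


def missing_hints(data):
    return data


def extra_required(data: int, other: int) -> str:
    return str(data + other)
```

Running such checks at registration time turns a mistake in a builder into an immediate error instead of a failure on the first matching input.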
Additional runtime checks:
- If validate_output=True is set (the default is False), schema-overseer-local uses pydantic to verify that the builder returns an object of the annotated type.
- By default, schema-overseer-local selects the builder for the first valid schema. However, if check_for_single_valid_schema=True is enabled, it ensures that only one schema is valid for the input data. If multiple schemas are found to be valid, a MultipleValidSchemasError is raised.
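The difference between the default "first valid schema" policy and the stricter single-schema check can be illustrated with a small stand-alone sketch. The function `select_schema` and both exception classes are hypothetical stand-ins, and simple predicates replace pydantic validation.

```python
from typing import Any, Callable, Mapping


class NoValidSchemaSketchError(Exception):
    """Hypothetical: no schema accepted the data."""


class MultipleValidSchemasSketchError(Exception):
    """Hypothetical stand-in for MultipleValidSchemasError."""


def select_schema(
    validators: Mapping[str, Callable[[Mapping[str, Any]], bool]],
    data: Mapping[str, Any],
    check_for_single_valid_schema: bool = False,
) -> str:
    """Return the name of the schema to use for `data`."""
    valid: list[str] = []
    for name, is_valid in validators.items():
        if is_valid(data):
            if not check_for_single_valid_schema:
                return name  # default policy: the first valid schema wins
            valid.append(name)
    if not valid:
        raise NoValidSchemaSketchError("no schema is valid for the input data")
    if len(valid) > 1:
        raise MultipleValidSchemasSketchError(f"ambiguous input, valid schemas: {valid}")
    return valid[0]


# Two overlapping schemas: every input accepted by "strict" is also accepted by "loose".
validators = {
    "loose": lambda d: "value" in d,
    "strict": lambda d: set(d) == {"value", "unit"},
}
ambiguous = {"value": 1, "unit": "kg"}
chosen_by_default = select_schema(validators, ambiguous)
```

With the default policy the ambiguous input silently resolves to whichever schema comes first, which is why the stricter check is useful while schemas are still being designed.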
Object as a source
The SchemaRegistry.build() method operates in two modes:
- Using dict-like objects as inputs and extracting fields with __getitem__. Use build(source_dict=...) for this option.
- Using objects with data as attributes and extracting fields with getattr. Use build(source_object=...) for this option.
source_dict and source_object are mutually exclusive.
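A rough sketch of the two extraction modes follows. The helper `extract_fields` is hypothetical, and raising ValueError when both or neither source is passed is an assumption made for the example; the library may signal this differently.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Mapping, Optional


def extract_fields(
    field_names: Iterable[str],
    *,
    source_dict: Optional[Mapping[str, Any]] = None,
    source_object: Optional[object] = None,
) -> dict[str, Any]:
    """Collect the named fields from exactly one of the two source kinds."""
    if (source_dict is None) == (source_object is None):
        raise ValueError("pass exactly one of source_dict or source_object")
    if source_dict is not None:
        # dict-like mode: fields are read with __getitem__
        return {name: source_dict[name] for name in field_names}
    # object mode: fields are read with getattr
    return {name: getattr(source_object, name) for name in field_names}


from_dict = extract_fields(["value"], source_dict={"value": 1})
from_object = extract_fields(["value"], source_object=SimpleNamespace(value=1))
```

Both calls produce the same field mapping, so everything downstream of extraction can be shared between the two modes.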
Use one of the input schemas as output
TODO
FAQ
Q: Why does this project exist? Isn't it too much overhead for such a simple task?
A: It depends on how many different formats you need to support. With only a few formats, schema-overseer-local would indeed be overhead. But in projects with many different formats, such an extensive adapter layer can be helpful. Another goal of schema-overseer-local is to serve as a fast and simple introduction to schema-overseer-service, which targets sophisticated use cases involving multiple teams and repositories.
Q: How is this project better than an adapter I can code quickly myself?
A: schema-overseer-local has three important benefits:
- it provides type checking;
- it has very detailed runtime checks;
- and it is easily extensible.
Q: Why do I have to use type hints in builders?
A: schema-overseer-local uses the same pattern as pydantic and FastAPI for input and output validation, both at runtime and in static analysis. This provides an extra layer of defense against code errors. Even if your code is not entirely correctly typed or is not checked with static analysis tools like mypy, the data is still validated.
Q: Should I re-use inner pydantic models in different data formats?
Code example
from pydantic import BaseModel

class InnerModel(BaseModel):
    value: int

class InputFormatV1(BaseModel):
    inner: InnerModel

class InnerModelV2(BaseModel):
    value: int

class InputFormatV2(BaseModel):
    inner: InnerModelV2  # or re-use InnerModel?
A: Not really. While it might be tempting to adhere to the DRY[^2] principle in this context, it's generally a better approach to fully separate nested pydantic models into distinct modules, avoiding their reuse even if they are identical.
The primary rationale is future code maintainability: tracking modifications in reused models can be challenging, and the introduction of a new format version could require changes to the inner model, which would then demand separation regardless.
[^1]: Adapter pattern
[^2]: Don't repeat yourself