Pydantic BaseModel with a stable, unique identifier of its schema and validation rules.
pydantic-identity
pydantic-identity provides a way to track the full recursive identity (schema “fingerprint”) of your Pydantic models as a 12-character hash. By storing this identifier along with your data, you can later tell whether two records (even deeply nested ones) were created under the same conditions: model structure, validation rules, documentation, etc.
Features
- Schema Hashing: Generate a stable hash of a model’s entire schema, recursively (includes nested models).
- Configurable Tracking: Choose whether to include things like model/field descriptions, field ordering, default values, union type ordering, relative file path, or custom data in the hash.
- Full Pydantic Compatibility: BaseIdentityModel inherits directly from pydantic.BaseModel, and does not alter its behavior or manipulate its model_config. You can safely swap pydantic.BaseModel for BaseIdentityModel anywhere you want.
- Caching: Automatically caches computed hashes for performance. A hash is computed only once per class definition, and only lazily, when it is first accessed.
Installation & Quick Start
pip install pydantic-identity
from pydantic_identity import BaseIdentityModel


class MyBaseModel(BaseIdentityModel):
    """I'm just like Pydantic BaseModel, but I can hash my schema."""

    foo: str = "I'm a default value, included in the schema hash."


print(MyBaseModel.model_schema_hash_get())  # Hash is computed and cached on the class
print(MyBaseModel().model_schema_hash)
Try this for yourself. You’ll get the same 12-character MD5 prefix hash:
221da6ebbb7d
221da6ebbb7d
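To make the phrase "12-character MD5 prefix" concrete, here is a minimal standard-library sketch of hashing a schema-like dictionary. It is not pydantic-identity's implementation: the `toy_schema` dictionary and the `short_hash` helper are hypothetical, and the real library builds its hash input from the full Pydantic schema.

```python
import hashlib
import json


def short_hash(payload: dict, length: int = 12) -> str:
    """Return the first `length` hex characters of the MD5 digest of `payload`."""
    # Serialize deterministically so that equal dictionaries give equal bytes.
    raw = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.md5(raw).hexdigest()[:length]


# Hypothetical stand-in for a model schema; not the library's real hash input.
toy_schema = {"name": "MyBaseModel", "fields": {"foo": {"type": "string"}}}

first = short_hash(toy_schema)
second = short_hash(dict(toy_schema))  # an equal copy hashes identically
changed = short_hash({**toy_schema, "name": "Other"})  # any change alters the hash
print(first, second, changed)
```

The point of the sketch is only that the same input always yields the same short string, while any change to the input yields a different one.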
Or, store an auto-populated hash on every model instance. This is efficient, because the hash is cached on the class.
class BaseModelWithSchemaId(BaseIdentityModel):
    """I'm just like Pydantic BaseModel, but I store my schema hash as a field"""

    schema_id: str = ""
    """The class's schema hash. Will be set automatically, if left unset."""

    def model_post_init(self, _):
        """Called automatically after an instance is created."""
        if not self.schema_id:
            self.schema_id = self.model_schema_hash_get()
Using the new base model you just created...
class MyModelWithHash(BaseModelWithSchemaId):
    x: int = 10
    y: str = "Hi"


print(MyModelWithHash().model_dump())
{'schema_id': '9e19ba08013a', 'x': 10, 'y': 'Hi'}
Why Schema Hashing?
If you’re working with complex systems, microservices, or large-scale data storage, you may want to:
- Compare two models from different codebases or versions to see if they still match.
- Validate that an incoming payload (for example, from a queue or an event stream) was generated by the exact model version you expect.
- Track in a database or metadata store that “Model X was hashed with these exact fields, definitions, and docstrings,” so any changes can be quickly detected.
By hashing the full schema of your Pydantic models—and all nested submodels—pydantic-identity ensures you can confirm that two references to “the same model” are truly using the same structure.
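As a rough illustration of the payload-validation use case above, the following sketch separates stored records by whether their `schema_id` matches the hash you expect. It uses plain dictionaries and only the standard library; the `split_by_schema` helper and the sample records are hypothetical, and in practice the expected value would come from your model class.

```python
def split_by_schema(records: list[dict], expected_id: str) -> tuple[list[dict], list[dict]]:
    """Split records into (matching, stale) based on their stored schema_id."""
    matching = [r for r in records if r.get("schema_id") == expected_id]
    stale = [r for r in records if r.get("schema_id") != expected_id]
    return matching, stale


# Hypothetical stored records; the last one was written under a different schema.
records = [
    {"schema_id": "9e19ba08013a", "x": 10, "y": "Hi"},
    {"schema_id": "9e19ba08013a", "x": 11, "y": "Ho"},
    {"schema_id": "000000000000", "x": 1, "y": "old"},
]

current, stale = split_by_schema(records, expected_id="9e19ba08013a")
print(len(current), len(stale))
```

Records in the stale group can then be migrated, re-validated, or rejected, depending on your application.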
Configuration
BaseIdentityModel offers class-level configuration variables to tune what gets included in the hash. For example:
class MyConfiguredModel(BaseIdentityModel):
    # Class configuration
    model_schema_hash_track_descriptions = True
    model_schema_hash_track_field_order = True
    model_schema_hash_track_type_order = False
    model_schema_hash_tracked_extra_data = {"some": "config"}
    model_schema_hash_limit_length = 16
    model_schema_hash_tracked_filepath_parts = 1
    model_schema_hash_track_validation_mode = True

    # Model fields
    a: int
    b: str = "default"
Below is a high-level overview of each setting:
- model_schema_hash_track_descriptions (bool): Whether to track Pydantic docstrings and field descriptions in the hash. Default: False.
- model_schema_hash_track_field_order (bool): Whether to track the ordering of fields. Default: False.
- model_schema_hash_track_type_order (bool): Whether to track the ordering of type union arguments, Literal[...] arguments, and other type hint lists. Default: False.
- model_schema_hash_tracked_extra_data (Any): Arbitrary JSON-serializable data to include in the hash. Example: environment variables, custom app configs, etc. Default: None.
- model_schema_hash_limit_length (int | None): The truncated length of the resulting hash string. None means use the full length (e.g., 32 characters for MD5). Default: 12.
- model_schema_hash_tracked_filepath_parts (int): The number of path segments (from the end of the file path) to include in the model’s “full name.” Renaming files can change the hash if you track them. Default: 2.
- model_schema_hash_function (Callable[[bytes], str]): The hashing function used for the schema. By default, MD5 is used. If you need a different algorithm, override this. Default: an MD5 hex wrapper.
- model_schema_hash_track_validation_mode (bool): By default, both serialization (always) and validation modes are used to build the schema. Disabling validation mode can speed things up slightly, at the risk of ignoring potential differences between serialization and validation schema references. Default: True.
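The settings above can be hard to picture in the abstract, so here is a small standard-library sketch of three of the ideas: order tracking, a hash function with the `Callable[[bytes], str]` shape, and length limiting. It is an illustration under assumptions, not the library's code: `toy_hash`, `md5_hex`, `sha256_hex`, and the field dictionaries are all hypothetical names.

```python
import hashlib
import json
from typing import Callable, Optional


def md5_hex(data: bytes) -> str:
    """A hash function with the Callable[[bytes], str] shape."""
    return hashlib.md5(data).hexdigest()


def sha256_hex(data: bytes) -> str:
    """An alternative with the same shape, using SHA-256."""
    return hashlib.sha256(data).hexdigest()


def toy_hash(
    fields: dict,
    *,
    track_field_order: bool,
    hash_function: Callable[[bytes], str] = md5_hex,
    limit_length: Optional[int] = 12,
) -> str:
    # When order is not tracked, sort keys so reordering fields has no effect.
    raw = json.dumps(fields, sort_keys=not track_field_order).encode("utf-8")
    digest = hash_function(raw)
    return digest if limit_length is None else digest[:limit_length]


# The same two fields, declared in a different order.
ab = {"a": "integer", "b": "string"}
ba = {"b": "string", "a": "integer"}

same_when_untracked = toy_hash(ab, track_field_order=False) == toy_hash(ba, track_field_order=False)
differs_when_tracked = toy_hash(ab, track_field_order=True) != toy_hash(ba, track_field_order=True)
full_sha256 = toy_hash(ab, track_field_order=False, hash_function=sha256_hex, limit_length=None)
print(same_when_untracked, differs_when_tracked, len(full_sha256))  # True True 64
```

Leaving order tracking off makes the hash tolerant of cosmetic reshuffling, while turning it on treats any reordering as a schema change.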
Advanced Usage
See the hash input data
To retrieve the exact input data that's being passed to the hashing function, use .model_schema_hash_get_input_data(). This returns a JSON object encoded as bytes.
raw_data: bytes = MyConfiguredModel.model_schema_hash_get_input_data()
import json
data = json.loads(raw_data.decode("utf-8"))
print(data)
{'name': 'test.MyConfiguredModel', 'schemas': {'ser_by_alias': {'proper
...
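Once you have decoded two such inputs (for example, from two versions of a model), comparing them shows which part of the schema changed. The sketch below is one possible way to do that with the standard library only; `diff_paths` is a hypothetical helper, and the two payloads are invented stand-ins rather than real output from the library.

```python
import json


def diff_paths(old, new, prefix: str = "") -> list[str]:
    """Return dotted paths whose values differ between two JSON-like objects."""
    if isinstance(old, dict) and isinstance(new, dict):
        paths = []
        for key in sorted(set(old) | set(new)):
            path = f"{prefix}.{key}" if prefix else key
            if key not in old or key not in new:
                paths.append(path)  # key was added or removed
            else:
                paths.extend(diff_paths(old[key], new[key], path))
        return paths
    # Leaves (and lists) are compared as whole values.
    return [] if old == new else [prefix]


# Hypothetical payloads standing in for two decoded hash inputs.
old_raw = b'{"name": "test.MyConfiguredModel", "fields": {"a": "integer", "b": "string"}}'
new_raw = b'{"name": "test.MyConfiguredModel", "fields": {"a": "number", "b": "string", "c": "boolean"}}'

changed_paths = diff_paths(json.loads(old_raw), json.loads(new_raw))
print(changed_paths)  # ['fields.a', 'fields.c']
```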
Extract Full Metadata
To get a report of general metadata about your schema, call .model_schema_identity_report():
info = MyConfiguredModel.model_schema_identity_report()
print(info.model_dump_json(indent=2))
{
"fullname": "test.MyConfiguredModel",
"date": "2025-03-17T13:28:59.122335Z",
"hash": "9ee658c9b78c0b97",
"hash_settings": {
"track_descriptions": true,
...
Manually Rebuild the Hash
If you ever mutate a model class or for whatever reason need to clear the cache, you can force a rebuild of the hash.
MyConfiguredModel.model_schema_hash_rebuild()
Multiple Inheritance & Caching
BaseIdentityModel supports multiple inheritance: the schema hash is cached per class, so each subclass keeps its own separate cache and never reuses a parent's hash.
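For readers curious how a lazy, per-class cache with a manual rebuild can work in plain Python, here is a toy sketch. It is an assumption-laden illustration, not how pydantic-identity is implemented: `CachedHashBase`, `get_hash`, `rebuild_hash`, and the placeholder "hash" strings are all hypothetical.

```python
class CachedHashBase:
    """Toy base class: caches one value per class, computed lazily."""

    compute_calls = 0  # counts how often the expensive step actually runs

    @classmethod
    def _compute(cls) -> str:
        CachedHashBase.compute_calls += 1
        return f"hash-of-{cls.__name__}"  # placeholder for a real schema hash

    @classmethod
    def get_hash(cls) -> str:
        # Check cls.__dict__ instead of using getattr, so a subclass never
        # picks up a value that was cached on one of its parents.
        if "_cached_hash" not in cls.__dict__:
            cls._cached_hash = cls._compute()
        return cls.__dict__["_cached_hash"]

    @classmethod
    def rebuild_hash(cls) -> str:
        # Drop this class's own cached value (if any) and compute it again.
        if "_cached_hash" in cls.__dict__:
            del cls._cached_hash
        return cls.get_hash()


class Parent(CachedHashBase):
    pass


class Left(Parent):
    pass


class Right(Parent):
    pass


class Child(Left, Right):  # multiple inheritance
    pass


parent_hash = Parent.get_hash()
Parent.get_hash()  # second call is served from the cache
child_hash = Child.get_hash()  # computed separately for the subclass
calls_after_lookups = CachedHashBase.compute_calls
Parent.rebuild_hash()  # forces one more computation
calls_after_rebuild = CachedHashBase.compute_calls
print(parent_hash, child_hash, calls_after_lookups, calls_after_rebuild)
```

Storing the cached value in each class's own namespace is what keeps a subclass from inheriting a stale value through the method resolution order.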
Testing
All tests are located under pydantic-identity/tests/ and use standard pytest:
pytest
(Or run them however you prefer.) The tests ensure caching works correctly, that each configuration knob is respected, and that advanced scenarios like multiple inheritance behave properly.
Contributing
Contributions, issues, and feature requests are welcome! Feel free to open an issue or submit a pull request.
- Fork the project
- Create your feature branch (git checkout -b feature/my-new-feature)
- Commit your changes (git commit -m 'Add some feature')
- Push to the branch (git push origin feature/my-new-feature)
- Open a new Pull Request
License
This project is licensed under the terms of the MIT License.
Enjoy hashing your Pydantic models with pydantic-identity!