Pydantic BaseModel with a stable, unique identifier of its schema and validation rules.

pydantic-identity

pydantic-identity tracks the full recursive identity (schema “fingerprint”) of your Pydantic models as a 12-character hash. By storing this identifier alongside your data, you can later tell whether two records (even deeply nested ones) were created under the same conditions: model structure, validation rules, documentation, etc.

Features

  • Schema Hashing: Generate a stable hash of a model’s entire schema, recursively (includes nested models).
  • Configurable Tracking: Choose whether to include things like model/field descriptions, field ordering, default values, union type ordering, relative file path, or custom data in the hash.
  • Full Pydantic Compatibility: BaseIdentityModel inherits directly from pydantic.BaseModel and neither alters its behavior nor manipulates its model_config. You can safely swap pydantic.BaseModel for BaseIdentityModel anywhere you want.
  • Caching: Computed hashes are cached automatically for performance. The hash is computed lazily on first access, and only once per class definition.
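The core idea behind schema hashing can be sketched with the standard library alone. This is not the package's implementation: `schema_fingerprint` and the hand-written schema dicts are hypothetical stand-ins, and the sketch assumes the schema is available as a JSON-serializable dict. It shows that a canonical serialization makes the hash stable across key order, while a change deep inside a nested model still changes the result.

```python
import hashlib
import json
from typing import Optional


def schema_fingerprint(schema: dict, length: Optional[int] = 12) -> str:
    """Hash a JSON-serializable schema dict into a short hex fingerprint.

    Hypothetical helper for illustration; not part of pydantic-identity.
    """
    # sort_keys makes the serialization independent of dict insertion order,
    # so two structurally equal schemas always produce the same bytes.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return digest if length is None else digest[:length]


# Hand-written stand-in for the schema of a model with one nested model.
schema_v1 = {
    "title": "Order",
    "properties": {
        "id": {"type": "integer"},
        "customer": {"properties": {"name": {"type": "string"}}},
    },
}
# Same structure, with keys inserted in a different order.
schema_v1_reordered = {
    "properties": {
        "customer": {"properties": {"name": {"type": "string"}}},
        "id": {"type": "integer"},
    },
    "title": "Order",
}
# A change deep inside the nested model.
schema_v2 = {
    "title": "Order",
    "properties": {
        "id": {"type": "integer"},
        "customer": {"properties": {"name": {"type": "integer"}}},
    },
}

fp_v1 = schema_fingerprint(schema_v1)
fp_v1_reordered = schema_fingerprint(schema_v1_reordered)
fp_v2 = schema_fingerprint(schema_v2)
print(fp_v1 == fp_v1_reordered)  # key order does not matter
print(fp_v1 == fp_v2)            # a nested change does
```

Sorting keys corresponds to ignoring ordering; a hash that tracks field order would have to serialize the fields in their declared order instead.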

Installation & Quick Start

pip install pydantic-identity

from pydantic_identity import BaseIdentityModel

class MyBaseModel(BaseIdentityModel):
    """I'm just like Pydantic BaseModel, but I can hash my schema."""

    foo: str = "I'm a default value, included in the schema hash."


print(MyBaseModel.model_schema_hash_get())  # Hash is computed and cached on the class
print(MyBaseModel().model_schema_hash)

Try this for yourself. You’ll get the same 12-character MD5 prefix hash:

221da6ebbb7d
221da6ebbb7d

Or, store an auto-populated hash on every model instance. This is efficient, because the hash is cached on the class.

class BaseModelWithSchemaId(BaseIdentityModel):
    """I'm just like Pydantic BaseModel, but I store my schema hash as a field"""

    schema_id: str = ""
    """The class's schema hash. Will be set automatically, if left unset."""

    def model_post_init(self, _):
        """Called automatically after an instance is created."""
        if not self.schema_id:
            self.schema_id = self.model_schema_hash_get()

Using the new base model you just created...

class MyModelWithHash(BaseModelWithSchemaId):
    x: int = 10
    y: str = "Hi"

print(MyModelWithHash().model_dump())

{'schema_id': '9e19ba08013a', 'x': 10, 'y': 'Hi'}

Why Schema Hashing?

If you’re working with complex systems, microservices, or large-scale data storage, you may want to:

  • Compare two models from different codebases or versions to see if they still match.
  • Validate that an incoming payload (for example, from a queue or an event stream) was generated by the exact model version you expect.
  • Track in a database or metadata store that “Model X was hashed with these exact fields, definitions, and docstrings,” so any changes can be quickly detected.

By hashing the full schema of your Pydantic models, including all nested submodels, pydantic-identity lets you confirm that two references to “the same model” are truly using the same structure.
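As a rough illustration of the payload-validation use case, the sketch below sorts incoming JSON messages by whether their stored schema_id matches the one you expect. The helper `split_by_schema` and the sample messages are hypothetical; the expected value is simply the example hash shown earlier, and in practice you would read it from your model class rather than hard-code it.

```python
import json

# Illustrative value only; in real code, take this from the model class.
EXPECTED_SCHEMA_ID = "9e19ba08013a"


def split_by_schema(raw_messages, expected_id):
    """Split decoded records into (matching, stale) by their schema_id.

    Hypothetical helper; records without a schema_id count as stale.
    """
    matching, stale = [], []
    for raw in raw_messages:
        record = json.loads(raw)
        if record.get("schema_id") == expected_id:
            matching.append(record)
        else:
            stale.append(record)
    return matching, stale


messages = [
    '{"schema_id": "9e19ba08013a", "x": 10, "y": "Hi"}',
    '{"schema_id": "000000000000", "x": 1, "y": "old"}',
    '{"x": 5, "y": "no id"}',
]
matching, stale = split_by_schema(messages, EXPECTED_SCHEMA_ID)
print(len(matching), "matching,", len(stale), "stale")
```

Stale records can then be routed to a migration step or rejected, depending on how strict the consumer needs to be.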

Configuration

BaseIdentityModel offers class-level configuration variables to tune what gets included in the hash. For example:

class MyConfiguredModel(BaseIdentityModel):
    # Class configuration
    model_schema_hash_track_descriptions = True
    model_schema_hash_track_field_order = True
    model_schema_hash_track_type_order = False
    model_schema_hash_tracked_extra_data = {"some": "config"}
    model_schema_hash_limit_length = 16
    model_schema_hash_tracked_filepath_parts = 1
    model_schema_hash_track_validation_mode = True
    # Model fields
    a: int
    b: str = "default"

Below is a high-level overview of each setting:

  • model_schema_hash_track_descriptions (bool)
    Whether to track model docstrings and field descriptions in the hash. Default: False.

  • model_schema_hash_track_field_order (bool)
    Whether to track the ordering of fields. Default: False.

  • model_schema_hash_track_type_order (bool)
    Whether to track the ordering of type union arguments, Literal[...] arguments, and other type hint lists. Default: False.

  • model_schema_hash_tracked_extra_data (Any)
    Arbitrary JSON-serializable data to include in the hash. Example: environment variables, custom app configs, etc. Default: None.

  • model_schema_hash_limit_length (int | None)
    The truncated length of the resulting hash string. None means use the full length (e.g., 32 characters for MD5). Default: 12.

  • model_schema_hash_tracked_filepath_parts (int)
    The number of path segments (from the end of the file path) to include in the model’s “full name.” Renaming files can change the hash if you track them. Default: 2.

  • model_schema_hash_function (Callable[[bytes], str])
    The hashing function applied to the schema bytes. Override this if you need an algorithm other than MD5. Default: an MD5 hex wrapper.

  • model_schema_hash_track_validation_mode (bool)
    Whether to build the schema in validation mode in addition to serialization mode, which is always used. Disabling validation mode can speed things up slightly, at the risk of missing differences between the serialization and validation schemas. Default: True.
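To make the Callable[[bytes], str] shape of the hash function and the effect of the length limit concrete, here is a small standalone sketch. The names `md5_hex`, `sha256_hex`, and `truncated_hash` are assumptions for illustration, not the package's API.

```python
import hashlib
from typing import Callable, Optional

# The same shape as the documented setting: bytes in, hex string out.
HashFunction = Callable[[bytes], str]


def md5_hex(data: bytes) -> str:
    """An MD5 hex wrapper, similar in spirit to the documented default."""
    return hashlib.md5(data).hexdigest()


def sha256_hex(data: bytes) -> str:
    """A drop-in alternative with the same signature."""
    return hashlib.sha256(data).hexdigest()


def truncated_hash(
    data: bytes,
    hash_function: HashFunction = md5_hex,
    limit_length: Optional[int] = 12,
) -> str:
    """Apply the hash function, then keep only the first limit_length characters."""
    digest = hash_function(data)
    return digest if limit_length is None else digest[:limit_length]


payload = b'{"name": "example"}'
print(truncated_hash(payload))                     # 12-character MD5 prefix
print(truncated_hash(payload, limit_length=None))  # full 32-character MD5
print(truncated_hash(payload, sha256_hex, 16))     # 16-character SHA-256 prefix
```

The text above does not show how an override is declared on the class, so check the package before wiring in a function like `sha256_hex`. Note also that a shorter prefix raises the chance of two different schemas sharing a hash.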


Advanced Usage

See the hash input data

To retrieve the exact input data that is passed to the hashing function, use .model_schema_hash_get_input_data(). This returns a JSON object as bytes.

import json

raw_data: bytes = MyConfiguredModel.model_schema_hash_get_input_data()
data = json.loads(raw_data.decode("utf-8"))  # decode the bytes into a dict
print(data)

{'name': 'test.MyConfiguredModel', 'schemas': {'ser_by_alias': {'proper
...
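When two hashes differ, comparing the decoded input data of both versions is one way to find out why. The sketch below is a generic recursive diff over decoded JSON; `changed_paths` is a hypothetical helper, and the two byte payloads are made up for illustration and only loosely follow the truncated output above.

```python
import json


def changed_paths(old, new, prefix=""):
    """Return dotted paths whose values differ between two decoded JSON values."""
    if isinstance(old, dict) and isinstance(new, dict):
        paths = []
        for key in sorted(set(old) | set(new)):
            path = f"{prefix}.{key}" if prefix else key
            if key not in old or key not in new:
                # Key was added or removed entirely.
                paths.append(path)
            else:
                paths.extend(changed_paths(old[key], new[key], path))
        return paths
    # Leaves (and mismatched types) are compared by plain equality.
    return [] if old == new else [prefix]


# Made-up payloads standing in for the hash input of two model versions.
old_raw = b'{"name": "test.Model", "schemas": {"a": {"type": "integer"}, "b": {"type": "string"}}}'
new_raw = b'{"name": "test.Model", "schemas": {"a": {"type": "number"}, "c": {"type": "string"}}}'

diff = changed_paths(json.loads(old_raw), json.loads(new_raw))
print(diff)  # ['schemas.a.type', 'schemas.b', 'schemas.c']
```

Lists are treated as leaves here, so any change inside a list reports the path of the whole list.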

Extract Full Metadata

To get a report of general metadata about your schema, call .model_schema_identity_report():

info = MyConfiguredModel.model_schema_identity_report()
print(info.model_dump_json(indent=2))

{
  "fullname": "test.MyConfiguredModel",
  "date": "2025-03-17T13:28:59.122335Z",
  "hash": "9ee658c9b78c0b97",
  "hash_settings": {
    "track_descriptions": true,
    ...

Manually Rebuild the Hash

If you mutate a model class after its hash has been computed, or otherwise need to clear the cache, you can force the hash to be rebuilt.

MyConfiguredModel.model_schema_hash_rebuild()

Multiple Inheritance & Caching

BaseIdentityModel supports multiple inheritance: the schema hash is cached per subclass, so each subclass has its own separate cache and never reuses a hash cached on one of its parents.
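One common way to get a separate cache per subclass is to look up the cached value in the class's own `__dict__` instead of through normal attribute lookup, which would find a parent's cache. The toy classes below illustrate that pattern, including lazy computation and a forced rebuild. Everything here (`FingerprintedBase`, `FIELDS`, `fingerprint`, `rebuild_fingerprint`) is hypothetical and may differ from how pydantic-identity does it internally.

```python
import hashlib


class FingerprintedBase:
    """Toy base class with a lazily computed fingerprint cached per class."""

    FIELDS: dict = {}
    compute_count = 0  # counts real computations, for demonstration only

    @classmethod
    def all_fields(cls) -> dict:
        """Merge FIELDS from every class in the MRO, most derived last."""
        merged = {}
        for klass in reversed(cls.__mro__):
            merged.update(vars(klass).get("FIELDS", {}))
        return merged

    @classmethod
    def fingerprint(cls) -> str:
        # Read from cls.__dict__ so a subclass never sees a parent's cache.
        cached = cls.__dict__.get("_fingerprint_cache")
        if cached is None:
            FingerprintedBase.compute_count += 1
            payload = f"{cls.__qualname__}|{sorted(cls.all_fields().items())}"
            cached = hashlib.md5(payload.encode("utf-8")).hexdigest()[:12]
            cls._fingerprint_cache = cached  # stored on this class only
        return cached

    @classmethod
    def rebuild_fingerprint(cls) -> str:
        """Drop this class's cached value and compute it again."""
        if "_fingerprint_cache" in cls.__dict__:
            del cls._fingerprint_cache
        return cls.fingerprint()


class A(FingerprintedBase):
    FIELDS = {"a": "int"}


class B(FingerprintedBase):
    FIELDS = {"b": "str"}


class C(A, B):
    FIELDS = {"c": "float"}


first = C.fingerprint()   # computed now
second = C.fingerprint()  # served from C's own cache
a_fp = A.fingerprint()    # A computes and caches its own value
print(first == second, a_fp != first, FingerprintedBase.compute_count)
```

Because the cache is never invalidated automatically, changing `C.FIELDS` after the first call leaves the old fingerprint in place until `rebuild_fingerprint()` is called, which is the situation a manual rebuild is meant for.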


Testing

All tests are located under pydantic-identity/tests/ and use standard pytest:

pytest

(Or run them however you prefer.) The tests ensure caching works correctly, that each configuration knob is respected, and that advanced scenarios like multiple inheritance behave properly.


Contributing

Contributions, issues, and feature requests are welcome! Feel free to open an issue or submit a pull request.

  1. Fork the project
  2. Create your feature branch (git checkout -b feature/my-new-feature)
  3. Commit your changes (git commit -m 'Add some feature')
  4. Push to the branch (git push origin feature/my-new-feature)
  5. Open a new Pull Request

License

This project is licensed under the terms of the MIT License.

Enjoy hashing your Pydantic models with pydantic-identity!
