Skip to main content

Data models utilized in the FABLE ecosystem's HTTP-based PPRL service.

Project description

PyPI Python Versions Code Coverage License

FABLE Model

This package contains model classes that are used in the FABLE (Federated Anonymized Bloom filter Linkage Engine) ecosystem's PPRL service for validation purposes. They were developed with the intention of creating an HTTP-based service for Bloom filter-based record linkage. It includes models for the service's data transformation, masking and bit vector matching routines. Validation, serialization and deserialization are done using Pydantic. It is rare to use this package directly. Rather, it powers the functionalities of other packages.

Data models

Models for entity pre-processing, masking and bit vector matching are exposed through this package. The following examples are taken from the test suites of the FABLE PPRL service package and show additional validation steps in addition to the ones native to Pydantic.

Entity transformation

from fable_model import (
    EntityTransformRequest,
    TransformConfig,
    EmptyValueHandling,
    AttributeValueEntity,
    AttributeTransformerConfig,
    NumberTransformer,
    GlobalTransformerConfig,
    NormalizationTransformer,
    CharacterFilterTransformer,
)

# This is a valid config.
_ = EntityTransformRequest(
    config=TransformConfig(empty_value=EmptyValueHandling.ignore),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "bar1": "  12.345  ",
                "bar2": "  12.345  "
            }
        )
    ],
    attribute_transformers=[
        AttributeTransformerConfig(
            attribute_name="bar1",
            transformers=[
                NumberTransformer(decimal_places=2)
            ]
        )
    ],
    global_transformers=GlobalTransformerConfig(
        before=[
            NormalizationTransformer()
        ],
        after=[
            CharacterFilterTransformer(characters=".")
        ]
    )
)

from uuid import uuid4

# Validation will fail since no transformers have been defined.
_ = EntityTransformRequest(
    config=TransformConfig(empty_value=EmptyValueHandling.ignore),
    entities=[
        AttributeValueEntity(
            id=str(uuid4()),
            attributes={
                "foo": "bar"
            }
        )
    ],
    attribute_transformers=[]
)
# => ValidationError: attribute and global transformers are empty: must contain at least one

Entity masking

from fable_model import (
    EntityMaskRequest,
    MaskConfig,
    HashConfig,
    HashFunction,
    HashAlgorithm,
    DoubleHash,
    CLKFilter,
    AttributeValueEntity,
    StaticAttributeConfig,
    AttributeSalt,
    CLKRBFFilter,
)

# This is a valid config.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKFilter(filter_size=1024, hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "John",
                "last_name": "Doe",
                "date_of_birth": "1987-06-05",
                "gender": "m"
            }
        )
    ]
)

# This is an invalid config since salting an attribute can only be done through a fixed value
# or another attribute on an entity, not both at the same time.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKFilter(filter_size=1024, hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "foobar",
                "salt": "0123456789"
            }
        )
    ],
    attributes=[
        StaticAttributeConfig(
            attribute_name="first_name",
            salt=AttributeSalt(
                value="my_salt",
                attribute="salt"
            )
        )
    ]
)
# => ValidationError: value and attribute cannot be set at the same time

# This also fails if neither a static value nor an attribute are set for salting.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKFilter(filter_size=1024, hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "foobar",
                "salt": "0123456789"
            }
        )
    ],
    attributes=[
        StaticAttributeConfig(
            attribute_name="first_name",
            salt=AttributeSalt()
        )
    ]
)
# => ValidationError: neither value nor attribute is set

# When using a weighted filter (RBF, CLKRBF), an error will be thrown if any attribute configuration
# provided is static, not weighted. The same applies vice versa, meaning if CLK is specified as a filter and
# weighted attribute configurations are provided.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKRBFFilter(hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "foobar",
                "salt": "0123456789"
            }
        )
    ],
    attributes=[
        StaticAttributeConfig(
            attribute_name="first_name",
            salt=AttributeSalt(value="my_salt")
        )
    ]
)
# => ValidationError: `clkrbf` filters require weighted attribute configurations, but static ones were found

# Weighted filters (RBF, CLKRBF) always require weighted attribute configurations. If none
# are provided, validation fails.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKRBFFilter(hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "foobar",
                "salt": "0123456789"
            }
        )
    ]
)
# => ValidationError: `clkrbf` filters require weighted attribute configurations, but none were found

# If a configuration is provided for an attribute that doesn't exist on some entities, validation fails.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKFilter(filter_size=1024, hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "foobar"
            }
        )
    ],
    attributes=[
        StaticAttributeConfig(
            attribute_name="last_name",
            salt=AttributeSalt(value="my_salt")
        )
    ]
)
# => ValidationError: some configured attributes are not present on entities: `last_name` on entities with ID `001`

Bit vector matching

from fable_model import VectorMatchRequest, MatchConfig, SimilarityMeasure, BitVectorEntity

_ = VectorMatchRequest(
    config=MatchConfig(
        measure=SimilarityMeasure.jaccard,
        threshold=0.8
    ),
    domain=[
        BitVectorEntity(
            id="D001",
            value="kY7yXn+rmp8L0nyGw5NlMw=="
        )
    ],
    range=[
        BitVectorEntity(
            id="R001",
            value="qig0C1i8YttqhPwo4VqLlg=="
        )
    ]
)

License

MIT.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fable_model-0.1.7.tar.gz (8.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

fable_model-0.1.7-py3-none-any.whl (9.1 kB view details)

Uploaded Python 3

File details

Details for the file fable_model-0.1.7.tar.gz.

File metadata

  • Download URL: fable_model-0.1.7.tar.gz
  • Upload date:
  • Size: 8.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fable_model-0.1.7.tar.gz
Algorithm Hash digest
SHA256 39255534fc687d17aba6e91649669ccb5c7972fee74498f2617f183e3acb6be0
MD5 ef5cf9f83e9328fd32600326957b532a
BLAKE2b-256 7f2bf661274f20a02f8be619a200f6ec8d3eaf3016a5bba97bac4a2d43639bbc

See more details on using hashes here.

Provenance

The following attestation bundles were made for fable_model-0.1.7.tar.gz:

Publisher: publish.yml on ul-mds/fable-model

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fable_model-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: fable_model-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 9.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fable_model-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 8bb8ecc3c384738058e6496583dd85ae2042bebd8a8cac282adb887d6e96f03c
MD5 381aaa41ddc8f2d82bfc5699aaf5bd7f
BLAKE2b-256 95177ec555413182d7a5274ec4f1cb3fd493d0d2181945370f7b66496a2554f4

See more details on using hashes here.

Provenance

The following attestation bundles were made for fable_model-0.1.7-py3-none-any.whl:

Publisher: publish.yml on ul-mds/fable-model

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page