Data models utilized in the FABLE ecosystem's HTTP-based PPRL service.
FABLE Model
This package contains the model classes that the PPRL (privacy-preserving record linkage) service of the FABLE (Federated Anonymized Bloom filter Linkage Engine) ecosystem uses for validation. The models were developed for an HTTP-based service for Bloom filter-based record linkage and cover the service's data transformation, masking and bit vector matching routines. Validation, serialization and deserialization are handled by Pydantic. This package is rarely used directly; rather, it underpins other packages in the ecosystem.
Data models
This package exposes models for entity pre-processing, masking and bit vector matching. The following examples are taken from the test suites of the FABLE PPRL service package and show validation steps that go beyond the ones native to Pydantic.
Entity transformation
from fable_model import (
    EntityTransformRequest,
    TransformConfig,
    EmptyValueHandling,
    AttributeValueEntity,
    AttributeTransformerConfig,
    NumberTransformer,
    GlobalTransformerConfig,
    NormalizationTransformer,
    CharacterFilterTransformer,
)
# This is a valid config.
_ = EntityTransformRequest(
    config=TransformConfig(empty_value=EmptyValueHandling.ignore),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "bar1": " 12.345 ",
                "bar2": " 12.345 "
            }
        )
    ],
    attribute_transformers=[
        AttributeTransformerConfig(
            attribute_name="bar1",
            transformers=[
                NumberTransformer(decimal_places=2)
            ]
        )
    ],
    global_transformers=GlobalTransformerConfig(
        before=[
            NormalizationTransformer()
        ],
        after=[
            CharacterFilterTransformer(characters=".")
        ]
    )
)
from uuid import uuid4
# Validation will fail since no transformers have been defined.
_ = EntityTransformRequest(
    config=TransformConfig(empty_value=EmptyValueHandling.ignore),
    entities=[
        AttributeValueEntity(
            id=str(uuid4()),
            attributes={
                "foo": "bar"
            }
        )
    ],
    attribute_transformers=[]
)
# => ValidationError: attribute and global transformers are empty: must contain at least one
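To make the valid request above more concrete, the following standard-library sketch shows one plausible reading of how such a transformer chain could be applied to an entity. It is not the FABLE implementation: the ordering (global `before` steps, then attribute-specific steps, then global `after` steps) is inferred from the field names, and the helper functions `normalize`, `format_number`, `drop_characters` and `transform`, as well as their exact behavior, are assumptions made purely for illustration.

```python
import unicodedata
from decimal import Decimal, ROUND_HALF_UP


def normalize(value: str) -> str:
    """Assumed normalization: Unicode NFKD, lowercase, collapse whitespace."""
    value = unicodedata.normalize("NFKD", value)
    return " ".join(value.lower().split())


def format_number(value: str, decimal_places: int) -> str:
    """Assumed number handling: round half up to a fixed number of decimals."""
    quantum = Decimal(1).scaleb(-decimal_places)  # e.g. 1E-2 for two decimals
    return str(Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP))


def drop_characters(value: str, characters: str) -> str:
    """Assumed character filter: remove every listed character."""
    return "".join(ch for ch in value if ch not in characters)


def transform(attributes, attribute_steps, before, after):
    """Apply `before`, then per-attribute, then `after` steps to each value."""
    result = {}
    for name, value in attributes.items():
        for step in before + attribute_steps.get(name, []) + after:
            value = step(value)
        result[name] = value
    return result


transformed = transform(
    {"bar1": " 12.345 ", "bar2": " 12.345 "},
    attribute_steps={"bar1": [lambda v: format_number(v, 2)]},
    before=[normalize],
    after=[lambda v: drop_characters(v, ".")],
)
print(transformed)
```

Under these assumptions, only `bar1` is rounded because it is the only attribute with its own transformer, while the global steps touch both attributes.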
Entity masking
from fable_model import (
    EntityMaskRequest,
    MaskConfig,
    HashConfig,
    HashFunction,
    HashAlgorithm,
    DoubleHash,
    CLKFilter,
    AttributeValueEntity,
    StaticAttributeConfig,
    AttributeSalt,
    CLKRBFFilter,
)
# This is a valid config.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKFilter(filter_size=1024, hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "John",
                "last_name": "Doe",
                "date_of_birth": "1987-06-05",
                "gender": "m"
            }
        )
    ]
)
# This is an invalid config since salting an attribute can only be done through a fixed value
# or another attribute on an entity, not both at the same time.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKFilter(filter_size=1024, hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "foobar",
                "salt": "0123456789"
            }
        )
    ],
    attributes=[
        StaticAttributeConfig(
            attribute_name="first_name",
            salt=AttributeSalt(
                value="my_salt",
                attribute="salt"
            )
        )
    ]
)
# => ValidationError: value and attribute cannot be set at the same time
# This also fails if neither a static value nor an attribute is set for salting.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKFilter(filter_size=1024, hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "foobar",
                "salt": "0123456789"
            }
        )
    ],
    attributes=[
        StaticAttributeConfig(
            attribute_name="first_name",
            salt=AttributeSalt()
        )
    ]
)
# => ValidationError: neither value nor attribute is set
# When using a weighted filter (RBF, CLKRBF), validation fails if any attribute configuration
# provided is static instead of weighted. The reverse also holds: validation fails if CLK is
# specified as the filter and weighted attribute configurations are provided.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKRBFFilter(hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "foobar",
                "salt": "0123456789"
            }
        )
    ],
    attributes=[
        StaticAttributeConfig(
            attribute_name="first_name",
            salt=AttributeSalt(value="my_salt")
        )
    ]
)
# => ValidationError: `clkrbf` filters require weighted attribute configurations, but static ones were found
# Weighted filters (RBF, CLKRBF) always require weighted attribute configurations. If none
# are provided, validation fails.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKRBFFilter(hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "foobar",
                "salt": "0123456789"
            }
        )
    ]
)
# => ValidationError: `clkrbf` filters require weighted attribute configurations, but none were found
# If a configuration is provided for an attribute that doesn't exist on some entities, validation fails.
_ = EntityMaskRequest(
    config=MaskConfig(
        token_size=2,
        hash=HashConfig(
            function=HashFunction(algorithms=[HashAlgorithm.sha1]),
            strategy=DoubleHash()
        ),
        filter=CLKFilter(filter_size=1024, hash_values=5),
        padding="_"
    ),
    entities=[
        AttributeValueEntity(
            id="001",
            attributes={
                "first_name": "foobar"
            }
        )
    ],
    attributes=[
        StaticAttributeConfig(
            attribute_name="last_name",
            salt=AttributeSalt(value="my_salt")
        )
    ]
)
# => ValidationError: some configured attributes are not present on entities: `last_name` on entities with ID `001`
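For readers unfamiliar with what parameters such as `token_size`, `padding`, `filter_size` and `hash_values` control, here is a minimal standard-library sketch of a CLK-style Bloom filter built with double hashing. It is an illustration of the general technique, not the service's implementation: the function names `tokenize`, `double_hash_positions` and `clk` are hypothetical, and deriving both base hashes from the two halves of a single SHA-1 digest is an assumption made for this example.

```python
import hashlib


def tokenize(value: str, token_size: int, padding: str) -> set[str]:
    """Split a padded value into overlapping tokens of length token_size."""
    pad = padding * (token_size - 1)
    padded = f"{pad}{value}{pad}"
    return {padded[i:i + token_size] for i in range(len(padded) - token_size + 1)}


def double_hash_positions(token: str, hash_values: int, filter_size: int) -> list[int]:
    """Derive hash_values bit positions as (h1 + i * h2) mod filter_size."""
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    # Assumption: both base hashes are taken from halves of one SHA-1 digest.
    h1 = int.from_bytes(digest[:10], "big")
    h2 = int.from_bytes(digest[10:], "big")
    return [(h1 + i * h2) % filter_size for i in range(hash_values)]


def clk(attributes: dict[str, str], token_size: int = 2, padding: str = "_",
        filter_size: int = 1024, hash_values: int = 5) -> list[int]:
    """Insert the tokens of all attribute values into one shared bit vector."""
    bits = [0] * filter_size
    for value in attributes.values():
        for token in tokenize(value, token_size, padding):
            for position in double_hash_positions(token, hash_values, filter_size):
                bits[position] = 1
    return bits


bits = clk({
    "first_name": "John",
    "last_name": "Doe",
    "date_of_birth": "1987-06-05",
    "gender": "m",
})
print(f"{sum(bits)} of {len(bits)} bits set")
```

Because every attribute is written into the same bit vector, a CLK needs no per-attribute weights, which is consistent with the static attribute configurations used in the CLK examples above.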
Bit vector matching
from fable_model import VectorMatchRequest, MatchConfig, SimilarityMeasure, BitVectorEntity
_ = VectorMatchRequest(
    config=MatchConfig(
        measure=SimilarityMeasure.jaccard,
        threshold=0.8
    ),
    domain=[
        BitVectorEntity(
            id="D001",
            value="kY7yXn+rmp8L0nyGw5NlMw=="
        )
    ],
    range=[
        BitVectorEntity(
            id="R001",
            value="qig0C1i8YttqhPwo4VqLlg=="
        )
    ]
)
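As a rough illustration of what a match request like this asks for, the sketch below compares bit vectors with the Jaccard coefficient and keeps pairs at or above a threshold. It assumes that the `value` fields are Base64-encoded bit vectors, which the example values suggest but the text does not state. The helpers `jaccard` and `match` are hypothetical names, and treating two empty vectors as having a similarity of 0 is a convention chosen here, not something taken from the service.

```python
import base64


def jaccard(left_b64: str, right_b64: str) -> float:
    """Jaccard coefficient of two Base64-encoded bit vectors."""
    left = int.from_bytes(base64.b64decode(left_b64), "big")
    right = int.from_bytes(base64.b64decode(right_b64), "big")
    union = bin(left | right).count("1")
    if union == 0:
        return 0.0  # assumed convention for two empty vectors
    return bin(left & right).count("1") / union


def match(domain, range_, threshold):
    """Return (domain_id, range_id, similarity) for all pairs >= threshold."""
    return [
        (d_id, r_id, score)
        for d_id, d_value in domain
        for r_id, r_value in range_
        if (score := jaccard(d_value, r_value)) >= threshold
    ]


pairs = match(
    domain=[("D001", "kY7yXn+rmp8L0nyGw5NlMw==")],
    range_=[("R001", "qig0C1i8YttqhPwo4VqLlg==")],
    threshold=0.8,
)
print(pairs)
```

Comparing every domain vector with every range vector is quadratic in the number of entities, which is acceptable for a sketch but would likely need blocking or indexing at scale.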
License
MIT.