# sz_semantics

Transform JSON output from the Senzing SDK for use with graph technologies, semantics, and downstream LLM integration.
## Install

This library uses [Poetry](https://python-poetry.org/) for dependency management. To set up the demos:

```bash
poetry update
```

Otherwise, to use the library:

```bash
pip install sz_semantics
```

For the gRPC server, if you don't already have Senzing and its gRPC server installed, pull the latest Docker container:

```bash
docker pull senzing/serve-grpc:latest
```
## Usage: Masking PII

Mask the PII values within Senzing JSON output with tokens which can be substituted back later. For example, mask PII values before calling a remote service (such as an LLM-based chat), then unmask the returned text after the round trip, to maintain data privacy.

```python
import json
from sz_semantics import Mask

data: dict = { "ENTITY_NAME": "Robert Smith" }

sz_mask: Mask = Mask()
masked_data: dict = sz_mask.mask_data(data)
masked_text: str = json.dumps(masked_data)
print(masked_text)

unmasked: str = sz_mask.unmask_text(masked_text)
print(unmasked)
```
For an example, run the demo1.py script with a data file which captures Senzing JSON output:

```bash
poetry run python3 demo1.py data/get.json
```
The two lists `Mask.KNOWN_KEYS` and `Mask.MASKED_KEYS` enumerate, respectively:

  - keys for known elements which do not require masking
  - keys for PII elements which require masking

Any other keys encountered will be masked by default and reported as warnings in the logging. Adjust these lists as needed for a given use case.
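To make the policy concrete, here is a conceptual sketch of the masking behavior described above. This is NOT the `sz_semantics` implementation: the `KNOWN_KEYS` and `MASKED_KEYS` lists and the `PII_*` token format below are illustrative stand-ins, and the real library keeps its substitution map inside the `Mask` class.

```python
# Conceptual sketch of a mask-by-default policy: known keys pass
# through, everything else gets tokenized, unknown keys get logged.
import logging

KNOWN_KEYS: list = [ "ENTITY_ID", "RECORD_ID" ]      # illustrative
MASKED_KEYS: list = [ "ENTITY_NAME", "DATE_OF_BIRTH" ]  # illustrative

def mask_value (key: str, value: str, counter: dict, mapping: dict) -> str:
    """Replace a PII value with a token, remembering the substitution."""
    if key in KNOWN_KEYS:
        return value

    if key not in MASKED_KEYS:
        logging.warning("unrecognized key masked by default: %s", key)

    counter["n"] += 1
    token: str = f"PII_{counter['n']}"
    mapping[token] = value
    return token

counter: dict = { "n": 0 }
mapping: dict = {}

print(mask_value("ENTITY_ID", "1001", counter, mapping))        # passes through
print(mask_value("ENTITY_NAME", "Robert Smith", counter, mapping))  # tokenized
```

The retained `mapping` is what makes the later unmasking step possible: substituting each token back for its original value after the round trip.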
For work with large numbers of entities, subclass `KeyValueStore` to provide a distributed key/value store (replacing the default, which uses the Python built-in `dict`) for scale-out.
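As a sketch of that scale-out pattern, the example below backs the store with SQLite instead of a `dict`. The `KeyValueStore` base class and its `get`/`set` interface here are stand-ins written for illustration -- check the actual `sz_semantics.KeyValueStore` signature before subclassing; a real distributed deployment would swap SQLite for something like Redis.

```python
# Sketch: swapping the in-memory dict store for an external backend.
import sqlite3

class KeyValueStore:
    """Stand-in base class; the library default wraps a Python dict."""
    def __init__ (self) -> None:
        self.store: dict = {}

    def get (self, key: str) -> str:
        return self.store[key]

    def set (self, key: str, value: str) -> None:
        self.store[key] = value

class SqliteKeyValueStore (KeyValueStore):
    """Back the token/value map with SQLite instead of a dict."""
    def __init__ (self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)"
        )

    def get (self, key: str) -> str:
        row = self.conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        return row[0]

    def set (self, key: str, value: str) -> None:
        self.conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))
        self.conn.commit()

kvs = SqliteKeyValueStore()
kvs.set("PII_1", "Robert Smith")
print(kvs.get("PII_1"))
```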
## Usage: gRPC Client/Server

To use `SzClient` to simplify access to the Senzing SDK, first launch the serve-grpc container and run it in the background:

```bash
docker run -it --publish 8261:8261 --rm senzing/serve-grpc
```
For example code which runs entity resolution on the "truthset" collection of datasets:

```python
import pathlib
import tomllib
import typing

from sz_semantics import SzClient

with open(pathlib.Path("config.toml"), mode = "rb") as fp:
    config: dict = tomllib.load(fp)

data_sources: typing.Dict[ str, str ] = {
    "CUSTOMERS": "data/truth/customers.json",
    "WATCHLIST": "data/truth/watchlist.json",
    "REFERENCE": "data/truth/reference.json",
}

sz: SzClient = SzClient(config, data_sources)
sz.entity_resolution(data_sources)

for ent_json in sz.sz_engine.export_json_entity_report_iterator():
    print(ent_json)
```
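The example above reads its settings from a config.toml file. The actual schema is defined by this project's demos, not documented here, so the fragment below is purely hypothetical -- it assumes the one setting the client clearly needs, the gRPC endpoint published by the container:

```toml
# Hypothetical config.toml fragment -- verify key names against the
# project's own demo config before use.
[grpc]
url = "localhost:8261"
```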
For a demo of running entity resolution on the "truthset", run the demo2.py script:

```bash
poetry run python3 demo2.py
```

This produces the export.json file, which is JSONL representing the results of a "get entity" call on each resolved entity.
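Consuming that JSONL export is a one-line-per-entity loop. The sketch below uses an in-memory stand-in for the file, and assumes the standard Senzing "get entity" layout in which each document carries a `RESOLVED_ENTITY` object -- adjust the keys if your export differs:

```python
# Minimal sketch of consuming export.json: each line is one JSON
# document describing a resolved entity.
import io
import json

# stand-in for: open("export.json", "r", encoding = "utf-8")
fp = io.StringIO(
    '{"RESOLVED_ENTITY": {"ENTITY_ID": 1, "ENTITY_NAME": "Robert Smith"}}\n'
)

for line in fp:
    entity: dict = json.loads(line)
    resolved: dict = entity["RESOLVED_ENTITY"]
    print(resolved["ENTITY_ID"], resolved["ENTITY_NAME"])
```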
Note: to show the redo processing, be sure to restart the container each time before re-running the demo2.py script -- although the entity resolution results will be the same even without a container restart.
## Usage: Semantic Representation

Starting with a small SKOS-based taxonomy in the domain.ttl file, parse the Senzing entity resolution (ER) results to generate an RDFlib semantic graph. In other words, generate the "backbone" for constructing an Entity Resolved Knowledge Graph, as a core component of a semantic layer.
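For readers new to SKOS, a minimal taxonomy in Turtle looks like the following. This is an illustrative fragment, not the contents of the actual domain.ttl file, and the `ex:` concepts are invented for the example:

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <https://example.org/taxonomy#> .

ex:Person a skos:Concept ;
    skos:prefLabel "person"@en .

ex:Customer a skos:Concept ;
    skos:prefLabel "customer"@en ;
    skos:broader ex:Person .
```

The `skos:broader` links are what give the thesaurus its hierarchical "backbone" once ER results get attached to the concepts.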
The example code below serializes the thesaurus generated from Senzing ER results as thesaurus.ttl, combined with the Senzing taxonomy definitions, which can be used for constructing knowledge graphs:
```python
import pathlib

from sz_semantics import Thesaurus

thesaurus: Thesaurus = Thesaurus()
thesaurus.load_source(Thesaurus.DOMAIN_TTL)

export_path: pathlib.Path = pathlib.Path("data/truth/export.json")

with open(export_path, "r", encoding = "utf-8") as fp_json:
    for line in fp_json:
        for rdf_frag in thesaurus.parse_iter(line, language = "en"):
            thesaurus.load_source_text(
                Thesaurus.RDF_PREAMBLE + rdf_frag,
                format = "turtle",
            )

thesaurus_path: pathlib.Path = pathlib.Path("thesaurus.ttl")
thesaurus.save_source(thesaurus_path, format = "turtle")
```
For an example, run the demo3.py script to process the JSON file data/truth/export.json which captures Senzing ER exported results:

```bash
poetry run python3 demo3.py data/truth/export.json
```

Check the resulting RDF definitions in the generated thesaurus.ttl file.
## License and Copyright

Source code for sz_semantics plus any logo, documentation, and examples have an MIT license, which is succinct and simplifies use in commercial applications. All materials herein are Copyright © 2025 Senzing, Inc.

Kudos to @brianmacy, @jbutcher21, @docktermj, @cj2001, @jesstalisman-ia, and the kind folks at GraphGeeks for their support.