Zink lets you safeguard privacy by detecting sensitive information and replacing it with secure, customizable placeholders.

ZINK (Zero-shot Ink)

ZINK is a Python package designed for zero-shot anonymization of entities within unstructured text data. It allows you to redact or replace sensitive information based on specified entity labels.

Update

As of version 0.4, ZINK is moving from plain NER models to their ONNX versions, which should speed up inference. The package downloads the ONNX version of the underlying model(s) when you update.

Description

In today's data-driven world, protecting sensitive information is paramount. ZINK provides a simple and effective solution for anonymizing text data by identifying and masking entities such as names, ages, phone numbers, medical conditions, and more. With ZINK, you can ensure data privacy while still maintaining the utility of your text data for analysis and processing.

ZINK leverages the power of zero-shot techniques, meaning it doesn't require prior training on specific datasets. You simply provide the text and the entity labels you want to anonymize, and ZINK handles the rest.

Features

Zero-shot anonymization: No task-specific training data or fine-tuning required.
Flexible entity labeling: Anonymize any type of entity by specifying custom labels.
Redaction and replacement: Choose between redacting entities (replacing them with [LABEL]_REDACTED, where [LABEL] is the entity's label) or replacing them with a random entity of the same type.
Easy integration: Simple and intuitive API for seamless integration into your Python projects.

Installation

pip install zink

Usage

Redacting Entities

The redact function replaces each identified entity with [LABEL]_REDACTED, where [LABEL] is the entity's label (for example, person_REDACTED).

import zink as pss

text = "John works as a doctor and plays football after work and drives a toyota."
labels = ("person", "profession", "sport", "car")
result = pss.redact(text, labels)
print(result.anonymized_text)

Example output:

person_REDACTED works as a profession_REDACTED and plays sport_REDACTED after work and drives a car_REDACTED.
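
To make the redaction step concrete, here is a minimal sketch of how detected spans could be turned into [LABEL]_REDACTED placeholders. It does not call ZINK or its NER model: the `Span` tuple, the `find_span` helper, and `redact_spans` are hypothetical names, and the spans are located by plain string search rather than by a model.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # index of the first character of the entity
    end: int    # index one past the last character
    label: str  # entity label, e.g. "person"


def find_span(text: str, word: str, label: str) -> Span:
    """Locate `word` by plain string search (a stand-in for a NER model)."""
    start = text.index(word)
    return Span(start, start + len(word), label)


def redact_spans(text: str, spans: list[Span]) -> str:
    """Replace each span with '<label>_REDACTED', working right to left
    so that earlier offsets stay valid after each substitution."""
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[:span.start] + f"{span.label}_REDACTED" + text[span.end:]
    return text


text = "John works as a doctor and plays football after work and drives a toyota."
spans = [
    find_span(text, "John", "person"),
    find_span(text, "doctor", "profession"),
    find_span(text, "football", "sport"),
    find_span(text, "toyota", "car"),
]
redacted = redact_spans(text, spans)
print(redacted)
```

Substituting from the end of the string backwards avoids having to recompute offsets after each replacement changes the text length.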

Replacing Entities

The replace function replaces identified entities with a random entity of the same type.

import zink as pss

text = "John Doe dialled his mother at 992-234-3456 and then went out for a walk."
labels = ("person", "phone number", "relationship")
result = pss.replace(text, labels)
print(result.anonymized_text)

# Possible output: Warren Buffet dialled his Uncle at 2347789287 and then went out for a walk.

Another example:

import zink as pss

text = "Patient, 33 years old, was admitted with a chest pain."
labels = ("age", "medical condition")
result = pss.replace(text, labels)
print(result.anonymized_text)

Example output:

Patient, 78 years old, was admitted with a Diabetes Mellitus.
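
When several records need the same treatment, the call pattern above can be wrapped in a small helper. The sketch below is only an illustration: `anonymize_records` is a hypothetical helper, and it relies solely on what the examples above show, namely a callable that takes `(text, labels)` and returns an object with an `anonymized_text` attribute. A stub stands in for `pss.replace` so the block runs without downloading a model.

```python
from types import SimpleNamespace
from typing import Callable, Iterable


def anonymize_records(
    records: Iterable[str],
    labels: tuple[str, ...],
    anonymizer: Callable,
) -> list[str]:
    """Apply `anonymizer(text, labels)` to every record and collect the
    `anonymized_text` attribute of each result."""
    return [anonymizer(text, labels).anonymized_text for text in records]


def stub_replace(text: str, labels: tuple[str, ...]):
    """Stand-in for pss.replace: masks digits so the example runs offline."""
    masked = "".join("#" if ch.isdigit() else ch for ch in text)
    return SimpleNamespace(anonymized_text=masked)


records = [
    "Patient, 33 years old, was admitted with chest pain.",
    "Patient, 61 years old, was discharged on Monday.",
]
cleaned = anonymize_records(records, ("age", "medical condition"), stub_replace)
print(cleaned)
```

With ZINK installed, passing `pss.replace` or `pss.redact` as `anonymizer` should work the same way, assuming their signatures match the examples above.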

Replacing Entities with your own data

Use this feature when you want to replace entities with values from your own dataset. Unlike the standard replace method, this function does not use caching, so it accepts replacements directly as a dictionary keyed by label. Each value can be a single string or a tuple of candidate strings. This keeps the function simple to use with dynamic or runtime-defined pseudonyms.

import zink as pss

text = "Melissa works at Google and drives a Tesla."
labels = ("person", "company", "car")
# Map each label to a replacement string or a tuple of candidates.
custom_replacements = {
    "person": "Alice",
    "company": "OpenAI",
    "car": ("Honda", "Toyota"),
}

result = pss.replace_with_my_data(text, labels, user_replacements=custom_replacements)

print(result.anonymized_text)
# Possible output: "Alice works at OpenAI and drives a Honda."
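
The dictionary above mixes single strings with a tuple of candidates. One plausible way to resolve such a mapping is sketched below; this is not ZINK's implementation. `pick_replacement` is a hypothetical helper, and the rule that a tuple means "choose one candidate at random" is an assumption inferred from the "Possible output" comment.

```python
import random
from typing import Optional, Sequence, Union

Replacement = Union[str, Sequence[str]]


def pick_replacement(
    label: str,
    user_replacements: dict[str, Replacement],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return a replacement for `label`, or None if the user supplied none.

    Assumption: a plain string is used as-is; a tuple or list is treated
    as a pool from which one candidate is drawn at random.
    """
    if rng is None:
        rng = random.Random()
    value = user_replacements.get(label)
    if value is None:
        return None
    if isinstance(value, str):
        return value
    return rng.choice(list(value))


custom_replacements = {
    "person": "Alice",
    "company": "OpenAI",
    "car": ("Honda", "Toyota"),
}

person = pick_replacement("person", custom_replacements)
car = pick_replacement("car", custom_replacements, rng=random.Random(0))
missing = pick_replacement("city", custom_replacements)
print(person, car, missing)
```

Passing a seeded `random.Random` makes the choice reproducible, which can be handy in tests.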

Under the hood:

GLiNER:

GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entities, and to Large Language Models (LLMs), which, despite their flexibility, are too costly and large for resource-constrained scenarios.

NuNer:

NuNerZero is a compact, zero-shot Named Entity Recognition model that leverages the robust GLiNER architecture for efficient token classification. It requires lower-cased labels and processes inputs as a concatenation of entity types and text, enabling it to detect arbitrarily long entities. Trained on the NuNER v2.0 dataset, NuNerZero achieves impressive performance, outperforming larger models like GLiNER-large-v2.1 by over 3% in token-level F1-score. This model is ideal for both research and practical applications where a streamlined, high-accuracy NER solution is essential.
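
Because NuNerZero expects lower-cased labels, it can help to normalize labels before passing them on. The helper below is a hypothetical convenience: `normalize_labels` is not part of ZINK, and ZINK may already do something similar internally. It lower-cases, trims, and de-duplicates labels while preserving their order.

```python
def normalize_labels(labels):
    """Lower-case and strip labels, dropping blanks and duplicates
    while keeping the first-seen order."""
    seen = set()
    normalized = []
    for label in labels:
        cleaned = label.strip().lower()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            normalized.append(cleaned)
    return tuple(normalized)


labels = normalize_labels(["Person", " Phone Number ", "person", "Relationship"])
print(labels)  # ('person', 'phone number', 'relationship')
```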

Faker

Zink now leverages the Faker library to generate realistic, synthetic replacements for sensitive information. This feature is relatively new and continues to evolve, enhancing our data masking capabilities while preserving contextual plausibility.

How Faker Is Utilized

Dynamic Data Generation: Faker is used to generate replacement values for various entity types (e.g., names, addresses, dates). For example, when a human name is detected, Faker can provide a full name or first name based on context.

Country and Location Handling:

Our tool reads a list of country names (and their synonyms) from an external file. If a location entity matches one of these names, the system selects a different country from the list to mask the sensitive geographical data.
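
The country-swapping rule can be sketched as follows. This is an illustration, not ZINK's code: the in-memory `COUNTRY_SYNONYMS` table stands in for the external file (whose format is not described here), and `canonical_country` and `swap_country` are hypothetical helpers.

```python
import random
from typing import Optional

# Stand-in for the external country file: canonical name -> known synonyms.
COUNTRY_SYNONYMS = {
    "United States": {"usa", "us", "united states of america"},
    "United Kingdom": {"uk", "great britain"},
    "Germany": {"deutschland"},
    "India": {"bharat"},
}


def canonical_country(entity: str) -> Optional[str]:
    """Return the canonical country name for `entity`, or None if unknown."""
    needle = entity.strip().lower()
    for name, synonyms in COUNTRY_SYNONYMS.items():
        if needle == name.lower() or needle in synonyms:
            return name
    return None


def swap_country(entity: str, rng: Optional[random.Random] = None) -> Optional[str]:
    """If `entity` is a known country, return a different country from the
    list; otherwise return None so another strategy can handle it."""
    if rng is None:
        rng = random.Random()
    original = canonical_country(entity)
    if original is None:
        return None
    candidates = [name for name in COUNTRY_SYNONYMS if name != original]
    return rng.choice(candidates)


replacement = swap_country("USA", rng=random.Random(42))
print(replacement)
```

Resolving synonyms to a canonical name first ensures that "USA" is never replaced with "United States", which would leak the original value.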

Date Replacement:

Date-related entities (such as dates, months, and days) are delegated to a dedicated strategy. For purely numeric dates (e.g., "12/02/1975"), the tool returns a Faker-generated date. For dates with explicit alphabetic month names, custom extraction and replacement logic is applied.
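
The split between numeric and alphabetic dates can be sketched as below. The code uses only the standard library: `random` and `datetime` stand in for Faker, and the handling of month names (swap the month, keep the rest of the string) is an assumed simplification, since the custom logic is not detailed here. All function names are hypothetical.

```python
import random
import re
from datetime import date, timedelta
from typing import Optional

MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]
# Purely numeric dates such as 12/02/1975, 1975-02-12, or 12.02.75.
NUMERIC_DATE = re.compile(r"^\d{1,4}[/.-]\d{1,2}[/.-]\d{1,4}$")
MONTH_PATTERN = re.compile(r"\b(" + "|".join(MONTH_NAMES) + r")\b", re.IGNORECASE)


def random_numeric_date(rng: random.Random) -> str:
    """Stand-in for a Faker-generated date, formatted as DD/MM/YYYY."""
    day = date(1950, 1, 1) + timedelta(days=rng.randrange(0, 25000))
    return day.strftime("%d/%m/%Y")


def replace_date(entity: str, rng: Optional[random.Random] = None) -> str:
    """Route a date entity to the numeric or the month-name strategy."""
    if rng is None:
        rng = random.Random()
    if NUMERIC_DATE.match(entity.strip()):
        return random_numeric_date(rng)

    def swap_month(match: re.Match) -> str:
        current = match.group(1).capitalize()
        return rng.choice([m for m in MONTH_NAMES if m != current])

    # Assumed simplification: only the month name is changed.
    return MONTH_PATTERN.sub(swap_month, entity)


rng = random.Random(7)
numeric = replace_date("12/02/1975", rng)
worded = replace_date("3 March 1975", rng)
print(numeric, "|", worded)
```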

Human Entity Roles:

The system differentiates between various human roles (e.g., doctor, patient, engineer) using a predefined list of human entity roles. This allows for context-aware replacement, ensuring that names are replaced appropriately according to their role in the text.
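
Role-aware replacement might look like the sketch below. The role list, the name pool, and the "Dr." title rule are all illustrative assumptions, not ZINK's actual data or logic. A small hard-coded pool stands in for Faker so the block has no dependencies, and `replace_human` is a hypothetical name.

```python
import random
from typing import Optional

# Illustrative stand-ins; the real role list ships with the package.
HUMAN_ENTITY_ROLES = {"person", "doctor", "patient", "engineer"}
NAME_POOL = ["Maria Lopez", "Arjun Mehta", "Chloe Martin", "Kenji Sato"]


def replace_human(
    entity: str, label: str, rng: Optional[random.Random] = None
) -> Optional[str]:
    """Return a replacement name if `label` is a human role, else None.

    Assumed rules: a single-word entity gets a first name only, and a
    'Dr.' prefix on the original is carried over to the replacement.
    """
    if label.lower() not in HUMAN_ENTITY_ROLES:
        return None
    if rng is None:
        rng = random.Random()
    candidates = [name for name in NAME_POOL if name != entity]
    full_name = rng.choice(candidates)

    has_title = entity.startswith("Dr. ")
    bare = entity[4:] if has_title else entity
    name = full_name if " " in bare else full_name.split()[0]
    return f"Dr. {name}" if has_title else name


first_only = replace_human("John", "patient", random.Random(1))
titled = replace_human("Dr. Jane Smith", "doctor", random.Random(1))
not_human = replace_human("Toyota", "car")
print(first_only, "|", titled, "|", not_human)
```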

Current Status and Future Improvements

New Feature in Beta:

The Faker integration is one of our latest features, designed to deliver more natural and contextually relevant data replacements. While the current implementation covers many common cases, it is still under active development.

Testing

To run the tests, navigate to the project directory and execute:

pytest

Citation

If you use this package in your work or research, please cite it as follows:

Wadhwa, D. (2025). ZINK: Zero-shot anonymization in unstructured text. (v0.2.1). Zenodo. https://doi.org/10.5281/zenodo.15035072

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues to suggest improvements or report bugs.

Fork the repository.
Create a new branch: git checkout -b feature/your-feature
Make your changes.
Commit your changes: git commit -m 'Add your feature'
Push to the branch: git push origin feature/your-feature
Submit a pull request.

License

This project is licensed under the Apache 2.0 License.


