NeMo Anonymizer

Detect and replace sensitive entities in text using LLM-powered workflows.

What can you do with Anonymizer?

Detect entities using GLiNER-PII and LLM-based augmentation and validation
Replace with 4 strategies — LLM-generated substitute, redact, annotate, or hash (deterministic, local)
Preview results before full runs with display_record() visualization

Quick Start

1. Install

git clone https://github.com/NVIDIA-NeMo/Anonymizer.git
cd Anonymizer
make install

2. Set up model providers

By default, Anonymizer uses models hosted on build.nvidia.com — GLiNER-PII for entity detection and a text LLM for augmentation/validation. You can also bring your own models via custom provider configs.

Use the default build.nvidia.com setup as a convenient way to experiment with Anonymizer and iterate on small samples. For privacy-sensitive or production data, point Anonymizer at a secure endpoint you trust and to which you are comfortable sending data. Request and token rate limits on build.nvidia.com vary by account and model access, and lower-volume development access can be slow for full-dataset runs.

export NVIDIA_API_KEY="your-nvidia-api-key"
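Because the default providers read the key from the environment, a small pre-flight check can fail fast before any request is sent. The sketch below is illustrative only: `require_api_key` is a hypothetical helper, not part of Anonymizer, and only the `NVIDIA_API_KEY` variable name comes from this README.

```python
import os


def require_api_key(env=None, name="NVIDIA_API_KEY"):
    """Return the API key from the environment, or raise with a clear message.

    Hypothetical helper for illustration; not part of the Anonymizer API.
    """
    # Allow a mapping to be injected (useful in tests); default to os.environ.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it before using the default model providers."
        )
    return key
```

Calling `require_api_key()` at the top of a script surfaces a missing key immediately instead of partway through a run.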

3. Anonymize text

CLI

Tip: All examples below use uv run to invoke commands. If you prefer, activate the venv with source .venv/bin/activate and run commands directly.

DATA_URL="https://raw.githubusercontent.com/NVIDIA-NeMo/Anonymizer/refs/heads/main/docs/data/NVIDIA_synthetic_biographies.csv"

# Preview on a small sample
uv run anonymizer preview --source $DATA_URL --text-column biography --replace redact --num_records 3

# Full run with output file
uv run anonymizer run --source $DATA_URL --text-column biography --replace redact --output result.csv

# Validate config without running
uv run anonymizer validate --source $DATA_URL --text-column biography --replace hash

Run anonymizer --help or anonymizer <subcommand> --help for all options.
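If you drive the CLI from a script, it can help to assemble the command as a list rather than a shell string. The following is a minimal sketch, assuming the flags behave exactly as in the examples above; `build_anonymizer_command` is a hypothetical helper, and it only uses subcommands and flags that appear in this README.

```python
import shlex


def build_anonymizer_command(subcommand, source, text_column, replace, extra_args=()):
    """Assemble an argv list for the CLI using only flags shown in this README.

    Hypothetical helper; the anonymizer CLI itself is invoked unchanged.
    """
    if subcommand not in {"preview", "run", "validate"}:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    return [
        "uv", "run", "anonymizer", subcommand,
        "--source", source,
        "--text-column", text_column,
        "--replace", replace,
        *extra_args,
    ]


cmd = build_anonymizer_command(
    "preview",
    "https://raw.githubusercontent.com/NVIDIA-NeMo/Anonymizer/refs/heads/main/docs/data/NVIDIA_synthetic_biographies.csv",
    "biography",
    "redact",
    extra_args=["--num_records", "3"],
)

# Print a shell-safe rendering of the command for logging or copy-paste.
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues if the source path or column name contains spaces.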

Python API

from anonymizer import Anonymizer, AnonymizerConfig, AnonymizerInput, Redact

DATA_URL = "https://raw.githubusercontent.com/NVIDIA-NeMo/Anonymizer/refs/heads/main/docs/data/NVIDIA_synthetic_biographies.csv"

# Uses default model providers (build.nvidia.com) via NVIDIA_API_KEY env var
anonymizer = Anonymizer()

config = AnonymizerConfig(replace=Redact())

preview = anonymizer.preview(
    config=config,
    data=AnonymizerInput(source=DATA_URL, text_column="biography"),
    num_records=3,
)

# Visualize with entity highlights and replacement map
preview.display_record()

# Most important columns only
preview.dataframe

# Full pipeline trace, including internal underscore-prefixed columns
preview.trace_dataframe

For custom model endpoints, pass a providers YAML:

anonymizer = Anonymizer(model_providers="path/to/model_providers.yaml")

Language And Regional Coverage

Anonymizer has been tested most extensively on English-language data. Multilingual quality has not yet been evaluated systematically across languages, domains, and models.

Although testing so far has been primarily in English, the supported entity set is not limited to U.S.-specific identifiers. Detection and anonymization can also apply to international formats such as non-U.S. phone numbers, addresses, legal references, and national or regional identification numbers, though coverage will vary by language, region, and model configuration.

If you are working with another language, we encourage you to experiment on a small sample first with preview(), validate detected entities and transformed output carefully, and adjust your model providers and model configs as needed.
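One way to support the manual review described above is a simple leak check: take a few values you know are sensitive in your sample and confirm none of them survive in the transformed text. This is a sketch under stated assumptions, not an Anonymizer feature. `find_leaks` is a hypothetical helper, the strings below are hand-written stand-ins for real output, and extracting the transformed text from a preview result is left out because the column names are not documented here.

```python
def find_leaks(anonymized_texts, known_sensitive_values):
    """Return (row_index, value) pairs where a known value survives in the output.

    Hypothetical helper for illustration; matching is a case-insensitive
    substring test, so it will not catch reformatted or partial values.
    """
    leaks = []
    for i, text in enumerate(anonymized_texts):
        lowered = text.casefold()
        for value in known_sensitive_values:
            if value and value.casefold() in lowered:
                leaks.append((i, value))
    return leaks


# Hand-written stand-ins for transformed records (not real Anonymizer output).
outputs = [
    "[REDACTED_FIRST_NAME] a grandi à Lyon.",
    "Alice travaille à Paris.",
]
print(find_leaks(outputs, ["Alice"]))  # [(1, 'Alice')]
```

A check like this only catches values you already know about, so it complements a manual read of the sample rather than replacing it.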

Replacement Strategies

Strategy	Output for `"Alice"` (first_name)	Configurable
Substitute	`Maya`	`instructions`
Redact	`[REDACTED_FIRST_NAME]`	`format_template`
Annotate	`<Alice, first_name>`	`format_template`
Hash	`<HASH_FIRST_NAME_3bc51062973c>`	`format_template`, `algorithm`, `digest_length`

from anonymizer import AnonymizerConfig, Redact, Annotate, Hash, Substitute

# LLM-generated contextual replacements
AnonymizerConfig(replace=Substitute())

# Constant redaction
AnonymizerConfig(replace=Redact(format_template="****"))

# Annotation with entity tagging
AnonymizerConfig(replace=Annotate(format_template="<{text}-|-{label}>"))

# Deterministic hash with short digest
AnonymizerConfig(replace=Hash(algorithm="sha256", digest_length=8))
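To make the output shapes in the table concrete, here is a toy re-creation of the three local strategies in plain Python. It is not the library's implementation. It assumes templates use `str.format`-style `{text}` and `{label}` placeholders (as the `Annotate` example suggests), that `digest_length` counts hex characters, and that the hash is taken over the raw UTF-8 text. The function names and default templates are hypothetical, and the digest will not necessarily match the one Anonymizer produces.

```python
import hashlib


def redact(text, label, format_template="[REDACTED_{label}]"):
    # Assumed default template; the label is upper-cased to match the table.
    return format_template.format(text=text, label=label.upper())


def annotate(text, label, format_template="<{text}, {label}>"):
    # Keeps the original text and tags it with its entity label.
    return format_template.format(text=text, label=label)


def hash_replace(text, label, algorithm="sha256", digest_length=12):
    # Deterministic: the same input text always yields the same token.
    digest = hashlib.new(algorithm, text.encode("utf-8")).hexdigest()
    return f"<HASH_{label.upper()}_{digest[:digest_length]}>"


print(redact("Alice", "first_name"))    # [REDACTED_FIRST_NAME]
print(annotate("Alice", "first_name"))  # <Alice, first_name>
print(annotate("Alice", "first_name", "<{text}-|-{label}>"))  # <Alice-|-first_name>
print(hash_replace("Alice", "first_name"))  # <HASH_FIRST_NAME_...> with 12 hex chars
```

Substitute is omitted because it depends on an LLM call. Hashing is the option to choose when the same entity must map to the same token across records.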

Development

make install-dev          # Install with dev dependencies
make test                 # Run tests
make coverage             # Run tests with coverage report
make format-check         # Lint + format check (read-only)
anonymizer --help         # CLI usage
make install-pre-commit   # Install pre-commit hooks

Requirements

Python 3.11+
NeMo Data Designer (installed as dependency)
NVIDIA API key for default model providers (GLiNER-PII + text LLM), or custom model endpoints

License

Apache License 2.0 — see LICENSE for details.

Release history

0.1.1 (this version): Apr 23, 2026
0.1.0: Apr 8, 2026
