Wowool Anonymizer

Ensuring data privacy

The anonymizer app detects and redacts personally identifiable information (PII) and sensitive entities from unstructured text. Its goal is to preserve privacy while retaining the utility of the original content for downstream processing or analysis.

Options

AnonymizerOptions

interface AnonymizerOptions {
    annotations?: string[];
    pseudonyms?: Record<string, string[]>;
    formatters?: Record<string, string>;
}

with:

Property	Description
`annotations`	List of annotations to anonymize. If not provided, all annotations are anonymized.
`pseudonyms`	Mapping from entity URI, such as `Person` or `Company`, to the replacement names used for that entity type.
`formatters`	Mapping from entity URI to the formatter (f-string-like) used to convert the input data.
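
As a rough illustration of how the three options fit together, the interface above can be mirrored in Python. The `AnonymizerOptions` class below is a hypothetical stand-in written for this sketch, not a class exported by the package, and the values are borrowed from the examples in this document.

```python
import json
from typing import TypedDict


class AnonymizerOptions(TypedDict, total=False):
    """Hypothetical Python mirror of the AnonymizerOptions interface."""

    annotations: list[str]
    pseudonyms: dict[str, list[str]]
    formatters: dict[str, str]


# Every key is optional, matching the `?` markers in the interface.
options: AnonymizerOptions = {
    "annotations": ["Person", "Company"],
    "pseudonyms": {"Person": ["Badman"], "Company": ["Monster Inc."]},
    "formatters": {"default": "{'.'*len(literal)}"},
}

# The options are plain JSON-compatible data.
print(json.dumps(options, indent=2))
```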

Formatters

Predefined variables can be used to format the input data:

Variable	Description
`uri`	URI of the entity
`literal`	Literal text of the entity
`canonical`	Normalized or canonicalized text, e.g. "John Doe" instead of "he"
`concept`	Concept whose attributes can be used in the output (e.g. `concept.gender`)
`anonymized`	Converted data
`nr`	Counter value, as used in the `Person` formatter below

For example, consider the following formatters:

"formatters": {
    "Person": "#{uri}-{concept.position}-#{nr}",
    "PersonalIdentificationNumber": "#{\"*\"* (len(literal)-3)}{literal[-2:]}",
    "default": "{'.'*len(literal)}"
}

The first formatter replaces a Person with its URI, the person's position, and a counter. For instance, John Doe will be redacted as #Person-Lawyer-#3
The second builds a mask from the literal's length: a leading #, one asterisk for every character except the last three, and then the last two characters of the literal. For instance, 11-22-333 will be masked as #******33
The last one, the default formatter, applies to every entity without a formatter of its own and masks the whole literal with dots. For instance, Ikea will be entirely redacted as ....
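
Since the formatters are described as f-string-like, their behaviour can be approximated with ordinary Python f-strings. The sketch below assumes the templates are evaluated the way Python evaluates f-strings; the function names and the `SimpleNamespace` stand-in for a concept are hypothetical and are not part of the package.

```python
from types import SimpleNamespace


def format_person(uri, concept, nr):
    # Mirrors the "Person" template: "#{uri}-{concept.position}-#{nr}"
    return f"#{uri}-{concept.position}-#{nr}"


def format_id(literal):
    # Mirrors the "PersonalIdentificationNumber" template:
    # "#" + one "*" per character except the last three + the last two characters
    return f"#{'*' * (len(literal) - 3)}{literal[-2:]}"


def format_default(literal):
    # Mirrors the "default" template: one dot per character of the literal
    return f"{'.' * len(literal)}"


lawyer = SimpleNamespace(position="Lawyer")  # hypothetical concept object
print(format_person("Person", lawyer, 3))  # #Person-Lawyer-#3
print(format_default("Ikea"))              # ....
print(format_id("11-22-333"))
```

Note that every formatter here returns a string of a length derived from the input, so masks like these still reveal how long the original literal was.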

Results

AnonymizerResults

interface AnonymizerResults {
    text: string;
    locations: Location[];
}

with:

Property	Description
`text`	Anonymized text
`locations`	Structured information of the changes that have been made

Location

interface Location {
    uri: string;
    text: string;
    anonymized: string;
    begin_offset: number;
    end_offset: number;
    byte_begin_offset: number;
    byte_end_offset: number;
}

with:

Property	Description
`uri`	URI of the entity that was anonymized, e.g. `Person` or `Company`
`text`	Original text segment that was anonymized
`anonymized`	Anonymized or pseudonymized version of the original text
`begin_offset`	Starting character offset in the input document
`end_offset`	Ending character offset in the input document
`byte_begin_offset`	Starting byte offset in the input document
`byte_end_offset`	Ending byte offset in the input document
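
Character and byte offsets coincide for plain ASCII input and diverge as soon as the text contains multi-byte characters. The following sketch shows the relationship, assuming the byte offsets refer to a UTF-8 encoding of the input (the encoding is an assumption, it is not stated above); `to_byte_offsets` is a hypothetical helper.

```python
def to_byte_offsets(text: str, begin: int, end: int) -> tuple[int, int]:
    """Convert character offsets into UTF-8 byte offsets (assumed encoding)."""
    byte_begin = len(text[:begin].encode("utf-8"))
    byte_end = len(text[:end].encode("utf-8"))
    return byte_begin, byte_end


# ASCII only: character and byte offsets are identical.
print(to_byte_offsets("John Smith works for Ikea.", 21, 25))  # (21, 25)

# "\u00eb" (e with diaeresis) takes two bytes in UTF-8, so every later
# byte offset is shifted by one relative to the character offset.
print(to_byte_offsets("Zo\u00eb Smith works for Ikea.", 20, 24))  # (21, 25)
```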

API

Examples

To run the samples, you need to install the English language module: pip install wowool-english

Anonymize known entities

This script finds entities in a sentence and replaces each character of those entities with a dot, then prints the anonymized output and structured information.

DefaultWriter(formatters={"default": "{'.'*len(literal)}"}) sets up a writer that replaces each character of any entity with a dot (.), matching the entity’s length.

from wowool.sdk import Pipeline
from wowool.anonymizer import Anonymizer, DefaultWriter
from json import dumps

# Replace every character of each entity with a dot
english = Pipeline("english,entity")
document = english("John Smith works for Ikea.")
writer = DefaultWriter(formatters={"default": "{'.'*len(literal)}"})
anonymizer = Anonymizer(writer=writer)
document = anonymizer(document)
results = document.results(Anonymizer.ID)
print(dumps(results, indent=2))

results:

{
  "text": ".......... works for .....",
  "locations": [
    {
      "begin_offset": 0,
      "end_offset": 10,
      "text": "John Smith",
      "uri": "Person",
      "anonymized": "..........",
      "byte_begin_offset": 0,
      "byte_end_offset": 10
    },
    {
      "begin_offset": 21,
      "end_offset": 25,
      "text": "IKEA",
      "uri": "Company",
      "anonymized": "....",
      "byte_begin_offset": 21,
      "byte_end_offset": 25
    }
  ]
}
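
The `locations` list can be turned into a compact change log for auditing. The helper below is a minimal sketch that works on a plain dict shaped like the output above; `change_log` is a hypothetical name, and the sample data is copied from this example rather than produced by calling the library.

```python
def change_log(results: dict) -> list[str]:
    """Return one 'uri: original -> anonymized' line per location."""
    return [
        f'{loc["uri"]}: {loc["text"]!r} -> {loc["anonymized"]!r}'
        for loc in results["locations"]
    ]


# Trimmed copy of the results shown above
sample = {
    "text": ".......... works for .....",
    "locations": [
        {"uri": "Person", "text": "John Smith", "anonymized": ".........."},
        {"uri": "Company", "text": "IKEA", "anonymized": "...."},
    ],
}

for line in change_log(sample):
    print(line)
# Person: 'John Smith' -> '..........'
# Company: 'IKEA' -> '....'
```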

Custom pseudonyms

This script replaces detected person and company names in the text with your chosen pseudonyms, then prints the anonymized result.

from wowool.sdk import Pipeline
from wowool.anonymizer import Anonymizer, DefaultWriter

# Note: you can use the default pseudonyms instead if you prefer
# from wowool.anonymizer.core.anonymizer_config import DEFAULT_PSEUDONYMS
from json import dumps

# Replace detected entities with the pseudonyms defined below
english = Pipeline("english,entity")
document = english("John Smith works for Ikea.")
pseudonyms = {
    "Person": ["Badman"],
    "Company": ["Monster Inc."],
}
writer = DefaultWriter(pseudonyms)
anonymizer = Anonymizer(writer=writer)
document = anonymizer(document)
results = document.results(Anonymizer.ID)
print(dumps(results, indent=2))

results:

{
  "text": "Badman works for Monster Inc..",
  "locations": [
    {
      "begin_offset": 0,
      "end_offset": 6,
      "text": "John Smith",
      "uri": "Person",
      "anonymized": "Badman",
      "byte_begin_offset": 0,
      "byte_end_offset": 10
    },
    {
      "begin_offset": 17,
      "end_offset": 29,
      "text": "IKEA",
      "uri": "Company",
      "anonymized": "Monster Inc.",
      "byte_begin_offset": 21,
      "byte_end_offset": 25
    }
  ]
}
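
Because each location keeps the original text next to its replacement, the results can in principle be used to reverse the substitution. The sketch below assumes, as the sample output above suggests, that `begin_offset` and `end_offset` index into the anonymized text; `restore` is a hypothetical helper, not part of the package. It restores the `text` recorded in each location, which is why the company comes back as IKEA.

```python
def restore(results: dict) -> str:
    """Put the recorded original segments back into the anonymized text."""
    text = results["text"]
    # Process locations from last to first so earlier offsets stay valid
    # while the string changes length.
    ordered = sorted(
        results["locations"], key=lambda loc: loc["begin_offset"], reverse=True
    )
    for loc in ordered:
        text = text[: loc["begin_offset"]] + loc["text"] + text[loc["end_offset"] :]
    return text


# Trimmed copy of the results shown above
sample = {
    "text": "Badman works for Monster Inc..",
    "locations": [
        {"begin_offset": 0, "end_offset": 6, "text": "John Smith"},
        {"begin_offset": 17, "end_offset": 29, "text": "IKEA"},
    ],
}

print(restore(sample))  # John Smith works for IKEA.
```

Anyone who holds the `locations` data can undo the anonymization this way, so it should be stored with the same care as the original text.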

License

In both cases, you will need to acquire a license file from https://www.wowool.com

Non-Commercial

This library is licensed under the GNU AGPLv3 for non-commercial use.  
For commercial use, a separate license must be purchased.

Commercial License Terms

1. Grants the right to use this library in proprietary software.  
2. Requires a valid license key.  
3. Redistribution in SaaS requires a commercial license.
