Wowool Anonymizer
Ensuring data privacy
The anonymizer app detects and redacts personally identifiable information (PII) and sensitive entities from unstructured text. Its goal is to preserve privacy while retaining the utility of the original content for downstream processing or analysis.
Options
AnonymizerOptions
interface AnonymizerOptions {
  annotations?: string[];
  pseudonyms?: Record<string, string[]>;
  formatters?: Record<string, string>;
}
with:
| Property | Description |
|---|---|
| annotations | List of annotations to anonymize. If not provided, all annotations will be anonymized |
| pseudonyms | Mapping from entity URI, such as Person or Company, to names associated with that entity type |
| formatters | Mapping from entity URI to the corresponding formatter (f-string like) used to convert the input data |
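As a rough illustration of the shape described above, the following sketch builds an options object as a plain Python dictionary and checks it against the AnonymizerOptions interface. The `validate_options` helper is hypothetical (it is not part of the wowool packages), and how the options are ultimately handed to the anonymizer is not shown here.

```python
import json


def validate_options(options: dict) -> dict:
    """Hypothetical helper: check a dict against the AnonymizerOptions shape."""
    allowed = {"annotations", "pseudonyms", "formatters"}
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    # annotations?: string[]
    if not all(isinstance(a, str) for a in options.get("annotations", [])):
        raise TypeError("annotations must be a list of strings")
    # pseudonyms?: Record<string, string[]>
    for uri, names in options.get("pseudonyms", {}).items():
        if not (isinstance(names, list) and all(isinstance(n, str) for n in names)):
            raise TypeError(f"pseudonyms[{uri!r}] must be a list of strings")
    # formatters?: Record<string, string>
    for uri, template in options.get("formatters", {}).items():
        if not isinstance(template, str):
            raise TypeError(f"formatters[{uri!r}] must be a string")
    return options


options = validate_options({
    "annotations": ["Person", "Company"],
    "pseudonyms": {"Person": ["Badman"], "Company": ["Monster Inc."]},
    "formatters": {"default": "{'.'*len(literal)}"},
})
print(json.dumps(options, indent=2))
```

All three keys are optional, so an empty dictionary also passes this check.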
Formatters
Predefined variables can be used to format the input data:
| Property | Description |
|---|---|
| uri | URI of the entity |
| literal | Literal text of the entity |
| canonical | Normalized or canonicalized text, e.g. John Doe instead of he |
| concept | Concept that you can use to anonymize (e.g. concept.gender) |
| anonymized | Converted data |
For example, consider the following formatters:
"formatters": {
"Person": "#{uri}-{concept.position}-#{nr}",
"PersonalIdentificationNumber": "#{\"*\"* (len(literal)-3)}{literal[-2:]}",
"default": "{'.'*len(literal)}"
}
- The first formatter will replace Person with the URI, the position and a counter. For instance, John Doe will be redacted as #Person-Lawyer-#3
- The second will create a mask using the literal's length. For instance, 11-22-333 will be masked as *******33
- The last one, which corresponds with the default formatter, will mask the whole length of the literal using dots. For instance, Ikea will be entirely redacted as ....
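To make the "f-string like" behaviour concrete, here is a minimal sketch that evaluates two of the templates above with plain Python f-string semantics. The `render` function is a hypothetical stand-in, not the library's implementation, and the variable values (`concept.position`, `nr`) are supplied by hand purely for illustration.

```python
from types import SimpleNamespace


def render(template: str, **variables) -> str:
    """Evaluate `template` as the body of a Python f-string (illustration only)."""
    # Only `len` is exposed to the template. Using eval on untrusted
    # templates is still unsafe; this is a sketch, not production code.
    sandbox = {"__builtins__": {}, "len": len}
    return eval("f" + repr(template), sandbox, variables)


person = render(
    "#{uri}-{concept.position}-#{nr}",
    uri="Person",
    concept=SimpleNamespace(position="Lawyer"),  # assumed concept attributes
    nr=3,                                        # assumed counter value
)
default = render("{'.'*len(literal)}", literal="Ikea")

print(person)   # #Person-Lawyer-#3
print(default)  # ....
```

Anything outside the braces is copied through verbatim, which is why the leading `#` characters appear in the rendered Person value.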
Results
AnonymizerResults
interface AnonymizerResults {
  text: string;
  locations: Location[];
}
with:
| Property | Description |
|---|---|
| text | Anonymized text |
| locations | Structured information of the changes that have been made |
Location
interface Location {
  uri: string;
  text: string;
  anonymized: string;
  begin_offset: number;
  end_offset: number;
  byte_begin_offset: number;
  byte_end_offset: number;
}
with:
| Property | Description |
|---|---|
| uri | URI of the entity that was anonymized, e.g. Person or Company |
| text | Original text segment that was anonymized |
| anonymized | Anonymized or pseudonymized version of the original text |
| begin_offset | Starting character offset in the input document |
| end_offset | Ending character offset in the input document |
| byte_begin_offset | Starting byte offset in the input document |
| byte_end_offset | Ending byte offset in the input document |
API
Examples
You need to install the English language module to run the samples:
pip install wowool-english
Anonymize known entities
This script finds entities in a sentence and replaces each character of those entities with a dot, then prints the anonymized output and structured information.
DefaultWriter(formatters={"default": "{'.'*len(literal)}"}) sets up a writer that replaces each character of any entity with a dot (.), matching the entity’s length.
from wowool.sdk import Pipeline
from wowool.anonymizer import Anonymizer, DefaultWriter
from json import dumps
# Replace every character of each entity with a dot
english = Pipeline("english,entity")
document = english("John Smith works for Ikea.")
writer = DefaultWriter(formatters={"default": "{'.'*len(literal)}"})
anonymizer = Anonymizer(writer=writer)
document = anonymizer(document)
results = document.results(Anonymizer.ID)
print(dumps(results, indent=2))
results:
{
"text": ".......... works for .....",
"locations": [
{
"begin_offset": 0,
"end_offset": 10,
"text": "John Smith",
"uri": "Person",
"anonymized": "..........",
"byte_begin_offset": 0,
"byte_end_offset": 10
},
{
"begin_offset": 21,
"end_offset": 25,
"text": "IKEA",
"uri": "Company",
"anonymized": "....",
"byte_begin_offset": 21,
"byte_end_offset": 25
}
]
}
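Once the results are available as a dictionary, the locations list can be processed like any other JSON structure. The sketch below groups the original text segments by entity URI, for example to audit what was redacted. The data is copied from the sample output above, and `originals_by_uri` is a hypothetical helper name, not part of the library.

```python
from collections import defaultdict

# Sample output copied from the example above (byte offsets omitted for brevity).
results = {
    "text": ".......... works for .....",
    "locations": [
        {"begin_offset": 0, "end_offset": 10, "text": "John Smith",
         "uri": "Person", "anonymized": ".........."},
        {"begin_offset": 21, "end_offset": 25, "text": "IKEA",
         "uri": "Company", "anonymized": "...."},
    ],
}


def originals_by_uri(results: dict) -> dict:
    """Group the original (pre-anonymization) segments by entity URI."""
    grouped = defaultdict(list)
    for location in results["locations"]:
        grouped[location["uri"]].append(location["text"])
    return dict(grouped)


audit = originals_by_uri(results)
print(audit)  # {'Person': ['John Smith'], 'Company': ['IKEA']}
```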
Custom pseudonyms
This script replaces detected person and company names in the text with your chosen pseudonyms, then prints the anonymized result.
from wowool.sdk import Pipeline
from wowool.anonymizer import Anonymizer, DefaultWriter
# Note: you can use the default pseudonyms instead, if you prefer:
# from wowool.anonymizer.core.anonymizer_config import DEFAULT_PSEUDONYMS
from json import dumps
# Replace detected entities with the chosen pseudonyms
english = Pipeline("english,entity")
document = english("John Smith works for Ikea.")
pseudonyms = {
"Person": ["Badman"],
"Company": ["Monster Inc."],
}
writer = DefaultWriter(pseudonyms)
anonymizer = Anonymizer(writer=writer)
document = anonymizer(document)
results = document.results(Anonymizer.ID)
print(dumps(results, indent=2))
results:
{
"text": "Badman works for Monster Inc..",
"locations": [
{
"begin_offset": 0,
"end_offset": 6,
"text": "John Smith",
"uri": "Person",
"anonymized": "Badman",
"byte_begin_offset": 0,
"byte_end_offset": 10
},
{
"begin_offset": 17,
"end_offset": 29,
"text": "IKEA",
"uri": "Company",
"anonymized": "Monster Inc.",
"byte_begin_offset": 21,
"byte_end_offset": 25
}
]
}
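The locations can also be used to build a lookup from each pseudonym back to the original text, for instance when an authorized reviewer needs to trace a replacement. The sketch below uses data copied from the sample output above. It also checks that, in this particular sample, `begin_offset` and `end_offset` line up with the pseudonym inside the anonymized text; that is an observation about this sample, not a documented guarantee.

```python
# Sample output copied from the example above (byte offsets omitted for brevity).
results = {
    "text": "Badman works for Monster Inc..",
    "locations": [
        {"begin_offset": 0, "end_offset": 6, "text": "John Smith",
         "uri": "Person", "anonymized": "Badman"},
        {"begin_offset": 17, "end_offset": 29, "text": "IKEA",
         "uri": "Company", "anonymized": "Monster Inc."},
    ],
}

# Map each pseudonym back to the original segment it replaced.
pseudonym_to_original = {
    location["anonymized"]: location["text"] for location in results["locations"]
}

# Slice the anonymized text with the character offsets of each location.
spans = [
    results["text"][location["begin_offset"]:location["end_offset"]]
    for location in results["locations"]
]

print(pseudonym_to_original)  # {'Badman': 'John Smith', 'Monster Inc.': 'IKEA'}
print(spans)                  # ['Badman', 'Monster Inc.']
```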
License
For both non-commercial and commercial use, you need to acquire a license file at https://www.wowool.com
Non-Commercial
This library is licensed under the GNU AGPLv3 for non-commercial use.
For commercial use, a separate license must be purchased.
Commercial License Terms
1. Grants the right to use this library in proprietary software.
2. Requires a valid license key.
3. Redistribution in SaaS requires a commercial license.