Library and CLI for text anonymization plus audio/video transcription with diarization
Anonim Video Text Library
Standalone project for two separate workflows:
- text anonymization for JSON/JSONL/CSV/Markdown/TXT with a persistent people.json dictionary
- audio/video transcription with diarization
Project layout
- src/anonim_video_text_library/ - importable Python package
- text_anonim/ - default runtime workspace for the anonymizer
- examples/Anonimizez_example/ - self-contained anonymizer example
- examples/Transcibator_example/ - self-contained transcription example
- main.py - local wrapper for the transcription CLI
- gpu_backends/ - helper scripts for GPU transcription backends
- whisper.cpp/ - local checkout of whisper.cpp
Installation
cd Anonim_video_text_Library
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
This installs the dependencies for both workflows, including torch,
transformers, faster-whisper, pyannote.audio, and imageio-ffmpeg.
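After installing, a quick check can confirm that the listed dependencies are visible to the active virtual environment. The following is a minimal sketch, not part of the package: the helper name find_missing is hypothetical, and the import names in ASSUMED_MODULES are assumptions, since PyPI names and import names can differ.

```python
import importlib.util

# Import names are assumptions: PyPI names and module names can differ
# (for example, faster-whisper is normally imported as faster_whisper).
ASSUMED_MODULES = {
    "torch": "torch",
    "transformers": "transformers",
    "faster-whisper": "faster_whisper",
    "pyannote.audio": "pyannote.audio",
    "imageio-ffmpeg": "imageio_ffmpeg",
}


def find_missing(modules=ASSUMED_MODULES):
    """Return the package names whose import module cannot be located."""
    missing = []
    for package, module in modules.items():
        try:
            spec = importlib.util.find_spec(module)
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. pyannote) is absent.
            spec = None
        if spec is None:
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    print("missing:", ", ".join(missing) if missing else "none")
```

The check only locates the modules; it does not import the heavy ones or verify that their versions are compatible.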
Run the anonymizer separately
CLI entrypoints:
python3 -m anonim_video_text_library --help
python3 text_anonim/anonimizer.py --help
Self-contained example:
cd examples/Anonimizez_example
python3 run_anonymizer_example.py
What is inside examples/Anonimizez_example/:
- .env and .env.example for settings
- input/ for source json/jsonl/csv/md/txt files
- output/ for anonymized copies
- runtime_root/files/pii/ for people.json and blocklists
- run_anonymizer_example.py as the runner
The example writes the anonymized files to output/ rather than only printing
the anonymized content to the terminal.
Run the transcriber separately
CLI entrypoints:
anonim-video-text-transcribe --help
python3 main.py --help
Self-contained example:
cd examples/Transcibator_example
python3 run_transcriber_example.py
What is inside examples/Transcibator_example/:
- .env and .env.example for settings
- input/ for media files
- output/ for generated transcripts
- run_transcriber_example.py as the runner
If you need diarization, set HF_TOKEN in .env or in your shell.
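Before a long transcription run it can help to confirm that HF_TOKEN is actually set. The sketch below is an illustration only: the helper names are hypothetical, it assumes .env holds plain KEY=VALUE lines, and it does not reflect how the transcriber itself loads settings or which source takes precedence.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines; comments and blank lines are skipped."""
    values = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def hf_token_available(env_path=Path(".env"), environ=None):
    """True if HF_TOKEN is non-empty in the environment or the .env file."""
    environ = os.environ if environ is None else environ
    if environ.get("HF_TOKEN"):
        return True
    return bool(read_env_file(Path(env_path)).get("HF_TOKEN"))


if __name__ == "__main__":
    print("HF_TOKEN set:", hf_token_available())
```

Run it from the example folder so that the relative .env path resolves to the example's own settings file.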
Generate runtime examples
To generate the same two example folders inside any runtime workspace:
python3 -m anonim_video_text_library \
--runtime-root /path/to/runtime \
--example
To rebuild the generated README and example folders:
python3 -m anonim_video_text_library \
--runtime-root /path/to/runtime \
--example \
--force-example
The generated runtime examples live under:
- examples/Anonimizez_example/
- examples/Transcibator_example/
Each example is isolated. The demo scripts no longer create another nested
examples/ tree inside their own runtime data.
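The two generation commands above can also be driven from a script. This is a minimal sketch using only the flags shown in this section (--runtime-root, --example, --force-example); the function names are hypothetical and are not part of the library.

```python
import subprocess
import sys


def build_example_command(runtime_root, force=False, python=sys.executable):
    """Build the argument list for the example-generation command."""
    cmd = [
        python, "-m", "anonim_video_text_library",
        "--runtime-root", str(runtime_root),
        "--example",
    ]
    if force:
        # Rebuild the generated README and example folders.
        cmd.append("--force-example")
    return cmd


def generate_examples(runtime_root, force=False):
    """Run the command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_example_command(runtime_root, force), check=True)
```

Calling generate_examples("/path/to/runtime", force=True) would run the second command with the interpreter of the active environment.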
Default runtime workspace
By default the anonymizer uses:
text_anonim
That workspace contains:
- files/pii/ for input files, people.json, and blocklists
- files/pii_anonymized/ for anonymized output
- README.md with generated workspace instructions
- examples/ with the two generated example folders
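To see whether a directory already looks like such a workspace, the entries listed above can be checked with pathlib. This is a hedged sketch: the helper name is hypothetical, and it checks only for the presence of the paths named in this section, not their contents.

```python
from pathlib import Path

# Entries taken from the workspace description above.
EXPECTED_ENTRIES = ("files/pii", "files/pii_anonymized", "README.md", "examples")


def missing_workspace_entries(runtime_root):
    """Return the expected entries that do not exist under runtime_root."""
    root = Path(runtime_root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    print(missing_workspace_entries("text_anonim"))
```

An empty list means every listed entry is present.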
Python API
The main public API is TextAnonymizationSession.
from pathlib import Path

from anonim_video_text_library import TextAnonymizationSession

session = TextAnonymizationSession.from_defaults(
    runtime_root=Path("/path/to/runtime"),
    device="auto",
    ner_batch_size=16,
)

text, stats = session.anonymize_text(
    "Jordan Miller from Northwind Labs wrote to contact@example.com",
    file_id="demo.txt",
)
print(text)
print(stats)

payload, stats = session.anonymize_value(
    {
        "title": "Jordan Miller",
        "body": "Northwind Labs contact: contact@example.com",
    },
    file_id="demo.json",
)
print(payload)
print(stats)

directory_stats = session.anonymize_directory(
    input_root=Path("/path/to/input"),
    output_root=Path("/path/to/output"),
    skip_existing=True,
)
print(directory_stats)
print(session.people_file)
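For records that are already in memory, anonymize_value can be applied one JSON line at a time. The helper below is a sketch under stated assumptions: anonymize_jsonl_lines is a hypothetical name, not part of the library, and it assumes only what the example above shows, namely that anonymize_value accepts a decoded JSON value plus file_id and returns a (payload, stats) pair. For files on disk, anonymize_directory is likely the simpler route.

```python
import json


def anonymize_jsonl_lines(session, lines, file_id):
    """Anonymize each JSON record with session.anonymize_value.

    Yields (json_line, stats) pairs; blank lines are skipped.
    """
    for raw in lines:
        if not raw.strip():
            continue
        payload, stats = session.anonymize_value(json.loads(raw), file_id=file_id)
        # ensure_ascii=False keeps non-ASCII names readable in the output.
        yield json.dumps(payload, ensure_ascii=False), stats
```

Because the function is a generator, records are processed lazily, so a large input can be streamed without holding every result in memory.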
Notes
- text_anonim/files may contain large working datasets
- whisper.cpp/models/*.bin are not copied automatically with the project
- fairseq_env was intentionally not moved with the standalone package; recreate a local environment if you still need it