Library and CLI for text anonymization plus audio/video transcription with diarization

Anonim Video Text Library

Standalone project for two separate workflows:

text anonymization for JSON/JSONL/CSV/Markdown/TXT with a persistent people.json dictionary
audio/video transcription with diarization

Project layout

src/anonim_video_text_library/ - importable Python package
text_anonim/ - default runtime workspace for the anonymizer
examples/Anonimizez_example/ - self-contained anonymizer example
examples/Transcibator_example/ - self-contained transcription example
main.py - local wrapper for the transcription CLI
gpu_backends/ - helper scripts for GPU transcription backends
whisper.cpp/ - local checkout of whisper.cpp

Installation

cd Anonim_video_text_Library
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

This installs the dependencies for both workflows, including torch, transformers, faster-whisper, pyannote.audio, and imageio-ffmpeg.
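
After installation, a quick check can confirm that the heavy dependencies resolved. The sketch below is not part of the package. It assumes the usual import names for the distributions listed above (faster_whisper, pyannote.audio, imageio_ffmpeg), and check_modules is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names assumed for the distributions listed above.
EXPECTED_MODULES = [
    "torch",
    "transformers",
    "faster_whisper",
    "pyannote.audio",
    "imageio_ffmpeg",
]


def is_importable(name: str) -> bool:
    """Return True if the module can be located.

    For dotted names, find_spec imports the parent package first.
    """
    try:
        return find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. pyannote) is missing or broken.
        return False


def check_modules(names: list[str]) -> dict[str, bool]:
    """Map each module name to whether it is importable (hypothetical helper)."""
    return {name: is_importable(name) for name in names}


if __name__ == "__main__":
    for name, ok in check_modules(EXPECTED_MODULES).items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```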

Run the anonymizer separately

CLI entrypoints:

python3 -m anonim_video_text_library --help
python3 text_anonim/anonimizer.py --help

Self-contained example:

cd examples/Anonimizez_example
python3 run_anonymizer_example.py

What is inside examples/Anonimizez_example/:

.env and .env.example for settings
input/ for source json/jsonl/csv/md/txt files
output/ for anonymized copies
runtime_root/files/pii/ for people.json and blocklists
run_anonymizer_example.py as the runner

The example writes the anonymized files to output/; it does not merely print anonymized content to the terminal.
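
For orientation, the sketch below lists the files in an input/ folder whose extensions match the formats named above. It is an illustration only: the extension set is taken from this README, and find_input_files is a hypothetical helper, not the package's own file discovery logic.

```python
from pathlib import Path

# Extensions taken from the formats listed in this README.
SUPPORTED_SUFFIXES = {".json", ".jsonl", ".csv", ".md", ".txt"}


def find_input_files(input_dir: Path) -> list[Path]:
    """Return supported files under input_dir, sorted for stable output."""
    if not input_dir.is_dir():
        return []
    return sorted(
        path
        for path in input_dir.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )


if __name__ == "__main__":
    for path in find_input_files(Path("input")):
        print(path)
```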

Run the transcriber separately

CLI entrypoints:

anonim-video-text-transcribe --help
python3 main.py --help

Self-contained example:

cd examples/Transcibator_example
python3 run_transcriber_example.py

What is inside examples/Transcibator_example/:

.env and .env.example for settings
input/ for media files
output/ for generated transcripts
run_transcriber_example.py as the runner

If you need diarization, set HF_TOKEN in .env or in your shell.
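
Before starting a long transcription run, it can help to confirm that HF_TOKEN is actually visible. The following is a minimal sketch, not the package's own settings loader: read_env_file and find_hf_token are hypothetical helpers, the .env parsing handles only simple KEY=VALUE lines, and the precedence (shell first, then .env) is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def find_hf_token(
    env_path: Path = Path(".env"),
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return HF_TOKEN from the shell environment, else from the .env file."""
    env = os.environ if environ is None else environ
    return env.get("HF_TOKEN") or read_env_file(env_path).get("HF_TOKEN") or None


if __name__ == "__main__":
    print("HF_TOKEN found" if find_hf_token() else "HF_TOKEN not set")
```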

Generate runtime examples

To generate the same two example folders inside any runtime workspace:

python3 -m anonim_video_text_library \
  --runtime-root /path/to/runtime \
  --example

To rebuild the generated README and example folders:

python3 -m anonim_video_text_library \
  --runtime-root /path/to/runtime \
  --example \
  --force-example

The generated runtime examples live under:

examples/Anonimizez_example/
examples/Transcibator_example/

Each example is isolated. The demo scripts no longer create another nested examples/ tree inside their own runtime data.
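
The same commands can be driven from a script. The sketch below only assembles and runs the command lines shown above (--runtime-root, --example, --force-example); build_example_command and generate_examples are hypothetical helper names, and sys.executable is used in place of python3 so the active virtual environment is picked up.

```python
import subprocess
import sys
from pathlib import Path


def build_example_command(runtime_root: Path, force: bool = False) -> list[str]:
    """Assemble the example-generation command from the flags shown above."""
    command = [
        sys.executable,
        "-m",
        "anonim_video_text_library",
        "--runtime-root",
        str(runtime_root),
        "--example",
    ]
    if force:
        # Rebuild the generated README and example folders.
        command.append("--force-example")
    return command


def generate_examples(runtime_root: Path, force: bool = False) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_example_command(runtime_root, force), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_example_command(Path("/path/to/runtime"), force=True)))
```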

Default runtime workspace

By default the anonymizer uses:

text_anonim

That workspace contains:

files/pii/ for input files, people.json, and blocklists
files/pii_anonymized/ for anonymized output
README.md with generated workspace instructions
examples/ with the two generated example folders
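
The layout above can be captured in a small helper when scripting around the workspace. This is a sketch under the assumption that the directory names are exactly those listed; WorkspacePaths is a hypothetical class and not part of the package.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class WorkspacePaths:
    """Paths inside a runtime workspace, following the layout listed above."""

    root: Path = Path("text_anonim")

    @property
    def pii_dir(self) -> Path:
        # Input files, people.json, and blocklists.
        return self.root / "files" / "pii"

    @property
    def people_file(self) -> Path:
        return self.pii_dir / "people.json"

    @property
    def anonymized_dir(self) -> Path:
        # Anonymized output.
        return self.root / "files" / "pii_anonymized"

    @property
    def readme(self) -> Path:
        return self.root / "README.md"

    @property
    def examples_dir(self) -> Path:
        return self.root / "examples"


if __name__ == "__main__":
    workspace = WorkspacePaths()
    print(workspace.people_file)
    print(workspace.anonymized_dir)
```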

Python API

The main public API is TextAnonymizationSession.

from pathlib import Path
from anonim_video_text_library import TextAnonymizationSession

# Create a session bound to a runtime workspace.
session = TextAnonymizationSession.from_defaults(
    runtime_root=Path("/path/to/runtime"),
    device="auto",
    ner_batch_size=16,
)

# Anonymize a single string; returns the anonymized text and statistics.
text, stats = session.anonymize_text(
    "Jordan Miller from Northwind Labs wrote to contact@example.com",
    file_id="demo.txt",
)
print(text)
print(stats)

# Anonymize a nested structure such as a parsed JSON document.
payload, stats = session.anonymize_value(
    {
        "title": "Jordan Miller",
        "body": "Northwind Labs contact: contact@example.com",
    },
    file_id="demo.json",
)
print(payload)
print(stats)

# Anonymize the files under input_root and write the results to output_root.
directory_stats = session.anonymize_directory(
    input_root=Path("/path/to/input"),
    output_root=Path("/path/to/output"),
    skip_existing=True,
)
print(directory_stats)
print(session.people_file)
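
When many short strings come from one source, the anonymize_text call can be wrapped in a small loop. The sketch below assumes only what the example above shows: a file_id keyword and a (text, stats) return value. anonymize_lines is a hypothetical helper, and StubSession is a stand-in for a real TextAnonymizationSession so that the snippet runs without loading any models.

```python
from typing import Any, Iterable, Protocol


class SupportsAnonymizeText(Protocol):
    """Minimal interface assumed from the API example above."""

    def anonymize_text(self, text: str, file_id: str) -> tuple[str, Any]: ...


def anonymize_lines(
    session: SupportsAnonymizeText, lines: Iterable[str], file_id: str
) -> list[str]:
    """Anonymize each line with the same file_id and keep only the text."""
    results = []
    for line in lines:
        text, _stats = session.anonymize_text(line, file_id=file_id)
        results.append(text)
    return results


class StubSession:
    """Stand-in for TextAnonymizationSession; replaces one hard-coded name."""

    def anonymize_text(self, text: str, file_id: str) -> tuple[str, Any]:
        return text.replace("Jordan Miller", "[PERSON]"), {"file_id": file_id}


if __name__ == "__main__":
    for line in anonymize_lines(StubSession(), ["Jordan Miller wrote"], "demo.txt"):
        print(line)
```

With a real session, pass the object returned by TextAnonymizationSession.from_defaults in place of StubSession().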

Notes

text_anonim/files may contain large working datasets
whisper.cpp/models/*.bin are not copied automatically with the project
fairseq_env was intentionally not moved with the standalone package; recreate a local environment if you still need it

Project details

Version 0.1.7, released Mar 31, 2026

