A library for Robust Vietnamese Audio-Based Toxic Span Detection and Censoring
ViToSA 2.0: A MULTI-TASK APPROACH TOWARDS ROBUST VIETNAMESE AUDIO-BASED TOXIC SPAN DETECTION | ICASSP 2026
Official implementation of the paper:
“A Multi-Task Approach Towards Robust Vietnamese Audio-Based Toxic Span Detection” (ICASSP 2026).
This package provides an end-to-end pipeline for Vietnamese speech-based toxic span detection, combining ASR and toxic span detection in a unified model. It also supports automatic audio censoring, replacing toxic spans with beep sounds in the output waveform.
Key Features
- Automated Audio Censoring: Takes an input audio file containing toxic language and outputs a clean .wav file where profanity is masked with a beep.
- Unified Multi-Task Architecture: Integrates ASR and Toxic Span Detection (TSD) into a single model for high speed.
- SOTA Performance: Achieves F1-macro 0.9212 on the ViToSA-v2 dataset using PhoWhisper + BiLSTM-CRF + Knowledge Distillation.
- High Efficiency: Reduces inference latency by over 56% compared to traditional pipelines.
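For readers unfamiliar with the F1-macro metric quoted above, the following is a minimal sketch of how a macro-averaged F1 score is generally computed: F1 is calculated per class and the per-class scores are averaged with equal weight. The function name `macro_f1` and the label values `"O"` and `"T"` are illustrative assumptions; the exact evaluation protocol used in the paper (for example token-level versus span-level scoring) is not described here.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores (generic definition, not the paper's script)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = set(y_true) | set(y_pred)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: "T" marks a toxic token, "O" a non-toxic one.
score = macro_f1(["O", "O", "T", "T"], ["O", "T", "T", "T"])
print(f"macro-F1 = {score:.4f}")
```

Because every class contributes equally, a rare toxic class weighs as much as the frequent non-toxic class, which is why macro averaging is a common choice for imbalanced labelling tasks.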
Installation
pip install vitosa-speech-ii
System requirements
This package relies on pydub for audio processing, which requires ffmpeg to be installed.
- Ubuntu / Debian
  sudo apt-get install ffmpeg
- macOS (Homebrew)
  brew install ffmpeg
- Windows: Download ffmpeg from https://ffmpeg.org and add it to your system PATH.
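Before running the pipeline, it can help to confirm that ffmpeg is actually reachable from Python. The sketch below uses only the standard library; the helper name `find_ffmpeg` is hypothetical and not part of this package.

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, otherwise None."""
    return shutil.which(executable)


path = find_ffmpeg()
if path is None:
    print("ffmpeg not found: install it and make sure it is on your PATH.")
else:
    print(f"ffmpeg found at: {path}")
```

If the check reports that ffmpeg is missing, revisit the installation step for your platform before loading any audio.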
Quick Start
The library takes a raw audio file as input and produces a censored audio file as output.
1. Load the Model
The model is pre-trained on the ViToSA-v2 dataset.
import torch
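# Hyphens are not valid in a Python import; this assumes the module name
# matches the distribution name (vitosa_speech_ii).
from vitosa_speech_ii import load_my_model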
# Automatically detect device (CUDA/CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load the pre-trained model
model, processor = load_my_model(device)
2. Run Inference (Detect & Censor)
# Assumes the module name matches the distribution name (vitosa_speech_ii)
from vitosa_speech_ii import return_labels, censor_audio_with_beep
from IPython.display import Audio, display # Optional: to play in notebook
# Path to your input file
input_audio = "samples/toxic_speech.wav"
# Step 1: Detect toxic spans
words_with_labels = return_labels(input_audio, model, processor, device)
# Step 2: Generate Censored Audio
# This function creates a new audio file with beeps over toxic words
output_audio_path = censor_audio_with_beep(
    audio_path=input_audio,
    model=model,
    processor=processor,
    words_with_labels=words_with_labels,
    device=device
)
# Result
print(f"✅ Censored audio saved to: {output_audio_path}")
# Optional: Play the result (if in Jupyter/Colab)
# display(Audio(output_audio_path))
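To censor several files in one go, the two calls shown above can be wrapped in a small loop. The sketch below is a hypothetical helper, not part of the package: `censor_many` receives the detection and censoring functions as arguments and relies only on the call signatures used in the Quick Start. Whether the library writes each result to a distinct output path is not documented here, so check the returned paths before relying on them.

```python
from typing import Any, Callable, Dict, Iterable


def censor_many(
    audio_paths: Iterable[str],
    detect: Callable[..., Any],
    censor: Callable[..., Any],
    model: Any,
    processor: Any,
    device: Any,
) -> Dict[str, Any]:
    """Run detection then censoring on each file; map input path to the censor result."""
    results: Dict[str, Any] = {}
    for path in audio_paths:
        path = str(path)
        # Step 1: detect toxic spans (same positional order as in the Quick Start)
        labels = detect(path, model, processor, device)
        # Step 2: generate the censored audio (same keyword arguments as in the Quick Start)
        results[path] = censor(
            audio_path=path,
            model=model,
            processor=processor,
            words_with_labels=labels,
            device=device,
        )
    return results


# Intended usage with the functions from the Quick Start (not executed here):
# outputs = censor_many(["a.wav", "b.wav"], return_labels,
#                       censor_audio_with_beep, model, processor, device)
```

Passing the functions in as parameters keeps the loop independent of the import path and makes it easy to try with stand-in functions first.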
Methodology
Our system works in two steps:
- Detection: The multi-task model (PhoWhisper + BiLSTM-CRF) processes the audio to identify the exact start and end timestamps of toxic words.
- Censoring: We reconstruct the audio by keeping safe segments and generating a sine wave (beep) to overlay exactly where the toxic tokens occur, ensuring the rest of the sentence remains intelligible.
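The censoring step can be illustrated with a minimal, standard-library sketch. It is not the package's implementation (which relies on pydub): the function `beep_over_spans`, the 1000 Hz tone, the amplitude, and the representation of audio as a list of floats in [-1, 1] are all assumptions made for illustration. Samples inside each toxic time span are replaced by a sine tone, and everything else is left untouched.

```python
import math
from typing import List, Sequence, Tuple


def beep_over_spans(
    samples: Sequence[float],
    sample_rate: int,
    spans: Sequence[Tuple[float, float]],
    beep_hz: float = 1000.0,
    amplitude: float = 0.3,
) -> List[float]:
    """Return a copy of `samples` with each (start_s, end_s) span replaced by a sine beep."""
    out = list(samples)
    for start_s, end_s in spans:
        # Convert timestamps in seconds to sample indices, clamped to the signal length.
        first = max(0, int(start_s * sample_rate))
        last = min(len(out), int(end_s * sample_rate))
        for n in range(first, last):
            out[n] = amplitude * math.sin(2.0 * math.pi * beep_hz * n / sample_rate)
    return out


# One second of constant-valued "speech" at 16 kHz, with a toxic span from 0.25 s to 0.5 s.
speech = [0.5] * 16000
censored = beep_over_spans(speech, 16000, [(0.25, 0.5)])
print(len(censored), censored[0], censored[8000])
```

Because only the samples inside the spans are modified, the output has the same length as the input and the surrounding speech stays intelligible.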
Contact
For more information: luannt@uit.edu.vn
File details
Details for the file vitosa_speech_ii-0.0.1.tar.gz.
File metadata
- Download URL: vitosa_speech_ii-0.0.1.tar.gz
- Upload date:
- Size: 9.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 93ad3fff3f7eef2bbf01bf81ad8cc525d09f71027fa4863a58bd102f66b2c3e2 |
| MD5 | e83086793953a74fa14ef790ef16ffca |
| BLAKE2b-256 | 9e87d5e7f4bd25e4f0cc2dd73cdaf6be0442a795d245923f46a182c64936646d |
File details
Details for the file vitosa_speech_ii-0.0.1-py3-none-any.whl.
File metadata
- Download URL: vitosa_speech_ii-0.0.1-py3-none-any.whl
- Upload date:
- Size: 8.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 53de97488242ee30de8ca1a7cbb8f7e95fc8b19b2a10261dee61e307bfedbe7a |
| MD5 | 25943db129a6e4d0d728e6313e1f6dc2 |
| BLAKE2b-256 | 61f571af7a1c540ac5188477eaf9a86d28313c041377d39b9d6f5ab1ffc47619 |
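A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper name `sha256_of_file` is hypothetical, and the local file path in the usage comment assumes the file was saved to the current directory under its original name.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Intended usage (not executed here; assumes the wheel is in the current directory):
# expected = "53de97488242ee30de8ca1a7cbb8f7e95fc8b19b2a10261dee61e307bfedbe7a"
# assert sha256_of_file("vitosa_speech_ii-0.0.1-py3-none-any.whl") == expected
```

A mismatch means the file differs from the one that was uploaded and should not be installed.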