A library for robust Vietnamese audio-based toxic span detection and censoring.

ViToSA 2.0: A MULTI-TASK APPROACH TOWARDS ROBUST VIETNAMESE AUDIO-BASED TOXIC SPAN DETECTION | ICASSP 2026

Official implementation of the paper:
“A Multi-Task Approach Towards Robust Vietnamese Audio-Based Toxic Span Detection” (ICASSP 2026).

This package provides an end-to-end pipeline for Vietnamese speech-based toxic span detection, combining ASR and toxic span detection in a unified model. It also supports automatic audio censoring, replacing toxic spans with beep sounds in the output waveform.


Key Features

  • Automated Audio Censoring: Takes an input audio file containing toxic language and outputs a clean .wav file where profanity is masked with a beep.
  • Unified Multi-Task Architecture: Integrates ASR and Toxic Span Detection (TSD) into a single model to reduce inference latency.
  • SOTA Performance: Achieves a macro-F1 score of 0.9212 on the ViToSA-v2 dataset using PhoWhisper + BiLSTM-CRF + Knowledge Distillation.
  • High Efficiency: Reduces inference latency by over 56% compared to traditional pipelines that run ASR and TSD as separate models.
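The headline metric is macro-averaged F1: an F1 score is computed for each class and the scores are averaged without weighting by class frequency, so the (usually rarer) toxic class counts as much as the non-toxic one. The sketch below computes it over word-level labels, assuming a binary scheme with 0 = non-toxic and 1 = toxic; the exact label scheme and evaluation granularity used in the paper are not stated on this page, and `macro_f1` is an illustrative helper, not part of the package.

```python
from typing import Hashable, List, Sequence


def macro_f1(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = set(gold) | set(pred)
    if not labels:
        raise ValueError("need at least one label")
    per_class: List[float] = []
    for c in labels:
        # Count true positives, false positives and false negatives for class c.
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        per_class.append(2 * precision * recall / denom if denom else 0.0)
    # Macro average: every class has equal weight.
    return sum(per_class) / len(per_class)
```

With gold labels [0, 0, 1, 1, 0] and predictions [0, 1, 1, 0, 0], the per-class F1 scores are 2/3 (non-toxic) and 1/2 (toxic), so macro-F1 is 7/12, roughly 0.583.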

Installation

pip install vitosa-speech-ii

System requirements

This package relies on pydub for audio processing, which requires ffmpeg to be installed.

  • Ubuntu / Debian

    sudo apt-get install ffmpeg
    
  • macOS (Homebrew)

    brew install ffmpeg
    
  • Windows: Download ffmpeg from https://ffmpeg.org and add it to your system PATH.
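To confirm this prerequisite before running the pipeline, a small standard-library check can look ffmpeg up on PATH and print its version line. This is a convenience sketch, not part of the package; the helper name `ffmpeg_version` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = ffmpeg_version()
    if version is None:
        print("ffmpeg not found on PATH; install it before using this package")
    else:
        print(version)
```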


Quick Start

With this library, you pass in a raw audio file and get a censored audio file back.
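Before running inference it can help to check the input file's format. PhoWhisper is a Whisper-based model, and Whisper models operate on 16 kHz mono audio; whether `return_labels` resamples or downmixes internally is not documented here, so the sketch below only reports the format. It uses the standard-library `wave` module, which reads PCM WAV files only; `inspect_wav`, `WavInfo` and `is_16k_mono` are hypothetical helpers, not package API.

```python
import wave
from dataclasses import dataclass


@dataclass
class WavInfo:
    channels: int
    sample_rate: int
    sample_width_bytes: int
    duration_s: float


def inspect_wav(path: str) -> WavInfo:
    """Read basic format information from a PCM WAV file's header."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return WavInfo(
            channels=wf.getnchannels(),
            sample_rate=rate,
            sample_width_bytes=wf.getsampwidth(),
            duration_s=wf.getnframes() / rate,
        )


def is_16k_mono(info: WavInfo) -> bool:
    """True when the file already matches the 16 kHz mono input Whisper models use."""
    return info.sample_rate == 16_000 and info.channels == 1
```

For example, `inspect_wav("samples/toxic_speech.wav")` on the Quick Start input reports its channel count, sample rate, sample width and duration.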

1. Load the Model

The model is pre-trained on the ViToSA-v2 dataset.

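import torch

# Import name assumed to match the wheel name (vitosa_speech_ii);
# "vitosa-speech-II" is not a valid Python module name.
from vitosa_speech_ii import load_my_model

# Automatically select the device (CUDA if available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the pre-trained model and its processor
model, processor = load_my_model(device)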

2. Run Inference (Detect & Censor)

# Import name assumed to match the wheel name (hyphens are invalid in Python imports)
from vitosa_speech_ii import return_labels, censor_audio_with_beep
from IPython.display import Audio, display  # Optional: to play in a notebook

# Path to your input file
input_audio = "samples/toxic_speech.wav"

# Step 1: Detect toxic spans
words_with_labels = return_labels(input_audio, model, processor, device)

# Step 2: Generate Censored Audio
# This function creates a new audio file with beeps over toxic words
output_audio_path = censor_audio_with_beep(
    audio_path=input_audio, 
    model=model, 
    processor=processor, 
    words_with_labels=words_with_labels, 
    device=device
)

# Result
print(f"✅ Censored audio saved to: {output_audio_path}")

# Optional: Play the result (if in Jupyter/Colab)
# display(Audio(output_audio_path))
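If you want the toxic spans themselves (for logging or manual review) rather than only the beeped file, the per-word output can be collapsed into contiguous spans. The sketch below assumes, hypothetically, that each entry of `words_with_labels` is a `(word, start_seconds, end_seconds, label)` tuple with label 1 for toxic; the actual structure returned by `return_labels` is not documented on this page, so check it before adapting this. `merge_toxic_spans` and its `max_gap` parameter are illustrative names, not library API.

```python
from typing import List, Optional, Sequence, Tuple

# HYPOTHETICAL record layout, not the documented output of return_labels:
# (word, start_seconds, end_seconds, label) with label 1 = toxic, 0 = non-toxic.
WordLabel = Tuple[str, float, float, int]
Span = Tuple[float, float, str]


def merge_toxic_spans(words: Sequence[WordLabel], max_gap: float = 0.0) -> List[Span]:
    """Collapse runs of consecutive toxic words into (start, end, text) spans.

    A toxic word extends the open span only if it directly follows another
    toxic word and the pause between them is at most `max_gap` seconds.
    """
    spans: List[Span] = []
    cur_start: Optional[float] = None
    cur_end = 0.0
    cur_words: List[str] = []
    for word, start, end, label in words:
        if label == 1 and cur_start is not None and start - cur_end <= max_gap:
            # Extend the open span with this adjacent toxic word.
            cur_end = end
            cur_words.append(word)
            continue
        # Any other case closes the open span, if there is one.
        if cur_start is not None:
            spans.append((cur_start, cur_end, " ".join(cur_words)))
            cur_start, cur_words = None, []
        if label == 1:
            # Start a new span at this toxic word.
            cur_start, cur_end, cur_words = start, end, [word]
    if cur_start is not None:
        spans.append((cur_start, cur_end, " ".join(cur_words)))
    return spans
```

A positive `max_gap` lets two toxic words separated by a short pause be treated as one span, which avoids two back-to-back beeps.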

Methodology

Our system works in two steps:

  1. Detection: The multi-task model (PhoWhisper + BiLSTM-CRF) processes the audio to identify the exact start and end timestamps of toxic words.
  2. Censoring: We reconstruct the audio by keeping the safe segments unchanged and replacing the time span of each toxic token with a generated sine-wave beep, so the rest of the sentence remains intelligible.
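The censoring step can be sketched on a plain list of float samples: samples inside each toxic span are overwritten with a generated sine tone, and everything else is copied unchanged. This is an illustrative reimplementation, not the package's `censor_audio_with_beep`; the 1 kHz frequency, 0.3 amplitude, float samples in [-1.0, 1.0], and the helper name `beep_spans` are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def beep_spans(
    samples: Sequence[float],
    sample_rate: int,
    spans: Sequence[Tuple[float, float]],
    freq_hz: float = 1000.0,
    amplitude: float = 0.3,
) -> List[float]:
    """Return a copy of `samples` where each (start_s, end_s) span is a sine beep.

    Samples are assumed to be floats in [-1.0, 1.0]; samples outside every span
    are copied unchanged.
    """
    out = list(samples)
    n = len(out)
    for start_s, end_s in spans:
        # Convert seconds to sample indices, clamped to the signal bounds.
        i0 = max(0, round(start_s * sample_rate))
        i1 = min(n, round(end_s * sample_rate))
        for i in range(i0, i1):
            t = (i - i0) / sample_rate  # time since the start of this span
            out[i] = amplitude * math.sin(2.0 * math.pi * freq_hz * t)
    return out
```

Replacing the span outright, rather than mixing the tone with the speech, is what keeps the toxic word inaudible. Rounding to the nearest sample index is accurate to within half a sample (about 31 µs at 16 kHz); a short fade at each span boundary would reduce clicks but is omitted here for brevity.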

Contact

For more information, contact luannt@uit.edu.vn.


Download files

Source Distribution

vitosa_speech_ii-0.0.1.tar.gz (9.3 kB)

Built Distribution

vitosa_speech_ii-0.0.1-py3-none-any.whl (8.2 kB, Python 3)

File details

Details for the file vitosa_speech_ii-0.0.1.tar.gz.

File metadata

  • Download URL: vitosa_speech_ii-0.0.1.tar.gz
  • Size: 9.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for vitosa_speech_ii-0.0.1.tar.gz
Algorithm Hash digest
SHA256 93ad3fff3f7eef2bbf01bf81ad8cc525d09f71027fa4863a58bd102f66b2c3e2
MD5 e83086793953a74fa14ef790ef16ffca
BLAKE2b-256 9e87d5e7f4bd25e4f0cc2dd73cdaf6be0442a795d245923f46a182c64936646d

File details

Details for the file vitosa_speech_ii-0.0.1-py3-none-any.whl.

File hashes

Hashes for vitosa_speech_ii-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 53de97488242ee30de8ca1a7cbb8f7e95fc8b19b2a10261dee61e307bfedbe7a
MD5 25943db129a6e4d0d728e6313e1f6dc2
BLAKE2b-256 61f571af7a1c540ac5188477eaf9a86d28313c041377d39b9d6f5ab1ffc47619
