A toolkit for audio transcription, speaker diarization, and text processing

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3.13
Topic
- Software Development :: Libraries

Project description

Audio Transcription Toolkit

A toolkit for audio processing, including transcription, speaker diarization, and stopword removal. It provides a single pipeline that transcribes audio with Whisper, identifies speakers with PyAnnote, and cleans the resulting text with Natasha.

Key Features

Transcription: Converts audio to text using Whisper and FasterWhisper (for faster processing).
Speaker Diarization: Separates and identifies individual speakers using PyAnnote.
Text Post-Processing:
- Remove stopwords and swear words using Natasha.
- Extend the stopword and swear-word lists with your own words.

Installation

Make sure you have Python 3.8+ installed on your machine. To install this package, run:

pip install audio_transcribing

Other requirements

If you're using a GPU for better performance, make sure torch is installed with GPU support. For example, for CUDA 11.7:

pip install torch --index-url https://download.pytorch.org/whl/cu117
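
After installing, it can be worth confirming that torch actually sees the GPU. The sketch below relies only on `torch.cuda.is_available()` and falls back to "cpu" when torch is not installed. The `describe_device` helper is a hypothetical name used for illustration and is not part of this package.

```python
import importlib.util


def describe_device() -> str:
    """Return "cuda" if torch is installed and sees a GPU, otherwise "cpu"."""
    # find_spec avoids an ImportError when torch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Running on: {describe_device()}")
```

If this reports "cpu" on a machine with a GPU, the installed torch wheel was probably built without CUDA support.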

Quick Start

Transcription of Audio with Speaker Diarization and Cleaned Output

from audio_transcriber.audio_transcribing import Transcriber, NatashaStopwordsRemover

# Initialize the transcriber
transcriber = Transcriber(
  token="your-huggingface-token",  # Token for PyAnnote diarization
  whisper_model="medium",  # Size of Whisper model
  use_faster_whisper=True  # Use Faster Whisper if performance is a priority
)

# Load the audio file
with open("your_file.mp3", "rb") as f:
  audio_content = f.read()

# Transcribe audio
result = transcriber.transcribe(audio_content, language="ru")

print("Transcription with speaker diarization:")
print(result)

# Post-process text (remove stopwords and, optionally, swear words)
cleaned_result = NatashaStopwordsRemover().remove_stopwords(result, remove_swear_words=True)

print("\nCleaned transcription:")
print(cleaned_result)

Add Stopwords or Swear Words to Natasha Processor

Natasha supports only the Russian language.

from audio_transcriber.audio_transcribing import NatashaStopwordsRemover

# Initialize Natasha processor
stopwords_remover = NatashaStopwordsRemover()

# Add new custom stopwords
stopwords_remover.add_words_to_stopwords(["эм", "эй"])

# Add additional swear words
stopwords_remover.add_words_to_swear_words(["тварь"])

Modules Overview

1. Transcriber

The core of the project, managing transcription, speaker diarization, and post-processing. Key methods:

transcribe(content: bytes, language=None, max_speakers=None): Transcribes audio content and includes speaker annotations.
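
Because `transcribe` takes raw bytes rather than a path, a small wrapper can handle the file reading. This is a minimal sketch built only on the signature listed above: `transcribe_file` is a hypothetical helper, and the assumption that `max_speakers` caps the number of detected speakers is inferred from its name, not confirmed by the project.

```python
from pathlib import Path
from typing import Any, Optional


def transcribe_file(
    transcriber: Any,
    path: str,
    language: Optional[str] = None,
    max_speakers: Optional[int] = None,
) -> Any:
    """Read an audio file as bytes and pass it to transcriber.transcribe()."""
    content = Path(path).read_bytes()
    # Forward the optional arguments by keyword, matching the documented names.
    return transcriber.transcribe(
        content, language=language, max_speakers=max_speakers
    )
```

With a `Transcriber` instance from the Quick Start, a call might look like `transcribe_file(transcriber, "your_file.mp3", language="ru", max_speakers=2)`.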

2. NatashaStopwordsRemover

Text post-processing with Natasha NLP:

remove_stopwords(text: str, remove_swear_words=True, go_few_times=False): Removes stopwords and optionally swear words from transcribed text.
remove_words(text: str, words: list[str]): Removes predefined words from text.
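
To illustrate what word removal of this kind does, here is a standalone pure-Python sketch. It is not the library's implementation (which uses Natasha), and `remove_words_sketch` is a hypothetical name; it only shows whole-word, case-insensitive removal with the standard `re` module.

```python
from __future__ import annotations

import re


def remove_words_sketch(text: str, words: list[str]) -> str:
    """Drop whole-word, case-insensitive matches of `words` from `text`."""
    if not words:
        return text
    # \b is Unicode-aware for str patterns, so Cyrillic words match as whole words.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        flags=re.IGNORECASE,
    )
    without_words = pattern.sub("", text)
    # Collapse the extra whitespace left behind by removed words.
    return re.sub(r"\s+", " ", without_words).strip()


if __name__ == "__main__":
    print(remove_words_sketch("Эм ну привет эй как дела", ["эм", "эй"]))
```

Matching on word boundaries keeps longer words that merely start with a stopword, such as "эмоция", intact.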

Limitations

Audio Format: Tested on WAV and MP3 formats.
Speaker Diarization: PyAnnote separates speakers but does not assign "real names" like "John" or "Mary".
Stopword Customization: Additional stopwords or swear words must be in Russian.
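
On the diarization limitation: if real names are known, they can be substituted after transcription. The sketch below assumes the transcript is a string containing PyAnnote-style labels such as `SPEAKER_00`; the exact output format of `transcribe` is not documented here, so treat both that assumption and the hypothetical `rename_speakers` helper as illustrative.

```python
from __future__ import annotations

import re


def rename_speakers(transcript: str, names: dict[str, str]) -> str:
    """Replace diarization labels such as SPEAKER_00 with human-readable names."""

    def _swap(match: re.Match) -> str:
        label = match.group(0)
        # Labels without a mapping are left unchanged.
        return names.get(label, label)

    return re.sub(r"\bSPEAKER_\d+\b", _swap, transcript)
```

For example, `rename_speakers(text, {"SPEAKER_00": "John"})` would rename only the first speaker and leave the other labels as they are.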

License

This project is licensed under the MIT License.

Release history

- 0.2.6 (this version): May 19, 2025
- 0.2.5: May 17, 2025
- 0.2.4: May 12, 2025
- 0.2.3: May 11, 2025
- 0.2.2: Apr 22, 2025
- 0.2.1: Apr 16, 2025
- 0.1.2: Apr 16, 2025
- 0.1.1: Apr 16, 2025

Download files

Source Distribution

audio_transcribing-0.2.6.tar.gz (15.6 kB)

Uploaded May 19, 2025 Source

File details

Details for the file audio_transcribing-0.2.6.tar.gz.

File metadata

Download URL: audio_transcribing-0.2.6.tar.gz
Upload date: May 19, 2025
Size: 15.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for audio_transcribing-0.2.6.tar.gz
Algorithm	Hash digest
SHA256	`2e56e1bfbd9d9fc4a81d62874351158a34c90b815d564c66a5e561ef9dbdb326`
MD5	`1c0dcea6f9e6e819adf64083c6fa4c30`
BLAKE2b-256	`e4b53d19173d6c7d1d12f47731e2fab230b5a06faa66c599d3f6a3fd023f158a`
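
A downloaded archive can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch; the helper names are hypothetical, and the local path is whatever location the file was saved to.

```python
import hashlib

# SHA256 digest published above for audio_transcribing-0.2.6.tar.gz.
EXPECTED_SHA256 = "2e56e1bfbd9d9fc4a81d62874351158a34c90b815d564c66a5e561ef9dbdb326"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str) -> bool:
    """Compare a local file's digest with the published SHA256 value."""
    return sha256_of_file(path) == EXPECTED_SHA256
```

Calling `matches_published_hash("audio_transcribing-0.2.6.tar.gz")` on the downloaded file should return `True` if the archive is intact.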
