An AI-powered script to identify speakers in an audio file and split them into separate, clean tracks.

  ██╗     ██╗   ██╗██╗  ██╗██╗██╗   ██╗███╗   ███╗ 
  ██║     ██║   ██║██║ ██╔╝██║██║   ██║████╗ ████║ 
  ██║     ██║   ██║█████╔╝ ██║██║   ██║██╔████╔██║ 
  ██║     ██║   ██║██╔═██╗ ██║██║   ██║██║╚██╔╝██║ 
  ███████╗╚██████╔╝██║  ██╗██║╚██████╔╝██║ ╚═╝ ██║ 
  ╚══════╝ ╚═════╝ ╚═╝  ╚═╝╚═╝ ╚═════╝ ╚═╝     ╚═╝

Speaker Diarization & Splitting System

A powerful Python script that automatically identifies different speakers in an audio file and splits them into separate, clean tracks. Built by Lukium.

Overview

This project uses AI-powered speaker diarization (thanks to pyannote.audio) to process audio files containing multiple speakers. It intelligently determines who is speaking and when, then exports a separate audio file for each person.

The key feature is crosstalk removal. Each output track contains silence wherever its speaker is not talking, so overlapping speech from other speakers is eliminated. This makes it an ideal tool for podcast editing, interview transcription, character animation workflows, and any other task requiring isolated speaker audio.
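
To make the crosstalk idea concrete, here is a minimal sketch of one way to turn diarization segments into non-overlapping, silence-padded tracks. It assumes that any region where two or more speakers overlap is dropped from every track; the README does not say exactly how split_speakers.py treats overlaps, and the function names `exclusive_intervals` and `render_track` are hypothetical.

```python
from collections import defaultdict


def exclusive_intervals(segments):
    """Return {speaker: [(start, end), ...]} for spans where exactly one speaker talks.

    `segments` is a list of (start_sec, end_sec, speaker_label) tuples,
    such as those a diarization model might produce.
    """
    # Every segment boundary splits the timeline into elementary slices.
    bounds = sorted({t for start, end, _ in segments for t in (start, end)})
    result = defaultdict(list)
    for left, right in zip(bounds, bounds[1:]):
        active = {spk for start, end, spk in segments if start < right and end > left}
        if len(active) != 1:
            continue  # silence or overlapping speech: assumed to be dropped
        (speaker,) = active
        spans = result[speaker]
        if spans and spans[-1][1] == left:
            spans[-1] = (spans[-1][0], right)  # merge with the adjacent slice
        else:
            spans.append((left, right))
    return dict(result)


def render_track(samples, sample_rate, spans):
    """Copy `samples` inside `spans`; everything else stays zero (silence)."""
    track = [0] * len(samples)
    for start, end in spans:
        lo, hi = int(start * sample_rate), int(end * sample_rate)
        track[lo:hi] = samples[lo:hi]
    return track
```

With segments `(0.0, 2.0, "SPEAKER_00")` and `(1.5, 3.0, "SPEAKER_01")`, the overlapping half second from 1.5 to 2.0 is assigned to neither speaker, which is the behavior this sketch assumes.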

Features

🎙️ Multi-Speaker Diarization: Identifies and separates an unlimited number of speakers in a single audio file.
🧹 Crosstalk Removal: Generates clean, non-overlapping audio tracks for each speaker.
⚙️ Batch Processing: Automatically processes all supported audio files (.wav, .mp3, .m4a, .flac) in the audio/pending directory.
🚀 GPU Acceleration: Automatically detects and uses an NVIDIA GPU for significantly faster processing.
🗣️ Flexible Speaker Count: You can specify an exact number of speakers, a min/max range, or let the model detect it automatically.
🤫 Verbose/Quiet Mode: Run in quiet mode for clean output, or use the --verbose flag to see detailed logs for debugging.
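
GPU detection of the kind described in the list above is commonly done through PyTorch, which pyannote.audio is built on. The sketch below is an assumption about how such a check could look, not the script's actual code; `pick_device` is a hypothetical helper that falls back to the CPU when PyTorch or a CUDA device is unavailable.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a CUDA-capable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```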

🤖 Automated Sanity Checks

The main split_speakers.py script is designed to make the first run as smooth as possible by including automated checks for common setup problems. If you forget a step, the script will try to help you fix it.

Missing FFmpeg: If the script can't find ffmpeg in your system's PATH, it will print an error with instructions and automatically open the FFmpeg download page in your browser before exiting.

Hugging Face Model Access: The script proactively checks if you have accepted the user agreements for the required pyannote models. If you haven't accepted one, it will print a message identifying the specific model and automatically open its Hugging Face page for you to accept the terms.
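
The FFmpeg check described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the script's real implementation: `ffmpeg_path` and `require_ffmpeg` are hypothetical names, and the URL is the download page listed under Prerequisites.

```python
import shutil
import sys
import webbrowser

FFMPEG_DOWNLOAD_URL = "https://www.gyan.dev/ffmpeg/builds/"  # from Prerequisites


def ffmpeg_path(which=shutil.which):
    """Return the full path to ffmpeg, or None if it is not on PATH."""
    return which("ffmpeg")


def require_ffmpeg(which=shutil.which, open_page=webbrowser.open):
    """Exit with instructions (and open the download page) when ffmpeg is missing."""
    path = ffmpeg_path(which)
    if path is None:
        print(
            "ffmpeg was not found on PATH. Install it and add its bin folder to PATH.",
            file=sys.stderr,
        )
        open_page(FFMPEG_DOWNLOAD_URL)
        sys.exit(1)
    return path
```

The `which` and `open_page` parameters exist only so the check can be exercised without touching the real PATH or launching a browser.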

Prerequisites

Before you begin, ensure you have the following installed on your system:

Python 3.9+
Git (for cloning the repository).
NVIDIA GPU with CUDA drivers (optional; only needed for GPU acceleration).
FFmpeg: The script requires FFmpeg for audio processing.
- Download from: https://www.gyan.dev/ffmpeg/builds/
- Ensure the bin folder from the download is added to your system's PATH.

Setup & Installation

This project uses uv for fast and reliable Python package management. The setup process is guided by an interactive script.

Clone the Repository

git clone <your-repository-url>
cd <your-repository-folder>

Install uv: If you don't have uv installed, follow the official instructions for your OS: https://github.com/astral-sh/uv
Create & Activate a Virtual Environment: It's critical to run this project in a dedicated virtual environment. On Windows, run your terminal as an Administrator for this process.
```
# Create the environment with pip bootstrapped
uv venv .venv --seed

# Activate it (on Windows)
.venv\Scripts\activate

# On macOS/Linux, use this instead:
# source .venv/bin/activate
```
Run the Interactive Setup Script: This script will detect your hardware and install the correct dependencies.
```
python install.py
```
Follow the on-screen prompts. If you have an NVIDIA GPU, it will ask if you want to install the CUDA-enabled libraries.
Create .env File: Create a file named .env in the project's root directory. Get a read access token from Hugging Face and add it to the file:
```
HF_TOKEN=hf_YourAccessTokenGoesHere
```
Accept Hugging Face Agreements: You must accept the user conditions for the gated models used by this project. Visit the links below, make sure you are logged in, and click the "Access repository" button on each page.
- pyannote/speaker-diarization-3.1: https://huggingface.co/pyannote/speaker-diarization-3.1
- pyannote/segmentation-3.0: https://huggingface.co/pyannote/segmentation-3.0
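
For reference, reading the HF_TOKEN from the .env file created above could look like the following. This is a standard-library sketch; the project may well use a package such as python-dotenv instead, and `read_env_file` and `get_hf_token` are hypothetical helpers.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_token(path=".env"):
    """Prefer HF_TOKEN from the environment, else fall back to the .env file."""
    token = os.environ.get("HF_TOKEN")
    if not token and Path(path).is_file():
        token = read_env_file(path).get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to the .env file")
    return token
```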

Usage

Place Files: Add the audio files you want to process into the audio/pending directory.
Run the Script: Execute the script from your terminal with your virtual environment active.

Command Examples:

Automatic speaker detection:
```
python split_speakers.py
```
Specify an exact number of speakers (e.g., 2):
```
python split_speakers.py 2
```
Specify a range of speakers (e.g., min 2, max 4):
```
python split_speakers.py 2 4
```
Run in verbose/debug mode:
```
python split_speakers.py --verbose
```
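
The command forms above map naturally onto the `num_speakers`, `min_speakers`, and `max_speakers` keyword arguments that pyannote.audio diarization pipelines accept. The sketch below shows one possible mapping; the argument parsing itself is an assumption about split_speakers.py, and `parse_args` and `speaker_kwargs` are hypothetical names.

```python
import argparse


def parse_args(argv=None):
    """Accept zero, one, or two speaker counts plus an optional --verbose flag."""
    parser = argparse.ArgumentParser(description="Split speakers into separate tracks.")
    parser.add_argument("speakers", nargs="*", type=int,
                        help="N for an exact count, or MIN MAX for a range")
    parser.add_argument("--verbose", action="store_true", help="show detailed logs")
    args = parser.parse_args(argv)
    if len(args.speakers) > 2:
        parser.error("expected at most two numbers: N, or MIN MAX")
    return args


def speaker_kwargs(speakers):
    """Translate the positional numbers into pyannote-style keyword arguments."""
    if not speakers:
        return {}  # automatic detection
    if len(speakers) == 1:
        return {"num_speakers": speakers[0]}  # exact count
    low, high = speakers
    if low > high:
        raise ValueError("minimum speaker count exceeds maximum")
    return {"min_speakers": low, "max_speakers": high}
```

The resulting dictionary could then be passed to a loaded pipeline as `pipeline(audio_path, **speaker_kwargs(args.speakers))`.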

File Workflow:

Input: audio/pending/your_file.wav
Processed Original: audio/processed/your_file.wav
Output: audio/completed/your_file_SPEAKER_00.wav, audio/completed/your_file_SPEAKER_01.wav, etc.
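
The naming scheme in this workflow can be expressed in a few lines of pathlib. This is a sketch under assumptions: `pending_files` and `output_path` are hypothetical helpers, and the output is assumed to always be a .wav file, which the README only shows for .wav input.

```python
from pathlib import Path

SUPPORTED = {".wav", ".mp3", ".m4a", ".flac"}  # extensions listed under Features


def pending_files(pending_dir="audio/pending"):
    """Return supported audio files in the pending directory, sorted by name."""
    return sorted(
        p for p in Path(pending_dir).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )


def output_path(source, speaker, completed_dir="audio/completed"):
    """Build e.g. audio/completed/your_file_SPEAKER_00.wav for one speaker label."""
    return Path(completed_dir) / f"{Path(source).stem}_{speaker}.wav"
```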

Troubleshooting

Permission denied Errors during Setup: You must run your terminal (PowerShell/Command Prompt) as an Administrator on Windows to ensure the setup script can write to the virtual environment directory.
nvidia-smi Not Found: This means your NVIDIA drivers are not installed correctly or nvidia-smi.exe is not in your system's PATH.
Hugging Face Errors: If you get a 401 or GatedRepoError, double-check that your HF_TOKEN in the .env file is correct and that you have accepted the user agreements for both required models.
Latest Libraries Causing Bugs? If you suspect a new library version has introduced a bug, you can install a known-stable set of dependencies by running the setup script in failsafe mode: python install.py --failsafe.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Project details

speaker-diarization-system 1.0.2, released Jul 29, 2025
