An AI-powered script to identify speakers in an audio file and split them into separate, clean tracks.
██╗ ██╗ ██╗██╗ ██╗██╗██╗ ██╗███╗ ███╗
██║ ██║ ██║██║ ██╔╝██║██║ ██║████╗ ████║
██║ ██║ ██║█████╔╝ ██║██║ ██║██╔████╔██║
██║ ██║ ██║██╔═██╗ ██║██║ ██║██║╚██╔╝██║
███████╗╚██████╔╝██║ ██╗██║╚██████╔╝██║ ╚═╝ ██║
╚══════╝ ╚═════╝ ╚═╝ ╚═╝╚═╝ ╚═════╝ ╚═╝ ╚═╝
Speaker Diarization & Splitting System
A powerful Python script that automatically identifies different speakers in an audio file and splits them into separate, clean tracks. Built by Lukium.
Overview
This project uses AI-powered speaker diarization (thanks to pyannote.audio) to process audio files containing multiple speakers. It intelligently determines who is speaking and when, then exports a separate audio file for each person.
The key feature is its ability to remove crosstalk. The output tracks contain silence when the speaker is not talking, ensuring that overlapping speech is eliminated. This makes it an ideal tool for podcast editing, interview transcription, character animation workflows, and any other task requiring isolated speaker audio.
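The crosstalk rule above can be illustrated with a small sketch. This is not the script's actual implementation: the function name `split_tracks`, the `(start_sec, end_sec, speaker)` segment tuples, and the use of plain Python lists in place of real audio buffers are all assumptions made for illustration. The idea is simply that a sample is copied to a speaker's track only when that speaker is the sole active speaker; everything else becomes silence.

```python
def split_tracks(samples, sample_rate, segments):
    """Build one non-overlapping track per speaker (hypothetical helper).

    samples:  list of numeric samples for a mono recording
    segments: list of (start_sec, end_sec, speaker) tuples, e.g. from diarization
    """
    n = len(samples)
    overlap = object()      # sentinel: more than one speaker claims this sample
    owner = [None] * n      # None means nobody is speaking

    for start, end, speaker in segments:
        lo = max(0, round(start * sample_rate))
        hi = min(n, round(end * sample_rate))
        for i in range(lo, hi):
            if owner[i] is None:
                owner[i] = speaker
            elif owner[i] != speaker:
                owner[i] = overlap

    speakers = sorted({speaker for _, _, speaker in segments})
    # Copy a sample only where this speaker is the sole owner; silence elsewhere.
    return {
        spk: [s if who == spk else 0 for s, who in zip(samples, owner)]
        for spk in speakers
    }
```

Every output track has the same length as the input, so the tracks stay time-aligned when loaded side by side in an editor.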
Features
- 🎙️ Multi-Speaker Diarization: Identifies and separates an unlimited number of speakers in a single audio file.
- 🧹 Crosstalk Removal: Generates clean, non-overlapping audio tracks for each speaker.
- ⚙️ Batch Processing: Automatically processes all supported audio files (.wav, .mp3, .m4a, .flac) in the audio/pending directory.
- 🚀 GPU Acceleration: Automatically detects and uses an NVIDIA GPU for significantly faster processing.
- 🗣️ Flexible Speaker Count: You can specify an exact number of speakers, a min/max range, or let the model detect it automatically.
- 🤫 Verbose/Quiet Mode: Run in quiet mode for clean output, or use the --verbose flag to see detailed logs for debugging.
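GPU detection of the kind described in the list above is commonly done through PyTorch, which pyannote.audio is built on. The following is a minimal sketch under that assumption; `pick_device` is a hypothetical helper name, not a function taken from this project.

```python
def pick_device():
    """Return "cuda" when a CUDA-capable GPU is usable, otherwise "cpu"."""
    try:
        import torch  # assumption: PyTorch is the backend in use
    except ImportError:
        # Without PyTorch there is nothing to accelerate; fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Processing on: {pick_device()}")
```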
🤖 Automated Sanity Checks
The main split_speakers.py script is designed to make the first run as smooth as possible by including automated checks for common setup problems. If you forget a step, the script will try to help you fix it.
Missing FFmpeg: If the script can't find ffmpeg in your system's PATH, it will print an error with instructions and automatically open the FFmpeg download page in your browser before exiting.
Hugging Face Model Access: The script proactively checks if you have accepted the user agreements for the required pyannote models. If you haven't accepted one, it will print a message identifying the specific model and automatically open its Hugging Face page for you to accept the terms.
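The FFmpeg check described above could look roughly like the sketch below. It assumes the check is a PATH lookup followed by opening the download page; `ensure_ffmpeg` and its parameters are hypothetical names, and the URL is the one listed under Prerequisites. The `which` and `open_url` parameters exist only so the behavior can be exercised without touching the real system.

```python
import shutil
import webbrowser

FFMPEG_URL = "https://www.gyan.dev/ffmpeg/builds/"  # from the Prerequisites section


def ensure_ffmpeg(which=shutil.which, open_url=webbrowser.open):
    """Return True if ffmpeg is on PATH; otherwise print help and open the download page."""
    if which("ffmpeg") is not None:
        return True
    print("FFmpeg was not found in your PATH.")
    print(f"Download it from {FFMPEG_URL} and add its bin folder to PATH.")
    open_url(FFMPEG_URL)
    return False
```

A caller would typically exit with a non-zero status when this returns False, for example `sys.exit(1)`.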
Prerequisites
Before you begin, ensure you have the following installed on your system:
- Python 3.9+
- Git (for cloning the repository).
- NVIDIA GPU with CUDA Drivers (optional; required only for GPU acceleration).
- FFmpeg: The script requires FFmpeg for audio processing.
  - Download from: https://www.gyan.dev/ffmpeg/builds/
  - Ensure the bin folder from the download is added to your system's PATH.
Setup & Installation
This project uses uv for fast and reliable Python package management. The setup process is guided by an interactive script.
1. Clone the Repository

       git clone <your-repository-url>
       cd <your-repository-folder>

2. Install uv

   If you don't have uv installed, follow the official instructions for your OS: https://github.com/astral-sh/uv

3. Create & Activate a Virtual Environment

   It's critical to run this project in a dedicated virtual environment. On Windows, run your terminal as an Administrator for this process.

       # Create the environment with pip bootstrapped
       uv venv .venv --seed
       # Activate it (on Windows)
       .venv\Scripts\activate

4. Run the Interactive Setup Script

   This script will detect your hardware and install the correct dependencies.

       python install.py

   Follow the on-screen prompts. If you have an NVIDIA GPU, it will ask if you want to install the CUDA-enabled libraries.

5. Create .env File

   Create a file named .env in the project's root directory. Get a read access token from Hugging Face and add it to the file:

       HF_TOKEN=hf_YourAccessTokenGoesHere

6. Accept Hugging Face Agreements

   You must accept the user conditions for the gated models used by this project. Visit the Hugging Face page of each required pyannote model, make sure you are logged in, and click the "Access repository" button on each page.
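To show what the .env file from the steps above has to contain, here is a stdlib-only sketch that extracts HF_TOKEN from the file's text. It is an illustration, not the loader the project uses (which may rely on a dedicated library); `read_hf_token` is a hypothetical name.

```python
def read_hf_token(env_text):
    """Return the HF_TOKEN value from .env-style text, or None if it is absent or empty."""
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "HF_TOKEN":
            # Tolerate optional surrounding quotes around the token.
            return value.strip().strip('"').strip("'") or None
    return None
```

For example, `read_hf_token(open(".env").read())` returns the token string when the file is set up as described, and None when the line is missing.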
Usage
- Place Files: Add the audio files you want to process into the audio/pending directory.
- Run the Script: Execute the script from your terminal with your virtual environment active.
Command Examples:
- Automatic speaker detection:
  python split_speakers.py
- Specify an exact number of speakers (e.g., 2):
  python split_speakers.py 2
- Specify a range of speakers (e.g., min 2, max 4):
  python split_speakers.py 2 4
- Run in verbose/debug mode:
  python split_speakers.py --verbose
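The command forms above map naturally onto zero, one, or two positional integers plus a flag. The sketch below shows one way such a command line could be parsed with argparse. It is an assumption about the design, not the script's actual code: `speaker_kwargs` and `parse_cli` are hypothetical helpers, and the keyword names `num_speakers`, `min_speakers`, and `max_speakers` follow the options a pyannote.audio diarization pipeline accepts when called.

```python
import argparse


def speaker_kwargs(counts):
    """Translate 0, 1, or 2 positional integers into speaker-count options."""
    if len(counts) == 0:
        return {}                               # let the model decide
    if len(counts) == 1:
        return {"num_speakers": counts[0]}      # exact count
    if len(counts) == 2:
        low, high = counts
        if low > high:
            raise ValueError("min speakers must not exceed max speakers")
        return {"min_speakers": low, "max_speakers": high}
    raise ValueError("expected at most two speaker counts")


def parse_cli(argv):
    """Return (speaker options, verbose flag) for a list of CLI arguments."""
    parser = argparse.ArgumentParser(prog="split_speakers.py")
    parser.add_argument("speakers", nargs="*", type=int,
                        help="exact count, or min and max")
    parser.add_argument("--verbose", action="store_true")
    args = parser.parse_args(argv)
    return speaker_kwargs(args.speakers), args.verbose
```

With this layout, `parse_cli(["2", "4", "--verbose"])` yields a min/max range with verbose logging enabled, while `parse_cli([])` leaves speaker detection automatic.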
File Workflow:
- Input: audio/pending/your_file.wav
- Processed Original: audio/processed/your_file.wav
- Output: audio/completed/your_file_SPEAKER_00.wav, audio/completed/your_file_SPEAKER_01.wav, etc.
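The file workflow above can be expressed as a few path computations. This sketch is illustrative only: the helper names are hypothetical, and it assumes output tracks are always written as .wav, which the example paths suggest but the README does not state for other input formats.

```python
from pathlib import Path

SUPPORTED = {".wav", ".mp3", ".m4a", ".flac"}  # extensions listed under Features


def is_supported(path):
    """True if the file extension is one of the supported audio formats."""
    return Path(path).suffix.lower() in SUPPORTED


def processed_path(source, processed_dir="audio/processed"):
    """Where the original file is moved once it has been handled."""
    return Path(processed_dir) / Path(source).name


def output_path(source, speaker, completed_dir="audio/completed"):
    """Per-speaker output file, e.g. your_file_SPEAKER_00.wav (assumed .wav output)."""
    return Path(completed_dir) / f"{Path(source).stem}_{speaker}.wav"
```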
Troubleshooting
- Permission denied Errors during Setup: On Windows, you must run your terminal (PowerShell/Command Prompt) as an Administrator so the setup script can write to the virtual environment directory.
- nvidia-smi Not Found: This means your NVIDIA drivers are not installed correctly or nvidia-smi.exe is not in your system's PATH.
- Hugging Face Errors: If you get a 401 or GatedRepoError, double-check that your HF_TOKEN in the .env file is correct and that you have accepted the user agreements for both required models.
- Latest Libraries Causing Bugs? If you suspect a new library version has introduced a bug, you can install a known-stable set of dependencies by running the setup script in failsafe mode:
  python install.py --failsafe
License
This project is licensed under the MIT License. See the LICENSE file for details.