SoberMind Offline Session Transcriber with Speaker Diarization
SoberMind Session Transcriber
An offline-first, private speech-to-text script that uses OpenAI's Whisper models for local transcription, with optional pyannote.audio integration for multi-speaker diarization (speaker separation).
1. System Requirements & Setup
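This script runs entirely on your local machine. Audio is transcribed (and optionally diarized) locally rather than uploaded to a cloud service, which keeps your therapy sessions confidential. Model weights are downloaded from the internet the first time each model is used.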
Step A: Install FFMPEG
The transcription backend requires ffmpeg to decode audio files:
- Windows: Install ffmpeg via Chocolatey (choco install ffmpeg) or download it from the official website, then add its bin directory to your system PATH.
- macOS: brew install ffmpeg
- Linux (Debian/Ubuntu): sudo apt install ffmpeg
Step B: Install Python Packages
Install the required packages in your Python environment:
pip install openai-whisper torch
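Before the first run, you can confirm that ffmpeg is on your PATH and that the packages above are importable. The following is a minimal standard-library sketch; check_setup is a hypothetical helper, not part of sobertranscribe. It relies on the fact that the openai-whisper package is imported as whisper.

```python
import importlib.util
import shutil


def check_setup() -> dict[str, bool]:
    """Report which prerequisites from Steps A and B appear to be available."""
    return {
        # Step A: ffmpeg must be discoverable on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Step B: the pip packages are imported as `whisper` and `torch`
        "whisper": importlib.util.find_spec("whisper") is not None,
        "torch": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_setup().items():
        print(f"{name:8} {'OK' if ok else 'MISSING'}")
```

Any entry reported as MISSING points back to the corresponding install step.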
2. Multi-Speaker Diarization (Optional)
To separate speakers (e.g. distinguishing between Speaker 0 and Speaker 1):
- Install the diarization dependencies:
pip install pyannote.audio
- Go to Hugging Face and accept the user agreements for these models (requires creating a free account):
- Generate a User Access Token (Read Permission) on your Hugging Face Settings Page.
3. Usage Reference
Standard Transcription (No Speaker Separation)
Runs fully offline once the Whisper model has been downloaded:
python transcribe.py path/to/session.mp3
Transcribe with Multi-Speaker Diarization
Automatically labels each conversation segment with its speaker (requires the Hugging Face setup from section 2):
python transcribe.py path/to/session.mp3 --hf-token "YOUR_HF_TOKEN"
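How transcribe.py combines Whisper's output with the diarization result is not documented. A common technique, sketched here under that assumption, assigns each transcript segment to the speaker whose diarization turns overlap it longest in time. The start/end/text keys match the segment dicts returned by Whisper's transcribe(); the Turn class, the helper names, and the UNKNOWN fallback are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """A diarization turn: one speaker talking from start to end (seconds)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: list[dict], turns: list[Turn]) -> list[dict]:
    """Label each transcript segment with the speaker who overlaps it most."""
    labelled = []
    for seg in segments:
        # Sum overlap per speaker, since one speaker may have several turns
        totals: dict[str, float] = {}
        for turn in turns:
            o = overlap(seg["start"], seg["end"], turn.start, turn.end)
            if o > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + o
        speaker = max(totals, key=totals.get) if totals else "UNKNOWN"
        # Copy the segment rather than mutating the caller's dict
        labelled.append({**seg, "speaker": speaker})
    return labelled
```

Summing overlap per speaker, rather than picking the single longest turn, stops a segment from switching speakers just because one person's turn was split by a brief interjection.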
Options
- --model: Whisper model size to load (tiny, base, small, medium, large). Defaults to base, which balances speed and accuracy on typical laptops.
- --output: Base name for the output files.
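For readers scripting around the CLI, the documented interface corresponds roughly to the following argparse setup. This is a hypothetical reconstruction from the options listed in this README, not the package's actual source; in particular, the default for --output is not documented, so it is left as None here.

```python
import argparse

# Model sizes as listed in this README
WHISPER_SIZES = ["tiny", "base", "small", "medium", "large"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented transcribe.py options."""
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument("audio", help="path to the session audio file")
    parser.add_argument("--model", choices=WHISPER_SIZES, default="base",
                        help="Whisper model size to load")
    # argparse stores --hf-token as args.hf_token
    parser.add_argument("--hf-token", default=None,
                        help="Hugging Face token; enables speaker diarization")
    parser.add_argument("--output", default=None,
                        help="base name for the .md and .txt output files")
    return parser
```

With this parser, build_parser().parse_args(["session.mp3"]) gives model == "base" and hf_token set to None, matching the standard (non-diarized) invocation.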
Output is written in two formats:
- .md: A structured Markdown dialogue.
- .txt: A timestamped plain-text dialogue transcript.
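The exact layout of the two files is not documented. As an illustration only, assuming each segment is a dict with start, end, text and speaker keys, the two formats could be produced as follows (fmt_time, to_txt and to_markdown are hypothetical names, and the line formats are guesses).

```python
def fmt_time(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def to_txt(segments: list[dict]) -> str:
    """One timestamped line per segment: [HH:MM:SS] SPEAKER: text."""
    return "\n".join(
        f"[{fmt_time(seg['start'])}] {seg['speaker']}: {seg['text'].strip()}"
        for seg in segments
    )


def to_markdown(segments: list[dict], title: str = "Session Transcript") -> str:
    """Markdown dialogue; consecutive segments by one speaker are merged."""
    paragraphs: list[list[str]] = []  # each entry is [speaker, text]
    for seg in segments:
        text = seg["text"].strip()
        if paragraphs and paragraphs[-1][0] == seg["speaker"]:
            paragraphs[-1][1] += " " + text  # continue the current turn
        else:
            paragraphs.append([seg["speaker"], text])
    body = "\n\n".join(f"**{spk}:** {txt}" for spk, txt in paragraphs)
    return f"# {title}\n\n{body}\n"
```

Merging same-speaker segments keeps the Markdown readable as a dialogue, while the .txt keeps one line per segment so every timestamp is preserved.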
4. Web-Based GUI Dashboard
For interactive editing, launch the local GUI server:
python gui_server.py [port]
- Default port: 8080
- Local address: http://localhost:8080
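This README does not say which network interface gui_server.py binds to. Given the confidentiality goal, a local dashboard would typically listen on the loopback address only, as in this standard-library sketch. parse_port, make_server and DashboardHandler are hypothetical, and the handler is a placeholder rather than the real dashboard.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DEFAULT_PORT = 8080


class DashboardHandler(BaseHTTPRequestHandler):
    """Placeholder handler; the real dashboard's routes are not documented."""

    def do_GET(self) -> None:
        body = b"SoberMind dashboard placeholder\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def parse_port(argv: list[str]) -> int:
    """Use the optional [port] argument if given, else the default 8080."""
    return int(argv[1]) if len(argv) > 1 else DEFAULT_PORT


def make_server(port: int) -> ThreadingHTTPServer:
    # 127.0.0.1 accepts connections from this machine only; binding to
    # 0.0.0.0 would expose the dashboard to the local network as well.
    return ThreadingHTTPServer(("127.0.0.1", port), DashboardHandler)
```

Calling make_server(parse_port(sys.argv)).serve_forever() would then serve on port 8080 unless a port argument is given.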
GUI Features:
- Drag-and-Drop Form: Drop in your audio file, enter your Hugging Face token, and choose the Whisper model size.
- Live Console Log: Follow status updates and model-download progress in a scrollable console pane.
- Dialogue Workspace:
  - Edit transcribed text blocks on the fly.
  - Speaker Renamer: Rename default speaker codes (e.g. SPEAKER_00 to Me, SPEAKER_01 to Dr. Jameson) and instantly apply the change across the entire dialogue.
  - Export Controls: Copy the formatted Markdown dialogue with one click, or download it as a local JSON file.
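The Speaker Renamer amounts to a lookup-table replacement applied to every segment. The GUI's implementation and JSON schema are not documented, so this Python sketch assumes each segment is a dict with a speaker key; rename_speakers and export_json are hypothetical names.

```python
import json


def rename_speakers(segments: list[dict], mapping: dict[str, str]) -> list[dict]:
    """Return new segments with speaker codes replaced; unmapped codes are kept."""
    return [{**seg, "speaker": mapping.get(seg["speaker"], seg["speaker"])}
            for seg in segments]


def export_json(segments: list[dict]) -> str:
    """Serialize the dialogue; ensure_ascii=False keeps non-ASCII names readable."""
    return json.dumps(segments, ensure_ascii=False, indent=2)
```

Because rename_speakers returns new dicts instead of editing the originals, a rename can be undone simply by discarding the result.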
Download files
File details
Details for the file sobertranscribe-0.0.2.tar.gz.
File metadata
- Download URL: sobertranscribe-0.0.2.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 45190b340800c7a55849e7fd743063f50d882a698d9b203029f308fb817f385f |
| MD5 | 9fff94809acf0bfbf8d5d8a88b27a6a8 |
| BLAKE2b-256 | c8961b445e527eb1f5dd782e407a854b84c9474b11c901eed0417b665fcf8fbe |
File details
Details for the file sobertranscribe-0.0.2-py3-none-any.whl.
File metadata
- Download URL: sobertranscribe-0.0.2-py3-none-any.whl
- Upload date:
- Size: 6.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8359e0c254240ac0f42cabb6c2bfbe90b3cc2418c5314cfbfd3d89ebd1258c31 |
| MD5 | ae23f89664e6f3ab2e5f720becf4fad2 |
| BLAKE2b-256 | afacbcfc189c33ce571196fa0d8f205f365526b097bdd5da46bd7ff741658029 |
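To check a downloaded file against the published SHA256 digests, Python's standard hashlib module is enough. This is a minimal sketch; sha256_of and verify are hypothetical helper names.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in 64 KiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str | Path, expected_hex: str) -> bool:
    """True if the file's SHA256 matches the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, verify("sobertranscribe-0.0.2.tar.gz", "45190b340800c7a55849e7fd743063f50d882a698d9b203029f308fb817f385f") returns True only if the downloaded source distribution matches the listed digest.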