A tool for transcribing audio files with optional speaker diarization

Audio Transcriber

Audio Transcriber is a Python tool for transcribing audio files with optional speaker diarization. It provides both a GUI and a programmatic interface.

Features

  • Transcribe audio files to text
  • Optional speaker diarization
  • User-friendly GUI
  • Export results to HTML

Installation

Prerequisites

  • Python 3.7 or higher

Setup

We use uv for managing virtual environments and package installation. Follow these steps to set up the project:

On macOS and Linux:

# Download the setup script
curl -O https://raw.githubusercontent.com/yourusername/audio_transcriber/main/setup.sh

# Make the script executable
chmod +x setup.sh

# Run the setup script
./setup.sh

On Windows (PowerShell):

# Download the setup script
Invoke-WebRequest -Uri https://raw.githubusercontent.com/yourusername/audio_transcriber/main/setup.ps1 -OutFile setup.ps1

# Allow scripts to run in the current PowerShell session only
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass

# Run the setup script
.\setup.ps1

These scripts will:

  1. Install uv if it's not already installed
  2. Create a virtual environment
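  3. Activate the virtual environment (on macOS and Linux this lasts only while the script runs, because ./setup.sh executes in a child shell)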
  4. Install all required packages

Usage

GUI

To run the GUI:

python examples/simple_gui.py

Programmatic Usage

from audio_transcriber import initialize_models, transcribe_audio, diarize_audio

# Initialize models
initialize_models()

# Transcribe audio
transcription = transcribe_audio("path/to/your/audio/file.mp3")
print(transcription)

# Diarize audio (if available)
diarization = diarize_audio("path/to/your/audio/file.mp3")
# Each track is a (segment, track id, speaker label) triple
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"Speaker {speaker}: {turn.start:.2f} - {turn.end:.2f}")
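
The feature list mentions HTML export, but this README does not show the export API. As an illustration only, speaker turns could be rendered to HTML with the standard library along these lines. `turns_to_html` is a hypothetical helper, not a function of `audio_transcriber`, and the sample turns are made up.

```python
import html


def turns_to_html(turns, title="Transcript"):
    """Render (speaker, start, end, text) tuples as a minimal HTML page.

    Hypothetical helper for illustration; not part of audio_transcriber.
    """
    rows = []
    for speaker, start, end, text in turns:
        # Escape speaker labels and text so characters like & and < stay valid HTML
        rows.append(
            "<p><strong>{}</strong> [{:.2f} - {:.2f}] {}</p>".format(
                html.escape(str(speaker)), start, end, html.escape(text)
            )
        )
    body = "\n".join(rows)
    return (
        "<!DOCTYPE html>\n"
        "<html><head><meta charset=\"utf-8\">"
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )


# Made-up sample data standing in for real transcription output
sample = [
    ("SPEAKER_00", 0.0, 2.5, "Hello & welcome."),
    ("SPEAKER_01", 2.5, 4.0, "Thanks <everyone>."),
]
page = turns_to_html(sample)
print(page)
```

The returned string can be written to a file with `open(..., "w", encoding="utf-8")` and opened in a browser.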

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Notes

While FFmpeg is not a direct requirement for this project, some underlying libraries may use it for certain audio processing tasks. If you encounter any issues with audio file handling, consider installing FFmpeg as an additional step.
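
To see whether FFmpeg is already available before troubleshooting further, a check like this may be enough. It is a sketch that assumes the executable is named `ffmpeg` and is looked up on PATH; `find_ffmpeg` is a hypothetical helper, not part of this package.

```python
import shutil


def find_ffmpeg():
    """Return the full path to the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


path = find_ffmpeg()
if path is None:
    print("FFmpeg not found on PATH; consider installing it if audio files fail to load.")
else:
    print(f"FFmpeg found at {path}")
```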
