Skip to main content

Utilities for transcribing audio files using the Whisper API.

Project description

Transcription with OpenAI's Whisper is very accurate, but it doesn't natively support speaker labelling (diarisation).

Existing libraries for diarisation like pyannote rely on audio features to separate and identify speakers, but are computationally expensive and often inaccurate. A common failure mode arises when the speaker changes their audio quality, such as when they move closer to or further from the microphone. This can cause the diarisation algorithm to incorrectly identify the speaker as a new person.

I had a simple hypothesis: the cues from transcribed speech are sufficient to identify speakers. I developed a pipeline which passes the transcribed text to GPT-4o with a prompt asking it to identify the speaker.

flowchart LR
    A[Input file #40;audio, video#41;]
    B[Whisper transcription]
    C[Text output]
    D[Label and tidy with GPT-4o]
    E[Output in user-defined format]
    A-- Convert to WAV -->B
    B-->C
    C-->D
    D-->C
    D-->E

Installation

pip install precisetranscribe

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

precisetranscribe-0.2.0.tar.gz (6.7 kB view details)

Uploaded Source

Built Distribution

precisetranscribe-0.2.0-py3-none-any.whl (7.2 kB view details)

Uploaded Python 3

File details

Details for the file precisetranscribe-0.2.0.tar.gz.

File metadata

  • Download URL: precisetranscribe-0.2.0.tar.gz
  • Upload date:
  • Size: 6.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.19

File hashes

Hashes for precisetranscribe-0.2.0.tar.gz
Algorithm Hash digest
SHA256 ce744df4269d17364507b9a97ba01b7c376948cc1a2212b52d815565e57098c8
MD5 d0b6910d0387ff6dbeecc75d092130a1
BLAKE2b-256 3652e8f898932ac7b4548cdf040796e2938b9341ea1889c904c3ac156a1034ae

See more details on using hashes here.

File details

Details for the file precisetranscribe-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for precisetranscribe-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 368f90578a28b0cd3c7b038affe66bb7cf64e6c76c8b5faf2210c3be527d89fa
MD5 d070f884bd411fbc333a7dc43663efed
BLAKE2b-256 caf460ef010c8d7df28b1ca1731351dddbe2c98d2ee63a0649d1e1e2e0d2b615

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page