Transcribes audio files

pip install audiotranser

Tested against Windows 10 / Python 3.10 / Anaconda

Uses the models from https://huggingface.co/ggerganov/whisper.cpp/tree/main

    Args:
        inputfile: path to the input audio file
        small_large: model size (small or large)
        blas: use BLAS library for faster decoding
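        silence_threshold: silence threshold (the example below passes -30, which suggests a loudness level in dBFS rather than milliseconds); ignored if 0 or None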
        min_silence_len: minimum silence length in milliseconds
        keep_silence: minimum silence length to keep after silence removal
        threads: number of threads to use
        processors: number of processors to use
        offset_t: time offset in milliseconds
        offset_n: segment index offset
        duration: duration of audio to process in milliseconds
        max_context: maximum number of text context tokens to store
        max_len: maximum segment length in characters
        best_of: number of best candidates to keep
        beam_size: beam size for beam search
        word_thold: word timestamp probability threshold
        entropy_thold: entropy threshold for decoder fail
        logprob_thold: log probability threshold for decoder fail
        speed_up: speed up audio 2x (reduced accuracy)
        translate: translate from the source language to English
        diarize: stereo audio diarization
        language: spoken language ('auto' for auto-detection)

    Returns:
        A pandas DataFrame with the inference results, or the path to the output CSV file if pd.read_csv fails.
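
Because the return value can be either a DataFrame or a path to a CSV file, calling code may want to normalize both cases into one shape. The following is a minimal sketch, not part of audiotranser: `rows_from_result` is a hypothetical helper, and it assumes the fallback path is a `str` or `os.PathLike` pointing at a comma-separated, UTF-8 encoded file. The actual delimiter, encoding, and column names are not stated here.

```python
import csv
import os


def rows_from_result(result):
    """Return the transcription as a list of dicts, whichever form came back.

    Hypothetical helper, not part of audiotranser.
    """
    # Assumption: the fallback result is a path (str or os.PathLike) to a CSV file
    # that is comma-separated and UTF-8 encoded.
    if isinstance(result, (str, os.PathLike)):
        with open(result, newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    # Otherwise assume a pandas DataFrame, which provides to_dict("records").
    return result.to_dict("records")
```

Rows read from the CSV fallback hold every value as a string, so numeric columns would still need converting before use.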

from audiotranser import transcribe_audio
df = transcribe_audio(
    inputfile=r"C:\untitled.wav",
    small_large="large",
    blas=True,
    silence_threshold=-30,  # ignored if == 0 or None
    min_silence_len=500,  # ignored if silence_threshold == 0 or None
    keep_silence=1000,  # ignored if silence_threshold == 0 or None
    threads=3,  # number of threads to use during computation
    processors=1,  # number of processors to use during computation
    offset_t=0,  # time offset in milliseconds
    offset_n=0,  # segment index offset
    duration=0,  # duration of audio to process in milliseconds
    max_context=-1,  # maximum number of text context tokens to store
    max_len=0,  # maximum segment length in characters
    best_of=5,  # number of best candidates to keep
    beam_size=-1,  # beam size for beam search
    word_thold=0.01,  # word timestamp probability threshold
    entropy_thold=2.40,  # entropy threshold for decoder fail
    logprob_thold=-1.00,  # log probability threshold for decoder fail
    speed_up=True,  # speed up audio 2x (reduced accuracy)
    translate=False,  # translate from the source language to English
    diarize=False,  # stereo audio diarization
    language="en",  # spoken language ('auto' for auto-detection)
)
print(df)
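
The comments in the call above say the three silence settings are ignored when `silence_threshold` is 0 or None. A small sketch of how a caller might switch silence removal on and off follows. `build_transcribe_kwargs` is a hypothetical helper, not part of audiotranser. It only prepares a dictionary, and because this page does not say whether the remaining parameters have defaults, the dictionary is meant to be merged with the full argument list shown above.

```python
def build_transcribe_kwargs(inputfile, model="small", remove_silence=True, language="auto"):
    """Assemble some keyword arguments for transcribe_audio.

    Hypothetical helper, not part of audiotranser.
    """
    # The Args list documents only two model sizes.
    if model not in ("small", "large"):
        raise ValueError("model must be 'small' or 'large'")
    kwargs = {
        "inputfile": inputfile,
        "small_large": model,
        "language": language,
        # Values copied from the example call.
        "min_silence_len": 500,
        "keep_silence": 1000,
    }
    # Per the example's comments, None (or 0) causes the silence settings to be ignored.
    kwargs["silence_threshold"] = -30 if remove_silence else None
    return kwargs


# Usage: merge with the other arguments from the example call, for instance
# transcribe_audio(**build_transcribe_kwargs(r"C:\untitled.wav", "large"), blas=True, ...)
```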

Version: 0.10