Transcribes audio files
pip install audiotranser
Tested against Windows 10 / Python 3.10 / Anaconda
Uses the models from https://huggingface.co/ggerganov/whisper.cpp/tree/main
Args:
inputfile: path to the input audio file
small_large: model size (small or large)
blas: use BLAS library for faster decoding
silence_threshold: silence threshold (described as milliseconds, although the example value of -30 suggests a level in dBFS)
min_silence_len: minimum silence length in milliseconds
keep_silence: minimum silence length to keep after silence removal
threads: number of threads to use
processors: number of processors to use
offset_t: time offset in milliseconds
offset_n: segment index offset
duration: duration of audio to process in milliseconds
max_context: maximum number of text context tokens to store
max_len: maximum segment length in characters
best_of: number of best candidates to keep
beam_size: beam size for beam search
word_thold: word timestamp probability threshold
entropy_thold: entropy threshold for decoder fail
logprob_thold: log probability threshold for decoder fail
speed_up: speed up audio by x2 (reduced accuracy)
translate: translate from source language to English
diarize: stereo audio diarization
language: spoken language ('auto' for auto_detect)
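Because offset_t and duration are given in milliseconds, positions written as minutes and seconds have to be converted first. The following is a minimal sketch of that arithmetic; the helper name to_ms is hypothetical and not part of audiotranser.

```python
def to_ms(hours=0, minutes=0, seconds=0.0):
    """Convert a clock-style position to whole milliseconds.

    Hypothetical helper for illustration; not part of audiotranser.
    """
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return int(round(total_seconds * 1000))


# Start 1 min 30 s into the file and process the following 45 s.
offset_t = to_ms(minutes=1, seconds=30)  # 90000
duration = to_ms(seconds=45)             # 45000
```

The two values can then be passed as the offset_t and duration arguments described above.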
Returns:
A pandas DataFrame with the inference results, or the path to the output CSV file if pd.read_csv fails.
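Since the return value can be either a DataFrame or a file path, calling code may want to tell the two cases apart before using the result. A minimal sketch, assuming the fallback path is returned as a string or path-like object (the source does not state the exact type); split_result is a hypothetical helper, not part of audiotranser.

```python
import os


def split_result(result):
    """Return (dataframe, csv_path); exactly one of the two is not None.

    Assumption: the CSV fallback arrives as a str or os.PathLike object.
    Anything else is treated as the DataFrame.
    """
    if isinstance(result, (str, os.PathLike)):
        return None, os.fspath(result)
    return result, None
```

A caller could write df, csv_path = split_result(transcribe_audio(...)) and then inspect csv_path manually when df is None.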
from audiotranser import transcribe_audio
df = transcribe_audio(
    inputfile=r"C:\untitled.wav",
    small_large="large",
    blas=True,
    silence_threshold=-30,  # ignored if == 0 or None
    min_silence_len=500,  # ignored if silence_threshold == 0 or None
    keep_silence=1000,  # ignored if silence_threshold == 0 or None
    threads=3,  # number of threads to use during computation
    processors=1,  # number of processors to use during computation
    offset_t=0,  # time offset in milliseconds
    offset_n=0,  # segment index offset
    duration=0,  # duration of audio to process in milliseconds
    max_context=-1,  # maximum number of text context tokens to store
    max_len=0,  # maximum segment length in characters
    best_of=5,  # number of best candidates to keep
    beam_size=-1,  # beam size for beam search
    word_thold=0.01,  # word timestamp probability threshold
    entropy_thold=2.40,  # entropy threshold for decoder fail
    logprob_thold=-1.00,  # log probability threshold for decoder fail
    speed_up=True,  # speed up audio by x2 (reduced accuracy)
    translate=False,  # translate from source language to English
    diarize=False,  # stereo audio diarization
    language="en",  # spoken language ('auto' for auto_detect)
)
print(df)
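To transcribe several files with the same settings, the call above can be wrapped in a loop. This is only a sketch: transcribe_folder is a hypothetical helper, and the transcription function is passed in as an argument (for example transcribe_audio) so the loop itself makes no assumptions about the package beyond the inputfile keyword shown above.

```python
from pathlib import Path


def transcribe_folder(folder, transcribe, pattern="*.wav", **kwargs):
    """Call `transcribe` once per matching file and collect the results.

    Hypothetical helper; `transcribe` is any callable that accepts
    inputfile=... plus keyword arguments, such as transcribe_audio.
    """
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        # Key by file name so each result can be traced back to its input.
        results[path.name] = transcribe(inputfile=str(path), **kwargs)
    return results
```

For example, transcribe_folder(r"C:\recordings", transcribe_audio, small_large="small", language="en") would return a dict mapping each file name to whatever transcribe_audio returned for it; the folder path here is a placeholder.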
Source Distribution
audiotranser-0.10.tar.gz (14.0 MB)
Built Distribution
audiotranser-0.10-py3-none-any.whl (14.2 MB)
File details
Details for the file audiotranser-0.10.tar.gz.
File metadata
- Download URL: audiotranser-0.10.tar.gz
- Upload date:
- Size: 14.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | f60c1f2b32d281365efbcb1ee8a01f2788eb61f6b4e004e6ffb659952a2b4253
MD5 | 258564d7a0a32b48ab05b29dd42089d3
BLAKE2b-256 | 29959de2149217988d09d7836d3aa2f695f02ae85df1e003e671361083fece2b
File details
Details for the file audiotranser-0.10-py3-none-any.whl.
File metadata
- Download URL: audiotranser-0.10-py3-none-any.whl
- Upload date:
- Size: 14.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5e6d51355d5086f44ce3f2b0a43faffb7af58a2c3de06a960d21a6862dd7d765
MD5 | 36dbed10f37d5af710339c97600a26c9
BLAKE2b-256 | 380f8c0a2ec09dd91caf445112310574ddd0e217598ccff9a166ff4c66ed37e1