
Pytranscript 🎙️

Pytranscript is a Python library and command-line tool that converts video or audio files into text and can translate that text into other languages. It is a thin wrapper around Vosk (speech recognition), ffmpeg (audio conversion), and deep-translator (translation).

Prerequisites

Before using pytranscript, ensure you have the following dependencies installed:

ffmpeg for audio conversion.
vosk-models, required for speech recognition. Pass the path to your model directory with the --model argument.
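Before a first run, a quick check can confirm both prerequisites. This is a minimal sketch using only the standard library; check_prerequisites is a hypothetical helper, not part of pytranscript, and it only verifies that ffmpeg is on PATH and that the model path is a directory, not that the model itself is valid.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def check_prerequisites(model_dir: str) -> dict[str, bool]:
    """Hypothetical helper: report whether ffmpeg and the model directory are present."""
    return {
        # shutil.which returns None when the executable is not found on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # --model expects a directory, e.g. an unpacked Vosk model
        "model": Path(model_dir).is_dir(),
    }


if __name__ == "__main__":
    status = check_prerequisites("vosk-model-en-us-aspire-0.2")
    for name, ok in status.items():
        print(f"{name}: {'found' if ok else 'missing'}")
```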

Installation

pip install pytranscript

Usage

Command Line

pytranscript INPUT_FILE [OPTIONS]

Options

-m, --model - Path to the Vosk model directory. Always required.
-o, --output - Output file where the text will be saved. Default: input file name with .txt extension.
-li, --lang_input - Language of the input (and of the model). Default: auto.
-lo, --lang_output - Language to translate the text to. Default: no translation.
-s, --start - Start time of the audio to transcribe in seconds.
-e, --end - End time of the audio to transcribe in seconds.
--max_size - Stop the transcription once the output file reaches the specified size in bytes. Takes precedence over the --end option.
--keep-wav - Keep the converted audio wav file after the process is done.
-v, --verbosity - Verbosity level. 0: no output, 1: errors only, 2: errors, info and progress bar, 3: debug. Default: 2.
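The --max_size behaviour can be pictured with a small sketch. The helper truncate_to_size is hypothetical, not part of pytranscript: it assumes the output is written line by line and stops before a line that would push the UTF-8 size past the limit. Whether pytranscript stops just before or just after crossing the limit is not documented here.

```python
from __future__ import annotations

from typing import Iterable, Optional


def truncate_to_size(lines: Iterable[str], max_size: Optional[int] = None) -> str:
    """Hypothetical: join lines, stopping before the UTF-8 output exceeds max_size bytes."""
    parts: list[str] = []
    size = 0
    for line in lines:
        chunk = line + "\n"
        chunk_size = len(chunk.encode("utf-8"))  # count bytes, not characters
        if max_size is not None and size + chunk_size > max_size:
            break  # stop consuming input, as --max_size stops the transcription
        parts.append(chunk)
        size += chunk_size
    return "".join(parts)
```

Because the loop breaks instead of filtering, a lazy source of lines (for example a generator yielding recognised segments) stops being consumed as soon as the limit is hit.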

Example

The most basic usage is:

pytranscript video.mp4 -m vosk-model-en-us-aspire-0.2 -lo fr

Here, vosk-model-en-us-aspire-0.2 is the Vosk model directory. The text is translated from English to French and saved to a file named video.txt.
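To drive the CLI from a script, the command line can be assembled programmatically. The build_command helper is hypothetical and uses only the short options listed above; it assumes pytranscript is installed and on PATH when the command is actually run.

```python
from __future__ import annotations

from typing import Optional


def build_command(
    input_file: str,
    model: str,
    lang_output: Optional[str] = None,
    output: Optional[str] = None,
    start: Optional[float] = None,
    end: Optional[float] = None,
) -> list[str]:
    """Hypothetical helper: assemble a pytranscript argv from the documented short options."""
    cmd = ["pytranscript", input_file, "-m", model]  # -m is always required
    if output is not None:
        cmd += ["-o", output]
    if lang_output is not None:
        cmd += ["-lo", lang_output]  # omit to skip translation
    if start is not None:
        cmd += ["-s", str(start)]
    if end is not None:
        cmd += ["-e", str(end)]
    return cmd
```

Pass the result to subprocess.run(cmd, check=True) to execute it. Building a list rather than a shell string avoids quoting problems with file names that contain spaces.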

The --keep-wav option is useful when you run several transcriptions on the same file: the converted .wav file is kept and can be reused for each run, which saves conversion time. ⚠️ The kept .wav file is already cropped to the --start and --end range, so it only contains that portion of the audio.
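Cropping happens at conversion time, which is why a kept .wav only covers the requested range. As an illustration, the hypothetical helper ffmpeg_wav_command builds an ffmpeg call that crops and converts in one step. It assumes a mono, 16 kHz, 16-bit PCM target, a common input format for Vosk models; the exact settings used by pytranscript's own conversion are not documented here.

```python
from __future__ import annotations

import subprocess
from typing import Optional


def ffmpeg_wav_command(
    src: str, dst: str, start: float = 0, end: Optional[float] = None
) -> list[str]:
    """Hypothetical: ffmpeg argv for a mono 16 kHz 16-bit PCM WAV cropped to [start, end]."""
    if end is not None and end <= start:
        raise ValueError("end must be greater than start")
    cmd = ["ffmpeg", "-y"]  # -y overwrites an existing output file
    if start:
        cmd += ["-ss", str(start)]  # input option: seek before decoding
    cmd += ["-i", src]
    if end is not None:
        cmd += ["-t", str(end - start)]  # -t is a duration, not an end time
    cmd += ["-ac", "1", "-ar", "16000", "-acodec", "pcm_s16le", dst]
    return cmd


def convert(src: str, dst: str, start: float = 0, end: Optional[float] = None) -> None:
    """Run the conversion; requires ffmpeg on PATH."""
    subprocess.run(ffmpeg_wav_command(src, dst, start, end), check=True)
```

Because -t is a duration, the helper passes end - start rather than end itself.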

API

The API provides a Transcript object that holds the timestamps and the text. Its translate method returns a new Transcript object containing the translated text. The file written by the CLI is simply str(transcript).
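The relationship between a transcript, its translation, and the saved text can be sketched with a small dataclass. SketchTranscript is hypothetical and only mirrors the behaviour described above; the real Transcript's fields and string format may differ. Unlike the real translate method, which takes a target language such as 'fr', this sketch takes a plain function so that it runs offline.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchTranscript:
    """Hypothetical stand-in for pytranscript's Transcript: (time, text) pairs."""

    segments: list[tuple[float, str]] = field(default_factory=list)

    def translate(self, translate_text: Callable[[str], str]) -> SketchTranscript:
        # Return a new object: timestamps are kept, only the text changes
        return SketchTranscript([(t, translate_text(s)) for t, s in self.segments])

    def __str__(self) -> str:
        # One "time: text" line per segment (assumed format)
        return "\n".join(f"{t}: {s}" for t, s in self.segments)
```

Returning a new object leaves the original transcript untouched, so both versions can be saved side by side.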

A reproduction of the previous example using the API:

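import pytranscript as pt

# Convert the input to a WAV file Vosk can read; start/end crop the audio
wav_file = pt.to_valid_wav("video.mp4", "video.wav", start=0, end=None)
# Transcribe using the Vosk model directory; max_size=None sets no size limit
transcript = pt.transcribe(wav_file, model="vosk-model-en-us-aspire-0.2", max_size=None)
# translate() returns a new Transcript with the French text
transcript_fr = transcript.translate("fr")

# The CLI writes exactly str(transcript) to its output file
with open("video.txt", "w", encoding="utf8") as f:
    f.write(str(transcript_fr))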

Release history

0.1.1 (this version): Mar 23, 2024
0.1.0: Mar 23, 2024

Download files

Source Distribution

pytranscript-0.1.1.tar.gz (8.6 kB), uploaded Mar 23, 2024

Built Distribution

pytranscript-0.1.1-py3-none-any.whl (8.2 kB, Python 3), uploaded Mar 23, 2024

Hashes for pytranscript-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`f21cbc8ed3a0c78f00a55d2568bbdb8083cd62e7910316da383b46df91d73510`
MD5	`26d3f9fc3329bc3dcc54214f26948eb1`
BLAKE2b-256	`0d167d4b00ed27877f1b1f29e17a1b7960ddae723dd115eab265fdb89dacee8d`

Hashes for pytranscript-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7b6c0a9c00627498b07744ea01899869b9b0e3e4ac8d95431b78eca401c6aa54`
MD5	`3215049bc01874eb3850b5706de8c3ec`
BLAKE2b-256	`6711b3928ec034cd639be86919ce1c58bff36bb59ee4add188b472aeac2dd59e`