Speech to text using Parakeet TDT model

Sinapsis Parakeet TDT

Templates for advanced speech-to-text transcription with NVIDIA Parakeet TDT

🐍 Installation • 🚀 Features • 📚 Usage example • 🌐 Webapp • 📙 Documentation • 🔍 License

The Sinapsis Parakeet TDT package provides a template for integrating, configuring, and running speech-to-text (STT) functionality powered by NVIDIA's Parakeet TDT model.

🐍 Installation

Install using your favourite package manager. We strongly encourage the use of uv, although any other package manager should work too. If you need to install uv, please see the official documentation.

Example with uv:

  uv pip install sinapsis-parakeet-tdt --extra-index-url https://pypi.sinapsis.tech

or with raw pip:

  pip install sinapsis-parakeet-tdt --extra-index-url https://pypi.sinapsis.tech

[!IMPORTANT] Templates in each package may require extra dependencies. For development, we recommend installing the package with all the optional dependencies:

with uv:

  uv pip install sinapsis-parakeet-tdt[all] --extra-index-url https://pypi.sinapsis.tech

or with raw pip:

  pip install sinapsis-parakeet-tdt[all] --extra-index-url https://pypi.sinapsis.tech

🚀 Features

Templates Supported

This module includes a template for speech-to-text transcription using the Parakeet TDT model:

ParakeetTDTInference: Converts speech to text using NVIDIA's Parakeet TDT 0.6B model. This template processes audio packets from the input container or specified file paths, performs transcription with optional timestamp prediction, and adds the resulting text packets to the container.

Attributes

model_name (str): Name or path of the Parakeet TDT model. Defaults to "nvidia/parakeet-tdt-0.6b-v2".
audio_paths (list[str] | None): Optional list of audio file paths to transcribe. If None, audio will be taken from the AudioPackets in the DataContainer. Defaults to None.
enable_timestamps (bool): Whether to generate timestamps for the transcription. Defaults to False.
timestamp_level (Literal["char", "word", "segment"]): Level of timestamp detail. Defaults to "word".
device (Literal["cpu", "cuda"]): Device to run the model on. Defaults to "cuda".
refresh_cache (bool): Whether to refresh the cache when downloading the model. Defaults to False.
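
The attribute list above can be mirrored as a small typed structure, which is handy for checking values before writing them into a config. This is a minimal sketch: `ParakeetAttributesSketch` is a hypothetical stand-in built only from the names, types, and defaults documented here, not the class the package actually defines.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

TimestampLevel = Literal["char", "word", "segment"]
Device = Literal["cpu", "cuda"]


@dataclass
class ParakeetAttributesSketch:
    """Hypothetical mirror of the documented attributes (not the package's class)."""

    model_name: str = "nvidia/parakeet-tdt-0.6b-v2"
    audio_paths: Optional[list[str]] = None  # None: read AudioPackets from the container
    enable_timestamps: bool = False
    timestamp_level: TimestampLevel = "word"
    device: Device = "cuda"
    refresh_cache: bool = False

    def __post_init__(self) -> None:
        # Literal hints are not enforced at runtime, so check them explicitly.
        if self.timestamp_level not in get_args(TimestampLevel):
            raise ValueError(f"unsupported timestamp_level: {self.timestamp_level!r}")
        if self.device not in get_args(Device):
            raise ValueError(f"unsupported device: {self.device!r}")


defaults = ParakeetAttributesSketch()
cpu_with_segments = ParakeetAttributesSketch(
    enable_timestamps=True, timestamp_level="segment", device="cpu"
)
```

An unsupported value such as `timestamp_level="sentence"` raises a `ValueError` in this sketch; how the real template reacts to invalid values is not documented here.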

[!TIP] Use the CLI command sinapsis info --example-template-config TEMPLATE_NAME to produce an example Agent config for the Template specified in TEMPLATE_NAME.

For example, for ParakeetTDTInference use sinapsis info --example-template-config ParakeetTDTInference to produce an example config like:

agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: ParakeetTDTInference
  class_name: ParakeetTDTInference
  template_input: InputTemplate
  attributes:
    model_name: "nvidia/parakeet-tdt-0.6b-v2"
    audio_paths: []
    enable_timestamps: false
    timestamp_level: "word"
    device: "cuda"
    refresh_cache: false

📚 Usage example

This example illustrates how to use the ParakeetTDTInference template for speech-to-text transcription. It converts audio input into text using NVIDIA's Parakeet TDT model.

Config

agent:
  name: parakeet_tdt_agent
  description: "Agent that transcribes speech to text using the NVIDIA Parakeet TDT model."

templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}

- template_name: AudioReaderSoundfile
  class_name: AudioReaderSoundfile
  template_input: InputTemplate
  attributes:
    audio_file_path: "artifacts/sample.wav"
    source: "artifacts/sample.wav"

- template_name: ParakeetTDTInference
  class_name: ParakeetTDTInference
  template_input: AudioReaderSoundfile
  attributes:
    model_name: "nvidia/parakeet-tdt-0.6b-v2"
    enable_timestamps: true
    timestamp_level: "word"
    device: "cuda"

This configuration defines a complete pipeline for speech-to-text transcription:

First, the AudioReaderSoundfile template reads an audio file.
The ParakeetTDTInference template then transcribes the audio to text.
To save the transcription to a text file, a TextWriter template can be added after ParakeetTDTInference; the config above does not include one.

[!IMPORTANT] The AudioReaderSoundfile and TextWriter templates correspond to sinapsis-data-readers. If you want to use the example, please make sure you install these packages.
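
The way the templates in the config link together through template_input can be checked with a few lines of Python. This is only a sketch: `check_template_chain` is a hypothetical helper, the dictionaries restate the YAML above by hand, and the rule that each template_input must name an earlier template is an assumption about how the pipeline is wired, not documented sinapsis behaviour.

```python
def check_template_chain(templates: list[dict]) -> list[str]:
    """Return template names in order; raise if a template_input is not defined earlier."""
    seen: list[str] = []
    for entry in templates:
        name = entry["template_name"]
        upstream = entry.get("template_input")
        if upstream is not None and upstream not in seen:
            raise ValueError(f"{name} reads from undefined template {upstream!r}")
        seen.append(name)
    return seen


# Hand-written mirror of the usage config; attributes are omitted for brevity.
pipeline = [
    {"template_name": "InputTemplate", "class_name": "InputTemplate"},
    {
        "template_name": "AudioReaderSoundfile",
        "class_name": "AudioReaderSoundfile",
        "template_input": "InputTemplate",
    },
    {
        "template_name": "ParakeetTDTInference",
        "class_name": "ParakeetTDTInference",
        "template_input": "AudioReaderSoundfile",
    },
]

order = check_template_chain(pipeline)
print(" -> ".join(order))
```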

To run the config, use the CLI:

sinapsis run name_of_config.yml
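
For scripted runs, the same command can be launched from Python. The sketch below assumes the sinapsis CLI is installed and on PATH; `build_run_command` and `run_config` are hypothetical helpers and not part of sinapsis.

```python
import shlex
import shutil
import subprocess
from pathlib import Path


def build_run_command(config_path: str) -> list[str]:
    """Build the argument list for `sinapsis run <config>`."""
    return ["sinapsis", "run", str(config_path)]


def run_config(config_path: str) -> subprocess.CompletedProcess:
    """Run a config through the CLI, failing early on obvious setup problems."""
    if not Path(config_path).is_file():
        raise FileNotFoundError(f"config not found: {config_path}")
    if shutil.which("sinapsis") is None:
        raise RuntimeError("the sinapsis CLI was not found on PATH")
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    return subprocess.run(build_run_command(config_path), check=True)


# Show the command that would be executed, without running it.
print(shlex.join(build_run_command("name_of_config.yml")))
```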

🌐 Webapp

The webapp included in this project showcases the capabilities of the Parakeet TDT model for speech recognition tasks.

[!IMPORTANT] To run the app you first need to clone this repository:

git clone git@github.com:Sinapsis-ai/sinapsis-speech.git
cd sinapsis-speech

[!NOTE] If you'd like to enable external app sharing in Gradio, export GRADIO_SHARE_APP=True
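
How the webapp interprets GRADIO_SHARE_APP internally is not described here. As an illustration only, an environment flag of this kind is often read along these lines; `share_enabled` and the set of accepted values are assumptions, not the app's actual code.

```python
import os
from typing import Mapping, Optional


def share_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Interpret GRADIO_SHARE_APP as a boolean flag (accepted values are assumed)."""
    env = os.environ if environ is None else environ
    return env.get("GRADIO_SHARE_APP", "").strip().lower() in {"1", "true", "yes"}


print(share_enabled({"GRADIO_SHARE_APP": "True"}))
```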

🐳 Docker

[!IMPORTANT] This Docker image depends on the sinapsis-nvidia:base image. Please refer to the official sinapsis instructions to Build with Docker.

Build the sinapsis-speech image:

docker compose -f docker/compose.yaml build

Start the app container:

docker compose -f docker/compose_apps.yaml up -d sinapsis-parakeet-tdt

Check the logs:

docker logs -f sinapsis-parakeet-tdt

The logs will display the URL to access the webapp, e.g.:

Running on local URL:  http://127.0.0.1:7860

[!NOTE] The URL may be different; check the log output for the correct address.
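
Because the address can change, it may be convenient to pull it out of the log text programmatically. A minimal sketch, assuming the log line has the same shape as the sample shown here; `find_local_url` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches the "Running on local URL:" line shown in the logs.
LOCAL_URL_PATTERN = re.compile(r"Running on local URL:\s+(https?://\S+)")


def find_local_url(log_text: str) -> Optional[str]:
    """Return the first local URL found in the log text, or None if absent."""
    match = LOCAL_URL_PATTERN.search(log_text)
    return match.group(1) if match else None


sample_log = "Running on local URL:  http://127.0.0.1:7860"
print(find_local_url(sample_log))  # prints http://127.0.0.1:7860
```

The log text could come, for example, from the captured output of the docker logs command above.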

To stop the app:

docker compose -f docker/compose_apps.yaml down

💻 UV

To run the webapp using the uv package manager, follow these steps:

Sync the virtual environment:

uv sync --frozen

Install the wheel:

uv pip install sinapsis-speech[all] --extra-index-url https://pypi.sinapsis.tech

Run the webapp:

uv run webapps/speech_to_text_apps/parakeet_tdt_app.py

The terminal will display the URL to access the webapp, e.g.:

Running on local URL:  http://127.0.0.1:7860

[!NOTE] The URL may vary; check the terminal output for the correct address.

📙 Documentation

Documentation is available on the sinapsis website.

Tutorials for different projects within sinapsis are available on the sinapsis tutorials page.

🔍 License

This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the LICENSE file.

For commercial use, please refer to our official Sinapsis website for information on obtaining a commercial license.

Release history

0.1.5: Dec 9, 2025
0.1.4: Aug 29, 2025
0.1.3: Aug 25, 2025
0.1.2: Aug 5, 2025
0.1.1: Jul 24, 2025
0.1.0 (this version): Jun 5, 2025

