Speech transcription package by Shunya Labs with ct2 and transformers backends


Pingala Shunya

A speech transcription package by Shunya Labs that supports two backends, ct2 (CTranslate2) and Hugging Face transformers, behind a single unified API. It includes advanced features such as word-level timestamps, voice activity detection, and subtitle export.

Overview

Pingala Shunya provides a unified interface for transcribing audio files with backends optimized by Shunya Labs. Choose CTranslate2 for high-performance inference or Hugging Face transformers for flexibility; both work with the shunyalabs/pingala-v1-en-verbatim model.

Features

Shunya Labs Optimized: Built by Shunya Labs for superior performance
CT2 Backend: High-performance CTranslate2 optimization (default)
Transformers Backend: Hugging Face models and latest research
Auto-Detection: Automatically selects the best backend for your model
Unified API: Same interface across all backends
Word-Level Timestamps: Precise timing for individual words
Confidence Scores: Quality metrics for transcription segments and words
Voice Activity Detection (VAD): Filter out silence and background noise
Language Detection: Automatic language identification
Multiple Output Formats: Text, SRT subtitles, and WebVTT
Streaming Support: Process segments as they are generated
Advanced Parameters: Full control over all backend features
Rich CLI: Command-line tool with comprehensive options
Error Handling: Comprehensive error handling and validation

Installation

Basic Installation (ct2 backend)

pip install pingala-shunya

Backend-Specific Installations

# For Hugging Face transformers support
pip install "pingala-shunya[transformers]"

# For all backends
pip install "pingala-shunya[all]"

# Complete installation with development tools
pip install "pingala-shunya[complete]"

Requirements

Python 3.8 or higher
CUDA-compatible GPU (recommended for optimal performance)
PyTorch and torchaudio

Supported Backends

ct2 (CTranslate2) - Default

Performance: Fastest inference with CTranslate2 optimization
Features: Full parameter control, VAD, streaming, GPU acceleration
Models: Models in CTranslate2 format, with Shunya Labs models as the primary target
Best for: Production use, real-time applications
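Streaming means each segment can be handled as soon as it is decoded rather than after the whole file finishes. A minimal sketch of that consumption pattern is shown below, assuming the ct2 backend yields segments lazily from an iterator. `fake_segment_stream` and `write_incrementally` are hypothetical helpers that stand in for the real transcriber output so the example runs without a model.

```python
import io
from typing import Iterable, Iterator, NamedTuple, TextIO


class Segment(NamedTuple):
    """Hypothetical stand-in for a transcribed segment."""
    start: float
    end: float
    text: str


def fake_segment_stream() -> Iterator[Segment]:
    # Stand-in generator: a streaming backend would yield segments as decoding progresses.
    yield Segment(0.0, 1.8, "First sentence.")
    yield Segment(1.8, 4.2, "Second sentence.")


def write_incrementally(segments: Iterable[Segment], out: TextIO) -> int:
    # Handle each segment as soon as it is produced instead of building a full list first.
    count = 0
    for seg in segments:
        out.write(f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text}\n")
        out.flush()  # make partial results visible immediately (log file, terminal, socket)
        count += 1
    return count


if __name__ == "__main__":
    buffer = io.StringIO()
    write_incrementally(fake_segment_stream(), buffer)
    print(buffer.getvalue(), end="")
```

Because a generator can be iterated only once, code that needs the segments twice (for example, to print them and then export subtitles) should collect them into a list first.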

transformers

Performance: Good performance with Hugging Face ecosystem
Features: Access to latest models, easy fine-tuning integration
Models: Any Seq2Seq speech recognition model on the Hugging Face Hub
Best for: Research, latest models, custom transformer models

Supported Models

Default Model

shunyalabs/pingala-v1-en-verbatim - High-quality English transcription model by Shunya Labs

Shunya Labs Models

shunyalabs/pingala-v1-en-verbatim - Optimized for English verbatim transcription
More Shunya Labs models coming soon!

Custom Models (Advanced Users)

Any Hugging Face Seq2Seq model compatible with the automatic-speech-recognition pipeline
Local model paths supported

Local Models

/path/to/local/model - Local model directory or file
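The README does not describe how auto-detection chooses a backend for a local path. One plausible approach is to inspect the files in the model directory, sketched below. `guess_backend` is a hypothetical helper, not the package's actual logic. It relies on two conventions: CTranslate2 conversions store their weights in `model.bin`, and Hugging Face Transformers checkpoints include a `config.json`.

```python
import tempfile
from pathlib import Path


def guess_backend(model_name: str) -> str:
    """Hypothetical backend heuristic; the package's real auto-detection may differ."""
    path = Path(model_name)
    if path.is_dir():
        # CTranslate2 conversions keep their weights in model.bin. They may also
        # contain a config.json, so check for model.bin first.
        if (path / "model.bin").is_file():
            return "ct2"
        # Hugging Face Transformers checkpoints ship a config.json next to the weights.
        if (path / "config.json").is_file():
            return "transformers"
    # Hub IDs and unrecognized paths fall back to the documented default backend.
    return "ct2"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as model_dir:
        (Path(model_dir) / "config.json").write_text("{}")
        print(guess_backend(model_dir))  # prints "transformers"
```

Checking for `model.bin` before `config.json` matters here, because a converted CTranslate2 directory can contain both files.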

Quick Start

Basic Usage with Auto-Detection

from pingala_shunya import PingalaTranscriber

# Initialize with default Shunya Labs model and auto-detected backend
transcriber = PingalaTranscriber()

# Simple transcription
segments = transcriber.transcribe_file_simple("audio.wav")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")

Backend Selection

from pingala_shunya import PingalaTranscriber

# Explicitly choose backends with Shunya Labs model
transcriber_ct2 = PingalaTranscriber(model_name="shunyalabs/pingala-v1-en-verbatim", backend="ct2")
transcriber_tf = PingalaTranscriber(model_name="shunyalabs/pingala-v1-en-verbatim", backend="transformers")  

# Auto-detection (recommended)
transcriber_auto = PingalaTranscriber()  # Uses default Shunya Labs model with ct2

Advanced Usage with All Features

from pingala_shunya import PingalaTranscriber

# Initialize with specific backend and settings
transcriber = PingalaTranscriber(
    model_name="shunyalabs/pingala-v1-en-verbatim",
    backend="ct2",
    device="cuda", 
    compute_type="float16"
)

# Advanced transcription with full metadata
segments, info = transcriber.transcribe_file(
    "audio.wav",
    beam_size=10,                    # Higher beam size for better accuracy
    word_timestamps=True,            # Enable word-level timestamps
    temperature=0.0,                 # Deterministic output
    compression_ratio_threshold=2.4, # Filter out low-quality segments
    log_prob_threshold=-1.0,         # Filter by probability
    no_speech_threshold=0.6,         # Silence detection threshold
    initial_prompt="High quality audio recording",  # Guide the model
    hotwords="Python, machine learning, AI",        # Boost specific words
    vad_filter=True,                 # Enable voice activity detection
    task="transcribe"                # or "translate" for translation
)

# Print transcription info
model_info = transcriber.get_model_info()
print(f"Backend: {model_info['backend']}")
print(f"Model: {model_info['model_name']}")
print(f"Language: {info.language} (confidence: {info.language_probability:.3f})")
print(f"Duration: {info.duration:.2f} seconds")

# Process segments with all metadata
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
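    if segment.confidence is not None:  # 0.0 is a valid score, so test for None explicitly
        print(f"Confidence: {segment.confidence:.3f}")

    # Word-level details (guard against segments that carry no word data)
    for word in segment.words or []:
        print(f"  '{word.word}' [{word.start:.2f}-{word.end:.2f}s] (conf: {word.probability:.3f})")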
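Word-level probabilities are useful for flagging passages that deserve a manual check. A minimal sketch is shown below, assuming segments and words expose the attributes used in the example above (`words`, `word`, `start`, `end`, `probability`). The dataclasses are hypothetical stand-ins so the code runs without a model, and the 0.5 threshold is an arbitrary illustrative value.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Word:
    """Hypothetical stand-in mirroring the word attributes used above."""
    word: str
    start: float
    end: float
    probability: float


@dataclass
class Segment:
    """Hypothetical stand-in; words may be None when word timestamps are disabled."""
    start: float
    end: float
    text: str
    words: Optional[List[Word]] = None


def low_confidence_words(
    segments: Iterable[Segment], threshold: float = 0.5
) -> List[Tuple[str, float, float]]:
    # Collect (word, start, end) for every word whose probability is below the threshold.
    flagged = []
    for segment in segments:
        for w in segment.words or []:  # tolerate segments without word-level data
            if w.probability < threshold:
                flagged.append((w.word.strip(), w.start, w.end))
    return flagged


if __name__ == "__main__":
    demo = [
        Segment(0.0, 1.0, "Hi there", [Word(" Hi", 0.0, 0.4, 0.95), Word(" there", 0.4, 1.0, 0.31)]),
        Segment(1.0, 2.0, "Okay"),
    ]
    for text, start, end in low_confidence_words(demo):
        print(f"Review '{text}' at {start:.2f}-{end:.2f}s")
```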

Using Transformers Backend

from pingala_shunya import PingalaTranscriber

# Use Shunya Labs model with transformers backend
transcriber = PingalaTranscriber(
    model_name="shunyalabs/pingala-v1-en-verbatim",
    backend="transformers"
)

segments = transcriber.transcribe_file_simple("audio.wav")

# Auto-detection will use ct2 by default for Shunya Labs models
transcriber = PingalaTranscriber()  # Uses ct2 backend (recommended)

Command-Line Interface

The package includes a comprehensive CLI supporting both backends:

Basic CLI Usage

# Basic transcription with auto-detected backend
pingala audio.wav

# Specify backend explicitly  
pingala audio.wav --backend ct2
pingala audio.wav --backend transformers

# Use Shunya Labs model with different backends
pingala audio.wav --model shunyalabs/pingala-v1-en-verbatim --backend ct2
pingala audio.wav --model shunyalabs/pingala-v1-en-verbatim --backend transformers

# Save to file
pingala audio.wav --model shunyalabs/pingala-v1-en-verbatim -o transcript.txt

# Use CPU for processing
pingala audio.wav --device cpu

Advanced CLI Features

# Word-level timestamps with confidence scores (ct2)
pingala audio.wav --model shunyalabs/pingala-v1-en-verbatim --word-timestamps --show-confidence --show-words

# Voice Activity Detection (ct2 only)
pingala audio.wav --model shunyalabs/pingala-v1-en-verbatim --vad --verbose

# Language detection only (ct2 backend shown)
pingala audio.wav --model shunyalabs/pingala-v1-en-verbatim --detect-language --backend ct2

# SRT subtitles with word-level timing
pingala audio.wav --model shunyalabs/pingala-v1-en-verbatim --format srt --word-timestamps -o subtitles.srt

# Transformers backend with Shunya Labs model
pingala audio.wav --model shunyalabs/pingala-v1-en-verbatim --backend transformers --verbose

# Advanced parameters (ct2)
pingala audio.wav --model shunyalabs/pingala-v1-en-verbatim \
  --beam-size 10 \
  --temperature 0.2 \
  --compression-ratio-threshold 2.4 \
  --log-prob-threshold -1.0 \
  --initial-prompt "This is a technical presentation" \
  --hotwords "Python,AI,machine learning"
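When the CLI is driven from a script or a batch job, building the argument list in code avoids shell-quoting mistakes. A minimal sketch is shown below. It uses only flags that appear in this README (`--model`, `--format`, `--backend`, `--word-timestamps`, `-o`). `build_pingala_command` and `run_pingala` are hypothetical helpers, and `run_pingala` assumes the `pingala` executable is on `PATH`.

```python
import shlex
import subprocess
from typing import List, Optional


def build_pingala_command(
    audio_path: str,
    model: str = "shunyalabs/pingala-v1-en-verbatim",
    backend: Optional[str] = None,
    output_format: str = "text",
    word_timestamps: bool = False,
    output_path: Optional[str] = None,
) -> List[str]:
    # Assemble the argument list from flags documented in the CLI options reference.
    cmd = ["pingala", audio_path, "--model", model, "--format", output_format]
    if backend is not None:  # leave --backend out to keep auto-detection
        cmd += ["--backend", backend]
    if word_timestamps:
        cmd.append("--word-timestamps")
    if output_path is not None:
        cmd += ["-o", output_path]
    return cmd


def run_pingala(cmd: List[str]) -> str:
    # Run the CLI and return stdout; raises CalledProcessError on a non-zero exit code.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    cmd = build_pingala_command(
        "audio.wav", backend="ct2", output_format="srt",
        word_timestamps=True, output_path="subtitles.srt",
    )
    print(shlex.join(cmd))  # the equivalent shell command, safely quoted
```

Passing a list to `subprocess.run` (rather than a single string with `shell=True`) means file names with spaces need no extra quoting.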

CLI Options Reference

Option	Description	Backends	Default
`--model`	Model name or path	All	shunyalabs/pingala-v1-en-verbatim
`--backend`	Backend selection	All	auto-detect
`--device`	Device: cuda, cpu, auto	All	cuda
`--compute-type`	Precision: float16, float32, int8	All	float16
`--beam-size`	Beam size for decoding	All	5
`--language`	Language code (e.g., 'en')	All	auto-detect
`--word-timestamps`	Enable word-level timestamps	ct2	False
`--show-confidence`	Show confidence scores	All	False
`--show-words`	Show word-level details	All	False
`--vad`	Enable VAD filtering	ct2	False
`--detect-language`	Language detection only	All	False
`--format`	Output format: text, srt, vtt	All	text
`--temperature`	Sampling temperature	All	0.0
`--compression-ratio-threshold`	Compression ratio filter	ct2	2.4
`--log-prob-threshold`	Log probability filter	ct2	-1.0
`--no-speech-threshold`	No speech threshold	All	0.6
`--initial-prompt`	Initial prompt text	All	None
`--hotwords`	Hotwords to boost	ct2	None
`--task`	Task: transcribe, translate	All	transcribe
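In Whisper-style decoders, a segment's compression ratio is the length of its text divided by the length of that text after zlib compression. Repetitive, looping output compresses well and therefore scores high. A minimal sketch of that calculation is shown below, assuming `--compression-ratio-threshold` in the ct2 backend follows the same convention (the README does not state this explicitly). `compression_ratio` and `is_suspicious` are illustrative helpers, not package functions.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 length divided by zlib-compressed length (Whisper's convention)."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def is_suspicious(text: str, threshold: float = 2.4) -> bool:
    # 2.4 matches the documented CLI default for --compression-ratio-threshold.
    return compression_ratio(text) > threshold


normal = "The quarterly results were discussed in detail during the meeting."
looping = "thank you " * 40  # repetitive output of the kind a decoder can get stuck in

for label, text in (("normal", normal), ("looping", looping)):
    print(f"{label}: ratio={compression_ratio(text):.2f} suspicious={is_suspicious(text)}")
```

Short, ordinary sentences barely compress because zlib adds a few bytes of framing, so their ratio stays near 1. The repeated phrase shrinks to a small fraction of its size and crosses the threshold.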

Backend Comparison

Feature	ct2	transformers
Performance	Fastest	Good
GPU Acceleration	Optimized	Standard
Memory Usage	Lowest	Moderate
Model Support	CTranslate2-converted models	Any HF Seq2Seq ASR model
Word Timestamps	Full support	Limited
VAD Filtering	Built-in	No
Streaming	True streaming	Batch only
Advanced Params	All features	Basic
New Model Releases	After CTranslate2 conversion	Directly from the Hub
Custom Models	CTranslate2 format	Any Transformers-compatible format

Recommendations

Production/Performance: Use ct2 with Shunya Labs models
Latest Research Models: Use transformers
Real-time Applications: Use ct2 with VAD
Custom Transformer Models: Use transformers

Performance Optimization

Backend Selection Tips

from pingala_shunya import PingalaTranscriber

# Real-time, production, and maximum accuracy: use the Shunya Labs model with ct2
transcriber = PingalaTranscriber(model_name="shunyalabs/pingala-v1-en-verbatim", backend="ct2")

# Alternative backend, or research with other Hub models: use transformers
# (swap model_name for any compatible Seq2Seq model to try newer research checkpoints)
transcriber = PingalaTranscriber(model_name="shunyalabs/pingala-v1-en-verbatim", backend="transformers")

Hardware Recommendations

Use Case	Model	Backend	Hardware
Real-time	shunyalabs/pingala-v1-en-verbatim	ct2	GPU 4GB+
Production	shunyalabs/pingala-v1-en-verbatim	ct2	GPU 6GB+
Maximum Quality	shunyalabs/pingala-v1-en-verbatim	ct2	GPU 8GB+
Alternative	shunyalabs/pingala-v1-en-verbatim	transformers	GPU 4GB+
CPU-only	shunyalabs/pingala-v1-en-verbatim	any	8GB+ RAM
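The hardware table can be encoded as data so a deployment script can pick a backend and device automatically. A minimal sketch is shown below. The minimums are transcribed from the table above. `suggest_setup` is a hypothetical helper, and falling back to CPU while keeping the use case's backend is an assumption based on the table's CPU-only row, which lists "any" backend.

```python
from typing import Dict, Optional, Tuple

# (backend, minimum GPU memory in GB) per use case, from the Hardware Recommendations table.
HARDWARE_TABLE: Dict[str, Tuple[str, int]] = {
    "real-time": ("ct2", 4),
    "production": ("ct2", 6),
    "maximum-quality": ("ct2", 8),
    "alternative": ("transformers", 4),
}
CPU_ONLY_MIN_RAM_GB = 8  # the table's CPU-only row: 8GB+ RAM


def suggest_setup(
    use_case: str, gpu_memory_gb: Optional[float], ram_gb: float
) -> Tuple[str, str]:
    """Return (backend, device); hypothetical helper, not part of pingala_shunya."""
    backend, min_gpu = HARDWARE_TABLE[use_case]
    if gpu_memory_gb is not None and gpu_memory_gb >= min_gpu:
        return backend, "cuda"
    if ram_gb >= CPU_ONLY_MIN_RAM_GB:
        # Assumption: the CPU-only row accepts any backend, so keep the use case's choice.
        return backend, "cpu"
    raise ValueError(f"Machine is below the documented minimums for '{use_case}'")


if __name__ == "__main__":
    print(suggest_setup("production", gpu_memory_gb=8, ram_gb=16))
    print(suggest_setup("maximum-quality", gpu_memory_gb=6, ram_gb=16))
```

The returned pair maps directly onto the `backend` and `device` arguments shown in the Python examples and onto the `--backend` and `--device` CLI flags.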

Examples

See example.py for comprehensive examples:

# Run with default backend (auto-detected)
python example.py audio.wav

# Test specific backends with Shunya Labs model
python example.py audio.wav --backend ct2
python example.py audio.wav --backend transformers  

# Pass the model name explicitly as the second argument
python example.py audio.wav shunyalabs/pingala-v1-en-verbatim
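The invocations above take an audio path, an optional model name, and an optional `--backend` flag. A minimal sketch of an argument parser that accepts all three forms is shown below. It is hypothetical: `parse_example_args` is not taken from example.py, whose actual parsing may differ.

```python
import argparse
from typing import List, Optional


def parse_example_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical parser for the invocations shown above; example.py itself may differ.
    parser = argparse.ArgumentParser(description="Transcribe an audio file with Pingala Shunya")
    parser.add_argument("audio", help="path to the audio file")
    parser.add_argument(
        "model",
        nargs="?",  # optional positional: falls back to the default Shunya Labs model
        default="shunyalabs/pingala-v1-en-verbatim",
        help="model name or local path",
    )
    parser.add_argument(
        "--backend",
        choices=["ct2", "transformers"],
        default=None,  # None leaves the choice to auto-detection
        help="backend to use; omit to auto-detect",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_example_args(["audio.wav", "--backend", "ct2"])
    print(args.audio, args.model, args.backend)
```

Using `choices` makes argparse reject a misspelled backend name with a clear error before any model is loaded.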

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built by Shunya Labs for superior transcription quality
Powered by CTranslate2 for optimized inference
Supports Hugging Face transformers
Uses the Pingala model from Shunya Labs

About Shunya Labs

Visit Shunya Labs to learn more about our AI research and products. Contact us at 0@shunyalabs.ai for questions or collaboration opportunities.


Release history

0.1.7 - Nov 7, 2025
0.1.6 - Aug 4, 2025
0.1.5 - Jul 25, 2025
0.1.4 - Jul 25, 2025
0.1.3 - Jul 23, 2025
0.1.2 - Jul 23, 2025
0.1.1 - Jul 23, 2025
0.1.0 - Jul 23, 2025 (this version)

Download files

Source Distribution

pingala_shunya-0.1.0.tar.gz (18.9 kB)

Uploaded Jul 23, 2025 Source

Built Distribution

pingala_shunya-0.1.0-py3-none-any.whl (16.5 kB)

Uploaded Jul 23, 2025 Python 3

File details

Details for the file pingala_shunya-0.1.0.tar.gz.

File metadata

Download URL: pingala_shunya-0.1.0.tar.gz
Upload date: Jul 23, 2025
Size: 18.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for pingala_shunya-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`d05d9cfeda5300bed4686be78c35c3e781333dafa9d9cd52cbf2163c87660b03`
MD5	`aa68014f8c08ca39c7f418c717e614d4`
BLAKE2b-256	`42a27e93d93243fcdda10e98dc8c3630fedc84e5979c850d79c96a250823a112`


File details

Details for the file pingala_shunya-0.1.0-py3-none-any.whl.

File metadata

Download URL: pingala_shunya-0.1.0-py3-none-any.whl
Upload date: Jul 23, 2025
Size: 16.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for pingala_shunya-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dcb5d0450ac95995212703ff03b51f7aa4a6142592714d5c627d1a72c80409a1`
MD5	`98d132ec11f3501892abea9cd559b2e0`
BLAKE2b-256	`f06e7fc119af87d06ab5ea975ce00098e2d4d94982ae3c034c1e1391c75b20e8`

