ivrit.ai helper package

ivrit

Python package providing wrappers around ivrit.ai's capabilities.

Installation

pip install ivrit

Usage

Audio Transcription

The ivrit package provides audio transcription functionality using multiple engines.

Basic Usage

import ivrit

# Transcribe a local audio file
model = ivrit.load_model(engine="faster-whisper", model="ivrit-ai/whisper-large-v3-turbo-ct2")
result = model.transcribe(path="audio.mp3")

# With custom device
model = ivrit.load_model(engine="faster-whisper", model="ivrit-ai/whisper-large-v3-turbo-ct2", device="cpu")
result = model.transcribe(path="audio.mp3")

print(result["text"])
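A common next step is saving the transcript to disk. The helper below is a minimal sketch, not part of the ivrit package: `transcribe_to_text_file` is a hypothetical name, and it assumes only what the example above shows, namely that `model.transcribe(path=...)` returns a dictionary with a "text" key.

```python
from pathlib import Path


def transcribe_to_text_file(model, audio_path, out_path=None):
    """Transcribe audio_path with an already-loaded model and save the text.

    `model` is any object exposing transcribe(path=...) as shown above; the
    non-streaming result is assumed to be a dict with a "text" key.
    """
    audio_path = Path(audio_path)
    # Default to the audio file's name with a .txt extension.
    out_path = Path(out_path) if out_path else audio_path.with_suffix(".txt")
    result = model.transcribe(path=str(audio_path))
    out_path.write_text(result["text"], encoding="utf-8")
    return out_path


# Usage (requires ivrit with the faster-whisper extra installed):
# import ivrit
# model = ivrit.load_model(engine="faster-whisper", model="ivrit-ai/whisper-large-v3-turbo-ct2")
# transcribe_to_text_file(model, "audio.mp3")  # writes audio.txt next to the audio
```

Writing with an explicit UTF-8 encoding matters here because Hebrew transcripts are non-ASCII.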

Transcribe from URL

# Transcribe audio from a URL
model = ivrit.load_model(engine="faster-whisper", model="ivrit-ai/whisper-large-v3-turbo-ct2")
result = model.transcribe(url="https://example.com/audio.mp3")

print(result["text"])

Streaming Results

# Get results as a stream (generator)
model = ivrit.load_model(engine="faster-whisper", model="base")
for segment in model.transcribe(path="audio.mp3", stream=True, verbose=True):
    print(f"{segment.start:.2f}s - {segment.end:.2f}s: {segment.text}")

# Or use the model directly
model = ivrit.FasterWhisperModel(model="base")
for segment in model.transcribe(path="audio.mp3", stream=True):
    print(f"{segment.start:.2f}s - {segment.end:.2f}s: {segment.text}")

# Access word-level timing
for segment in model.transcribe(path="audio.mp3", stream=True):
    print(f"Segment: {segment.text}")
    for word in segment.extra_data.get('words', []):
        print(f"  {word['start']:.2f}s - {word['end']:.2f}s: '{word['word']}'")
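Because streamed segments carry start and end times, they can be turned into subtitles. The following is an illustrative sketch that assumes only the `.start`, `.end` and `.text` attributes used above; `format_timestamp` and `segments_to_srt` are hypothetical helpers, not functions provided by ivrit.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT subtitle text from objects exposing .start, .end and .text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


# Usage with a loaded model:
# srt = segments_to_srt(model.transcribe(path="audio.mp3", stream=True))
```

Since `segments_to_srt` accepts any iterable, it consumes the generator lazily and never needs the full segment list up front.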

Async Transcription (RunPod Only)

For RunPod models, transcribe_async() yields segments through an async generator, so the event loop is not blocked while waiting on the remote endpoint:

import asyncio
from ivrit.audio import load_model

async def transcribe_async():
    # Load RunPod model
    model = load_model(
        engine="runpod",
        model="large-v3-turbo",
        api_key="your-api-key",
        endpoint_id="your-endpoint-id"
    )
    
    # Stream results asynchronously
    async for segment in model.transcribe_async(path="audio.mp3", language="he"):
        print(f"{segment.start:.2f}s - {segment.end:.2f}s: {segment.text}")

# Run the async function
asyncio.run(transcribe_async())

Note: Async transcription is only available for RunPod models. The synchronous transcribe() method is unaffected and keeps its original implementation.
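When the full list of segments is needed rather than a stream, the async generator can be drained with an overall time limit. This is a sketch under stated assumptions: `collect_segments` is a hypothetical helper, and it relies only on `transcribe_async()` returning an async generator, as described above.

```python
import asyncio


async def collect_segments(segment_stream, timeout=None):
    """Drain an async generator of segments into a list.

    `timeout` (in seconds) bounds the whole transcription; None waits
    indefinitely. asyncio.TimeoutError is raised if the limit is exceeded.
    """
    async def _drain():
        return [segment async for segment in segment_stream]

    return await asyncio.wait_for(_drain(), timeout=timeout)


# Usage inside a coroutine, with a RunPod model loaded as shown above:
# segments = await collect_segments(
#     model.transcribe_async(path="audio.mp3", language="he"), timeout=600
# )
```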

API Reference

load_model()

Load a transcription model for the specified engine and model.

Parameters

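  • engine (str): Transcription engine to use. Options: "faster-whisper", "runpod" (as used in the async example above; it also takes api_key and endpoint_id), "stable-ts" (not yet implemented)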
  • model (str): Model name for the selected engine
  • device (str, optional): Device to use for inference. Default: "auto". Options: "auto", "cpu", "cuda", "cuda:0", etc.
  • model_path (str, optional): Custom path to the model (for faster-whisper)

Returns

  • TranscriptionModel object that can be used for transcription

Raises

  • ValueError: If the engine is not supported
  • ImportError: If required dependencies are not installed
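Since load_model() signals an unsupported engine with ValueError and missing dependencies with ImportError, a caller can try several candidates in order. The sketch below is illustrative only: `load_first_available` is a hypothetical helper, and the loader is passed in as an argument (for example `ivrit.load_model`) so the helper makes no further assumptions about the package.

```python
def load_first_available(load_model, candidates):
    """Try each (engine, model) pair in order and return the first that loads.

    `load_model` is a callable accepting engine= and model= keyword arguments.
    ValueError and ImportError are treated as "try the next candidate".
    """
    errors = []
    for engine, model_name in candidates:
        try:
            return load_model(engine=engine, model=model_name)
        except (ValueError, ImportError) as exc:
            errors.append(f"{engine}/{model_name}: {exc}")
    raise RuntimeError("no engine could be loaded: " + "; ".join(errors))


# Usage:
# import ivrit
# model = load_first_available(
#     ivrit.load_model,
#     [("faster-whisper", "ivrit-ai/whisper-large-v3-turbo-ct2"), ("faster-whisper", "base")],
# )
```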

transcribe() and transcribe_async()

Transcribe audio using the loaded model.

Parameters

  • path (str, optional): Path to the audio file to transcribe
  • url (str, optional): URL to download and transcribe
  • blob (str, optional): Base64 encoded blob data to transcribe
  • language (str, optional): Language code for transcription (e.g., 'he' for Hebrew, 'en' for English)
  • stream (bool, optional): Whether to return results as a generator (True) or full result (False) - only for transcribe()
  • diarize (bool, optional): Whether to enable speaker diarization
  • verbose (bool, optional): Whether to enable verbose output
  • **kwargs: Additional keyword arguments for the transcription model

Returns

  • transcribe(): A generator yielding transcription segments if stream=True; the complete transcription result as a dictionary if stream=False
  • transcribe_async(): AsyncGenerator yielding transcription segments

Raises

  • ValueError: If multiple input sources are provided, or none is provided
  • FileNotFoundError: If the specified path doesn't exist
  • Exception: For other transcription errors

Note: transcribe_async() is only available for RunPod models and always returns an AsyncGenerator.
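The "exactly one input source" rule and the Base64 `blob` parameter can be illustrated with two small helpers. This is a sketch of the documented behaviour, not the package's internal code: `encode_blob` and `single_source` are hypothetical names, and the error message wording is an assumption.

```python
import base64


def encode_blob(audio_bytes):
    """Base64-encode raw audio bytes into a str suitable for `blob`."""
    return base64.b64encode(audio_bytes).decode("ascii")


def single_source(path=None, url=None, blob=None):
    """Return the one provided source as a kwargs dict, else raise ValueError."""
    candidates = {"path": path, "url": url, "blob": blob}
    provided = {name: value for name, value in candidates.items() if value is not None}
    if len(provided) != 1:
        raise ValueError(
            f"expected exactly one of path, url, blob; got {sorted(provided) or 'none'}"
        )
    return provided


# Usage with a loaded model:
# with open("audio.mp3", "rb") as f:
#     source = single_source(blob=encode_blob(f.read()))
# result = model.transcribe(**source, language="he")
```

Validating the source before calling transcribe() lets an application report a clear error without starting a download or a model run.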

Architecture

The ivrit package uses an object-oriented design with a base TranscriptionModel class and specific implementations for each transcription engine.

Model Classes

  • TranscriptionModel: Abstract base class for all transcription models
  • FasterWhisperModel: Implementation for the Faster Whisper engine
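The base-class design can be pictured roughly as follows. This is a toy sketch of the pattern, not the actual ivrit source: `Segment`, `SketchTranscriptionModel`, `EchoModel` and the "segments" key in the result dictionary are all hypothetical, chosen only to mirror the `stream` behaviour and segment attributes documented above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: float
    end: float
    text: str
    extra_data: dict = field(default_factory=dict)


class SketchTranscriptionModel(ABC):
    """Illustrative stand-in for an abstract transcription base class."""

    @abstractmethod
    def _segments(self, path):
        """Yield Segment objects for the audio at `path` (engine-specific)."""

    def transcribe(self, path, stream=False):
        segments = self._segments(path)
        if stream:
            return segments  # hand the generator straight to the caller
        collected = list(segments)
        return {"text": " ".join(s.text for s in collected), "segments": collected}


class EchoModel(SketchTranscriptionModel):
    """Toy engine that 'transcribes' a file by returning its name."""

    def _segments(self, path):
        yield Segment(start=0.0, end=1.0, text=path)
```

Each engine implements only the segment generator, while the shared transcribe() method decides whether to stream or to collect the segments into one result.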

Usage Patterns

Pattern 1: Using load_model() (Recommended)

# Step 1: Load the model
model = ivrit.load_model(engine="faster-whisper", model="base")

# Step 2: Transcribe audio
result = model.transcribe(path="audio.mp3")

Pattern 2: Direct Model Creation

# Create model directly
model = ivrit.FasterWhisperModel(model="base")

# Use the model
result = model.transcribe(path="audio.mp3")

Multiple Transcriptions

For multiple transcriptions, load the model once and reuse it:

# Load model once
model = ivrit.load_model(engine="faster-whisper", model="base")

# Use for multiple transcriptions
result1 = model.transcribe(path="audio1.mp3")
result2 = model.transcribe(path="audio2.mp3")
result3 = model.transcribe(path="audio3.mp3")
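For longer file lists, the same idea can be wrapped in a loop that keeps going when one file fails. This is a minimal sketch: `transcribe_many` is a hypothetical helper, and it assumes only that the non-streaming result is a dictionary with a "text" key.

```python
def transcribe_many(model, paths):
    """Transcribe each path with one shared model.

    Returns a dict mapping each path to {"text": ..., "error": ...}, where
    exactly one of the two values is None.
    """
    results = {}
    for path in paths:
        try:
            text = model.transcribe(path=path)["text"]
            results[path] = {"text": text, "error": None}
        except Exception as exc:  # record the failure and continue with the rest
            results[path] = {"text": None, "error": str(exc)}
    return results


# Usage:
# model = ivrit.load_model(engine="faster-whisper", model="base")
# results = transcribe_many(model, ["audio1.mp3", "audio2.mp3", "audio3.mp3"])
```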

Installation

Basic Installation

pip install ivrit

With Faster Whisper Support

pip install "ivrit[faster-whisper]"

Supported Engines

faster-whisper

Fast and accurate speech recognition using the Faster Whisper model.

Model Class: FasterWhisperModel

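Available Models: base, small, medium, large, large-v2, large-v3. Hugging Face model IDs such as ivrit-ai/whisper-large-v3-turbo-ct2 can also be passed, as in the Basic Usage example.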

Features:

  • Word-level timing information
  • Language detection with confidence scores
  • Support for custom devices (CPU, CUDA, etc.)
  • Support for custom model paths
  • Streaming transcription

Dependencies: faster-whisper>=1.1.1

stable-ts

Stable and reliable transcription using Stable-TS models.

Status: Not yet implemented

Development

Installation for Development

git clone <repository-url>
cd ivrit
pip install -e ".[dev]"

Running Tests

pytest

Code Formatting

black .
isort .

License

MIT License - see LICENSE file for details.
