Lattifai Python SDK: Seamless Integration with Lattifai's Speech and Video AI Services
🌐 Official Website | 🖥️ GitHub | 🤗 Model | 📑 Blog
LattifAI: Precision Alignment, Infinite Possibilities
Advanced forced alignment and subtitle generation powered by 🤗 Lattice-1 model.
⚠️ Note on Current Limitations:
- Memory Usage: We are aware of high memory consumption and are actively working on further optimizations.
- Long Audio: Very long audio files may currently fail or produce degraded results. Support for long-form audio (up to 20 hours) will be available shortly.
Table of Contents
- Installation
- Quick Start
- CLI Reference
- Python SDK Reference
- Advanced Features
- Supported Formats
- Roadmap
- Development
Installation
Step 1: Install SDK
Using pip:
pip install install-k2
install-k2 --torch-version 2.9.1  # omit --torch-version to auto-detect your PyTorch version and install a compatible k2
pip install lattifai==1.0.2
Using uv (Recommended - 10-100x faster):
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create a new project with uv
uv init my-project
cd my-project
uv venv
source .venv/bin/activate
# Install k2 (required dependency)
uv pip install install-k2
uv pip install pip
uv run install-k2 --torch-version 2.9.1
# Install LattifAI (v1.0.2)
uv pip install lattifai==1.0.2
Note: install-k2 automatically detects your PyTorch version (up to 2.9) and installs the compatible k2 wheel.
install-k2 options
usage: install-k2 [-h] [--system {linux,darwin,windows}] [--dry-run] [--torch-version TORCH_VERSION]
optional arguments:
-h, --help Show this help message and exit
--system {linux,darwin,windows} Override OS detection
--dry-run Show what would be installed without making changes
--torch-version TORCH_VERSION Specify torch version (e.g., 2.8.0)
Step 2: Get Your API Key
LattifAI API Key (Required)
Get your free API key at https://lattifai.com/dashboard/api-keys
Option A: Environment variable (recommended)
export LATTIFAI_API_KEY="lf_your_api_key_here"
Option B: .env file
# .env
LATTIFAI_API_KEY=lf_your_api_key_here
Gemini API Key (Optional - for transcription)
If you want to use Gemini models for transcription (e.g., gemini-2.5-pro), get your free Gemini API key at https://aistudio.google.com/apikey
# Add to environment variable
export GEMINI_API_KEY="your_gemini_api_key_here"
# Or add to .env file
GEMINI_API_KEY=your_gemini_api_key_here # AIzaSyxxxx
Note: Gemini API key is only required if you use Gemini models for transcription. It's not needed for alignment or when using other transcription models.
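Because the Gemini key is optional, it can help to fail fast before constructing a client. A minimal sketch of such a pre-flight check (the check_api_keys helper is hypothetical, not part of the SDK); it only reads the two environment variables described above:

```python
import os

def check_api_keys(need_gemini: bool = False) -> dict:
    """Collect the API keys described above from the environment.

    LATTIFAI_API_KEY is always required; GEMINI_API_KEY only when a
    Gemini model is used for transcription.
    """
    keys = {"lattifai": os.environ.get("LATTIFAI_API_KEY")}
    if not keys["lattifai"]:
        raise RuntimeError("LATTIFAI_API_KEY is not set")
    if need_gemini:
        keys["gemini"] = os.environ.get("GEMINI_API_KEY")
        if not keys["gemini"]:
            raise RuntimeError("GEMINI_API_KEY is required for Gemini transcription")
    return keys
```

Run this once at startup so a missing key surfaces as a clear error rather than a failed API call mid-job.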
Quick Start
Command Line Interface
# Align local audio with subtitle
lai alignment align audio.wav subtitle.srt output.srt
# Download and align YouTube video
lai alignment youtube "https://youtube.com/watch?v=VIDEO_ID"
Python SDK (5 Lines of Code)
from lattifai import LattifAI
client = LattifAI()
caption = client.alignment(
input_media="audio.wav",
input_caption="subtitle.srt",
output_caption_path="aligned.srt",
)
That's it! Your aligned subtitles are saved to aligned.srt.
Web Interface
1. Install the web application (one-time setup):

lai-app-install

This command will:
- Check if Node.js/npm is installed (and install it if needed)
- Install frontend dependencies
- Build the application
- Set up the lai-app command globally
2. Start the backend server:

lai-server
# Custom port (default: 8001)
lai-server --port 9000
# Custom host
lai-server --host 127.0.0.1 --port 9000
# Production mode (disable auto-reload)
lai-server --no-reload

Backend Server Options:
- -p, --port: Server port (default: 8001)
- --host: Host address (default: 0.0.0.0)
- --no-reload: Disable auto-reload for production
- -h, --help: Show help message
3. Start the frontend application:

lai-app
# Custom port (default: 5173)
lai-app --port 8080
# Custom backend URL
lai-app --backend http://localhost:9000
# Don't auto-open browser
lai-app --no-open

Frontend Application Options:
- -p, --port: Frontend server port (default: 5173)
- --backend: Backend API URL (default: http://localhost:8001)
- --no-open: Don't automatically open browser
- -h, --help: Show help message

The web interface will automatically open in your browser at http://localhost:5173.
Features:
- ✅ Automatic backend server status detection
- ✅ Visual file upload with drag-and-drop
- ✅ Real-time alignment progress
- ✅ Multiple subtitle format support
- ✅ Built-in transcription with multiple models
- ✅ API key management interface
- ✅ Download aligned subtitles in various formats
CLI Reference
Command Overview
| Command | Description |
|---|---|
| lai alignment align | Align local audio/video with caption |
| lai alignment youtube | Download & align YouTube content |
| lai transcribe run | Transcribe audio/video or YouTube URL to caption |
| lai transcribe align | Transcribe audio/video and align with generated transcript |
| lai caption convert | Convert between caption formats |
| lai caption normalize | Clean and normalize caption text |
| lai caption shift | Shift caption timestamps |
lai alignment align
# Basic usage
lai alignment align <audio> <caption> <output>
# Examples
lai alignment align audio.wav caption.srt output.srt
lai alignment align video.mp4 caption.vtt output.srt alignment.device=cuda
lai alignment align audio.wav caption.srt output.json \
caption.split_sentence=true \
caption.word_level=true
lai alignment youtube
# Basic usage
lai alignment youtube <url>
# Examples
lai alignment youtube "https://youtube.com/watch?v=VIDEO_ID"
lai alignment youtube "https://youtube.com/watch?v=VIDEO_ID" \
media.output_dir=~/Downloads \
caption.output_path=aligned.srt \
caption.split_sentence=true
lai transcribe run
Perform automatic speech recognition (ASR) on audio/video files or YouTube URLs to generate timestamped transcriptions.
# Basic usage - local file
lai transcribe run <input> <output>
# Basic usage - YouTube URL
lai transcribe run <url> <output_dir>
# Examples - Local files
lai transcribe run audio.wav output.srt
lai transcribe run audio.mp4 output.ass \
transcription.model_name=nvidia/parakeet-tdt-0.6b-v3
# Examples - YouTube URLs
lai transcribe run "https://youtube.com/watch?v=VIDEO_ID" output_dir=./output
lai transcribe run "https://youtube.com/watch?v=VIDEO_ID" output.ass output_dir=./output \
transcription.model_name=gemini-2.5-pro \
transcription.gemini_api_key=YOUR_GEMINI_API_KEY
# Full configuration with keyword arguments
lai transcribe run \
input=audio.wav \
output_caption=output.srt \
channel_selector=average \
transcription.device=cuda \
transcription.model_name=iic/SenseVoiceSmall
Parameters:
- input: Path to audio/video file or YouTube URL (required)
- output_caption: Path for the output caption file (for local files)
- output_dir: Directory for output files (for YouTube URLs; defaults to the current directory)
- media_format: Media format for YouTube downloads (default: mp3)
- channel_selector: Audio channel selection: "average", "left", "right", or a channel index (default: "average"). Note: ignored when transcribing YouTube URLs with Gemini models.
- transcription: Transcription configuration (model_name, device, language, gemini_api_key)
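To make the channel_selector options concrete, here is a sketch of what each selector conceptually does to multi-channel audio (the select_channel helper is hypothetical and operates on plain per-channel sample lists, not the SDK's internal audio representation):

```python
def select_channel(samples, selector="average"):
    """Reduce multi-channel samples to one mono track.

    `samples` is a list of per-channel sample lists, e.g. [left, right].
    Mirrors the documented selectors: "average", "left", "right",
    or an integer channel index.
    """
    if selector == "average":
        # Average corresponding frames across all channels
        return [sum(frame) / len(frame) for frame in zip(*samples)]
    if selector == "left":
        return samples[0]
    if selector == "right":
        return samples[1]
    return samples[int(selector)]
```

"average" is usually the safe default, since speech present in either channel survives the downmix.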
Supported Transcription Models (More Coming Soon):
- gemini-2.5-pro: Google Gemini API (requires API key). Languages: 100+ languages including English, Chinese, Spanish, French, German, Japanese, Korean, Arabic, and more.
- gemini-3-pro-preview: Google Gemini API (requires API key). Languages: 100+ languages (same as gemini-2.5-pro).
- nvidia/parakeet-tdt-0.6b-v3: NVIDIA Parakeet model. Languages: Bulgarian (bg), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hungarian (hu), Italian (it), Latvian (lv), Lithuanian (lt), Maltese (mt), Polish (pl), Portuguese (pt), Romanian (ro), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), Russian (ru), Ukrainian (uk)
- iic/SenseVoiceSmall: Alibaba SenseVoice model. Languages: Chinese/Mandarin (zh), English (en), Japanese (ja), Korean (ko), Cantonese (yue)
- More models will be integrated in future releases.
Note: For transcription with alignment on local files, use lai transcribe align instead.
lai transcribe align
Transcribe audio/video file and automatically align the generated transcript with the audio.
This command combines transcription and alignment in a single step, producing precisely aligned captions.
# Basic usage
lai transcribe align <input_media> <output_caption>
# Examples
lai transcribe align audio.wav output.srt
lai transcribe align audio.mp4 output.ass \
transcription.model_name=nvidia/parakeet-tdt-0.6b-v3 \
alignment.device=cuda
# Using Gemini transcription with alignment
lai transcribe align audio.wav output.srt \
transcription.model_name=gemini-2.5-pro \
transcription.gemini_api_key=YOUR_KEY \
caption.split_sentence=true
# Full configuration
lai transcribe align \
input_media=audio.wav \
output_caption=output.srt \
transcription.device=mps \
transcription.model_name=iic/SenseVoiceSmall \
alignment.device=cuda \
caption.word_level=true
Parameters:
- input_media: Path to input audio/video file (required)
- output_caption: Path for output aligned caption file (required)
- transcription: Transcription configuration (model_name, device, language, gemini_api_key)
- alignment: Alignment configuration (model_name, device)
- caption: Caption formatting options (split_sentence, word_level, etc.)
lai caption convert
lai caption convert input.srt output.vtt
lai caption convert input.srt output.json normalize_text=true
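For intuition about what a conversion like SRT to VTT involves, here is a deliberately naive sketch (this is not the SDK's converter, which also handles styling, encoding, and edge cases): add the WEBVTT header, drop numeric cue indices, and switch the timestamp decimal separator from comma to dot.

```python
def srt_to_vtt(srt_text: str) -> str:
    """Naive SRT -> VTT conversion sketch."""
    lines = ["WEBVTT", ""]
    for line in srt_text.splitlines():
        if line.strip().isdigit():
            # SRT cue indices are not used by VTT
            continue
        if "-->" in line:
            # SRT uses 00:00:01,000 while VTT uses 00:00:01.000
            line = line.replace(",", ".")
        lines.append(line)
    return "\n".join(lines)
```

Real subtitle files are messier than this (BOMs, styling tags, overlapping cues), which is why the lai caption convert command is preferable to hand-rolled conversion.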
lai caption shift
lai caption shift input.srt output.srt 2.0 # Delay by 2 seconds
lai caption shift input.srt output.srt -1.5 # Advance by 1.5 seconds
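Conceptually, shifting adds a signed offset to every cue's start and end time. A minimal sketch (the helpers are hypothetical; clamping at zero is an assumption here, and the CLI's actual handling of shifts that would go negative may differ):

```python
def shift_timestamp(seconds: float, offset: float) -> float:
    """Shift one timestamp by `offset` seconds, clamping at zero."""
    return max(0.0, seconds + offset)

def shift_cues(cues, offset):
    """Apply the shift to a list of (start, end, text) cues."""
    return [(shift_timestamp(s, offset), shift_timestamp(e, offset), t)
            for s, e, t in cues]
```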
Python SDK Reference
Basic Alignment
from lattifai import LattifAI
# Initialize client (uses LATTIFAI_API_KEY from environment)
client = LattifAI()
# Align audio/video with subtitle
caption = client.alignment(
input_media="audio.wav", # Audio or video file
input_caption="subtitle.srt", # Input subtitle file
output_caption_path="output.srt", # Output aligned subtitle
split_sentence=True, # Enable smart sentence splitting
)
# Access alignment results
for segment in caption.supervisions:
print(f"{segment.start:.2f}s - {segment.end:.2f}s: {segment.text}")
YouTube Processing
from lattifai import LattifAI
client = LattifAI()
# Download YouTube video and align with auto-downloaded subtitles
caption = client.youtube(
url="https://youtube.com/watch?v=VIDEO_ID",
output_dir="./downloads",
output_caption_path="aligned.srt",
split_sentence=True,
)
Configuration Objects
LattifAI uses a config-driven architecture for fine-grained control:
ClientConfig - API Settings
from lattifai import LattifAI, ClientConfig
client = LattifAI(
client_config=ClientConfig(
api_key="lf_your_api_key", # Or use LATTIFAI_API_KEY env var
timeout=30.0,
max_retries=3,
)
)
AlignmentConfig - Model Settings
from lattifai import LattifAI, AlignmentConfig
client = LattifAI(
alignment_config=AlignmentConfig(
model_name="Lattifai/Lattice-1",
device="cuda", # "cpu", "cuda", "cuda:0", "mps"
)
)
CaptionConfig - Subtitle Settings
from lattifai import LattifAI, CaptionConfig
client = LattifAI(
caption_config=CaptionConfig(
split_sentence=True, # Smart sentence splitting
word_level=True, # Word-level timestamps
normalize_text=True, # Clean HTML entities
include_speaker_in_text=False, # Include speaker labels
)
)
Complete Configuration Example
from lattifai import (
LattifAI,
ClientConfig,
AlignmentConfig,
CaptionConfig
)
client = LattifAI(
client_config=ClientConfig(
api_key="lf_your_api_key",
timeout=60.0,
),
alignment_config=AlignmentConfig(
model_name="Lattifai/Lattice-1",
device="cuda",
),
caption_config=CaptionConfig(
split_sentence=True,
word_level=True,
output_format="json",
),
)
caption = client.alignment(
input_media="audio.wav",
input_caption="subtitle.srt",
output_caption_path="output.json",
)
Available Exports
from lattifai import (
# Client classes
LattifAI,
# AsyncLattifAI, # For async support
# Config classes
ClientConfig,
AlignmentConfig,
CaptionConfig,
DiarizationConfig,
MediaConfig,
# I/O classes
Caption,
)
Advanced Features
Word-Level Alignment
Enable word_level=True to get precise timestamps for each word:
from lattifai import LattifAI, CaptionConfig
client = LattifAI(
caption_config=CaptionConfig(word_level=True)
)
caption = client.alignment(
input_media="audio.wav",
input_caption="subtitle.srt",
output_caption_path="output.json", # JSON preserves word-level data
)
# Access word-level alignments
for segment in caption.alignments:
if segment.alignment and "word" in segment.alignment:
for word_item in segment.alignment["word"]:
print(f"{word_item.start:.2f}s: {word_item.symbol} (confidence: {word_item.score:.2f})")
Smart Sentence Splitting
The split_sentence option intelligently separates:
- Non-speech elements ([APPLAUSE], [MUSIC]) from dialogue
- Multiple sentences within a single subtitle
- Speaker labels from content
caption = client.alignment(
input_media="audio.wav",
input_caption="subtitle.srt",
split_sentence=True,
)
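To illustrate just the first of those behaviors, here is a toy regex-based separation of bracketed non-speech events from dialogue (the split_events helper is hypothetical; the SDK's actual splitting logic is internal and considerably smarter):

```python
import re

def split_events(text: str) -> list:
    """Separate bracketed non-speech events from dialogue in one
    subtitle line, e.g. "[APPLAUSE] Thank you." ->
    ["[APPLAUSE]", "Thank you."]."""
    # The capture group keeps the bracketed events in the output
    parts = re.split(r"(\[[A-Z ]+\])", text)
    return [p.strip() for p in parts if p.strip()]
```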
Speaker Diarization (WIP)
Note: This feature is currently under development and not yet fully available.
Speaker diarization automatically identifies and labels different speakers in audio. When enabled, the system will:
- Detect speaker changes in the audio
- Assign speaker labels (e.g., SPEAKER_00, SPEAKER_01) to each segment
- Update subtitle segments with speaker information
Speaker Name Handling:
- Existing speaker labels in subtitles: If your input captions already contain speaker names (e.g., [Alice], >> Bob:, or SPEAKER_01:), the system preserves them as much as possible during alignment.
- Gemini transcriber: When using Gemini models for transcription (e.g., gemini-2.5-pro), the model can intelligently identify and extract speaker names from dialogue context, making it easier to generate speaker-aware transcripts.
Python SDK:
from lattifai import LattifAI, DiarizationConfig
client = LattifAI(
diarization_config=DiarizationConfig(enabled=True)
)
caption = client.alignment(
input_media="audio.wav",
input_caption="subtitle.srt",
output_caption_path="output.srt",
)
# Access speaker information
for segment in caption.supervisions:
print(f"[{segment.speaker}] {segment.text}")
YAML Configuration Files
Create reusable configuration files:
# config/alignment.yaml
model_name: "Lattifai/Lattice-1"
device: "cuda"
batch_size: 1
lai alignment align audio.wav subtitle.srt output.srt \
alignment=config/alignment.yaml
Supported Formats
LattifAI supports virtually all common media and subtitle formats:
| Type | Formats |
|---|---|
| Audio | WAV, MP3, M4A, AAC, FLAC, OGG, OPUS, AIFF, and more |
| Video | MP4, MKV, MOV, WEBM, AVI, and more |
| Caption/Subtitle Input | SRT, VTT, ASS, SSA, SUB, SBV, TXT, Gemini, and more |
| Caption/Subtitle Output | All input formats + TextGrid (Praat) |
Tabular Formats:
- TSV: Tab-separated values with optional speaker column
- CSV: Comma-separated values with optional speaker column
- AUD: Audacity labels format with [[speaker]] notation
Note: If a format is not listed above but commonly used, it's likely supported. Feel free to try it or reach out if you encounter any issues.
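As a rough illustration of the AUD format mentioned above, here is a sketch of parsing one Audacity label line. The exact layout is an assumption (Audacity's standard tab-separated label export plus the [[speaker]] notation); the SDK's real parser may differ:

```python
def parse_aud_line(line: str):
    """Parse an assumed Audacity label line of the form
    "start<TAB>end<TAB>[[speaker]] text", returning
    (start, end, speaker, text). Speaker may be absent."""
    start, end, label = line.rstrip("\n").split("\t", 2)
    speaker = None
    if label.startswith("[[") and "]]" in label:
        # Strip the [[speaker]] prefix, keep the remaining text
        speaker, _, label = label[2:].partition("]]")
        label = label.strip()
    return float(start), float(end), speaker, label
```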
Roadmap
Visit our LattifAI roadmap for the latest updates.
| Date | Release | Features |
|---|---|---|
| Oct 2025 | Lattice-1-Alpha | ✅ English forced alignment ✅ Multi-format support ✅ CPU/GPU optimization |
| Nov 2025 | Lattice-1 | ✅ English + Chinese + German ✅ Mixed languages alignment 🚀 Integrate Speaker Diarization |
Development
Setup
git clone https://github.com/lattifai/lattifai-python.git
cd lattifai-python
# Using uv (recommended)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
source .venv/bin/activate
# Or using pip
pip install -e ".[test]"
pre-commit install
Testing
pytest # Run all tests
pytest --cov=src # With coverage
pytest tests/test_basic.py # Specific test
Contributing
- Fork the repository
- Create a feature branch
- Make changes and add tests
- Run pytest and pre-commit run
- Submit a pull request
License
Apache License 2.0
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Discord: Join our community