# lattifai-captions

The universal caption toolkit. Read, write, convert, and validate 25+ subtitle formats with word-level precision, from YouTube to Netflix to Final Cut Pro.
## Why lattifai-captions?

| Feature | Details |
|---|---|
| 25+ formats | SRT, VTT, ASS, TTML, SRV3, TextGrid, FCPXML, Premiere XML, and more |
| Word-level timing | Preserves millisecond-precision word timestamps across format conversions |
| Broadcast-ready | Netflix, BBC, and EBU compliance validation out of the box |
| NLE integration | Direct export to Avid, Final Cut Pro, Premiere Pro, and Pro Tools |
| Lightweight | No dependency on PyTorch, TensorFlow, or any ML framework |
## Installation

```bash
pip install lattifai-captions --extra-index-url https://lattifai.github.io/pypi/simple/

# With AI-powered sentence splitting (quoted so the brackets survive the shell)
pip install "lattifai-captions[splitting]" --extra-index-url https://lattifai.github.io/pypi/simple/
```

Or configure pip globally:

```ini
# ~/.pip/pip.conf
[global]
extra-index-url = https://lattifai.github.io/pypi/simple/
```
## Quick Start

```python
from lattifai.caption import Caption

# Read any format (auto-detected)
caption = Caption.read("input.srt")

# Convert to any format
caption.write("output.vtt")
caption.write("output.ass")
caption.write("output.ttml")

# Get the converted content as a string
vtt_content = caption.to_string("vtt")
```
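A conversion such as SRT to VTT mostly amounts to reparsing cue timing and re-serializing it in the target syntax. As a standalone illustration of what `Caption.read` / `Caption.write` abstract away (this is not the library's internals, and it ignores styling and positioning):

```python
import re

# One SRT cue: index line, "HH:MM:SS,mmm --> HH:MM:SS,mmm", then text until a blank line.
SRT_CUE = re.compile(
    r"(\d+)\s*\n(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})\s*\n(.*?)(?:\n\n|\Z)",
    re.S,
)

def srt_to_vtt(srt: str) -> str:
    """Minimal SRT -> WebVTT conversion: swap the decimal comma for a dot
    and prepend the WEBVTT header."""
    lines = ["WEBVTT", ""]
    for _, h1, ms1, h2, ms2, text in SRT_CUE.findall(srt):
        lines.append(f"{h1}.{ms1} --> {h2}.{ms2}")
        lines.append(text.strip())
        lines.append("")
    return "\n".join(lines)
```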
## Supported Formats

### Read & Write

| Format | Extensions | Highlights |
|---|---|---|
| SRT | `.srt` | Industry-standard subtitle format |
| WebVTT | `.vtt` | Web standard; auto-detects YouTube word-level timestamps |
| ASS / SSA | `.ass` `.ssa` | Styled subtitles with karaoke support |
| SRV3 | `.srv3` `.ytt` | YouTube Timed Text v3 with millisecond word timing |
| SBV | `.sbv` | YouTube SubViewer |
| SUB | `.sub` | MicroDVD |
| SAMI | `.sami` `.smi` | SAMI subtitle format |
| JSON | `.json` | Structured data with full word-level arrays |
| CSV / TSV | `.csv` `.tsv` | Tabular export for data analysis |
| TextGrid | `.textgrid` | Praat format for phonetics and linguistics research |
| LRC | `.lrc` | Lyrics with word-level timestamps |
| Gemini | `.md` | Gemini AI transcript markdown |
### Write-Only (Professional Post-Production)

| Format | Target | Use Case |
|---|---|---|
| TTML | W3C standard | Streaming platforms |
| IMSC1 | Netflix / streaming | Netflix Timed Text profile |
| EBU-TT-D | European broadcast | EBU broadcast delivery |
| Avid DS | Avid Media Composer | SubCap import |
| FCPXML | Final Cut Pro | Native timeline import |
| Premiere XML | Adobe Premiere Pro | Graphic clip subtitles |
| Audition CSV | Adobe Audition | Marker-based editing |
| EdiMarker CSV | Pro Tools | Session markers |
## Word-Level Timing

Preserve and convert word-by-word timestamps across formats:

```python
from lattifai.caption import Caption
from lattifai.caption.config import RenderConfig, ASSConfig

caption = Caption.read("video.srv3")  # YouTube SRV3 with word timing

# Inspect word-level data
for seg in caption.supervisions:
    for word in seg.alignment.get("word", []):
        print(f"  {word.symbol}: {word.start:.3f}s ({word.duration:.3f}s)")

# Export with word timing preserved (per-format opt-in)
caption.write("output.srt", render=RenderConfig(word_level=True))  # one cue per word
caption.write("output.lrc", render=RenderConfig(word_level=True))  # enhanced LRC inline
caption.write("output.vtt", render=RenderConfig(word_level=True))  # YouTube VTT inline

# Karaoke-style ASS output (karaoke_effect alone is enough)
caption.write("output.ass", format_config=ASSConfig(karaoke_effect="sweep"))

# JSON / TextGrid keep word data automatically (lossless)
caption.write("output.json")
```
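For reference, the "enhanced LRC" output mentioned above interleaves a `<mm:ss.xx>` timestamp with each word. A minimal standalone rendering sketch, independent of the library (`enhanced_lrc_line` is a name invented here):

```python
def lrc_timestamp(seconds: float) -> str:
    """Format seconds in the mm:ss.xx style used by LRC."""
    minutes, rem = divmod(seconds, 60.0)
    return f"{int(minutes):02d}:{rem:05.2f}"

def enhanced_lrc_line(words: list[tuple[str, float]]) -> str:
    """Render (word, start_seconds) pairs as an enhanced LRC line:
    the line timestamp in brackets, then each word prefixed by its own timestamp."""
    if not words:
        return ""
    line_ts = f"[{lrc_timestamp(words[0][1])}]"
    return line_ts + " ".join(f"<{lrc_timestamp(start)}>{word}" for word, start in words)
```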
## `RenderConfig.word_level`: Tri-State Semantics

A single `Optional[bool]` controls word-level output across every writer. Most users never need to set it; the default (`None`) is the right behavior for every format.

| Value | Meaning |
|---|---|
| `None` (default) | Smart per-format default. Renderers stay segment-level; lossless serializers (JSON / TextGrid) preserve word data when present. |
| `True` | Force word-level output. Renderers emit per-word cues / inline timestamps; if word alignment is missing, a warning is logged and the writer degrades to segment output. |
| `False` | Force segment-level output. Lossless serializers drop their word data; ASS karaoke is disabled even when `karaoke_effect` is set. |
### Per-Format Behavior Matrix

| Format | `None` (default) | `True` | `False` |
|---|---|---|---|
| SRT / SBV / SUB | one cue per segment | one cue per word (warns if no alignment) | same as `None` |
| VTT | standard segment VTT | YouTube VTT with inline `<00:00:01.000>` markers | same as `None` |
| LRC | standard `[mm:ss.xx]` lines | enhanced LRC with inline `<mm:ss.xx>` per word | same as `None` |
| TTML / IMSC1 / EBU-TT-D | one `<p>` per segment | iTunes namespace plus per-word `<span>` | same as `None` |
| SRV3 | segment paragraphs | per-word `<s t="">` offsets | same as `None` |
| ASS / SSA | segment `Dialogue` lines; honors `ASSConfig.karaoke_effect` to emit `{\k}` tags automatically | forces word level: with `karaoke_effect`, `{\k}` tags; without it, one `Dialogue` line per word | forces segment level; disables `karaoke_effect` with a warning |
| JSON | preserves `words` array when alignment is present (lossless) | same as `None` | strips `words` array even when present |
| TextGrid | emits `words` tier when alignment is present (lossless) | same as `None` | omits `words` tier even when present |
| Premiere XML | one clip per segment | one clip per word | same as `None` |
| FCPXML | one element per segment | one element per word | same as `None` |
| CSV / TSV / TXT / AUD / Avid / Audition / EdiMarker / Markdown | segment-level only (format has no word concept) | same as `None` (no warning is emitted; the format is incapable) | same as `None` |
### Key Design Notes

- ASS karaoke is triggered by `ASSConfig.karaoke_effect` alone; you no longer need to combine it with `word_level=True`. Setting `word_level=False` while `karaoke_effect` is set explicitly disables karaoke (with a warning).
- Lossless serializers (JSON / TextGrid) ignore `True`; they always preserve word data when it exists. The only state that changes their output is `False`, which strips the word data.
- Degradation is logged, not fatal. When `word_level=True` is requested but no supervision has word alignment, a single `WARNING` is logged via the `lattifai.caption` logger and the writer falls back to segment output. This matches the "best-effort batch" model used by transcription pipelines.
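The semantics above can be condensed into one decision function. This is an illustrative model of the tri-state behavior, not the library's implementation; `resolve_word_level`, `has_alignment`, and `lossless` are names invented for this sketch:

```python
import warnings
from typing import Optional

def resolve_word_level(
    word_level: Optional[bool],
    has_alignment: bool,
    lossless: bool,
) -> bool:
    """Return True when a writer should emit word-level output,
    following the tri-state rules described in the tables above."""
    if word_level is None:
        # Smart default: lossless serializers keep word data when present;
        # renderers stay segment-level.
        return lossless and has_alignment
    if word_level and not has_alignment:
        # Best-effort degradation: warn once and fall back to segments.
        warnings.warn("word_level=True requested but no word alignment; "
                      "falling back to segment output")
        return False
    if lossless:
        # Lossless serializers ignore True (they already preserve words);
        # False strips the word data.
        return word_level and has_alignment
    return word_level
```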
## Broadcast Standardization

Enforce Netflix, BBC, or custom broadcast guidelines:

```python
from lattifai.caption import Caption, CaptionStandardizer, CaptionValidator

# Standardize
standardizer = CaptionStandardizer(
    min_duration=0.7,       # Minimum segment duration (seconds)
    max_duration=7.0,       # Maximum segment duration (seconds)
    min_gap=0.08,           # 80 ms gap to prevent flicker
    max_lines=2,            # Lines per segment
    max_chars_per_line=42,  # Auto-adjusts to 21 for CJK
)

caption = Caption.read("input.srt")
standardized = standardizer.process(caption.supervisions)

# Validate
validator = CaptionValidator(min_duration=0.7, max_duration=7.0, max_chars_per_line=42)
result = validator.validate(caption.supervisions)
print(f"Valid: {result.valid} | CPS: {result.avg_cps:.1f} | Warnings: {len(result.warnings)}")
```
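The reading-speed metric behind `avg_cps` is simple arithmetic: visible characters divided by display time. A standalone sketch of that metric and a per-line length check (the helper names are invented here, not part of the library's API):

```python
def chars_per_second(text: str, duration: float) -> float:
    """Reading speed in characters per second. Broadcast style guides
    commonly cap this somewhere around 17-20 CPS."""
    visible = text.replace("\n", "")  # line breaks are not read
    return len(visible) / duration if duration > 0 else float("inf")

def line_lengths_ok(text: str, max_chars_per_line: int = 42) -> bool:
    """True when every rendered line fits within the character limit."""
    return all(len(line) <= max_chars_per_line for line in text.splitlines())
```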
## NLE Export

Direct export to professional editing software:

```python
from lattifai.caption import Caption

caption = Caption.read("input.srt")

# Final Cut Pro
caption.write("timeline.fcpxml")

# Avid Media Composer
caption.write("avid.txt", format="avid_ds")

# Adobe Premiere Pro
caption.write("premiere.xml", format="premiere_xml")
```
## Sentence Splitting

AI-powered sentence segmentation using wtpsplit (requires the `[splitting]` extra):

```python
caption = Caption.read("input.srt")
split_caption = caption.split_sentences()
```
## Time Operations

```python
from lattifai.caption import resolve_overlaps, CollisionMode

# Shift all timestamps
shifted = caption.shift_time(seconds=2.5)

# Adjust word-level margins (prevents cut-off words)
adjusted = caption.with_margins(start_margin=0.05, end_margin=0.15)

# Resolve overlapping segments
resolved = resolve_overlaps(caption.supervisions, mode=CollisionMode.TRIM)
```
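A TRIM-style collision policy pulls an overlapping cue's end time back to the start of the cue that follows it. Here is a minimal standalone sketch over `(start, end)` pairs; it is not the library's `resolve_overlaps`, just an illustration of the policy:

```python
def trim_overlaps(cues: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Resolve overlaps by trimming: when a cue starts before the previous
    cue ends, shorten the previous cue so it ends where the next begins."""
    resolved: list[tuple[float, float]] = []
    for start, end in sorted(cues):
        if resolved and resolved[-1][1] > start:
            prev_start, _ = resolved[-1]
            resolved[-1] = (prev_start, start)  # trim the previous cue's end
        resolved.append((start, end))
    return resolved
```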
## API Reference

### Caption

```python
from lattifai.caption import Caption
from lattifai.caption.config import ASSConfig

# Read
caption = Caption.read("file.srt")                    # Auto-detect format
caption = Caption.read("file.txt", format="srt")      # Explicit format
caption = Caption.from_string(content, format="vtt")  # From a string

# Write
caption.write("output.vtt")
caption.write(
    "output.ass",
    format_config=ASSConfig(karaoke_effect="sweep"),  # karaoke_effect alone triggers ASS karaoke
)
content = caption.to_string("srt")

# Transform
caption.shift_time(seconds=1.0)
caption.split_sentences()
caption.with_margins(start_margin=0.05, end_margin=0.15)

# Properties
caption.supervisions   # List[Supervision]
caption.duration       # Total duration in seconds
caption.language       # Language code
caption.source_format  # Original format detected
```
### Supervision

```python
from lattifai.caption import Supervision

sup = Supervision(start=0.0, duration=2.5, text="Hello world", speaker="Alice")

sup.end        # 2.5 (start + duration)
sup.alignment  # {"word": [AlignmentItem(symbol, start, duration, score), ...]}
```
## License

Apache-2.0