Caption/subtitle processing library with multi-format support (SRT, VTT, ASS, TTML, TextGrid, NLE formats)

Maintainers

Lattifai

lattifai-captions

Caption/subtitle processing library with comprehensive format support.

Features

Multi-format support: SRT, VTT, ASS, SSA, TTML, TextGrid, LRC, SRV3, and more
YouTube formats: SRV3 (YTT v3), YouTube VTT with word-level timestamps
Professional NLE formats: Avid DS, Final Cut Pro XML, Premiere Pro XML, Adobe Audition
Word-level timing: Karaoke-style word-by-word timestamps
Standardization: Enforce Netflix/BBC-style broadcast limits on timing and line length
Sentence splitting: AI-powered sentence segmentation (optional `[splitting]` extra)
No dependency on heavy ML frameworks: Lightweight and fast

Installation

# Basic installation
pip install lattifai-captions

# With sentence splitting support
pip install "lattifai-captions[splitting]"  # quotes stop shells such as zsh from expanding the brackets

Quick Start

from lattifai.caption import Caption

# Read a caption file
caption = Caption.read("input.srt")

# Write to a different format (selected by the file extension)
caption.write("output.vtt")

# Convert to string
vtt_content = caption.to_string("vtt")

# Access segments
for segment in caption.supervisions:
    print(f"{segment.start:.2f} - {segment.end:.2f}: {segment.text}")

Supported Formats

Input/Output (Read & Write)

Format	Extensions	Description
SRT	`.srt`	SubRip subtitle format
VTT	`.vtt`	WebVTT, including YouTube VTT with word-level timestamps
ASS/SSA	`.ass`, `.ssa`	Advanced SubStation Alpha
SRV3	`.srv3`, `.ytt`	YouTube Timed Text v3 with word-level timing
SBV	`.sbv`	YouTube SubViewer format
SUB	`.sub`	MicroDVD subtitle format
SAMI	`.sami`, `.smi`	Synchronized Accessible Media Interchange (SAMI)
JSON	`.json`	Structured data with word-level support
CSV/TSV	`.csv`, `.tsv`	Tabular formats
TextGrid	`.textgrid`	Praat TextGrid format
LRC	`.lrc`	Lyrics format with word-level timestamps
Gemini	`.md`	Gemini AI transcript markdown

Output Only

Format	Extensions	Description
TTML	`.ttml`	Timed Text Markup Language (W3C standard)
IMSC1	`.ttml`	Netflix/streaming TTML profile
EBU-TT-D	`.ttml`	European broadcast TTML profile
Avid DS	`.txt`	Avid Media Composer SubCap
FCPXML	`.fcpxml`	Final Cut Pro XML
Premiere XML	`.xml`	Adobe Premiere Pro XML
Audition CSV	`.csv`	Adobe Audition markers
EdiMarker CSV	`.csv`	Pro Tools markers

Word-Level Timing

Several formats carry word-level timing (for example SRV3, YouTube VTT, LRC, and JSON), which can be used for karaoke-style output:

from lattifai.caption import Caption, KaraokeConfig  # KaraokeConfig is used below; import path assumed

caption = Caption.read("input.srv3")  # SRV3 has built-in word timing

# Access word-level alignment
for segment in caption.supervisions:
    if segment.alignment and "word" in segment.alignment:
        for word in segment.alignment["word"]:
            print(f"  {word.symbol}: {word.start:.3f}s - {word.end:.3f}s")

# Export with word-level timing
caption.write("output.json", word_level=True)  # JSON preserves words array
caption.write("output.ass", word_level=True, karaoke_config=KaraokeConfig(enabled=True))
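For context on what karaoke-style ASS output looks like: the ASS format expresses karaoke timing with `{\k<n>}` override tags placed before each syllable, where `n` is a duration in centiseconds. The standalone sketch below builds such a line from `(word, start, end)` tuples. The function `to_ass_karaoke` is hypothetical and not part of lattifai-captions; the library's own writer may place tags or handle pauses differently.

```python
def to_ass_karaoke(words):
    """Build ASS dialogue text with {\\k} karaoke tags (illustrative sketch).

    words: list of (text, start_s, end_s) tuples, sorted by start time.
    Each tag's argument is a duration in centiseconds. A word is assumed to
    last until the next word starts (so a pause is attached to the preceding
    word); the last word lasts until its own end time.
    """
    parts = []
    for i, (text, start, end) in enumerate(words):
        stop = words[i + 1][1] if i + 1 < len(words) else end
        centiseconds = round((stop - start) * 100)
        parts.append(f"{{\\k{centiseconds}}}{text}")
    return " ".join(parts)


words = [("Does", 0.24, 0.56), ("fast", 0.56, 0.80), ("charging", 0.80, 1.30)]
print(to_ass_karaoke(words))  # {\k32}Does {\k24}fast {\k50}charging
```

Attaching pauses to the preceding word keeps the tags summing to the line's duration, which is one common choice; inserting an empty `{\kN}` tag for each gap is another.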

YouTube SRV3 Format

SRV3 is YouTube's proprietary timed text format with millisecond-precision word timing:

from lattifai.caption import Caption

# Read SRV3 (automatically extracts word-level timing)
caption = Caption.read("video.srv3")

# Convert to other formats
caption.write("output.srt")  # Standard SRT
caption.write("output.vtt", word_level=True)  # VTT with word timing
caption.write("output.srv3", word_level=True)  # Back to SRV3

SRV3 structure example:

<timedtext format="3">
  <body>
    <p t="240" d="6559" w="1">
      <s ac="0">Does</s>
      <s t="320" ac="0"> fast</s>
      <s t="560" ac="0"> charging</s>
    </p>
  </body>
</timedtext>
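In this snippet, `t` and `d` on `<p>` are read as the paragraph's start time and duration in milliseconds, and `t` on each `<s>` as the word's offset from the paragraph start (a missing `t` meaning offset 0). Under that reading, the stdlib sketch below recovers absolute word times from the example. SRV3 does not store a per-word end time here, so the sketch assumes each word ends where the next begins and the last word ends with its paragraph. `srv3_words` is a hypothetical helper, not lattifai's parser.

```python
import xml.etree.ElementTree as ET

SRV3_SAMPLE = """<timedtext format="3">
  <body>
    <p t="240" d="6559" w="1">
      <s ac="0">Does</s>
      <s t="320" ac="0"> fast</s>
      <s t="560" ac="0"> charging</s>
    </p>
  </body>
</timedtext>"""


def srv3_words(xml_text):
    """Return (word, start_s, end_s) tuples from an SRV3 document (sketch)."""
    words = []
    root = ET.fromstring(xml_text)
    for p in root.iter("p"):
        p_start = int(p.get("t", "0"))           # absolute start, ms
        p_end = p_start + int(p.get("d", "0"))   # start + duration, ms
        spans = p.findall("s")
        # Word offsets are relative to the paragraph start.
        starts = [p_start + int(s.get("t", "0")) for s in spans]
        # Assumed end: next word's start, or the paragraph end for the last word.
        ends = starts[1:] + [p_end]
        for s, start, end in zip(spans, starts, ends):
            words.append(((s.text or "").strip(), start / 1000, end / 1000))
    return words


for word, start, end in srv3_words(SRV3_SAMPLE):
    print(f"{word}: {start:.3f}s - {end:.3f}s")
```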

Sentence Splitting

Split captions into natural sentences (requires [splitting] extra):

from lattifai.caption import Caption, SentenceSplitter

# Using Caption method
caption = Caption.read("input.srt")
split_caption = caption.split_sentences()

# Or use SentenceSplitter directly
splitter = SentenceSplitter()
split_supervisions = splitter.split_sentences(caption.supervisions)

Format Conversion

from lattifai.caption import Caption

# Read any format
caption = Caption.read("input.srt")

# Write to any supported format
caption.write("output.vtt")
caption.write("output.ass")
caption.write("output.json")
caption.write("output.srv3", word_level=True)
caption.write("output.ttml")

# Or get as string
srt_content = caption.to_string("srt")
json_content = caption.to_string("json", word_level=True)
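Converting between formats involves more than changing the file extension. Even SRT and WebVTT, which look alike, write cue timestamps differently: `HH:MM:SS,mmm` in SRT and `HH:MM:SS.mmm` in WebVTT. The hypothetical helper below is a stdlib-only illustration of that one difference, not the library's formatter.

```python
def format_timestamp(seconds, fmt):
    """Format a time in seconds as an SRT or WebVTT cue timestamp."""
    if fmt not in ("srt", "vtt"):
        raise ValueError(f"unsupported format: {fmt!r}")
    total_ms = round(seconds * 1000)  # work in integer milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    sep = "," if fmt == "srt" else "."  # SRT uses a comma, WebVTT a period
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{millis:03d}"


print(format_timestamp(3725.5, "srt"))  # 01:02:05,500
print(format_timestamp(3725.5, "vtt"))  # 01:02:05.500
```

WebVTT also allows the hours field to be omitted (`MM:SS.mmm`). Always emitting hours, as this sketch does, is still valid.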

Standardization

Apply broadcast standards to captions:

from lattifai.caption import Caption, CaptionStandardizer

standardizer = CaptionStandardizer(
    min_duration=0.7,      # Minimum segment duration
    max_duration=7.0,      # Maximum segment duration
    min_gap=0.08,          # Minimum gap between segments
    max_lines=2,           # Maximum lines per segment
    max_chars_per_line=42, # Maximum characters per line
)

caption = Caption.read("input.srt")
standardized = standardizer.process(caption.supervisions)
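To make two of the `CaptionStandardizer` parameters concrete, the sketch below applies a minimum cue duration and a minimum gap to plain `(start, end)` pairs. It is a simplified, hypothetical illustration (`enforce_timing` is not a lattifai function). It never moves start times, and when the two rules conflict it lets the gap win. The library's actual strategy is not documented here and may differ.

```python
def enforce_timing(cues, min_duration=0.7, min_gap=0.08):
    """Extend short cues to min_duration, then trim ends so consecutive
    cues are at least min_gap seconds apart.

    cues: list of (start, end) tuples in seconds, sorted by start.
    """
    result = []
    for i, (start, end) in enumerate(cues):
        # Lengthen cues that are too short to read.
        end = max(end, start + min_duration)
        if i + 1 < len(cues):
            # Pull the end back to keep a gap before the next cue.
            end = min(end, cues[i + 1][0] - min_gap)
        result.append((start, max(end, start)))
    return result


print(enforce_timing([(0.0, 0.3), (1.0, 2.0), (2.05, 3.0)]))
```

Here the first cue is extended to 0.7 s, and the second is trimmed so it ends 0.08 s before the third begins.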

Validation

Check captions against quality standards:

from lattifai.caption import Caption, CaptionValidator

validator = CaptionValidator(
    min_duration=0.7,
    max_duration=7.0,
    min_gap=0.08,
    max_chars_per_line=42,
)

caption = Caption.read("input.srt")
result = validator.validate(caption.supervisions)

print(f"Valid: {result.valid}")
print(f"Average CPS: {result.avg_cps:.1f}")
print(f"Max CPL: {result.max_cpl}")
print(f"Warnings: {result.warnings}")
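The two reading-speed metrics reported above are characters per second (CPS) and characters per line (CPL). The hypothetical `reading_stats` helper below computes them for `(start, end, text)` cues as an illustration. It computes average CPS as total characters over total cue time, counts spaces and punctuation, and treats `"\n"` as a line break. Guidelines and tools differ on these choices, so lattifai's `CaptionValidator` may count differently.

```python
def reading_stats(cues):
    """Return (average CPS, maximum CPL) for (start, end, text) cues."""
    total_chars = 0
    total_seconds = 0.0
    max_cpl = 0
    for start, end, text in cues:
        lines = text.split("\n")
        total_chars += sum(len(line) for line in lines)  # line breaks not counted
        total_seconds += end - start
        max_cpl = max(max_cpl, max(len(line) for line in lines))
    avg_cps = total_chars / total_seconds if total_seconds > 0 else 0.0
    return avg_cps, max_cpl


cues = [(0.0, 2.0, "Hello world"), (2.5, 4.5, "Fast charging\nis here")]
print(reading_stats(cues))  # (7.75, 13)
```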

API Reference

Caption Class

from lattifai.caption import Caption

# Class methods
Caption.read(path, format=None, normalize_text=True)
Caption.from_string(content, format)
Caption.from_supervisions(supervisions, language=None, metadata=None)

# Instance methods
caption.write(path, include_speaker=True, word_level=False, karaoke_config=None)
caption.to_string(format, include_speaker=True, word_level=False, karaoke_config=None)
caption.split_sentences()
caption.shift_time(seconds)

# Properties
caption.supervisions  # List[Supervision]
caption.duration      # Total duration in seconds
caption.language      # Language code
caption.source_format # Original format

Supervision Class

from lattifai.caption import Supervision

sup = Supervision(
    start=0.0,           # Start time in seconds
    duration=2.5,        # Duration in seconds
    text="Hello world",  # Caption text
    speaker="Alice",     # Optional speaker label
    alignment=None,      # Optional word-level alignment
)

# Properties
sup.end       # start + duration
sup.text      # Caption text
sup.speaker   # Speaker label
sup.alignment # Dict whose "word" key holds a list of AlignmentItem objects

License

Apache-2.0

Release history

Version	Release date
0.4.14	Apr 28, 2026
0.4.13	Apr 28, 2026
0.4.12	Apr 28, 2026
0.4.11	Apr 26, 2026
0.4.10	Apr 26, 2026
0.4.9	Apr 26, 2026
0.4.8	Apr 25, 2026
0.4.7	Apr 25, 2026
0.4.6	Apr 14, 2026
0.4.5	Apr 13, 2026
0.4.4	Apr 12, 2026
0.4.2	Apr 8, 2026
0.4.1	Apr 6, 2026
0.4.0	Apr 6, 2026
0.2.9	Apr 1, 2026
0.2.8	Mar 31, 2026
0.2.7	Feb 28, 2026
0.2.6	Feb 27, 2026
0.2.5	Feb 27, 2026
0.2.4	Feb 27, 2026
0.2.2	Feb 26, 2026
0.2.1	Feb 9, 2026
0.2.0	Feb 8, 2026
0.1.8	Feb 5, 2026
0.1.7 (this version)	Feb 4, 2026
0.1.6	Feb 3, 2026
0.1.5	Feb 3, 2026
0.1.4	Feb 2, 2026

Download files

Source Distribution

lattifai_captions-0.1.7.tar.gz (80.5 kB)

Uploaded Feb 4, 2026 Source

File details

Details for the file lattifai_captions-0.1.7.tar.gz.

File metadata

Download URL: lattifai_captions-0.1.7.tar.gz
Upload date: Feb 4, 2026
Size: 80.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lattifai_captions-0.1.7.tar.gz
Algorithm	Hash digest
SHA256	`925dc02c3dc958d94c71f96fa2be71f35b1538445cd018ac10ecf13db56c2b64`
MD5	`3294c30794d7a202414edc828e454238`
BLAKE2b-256	`b9279d12b3db4b8048a6806d79fe1ba1bd8a823746912c008ef98a6ecc56d5a6`

Provenance

The following attestation bundles were made for lattifai_captions-0.1.7.tar.gz:

Publisher: publish-wheels.yml on lattifai/captions

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lattifai_captions-0.1.7.tar.gz
- Subject digest: 925dc02c3dc958d94c71f96fa2be71f35b1538445cd018ac10ecf13db56c2b64
- Sigstore transparency entry: 910282554
- Sigstore integration time: Feb 4, 2026
Source repository:
- Permalink: lattifai/captions@ed214b373e18e51e09436cd9e3eb77db0f261efa
- Branch / Tag: refs/tags/v0.1.7
- Owner: https://github.com/lattifai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-wheels.yml@ed214b373e18e51e09436cd9e3eb77db0f261efa
- Trigger Event: push
