Pipelines implementation for Lightricks' LTX-2 model

LTX-2 Pipelines

High-level pipeline implementations for generating audio-video content with Lightricks' LTX-2 model. This package provides ready-to-use pipelines for text-to-video, image-to-video, video-to-video, audio-to-video, keyframe interpolation, and retake (regenerating part of an existing video) tasks.

Pipelines are built using building blocks from ltx-core (schedulers, guiders, noisers, patchifiers) and handle the complete inference flow including model loading, encoding, decoding, and file I/O.

📋 Overview

LTX-2 Pipelines provides production-ready implementations that abstract away the complexity of the diffusion process, model loading, and memory management. Each pipeline is optimized for specific use cases and offers different trade-offs between speed, quality, and memory usage.

Key Features:

🎬 Multiple Pipeline Types: Text-to-video, image-to-video, video-to-video, audio-to-video, keyframe interpolation, and retake
⚡ Optimized Performance: FP8 quantization of the transformer, a gradient-estimation denoising loop, and memory optimizations
🎯 Production Ready: Two-stage pipelines for best quality output
🔧 LoRA Support: Easy integration with trained LoRA adapters
📦 Self-Contained: Handles model loading, encoding, decoding, and file I/O
🚀 CLI Support: All pipelines can be run as command-line scripts

🚀 Quick Start

ltx-pipelines provides ready-made inference pipelines for text-to-video, image-to-video, video-to-video, audio-to-video, keyframe interpolation, and retake. Built using building blocks from ltx-core, these pipelines handle the complete inference flow including model loading, encoding, decoding, and file I/O.

🔧 Installation

# From the repository root
uv sync --frozen

# Or install as a package
pip install -e packages/ltx-pipelines

Running Pipelines

All pipelines can be run directly from the command line. Each pipeline module is executable:

# Run a pipeline (example: two-stage text-to-video)
python -m ltx_pipelines.ti2vid_two_stages \
    --checkpoint-path path/to/checkpoint.safetensors \
    --distilled-lora path/to/distilled_lora.safetensors 0.8 \
    --spatial-upsampler-path path/to/upsampler.safetensors \
    --gemma-root path/to/gemma \
    --prompt "A beautiful sunset over the ocean" \
    --output-path output.mp4

# View all available options for any pipeline
python -m ltx_pipelines.ti2vid_two_stages --help

Available pipeline modules:

ltx_pipelines.ti2vid_two_stages - Two-stage text/image-to-video (recommended).
ltx_pipelines.ti2vid_two_stages_hq - Two-stage text/image-to-video (different sampler, better quality).
ltx_pipelines.ti2vid_one_stage - Single-stage text/image-to-video.
ltx_pipelines.distilled - Fast text/image-to-video pipeline using only the distilled model.
ltx_pipelines.ic_lora - Video-to-video with IC-LoRA.
ltx_pipelines.keyframe_interpolation - Keyframe interpolation.
ltx_pipelines.a2vid_two_stage - Audio-to-video generation conditioned on an input audio clip.
ltx_pipelines.retake - Regenerate a time region of an existing video.

Use --help with any pipeline module to see all available options and parameters.
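When launching pipelines from a Python script (for example, to batch many prompts), it can help to build the argument list programmatically. The sketch below is hypothetical: build_pipeline_command is not part of ltx_pipelines, and it assumes that keyword names map to flags by replacing "_" with "-", which matches the flags shown above. Multi-value flags such as --distilled-lora PATH STRENGTH are passed as tuples.

```python
import shlex
import sys

# Module names from the list above; each is runnable with `python -m ltx_pipelines.<name>`.
PIPELINE_MODULES = {
    "ti2vid_two_stages",
    "ti2vid_two_stages_hq",
    "ti2vid_one_stage",
    "distilled",
    "ic_lora",
    "keyframe_interpolation",
    "a2vid_two_stage",
    "retake",
}


def build_pipeline_command(module: str, **options: object) -> list[str]:
    """Hypothetical helper: build an argv list for `python -m ltx_pipelines.<module>`.

    Keyword names become flags (checkpoint_path -> --checkpoint-path, an assumed
    convention). Tuple/list values expand into several arguments, e.g.
    distilled_lora=("lora.safetensors", 0.8) -> --distilled-lora lora.safetensors 0.8.
    """
    if module not in PIPELINE_MODULES:
        raise ValueError(f"unknown pipeline module: {module!r}")
    argv = [sys.executable, "-m", f"ltx_pipelines.{module}"]
    for name, value in options.items():
        argv.append("--" + name.replace("_", "-"))
        if isinstance(value, (list, tuple)):
            argv.extend(str(v) for v in value)
        else:
            argv.append(str(value))
    return argv


cmd = build_pipeline_command(
    "ti2vid_two_stages",
    checkpoint_path="path/to/checkpoint.safetensors",
    distilled_lora=("path/to/distilled_lora.safetensors", 0.8),
    spatial_upsampler_path="path/to/upsampler.safetensors",
    gemma_root="path/to/gemma",
    prompt="A beautiful sunset over the ocean",
    output_path="output.mp4",
)
print(shlex.join(cmd))  # shell-quoted preview of the command
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run (rather than a single shell string) avoids quoting problems with prompts that contain spaces or quotes.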

🎯 Pipeline Selection Guide

Quick Decision Tree

Do you have an existing video to modify?
├─ YES → Use RetakePipeline (regenerate a specific time region)
│
Do you have an audio file to drive generation?
├─ YES → Use A2VidPipelineTwoStage (audio-to-video)
│
Do you need to condition on existing images/videos?
├─ YES → Do you have reference videos for video-to-video?
│  ├─ YES → Use ICLoraPipeline
│  └─ NO → Do you have multiple keyframe images to interpolate?
│     ├─ YES → Use KeyframeInterpolationPipeline
│     └─ NO → Use TI2VidTwoStagesPipeline (image conditioning only)
│
└─ NO → Text-to-video only
   ├─ Do you need best quality?
   │  └─ YES → Use TI2VidTwoStagesPipeline (recommended for production)
   │
   └─ Do you need fastest inference?
      └─ YES → Use DistilledPipeline (with 8 predefined sigmas)

Note: TI2VidOneStagePipeline is primarily for educational purposes. For best quality, use two-stage pipelines (TI2VidTwoStagesPipeline, TI2VidTwoStagesHQPipeline, ICLoraPipeline, KeyframeInterpolationPipeline, A2VidPipelineTwoStage, or DistilledPipeline). For editing existing videos, use RetakePipeline.
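The decision tree can also be read as a small function. This is a hypothetical sketch: select_pipeline and its boolean flags are not part of ltx_pipelines, and it only returns the class name the tree recommends, checking the questions in the same order as the tree.

```python
def select_pipeline(
    modify_existing_video: bool = False,
    audio_driven: bool = False,
    reference_video: bool = False,
    multiple_keyframes: bool = False,
    image_conditioning: bool = False,
    prefer_speed: bool = False,
) -> str:
    """Hypothetical helper mirroring the Quick Decision Tree above."""
    if modify_existing_video:
        return "RetakePipeline"            # regenerate a time region
    if audio_driven:
        return "A2VidPipelineTwoStage"     # audio-to-video
    # Conditioning on existing images/videos:
    if reference_video:
        return "ICLoraPipeline"            # video-to-video
    if multiple_keyframes:
        return "KeyframeInterpolationPipeline"
    if image_conditioning:
        return "TI2VidTwoStagesPipeline"   # image conditioning only
    # Text-to-video only: best quality by default, distilled if speed matters most.
    return "DistilledPipeline" if prefer_speed else "TI2VidTwoStagesPipeline"


print(select_pipeline(image_conditioning=True, multiple_keyframes=True))
```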

Features Comparison

Pipeline	Stages	Multimodal Guidance	Upsampling	Conditioning	Best For
TI2VidTwoStagesPipeline	2	✅	✅	Image	Production quality (recommended)
TI2VidTwoStagesHQPipeline	2	✅	✅	Image	Same as above, res_2s sampler (higher quality)
TI2VidOneStagePipeline	1	✅	❌	Image	Educational, prototyping
DistilledPipeline	2	❌	✅	Image	Fastest inference (8 sigmas)
ICLoraPipeline	2	✅	✅	Image + Video	Video-to-video transformations
KeyframeInterpolationPipeline	2	✅	✅	Keyframes	Animation, interpolation
A2VidPipelineTwoStage	2	✅	✅	Audio + Image	Audio-driven video generation
RetakePipeline	1	✅	❌	Source Video	Regenerating a time region of a video

📦 Available Pipelines

1. TI2VidTwoStagesPipeline

Best for: High-quality text/image-to-video generation with upsampling. Recommended for production use.

Source: src/ltx_pipelines/ti2vid_two_stages.py

Two-stage generation: Stage 1 generates a low-resolution video with multimodal guidance, and Stage 2 upsamples it by 2x and refines it with the distilled LoRA. Supports image conditioning. Produces the highest-quality output; slower than the one-stage pipeline, but significantly better.

Use when: Production-quality video generation, higher resolution needed, quality over speed, text-to-video with image conditioning.

2. TI2VidTwoStagesHQPipeline

Best for: Same two-stage text/image-to-video as TI2VidTwoStagesPipeline but with a different sampler and step count.

Source: src/ltx_pipelines/ti2vid_two_stages_hq.py

Uses the res_2s second-order sampler instead of Euler. Same stage structure (stage 1 at reduced resolution with multimodal guidance, stage 2 upsampling with the distilled LoRA) and image conditioning support. It typically needs fewer steps for comparable quality, so its speed/quality trade-offs differ from the default Euler-based pipeline.

Use when: You want the same two-stage workflow with fewer steps or prefer the res_2s sampling behavior.

3. TI2VidOneStagePipeline

Best for: Educational purposes and quick prototyping.

Source: src/ltx_pipelines/ti2vid_one_stage.py

⚠️ Important: This pipeline is primarily for educational purposes. For production-quality results, use TI2VidTwoStagesPipeline or other two-stage pipelines.

Single-stage generation (no upsampling) with multimodal guidance and image conditioning support. Faster inference but lower resolution output (typically 512x768).

Use when: Learning how the pipeline works, quick prototyping, testing, or when high resolution is not needed.

4. DistilledPipeline

Best for: Fastest inference with good quality, using a distilled model with a predefined sigma schedule.

Source: src/ltx_pipelines/distilled.py

Two-stage generation with 8 predefined sigmas (8 steps in stage 1, 4 steps in stage 2). No guidance is required. It is the fastest of all the pipelines. Supports image conditioning. Requires the spatial upsampler.

Use when: Fastest inference is critical, batch processing many videos, or when you have a distilled model checkpoint.

5. ICLoraPipeline

Best for: Video-to-video and image-to-video transformations using IC-LoRA.

Source: src/ltx_pipelines/ic_lora.py

Two-stage generation with IC-LoRA support. Can condition on reference videos (video-to-video) or on images at specific frames. CFG guidance in stage 1, upsampling in stage 2. Requires a trained IC-LoRA.

Note: ICLoraPipeline can only be used with a distilled model.

Use when: Video-to-video transformations, image-to-video with strong control, or when you have reference videos to guide generation.

6. KeyframeInterpolationPipeline

Best for: Generating videos by interpolating between keyframe images.

Source: src/ltx_pipelines/keyframe_interpolation.py

Two-stage generation with keyframe interpolation. Uses guiding latents (additive conditioning) instead of replacing latents for smoother transitions. Multimodal guidance in stage 1, upsampling in stage 2.

Use when: You have keyframe images and want to interpolate between them, creating smooth transitions, or animation/motion interpolation tasks.

7. A2VidPipelineTwoStage

Best for: Generating video driven by an input audio clip.

Source: src/ltx_pipelines/a2vid_two_stage.py

Two-stage audio-to-video generation. Stage 1 generates video at half resolution conditioned on the audio; only the video is denoised while the audio latent stays frozen. Stage 2 upsamples the video by 2x and refines it with a distilled LoRA, again keeping the audio fixed. The input audio is encoded by the audio VAE and used as the initial audio latent, but the original waveform is passed through and returned in the output to preserve fidelity. Supports image conditioning and prompt enhancement.

Extra CLI arguments: --audio-path (required), --audio-start-time, --audio-max-duration.

Use when: You have an audio clip and want to generate a matching video, audio-reactive video generation, or music visualization.

8. RetakePipeline

Best for: Regenerating a specific time region of an existing video while keeping the rest unchanged.

Source: src/ltx_pipelines/retake.py

Single-stage generation that encodes the source video and audio into latents, applies a temporal region mask to mark [start_time, end_time] for regeneration, and denoises only the masked region from a text prompt. Content outside the time window is preserved. Supports independent control over video and audio regeneration (regenerate_video, regenerate_audio flags), and can use either the full model with CFG guidance or the distilled model with a fixed sigma schedule.

Extra CLI arguments: --video-path (required), --start-time (required), --end-time (required).

Constraints: The source video's frame count must have the form 8k+1 (e.g. 97, 193), and its width and height must be multiples of 32.

Use when: You want to redo a specific section of a generated video (e.g. fix a bad segment), selectively regenerate audio or video in a time window, or iterate on part of a result without regenerating the entire clip.
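Before running RetakePipeline it can be useful to check the source video against the constraints above. The sketch below is hypothetical (these helpers are not part of ltx_pipelines) and assumes you have already read the frame count, width, and height from the file with a video library of your choice.

```python
def is_valid_frame_count(num_frames: int) -> bool:
    """True if num_frames has the form 8k + 1 (e.g. 97 or 193)."""
    return num_frames >= 1 and (num_frames - 1) % 8 == 0


def largest_valid_frame_count(num_frames: int) -> int:
    """Largest 8k + 1 value not exceeding num_frames (a length to trim the source to)."""
    if num_frames < 1:
        raise ValueError("need at least one frame")
    return (num_frames - 1) // 8 * 8 + 1


def check_retake_source(num_frames: int, width: int, height: int) -> list[str]:
    """Return a list of constraint violations; an empty list means the source is usable."""
    problems = []
    if not is_valid_frame_count(num_frames):
        problems.append(
            f"frame count {num_frames} is not 8k+1; "
            f"trim to {largest_valid_frame_count(num_frames)}"
        )
    for name, size in (("width", width), ("height", height)):
        if size % 32 != 0:
            problems.append(f"{name} {size} is not a multiple of 32")
    return problems


print(check_retake_source(num_frames=100, width=770, height=512))
```

Trimming to the largest valid length (rather than padding) avoids inventing frames that were never in the source.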

🎨 Conditioning Types

Pipelines use different conditioning methods from ltx-core for controlling generation. See the ltx-core conditioning documentation for details.

Image Conditioning

Most pipelines support image conditioning, using one of two methods:

Replacing Latents (image_conditionings_by_replacing_latent):
- Used by: TI2VidOneStagePipeline, TI2VidTwoStagesPipeline, DistilledPipeline, ICLoraPipeline
- Replaces the latent at a specific frame with the encoded image
- Strong control over specific frames

Guiding Latents (image_conditionings_by_adding_guiding_latent):
- Used by: KeyframeInterpolationPipeline
- Adds the image as a guiding signal rather than replacing
- Better for smooth interpolation between keyframes

Video Conditioning

Video Conditioning (ICLoraPipeline only):
- Conditions on entire reference videos
- Useful for video-to-video transformations
- Uses VideoConditionByKeyframeIndex from ltx-core

🎛️ Multimodal Guidance

LTX-2 pipelines use multimodal guidance to steer the diffusion process for both video and audio modalities. Each modality (video, audio) has its own guider with independent parameters, allowing fine-grained control over generation quality and adherence to prompts.

Guidance Parameters

The MultiModalGuiderParams dataclass controls guidance behavior:

Parameter	Description
`cfg_scale`	Classifier-Free Guidance scale. Higher values make the output adhere more strongly to the text prompt. Typical values: 2.0–5.0. Set to 1.0 to disable.
`stg_scale`	Spatio-Temporal Guidance scale. Controls perturbation-based guidance for improved temporal coherence. Typical values: 0.5–1.5. Set to 0.0 to disable.
`stg_blocks`	Which transformer blocks to perturb for STG (e.g., `[29]` for the last block). Set to `[]` to disable STG.
`rescale_scale`	Rescales the guided prediction to match the variance of the conditional prediction. Helps prevent over-saturation. Typical values: 0.5–0.7. Set to 0.0 to disable.
`modality_scale`	Modality CFG scale. Steers the model away from unsynced video and audio results, improving audio-visual coherence. Set to 1.0 to disable.
`skip_step`	Skip guidance every N steps. Can speed up inference with minimal quality loss. Set to 0 to disable (never skip).

How It Works

The multimodal guider combines three guidance signals during each denoising step:

CFG (Text Guidance): Steers generation toward the text prompt by computing (cond - uncond_text).
STG (Perturbation Guidance): Improves structural coherence by perturbing specific transformer blocks and steering away from the perturbed prediction.
Modality CFG: For joint audio-video generation, steers the model away from unsynced video and audio results.
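To make the role of each parameter concrete, here is a minimal sketch of one way the three signals and the rescale step could be combined. It is an assumption, not the ltx-core MultiModalGuider code: it uses a linear combination chosen so that each term vanishes at the "disable" value documented in the table (cfg_scale=1.0, stg_scale=0.0, modality_scale=1.0, rescale_scale=0.0), and it works on plain Python lists instead of tensors.

```python
from statistics import pstdev


def combine_guidance(cond, uncond_text, perturbed, uncond_modality,
                     cfg_scale=3.0, stg_scale=1.0, modality_scale=3.0,
                     rescale_scale=0.7):
    """Illustrative guidance combination over equal-length lists of floats.

    cond:            prediction with the text prompt (and the other modality)
    uncond_text:     prediction without the text prompt (CFG)
    perturbed:       prediction with selected transformer blocks perturbed (STG)
    uncond_modality: prediction without cross-modal conditioning (modality CFG)
    """
    guided = [
        c
        + (cfg_scale - 1.0) * (c - t)         # CFG: push toward the text prompt
        + stg_scale * (c - p)                 # STG: push away from the perturbed prediction
        + (modality_scale - 1.0) * (c - m)    # modality CFG: push away from unsynced results
        for c, t, p, m in zip(cond, uncond_text, perturbed, uncond_modality)
    ]
    if rescale_scale > 0.0:
        # Blend toward a prediction whose std matches the conditional one (limits over-saturation).
        std_guided = pstdev(guided)
        if std_guided > 0.0:
            factor = pstdev(cond) / std_guided
            factor = rescale_scale * factor + (1.0 - rescale_scale)
            guided = [g * factor for g in guided]
    return guided


print(combine_guidance([1.0, 2.0, 3.0], [0.5, 1.5, 2.0], [1.2, 1.8, 2.9], [0.9, 2.1, 3.1]))
```

With every scale at its disable value the function returns cond unchanged, and with rescale_scale=1.0 the output's standard deviation matches that of cond, which is the behavior the table describes.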

Example Configuration

from ltx_core.components.guiders import MultiModalGuiderParams

# Video guider: moderate CFG, STG enabled, modality isolation
video_guider_params = MultiModalGuiderParams(
    cfg_scale=3.0,
    stg_scale=1.0,
    rescale_scale=0.7,
    modality_scale=3.0,
    stg_blocks=[29],
)

# Audio guider: higher CFG for stronger prompt adherence
audio_guider_params = MultiModalGuiderParams(
    cfg_scale=7.0,
    stg_scale=1.0,
    rescale_scale=0.7,
    modality_scale=3.0,
    stg_blocks=[29],
)

Tip: Start with the default values from constants.py and adjust based on your use case. Higher cfg_scale = stronger prompt adherence but potentially less natural motion; higher stg_scale = better temporal coherence but slower inference (requires extra forward passes).

Tip: When generating video with audio, set modality_scale > 1.0 (e.g., 3.0) to improve audio-visual sync. If generating video-only, set it to 1.0 to disable.

⚡ Optimization Tips

Memory Optimization

FP8 Quantization (Lower Memory Footprint):

For smaller GPU memory footprint, use the --quantization flag and set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True.

Two quantization policies are available:

Policy	CLI Flag	Description
FP8 Cast	`--quantization fp8-cast`	Downcasts transformer linear weights to FP8 during loading; upcasts on the fly during inference. No extra dependencies.
FP8 Scaled MM	`--quantization fp8-scaled-mm`	Uses FP8 scaled matrix multiplication via TensorRT-LLM (`tensorrt_llm` must be installed). Best performance on Hopper GPUs.

CLI:

# FP8 Cast (works on any GPU with FP8 support)
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python -m ltx_pipelines.ti2vid_two_stages \
    --quantization fp8-cast --checkpoint-path=...

# FP8 Scaled MM (requires tensorrt_llm, best on Hopper GPUs)
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python -m ltx_pipelines.ti2vid_two_stages \
    --quantization fp8-scaled-mm --checkpoint-path=...

Programmatically:

When authoring custom scripts, pass a QuantizationPolicy to pipeline classes:

from ltx_core.quantization import QuantizationPolicy
from ltx_pipelines.ti2vid_two_stages import TI2VidTwoStagesPipeline

pipeline = TI2VidTwoStagesPipeline(
    checkpoint_path=ltx_model_path,
    distilled_lora=distilled_lora,
    spatial_upsampler_path=upsampler_path,
    gemma_root=gemma_root_path,
    loras=[],
    quantization=QuantizationPolicy.fp8_cast(),  # or QuantizationPolicy.fp8_scaled_mm()
)
pipeline(...)

You still need to set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True when launching:

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python my_denoising_pipeline.py
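If you cannot control the launch command (for example, in a notebook), a common alternative is to set the variable from Python. PyTorch reads PYTORCH_CUDA_ALLOC_CONF when its CUDA caching allocator is initialized, so the safest place is at the very top of the script, before torch or any ltx_pipelines module is imported. This is a general PyTorch pattern, not something specific to ltx_pipelines.

```python
import os

# Must run before `import torch` (or any import that pulls torch in, such as
# ltx_pipelines). setdefault keeps a value that was already exported by the shell.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

# Only import PyTorch-based modules after this point, e.g.:
# from ltx_pipelines.ti2vid_two_stages import TI2VidTwoStagesPipeline
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```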

Memory Cleanup Between Stages:

By default, pipelines free GPU memory (in particular, the transformer weights) between stages. If you have enough VRAM, you can skip this cleanup to reduce running time:

# In pipeline implementations, memory cleanup happens automatically
# between stages. For custom pipelines, you can skip:
# utils.cleanup_memory()  # Comment out if you have enough VRAM
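This README does not show what utils.cleanup_memory does internally. The sketch below shows a typical PyTorch cleanup step for custom pipelines; free_gpu_memory is a hypothetical name, and the assumption that utils.cleanup_memory works the same way is unverified.

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def free_gpu_memory() -> None:
    """Hypothetical between-stage cleanup for a custom pipeline."""
    # Collect unreachable Python objects so the CUDA tensors they own are released.
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        # Return cached, now-unused blocks from PyTorch's allocator to the driver.
        torch.cuda.empty_cache()
        torch.cuda.synchronize()


# Usage between stages (hypothetical variable name):
# del stage_1_transformer  # drop the last reference first
free_gpu_memory()
```

empty_cache only releases memory that no live tensor uses, so deleting references to the stage-1 model before calling the cleanup is what actually frees its weights.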

Denoising Loop Optimization

Gradient Estimation Denoising Loop:

Instead of the standard Euler denoising loop, you can use gradient estimation for fewer steps (~20-30 instead of 40):

from ltx_pipelines.utils import gradient_estimating_euler_denoising_loop

# Use gradient estimation denoising loop
def denoising_loop(sigmas, video_state, audio_state, stepper):
    return gradient_estimating_euler_denoising_loop(
        sigmas=sigmas,
        video_state=video_state,
        audio_state=audio_state,
        stepper=stepper,
        denoise_fn=your_denoise_function,  # placeholder: your pipeline's per-step denoising function
        ge_gamma=2.0,  # Gradient estimation coefficient
    )

This typically lets you reduce the step count from 40 to 20-30 while maintaining quality. The gradient estimation loop is imported from ltx_pipelines.utils, as shown above.
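To show what ge_gamma does, here is a scalar sketch of the gradient-estimation update as it is commonly implemented (for example, ComfyUI's sample_gradient_estimation). Whether gradient_estimating_euler_denoising_loop follows exactly this update is an assumption, and gradient_estimation_sample is a hypothetical name. Each step computes the Euler direction d = (x - denoised) / sigma, then extrapolates with ge_gamma * d + (1 - ge_gamma) * d_prev; ge_gamma=1.0 reduces to plain Euler.

```python
def gradient_estimation_sample(x, sigmas, denoise, ge_gamma=2.0):
    """Gradient-estimation Euler sampler on a scalar state (illustrative sketch).

    denoise(x, sigma) returns the model's clean-sample estimate at noise level sigma.
    sigmas must be decreasing and usually end at 0.0.
    """
    prev_d = None
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoise(x, sigma)
        d = (x - denoised) / sigma                 # Euler direction dx/dsigma
        if prev_d is None:
            step_dir = d                           # first step: plain Euler
        else:
            # Mix current and previous directions; ge_gamma > 1 extrapolates.
            step_dir = ge_gamma * d + (1.0 - ge_gamma) * prev_d
        x = x + step_dir * (sigma_next - sigma)
        prev_d = d
    return x


# Toy check: a "perfect" denoiser for data concentrated at 0.25 recovers 0.25.
print(gradient_estimation_sample(1.3, [1.0, 0.75, 0.5, 0.25, 0.0], lambda x, s: 0.25))
```

The extrapolation reuses the previous step's direction instead of extra model calls, which is why it can cut steps without increasing cost per step.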

🔧 Requirements

LTX-2 Model Checkpoint - Local .safetensors file
Gemma Text Encoder - Local Gemma model directory
Spatial Upsampler - Required by all two-stage pipelines (not needed by single-stage pipelines)
Distilled LoRA - Required by the two-stage pipelines, except DistilledPipeline
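A quick preflight check can catch a missing file before a long model load. The sketch below is hypothetical (missing_assets is not part of ltx_pipelines) and transcribes the requirements list above literally, treating the two-stage pipelines from the Features Comparison table as the ones that need the spatial upsampler; check each pipeline's --help for the arguments it actually takes.

```python
from pathlib import Path

# Two-stage pipelines, per the Features Comparison table.
TWO_STAGE_PIPELINES = {
    "TI2VidTwoStagesPipeline",
    "TI2VidTwoStagesHQPipeline",
    "DistilledPipeline",
    "ICLoraPipeline",
    "KeyframeInterpolationPipeline",
    "A2VidPipelineTwoStage",
}


def missing_assets(pipeline, checkpoint_path, gemma_root,
                   spatial_upsampler_path=None, distilled_lora_path=None):
    """Return the names of required assets that are unset or do not exist on disk."""
    required = {"checkpoint": checkpoint_path, "Gemma root": gemma_root}
    if pipeline in TWO_STAGE_PIPELINES:
        required["spatial upsampler"] = spatial_upsampler_path
        if pipeline != "DistilledPipeline":
            required["distilled LoRA"] = distilled_lora_path
    return [name for name, path in required.items()
            if path is None or not Path(path).exists()]


print(missing_assets("DistilledPipeline", "/path/to/checkpoint.safetensors", "/path/to/gemma"))
```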

📖 Example: Image-to-Video

from ltx_core.loader import LTXV_LORA_COMFY_RENAMING_MAP, LoraPathStrengthAndSDOps
from ltx_pipelines.ti2vid_two_stages import TI2VidTwoStagesPipeline
from ltx_core.components.guiders import MultiModalGuiderParams

distilled_lora = [
    LoraPathStrengthAndSDOps(
        "/path/to/distilled_lora.safetensors",
        0.6,
        LTXV_LORA_COMFY_RENAMING_MAP
    ),
]

pipeline = TI2VidTwoStagesPipeline(
    checkpoint_path="/path/to/checkpoint.safetensors",
    distilled_lora=distilled_lora,
    spatial_upsampler_path="/path/to/upsampler.safetensors",
    gemma_root="/path/to/gemma",
    loras=[],
)

video_guider_params = MultiModalGuiderParams(
    cfg_scale=3.0,
    stg_scale=1.0,
    rescale_scale=0.7,
    modality_scale=3.0,
    skip_step=0,
    stg_blocks=[29],
)

audio_guider_params = MultiModalGuiderParams(
    cfg_scale=7.0,
    stg_scale=1.0,
    rescale_scale=0.7,
    modality_scale=3.0,
    skip_step=0,
    stg_blocks=[29],
)

# Generate video from image
# Note: ImageConditioningInput must also be imported; its module is not shown in this example.
pipeline(
    prompt="A serene landscape with mountains in the background",
    output_path="output.mp4",
    seed=42,
    height=512,
    width=768,
    num_frames=121,
    frame_rate=25.0,
    num_inference_steps=40,
    video_guider_params=video_guider_params,
    audio_guider_params=audio_guider_params,
    images=[ImageConditioningInput("input_image.jpg", 0, 1.0, 33)],  # Image at frame 0, strength 1.0, CRF 33
)

🔗 Related Projects

LTX-Core - Core model implementation and inference components (schedulers, guiders, noisers, patchifiers)
LTX-Trainer - Training and fine-tuning tools

Project details

Release history

This version

1.0.0

Mar 18, 2026

Download files

Source Distribution

ltx_pipelines-1.0.0.tar.gz (43.2 kB)

Uploaded Mar 18, 2026 Source

Built Distribution

ltx_pipelines-1.0.0-py3-none-any.whl (62.3 kB)

Uploaded Mar 18, 2026 Python 3

File details

Details for the file ltx_pipelines-1.0.0.tar.gz.

File metadata

Download URL: ltx_pipelines-1.0.0.tar.gz
Upload date: Mar 18, 2026
Size: 43.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ltx_pipelines-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`7b69a2697d98aefd2222e83b1f88ef65a4b94b70207b78bf5877e374a5f137b9`
MD5	`a69d0ba6d79056940d0b5069f3e60ad2`
BLAKE2b-256	`e173be7a9dba20a77e6ede5884cd84a4fb47056b9ad9bba8ca25277c8cda34ef`

File details

Details for the file ltx_pipelines-1.0.0-py3-none-any.whl.

File metadata

Download URL: ltx_pipelines-1.0.0-py3-none-any.whl
Upload date: Mar 18, 2026
Size: 62.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ltx_pipelines-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2951ece2359de582e558874e9c155bd06dd73220a686fa978107384404f845f1`
MD5	`16419900a38c4f28fa7f09ecfc7baa3a`
BLAKE2b-256	`8bc3995daac35d2d228acdbc54df6829a455a74b23b6469bfd7d3b0cd09876ad`
