Pipelines implementation for Lightricks' LTX-2 model
LTX-2 Pipelines
High-level pipeline implementations for generating audio-video content with Lightricks' LTX-2 model. This package provides ready-to-use pipelines for text-to-video, image-to-video, video-to-video, and keyframe interpolation tasks.
Pipelines are built using building blocks from ltx-core (schedulers, guiders, noisers, patchifiers) and handle the complete inference flow including model loading, encoding, decoding, and file I/O.
Overview
LTX-2 Pipelines provides production-ready implementations that abstract away the complexity of the diffusion process, model loading, and memory management. Each pipeline is optimized for specific use cases and offers different trade-offs between speed, quality, and memory usage.
Key Features:
- Multiple Pipeline Types: Text-to-video, image-to-video, video-to-video, audio-to-video, keyframe interpolation, and retake
- Optimized Performance: Support for FP8 transformers, gradient estimation, and memory optimization
- Production Ready: Two-stage pipelines for best quality output
- LoRA Support: Easy integration with trained LoRA adapters
- Self-Contained: Handles model loading, encoding, decoding, and file I/O
- CLI Support: All pipelines can be run as command-line scripts
Quick Start
ltx-pipelines provides ready-made inference pipelines for text-to-video, image-to-video, video-to-video, audio-to-video, keyframe interpolation, and retake. Built using building blocks from ltx-core, these pipelines handle the complete inference flow including model loading, encoding, decoding, and file I/O.
Installation
# From the repository root
uv sync --frozen
# Or install as a package
pip install -e packages/ltx-pipelines
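After installing, a quick importability check can catch an environment mix-up before a long pipeline run. The sketch below uses only the standard library; the helper name `is_installed` is made up for illustration, and the import name `ltx_pipelines` is taken from the `python -m` commands in this README.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "ltx_pipelines" is the import name used by the CLI examples in this README.
    print("ltx_pipelines importable:", is_installed("ltx_pipelines"))
```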
Running Pipelines
All pipelines can be run directly from the command line. Each pipeline module is executable:
# Run a pipeline (example: two-stage text-to-video)
python -m ltx_pipelines.ti2vid_two_stages \
--checkpoint-path path/to/checkpoint.safetensors \
--distilled-lora path/to/distilled_lora.safetensors 0.8 \
--spatial-upsampler-path path/to/upsampler.safetensors \
--gemma-root path/to/gemma \
--prompt "A beautiful sunset over the ocean" \
--output-path output.mp4
# View all available options for any pipeline
python -m ltx_pipelines.ti2vid_two_stages --help
Available pipeline modules:
- ltx_pipelines.ti2vid_two_stages - Two-stage text/image-to-video (recommended).
- ltx_pipelines.ti2vid_two_stages_hq - Two-stage text/image-to-video (different sampler, better quality).
- ltx_pipelines.ti2vid_one_stage - Single-stage text/image-to-video.
- ltx_pipelines.distilled - Fast text/image-to-video pipeline using only the distilled model.
- ltx_pipelines.ic_lora - Video-to-video with IC-LoRA.
- ltx_pipelines.keyframe_interpolation - Keyframe interpolation.
- ltx_pipelines.a2vid_two_stage - Audio-to-video generation conditioned on input audio.
- ltx_pipelines.retake - Regenerate a time region of an existing video.
Use --help with any pipeline module to see all available options and parameters.
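When launching a pipeline from another Python script, it can help to assemble the command as an argument list rather than a shell string. The following is a minimal sketch, not part of the package: `build_t2v_command` is a hypothetical helper, and it only uses the module name and flags shown in the example above.

```python
import shlex
import sys


def build_t2v_command(checkpoint, distilled_lora, lora_strength, upsampler,
                      gemma_root, prompt, output_path):
    """Assemble the argument list for the two-stage text-to-video module."""
    return [
        sys.executable, "-m", "ltx_pipelines.ti2vid_two_stages",
        "--checkpoint-path", checkpoint,
        # --distilled-lora takes a path followed by a strength value.
        "--distilled-lora", distilled_lora, str(lora_strength),
        "--spatial-upsampler-path", upsampler,
        "--gemma-root", gemma_root,
        "--prompt", prompt,
        "--output-path", output_path,
    ]


cmd = build_t2v_command(
    checkpoint="path/to/checkpoint.safetensors",
    distilled_lora="path/to/distilled_lora.safetensors",
    lora_strength=0.8,
    upsampler="path/to/upsampler.safetensors",
    gemma_root="path/to/gemma",
    prompt="A beautiful sunset over the ocean",
    output_path="output.mp4",
)

# Print a copy-pasteable shell command; nothing is executed here.
print(shlex.join(cmd))
```

To actually run the command, pass `cmd` to `subprocess.run(cmd, check=True)`.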
Pipeline Selection Guide
Quick Decision Tree
Do you have an existing video to modify?
├─ YES → Use RetakePipeline (regenerate a specific time region)
│
Do you have an audio file to drive generation?
├─ YES → Use A2VidPipelineTwoStage (audio-to-video)
│
Do you need to condition on existing images/videos?
├─ YES → Do you have reference videos for video-to-video?
│  ├─ YES → Use ICLoraPipeline
│  └─ NO → Do you have multiple keyframe images to interpolate?
│     ├─ YES → Use KeyframeInterpolationPipeline
│     └─ NO → Use TI2VidTwoStagesPipeline (image conditioning only)
│
└─ NO → Text-to-video only
   ├─ Do you need best quality?
   │  └─ YES → Use TI2VidTwoStagesPipeline (recommended for production)
   │
   └─ Do you need fastest inference?
      └─ YES → Use DistilledPipeline (with 8 predefined sigmas)
Note: TI2VidOneStagePipeline is primarily for educational purposes. For best quality, use two-stage pipelines (TI2VidTwoStagesPipeline, TI2VidTwoStagesHQPipeline, ICLoraPipeline, KeyframeInterpolationPipeline, A2VidPipelineTwoStage, or DistilledPipeline). For editing existing videos, use RetakePipeline.
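The decision tree above can also be read as a short function. This is an illustrative sketch only: `choose_pipeline` and its boolean arguments are hypothetical names, and the function returns pipeline class names as strings so that it runs without the package installed.

```python
def choose_pipeline(
    has_source_video: bool = False,
    has_audio: bool = False,
    has_reference_video: bool = False,
    has_keyframes: bool = False,
    has_image: bool = False,
    need_fastest: bool = False,
) -> str:
    """Mirror the decision tree, checking questions in the same order."""
    if has_source_video:
        return "RetakePipeline"
    if has_audio:
        return "A2VidPipelineTwoStage"
    if has_reference_video:
        return "ICLoraPipeline"
    if has_keyframes:
        return "KeyframeInterpolationPipeline"
    if has_image:
        # Image conditioning only.
        return "TI2VidTwoStagesPipeline"
    # Text-to-video only: trade quality against speed.
    if need_fastest:
        return "DistilledPipeline"
    return "TI2VidTwoStagesPipeline"


print(choose_pipeline(has_keyframes=True))
print(choose_pipeline(need_fastest=True))
```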
Features Comparison
| Pipeline | Stages | Multimodal Guidance | Upsampling | Conditioning | Best For |
|---|---|---|---|---|---|
| TI2VidTwoStagesPipeline | 2 | ✅ | ✅ | Image | Production quality (recommended) |
| TI2VidTwoStagesHQPipeline | 2 | ✅ | ✅ | Image | Same as above, res_2s sampler (higher quality) |
| TI2VidOneStagePipeline | 1 | ✅ | ❌ | Image | Educational, prototyping |
| DistilledPipeline | 2 | ❌ | ✅ | Image | Fastest inference (8 sigmas) |
| ICLoraPipeline | 2 | ✅ | ✅ | Image + Video | Video-to-video transformations |
| KeyframeInterpolationPipeline | 2 | ✅ | ✅ | Keyframes | Animation, interpolation |
| A2VidPipelineTwoStage | 2 | ✅ | ✅ | Audio + Image | Audio-driven video generation |
| RetakePipeline | 1 | ✅ | ❌ | Source Video | Regenerating a time region of a video |
Available Pipelines
1. TI2VidTwoStagesPipeline
Best for: High-quality text/image-to-video generation with upsampling. Recommended for production use.
Source: src/ltx_pipelines/ti2vid_two_stages.py
Two-stage generation: Stage 1 generates low-resolution video with multimodal guidance; Stage 2 upsamples to 2x resolution and refines with a distilled LoRA. Supports image conditioning. Slower than the one-stage pipeline, with significantly better output quality.
Use when: Production-quality video generation, higher resolution needed, quality over speed, text-to-video with image conditioning.
2. TI2VidTwoStagesHQPipeline
Best for: Same two-stage text/image-to-video as TI2VidTwoStagesPipeline but with a different sampler and step count.
Source: src/ltx_pipelines/ti2vid_two_stages_hq.py
Uses the res_2s second-order sampler instead of Euler. Same stage structure (stage 1 at target resolution with CFG, stage 2 upsampling with distilled LoRA) and image conditioning support. Typically allows fewer steps for comparable quality; trade-offs differ from the default Euler-based pipeline.
Use when: You want the same two-stage workflow with fewer steps or prefer the res_2s sampling behavior.
3. TI2VidOneStagePipeline
Best for: Educational purposes and quick prototyping.
Source: src/ltx_pipelines/ti2vid_one_stage.py
Important: This pipeline is primarily for educational purposes. For production-quality results, use TI2VidTwoStagesPipeline or other two-stage pipelines.
Single-stage generation (no upsampling) with multimodal guidance and image conditioning support. Faster inference but lower resolution output (typically 512x768).
Use when: Learning how the pipeline works, quick prototyping, testing, or when high resolution is not needed.
4. DistilledPipeline
Best for: Fastest inference with good quality using a distilled model with predefined sigma schedule.
Source: src/ltx_pipelines/distilled.py
Two-stage generation with 8 predefined sigmas (8 steps in stage 1, 4 steps in stage 2). No guidance required. Fastest inference among all pipelines. Supports image conditioning. Requires spatial upsampler.
Use when: Fastest inference is critical, batch processing many videos, or when you have a distilled model checkpoint.
5. ICLoraPipeline
Best for: Video-to-video and image-to-video transformations using IC-LoRA.
Source: src/ltx_pipelines/ic_lora.py
Two-stage generation with IC-LoRA support. Can condition on reference videos (video-to-video) or images at specific frames. CFG guidance in stage 1, upsampling in stage 2. Requires IC-LoRA trained model.
Note: ICLoraPipeline can only be used with a distilled model.
Use when: Video-to-video transformations, image-to-video with strong control, or when you have reference videos to guide generation.
6. KeyframeInterpolationPipeline
Best for: Generating videos by interpolating between keyframe images.
Source: src/ltx_pipelines/keyframe_interpolation.py
Two-stage generation with keyframe interpolation. Uses guiding latents (additive conditioning) instead of replacing latents for smoother transitions. Multimodal guidance in stage 1, upsampling in stage 2.
Use when: You have keyframe images and want to interpolate between them, creating smooth transitions, or animation/motion interpolation tasks.
7. A2VidPipelineTwoStage
Best for: Generating video driven by input audio.
Source: src/ltx_pipelines/a2vid_two_stage.py
Two-stage audio-to-video generation. Stage 1 generates video at half resolution with audio conditioning (video-only denoising with the audio frozen), then Stage 2 upsamples by 2x and refines the video while keeping the audio fixed, using a distilled LoRA. The input audio is encoded via the audio VAE and used as the initial audio latent, but the original audio waveform is passed through and returned in the output to preserve fidelity. Supports image conditioning and prompt enhancement.
Extra CLI arguments: --audio-path (required), --audio-start-time, --audio-max-duration.
Use when: You have an audio clip and want to generate a matching video, audio-reactive video generation, or music visualization.
8. RetakePipeline
Best for: Regenerating a specific time region of an existing video while keeping the rest unchanged.
Source: src/ltx_pipelines/retake.py
Single-stage generation that encodes the source video and audio into latents, applies a temporal region mask to mark [start_time, end_time] for regeneration, and denoises only the masked region from a text prompt. Content outside the time window is preserved. Supports independent control over video and audio regeneration (regenerate_video, regenerate_audio flags), and can use either the full model with CFG guidance or the distilled model with a fixed sigma schedule.
Extra CLI arguments: --video-path (required), --start-time (required), --end-time (required).
Constraints: Source video frame count must satisfy the 8k+1 format (e.g. 97, 193) and resolution must be multiples of 32.
Use when: You want to re-do a specific section of a generated video (e.g. fix a bad segment), selectively regenerate audio or video in a time window, or iterate on part of a result without re-generating the entire clip.
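The frame-count and resolution constraints above are easy to check before loading any model. The sketch below is a standalone illustration; `check_retake_source` and `largest_valid_frame_count` are hypothetical helpers, not part of ltx-pipelines, and treating a single frame (k = 0) as valid is an assumption.

```python
def check_retake_source(num_frames: int, height: int, width: int) -> list[str]:
    """Return a list of constraint violations (empty if the source looks valid)."""
    problems = []
    # Frame count must have the form 8k+1 (e.g. 97, 193); k >= 0 is assumed here.
    if num_frames < 1 or (num_frames - 1) % 8 != 0:
        problems.append(f"num_frames={num_frames} is not of the form 8k+1")
    # Both spatial dimensions must be multiples of 32.
    if height % 32 != 0:
        problems.append(f"height={height} is not a multiple of 32")
    if width % 32 != 0:
        problems.append(f"width={width} is not a multiple of 32")
    return problems


def largest_valid_frame_count(num_frames: int) -> int:
    """Largest 8k+1 value that does not exceed num_frames (for num_frames >= 1)."""
    return ((num_frames - 1) // 8) * 8 + 1


print(check_retake_source(97, 512, 768))
print(check_retake_source(100, 500, 768))
print(largest_valid_frame_count(100))
```

A source clip that fails the check would need trimming or resizing before it is passed to the pipeline.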
Conditioning Types
Pipelines use different conditioning methods from ltx-core for controlling generation. See the ltx-core conditioning documentation for details.
Image Conditioning
All pipelines support image conditioning, but with different methods:
- Replacing Latents (image_conditionings_by_replacing_latent):
  - Used by: TI2VidOneStagePipeline, TI2VidTwoStagesPipeline, DistilledPipeline, ICLoraPipeline
  - Replaces the latent at a specific frame with the encoded image
  - Strong control over specific frames
- Guiding Latents (image_conditionings_by_adding_guiding_latent):
  - Used by: KeyframeInterpolationPipeline
  - Adds the image as a guiding signal rather than replacing
  - Better for smooth interpolation between keyframes
Video Conditioning
- Video Conditioning (ICLoraPipeline only):
  - Conditions on entire reference videos
  - Useful for video-to-video transformations
  - Uses VideoConditionByKeyframeIndex from ltx-core
Multimodal Guidance
LTX-2 pipelines use multimodal guidance to steer the diffusion process for both video and audio modalities. Each modality (video, audio) has its own guider with independent parameters, allowing fine-grained control over generation quality and adherence to prompts.
Guidance Parameters
The MultiModalGuiderParams dataclass controls guidance behavior:
| Parameter | Description |
|---|---|
| cfg_scale | Classifier-Free Guidance scale. Higher values make the output adhere more strongly to the text prompt. Typical values: 2.0-5.0. Set to 1.0 to disable. |
| stg_scale | Spatio-Temporal Guidance scale. Controls perturbation-based guidance for improved temporal coherence. Typical values: 0.5-1.5. Set to 0.0 to disable. |
| stg_blocks | Which transformer blocks to perturb for STG (e.g., [29] for the last block). Set to [] to disable STG. |
| rescale_scale | Rescales the guided prediction to match the variance of the conditional prediction. Helps prevent over-saturation. Typical values: 0.5-0.7. Set to 0.0 to disable. |
| modality_scale | Modality CFG scale. Steers the model away from unsynced video and audio results, improving audio-visual coherence. Set to 1.0 to disable. |
| skip_step | Skip guidance every N steps. Can speed up inference with minimal quality loss. Set to 0 to disable (never skip). |
How It Works
The multimodal guider combines three guidance signals during each denoising step:
- CFG (Text Guidance): Steers generation toward the text prompt by computing (cond - uncond_text).
- STG (Perturbation Guidance): Improves structural coherence by perturbing specific transformer blocks and steering away from the perturbed prediction.
- Modality CFG: For joint audio-video generation, steers the model away from unsynced video and audio results.
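To make the interaction of the three signals concrete, here is a minimal numeric sketch. The exact combination formula is an assumption, chosen to be consistent with the parameter table (a scale of 1.0 disables CFG and modality CFG, 0.0 disables STG and rescaling); the real guider in ltx-core may differ. `combine_guidance` is a hypothetical function that operates on flat lists of floats rather than tensors.

```python
from statistics import pstdev


def combine_guidance(cond, uncond_text, perturbed, uncond_modality,
                     cfg_scale=1.0, stg_scale=0.0,
                     modality_scale=1.0, rescale_scale=0.0):
    """Combine CFG, STG and modality CFG terms around the conditional prediction."""
    guided = [
        c
        + (cfg_scale - 1.0) * (c - ut)        # push toward the text prompt
        + stg_scale * (c - p)                 # push away from the perturbed prediction
        + (modality_scale - 1.0) * (c - um)   # push away from the unsynced prediction
        for c, ut, p, um in zip(cond, uncond_text, perturbed, uncond_modality)
    ]
    if rescale_scale > 0.0:
        guided_std = pstdev(guided)
        if guided_std > 0.0:
            # Blend toward the standard deviation of the conditional prediction.
            factor = pstdev(cond) / guided_std
            factor = rescale_scale * factor + (1.0 - rescale_scale)
            guided = [g * factor for g in guided]
    return guided


cond = [1.0, 3.0]
uncond_text = [1.0, 1.0]
# With every scale at its "disabled" value the conditional prediction is returned as-is.
print(combine_guidance(cond, uncond_text, cond, cond))
# CFG only, then CFG with full rescaling.
print(combine_guidance(cond, uncond_text, cond, cond, cfg_scale=2.0))
print(combine_guidance(cond, uncond_text, cond, cond, cfg_scale=2.0, rescale_scale=1.0))
```

Under this assumed form, each extra guidance term needs its own model prediction, which is why enabling STG or modality CFG adds forward passes per step.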
Example Configuration
from ltx_core.components.guiders import MultiModalGuiderParams
# Video guider: moderate CFG, STG enabled, modality isolation
video_guider_params = MultiModalGuiderParams(
cfg_scale=3.0,
stg_scale=1.0,
rescale_scale=0.7,
modality_scale=3.0,
stg_blocks=[29],
)
# Audio guider: higher CFG for stronger prompt adherence
audio_guider_params = MultiModalGuiderParams(
cfg_scale=7.0,
stg_scale=1.0,
rescale_scale=0.7,
modality_scale=3.0,
stg_blocks=[29],
)
Tip: Start with the default values from constants.py and adjust based on your use case. Higher cfg_scale = stronger prompt adherence but potentially less natural motion; higher stg_scale = better temporal coherence but slower inference (requires extra forward passes).
Tip: When generating video with audio, set modality_scale > 1.0 (e.g., 3.0) to improve audio-visual sync. If generating video-only, set it to 1.0 to disable.
Optimization Tips
Memory Optimization
FP8 Quantization (Lower Memory Footprint):
For a smaller GPU memory footprint, pass the --quantization flag and set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True.
Two quantization policies are available:
| Policy | CLI Flag | Description |
|---|---|---|
| FP8 Cast | --quantization fp8-cast | Downcasts transformer linear weights to FP8 during loading; upcasts on the fly during inference. No extra dependencies. |
| FP8 Scaled MM | --quantization fp8-scaled-mm | Uses FP8 scaled matrix multiplication via TensorRT-LLM (tensorrt_llm must be installed). Best performance on Hopper GPUs. |
CLI:
# FP8 Cast (works on any GPU with FP8 support)
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python -m ltx_pipelines.ti2vid_two_stages \
--quantization fp8-cast --checkpoint-path=...
# FP8 Scaled MM (requires tensorrt_llm, best on Hopper GPUs)
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python -m ltx_pipelines.ti2vid_two_stages \
--quantization fp8-scaled-mm --checkpoint-path=...
Programmatically:
When authoring custom scripts, pass a QuantizationPolicy to pipeline classes:
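from ltx_core.quantization import QuantizationPolicy
from ltx_pipelines.ti2vid_two_stages import TI2VidTwoStagesPipeline
# ltx_model_path, distilled_lora, upsampler_path and gemma_root_path are
# placeholders for your own paths and LoRA configuration.
pipeline = TI2VidTwoStagesPipeline(
checkpoint_path=ltx_model_path,
distilled_lora=distilled_lora,
spatial_upsampler_path=upsampler_path,
gemma_root=gemma_root_path,
loras=[],
quantization=QuantizationPolicy.fp8_cast(), # or QuantizationPolicy.fp8_scaled_mm()
)
pipeline(...)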
You still need to use PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True when launching:
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python my_denoising_pipeline.py
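If the pipeline is launched from a Python wrapper rather than a shell, the allocator setting can be placed in the child process environment instead. A minimal sketch, assuming only the flag values and environment variable shown above; `quantized_launch` is a hypothetical helper and nothing is executed until the result is passed to `subprocess.run`.

```python
import os
import sys

# The two quantization policies listed in the table above.
QUANTIZATION_CHOICES = ("fp8-cast", "fp8-scaled-mm")


def quantized_launch(module, quantization, extra_args=(), base_env=None):
    """Return (cmd, env) for running a pipeline module with FP8 quantization."""
    if quantization not in QUANTIZATION_CHOICES:
        raise ValueError(f"unknown quantization policy: {quantization!r}")
    # Copy the environment so the caller's os.environ is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
    cmd = [sys.executable, "-m", module, "--quantization", quantization, *extra_args]
    return cmd, env


cmd, env = quantized_launch("ltx_pipelines.ti2vid_two_stages", "fp8-cast")
print(cmd[1:])
print(env["PYTORCH_CUDA_ALLOC_CONF"])
```

Running it is then `subprocess.run(cmd, env=env, check=True)`, with the remaining pipeline arguments supplied through `extra_args`.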
Memory Cleanup Between Stages:
By default, pipelines clean GPU memory (especially transformer weights) between stages. If you have enough memory, you can skip this cleanup to reduce running time:
# In pipeline implementations, memory cleanup happens automatically
# between stages. For custom pipelines, you can skip:
# utils.cleanup_memory() # Comment out if you have enough VRAM
Denoising Loop Optimization
Gradient Estimation Denoising Loop:
Instead of the standard Euler denoising loop, you can use gradient estimation for fewer steps (~20-30 instead of 40):
from ltx_pipelines.utils import gradient_estimating_euler_denoising_loop
# Use gradient estimation denoising loop
def denoising_loop(sigmas, video_state, audio_state, stepper):
return gradient_estimating_euler_denoising_loop(
sigmas=sigmas,
video_state=video_state,
audio_state=audio_state,
stepper=stepper,
denoise_fn=your_denoise_function,
ge_gamma=2.0, # Gradient estimation coefficient
)
This allows you to use 20-30 steps instead of 40 while maintaining quality. The gradient estimation function is available in pipeline_utils.py.
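For intuition, the sketch below shows one common formulation of a gradient-estimating Euler loop on plain floats: after the first step, the update direction is extrapolated from the current and previous directions using `ge_gamma`. This is an assumed formulation for illustration, not necessarily what `gradient_estimating_euler_denoising_loop` does internally; `euler_with_gradient_estimation` and the toy denoiser are hypothetical.

```python
def euler_with_gradient_estimation(x, sigmas, denoise_fn, ge_gamma=2.0):
    """Integrate from sigmas[0] down to sigmas[-1] on a scalar state."""
    old_d = None
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoise_fn(x, sigma)
        d = (x - denoised) / sigma  # current update direction
        if old_d is None:
            d_bar = d  # first step is a plain Euler step
        else:
            # Extrapolate using the previous direction; ge_gamma=1.0 gives plain Euler.
            d_bar = ge_gamma * d + (1.0 - ge_gamma) * old_d
        x = x + d_bar * (sigma_next - sigma)
        old_d = d
    return x


def toy_denoiser(x, sigma):
    """Toy stand-in for a model: always predicts the same clean value."""
    return 0.3


sigmas = [1.0, 0.75, 0.5, 0.25, 0.0]
print(euler_with_gradient_estimation(2.0, sigmas, toy_denoiser, ge_gamma=1.0))
print(euler_with_gradient_estimation(2.0, sigmas, toy_denoiser, ge_gamma=2.0))
```

With this constant toy denoiser the direction never changes between steps, so both settings land on the same value; the extrapolation only matters when the model's prediction shifts from step to step.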
Requirements
- LTX-2 Model Checkpoint - Local .safetensors file
- Gemma Text Encoder - Local Gemma model directory
- Spatial Upscaler - Required for all two-stage pipelines (not needed for the one-stage pipeline)
- Distilled LoRA - Required for two-stage pipelines other than DistilledPipeline (not needed for the one-stage pipeline)
Example: Image-to-Video
from ltx_core.loader import LTXV_LORA_COMFY_RENAMING_MAP, LoraPathStrengthAndSDOps
from ltx_core.components.guiders import MultiModalGuiderParams
from ltx_pipelines.ti2vid_two_stages import TI2VidTwoStagesPipeline
# ImageConditioningInput (used below) must also be imported;
# its module path is not shown in this README.
distilled_lora = [
LoraPathStrengthAndSDOps(
"/path/to/distilled_lora.safetensors",
0.6,
LTXV_LORA_COMFY_RENAMING_MAP
),
]
pipeline = TI2VidTwoStagesPipeline(
checkpoint_path="/path/to/checkpoint.safetensors",
distilled_lora=distilled_lora,
spatial_upsampler_path="/path/to/upsampler.safetensors",
gemma_root="/path/to/gemma",
loras=[],
)
video_guider_params = MultiModalGuiderParams(
cfg_scale=3.0,
stg_scale=1.0,
rescale_scale=0.7,
modality_scale=3.0,
skip_step=0,
stg_blocks=[29],
)
audio_guider_params = MultiModalGuiderParams(
cfg_scale=7.0,
stg_scale=1.0,
rescale_scale=0.7,
modality_scale=3.0,
skip_step=0,
stg_blocks=[29],
)
# Generate video from image
pipeline(
prompt="A serene landscape with mountains in the background",
output_path="output.mp4",
seed=42,
height=512,
width=768,
num_frames=121,
frame_rate=25.0,
num_inference_steps=40,
video_guider_params=video_guider_params,
audio_guider_params=audio_guider_params,
images=[ImageConditioningInput("input_image.jpg", 0, 1.0, 33)], # Image at frame 0, strength 1.0, CRF 33
)
Related Projects
- LTX-Core - Core model implementation and inference components (schedulers, guiders, noisers, patchifiers)
- LTX-Trainer - Training and fine-tuning tools