Data preparation for training speech processing models.

Lhotse

Lhotse is a Python library that aims to make speech and audio data preparation flexible and accessible to a wider community. Alongside k2, it is part of the next-generation Kaldi speech processing library.

⚠️ Lhotse is not fully stable yet - while many features are already implemented, the APIs are still subject to change! ⚠️

About

Main goals

Attract a wider community to speech processing tasks with a Python-centric design.
Accommodate experienced Kaldi users with an expressive command-line interface.
Provide standard data preparation recipes for commonly used corpora.
Provide PyTorch Dataset classes for speech and audio related tasks.
Enable flexible data preparation for model training through the notion of audio cuts.
Prioritize efficiency, especially in terms of I/O bandwidth and storage capacity.

Main ideas

Like Kaldi, Lhotse provides standard data preparation recipes, but extends them with seamless PyTorch integration through task-specific Dataset classes. The data and metadata are represented in human-readable text manifests and exposed to the user through convenient Python classes.
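As a rough illustration of what a human-readable manifest exposed through Python classes can look like, the sketch below round-trips a recording description through JSON Lines using only the standard library. The ToyRecording class, its field names, and the helper functions are hypothetical stand-ins chosen for this example; they are not Lhotse's actual classes or on-disk format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ToyRecording:
    """Hypothetical stand-in for a recording manifest entry (not Lhotse's class)."""
    id: str
    path: str
    sampling_rate: int
    num_samples: int

    @property
    def duration(self) -> float:
        # Duration in seconds is derived from the stored metadata, not from audio.
        return self.num_samples / self.sampling_rate


def to_jsonl(recordings: List[ToyRecording]) -> str:
    """Serialize one JSON object per line so the manifest stays easy to read and diff."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in recordings)


def from_jsonl(text: str) -> List[ToyRecording]:
    """Parse the text manifest back into Python objects."""
    return [ToyRecording(**json.loads(line)) for line in text.splitlines() if line.strip()]


manifest_text = to_jsonl([
    ToyRecording(id="rec-1", path="audio/rec-1.wav", sampling_rate=8000, num_samples=96000),
])
restored = from_jsonl(manifest_text)
print(manifest_text)
print(restored[0].duration)
```

Keeping only metadata in the manifest means it can be inspected, filtered, or version-controlled without touching the audio files themselves.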

Lhotse introduces the notion of audio cuts, designed to ease the construction of training data with operations such as mixing, truncation and padding that are performed on-the-fly to minimize the amount of storage required. Data augmentation and feature extraction are supported both in pre-computed mode, with highly compressed feature matrices stored on disk, and in on-the-fly mode, which computes the transformations upon request. Additionally, Lhotse introduces feature-space cut mixing to get the best of both worlds.
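To make the idea of on-the-fly operations more concrete, here is a minimal sketch of a lazy cut: truncation and padding only update a small metadata record, and audio would be touched only when it is finally loaded. ToyCut and its fields are hypothetical names invented for this example and do not reflect how Lhotse implements cuts.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyCut:
    """Hypothetical lazy cut: a time span within a recording, plus trailing padding."""
    recording_id: str
    start: float               # offset into the recording, in seconds
    duration: float            # length of real audio covered by the cut, in seconds
    pad_seconds: float = 0.0   # padding to append when the audio is eventually loaded

    @property
    def total_duration(self) -> float:
        return self.duration + self.pad_seconds

    def truncate(self, max_duration: float) -> "ToyCut":
        # Only the metadata changes; no samples are read or written.
        return replace(self, duration=min(self.duration, max_duration))

    def pad(self, target_duration: float) -> "ToyCut":
        # Record how much padding is needed; it would be applied at load time.
        missing = max(0.0, target_duration - self.duration)
        return replace(self, pad_seconds=missing)


cut = ToyCut(recording_id="rec-1", start=10.0, duration=7.5)
short = cut.truncate(5.0)
padded = ToyCut(recording_id="rec-1", start=20.0, duration=3.0).pad(5.0)
print(short)
print(padded.total_duration)
```

Because each operation returns a new lightweight record, many variants of the same recording can coexist without duplicating audio on disk.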

Installation

Lhotse supports Python version 3.6 and later.

Pip

Lhotse is available on PyPI:

pip install lhotse

To install the latest, unreleased version, do:

pip install git+https://github.com/lhotse-speech/lhotse

Development installation

For development installation, you can fork/clone the GitHub repo and install with pip:

git clone https://github.com/lhotse-speech/lhotse
cd lhotse
pip install -e '.[dev]'

# Running unit tests
pytest test

This is an editable installation (-e option), meaning that your changes to the source code are automatically reflected when importing lhotse (no re-install needed). The [dev] part means you're installing extra dependencies that are used to run tests, build documentation or launch Jupyter notebooks.
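One way to check that an editable installation points at your clone is to ask Python where it would import the package from. The helper below is a small standard-library sketch; package_location is a hypothetical name introduced here, and the path it prints depends entirely on your environment.

```python
import importlib.util
from typing import Optional


def package_location(name: str) -> Optional[str]:
    """Return the file a top-level package would be imported from, or None if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


location = package_location("lhotse")
print(location if location else "lhotse is not installed in this environment")
```

If the printed path lies inside your cloned repository rather than under site-packages, imports are resolving to your working copy.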

Examples

We have example recipes showing how to prepare data and load it in Python as a PyTorch Dataset. They are located in the examples directory.

A short snippet showing how Lhotse can make audio data preparation quick and easy:

from torch.utils.data import DataLoader
from lhotse import CutSet, Fbank
from lhotse.dataset import VadDataset, SingleCutSampler
from lhotse.recipes import prepare_switchboard

# Prepare data manifests from a raw corpus distribution.
# The RecordingSet describes the metadata about audio recordings:
# the sampling rate, number of channels, duration, etc.
# The SupervisionSet describes metadata about supervision segments:
# the transcript, speaker, language, and so on.
swbd = prepare_switchboard('/export/corpora3/LDC/LDC97S62')

# CutSet is the workhorse of Lhotse, allowing for flexible data manipulation.
# We create 5-second cuts by traversing SWBD recordings in windows.
# No audio data is actually loaded into memory or stored to disk at this point.
cuts = CutSet.from_manifests(
    recordings=swbd['recordings'],
    supervisions=swbd['supervisions']
).cut_into_windows(duration=5)

# We compute the log-Mel filter energies and store them on disk.
# Then we pad the cuts to 5 seconds to ensure all cuts are of equal length,
# as the last window in each recording might have a shorter duration.
# The padding will be performed once the features are loaded into memory.
cuts = cuts.compute_and_store_features(
    extractor=Fbank(),
    storage_path='feats',
    num_jobs=8
).pad(duration=5.0)

# Construct a PyTorch Dataset class for the voice activity detection task:
dataset = VadDataset(cuts)
sampler = SingleCutSampler(cuts)
dataloader = DataLoader(dataset, sampler=sampler, batch_size=None)
batch = next(iter(dataloader))
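The windowing and padding steps in the snippet above can be reasoned about with simple arithmetic. The sketch below computes the window boundaries for a recording and the padding each window would need; window_bounds and padding_needed are hypothetical helpers written for illustration and are not how Lhotse implements cut_into_windows or pad.

```python
from typing import List, Tuple


def window_bounds(total_duration: float, window: float) -> List[Tuple[float, float]]:
    """Return (start, duration) pairs covering a recording in consecutive windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    bounds = []
    index = 0
    while index * window < total_duration:
        start = index * window
        # The last window may be shorter than the requested length.
        bounds.append((start, min(window, total_duration - start)))
        index += 1
    return bounds


def padding_needed(duration: float, target: float) -> float:
    """Seconds of padding required to bring a window up to the target length."""
    return max(0.0, target - duration)


windows = window_bounds(total_duration=12.0, window=5.0)
pads = [padding_needed(duration, 5.0) for _, duration in windows]
print(windows)
print(pads)
```

For a 12-second recording only the final window falls short of 5 seconds, which is why the snippet pads after windowing.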

The VadDataset yields batches of paired feature and supervision tensors; in the example batch from this snippet, the speech starts roughly at the one-second mark (frame 100).
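The frame count mentioned above (one second corresponding to 100 frames) implies a frame shift of 10 ms. Under that assumption, converting between time and frame indices is a one-line calculation; seconds_to_frames is a hypothetical helper, and the exact number of frames a feature extractor produces for a cut can differ slightly depending on how it handles the edges.

```python
def seconds_to_frames(seconds: float, frame_shift: float = 0.01) -> int:
    """Convert a time offset to a frame index, assuming a fixed frame shift in seconds."""
    return round(seconds / frame_shift)


speech_start_frame = seconds_to_frames(1.0)  # one second at a 10 ms shift
cut_length_frames = seconds_to_frames(5.0)   # approximate length of a 5-second cut
print(speech_start_frame, cut_length_frames)
```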

Release history

1.32.2

Jan 14, 2026

1.32.1

Nov 24, 2025

1.32.0

Nov 21, 2025

1.31.1

Sep 18, 2025

1.30.3

May 15, 2025

1.30.2

Apr 28, 2025

1.30.1

Apr 21, 2025

1.30.0

Mar 19, 2025

1.29.0

Dec 13, 2024

1.28.0

Nov 19, 2024

1.27.0

Aug 22, 2024

1.26.0

Jul 26, 2024

1.25.0

Jul 18, 2024

1.24.2

Jun 25, 2024

1.24.1

Jun 10, 2024

1.24.0

Jun 5, 2024

1.23.0

Apr 30, 2024

1.22.0

Mar 7, 2024

1.21.0

Feb 13, 2024

1.20.0

Jan 31, 2024

1.19.2

Jan 4, 2024

1.19.1

Jan 3, 2024

1.19.0

Jan 2, 2024

1.18.0

Dec 11, 2023

1.17.0

Oct 8, 2023

1.16.0

Aug 11, 2023

1.15.0

May 27, 2023

1.14.0

Apr 27, 2023

1.13.0

Mar 23, 2023

1.12.0

Jan 17, 2023

1.11.0

Dec 8, 2022

1.10.0

Nov 16, 2022

1.9.0

Oct 20, 2022

1.8.0

Sep 30, 2022

1.7.0

Sep 12, 2022

1.6.0

Aug 27, 2022

1.5.0

Aug 9, 2022

1.4.0

Jul 7, 2022

1.3.0

Jun 11, 2022

1.2.0

May 19, 2022

1.1.0

May 3, 2022

1.0.0

Apr 6, 2022

0.12.0

Nov 12, 2021

0.11.0

Nov 3, 2021

0.10.0

Oct 14, 2021

0.9.0

Sep 27, 2021

0.8.0

Aug 26, 2021

0.7.0

Jun 30, 2021

This version

0.6.0

Apr 26, 2021

0.5.0

Feb 27, 2021

0.4.0

Jan 12, 2021

0.3.0

Dec 9, 2020

0.2.2

Nov 19, 2020

0.2.1

Nov 18, 2020

0.2.0

Nov 18, 2020

0.1.1

Nov 3, 2020

0.1

Oct 9, 2020

Download files

Source Distribution

lhotse-0.6.0.tar.gz (176.3 kB)

Uploaded Apr 26, 2021 Source

Built Distribution

lhotse-0.6.0-py3-none-any.whl (244.3 kB)

Uploaded Apr 26, 2021 Python 3

File details

Details for the file lhotse-0.6.0.tar.gz.

File metadata

Download URL: lhotse-0.6.0.tar.gz
Upload date: Apr 26, 2021
Size: 176.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.25.1 setuptools/51.1.2.post20210110 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.7

File hashes

Hashes for lhotse-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`bc219037ddb6d0268d6a1733e4cf2e428283829765fdee13c5354dde7a960174`
MD5	`36badfd9a6cfad93dc5f7e83266fd179`
BLAKE2b-256	`cd91640eef1e7fb74524b7265c41f31047f7f893ff392f982391666aa7be6f50`

File details

Details for the file lhotse-0.6.0-py3-none-any.whl.

File metadata

Download URL: lhotse-0.6.0-py3-none-any.whl
Upload date: Apr 26, 2021
Size: 244.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.25.1 setuptools/51.1.2.post20210110 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.7

File hashes

Hashes for lhotse-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9762e861334a864abeef9650644bb929672c18021e9c9750d9b5ff36190333b9`
MD5	`7677f2af1fa8addc7899aedbfa223634`
BLAKE2b-256	`f21de264c64987882c7463bf65ef68a5d2d2c81602878dd4f87c343f74b59a68`

lhotse 0.6.0
