rtrimmer

Lightweight Python package to trim RTTM diarization files and optionally audio files to a user-specified time range.

Features

  • Trim RTTM files to a specified time range
  • Shorten segments that cross the boundary of the time range so they end at the maximum duration
  • Optionally trim audio files using ffmpeg
  • Batch support for folders
  • CLI and Python API
  • Logging and input validation

Installation

pip install rtrimmer

Usage

CLI

Trim an RTTM file to the first 5 minutes (300 seconds):

rttm-trim --rttm input.rttm --output-rttm trimmed.rttm --duration 300

Trim both RTTM and audio file:

rttm-trim --rttm input.rttm --audio input.wav --output-rttm trimmed.rttm --output-audio trimmed.wav --duration 300

Batch trim all RTTM files in a folder:

rttm-trim --rttm-folder ./rttms --output-folder ./trimmed_rttms --duration 300

Trim starting from a specific time point (e.g., 60 seconds in) for 5 minutes:

rttm-trim --rttm input.rttm --output-rttm trimmed.rttm --start-time 60 --duration 300

Python API

from rtrimmer import trim_rttm, trim_audio, trim_rttm_folder

# Trim a single RTTM file
trim_rttm("session1.rttm", "session1_trimmed.rttm", max_duration=300)

# Trim starting from 60 seconds in
trim_rttm("session1.rttm", "session1_trimmed.rttm", max_duration=300, min_time=60)

# Trim audio file
trim_audio("session1.wav", "session1_trimmed.wav", duration=300, start_time=0)

# Batch process a folder of RTTM files
results = trim_rttm_folder("./rttms", "./trimmed_rttms", max_duration=300)
print(f"Processed {len(results)} files")

How It Works

RTTM Trimming Logic

When trimming an RTTM file, the package treats each segment according to how it overlaps the target range:

  1. Segments fully within the target range: Kept as is
  2. Segments starting before the target range but extending into it: Start time adjusted to the beginning of the target range, duration shortened accordingly
  3. Segments extending beyond the target range: Duration shortened to end at the target range boundary
  4. Segments outside the target range: Excluded from the output
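
The four cases above can be sketched in a few lines of Python. This is an illustrative sketch only, not the package's actual implementation: the names `clip_segment` and `clip_segments` are hypothetical, segments are modeled as plain `(onset, duration)` pairs in seconds, and the sketch assumes output times stay on the original timeline (it does not shift them so the trimmed file starts at zero).

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical representation: (onset, duration), both in seconds.
Segment = Tuple[float, float]


def clip_segment(onset: float, duration: float,
                 range_start: float, range_end: float) -> Optional[Segment]:
    """Clip one segment to [range_start, range_end]; return None if it lies outside."""
    seg_end = onset + duration
    # Case 4: the segment does not overlap the target range at all.
    if seg_end <= range_start or onset >= range_end:
        return None
    # Case 2: a segment starting before the range has its onset moved forward.
    new_onset = max(onset, range_start)
    # Case 3: a segment running past the range is cut at the boundary.
    new_end = min(seg_end, range_end)
    # Case 1 needs no special handling: max/min leave the values unchanged.
    return (new_onset, new_end - new_onset)


def clip_segments(segments: Iterable[Segment],
                  range_start: float, range_end: float) -> List[Segment]:
    """Apply clip_segment to every segment and drop the excluded ones."""
    clipped = (clip_segment(o, d, range_start, range_end) for o, d in segments)
    return [seg for seg in clipped if seg is not None]


if __name__ == "__main__":
    # Target range: start at 60 s and keep 300 s, so the range ends at 360 s.
    example = [(70.0, 10.0), (50.0, 20.0), (350.0, 20.0), (10.0, 5.0), (400.0, 5.0)]
    print(clip_segments(example, 60.0, 360.0))
```

A segment that both starts before the range and ends after it is handled by the same two lines: its onset is moved forward and its end is cut back.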

Audio Trimming

Audio trimming is performed with ffmpeg, which must be installed and available on your PATH. The package first tries the copy codec, which avoids re-encoding and is therefore faster, and falls back to re-encoding if that attempt does not succeed.
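
The copy-then-fallback approach might look roughly like the sketch below. It is an assumption about the general pattern, not the package's own code: `build_ffmpeg_cmd` and `trim_with_fallback` are hypothetical names, and the exact ffmpeg options the package passes are not documented here. The sketch uses only standard ffmpeg options (`-ss`, `-t`, `-c copy`, `-y`).

```python
import subprocess
from typing import List


def build_ffmpeg_cmd(src: str, dst: str, start_time: float,
                     duration: float, copy: bool = True) -> List[str]:
    """Build an ffmpeg command that cuts `duration` seconds starting at `start_time`."""
    cmd = ["ffmpeg", "-y",            # overwrite the output file if it exists
           "-ss", str(start_time),    # seek to the start position in the input
           "-i", src,
           "-t", str(duration)]       # keep this many seconds of output
    if copy:
        cmd += ["-c", "copy"]         # stream copy: no re-encoding
    cmd.append(dst)
    return cmd


def trim_with_fallback(src: str, dst: str, start_time: float, duration: float) -> bool:
    """Try stream copy first; if ffmpeg exits with an error, retry with re-encoding."""
    for copy in (True, False):
        cmd = build_ffmpeg_cmd(src, dst, start_time, duration, copy=copy)
        # Raises FileNotFoundError if ffmpeg is not on PATH.
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode == 0:
            return True
    return False
```

Keeping command construction separate from execution makes the command easy to inspect or log before anything is run.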

Requirements

  • Python 3.8+
  • ffmpeg (for audio trimming, must be installed and in PATH)
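
Both requirements can be checked from Python before running a batch job. The helper below is a small sketch using only the standard library; `check_requirements` is a hypothetical name and is not part of rtrimmer.

```python
import shutil
import sys
from typing import Dict, Tuple


def check_requirements(min_python: Tuple[int, int] = (3, 8)) -> Dict[str, bool]:
    """Report whether the interpreter is new enough and ffmpeg is on PATH."""
    return {
        "python": sys.version_info >= min_python,
        # shutil.which returns None when the executable cannot be found.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

ffmpeg is only needed for audio trimming, so a missing ffmpeg does not prevent trimming RTTM files alone.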

Example

Suppose you have a diarization RTTM file and a corresponding WAV file for a 1-hour meeting, but you only want the first 5 minutes:

rttm-trim --rttm meeting.rttm --audio meeting.wav --output-rttm meeting_5min.rttm --output-audio meeting_5min.wav --duration 300

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
