Audio/video chop → analyze toolkit.
This project has been archived; no new releases are expected.
ChopShop — WIP audio/video "chop → analyze" toolkit
Status: very early work-in-progress. It's sharp around the edges and brittle in spots. I'll keep iterating, but expect breaking changes.
ChopShop's goal is to take input media (e.g., video), split it into constituent streams (video/audio/text), and generate rich features per stream. The current focus is audio — others will be added at a later point as I need them.
Current processes include:
- Extract each audio stream from a video container to WAV
- Diarize + transcribe multi-speaker audio (wrapping Mahmoud Ashraf's excellent whisper-diarization)
- Export a timestamped transcript in multiple formats (CSV/SRT/TXT)
- Build per-speaker WAVs from the transcript
- Export Whisper encoder embeddings per transcript segment (CTranslate2 / faster-whisper)
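Under the hood, the stream-extraction step is typically an ffmpeg invocation. Here's an illustrative sketch (not ChopShop's actual implementation; the stream mapping and codec flags are my assumptions) of how such a command could be built:

```python
import os
import shutil
import subprocess

def build_extract_cmd(video_path: str, stream_index: int, out_wav: str,
                      sample_rate: int = 48000) -> list[str]:
    """Build an ffmpeg command that extracts one audio stream to 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",                 # overwrite output if it exists
        "-i", video_path,               # input container
        "-map", f"0:a:{stream_index}",  # pick the Nth audio stream
        "-acodec", "pcm_s16le",         # 16-bit little-endian PCM
        "-ar", str(sample_rate),        # resample to the target rate
        out_wav,
    ]

cmd = build_extract_cmd("sample_video.mp4", 0, "audio/stream0.wav")
# Run only if ffmpeg and the input actually exist:
if shutil.which("ffmpeg") and os.path.exists("sample_video.mp4"):
    subprocess.run(cmd, check=True)
```

ChopShop's `split_audio_streams` (shown in the usage example below) wraps this kind of logic for every audio stream in the container.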
Prerequisites
I'll admit, I'm not a guru here. The below items should be taken as "well, it works on my machine..." advice. Key specs that may or may not be important to match include:
- OS: Linux (tested on Ubuntu 22.04)
- Python: 3.10
- GPU stack: CUDA 12.4 + cuDNN 9
- FFmpeg binary: must be available on your PATH
  - Ubuntu: sudo apt-get install ffmpeg
  - macOS: brew install ffmpeg
  - Windows: install FFmpeg and add it to PATH
I strongly recommend using a fresh virtual environment (e.g.,
python -m venv venv && source venv/bin/activate) for everything below.
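As a quick sanity check (not part of ChopShop; just a convenience sketch), you can verify the Python version and the ffmpeg binary from Python before going further:

```python
import shutil
import sys

def check_prereqs() -> dict:
    """Quick sanity check of the prerequisites listed above."""
    return {
        "python_3_10": sys.version_info[:2] == (3, 10),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }

for name, ok in check_prereqs().items():
    print(f"{name}: {'OK' if ok else 'check this'}")
```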
To verify your CUDA runtime, run nvidia-smi; it should show the driver and CUDA version. I'm only testing against CUDA 12.4 right now, and I'm unlikely to target other CUDA versions in the foreseeable future. In theory this only affects which dependency versions you install, but your mileage may vary.
Install
As a note, I'm working on packaging all of this as a pip-installable library. At the moment, you can try this:
pip install "chopshop[diarization,cuda]"
...followed by these three pip installs:
pip install git+https://github.com/MahmoudAshraf97/demucs.git
pip install git+https://github.com/oliverguhr/deepmultilingualpunctuation.git
pip install git+https://github.com/MahmoudAshraf97/ctc-forced-aligner.git
...and lastly:
pip install --force-reinstall --no-cache-dir \
torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
--index-url https://download.pytorch.org/whl/cu124
Also, again, be sure that you have ffmpeg installed if you haven't already:
sudo apt-get update && sudo apt-get install -y ffmpeg
Manual Installation of Dependencies
If you want to skip the potentially problematic .whl file dependencies, you can install everything manually. Note that the install order likely matters to avoid dependency pinning conflicts.
# (Recommended) fresh virtualenv
python -m venv venv-chopshop
source venv-chopshop/bin/activate
# 1) Core ASR/diarization deps
pip install "faster-whisper>=1.1.0"
pip install "nemo-toolkit[asr]>=2.dev"
pip install git+https://github.com/MahmoudAshraf97/demucs.git
pip install git+https://github.com/oliverguhr/deepmultilingualpunctuation.git
pip install git+https://github.com/MahmoudAshraf97/ctc-forced-aligner.git
# 2) cuDNN user-space libs for CUDA 12
pip install -U nvidia-cudnn-cu12
# (Optional) some systems also install the OS package:
# sudo apt-get -y install cudnn9-cuda-12
# 3) PyTorch built for CUDA 12.4 (exact versions I tested)
pip install --force-reinstall --no-cache-dir \
torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
--index-url https://download.pytorch.org/whl/cu124
If you hit cuDNN errors, it usually means the CUDA/cuDNN runtime and the PyTorch wheels don't match. Keep them consistent (CUDA 12.4 with the cu124 wheel index, and cuDNN 9 from either the nvidia-cudnn-cu12 wheel or your OS package).
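One way to check what's actually installed, using only the standard library (the package names below are taken from the install steps above), is to query package metadata:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_versions(packages=("torch", "torchaudio", "nvidia-cudnn-cu12")) -> dict:
    """Report installed versions, with None for any package that is missing."""
    out = {}
    for pkg in packages:
        try:
            out[pkg] = version(pkg)
        except PackageNotFoundError:
            out[pkg] = None
    return out

print(installed_versions())
```

A torch wheel from the cu124 index reports a version like "2.6.0+cu124"; if that suffix disagrees with your CUDA runtime, reinstall from the matching wheel index.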
Quick usage example
The filenames are intentionally generic. Adjust as needed.
from pathlib import Path
from chopshop.ChopShop import ChopShop
cs = ChopShop()
# 1) Split audio streams from a video/container
wav_list = cs.split_audio_streams(
"sample_video.mp4",
output_dir="audio/",
sample_rate=48000,
bit_depth=16,
overwrite=True,
)
diar_input_audio = str(wav_list[0]) # pick a stream
# 2) Diarize + transcribe (wrapper around whisper-diarization)
tp = cs.diarize_with_thirdparty(
input_audio=diar_input_audio,
out_dir="transcripts/",
repo_dir="path/to/whisper-diarization", # if you have a local clone; otherwise leave default
whisper_model="base", # use whatever one you like; just be consistent
language="en",
device="cuda",
batch_size=0,
no_stem=False,
suppress_numerals=False,
parallel=False,
use_custom=True, # use my custom entry point that also writes a CSV
keep_temp=False, # if set to False, it will clean up after itself (good idea)
num_speakers=2, # optional hint; can be omitted to let diarization infer
)
# 3) Make per-speaker WAVs using the transcript CSV
cs.split_wav_by_speaker(
source_wav=diar_input_audio,
transcript_csv=tp.raw_files["csv"],
out_dir="audio_split/",
time_unit="ms", # the transcript CSV uses milliseconds
silence_ms=500, # add 0.5s before/after each clip (1s between clips)
)
# 4) Export Whisper encoder embeddings per transcript segment
cs.export_embeddings(
transcript_csv=tp.raw_files["csv"],
source_wav=diar_input_audio,
output_dir="whisper_embed/",
model_name="base",
device="cuda",
compute_type="float16", # good default on GPU
time_unit="ms",
run_in_subprocess=True, # isolates the encoder to avoid cuDNN conflicts
)
Outputs (typical):
- transcripts/… — CSV (timestamped), SRT, and TXT
- audio_split/… — one WAV per detected speaker
- whisper_embed/<source_stem>_embeddings.csv — one row per transcript span with e0..eN encoder features
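To illustrate the embeddings CSV layout described above, here's a hedged stdlib-only sketch that parses the e0..eN feature columns (the column names are assumed from the output description, not ChopShop's actual reader):

```python
import csv
import io

def read_embeddings(csv_text: str) -> list[list[float]]:
    """Parse an embeddings CSV whose feature columns are named e0..eN."""
    rows = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Keep only columns like e0, e12, ... (skips "start", "end", etc.)
        feats = [k for k in row if k.startswith("e") and k[1:].isdigit()]
        feats.sort(key=lambda k: int(k[1:]))  # numeric order: e0, e1, ..., eN
        rows.append([float(row[k]) for k in feats])
    return rows

sample = "start,end,e0,e1,e2\n0,500,0.1,0.2,0.3\n500,900,0.4,0.5,0.6\n"
vectors = read_embeddings(sample)
print(len(vectors), len(vectors[0]))  # 2 segments, 3 features each
```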
Troubleshooting
- Unable to load any of {libcudnn_ops.so…}: the process can't find the cuDNN shared libs. Ensure nvidia-cudnn-cu12 is installed and that your LD_LIBRARY_PATH includes its lib directory.
- CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH: version mismatch between the runtime cuDNN and the PyTorch wheels. Reinstall the PyTorch wheels for your CUDA minor version (e.g., cu124) and keep cuDNN aligned.
- Diarization is slow / OOM
  - Keep batch_size=0 (pipeline default)
  - Skip source separation with no_stem=True
  - Provide num_speakers if you know it (e.g., 2)
- Embeddings CSV is empty
  - Make sure you pass the correct time_unit for your transcript (I use ms, which should be the default)
  - Enable extractor debug logs by setting CHOPSHOP_DEBUG=1 in your environment to see row counts and shapes
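To make the time_unit pitfall concrete, here's an illustrative sketch (not ChopShop's actual code) of how a transcript span plus silence padding maps to sample indices; seconds misread as milliseconds produce clips roughly 1000x too short, which is how you end up with empty output:

```python
def clip_bounds_samples(start, end, time_unit="ms", silence_ms=500,
                        sample_rate=48000):
    """Convert a transcript span to (start, end) sample indices with padding."""
    scale = {"ms": 1_000.0, "s": 1.0}[time_unit]   # timestamp units per second
    start_s = max(start / scale - silence_ms / 1_000.0, 0.0)
    end_s = end / scale + silence_ms / 1_000.0
    return int(start_s * sample_rate), int(end_s * sample_rate)

# A 1.5s-2.4s span given in milliseconds, padded by 0.5s on each side:
print(clip_bounds_samples(1500, 2400))  # (48000, 139200)
```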
Acknowledgments
Huge thanks to Mahmoud Ashraf for the outstanding whisper-diarization project, which I modify and wrap for diarization.
Roadmap
TBD. One of the big items is to unify everything, with sensible defaults, into a single method on the ChopShop class.