
Audio/video chop → analyze toolkit.

This project has been archived by its maintainers. No new releases are expected.

Project description

ChopShop — WIP audio/video "chop → analyze" toolkit

Status: very early work-in-progress. It's sharp around the edges and brittle in spots. I'll keep iterating, but expect breaking changes.

ChopShop's goal is to take input media (e.g., video), split it into its constituent streams (video/audio/text), and generate rich features per stream. The current focus is audio; other stream types will be added as I need them.

Current processes include:

  • Extract each audio stream from a video container to WAV
  • Diarize + transcribe multi-speaker audio (wrapping Mahmoud Ashraf's excellent whisper-diarization)
  • Export a timestamped transcript in multiple formats (CSV/SRT/TXT)
  • Build per-speaker WAVs from the transcript
  • Export Whisper encoder embeddings per transcript segment (CTranslate2 / faster-whisper)

Prerequisites

I'll admit I'm not a guru here, so take the items below as "well, it works on my machine..." advice. Key specs that may or may not be important to match:

  • OS: Linux (tested on Ubuntu 22.04)

  • Python: 3.10

  • GPU stack: CUDA 12.4 + cuDNN 9

  • FFmpeg binary: must be available on your PATH

    • Ubuntu: sudo apt-get install ffmpeg
    • macOS: brew install ffmpeg
    • Windows: install FFmpeg and add it to PATH
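A quick way to confirm FFmpeg is actually visible to Python is a stdlib-only PATH check (this helper name is just illustrative, not part of ChopShop):

```python
# Check that the ffmpeg binary is discoverable on PATH (stdlib only).
import shutil

def check_ffmpeg():
    """Return the path to ffmpeg if it's on PATH, else None."""
    return shutil.which("ffmpeg")

path = check_ffmpeg()
print("ffmpeg:", path if path else "NOT FOUND; install it before proceeding")
```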

I strongly recommend using a fresh virtual environment (e.g., python -m venv venv && source venv/bin/activate) for everything below.

To verify your CUDA runtime, run nvidia-smi; it should report your driver and CUDA version. I'm only testing against CUDA 12.4 right now, and I'm unlikely to target other CUDA versions in the foreseeable future. In theory this only affects which versions of the dependencies you install, but your mileage may vary.
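You can also sanity-check the stack from inside Python once PyTorch is installed; this sketch just prints the versions torch was built against and degrades gracefully if torch is missing:

```python
# Sanity-check the GPU stack from Python; prints a notice if torch isn't installed.
try:
    import torch
    print("torch:", torch.__version__)
    print("CUDA build:", torch.version.cuda)          # should read 12.4 for the cu124 wheels
    print("cuDNN:", torch.backends.cudnn.version())   # should be a cuDNN 9 build
    print("GPU visible:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed in this environment")
```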


Install

I'm working on getting all of this packaged up nicely as a pip-installable library. At the moment, you can try this:

pip install "chopshop[diarization,cuda]"

...followed by these three pip installs:

pip install git+https://github.com/MahmoudAshraf97/demucs.git
pip install git+https://github.com/oliverguhr/deepmultilingualpunctuation.git
pip install git+https://github.com/MahmoudAshraf97/ctc-forced-aligner.git

...and finally:

pip install --force-reinstall --no-cache-dir \
  torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
  --index-url https://download.pytorch.org/whl/cu124

Also, be sure ffmpeg is installed if it isn't already: sudo apt-get update && sudo apt-get install -y ffmpeg

Manual Installation of Dependencies

If you want to skip the potentially problematic wheel dependencies, you can install everything manually. Note that the install order very likely matters for avoiding dependency-pin conflicts.

# (Recommended) fresh virtualenv
python -m venv venv-chopshop
source venv-chopshop/bin/activate

# 1) Core ASR/diarization deps
pip install "faster-whisper>=1.1.0"
pip install "nemo-toolkit[asr]>=2.dev"
pip install git+https://github.com/MahmoudAshraf97/demucs.git
pip install git+https://github.com/oliverguhr/deepmultilingualpunctuation.git
pip install git+https://github.com/MahmoudAshraf97/ctc-forced-aligner.git

# 2) cuDNN user-space libs for CUDA 12
pip install -U nvidia-cudnn-cu12

# (Optional) some systems also install the OS package:
# sudo apt-get -y install cudnn9-cuda-12

# 3) PyTorch built for CUDA 12.4 (exact versions I tested)
pip install --force-reinstall --no-cache-dir \
  torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
  --index-url https://download.pytorch.org/whl/cu124

If you hit cuDNN errors, it usually means the CUDA/cuDNN runtime and the PyTorch wheels don't match. Keep them consistent (CUDA 12.4 with the cu124 wheel index, and cuDNN 9 from either the nvidia-cudnn-cu12 wheel or your OS package).
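When the loader can't find the pip-installed cuDNN, it usually just isn't on LD_LIBRARY_PATH. This sketch (assuming the nvidia-cudnn-cu12 wheel is installed) prints the export line to fix that:

```python
# Print the directory holding the pip-installed cuDNN shared libraries
# (assumes the nvidia-cudnn-cu12 wheel); add it to LD_LIBRARY_PATH if
# the loader can't find libcudnn at runtime.
import os

try:
    import nvidia.cudnn
    lib_dir = os.path.join(os.path.dirname(nvidia.cudnn.__file__), "lib")
    print(f'export LD_LIBRARY_PATH="{lib_dir}:$LD_LIBRARY_PATH"')
except ImportError:
    print("nvidia-cudnn-cu12 is not installed in this environment")
```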


Quick usage example

The filenames are intentionally generic. Adjust as needed.

from pathlib import Path
from chopshop.ChopShop import ChopShop

cs = ChopShop()

# 1) Split audio streams from a video/container
wav_list = cs.split_audio_streams(
    "sample_video.mp4",
    output_dir="audio/",
    sample_rate=48000,
    bit_depth=16,
    overwrite=True,
)
diar_input_audio = str(wav_list[0])  # pick a stream

# 2) Diarize + transcribe (wrapper around whisper-diarization)
tp = cs.diarize_with_thirdparty(
    input_audio=diar_input_audio,
    out_dir="transcripts/",
    repo_dir="path/to/whisper-diarization",  # if you have a local clone; otherwise leave default
    whisper_model="base",   # use whatever one you like; just be consistent
    language="en",
    device="cuda",
    batch_size=0,
    no_stem=False,
    suppress_numerals=False,
    parallel=False,
    use_custom=True,   # use my custom entry point that also writes a CSV
    keep_temp=False,   # if set to False, it will clean up after itself (good idea)
    num_speakers=2,    # optional hint; can be omitted to let diarization infer
)

# 3) Make per-speaker WAVs using the transcript CSV
cs.split_wav_by_speaker(
    source_wav=diar_input_audio,
    transcript_csv=tp.raw_files["csv"],
    out_dir="audio_split/",
    time_unit="ms",    # the transcript CSV uses milliseconds
    silence_ms=500,   # add 0.5s before/after each clip (1s between clips)
)

# 4) Export Whisper encoder embeddings per transcript segment
cs.export_embeddings(
    transcript_csv=tp.raw_files["csv"],
    source_wav=diar_input_audio,
    output_dir="whisper_embed/",
    model_name="base",
    device="cuda",
    compute_type="float16",  # good default on GPU
    time_unit="ms",
    run_in_subprocess=True,  # isolates the encoder to avoid cuDNN conflicts
)

Outputs (typical):

  • transcripts/… — CSV (timestamped), SRT, and TXT
  • audio_split/… — one WAV per detected speaker
  • whisper_embed/<source_stem>_embeddings.csv — one row per transcript span with e0..eN encoder features
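If you want to read the embeddings CSV back into vectors, a stdlib-only sketch looks like this. It assumes one row per transcript span with feature columns named e0..eN, as described above; any other column names are hypothetical:

```python
# Read an exported embeddings CSV back into a list of float vectors,
# one per transcript span, using only the e0..eN feature columns.
import csv

def load_embeddings(csv_path):
    """Return a list of embedding vectors, one per CSV row."""
    vectors = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            feats = sorted(
                (k for k in row if k.startswith("e") and k[1:].isdigit()),
                key=lambda k: int(k[1:]),
            )
            vectors.append([float(row[k]) for k in feats])
    return vectors
```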

Troubleshooting

  • Unable to load any of {libcudnn_ops.so…}: the process can't find the cuDNN shared libraries. Ensure nvidia-cudnn-cu12 is installed and that your LD_LIBRARY_PATH includes its lib directory.

  • CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH: version mismatch between the runtime cuDNN and the PyTorch wheels. Reinstall the PyTorch wheels for your CUDA minor version (e.g., cu124) and keep cuDNN aligned.

  • Diarization is slow / OOM

    • Keep batch_size=0 (pipeline default)
    • Skip source separation with no_stem=True
    • Provide num_speakers if you know it (e.g., 2)
  • Embeddings CSV is empty

    • Make sure you pass the correct time_unit for your transcript (I use ms which should be the default)
    • Enable extractor debug logs by setting CHOPSHOP_DEBUG=1 in your environment to see row counts and shapes
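For example, to enable the debug logs for a single run rather than your whole shell session (the script name here is a placeholder):

```shell
# Set CHOPSHOP_DEBUG only for this one invocation.
CHOPSHOP_DEBUG=1 python your_pipeline_script.py
```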

Acknowledgments

Huge thanks to Mahmoud Ashraf for the outstanding whisper-diarization project, which I modify and wrap for diarization.


Roadmap

TBD. A big one is unifying the whole pipeline, with sensible defaults, into a single method on the ChopShop class.

Download files

  • Source distribution: chopshop-0.0.2.tar.gz (393.2 kB)
  • Built distribution: chopshop-0.0.2-py3-none-any.whl (397.6 kB)