Composable lyric-audio alignment pipeline with staged execution, batch processing, and LRC/ASS export.

py-roller

py-roller is a local command-line pipeline for generating rolling lyric files from audio and plain-text lyrics.

It can split vocals, filter audio, transcribe local audio with faster-whisper or wav2vec2-style backends, parse lyrics, align lyric lines, and export LRC or ASS karaoke output.

  • Package name: py-roller
  • CLI command: py-roller
  • Python import package: pyroller

Install

Use a fresh virtual environment when possible. First install the lightweight base package from source:

pip install -e .

Then install the validated audio/transcriber runtime:

py-roller install
py-roller doctor

py-roller install first installs a pinned Torch/TorchAudio/TorchVision profile, then installs the bundled audio-core requirements with matching constraints and validates the environment. Finally, it runs py-roller doctor unless --skip-doctor is passed.

Install profiles:

  • auto (default): tries the best validated profile for this machine, then falls back to CPU if validation fails.
  • cpu: force the CPU-only profile.
  • cu124: force the CUDA 12.4 profile.

Useful install commands:

py-roller install --profile cpu
py-roller install --profile cu124
py-roller install --dry-run
py-roller install --no-reset-torch
py-roller doctor --output-format json
py-roller install --progress-format jsonl --output-format json

Machine-readable runtime checks and install progress

doctor can print a machine-readable report for GUI frontends and automated runtime checks:

py-roller doctor --output-format json

The JSON report includes ok, the Python executable and platform, one entry per runtime check, and a suggested_next_step when the environment needs repair. The default remains the terminal checklist:

py-roller doctor --output-format human
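
For integrations, the JSON report can be consumed with a few lines of Python. The following is a minimal sketch, not part of py-roller: only the ok and suggested_next_step fields are named in this README, while the function name and the sample payload are hypothetical.

```python
import json
from typing import Optional, Tuple


def summarize_doctor_report(report_text: str) -> Tuple[bool, Optional[str]]:
    """Return (ok, suggested_next_step) from a doctor JSON report."""
    report = json.loads(report_text)
    ok = bool(report.get("ok", False))
    # suggested_next_step is only expected when the environment needs repair.
    return ok, report.get("suggested_next_step")


# Hypothetical payload for illustration; a real report contains more fields.
sample_report = '{"ok": false, "suggested_next_step": "py-roller install"}'
print(summarize_doctor_report(sample_report))
```

A frontend would typically obtain report_text by capturing the standard output of py-roller doctor --output-format json.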

install now separates live progress from the final summary:

py-roller install --progress-format human --output-format human   # default
py-roller install --progress-format jsonl --output-format json    # GUI-friendly
py-roller install --progress-format both --output-format json     # debug both streams

--progress-format jsonl emits PYROLLER_EVENT lines for install lifecycle events, selected profiles, subprocess starts/completions, subprocess output, validation, doctor, completion, failures, and heartbeat messages when a pip subprocess is still running without output. --output-format json prints a final install report containing the requested profile, selected profile, step results, validation results, and doctor summary when doctor is run.

Quick start

Set --language explicitly whenever the song language is known. Use zh for Chinese, en for English, and mul only when you need the multilingual fallback.

Raw audio + lyrics -> LRC

py-roller run \
  --stages s,f,t,p,a,w \
  --audio ./song.mp3 \
  --lyrics ./song.txt \
  --language zh \
  --filter-chain noise_gate,dereverb \
  --output-roller ./song.lrc

Prepared vocal track + lyrics -> ASS karaoke

py-roller run \
  --stages t,p,a,w \
  --audio ./vocals.wav \
  --lyrics ./song.txt \
  --language zh \
  --writer-backend ass_karaoke \
  --output-roller ./song.ass

Batch process directories by filename stem

py-roller batch \
  --stages t,p,a,w \
  --audio ./audio_dir \
  --lyrics ./lyrics_dir \
  --language zh \
  --output-roller ./out_dir

Pipeline model

py-roller runs a contiguous chain of stages in this fixed order:

s -> f -> t -> p -> a -> w
splitter -> filter -> transcriber -> parser -> aligner -> writer

Valid examples:

  • s,f,t,p,a,w: full pipeline from raw audio and lyrics.
  • t,p,a,w: start from prepared vocal/filtered audio.
  • a,w: start from existing timed_units and parsed_lyrics artifacts.
  • w: rewrite from an existing alignment_result artifact.

Invalid examples:

  • s,t,w: skips the required intermediate stages f, p, and a.
  • s,p,a: skips the required intermediate stages f and t.
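
The contiguity rule can be expressed as a short check. This is a sketch of the rule as described above, not py-roller's own validator; the function name is hypothetical and the real CLI may apply further checks.

```python
STAGE_ORDER = ["s", "f", "t", "p", "a", "w"]


def is_valid_chain(stages: str) -> bool:
    """True when the comma-separated stages form one contiguous run of STAGE_ORDER."""
    chain = [s.strip() for s in stages.split(",") if s.strip()]
    if not chain or any(s not in STAGE_ORDER for s in chain):
        return False
    start = STAGE_ORDER.index(chain[0])
    # The chain must equal the slice of the fixed order that begins at its first stage.
    return chain == STAGE_ORDER[start:start + len(chain)]


for example in ("s,f,t,p,a,w", "t,p,a,w", "a,w", "w", "s,t,w", "s,p,a"):
    print(example, is_valid_chain(example))
```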

Legal chain-start inputs:

  • --audio: valid when the chain starts at s, f, or t.
  • --lyrics: required when the chain includes p.
  • --timed-units and --parsed-lyrics: valid when the chain starts at a.
  • --alignment-result: valid when the chain starts at w.
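
As an illustration, the rules above can be turned into a lookup from a stage chain to the input flags associated with it. This is a sketch under stated assumptions: the function name is hypothetical, the chain is assumed to be already valid, and chains that start at p are not described in this README, so only --lyrics is reported for them.

```python
def chain_start_inputs(stages: str) -> set[str]:
    """Return the input flags that the rules above associate with a stage chain."""
    chain = [s.strip() for s in stages.split(",")]
    start = chain[0]
    inputs: set[str] = set()
    if start in ("s", "f", "t"):
        inputs.add("--audio")
    if "p" in chain:
        inputs.add("--lyrics")
    if start == "a":
        inputs.update({"--timed-units", "--parsed-lyrics"})
    if start == "w":
        inputs.add("--alignment-result")
    return inputs


print(sorted(chain_start_inputs("t,p,a,w")))
```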

Final outputs are only the explicit --output-* paths:

  • --output-vocal-audio
  • --output-filtered-audio
  • --output-timed-units
  • --output-parsed-lyrics
  • --output-alignment-result
  • --output-roller

Files under the --intermediate directory are temporary working state. They are removed after a successful task unless --cleanup never is used.

Common workflows

Start from raw audio

py-roller run \
  --stages s,f,t,p,a,w \
  --audio ./song.mp3 \
  --lyrics ./song.txt \
  --language zh \
  --output-roller ./song.lrc

Start from already-separated vocals

py-roller run \
  --stages t,p,a,w \
  --audio ./vocals.wav \
  --lyrics ./song.txt \
  --language zh \
  --output-roller ./song.lrc

Start from aligner artifacts

py-roller run \
  --stages a,w \
  --timed-units ./song.timed_units.json \
  --parsed-lyrics ./song.parsed_lyrics.json \
  --output-roller ./song.lrc

For repeated or partially omitted lyrics, choose a repetition mode explicitly:

py-roller run \
  --stages a,w \
  --timed-units ./song.timed_units.json \
  --parsed-lyrics ./song.parsed_lyrics.json \
  --aligner-repetition few \
  --output-roller ./song.lrc

--aligner-repetition accepts:

  • none: default standard global_dp_v1 behavior; best when repeated lyric lines are written out in full.
  • few: uses global DP as a proposal, then repairs sparse repeated or omitted regions between trusted anchors.
  • full: uses per-line candidate generation plus beam search for highly repetitive or anchorless songs.

Rewrite only from an existing alignment result

py-roller run \
  --stages w \
  --alignment-result ./song.alignment.json \
  --writer-backend ass_karaoke \
  --output-roller ./song.ass

Backends and defaults

Backend selection is language-aware. The default language is mul for compatibility, but zh or en gives clearer transcriber/parser defaults when the song language is known.

Transcriber defaults

  • zh -> faster_whisper
  • en -> faster_whisper
  • mul -> faster_whisper

Additional transcriber backends:

  • zh also supports --transcriber-backend mms_phonetic for the Chinese phonetic CTC path.
  • mul also supports --transcriber-backend wav2vec2_phoneme for the multilingual phoneme CTC fallback.

Parser defaults

  • zh -> zh_router_pinyin
  • en -> en_arpabet
  • mul -> mul_ipa

Other defaults

  • aligner backend -> global_dp_v1
  • aligner repetition mode -> none
  • writer backend -> lrc_ms
  • writer spacing -> keep
  • cleanup policy -> on-success
  • transcriber model store -> ~/.cache/py-roller/models/transcriber

Transcriber models and Hugging Face downloads

Transcriber execution is local. py-roller does not send audio to a cloud transcription API.

Model resolution order:

  1. Resolve --transcriber-model-name, or use the backend default model name.
  2. Look for the model in the py-roller transcriber model store.
  3. If not found and offline mode is not enabled, materialize/download the model into the model store.
  4. Load the resolved local model path for inference.
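
The resolution order can be sketched as follows. This is an illustration, not py-roller's implementation: the function names, the default model name, and the directory layout inside the model store are all hypothetical, and the explicit-local-path case is left out for brevity.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional

DEFAULT_MODEL_NAME = "large-v2"  # hypothetical backend default, for illustration only


def resolve_model_path(
    model_name: Optional[str],
    store: Path,
    local_files_only: bool,
    download: Callable[[str, Path], None],
) -> Path:
    name = model_name or DEFAULT_MODEL_NAME           # step 1: explicit name or default
    target = store / name.replace("/", "--")          # hypothetical store layout
    if target.is_dir():                               # step 2: already in the store
        return target
    if local_files_only:                              # step 3: offline mode forbids downloads
        raise FileNotFoundError(f"{name!r} is not in {store} and offline mode is enabled")
    download(name, target)                            # step 3: materialize into the store
    return target                                     # step 4: local path used for inference


def fake_download(name: str, target: Path) -> None:
    """Stand-in for a real download; it only creates the target directory."""
    target.mkdir(parents=True)


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    first = resolve_model_path("large-v3", store, False, fake_download)
    # The second call succeeds offline because the model is now in the store.
    second = resolve_model_path("large-v3", store, True, fake_download)
    demo_same_path = first == second
print(demo_same_path)
```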

Useful model options:

  • --transcriber-model-path: local model store root.
  • --transcriber-model-name: model alias, Hugging Face repo id, or explicit local path.
  • --transcriber-local-files-only: refuse network access and use only local files/cache.

For faster_whisper, aliases such as large-v2, large-v3, and turbo resolve to the corresponding Systran/faster-whisper-* snapshots.

Example with a custom model store:

py-roller run \
  --stages t,p,a,w \
  --audio ./vocals.wav \
  --lyrics ./song.txt \
  --language zh \
  --transcriber-model-path ./models/transcriber \
  --output-roller ./song.lrc

Offline run after the model already exists locally:

py-roller run \
  --stages t,p,a,w \
  --audio ./vocals.wav \
  --lyrics ./song.txt \
  --language zh \
  --transcriber-model-path ./models/transcriber \
  --transcriber-local-files-only \
  --output-roller ./song.lrc

Restricted or unstable networks

Hugging Face model downloads can be affected by proxies, timeouts, and XET/CAS behavior. py-roller exposes the common controls directly:

  • --transcriber-hf-xet {auto,on,off}: use off when XET/CAS hangs or fails on your network.
  • --transcriber-hf-proxy URL: use one HTTP or SOCKS proxy for model downloads.
  • --transcriber-hf-etag-timeout SECONDS: metadata/etag timeout.
  • --transcriber-hf-download-timeout SECONDS: large file download timeout.
  • --transcriber-hf-max-workers INT: snapshot download parallelism; lower values such as 1 or 2 are often better for fragile proxies.
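
For scripts that call huggingface_hub directly, similar controls are available as environment variables. The sketch below builds such an environment. The variable names HF_HUB_DISABLE_XET, HF_HUB_ETAG_TIMEOUT, and HF_HUB_DOWNLOAD_TIMEOUT come from huggingface_hub, and HTTP_PROXY/HTTPS_PROXY are the conventional proxy variables. Whether py-roller applies its flags this way internally is an assumption, and the function name is hypothetical.

```python
from typing import Dict, Optional


def hf_download_env(
    xet: str = "auto",
    proxy: Optional[str] = None,
    etag_timeout: Optional[int] = None,
    download_timeout: Optional[int] = None,
) -> Dict[str, str]:
    """Build environment variables that mirror the --transcriber-hf-* flags."""
    env: Dict[str, str] = {}
    if xet == "off":
        env["HF_HUB_DISABLE_XET"] = "1"
    if proxy:
        # One proxy URL is used for both plain and TLS traffic.
        env["HTTP_PROXY"] = proxy
        env["HTTPS_PROXY"] = proxy
    if etag_timeout is not None:
        env["HF_HUB_ETAG_TIMEOUT"] = str(etag_timeout)
    if download_timeout is not None:
        env["HF_HUB_DOWNLOAD_TIMEOUT"] = str(download_timeout)
    return env


print(hf_download_env(xet="off", proxy="socks5://127.0.0.1:7890", etag_timeout=30, download_timeout=120))
```

Download parallelism is not covered by an environment variable in this sketch; huggingface_hub's snapshot_download takes it as the max_workers argument.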

Avoid XET/CAS when it is unreliable:

py-roller run \
  --stages t,p,a,w \
  --audio ./vocals.wav \
  --lyrics ./song.txt \
  --language zh \
  --transcriber-hf-xet off \
  --output-roller ./song.lrc

Use a local SOCKS proxy and conservative download settings:

py-roller run \
  --stages t,p,a,w \
  --audio ./vocals.wav \
  --lyrics ./song.txt \
  --language zh \
  --transcriber-hf-proxy socks5://127.0.0.1:7890 \
  --transcriber-hf-download-timeout 120 \
  --transcriber-hf-etag-timeout 30 \
  --transcriber-hf-max-workers 2 \
  --output-roller ./song.lrc

audio-core installs SOCKS support through httpx[socks]. If the environment was installed manually and SOCKS support is missing, run:

py-roller install

or install the missing dependency directly:

pip install "httpx[socks]"

Writer behavior

LRC

The default writer is lrc_ms, which writes LRC lines with millisecond precision.

Supported writer backends:

  • lrc_ms: millisecond precision.
  • lrc_cs: centisecond precision.
  • lrc_compressed: millisecond precision, with consecutive identical timestamps compressed.
  • ass_karaoke: ASS dialogue output with karaoke timing tags.
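
To make the precision difference concrete, the sketch below formats one timestamp in both styles. It is an illustration only: the function name is hypothetical, and truncating (rather than rounding) to centiseconds is an assumption, not a statement about how lrc_cs behaves.

```python
def lrc_timestamp(ms: int, centiseconds: bool = False) -> str:
    """Format a time in milliseconds as an LRC tag such as [01:05.432]."""
    minutes, remainder = divmod(ms, 60_000)
    seconds, millis = divmod(remainder, 1000)
    if centiseconds:
        # Assumption: drop the last digit instead of rounding.
        return f"[{minutes:02d}:{seconds:02d}.{millis // 10:02d}]"
    return f"[{minutes:02d}:{seconds:02d}.{millis:03d}]"


print(lrc_timestamp(65432) + "lyric line")                      # millisecond style
print(lrc_timestamp(65432, centiseconds=True) + "lyric line")   # centisecond style
```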

ASS karaoke

ass_karaoke writes ASS dialogue lines with karaoke timing tags.

Current behavior:

  • structural/spacing line output follows --writer-spacing (keep by default).
  • display end time prefers matched unit timing instead of blindly extending to the next line.
  • unmatched lines receive a short visible-duration fallback.

Example:

py-roller run \
  --stages w \
  --alignment-result ./song.alignment.json \
  --writer-backend ass_karaoke \
  --output-roller ./song.ass
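
For readers unfamiliar with ASS karaoke tags, the sketch below shows how per-unit timings can map to \k tags, whose durations are expressed in centiseconds. It is not py-roller's writer: the function name and the (text, start_ms, end_ms) unit shape are hypothetical, and filling gaps between units with empty \k tags is one possible design.

```python
from typing import Iterable, Tuple


def karaoke_text(units: Iterable[Tuple[str, int, int]], line_start_ms: int) -> str:
    """Build the text part of an ASS dialogue line with \\k karaoke tags."""
    parts = []
    cursor = line_start_ms
    for text, start_ms, end_ms in units:
        gap_cs = (start_ms - cursor) // 10
        if gap_cs > 0:
            # An empty tag keeps later syllables in sync after a pause.
            parts.append(f"{{\\k{gap_cs}}}")
        duration_cs = max((end_ms - start_ms) // 10, 0)
        parts.append(f"{{\\k{duration_cs}}}{text}")
        cursor = end_ms
    return "".join(parts)


print(karaoke_text([("la", 1000, 1500), ("la", 1700, 2000)], line_start_ms=1000))
```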

Batch mode

batch uses the same stage semantics as run, but applies them to many tasks.

Directory pairing

Directory mode currently supports:

--pair-by stem

Default candidate globs:

  • --audio-glob "*.mp3"
  • --lyrics-glob "*.txt"

Matching is non-recursive.
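
Stem pairing with non-recursive globs can be sketched with pathlib. This is an illustration rather than py-roller's code: the function name is hypothetical, and silently dropping audio files without a lyrics match is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def pair_by_stem(
    audio_dir: Path,
    lyrics_dir: Path,
    audio_glob: str = "*.mp3",
    lyrics_glob: str = "*.txt",
) -> List[Tuple[Path, Path]]:
    """Pair files whose names match up to the extension; subdirectories are not searched."""
    lyrics = {p.stem: p for p in Path(lyrics_dir).glob(lyrics_glob) if p.is_file()}
    pairs = []
    for audio in sorted(Path(audio_dir).glob(audio_glob)):
        match = lyrics.get(audio.stem)
        if audio.is_file() and match is not None:
            pairs.append((audio, match))
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "audio").mkdir()
    (root / "lyrics").mkdir()
    for name in ("a.mp3", "b.mp3"):
        (root / "audio" / name).touch()
    (root / "lyrics" / "a.txt").touch()
    # b.mp3 has no lyrics file with the same stem, so only a.mp3 is paired.
    demo_pairs = [(a.name, l.name) for a, l in pair_by_stem(root / "audio", root / "lyrics")]
print(demo_pairs)
```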

Example:

py-roller batch \
  --stages t,p,a,w \
  --audio ./audio_dir \
  --lyrics ./lyrics_dir \
  --language zh \
  --output-roller ./out_dir

Batch controls

  • --jobs N: maximum number of parallel workers.
  • --continue-on-error: keep processing remaining tasks after failures.
  • --skip-existing: skip tasks whose declared final outputs already exist.
  • --manifest jobs.yaml: load explicit per-task paths from YAML instead of pairing by stem.

Parallelism guidance:

  • CPU-only: start with --jobs 1 or --jobs 2.
  • Single GPU: usually start with --jobs 1.

YAML manifest format

Manifest mode is useful when filenames do not match cleanly by stem.

The manifest defines per-task input and output paths only. It does not override stage selection, language, backend choice, filter settings, jobs, or other batch-level options.

Supported top-level forms:

tasks:
  - id: song01
    audio: ./audio/song01_master.mp3
    lyrics: ./lyrics/song01_final.txt
    output_roller: ./out/song01.lrc

or:

- id: song01
  audio: ./audio/song01_master.mp3
  lyrics: ./lyrics/song01_final.txt
  output_roller: ./out/song01.lrc

Allowed manifest input keys:

  • audio
  • lyrics
  • timed_units
  • parsed_lyrics
  • alignment_result

Allowed manifest output keys:

  • output_vocal_audio
  • output_filtered_audio
  • output_timed_units
  • output_parsed_lyrics
  • output_alignment_result
  • output_roller

Optional helper key:

  • id

Validation rules:

  • each task must be a mapping.
  • unknown keys are rejected.
  • inputs must match the selected chain start.
  • outputs must be valid final outputs for the selected chain.
  • task ids/stems must be unique.
  • final output paths must not conflict across tasks.
  • relative paths are resolved relative to the manifest file location.
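
Several of these rules can be sketched as a validator over the already-parsed YAML data. This is a partial illustration, not py-roller's validator: the function name is hypothetical, using the task index when id is missing is an assumption, and the chain-dependent checks on inputs and outputs are omitted.

```python
from pathlib import Path
from typing import Any, Dict, List, Union

INPUT_KEYS = {"audio", "lyrics", "timed_units", "parsed_lyrics", "alignment_result"}
OUTPUT_KEYS = {
    "output_vocal_audio", "output_filtered_audio", "output_timed_units",
    "output_parsed_lyrics", "output_alignment_result", "output_roller",
}
ALLOWED_KEYS = INPUT_KEYS | OUTPUT_KEYS | {"id"}


def load_tasks(data: Union[dict, list], manifest_dir: Union[str, Path]) -> List[Dict[str, Any]]:
    """Validate parsed manifest data and resolve paths against the manifest directory."""
    tasks = data["tasks"] if isinstance(data, dict) else data  # both top-level forms
    seen_ids, seen_outputs, resolved_tasks = set(), set(), []
    for index, task in enumerate(tasks):
        if not isinstance(task, dict):
            raise ValueError(f"task {index} must be a mapping")
        unknown = set(task) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"task {index} has unknown keys: {sorted(unknown)}")
        task_id = str(task.get("id", index))  # assumption: fall back to the index
        if task_id in seen_ids:
            raise ValueError(f"duplicate task id: {task_id}")
        seen_ids.add(task_id)
        resolved: Dict[str, Any] = {"id": task_id}
        for key, value in task.items():
            if key == "id":
                continue
            path = Path(value)
            if not path.is_absolute():
                path = Path(manifest_dir) / path
            if key in OUTPUT_KEYS:
                if path in seen_outputs:
                    raise ValueError(f"conflicting output path: {path}")
                seen_outputs.add(path)
            resolved[key] = path
        resolved_tasks.append(resolved)
    return resolved_tasks
```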

YAML config for CLI defaults

Use --config to load YAML defaults.

Priority order:

built-in defaults < config YAML < explicit CLI arguments

Section model:

  • shared: defaults applied to both run and batch.
  • run: currently no extra keys beyond shared.
  • batch: defaults for batch-only options such as jobs and skip_existing.
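
The priority order and section model amount to a layered dictionary merge. The sketch below is an illustration under stated assumptions: the function name is hypothetical, and treating a CLI value of None as "not passed" is an assumption about how the CLI layer is represented.

```python
from typing import Any, Dict


def effective_settings(
    builtin: Dict[str, Any],
    config_doc: Dict[str, Any],
    command: str,
    cli_args: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge built-in defaults, config YAML sections, and explicit CLI arguments."""
    merged = dict(builtin)
    merged.update(config_doc.get("shared") or {})   # applies to both run and batch
    merged.update(config_doc.get(command) or {})    # command-specific section
    # Only arguments that were actually passed override the lower layers.
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged


config_doc = {"shared": {"language": "zh"}, "batch": {"jobs": 2}}
print(effective_settings({"language": "mul", "jobs": 1}, config_doc, "batch", {"language": "en", "jobs": None}))
```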

Example:

shared:
  language: zh
  writer_spacing: keep
  writer_backend: lrc_ms
  intermediate: ./tmp/py-roller-artifacts
  cleanup: on-success
  transcriber_device: cpu
  transcriber_model_path: ~/.cache/py-roller/models/transcriber
  transcriber_local_files_only: false
  transcriber_hf_xet: auto
  transcriber_hf_proxy: null
  transcriber_hf_download_timeout: 120
  transcriber_hf_etag_timeout: 30
  transcriber_hf_max_workers: 2
  splitter_backend: demucs
  splitter_demucs_model: htdemucs
  splitter_demucs_device: cpu
  splitter_demucs_jobs: 0
  splitter_demucs_overlap: 0.25
  splitter_demucs_segment: 8
  filter_chain:
    - noise_gate
    - dereverb

batch:
  jobs: 2
  audio_glob: "*.mp3"
  lyrics_glob: "*.txt"
  timed_units_glob: "*.json"
  parsed_lyrics_glob: "*.json"
  alignment_result_glob: "*.json"
  continue_on_error: true

filter_chain can be written either as a comma-separated string or as a YAML list. Quote on or off for transcriber_hf_xet if your YAML parser treats them as booleans; py-roller also accepts boolean true/false there as on/off for convenience.
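
The two normalizations described above can be sketched like this. The function names are hypothetical and the sketch only mirrors the behavior stated in this section; it is not py-roller's loader.

```python
from typing import List, Sequence, Union


def normalize_filter_chain(value: Union[str, Sequence[str], None]) -> List[str]:
    """Accept a comma-separated string or a list and return a list of filter names."""
    if value is None:
        return []
    if isinstance(value, str):
        return [item.strip() for item in value.split(",") if item.strip()]
    return [str(item) for item in value]


def normalize_hf_xet(value: Union[str, bool]) -> str:
    """Map YAML booleans to on/off and pass through auto, on, and off."""
    if value is True:
        return "on"
    if value is False:
        return "off"
    if value in ("auto", "on", "off"):
        return value
    raise ValueError(f"unsupported transcriber_hf_xet value: {value!r}")


print(normalize_filter_chain("noise_gate, dereverb"), normalize_hf_xet(False))
```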

Progress, logs, and cleanup

The project exposes progress in two layers:

  • human-readable logs for normal terminal use;
  • optional machine-readable JSONL events for GUI frontends such as lrc-roller.

Use --progress-format to choose the progress output mode:

py-roller run ... --progress-format human   # default, terminal-friendly logs
py-roller run ... --progress-format jsonl   # structured PYROLLER_EVENT lines
py-roller run ... --progress-format both    # logs plus structured events

jsonl emits one parseable event per line with the PYROLLER_EVENT prefix, for example:

PYROLLER_EVENT {"type":"download_progress","stage":"model_download","parent_stage":"preflight","repo_id":"Systran/faster-whisper-large-v2","file":"model.bin","bytes_downloaded":1534203904,"bytes_total":3086912962,"progress":0.497}

This is intended for frontends that need reliable stage and download progress instead of parsing mixed logs from tqdm, Demucs, and huggingface_hub. Human-readable mode remains the default so existing CLI workflows are unchanged.

Structured events use progress as the canonical 0.0 to 1.0 field. A percent compatibility alias is still emitted for early GUI integrations. Standard stages are preflight, model_download, splitter, filter, transcriber, parser, aligner, and writer; model download events also include parent_stage: preflight.
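
A frontend can pick events out of mixed output with a small parser. The sketch below is an illustration, not an official client: the function names are hypothetical, and reading the percent alias as a 0 to 100 value is an assumption.

```python
import json
from typing import Any, Dict, Optional

EVENT_PREFIX = "PYROLLER_EVENT "


def parse_event(line: str) -> Optional[Dict[str, Any]]:
    """Return the decoded event for a PYROLLER_EVENT line, or None for other output."""
    if not line.startswith(EVENT_PREFIX):
        return None
    return json.loads(line[len(EVENT_PREFIX):])


def event_progress(event: Dict[str, Any]) -> Optional[float]:
    """Return progress as a 0.0 to 1.0 fraction when the event carries one."""
    if "progress" in event:
        return float(event["progress"])
    if "percent" in event:
        return float(event["percent"]) / 100.0  # assumption: percent is 0 to 100
    return None


sample_line = (
    'PYROLLER_EVENT {"type":"download_progress","stage":"model_download",'
    '"parent_stage":"preflight","file":"model.bin","progress":0.497}'
)
event = parse_event(sample_line)
print(event["stage"], event_progress(event))
```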

Current progress coverage:

  • run lifecycle events: run_started, run_completed, and run_failed;
  • model preflight and Hugging Face model download events, including cache path, proxy/XET settings, file count, largest file name, bytes downloaded, total bytes when known, and estimated speed;
  • heartbeat events during long model downloads and faster-whisper transcription periods;
  • splitter/Demucs seconds-based progress as structured splitter events;
  • filter phase progress;
  • transcriber phase progress, including faster-whisper segment count, last processed audio time, duration hints, and text previews when available;
  • parser, aligner, and writer stage events;
  • artifact write events and failure events.

In single-task run, human progress is shown as terminal logs/progress bars when supported. In batch, per-task progress is logged to avoid multiple workers fighting for one terminal. GUI frontends should prefer --progress-format jsonl.

Intermediate files live under:

<intermediate root>/<task-id>/splitter
<intermediate root>/<task-id>/filter
<intermediate root>/<task-id>/logs

Default intermediate root:

<system temp>/py-roller-artifacts

Cleanup policy:

  • --cleanup on-success: remove per-task intermediate directories after successful tasks.
  • --cleanup never: keep intermediate audio and logs for inspection.

Troubleshooting

Check the environment

py-roller doctor

For integrations, use JSON output:

py-roller doctor --output-format json

doctor checks Python, Torch, Torchaudio, faster-whisper, CTranslate2, transformers, SOCKS proxy support, Demucs, and librosa.

If it reports a broken audio/transcriber environment, start with:

py-roller install

Hugging Face model download progress

For restricted networks, prefer disabling HF XET/CAS and using a SOCKS proxy with remote DNS resolution (the socks5h:// scheme):

py-roller run ... \
  --transcriber-hf-xet off \
  --transcriber-hf-proxy socks5h://127.0.0.1:9909

If a model has already been materialized into the py-roller model store, use local-only mode to avoid touching the network on later runs:

py-roller run ... \
  --transcriber-model-path ~/.cache/py-roller/models/transcriber \
  --transcriber-local-files-only

The Hugging Face file-count progress shown by huggingface_hub can appear stuck on large model files. Use --progress-format jsonl or both to get byte-level download_progress events with cache growth, speed, and total size when available.

Interruption and child process cleanup

When --continue-on-error is not set, batch fails fast: it stops launching further work after the first task failure. Worker cleanup includes a Windows-specific branch for cleaning up the process tree.

For older runs or already orphaned processes, Linux/macOS cleanup examples are still useful:

pkill -TERM -f 'python .*pyroller'
pkill -TERM -f 'demucs.separate|demucs'

If anything still survives:

pkill -KILL -f 'python .*pyroller'
pkill -KILL -f 'demucs.separate|demucs'

Inspect candidates first with:

ps -ef | grep -E 'pyroller|demucs'

Dependency policy

py-roller install prefers the newest validated dependency line for this release:

  • Torch/TorchAudio/TorchVision are installed from the official 2.6.0 family for every built-in profile.
  • SOCKS proxy support is installed by default through httpx[socks], so Hugging Face downloads do not fail merely because socksio is missing.

If you upgrade or override audio/transcriber packages manually, run py-roller doctor before using transcription-heavy pipelines.

Download files

Source Distribution

py_roller-0.5.6.tar.gz (104.5 kB)

Uploaded Source

Built Distribution

py_roller-0.5.6-py3-none-any.whl (131.9 kB)

Uploaded Python 3

File details

Details for the file py_roller-0.5.6.tar.gz.

File metadata

  • Download URL: py_roller-0.5.6.tar.gz
  • Upload date:
  • Size: 104.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for py_roller-0.5.6.tar.gz
Algorithm Hash digest
SHA256 d57230f029df506d33267366c3cace508fbc65aae7fdf67693ff85b3893df0eb
MD5 d0322e01069a807fc42e90079dc7a154
BLAKE2b-256 f51f88e3623fdd3f59d8c5a05c867c852f24276060f3e46c72a3404d8dea8981

Provenance

The following attestation bundles were made for py_roller-0.5.6.tar.gz:

Publisher: python-publish.yml on Harmonese/py-roller

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file py_roller-0.5.6-py3-none-any.whl.

File metadata

  • Download URL: py_roller-0.5.6-py3-none-any.whl
  • Upload date:
  • Size: 131.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for py_roller-0.5.6-py3-none-any.whl
Algorithm Hash digest
SHA256 b642de9f4898fbbc793fa06e9e51659333614b351ec4dc35413ec1cb451833b5
MD5 1c81d5159146445604cb3bca39ea8ce4
BLAKE2b-256 25467edb731d43f5c7b5e7963b8239c475c3065078b266ec3361fb0d8c567b1b

Provenance

The following attestation bundles were made for py_roller-0.5.6-py3-none-any.whl:

Publisher: python-publish.yml on Harmonese/py-roller

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
