Composable lyric-audio alignment pipeline with staged execution, batch processing, and LRC/ASS export.
py-roller
py-roller is a local command-line pipeline for generating rolling lyric files from audio and plain-text lyrics.
It can split vocals, filter audio, transcribe local audio with faster-whisper or wav2vec2-style backends, parse lyrics, align lyric lines, and export LRC or ASS karaoke output.
- Package name: py-roller
- CLI command: py-roller
- Python import package: pyroller
Install
Use a fresh virtual environment when possible. First install the lightweight base package from source:
pip install -e .
Then install the validated audio/transcriber runtime:
py-roller install
py-roller doctor
py-roller install installs a pinned Torch/Torchaudio/TorchVision profile first, then installs the bundled audio-core requirements with matching constraints, validates the environment, and runs py-roller doctor unless --skip-doctor is passed.
Install profiles:
- auto (default): try the best validated profile for this machine, then fall back to CPU if validation fails.
- cpu: force the CPU-only profile.
- cu124: force the CUDA 12.4 profile.
Useful install commands:
py-roller install --profile cpu
py-roller install --profile cu124
py-roller install --dry-run
py-roller install --no-reset-torch
py-roller doctor --output-format json
py-roller install --progress-format jsonl --output-format json
Machine-readable runtime checks and install progress
doctor can print a machine-readable report for GUI frontends and automated runtime checks:
py-roller doctor --output-format json
The JSON report includes ok, the Python executable and platform, one entry per runtime check, and a suggested_next_step when the environment needs repair. The default remains the terminal checklist:
py-roller doctor --output-format human
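A frontend can consume the JSON report with a few lines of Python. The sketch below is illustrative: it relies only on the top-level `ok` and `suggested_next_step` keys named above, assumes stdout contains just the JSON document, and does not guess the name of the per-check list. The helper names `summarize_doctor_report` and `run_doctor` are hypothetical.

```python
import json
import subprocess
from typing import Optional, Tuple


def summarize_doctor_report(report_text: str) -> Tuple[bool, Optional[str]]:
    """Parse `py-roller doctor --output-format json` output (assumed layout)."""
    report = json.loads(report_text)
    ok = bool(report.get("ok", False))
    # suggested_next_step is only expected when the environment needs repair
    next_step = None if ok else report.get("suggested_next_step")
    return ok, next_step


def run_doctor() -> Tuple[bool, Optional[str]]:
    """Run the CLI and summarize its JSON report (requires py-roller on PATH)."""
    result = subprocess.run(
        ["py-roller", "doctor", "--output-format", "json"],
        capture_output=True,
        text=True,
        check=False,  # assumption: a failed check may produce a non-zero exit code
    )
    return summarize_doctor_report(result.stdout)
```

A GUI could call `run_doctor()` at startup and, when `ok` is false, show the suggested next step (for example `py-roller install`) to the user.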
install now separates live progress from the final summary:
py-roller install --progress-format human --output-format human # default
py-roller install --progress-format jsonl --output-format json # GUI-friendly
py-roller install --progress-format both --output-format json # debug both streams
--progress-format jsonl emits PYROLLER_EVENT lines for install lifecycle events, selected profiles, subprocess starts/completions, subprocess output, validation, doctor, completion, failures, and heartbeat messages when a pip subprocess is still running without output. --output-format json prints a final install report containing the requested profile, selected profile, step results, validation results, and doctor summary when doctor is run.
Language / i18n
py-roller automatically detects the system locale and displays Chinese output when the environment is zh_CN.UTF-8 or similar. English is used as the fallback.
Override the language explicitly:
PYROLLER_LANG=zh py-roller --help # Chinese
PYROLLER_LANG=en py-roller --help # English
All user-facing strings are translated: CLI help, pipeline summaries, doctor reports, install progress, error messages, and argparse built-in strings.
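The language rule can be approximated as follows. This is a sketch, not py-roller's code: it assumes `PYROLLER_LANG` wins whenever it is set to `zh` or `en`, and that any locale string beginning with `zh` selects Chinese. The function name `resolve_ui_language` is hypothetical.

```python
import locale
import os
from typing import Mapping, Optional


def resolve_ui_language(env: Optional[Mapping[str, str]] = None,
                        system_locale: Optional[str] = None) -> str:
    """Return "zh" or "en" following the documented rules (sketch)."""
    env = os.environ if env is None else env
    override = env.get("PYROLLER_LANG", "").strip().lower()
    if override in ("zh", "en"):
        return override  # explicit override always wins
    if system_locale is None:
        # Check LANG-style variables first, e.g. "zh_CN.UTF-8", then the process locale
        system_locale = (env.get("LC_ALL") or env.get("LC_MESSAGES")
                         or env.get("LANG") or locale.getlocale()[0] or "")
    # English is the fallback for every non-Chinese locale
    return "zh" if system_locale.lower().startswith("zh") else "en"
```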
Quick start
Set --language explicitly whenever the song language is known. Use zh for Chinese, en for English, and mul only when you need the multilingual fallback.
Raw audio + lyrics -> LRC
py-roller run \
--stages s,f,t,p,a,w \
--audio ./song.mp3 \
--lyrics ./song.txt \
--language zh \
--filter-chain noise_gate,dereverb \
--output-roller ./song.lrc
Prepared vocal track + lyrics -> ASS karaoke
py-roller run \
--stages t,p,a,w \
--audio ./vocals.wav \
--lyrics ./song.txt \
--language zh \
--writer-backend ass_karaoke \
--output-roller ./song.ass
Batch process directories by filename stem
py-roller batch \
--stages t,p,a,w \
--audio ./audio_dir \
--lyrics ./lyrics_dir \
--language zh \
--output-roller ./out_dir
Pipeline model
py-roller runs a contiguous chain of stages in this fixed order:
s -> f -> t -> p -> a -> w
splitter -> filter -> transcriber -> parser -> aligner -> writer
Valid examples:
- s,f,t,p,a,w: full pipeline from raw audio and lyrics.
- t,p,a,w: start from prepared vocal or filtered audio.
- a,w: start from existing timed_units and parsed_lyrics artifacts.
- w: rewrite from an existing alignment_result artifact.
Invalid examples:
- s,t,w: skips required intermediate stages.
- s,p,a: skips required intermediate stages.
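The contiguity rule is easy to check mechanically. The following sketch, with a hypothetical `parse_stages` helper, accepts a `--stages` value only if it is an unbroken run of the fixed order s, f, t, p, a, w; it illustrates the rule and is not py-roller's own validator.

```python
STAGE_ORDER = ("s", "f", "t", "p", "a", "w")  # splitter -> ... -> writer


def parse_stages(spec: str) -> list[str]:
    """Validate a --stages value such as "t,p,a,w" (illustrative sketch)."""
    stages = [part.strip() for part in spec.split(",") if part.strip()]
    if not stages:
        raise ValueError("no stages given")
    unknown = [s for s in stages if s not in STAGE_ORDER]
    if unknown:
        raise ValueError(f"unknown stage(s): {unknown}")
    # The chain must equal the slice of the fixed order that starts at its first stage
    start = STAGE_ORDER.index(stages[0])
    expected = list(STAGE_ORDER[start:start + len(stages)])
    if stages != expected:
        raise ValueError(f"stages must be contiguous in the order s,f,t,p,a,w; got {spec!r}")
    return stages
```

Comparing against a slice of the fixed order rejects gaps (`s,t,w`), reordering (`w,a`), and duplicates (`t,t`) with a single check.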
Legal chain-start inputs:
- --audio: valid when the chain starts at s, f, or t.
- --lyrics: required when the chain includes p.
- --timed-units and --parsed-lyrics: valid when the chain starts at a.
- --alignment-result: valid when the chain starts at w.
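The input rules above map directly onto the first stage of the chain. In this sketch the helper name `required_inputs` is hypothetical, and a chain starting at p raises an error because the rules listed here do not cover that case.

```python
def required_inputs(stages: list[str]) -> set[str]:
    """Return the input flags a contiguous stage chain needs, per the rules above."""
    first = stages[0]
    if first in ("s", "f", "t"):
        needed = {"--audio"}
    elif first == "a":
        needed = {"--timed-units", "--parsed-lyrics"}
    elif first == "w":
        needed = {"--alignment-result"}
    else:
        # A chain starting at p is not covered by the documented rules
        raise ValueError(f"no documented start inputs for a chain starting at {first!r}")
    if "p" in stages:
        needed.add("--lyrics")  # the parser always needs the plain-text lyrics
    return needed
```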
Final outputs are only the explicit --output-* paths:
- --output-vocal-audio
- --output-filtered-audio
- --output-timed-units
- --output-parsed-lyrics
- --output-alignment-result
- --output-roller
Intermediate files under --intermediate are temporary working state unless --cleanup never is used.
Common workflows
Start from raw audio
py-roller run \
--stages s,f,t,p,a,w \
--audio ./song.mp3 \
--lyrics ./song.txt \
--language zh \
--output-roller ./song.lrc
Start from already-separated vocals
py-roller run \
--stages t,p,a,w \
--audio ./vocals.wav \
--lyrics ./song.txt \
--language zh \
--output-roller ./song.lrc
Start from aligner artifacts
py-roller run \
--stages a,w \
--timed-units ./song.timed_units.json \
--parsed-lyrics ./song.parsed_lyrics.json \
--output-roller ./song.lrc
For repeated or partially omitted lyrics, choose a repetition mode explicitly:
py-roller run \
--stages a,w \
--timed-units ./song.timed_units.json \
--parsed-lyrics ./song.parsed_lyrics.json \
--aligner-repetition few \
--output-roller ./song.lrc
--aligner-repetition accepts:
- none: default; standard global_dp_v1 behavior. Best when repeated lyric lines are written out in full.
- few: uses global DP as a proposal, then repairs sparse repeated or omitted regions between trusted anchors.
- full: uses per-line candidate generation plus beam search for highly repetitive or anchorless songs.
Rewrite only from an existing alignment result
py-roller run \
--stages w \
--alignment-result ./song.alignment.json \
--writer-backend ass_karaoke \
--output-roller ./song.ass
Backends and defaults
Backend selection is language-aware. The default language is mul for compatibility, but zh or en gives clearer transcriber/parser defaults when the song language is known.
Transcriber defaults
- zh -> faster_whisper
- en -> faster_whisper
- mul -> faster_whisper
Additional transcriber backends:
- zh also supports --transcriber-backend mms_phonetic for the Chinese phonetic CTC path.
- mul also supports --transcriber-backend wav2vec2_phoneme for the multilingual phoneme CTC fallback.
Parser defaults
- zh -> zh_router_pinyin
- en -> en_arpabet
- mul -> mul_ipa
Other defaults
- aligner backend -> global_dp_v1
- aligner repetition mode -> none
- writer backend -> lrc_ms
- writer spacing -> keep
- cleanup policy -> on-success
- transcriber model store -> ~/.cache/py-roller/models/transcriber
- transcriber device -> auto-detected (CUDA GPU if available, otherwise CPU)
- transcriber VAD filter -> enabled (skips silence to speed up transcription)
Transcriber models and Hugging Face downloads
Transcriber execution is local. py-roller does not send audio to a cloud transcription API.
Model resolution order:
- Resolve --transcriber-model-name, or use the backend default model name.
- Look for the model in the py-roller transcriber model store.
- If not found and offline mode is not enabled, materialize/download the model into the model store.
- Load the resolved local model path for inference.
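The resolution order can be sketched as a small function. This is an illustration under stated assumptions, not py-roller's implementation: alias expansion (for example `large-v3` to a repo id) is assumed to have happened already, the store layout of one folder per repo id with `/` replaced by `--` is hypothetical, and `resolve_model_dir` is a hypothetical name. `huggingface_hub.snapshot_download` is the real Hub download function; a fake downloader can be injected for testing.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_model_dir(name: str, store: Path, offline: bool,
                      download: Optional[Callable[[str, Path], None]] = None) -> Path:
    """Sketch of the documented resolution order (store layout is hypothetical)."""
    explicit = Path(name).expanduser()
    if explicit.is_dir():
        return explicit  # an explicit local path is used as-is
    # Hypothetical layout: one folder per repo id, "/" replaced by "--"
    target = store.expanduser() / name.replace("/", "--")
    if target.is_dir() and any(target.iterdir()):
        return target  # already materialized in the model store
    if offline:
        raise FileNotFoundError(f"{name} is not in {store} and offline mode is enabled")
    if download is None:
        from huggingface_hub import snapshot_download

        def download(repo_id: str, dest: Path) -> None:
            snapshot_download(repo_id=repo_id, local_dir=str(dest))
    target.mkdir(parents=True, exist_ok=True)
    download(name, target)  # materialize into the store, then load from disk
    return target
```

Once a model has been downloaded this way, a second call with `offline=True` returns the store path without touching the network, which mirrors what --transcriber-local-files-only is for.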
Useful model options:
- --transcriber-model-path: local model store root.
- --transcriber-model-name: model alias, Hugging Face repo id, or explicit local path.
- --transcriber-local-files-only: refuse network access and use only local files/cache.
For faster_whisper, aliases such as large-v2, large-v3, and turbo resolve to the corresponding Systran/faster-whisper-* snapshots.
Example with a custom model store:
py-roller run \
--stages t,p,a,w \
--audio ./vocals.wav \
--lyrics ./song.txt \
--language zh \
--transcriber-model-path ./models/transcriber \
--output-roller ./song.lrc
Offline run after the model already exists locally:
py-roller run \
--stages t,p,a,w \
--audio ./vocals.wav \
--lyrics ./song.txt \
--language zh \
--transcriber-model-path ./models/transcriber \
--transcriber-local-files-only \
--output-roller ./song.lrc
Restricted or unstable networks
Hugging Face model downloads can be affected by proxies, timeouts, and XET/CAS behavior. py-roller exposes the common controls directly:
- --transcriber-hf-xet {auto,on,off}: use off when XET/CAS hangs or fails on your network.
- --transcriber-hf-proxy URL: use one HTTP or SOCKS proxy for model downloads.
- --transcriber-hf-etag-timeout SECONDS: metadata/etag timeout.
- --transcriber-hf-download-timeout SECONDS: large file download timeout.
- --transcriber-hf-max-workers INT: snapshot download parallelism; lower values such as 1 or 2 are often better for fragile proxies.
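These options correspond to settings that huggingface_hub itself exposes. The sketch below builds an environment for a child process using the library's own variables (`HF_HUB_DISABLE_XET` in recent releases, `HF_HUB_ETAG_TIMEOUT`, `HF_HUB_DOWNLOAD_TIMEOUT`) plus the standard proxy variables. Whether py-roller sets exactly these variables is an assumption, and `hf_download_env` is a hypothetical name. Download parallelism is not an environment variable: it is the `max_workers` argument of `snapshot_download`.

```python
import os
from typing import Dict, Mapping, Optional


def hf_download_env(xet: str = "auto", proxy: Optional[str] = None,
                    etag_timeout: Optional[int] = None,
                    download_timeout: Optional[int] = None,
                    base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build an environment for a child process that downloads from the Hub (sketch)."""
    env = dict(os.environ if base is None else base)
    if xet == "off":
        env["HF_HUB_DISABLE_XET"] = "1"  # honored by recent huggingface_hub releases
    elif xet not in ("auto", "on"):
        raise ValueError(f"xet must be auto, on, or off, got {xet!r}")
    if proxy:
        # A socks5:// or socks5h:// URL additionally needs SOCKS support installed
        env["HTTPS_PROXY"] = env["HTTP_PROXY"] = proxy
    if etag_timeout is not None:
        env["HF_HUB_ETAG_TIMEOUT"] = str(etag_timeout)
    if download_timeout is not None:
        env["HF_HUB_DOWNLOAD_TIMEOUT"] = str(download_timeout)
    return env
```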
Avoid XET/CAS when it is unreliable:
py-roller run \
--stages t,p,a,w \
--audio ./vocals.wav \
--lyrics ./song.txt \
--language zh \
--transcriber-hf-xet off \
--output-roller ./song.lrc
Use a local SOCKS proxy and conservative download settings:
py-roller run \
--stages t,p,a,w \
--audio ./vocals.wav \
--lyrics ./song.txt \
--language zh \
--transcriber-hf-proxy socks5://127.0.0.1:7890 \
--transcriber-hf-download-timeout 120 \
--transcriber-hf-etag-timeout 30 \
--transcriber-hf-max-workers 2 \
--output-roller ./song.lrc
audio-core installs SOCKS support through httpx[socks]. If the environment was installed manually and SOCKS support is missing, run:
py-roller install
or install the missing dependency directly:
pip install "httpx[socks]"
VAD filtering
faster-whisper VAD (Voice Activity Detection) filtering skips silent sections during transcription, reducing processing time by 20–40% for songs with pauses or instrumental breaks. It is enabled by default.
Disable VAD filtering if you need word-level timestamps for every segment, or if the VAD model is cutting audio too aggressively:
py-roller run \
--stages t,p,a,w \
--audio ./vocals.wav \
--lyrics ./song.txt \
--language zh \
--no-transcriber-vad-filter \
--output-roller ./song.lrc
GPU auto-detection
When --transcriber-device is not explicitly set, py-roller automatically checks for an available CUDA GPU. If found, the transcriber defaults to device=cuda with compute_type=float16 for significantly faster inference. This can be overridden with --transcriber-device cpu or --transcriber-compute-type int8.
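The device rule can be written as a small pure function. In this sketch the CUDA default (`float16`) comes from the text above, while the CPU compute type `"default"` is faster-whisper's own default and only an assumption about what py-roller chooses. `pick_device` is a hypothetical name; `torch.cuda.is_available()` is the real PyTorch check.

```python
from typing import Optional, Tuple


def pick_device(requested_device: Optional[str] = None,
                requested_compute: Optional[str] = None,
                cuda_available: Optional[bool] = None) -> Tuple[str, str]:
    """Choose (device, compute_type) as described above (sketch)."""
    if cuda_available is None:
        try:
            import torch
            cuda_available = torch.cuda.is_available()
        except ImportError:
            cuda_available = False  # no Torch installed: treat as CPU-only
    device = requested_device or ("cuda" if cuda_available else "cpu")
    # float16 on CUDA per the documentation; the CPU value is an assumption
    default_compute = "float16" if device == "cuda" else "default"
    return device, requested_compute or default_compute
```

The resulting pair would typically be passed to faster-whisper as `WhisperModel(model_dir, device=device, compute_type=compute_type)`, and VAD filtering is the `vad_filter=True` argument of `WhisperModel.transcribe`.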
Model pre-download
Pre-download a transcriber model into the local model store so that later pipeline runs can use --transcriber-local-files-only without touching the network:
py-roller cache-model --language zh
py-roller cache-model --language zh --transcriber-model-name large-v3
py-roller cache-model --language zh --transcriber-hf-xet off --transcriber-hf-proxy socks5://127.0.0.1:7890
Writer behavior
LRC
The default writer is lrc_ms, which writes LRC lines with millisecond precision.
Supported writer backends:
- lrc_ms: millisecond precision.
- lrc_cs: centisecond precision.
- lrc_compressed: millisecond precision, with consecutive identical timestamps compressed.
- ass_karaoke: ASS dialogue output with karaoke timing tags.
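The difference between lrc_ms and lrc_cs is only the fractional part of each time tag. The sketch shows both formats; rounding to the nearest millisecond and then truncating to centiseconds is an assumption, and `lrc_timestamp` is a hypothetical name.

```python
def lrc_timestamp(seconds: float, precision: str = "ms") -> str:
    """Format an LRC time tag as [mm:ss.xxx] (ms) or [mm:ss.xx] (cs)."""
    total_ms = max(0, round(seconds * 1000))  # clamp negative times to zero
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    if precision == "ms":
        return f"[{minutes:02d}:{secs:02d}.{ms:03d}]"
    if precision == "cs":
        return f"[{minutes:02d}:{secs:02d}.{ms // 10:02d}]"  # truncates to centiseconds
    raise ValueError("precision must be 'ms' or 'cs'")
```

A lyric line is then the tag followed by the text, for example `lrc_timestamp(83.456) + "lyric line"` gives `[01:23.456]lyric line`.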
ASS karaoke
ass_karaoke writes ASS dialogue lines with karaoke timing tags.
Current behavior:
- structural/spacing line output follows --writer-spacing (keep by default).
- display end time prefers matched unit timing instead of blindly extending to the next line.
- unmatched lines receive a short visible-duration fallback.
Example:
py-roller run \
--stages w \
--alignment-result ./song.alignment.json \
--writer-backend ass_karaoke \
--output-roller ./song.ass
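For reference, a karaoke Dialogue line in the ASS format carries one `{\k<centiseconds>}` tag per syllable or unit. The sketch builds such a line from `(text, start, end)` units; it is not py-roller's writer, it uses plain `\k` (the real writer may use `\kf` or `\ko`), and `ass_time` and `karaoke_dialogue` are hypothetical names.

```python
def ass_time(seconds: float) -> str:
    """ASS timestamps use H:MM:SS.cc (centiseconds)."""
    cs = max(0, round(seconds * 100))
    hours, cs = divmod(cs, 360_000)
    minutes, cs = divmod(cs, 6_000)
    secs, cs = divmod(cs, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{cs:02d}"


def karaoke_dialogue(units: list[tuple[str, float, float]], style: str = "Default") -> str:
    """Build one Dialogue line from (text, start, end) units with {\\k<cs>} tags."""
    if not units:
        raise ValueError("need at least one timed unit")
    start, end = units[0][1], units[-1][2]
    parts = []
    cursor = start
    for text, u_start, u_end in units:
        gap = round((u_start - cursor) * 100)
        if gap > 0:
            parts.append(f"{{\\k{gap}}}")  # silent gap before this unit
        parts.append(f"{{\\k{round((u_end - u_start) * 100)}}}{text}")
        cursor = u_end
    # Fields: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    return f"Dialogue: 0,{ass_time(start)},{ass_time(end)},{style},,0,0,0,," + "".join(parts)
```

Because each duration is rounded independently, the `\k` values can drift by a centisecond or two from the line's total duration; a production writer may want to distribute rounding error explicitly.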
Batch mode
batch uses the same stage semantics as run, but applies them to many tasks.
Directory pairing
Directory mode currently supports:
--pair-by stem
Default candidate globs:
- --audio-glob "*.mp3"
- --lyrics-glob "*.txt"
Matching is non-recursive.
Example:
py-roller batch \
--stages t,p,a,w \
--audio ./audio_dir \
--lyrics ./lyrics_dir \
--language zh \
--output-roller ./out_dir
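Stem pairing with the default globs can be sketched with `pathlib`, whose `Path.glob("*.mp3")` does not descend into subdirectories. The helper name `pair_by_stem` is hypothetical, and how py-roller treats unmatched files (error or skip) is not stated here, so the sketch simply returns them separately.

```python
from pathlib import Path


def pair_by_stem(audio_dir: Path, lyrics_dir: Path,
                 audio_glob: str = "*.mp3", lyrics_glob: str = "*.txt"):
    """Pair files by filename stem, non-recursively (illustrative sketch)."""
    audio = {p.stem: p for p in audio_dir.glob(audio_glob) if p.is_file()}
    lyrics = {p.stem: p for p in lyrics_dir.glob(lyrics_glob) if p.is_file()}
    # Tasks exist only where both sides share a stem
    pairs = [(stem, audio[stem], lyrics[stem]) for stem in sorted(audio.keys() & lyrics.keys())]
    # Stems present on only one side are reported, not silently dropped
    unmatched = sorted(audio.keys() ^ lyrics.keys())
    return pairs, unmatched
```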
Batch controls
- --jobs N: maximum number of parallel workers.
- --continue-on-error: keep processing remaining tasks after failures.
- --skip-existing: skip tasks whose declared final outputs already exist.
- --manifest jobs.yaml: load explicit per-task paths from YAML instead of pairing by stem.
Parallelism guidance:
- CPU-only: start with --jobs 1 or --jobs 2.
- Single GPU: usually start with --jobs 1.
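The combination of --jobs and fail-fast can be sketched with `concurrent.futures`. This illustrates the documented policy (stop launching new work after the first failure unless --continue-on-error is set) using threads for simplicity; py-roller's real workers and scheduling may differ, and `run_batch` is a hypothetical name. Tasks already running when a failure occurs are allowed to finish.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Any, Callable, Dict, Hashable, Iterable, Tuple

_NO_TASK = object()


def run_batch(tasks: Iterable[Hashable], worker: Callable[[Any], Any], jobs: int = 1,
              continue_on_error: bool = False) -> Dict[Hashable, Tuple[str, Any]]:
    """Run tasks on at most `jobs` workers with fail-fast semantics (sketch)."""
    results: Dict[Hashable, Tuple[str, Any]] = {}
    queue = iter(tasks)
    running: Dict[Future, Hashable] = {}
    failed = False
    with ThreadPoolExecutor(max_workers=jobs) as pool:

        def launch_next() -> None:
            task = next(queue, _NO_TASK)
            if task is not _NO_TASK:
                running[pool.submit(worker, task)] = task

        for _ in range(jobs):  # fill every worker slot once
            launch_next()
        while running:
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                task = running.pop(future)
                try:
                    results[task] = ("ok", future.result())
                except Exception as exc:  # record the failure instead of raising
                    results[task] = ("error", exc)
                    failed = True
                if continue_on_error or not failed:
                    launch_next()  # refill the freed slot
    return results
```

With `jobs=1` and a failing second task, the third task is never launched; with `continue_on_error=True`, every task runs and each failure is recorded in the results.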
YAML manifest format
Manifest mode is useful when filenames do not match cleanly by stem.
The manifest defines per-task input and output paths only. It does not override stage selection, language, backend choice, filter settings, jobs, or other batch-level options.
Supported top-level forms:
tasks:
  - id: song01
    audio: ./audio/song01_master.mp3
    lyrics: ./lyrics/song01_final.txt
    output_roller: ./out/song01.lrc
or:
- id: song01
  audio: ./audio/song01_master.mp3
  lyrics: ./lyrics/song01_final.txt
  output_roller: ./out/song01.lrc
Allowed manifest input keys:
- audio
- lyrics
- timed_units
- parsed_lyrics
- alignment_result
Allowed manifest output keys:
- output_vocal_audio
- output_filtered_audio
- output_timed_units
- output_parsed_lyrics
- output_alignment_result
- output_roller
Optional helper key:
id
Validation rules:
- each task must be a mapping.
- unknown keys are rejected.
- inputs must match the selected chain start.
- outputs must be valid final outputs for the selected chain.
- task ids/stems must be unique.
- final output paths must not conflict across tasks.
- relative paths are resolved relative to the manifest file location.
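Most of these rules can be checked on the already-parsed YAML. The sketch below covers both top-level forms, mapping shape, unknown keys, unique ids, output conflicts, and manifest-relative paths; checking inputs and outputs against the selected chain is omitted because it needs the stage selection. Deriving a missing `id` from the stem of the first input is an assumption, and `load_manifest_tasks` is a hypothetical name.

```python
from pathlib import Path
from typing import Any

INPUT_KEYS = ("audio", "lyrics", "timed_units", "parsed_lyrics", "alignment_result")
OUTPUT_KEYS = ("output_vocal_audio", "output_filtered_audio", "output_timed_units",
               "output_parsed_lyrics", "output_alignment_result", "output_roller")


def load_manifest_tasks(data: Any, manifest_path: Path) -> list[dict]:
    """Validate parsed manifest YAML: a mapping with "tasks" or a bare list (sketch)."""
    tasks = data.get("tasks") if isinstance(data, dict) else data
    if not isinstance(tasks, list):
        raise ValueError("manifest must be a task list or a mapping with a 'tasks' list")
    base = manifest_path.parent  # relative paths resolve against the manifest location
    allowed = set(INPUT_KEYS) | set(OUTPUT_KEYS) | {"id"}
    seen_ids: set = set()
    seen_outputs: set = set()
    resolved_tasks = []
    for index, task in enumerate(tasks):
        if not isinstance(task, dict):
            raise ValueError(f"task #{index} must be a mapping")
        unknown = set(task) - allowed
        if unknown:
            raise ValueError(f"task #{index}: unknown keys {sorted(unknown)}")
        # Hypothetical fallback: use the stem of the first input when "id" is absent
        first_input = next((task[k] for k in INPUT_KEYS if k in task), f"task{index}")
        task_id = str(task.get("id") or Path(first_input).stem)
        if task_id in seen_ids:
            raise ValueError(f"duplicate task id {task_id!r}")
        seen_ids.add(task_id)
        resolved: dict = {"id": task_id}
        for key in task:
            if key == "id":
                continue
            path = Path(task[key]).expanduser()
            resolved[key] = path if path.is_absolute() else base / path
            if key in OUTPUT_KEYS:
                if resolved[key] in seen_outputs:
                    raise ValueError(f"output {resolved[key]} is used by more than one task")
                seen_outputs.add(resolved[key])
        resolved_tasks.append(resolved)
    return resolved_tasks
```

The YAML file itself would be loaded first with a YAML parser (for example `yaml.safe_load`), and the result passed in as `data`.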
YAML config for CLI defaults
Use --config to load YAML defaults.
Priority order:
built-in defaults < config YAML < explicit CLI arguments
Section model:
- shared: defaults applied to both run and batch.
- run: currently no extra keys beyond shared.
- batch: defaults for batch-only options such as jobs and skip_existing.
Example:
shared:
  language: zh
  writer_spacing: keep
  writer_backend: lrc_ms
  intermediate: ./tmp/py-roller-artifacts
  cleanup: on-success
  transcriber_device: cpu
  transcriber_model_path: ~/.cache/py-roller/models/transcriber
  transcriber_local_files_only: false
  transcriber_vad_filter: true
  transcriber_hf_xet: auto
  transcriber_hf_proxy: null
  transcriber_hf_download_timeout: 120
  transcriber_hf_etag_timeout: 30
  transcriber_hf_max_workers: 2
  splitter_backend: demucs
  splitter_demucs_model: htdemucs
  splitter_demucs_device: cpu
  splitter_demucs_jobs: 0
  splitter_demucs_overlap: 0.25
  splitter_demucs_segment: 8
  filter_chain:
    - noise_gate
    - dereverb
batch:
  jobs: 2
  audio_glob: "*.mp3"
  lyrics_glob: "*.txt"
  timed_units_glob: "*.json"
  parsed_lyrics_glob: "*.json"
  alignment_result_glob: "*.json"
  continue_on_error: true
filter_chain can be written either as a comma-separated string or as a YAML list. Quote on or off for transcriber_hf_xet if your YAML parser treats them as booleans; py-roller also accepts boolean true/false there as on/off for convenience.
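The priority order and the two YAML conveniences above can be sketched as a merge function. Assumptions: CLI options the user did not pass arrive as `None`, the command section is applied after `shared`, and the name `effective_options` is hypothetical.

```python
from typing import Any, Mapping, Optional


def effective_options(builtin: Mapping[str, Any], config: Optional[Mapping[str, Any]],
                      cli: Mapping[str, Any], command: str) -> dict:
    """Merge built-in defaults < config YAML < explicit CLI arguments (sketch)."""
    merged = dict(builtin)
    config = config or {}
    merged.update(config.get("shared") or {})
    merged.update(config.get(command) or {})  # the "run" or "batch" section
    # Only CLI values that were actually given override the lower layers
    merged.update({k: v for k, v in cli.items() if v is not None})
    if isinstance(merged.get("filter_chain"), str):
        # Comma-separated string form, e.g. "noise_gate,dereverb"
        merged["filter_chain"] = [s.strip() for s in merged["filter_chain"].split(",") if s.strip()]
    xet = merged.get("transcriber_hf_xet")
    if isinstance(xet, bool):  # YAML may parse unquoted on/off as booleans
        merged["transcriber_hf_xet"] = "on" if xet else "off"
    return merged
```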
Progress, logs, and cleanup
The project exposes progress in two layers:
- human-readable logs for normal terminal use;
- optional machine-readable JSONL events for GUI frontends such as lrc-roller.
Use --progress-format to choose the progress output mode:
py-roller run ... --progress-format human # default, terminal-friendly logs
py-roller run ... --progress-format jsonl # structured PYROLLER_EVENT lines
py-roller run ... --progress-format both # logs plus structured events
jsonl emits one parseable event per line with the PYROLLER_EVENT prefix, for example:
PYROLLER_EVENT {"type":"download_progress","stage":"model_download","parent_stage":"preflight","repo_id":"Systran/faster-whisper-large-v2","file":"model.bin","bytes_downloaded":1534203904,"bytes_total":3086912962,"progress":0.497}
This is intended for frontends that need reliable stage and download progress instead of parsing mixed logs from tqdm, Demucs, and huggingface_hub. Human-readable mode remains the default so existing CLI workflows are unchanged.
Structured events use progress as the canonical 0.0 to 1.0 field. A percent compatibility alias is still emitted for early GUI integrations. Standard stages are preflight, model_download, splitter, filter, transcriber, parser, aligner, and writer; model download events also include parent_stage: preflight.
Current progress coverage:
- run lifecycle events: run_started, run_completed, and run_failed;
- model preflight and Hugging Face model download events, including cache path, proxy/XET settings, file count, largest file name, bytes downloaded, total bytes when known, and estimated speed;
- heartbeat events during long model downloads and faster-whisper transcription periods;
- splitter/Demucs seconds-based progress as structured splitter events;
- filter phase progress;
- transcriber phase progress, including faster-whisper segment count, last processed audio time, duration hints, and text previews when available;
- parser, aligner, and writer stage events;
- artifact write events and failure events.
In single-task run, human progress is shown as terminal logs/progress bars when supported. In batch, per-task progress is logged to avoid multiple workers fighting for one terminal. GUI frontends should prefer --progress-format jsonl.
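A GUI can separate events from ordinary log lines by prefix. The sketch parses PYROLLER_EVENT lines, ignores everything else (including malformed JSON), and keeps the latest canonical `progress` value per stage. `iter_events` and `latest_progress` are hypothetical names, and which stream carries the events (stdout is assumed here) should be confirmed against the actual CLI.

```python
import json
from typing import Dict, Iterable, Iterator

EVENT_PREFIX = "PYROLLER_EVENT"


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each PYROLLER_EVENT line and skip everything else."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith(EVENT_PREFIX):
            continue  # ordinary log output interleaved on the same stream
        try:
            event = json.loads(line[len(EVENT_PREFIX):])
        except json.JSONDecodeError:
            continue  # tolerate a truncated or malformed line instead of crashing the UI
        if isinstance(event, dict):
            yield event


def latest_progress(events: Iterable[dict]) -> Dict[str, float]:
    """Keep the most recent canonical `progress` value (0.0 to 1.0) per stage."""
    latest: Dict[str, float] = {}
    for event in events:
        stage, progress = event.get("stage"), event.get("progress")
        if isinstance(stage, str) and isinstance(progress, (int, float)):
            latest[stage] = float(progress)  # ignore the legacy percent alias
    return latest
```

In a frontend, `iter_events(proc.stdout)` can be fed directly from a `subprocess.Popen(..., stdout=subprocess.PIPE, text=True)` running `py-roller run ... --progress-format jsonl`.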
Intermediate files live under:
--intermediate/<task-id>/splitter
--intermediate/<task-id>/filter
--intermediate/<task-id>/logs
Default intermediate root:
<system temp>/py-roller-artifacts
Cleanup policy:
- --cleanup on-success: remove per-task intermediate directories after successful tasks.
- --cleanup never: keep intermediate audio and logs for inspection.
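The directory layout and cleanup policy can be sketched as follows. The per-task subdirectories and the default root follow the text above; `task_workdirs` and `run_with_cleanup` are hypothetical names, and the real pipeline may create these directories lazily rather than up front.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def task_workdirs(task_id: str, root: Optional[Path] = None) -> dict:
    """Create the documented per-task layout under the intermediate root."""
    root = root or Path(tempfile.gettempdir()) / "py-roller-artifacts"
    dirs = {name: root / task_id / name for name in ("splitter", "filter", "logs")}
    for d in dirs.values():
        d.mkdir(parents=True, exist_ok=True)
    return dirs


def run_with_cleanup(task_id: str, body: Callable[[dict], object],
                     cleanup: str = "on-success", root: Optional[Path] = None) -> None:
    """Run one task; remove its intermediate tree only after success under on-success."""
    dirs = task_workdirs(task_id, root)
    body(dirs)  # an exception here skips cleanup, leaving files for inspection
    if cleanup == "on-success":
        shutil.rmtree(dirs["logs"].parent, ignore_errors=True)
```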
Troubleshooting
Check the environment
py-roller doctor
For integrations, use JSON output:
py-roller doctor --output-format json
doctor checks Python, Torch, Torchaudio, faster-whisper, CTranslate2, transformers, SOCKS proxy support, Demucs, and librosa.
If it reports a broken audio/transcriber environment, start with:
py-roller install
Hugging Face model download progress
For restricted networks, prefer disabling HF XET/CAS and using remote-DNS SOCKS:
py-roller run ... \
--transcriber-hf-xet off \
--transcriber-hf-proxy socks5h://127.0.0.1:9909
If a model has already been materialized into the py-roller model store, use local-only mode to avoid touching the network on later runs:
py-roller run ... \
--transcriber-model-path ~/.cache/py-roller/models/transcriber \
--transcriber-local-files-only
The Hugging Face file-count progress shown by huggingface_hub can appear stuck on large model files. Use --progress-format jsonl or both to get byte-level download_progress events with cache growth, speed, and total size when available.
Interruption and child process cleanup
In batch mode without --continue-on-error, py-roller fails fast: it stops launching further work after the first task failure. Worker cleanup also includes a Windows-specific branch that terminates child process trees.
For older runs or already orphaned processes, Linux/macOS cleanup examples are still useful:
pkill -TERM -f 'python .*pyroller'
pkill -TERM -f 'demucs.separate|demucs'
If anything still survives:
pkill -KILL -f 'python .*pyroller'
pkill -KILL -f 'demucs.separate|demucs'
Inspect candidates first with:
ps -ef | grep -E 'pyroller|demucs'
Dependency policy
py-roller install prefers the newest validated dependency line for this release:
- Torch/TorchAudio/TorchVision are installed from the official 2.6.0 family for every built-in profile.
- SOCKS proxy support is installed by default through httpx[socks], so Hugging Face downloads do not fail merely because socksio is missing.
If you upgrade or override audio/transcriber packages manually, run py-roller doctor before using transcription-heavy pipelines.
Download files
File details
Details for the file py_roller-0.6.0.tar.gz.
File metadata
- Download URL: py_roller-0.6.0.tar.gz
- Upload date:
- Size: 268.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4f69ab4285580931baabb431556fdd2f1cd99d4943a724920c92f97e88b1f832 |
| MD5 | 283de55c682b5d0643eb08b994b85fa3 |
| BLAKE2b-256 | dfb5ccb2649ede0ba2e1a0d82b1a94c46737f3df824672e0638e72905572f6d6 |
Provenance
The following attestation bundles were made for py_roller-0.6.0.tar.gz:
Publisher: python-publish.yml on Harmonese/py-roller
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: py_roller-0.6.0.tar.gz
- Subject digest: 4f69ab4285580931baabb431556fdd2f1cd99d4943a724920c92f97e88b1f832
- Sigstore transparency entry: 1555023315
- Sigstore integration time:
- Permalink: Harmonese/py-roller@6e2028cd243f6d5b2b65276d108b577fbb183849
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/Harmonese
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@6e2028cd243f6d5b2b65276d108b577fbb183849
- Trigger Event: release
File details
Details for the file py_roller-0.6.0-py3-none-any.whl.
File metadata
- Download URL: py_roller-0.6.0-py3-none-any.whl
- Upload date:
- Size: 300.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5155d3213ca54578133b4834d0040c7ea47a1eedddff18c3e3d9e2f09f5ad1cb |
| MD5 | 5794340792a8f15c0d3c4e34c8c620ac |
| BLAKE2b-256 | df9d9544937cc898899470e9e885b34972c5a2eb6f74e6363a1d0f5f5724faf9 |
Provenance
The following attestation bundles were made for py_roller-0.6.0-py3-none-any.whl:
Publisher: python-publish.yml on Harmonese/py-roller
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: py_roller-0.6.0-py3-none-any.whl
- Subject digest: 5155d3213ca54578133b4834d0040c7ea47a1eedddff18c3e3d9e2f09f5ad1cb
- Sigstore transparency entry: 1555023323
- Sigstore integration time:
- Permalink: Harmonese/py-roller@6e2028cd243f6d5b2b65276d108b577fbb183849
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/Harmonese
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@6e2028cd243f6d5b2b65276d108b577fbb183849
- Trigger Event: release