Composable lyric-audio alignment pipeline with staged execution, batch processing, and LRC/ASS export.
py-roller
py-roller is a CLI solution for automatically generating rolling lyrics.
Specifically, py-roller is a composable lyric-audio alignment pipeline CLI with staged execution, batch processing, and LRC/ASS export. It is designed to support multiple local transcriber backends, including faster-whisper and wav2vec2.
- Package name: py-roller
- CLI command: py-roller
- Python import package: pyroller
Quick overview
py-roller treats alignment as a contiguous stage chain:
s -> f -> t -> p -> a -> w
splitter -> filter -> transcriber -> parser -> aligner -> writer
Core modes:
- run: execute one contiguous stage chain for one task
- batch: execute the same contiguous stage chain across many tasks
Core artifact types:
- input audio / lyrics
- intermediate vocal and filtered audio
- timed_units
- parsed_lyrics
- alignment_result
- roller outputs such as LRC or ASS
Installation
From source, first install the lightweight base package:
pip install -e .
Then let py-roller install the validated audio/transcriber runtime for your machine:
py-roller install
Profiles:
- auto (default): choose the best validated profile for this machine, then automatically fall back to CPU if validation fails
- cpu: official CPU-only stable profile
- cu124: official CUDA 12.4 profile
Useful variants:
py-roller install --profile cpu
py-roller install --profile cu124
py-roller install --dry-run
py-roller doctor
Notes:
- py-roller install still uses pip underneath, but it does not rely on pip to guess the correct Torch/Torchaudio flavor.
- The command first installs the selected Torch profile, then installs the bundled audio-core runtime requirements with the matching constraints file, validates the resulting environment, and finally runs py-roller doctor unless you pass --skip-doctor.
After installation, the CLI command is:
py-roller
Quick start
Full pipeline: audio + lyrics -> LRC
py-roller run \
--stages s,f,t,p,a,w \
--audio ./song.mp3 \
--lyrics ./song.txt \
--filter-chain noise_gate,dereverb \
--output-roller ./song.lrc \
--language zh  # pick the language that matches the song
Start from a prepared vocal track
py-roller run \
--stages t,p,a,w \
--audio ./vocals.wav \
--lyrics ./song.txt \
--writer-backend ass_karaoke \
--output-roller ./song.ass \
--language zh  # pick the language that matches the song
Batch processing by stem
py-roller batch \
--stages t,p,a,w \
--audio ./audio_dir \
--lyrics ./lyrics_dir \
--output-roller ./out_dir \
--language zh  # pick the language that matches the songs
Core execution model
Contiguous stage chains only
The CLI only accepts contiguous subchains of the canonical order.
Valid examples:
- s,f,t,p,a,w
- t,p,a,w
- a,w
- w
Invalid examples:
- s,t,w
- s,p,a
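The contiguity rule can be checked with a simple slice comparison. The sketch below is illustrative only: the function name `is_contiguous_chain` and the `CANONICAL` list are assumptions made for this example, not part of the py-roller API.

```python
# Canonical stage order, taken from the "s -> f -> t -> p -> a -> w" chain above.
CANONICAL = ["s", "f", "t", "p", "a", "w"]


def is_contiguous_chain(stages: str) -> bool:
    """Return True if a comma-separated stage string is a contiguous slice of CANONICAL."""
    names = [name.strip() for name in stages.split(",") if name.strip()]
    if not names or names[0] not in CANONICAL:
        return False
    start = CANONICAL.index(names[0])
    # The requested stages must equal the canonical slice of the same length.
    return CANONICAL[start:start + len(names)] == names
```

With this check, `t,p,a,w` passes because it equals the canonical slice starting at `t`, while `s,t,w` fails because the slice starting at `s` would be `s,f,t`.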
Legal chain starts
Explicit artifact inputs are only valid at the correct chain start:
- --audio is valid when the chain starts at s, f, or t
- --lyrics is valid when the chain includes p
- --timed-units and --parsed-lyrics are only valid when the chain starts at a
- --alignment-result is only valid when the chain starts at w
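As a rough illustration, the rules above can be written as a small lookup. This is a sketch that encodes the four rules literally; `allowed_inputs` is a hypothetical helper name, and the real CLI validation may apply additional checks.

```python
def allowed_inputs(stages: list[str]) -> set[str]:
    """Return the explicit input flags permitted for an already validated stage chain."""
    first = stages[0]
    allowed: set[str] = set()
    if first in ("s", "f", "t"):
        allowed.add("--audio")
    if "p" in stages:
        allowed.add("--lyrics")
    if first == "a":
        allowed.update({"--timed-units", "--parsed-lyrics"})
    if first == "w":
        allowed.add("--alignment-result")
    return allowed
```

For example, the chain `t,p,a,w` permits `--audio` and `--lyrics`, while the chain `w` permits only `--alignment-result`.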
Final outputs vs intermediate artifacts
Final user-requested outputs are only the explicit --output-* paths:
- --output-vocal-audio
- --output-filtered-audio
- --output-timed-units
- --output-parsed-lyrics
- --output-alignment-result
- --output-roller
Everything else created under --intermediate is treated as intermediate state.
Common workflows
Start from raw audio
Use the full chain when you want splitting, filtering, transcription, alignment, and final writing in one command.
py-roller run \
--stages s,f,t,p,a,w \
--audio ./song.mp3 \
--lyrics ./song.txt \
--output-roller ./song.lrc
Start from filtered or vocal audio
Skip splitter/filter when you already have a suitable track for transcription.
py-roller run \
--stages t,p,a,w \
--audio ./vocals.wav \
--lyrics ./song.txt \
--output-roller ./song.lrc
Start from aligner artifacts
py-roller run \
--stages a,w \
--timed-units ./song.timed_units.json \
--parsed-lyrics ./song.parsed_lyrics.json \
--output-roller ./song.lrc
For repeated or partially omitted lyrics, explicitly choose the repetition strategy:
py-roller run \
--stages a,w \
--timed-units ./song.timed_units.json \
--parsed-lyrics ./song.parsed_lyrics.json \
--aligner-repetition few \
--output-roller ./song.lrc
--aligner-repetition accepts:
- none: default; preserves the existing global_dp_v1 behavior for lyrics where repeated lines are fully written out.
- few: uses global DP as a proposal, then repairs weak repeated/omitted regions between trusted anchors with a local candidate lattice.
- full: skips anchor reliance and uses per-line top-k candidate generation plus beam search for highly repetitive or anchorless songs.
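For readers unfamiliar with the term, "global DP" refers to dynamic-programming alignment over two whole sequences. The sketch below is a generic Needleman-Wunsch style alignment between lyric tokens and transcribed units. It is not the global_dp_v1 implementation: the function name, the scoring values, and the exact-match comparison are all assumptions chosen to keep the example small.

```python
def global_align(lyrics: list[str], units: list[str],
                 match: int = 2, mismatch: int = -1, gap: int = -1) -> list[tuple[int, int]]:
    """Return (lyric_index, unit_index) pairs for tokens matched by a global alignment."""
    n, m = len(lyrics), len(units)
    # score[i][j] is the best score for aligning lyrics[:i] with units[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = match if lyrics[i - 1] == units[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + step,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, keeping only exact matches.
    pairs: list[tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        same = lyrics[i - 1] == units[j - 1]
        if score[i][j] == score[i - 1][j - 1] + (match if same else mismatch):
            if same:
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # lyric token with no matching unit
        else:
            j -= 1  # transcribed unit with no matching lyric token
    return pairs[::-1]
```

A single global pass like this has no good answer when a chorus is sung more often than it is written, which is the situation the few and full modes are described as handling.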
Rewrite only from an existing alignment result
py-roller run \
--stages w \
--alignment-result ./song.alignment.json \
--writer-backend ass_karaoke \
--output-roller ./song.ass
Backend defaults
Default backend selection is language-aware. Note that the default language is "mul", which performed poorly when tested on both Chinese and English. If your language is directly supported, specify it with the --language flag.
Transcriber defaults
- zh -> faster_whisper
- en -> faster_whisper
- mul -> faster_whisper
Optional transcriber backends:
- zh also supports --transcriber-backend mms_phonetic for the existing Chinese phonetic CTC path
- mul also supports --transcriber-backend wav2vec2_phoneme for the existing multilingual phoneme-CTC fallback
Parser defaults
- zh -> zh_router_pinyin
- en -> en_arpabet
- mul -> mul_ipa
Other defaults
- aligner backend -> global_dp_v1
- aligner repetition mode -> none
- writer backend -> lrc_ms
- language -> mul
- writer_spacing -> keep
- cleanup -> on-success
- transcriber model store -> ~/.cache/py-roller/models/transcriber
Transcriber model store and offline behavior
Transcriber execution is local-only. py-roller does not send audio to a cloud transcription API.
Model resolution follows this order:
- resolve transcriber_model_name (or the backend default model name)
- look for the model in the py-roller transcriber model store
- if not found and local-only mode is disabled, download or materialize it into the model store
- load the resolved local model path for inference
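The resolution order above can be sketched as a short function. Everything named here is hypothetical: `resolve_model`, the `fetch` callback, and the one-directory-per-model store layout are assumptions for illustration and do not describe py-roller's actual storage format.

```python
from pathlib import Path
from typing import Callable


def resolve_model(name: str, store: Path, local_files_only: bool,
                  fetch: Callable[[str, Path], None]) -> Path:
    """Return a local model directory, fetching it only when that is allowed."""
    # Assumed layout: one directory per model name under the store root.
    target = store / name.replace("/", "--")
    if target.is_dir():
        return target  # found in the model store
    if local_files_only:
        raise FileNotFoundError(
            f"model {name!r} is not in {store} and local-only mode is enabled"
        )
    fetch(name, target)  # download or materialize into the store
    return target  # the caller loads the model from this path
```

Keeping the download behind a callback makes the local-only branch easy to exercise without network access.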
Useful options:
- --transcriber-model-path: choose the py-roller transcriber model store root
- --transcriber-model-name: choose a model alias, model repo id, or an explicit local model path
  - for faster_whisper, bare aliases like large-v2, large-v3, or turbo resolve to Systran/faster-whisper-* snapshots
- --transcriber-local-files-only: refuse network access and read only from local files/cache
audio-core installs the project's official audio feature set around the faster-whisper and CTranslate2 local transcription stack.
Examples:
py-roller run --stages t,p,a,w --audio ./vocals.wav --lyrics ./song.txt --transcriber-model-path ./models/transcriber --output-roller ./song.lrc
py-roller run --stages t,p,a,w --audio ./vocals.wav --lyrics ./song.txt --transcriber-model-path ./models/transcriber --transcriber-local-files-only --output-roller ./song.lrc
If you are on a restricted network, pre-populate the model store and then rerun with --transcriber-local-files-only.
Writer behavior
LRC
The default writer is lrc_ms which writes LRC lines with millisecond precision. Other supported writer backends are:
- lrc_cs: writes LRC lines with centisecond precision
- lrc_compressed: writes LRC lines with millisecond precision, but compresses consecutive lines with the same timestamp
- ass_karaoke: see below
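The difference between millisecond and centisecond precision shows up in the timestamp tag. The following is a minimal sketch of the two tag shapes; the helper name `lrc_timestamp` is hypothetical, and rounding to the nearest centisecond is an assumption (the actual writers may truncate instead).

```python
def lrc_timestamp(ms: int, precision: str = "ms") -> str:
    """Format a time in milliseconds as an LRC tag such as [01:23.456] or [01:23.46]."""
    if precision == "cs":
        total_cs = (ms + 5) // 10  # round to the nearest centisecond (assumption)
        minutes, rest = divmod(total_cs, 6000)
        return f"[{minutes:02d}:{rest // 100:02d}.{rest % 100:02d}]"
    minutes, rest = divmod(ms, 60000)
    return f"[{minutes:02d}:{rest // 1000:02d}.{rest % 1000:03d}]"


def lrc_line(ms: int, text: str, precision: str = "ms") -> str:
    """Build one LRC line: a timestamp tag followed by the lyric text."""
    return f"{lrc_timestamp(ms, precision)}{text}"
```

Working in integer milliseconds avoids floating-point surprises at rounding boundaries.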
ASS karaoke
ass_karaoke writes ASS dialogue lines with karaoke timing tags.
Current defaults:
- structural / spacing line output follows writer_spacing (keep by default)
- display end time prefers matched unit timing instead of blindly extending to the next line
- unmatched lines receive a short visible duration fallback
Example:
py-roller run \
--stages w \
--alignment-result ./song.alignment.json \
--writer-backend ass_karaoke \
--output-roller ./song.ass
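To show what "karaoke timing tags" look like in the output, here is a small sketch that builds one ASS Dialogue line with `\k` tags, whose durations are expressed in centiseconds. The helper names, the "Default" style name, and the truncation of milliseconds to centiseconds are assumptions for illustration; the exact lines produced by ass_karaoke may differ.

```python
def ass_time(ms: int) -> str:
    """Format milliseconds as an ASS timestamp, H:MM:SS.cc."""
    cs = ms // 10
    hours, rest = divmod(cs, 360000)
    minutes, rest = divmod(rest, 6000)
    seconds, centis = divmod(rest, 100)
    return f"{hours}:{minutes:02d}:{seconds:02d}.{centis:02d}"


def karaoke_dialogue(start_ms: int, units: list[tuple[str, int]]) -> str:
    """Build a Dialogue line from (text, duration_ms) units using {\\k<centiseconds>} tags."""
    end_ms = start_ms + sum(duration for _, duration in units)
    text = "".join(f"{{\\k{duration // 10}}}{part}" for part, duration in units)
    # Fields: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    return f"Dialogue: 0,{ass_time(start_ms)},{ass_time(end_ms)},Default,,0,0,0,,{text}"
```

Here the line ends when its last unit ends, in the spirit of the "prefers matched unit timing" default described above.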
Progress reporting
The project exposes a reusable progress-reporting interface so CLI and future GUI frontends can share the same stage updates.
Current behavior:
- splitter: Demucs progress plus wrapper stage progress
- filter: phase progress
- transcriber: phase progress
- aligner: phase progress plus DP row progress
In single-task run, progress is shown as CLI progress bars when the terminal supports it. In batch, per-task progress is logged to avoid multiple workers fighting for one terminal.
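A shared progress interface of this kind is often expressed as a small protocol that each frontend implements. The sketch below is purely hypothetical: `ProgressReporter`, `LogReporter`, and the `update` signature are invented for illustration and are not py-roller's actual interface.

```python
from typing import Protocol


class ProgressReporter(Protocol):
    """Hypothetical interface that a CLI or GUI frontend could implement."""

    def update(self, stage: str, done: int, total: int) -> None: ...


class LogReporter:
    """Collects one log line per update, similar to per-task logging in batch mode."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def update(self, stage: str, done: int, total: int) -> None:
        percent = 100 * done // total if total else 100
        self.lines.append(f"[{stage}] {percent}%")


def run_rows(reporter: ProgressReporter, rows: int) -> None:
    """Stand-in for an aligner loop that reports DP row progress."""
    for row in range(1, rows + 1):
        reporter.update("aligner", row, rows)
```

A progress-bar frontend would implement the same `update` method and redraw a bar instead of appending a line.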
Intermediate files and cleanup
Intermediate files live under:
--intermediate/<task-id>/splitter
--intermediate/<task-id>/filter
--intermediate/<task-id>/logs
Default intermediate root:
<system temp>/py-roller-artifacts
Cleanup policy:
- --cleanup on-success keeps successful runs tidy by removing per-task intermediate directories
- --cleanup never keeps intermediate audio and logs for inspection
Batch mode
batch uses the same stage semantics as run, but applies them to many tasks.
Directory pairing
Directory mode currently supports:
--pair-by stem
Default candidate globs:
- --audio-glob "*.mp3"
- --lyrics-glob "*.txt"
Matching is non-recursive.
Example:
py-roller batch \
--stages t,p,a,w \
--audio ./audio_dir \
--lyrics ./lyrics_dir \
--output-roller ./out_dir
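Stem pairing amounts to matching files whose names agree once the extension is removed. The sketch below illustrates the idea with `pathlib`; `pair_by_stem` is a hypothetical helper, and dropping unpaired files silently is an assumption (the real command may warn or fail instead).

```python
from pathlib import Path


def pair_by_stem(audio_dir, lyrics_dir,
                 audio_glob: str = "*.mp3", lyrics_glob: str = "*.txt") -> dict:
    """Map each shared stem to an (audio_path, lyrics_path) tuple, non-recursively."""
    # Path.glob with a pattern such as "*.mp3" only inspects the directory itself.
    audio = {p.stem: p for p in Path(audio_dir).glob(audio_glob) if p.is_file()}
    lyrics = {p.stem: p for p in Path(lyrics_dir).glob(lyrics_glob) if p.is_file()}
    shared = sorted(audio.keys() & lyrics.keys())
    return {stem: (audio[stem], lyrics[stem]) for stem in shared}
```

With `audio_dir/song01.mp3` and `lyrics_dir/song01.txt` present, the result contains a single task keyed by `song01`.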
Batch controls
- --jobs N: maximum number of parallel workers
- --continue-on-error: keep processing remaining tasks after failures
- --skip-existing: skip tasks whose declared final outputs already exist
- --manifest jobs.yaml: load explicit per-task paths from YAML instead of pairing by stem
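The interaction between these controls can be sketched with the standard library. This is not how py-roller schedules work: `run_batch`, the task dictionary shape (`id`, `outputs`), and the use of a thread pool are assumptions made to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def run_batch(tasks, worker, jobs=1, continue_on_error=False, skip_existing=False):
    """Run worker(task) for each task and report which ids were done, skipped, or failed."""
    report = {"done": [], "skipped": [], "failed": []}
    pending = []
    for task in tasks:
        outputs = [Path(p) for p in task["outputs"]]
        if skip_existing and outputs and all(p.exists() for p in outputs):
            report["skipped"].append(task["id"])
        else:
            pending.append(task)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {pool.submit(worker, task): task["id"] for task in pending}
        for future in as_completed(futures):
            if future.cancelled():
                continue
            try:
                future.result()
                report["done"].append(futures[future])
            except Exception:
                report["failed"].append(futures[future])
                if not continue_on_error:
                    for other in futures:
                        other.cancel()  # only affects tasks that have not started
    return report
```

Note that cancelling in this sketch cannot interrupt a task that is already running, which is one reason fail-fast is described as stopping the launch of further work.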
Parallelism guidance
--jobs controls how many tasks run at the same time. This is separate from any model-level batch size.
Recommended starting point:
- CPU-only: --jobs 1 or --jobs 2
- single GPU: usually --jobs 1
YAML manifest format
Manifest mode is useful when filenames do not match cleanly by stem.
The manifest defines per-task input and output paths only. It does not override stage selection, language, backend choice, filter settings, jobs, or other batch-level options.
Supported top-level forms:
tasks:
  - id: song01
    audio: ./audio/song01_master.mp3
    lyrics: ./lyrics/song01_final.txt
    output_roller: ./out/song01.lrc
or:
- id: song01
  audio: ./audio/song01_master.mp3
  lyrics: ./lyrics/song01_final.txt
  output_roller: ./out/song01.lrc
Allowed manifest input keys:
- audio
- lyrics
- timed_units
- parsed_lyrics
- alignment_result
Allowed manifest output keys:
- output_vocal_audio
- output_filtered_audio
- output_timed_units
- output_parsed_lyrics
- output_alignment_result
- output_roller
Optional helper key:
id
Validation rules:
- each task must be a mapping
- unknown keys are rejected
- inputs must match the selected chain start
- outputs must be valid final outputs for the selected chain
- task ids / stems must be unique
- final output paths must not conflict across tasks
- relative paths are resolved relative to the manifest file location
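Several of these rules can be illustrated on already parsed manifest data. The sketch below covers the mapping, unknown-key, unique-id, output-conflict, and relative-path rules; it deliberately omits the chain-start checks. The function name `validate_tasks` and the fallback id `task<N>` for tasks without an id are assumptions, not py-roller behavior.

```python
from pathlib import Path

INPUT_KEYS = {"audio", "lyrics", "timed_units", "parsed_lyrics", "alignment_result"}
OUTPUT_KEYS = {"output_vocal_audio", "output_filtered_audio", "output_timed_units",
               "output_parsed_lyrics", "output_alignment_result", "output_roller"}
ALLOWED_KEYS = INPUT_KEYS | OUTPUT_KEYS | {"id"}


def validate_tasks(data, manifest_path) -> list[dict]:
    """Validate parsed manifest data and resolve paths relative to the manifest file."""
    tasks = data["tasks"] if isinstance(data, dict) else data  # both top-level forms
    base = Path(manifest_path).resolve().parent
    seen_ids, seen_outputs, resolved = set(), set(), []
    for index, task in enumerate(tasks):
        if not isinstance(task, dict):
            raise ValueError(f"task {index} must be a mapping")
        unknown = set(task) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"task {index} has unknown keys: {sorted(unknown)}")
        task_id = task.get("id", f"task{index}")  # fallback id is an assumption
        if task_id in seen_ids:
            raise ValueError(f"duplicate task id: {task_id}")
        seen_ids.add(task_id)
        entry = {"id": task_id}
        for key, value in task.items():
            if key == "id":
                continue
            path = (base / value).resolve()  # absolute values stay absolute
            if key in OUTPUT_KEYS:
                if path in seen_outputs:
                    raise ValueError(f"output path used twice: {path}")
                seen_outputs.add(path)
            entry[key] = path
        resolved.append(entry)
    return resolved
```

The input would typically come from a YAML parser; it is passed in as plain Python data here so the sketch has no third-party dependency.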
YAML config for default CLI options
Use --config to load YAML defaults.
Priority order:
built-in defaults < config YAML < explicit CLI arguments
Section model:
- shared: defaults applied to both run and batch
- run: currently no extra keys beyond shared
- batch: defaults for batch-only options such as jobs and skip_existing
Example:
shared:
  language: mul
  writer_spacing: keep
  writer_backend: lrc_ms
  intermediate: ./tmp/py-roller-artifacts
  cleanup: on-success
  transcriber_device: cpu
  transcriber_model_path: ~/.cache/py-roller/models/transcriber
  transcriber_local_files_only: false
  splitter_backend: demucs
  splitter_demucs_model: htdemucs
  splitter_demucs_device: cpu
  splitter_demucs_jobs: 0
  splitter_demucs_overlap: 0.25
  splitter_demucs_segment: 8
  filter_chain:
    - noise_gate
    - dereverb
batch:
  jobs: 2
  audio_glob: "*.mp3"
  lyrics_glob: "*.txt"
  timed_units_glob: "*.json"
  parsed_lyrics_glob: "*.json"
  alignment_result_glob: "*.json"
  continue_on_error: true
filter_chain can be written either as a comma-separated string or as a YAML list.
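The priority order and the two accepted filter_chain spellings can be sketched as a layered dictionary merge. The function names below are hypothetical, and treating a CLI value of `None` as "not given" is an assumption about how unset arguments are represented.

```python
def normalize_filter_chain(value) -> list[str]:
    """Accept "noise_gate,dereverb" or ["noise_gate", "dereverb"] and return a list."""
    if isinstance(value, str):
        return [name.strip() for name in value.split(",") if name.strip()]
    return list(value or [])


def effective_options(builtin: dict, config: dict, cli: dict, mode: str) -> dict:
    """Merge options as: built-in defaults < config YAML < explicit CLI arguments."""
    merged = dict(builtin)
    merged.update(config.get("shared") or {})  # applies to both run and batch
    merged.update(config.get(mode) or {})      # mode is "run" or "batch"
    # Only CLI values that were actually provided override the config.
    merged.update({key: value for key, value in cli.items() if value is not None})
    if "filter_chain" in merged:
        merged["filter_chain"] = normalize_filter_chain(merged["filter_chain"])
    return merged
```

Later updates win, so the merge order directly mirrors the stated priority order.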
Troubleshooting
Interruption and child process cleanup
When --continue-on-error is not set, batch mode fails fast: it stops launching further work after the first task failure. Worker cleanup also includes a Windows-specific process-tree branch.
For older runs or already orphaned processes, Linux/macOS cleanup examples are still useful:
pkill -TERM -f 'python .*pyroller'
pkill -TERM -f 'demucs.separate|demucs'
If anything still survives:
pkill -KILL -f 'python .*pyroller'
pkill -KILL -f 'demucs.separate|demucs'
Inspect candidates first with:
ps -ef | grep -E 'pyroller|demucs'
Dependency policy
py-roller install now prefers the newest validated dependency line instead of the older pre-2.6 Torch family. In practice this means:
- Torch/TorchAudio/TorchVision are installed from the official 2.6.0 family for every built-in profile.
- SOCKS-proxy support is installed by default through httpx[socks], so Hugging Face downloads do not fail just because socksio is missing.
If you upgrade or override these packages manually, run py-roller doctor before using transcription-heavy pipelines.