Composable lyric-audio alignment pipeline with staged execution, batch processing, and LRC/ASS export.
py-roller
py-roller is a CLI tool for automatically generating rolling lyrics.
Specifically, it is a composable lyric-audio alignment pipeline with staged execution, batch processing, and LRC/ASS export. It is designed to support multiple transcriber backends, including WhisperX and wav2vec2.
- Package name: py-roller
- CLI command: py-roller
- Python import package: pyroller
Quick overview
py-roller treats alignment as a contiguous stage chain:
s -> f -> t -> p -> a -> w
splitter -> filter -> transcriber -> parser -> aligner -> writer
Core modes:
- run: execute one contiguous stage chain for one task
- batch: execute the same contiguous stage chain across many tasks
Core artifact types:
- input audio / lyrics
- intermediate vocal and filtered audio
- timed_units
- parsed_lyrics
- alignment_result
- roller outputs such as LRC or ASS
Installation
From source:
pip install .
With audio backends and heavy model dependencies:
pip install ".[audio]"
After installation, the CLI command is:
py-roller
Quick start
Full pipeline: audio + lyrics -> LRC
py-roller run \
--stages s,f,t,p,a,w \
--audio ./song.mp3 \
--lyrics ./song.txt \
--filter-chain noise_gate,dereverb \
--output-roller ./song.lrc \
--language zh  # set this to the language of your song
Start from a prepared vocal track
py-roller run \
--stages t,p,a,w \
--audio ./vocals.wav \
--lyrics ./song.txt \
--writer-backend ass_karaoke \
--output-roller ./song.ass \
--language zh  # set this to the language of your song
Batch processing by stem
py-roller batch \
--stages t,p,a,w \
--audio ./audio_dir \
--lyrics ./lyrics_dir \
--output-roller ./out_dir \
--language zh  # set this to the language of your songs
Core execution model
Contiguous stage chains only
The CLI only accepts contiguous subchains of the canonical order.
Valid examples:
- s,f,t,p,a,w
- t,p,a,w
- a,w
- w
Invalid examples:
- s,t,w
- s,p,a
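The contiguity rule can be sketched as a slice comparison against the canonical order. This is a minimal illustration, not py-roller's actual validator; the names `CANONICAL_ORDER` and `is_contiguous_chain` are hypothetical.

```python
# Canonical stage order taken from the overview: s -> f -> t -> p -> a -> w.
CANONICAL_ORDER = ["s", "f", "t", "p", "a", "w"]


def is_contiguous_chain(stages: str) -> bool:
    """Return True if the comma-separated stages form a contiguous
    slice of the canonical order (hypothetical helper)."""
    requested = [s.strip() for s in stages.split(",") if s.strip()]
    if not requested or requested[0] not in CANONICAL_ORDER:
        return False
    start = CANONICAL_ORDER.index(requested[0])
    # A chain is contiguous when it equals the slice starting at its first stage.
    return CANONICAL_ORDER[start:start + len(requested)] == requested


for chain in ["s,f,t,p,a,w", "t,p,a,w", "a,w", "w", "s,t,w", "s,p,a"]:
    print(chain, is_contiguous_chain(chain))
```

The two invalid examples fail because they skip stages: s,t,w omits f, p, and a, while s,p,a omits f and t.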
Legal chain starts
Explicit artifact inputs are only valid at the correct chain start:
- --audio is valid when the chain starts at s, f, or t
- --lyrics is valid when the chain includes p
- --timed-units and --parsed-lyrics are only valid when the chain starts at a
- --alignment-result is only valid when the chain starts at w
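As a rough sketch, the rules above map a chain to the set of input flags it may accept. The function name `allowed_inputs` is an assumption made for illustration; the real CLI may structure this check differently.

```python
def allowed_inputs(chain: list[str]) -> set[str]:
    """Return the explicit input flags that the rules above permit
    for a given stage chain (hypothetical helper)."""
    allowed: set[str] = set()
    first = chain[0]
    if first in ("s", "f", "t"):
        allowed.add("--audio")
    if "p" in chain:
        allowed.add("--lyrics")
    if first == "a":
        allowed.update({"--timed-units", "--parsed-lyrics"})
    if first == "w":
        allowed.add("--alignment-result")
    return allowed


print(sorted(allowed_inputs(["t", "p", "a", "w"])))
print(sorted(allowed_inputs(["a", "w"])))
print(sorted(allowed_inputs(["w"])))
```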
Final outputs vs intermediate artifacts
Final user-requested outputs are only the explicit --output-* paths:
- --output-vocal-audio
- --output-filtered-audio
- --output-timed-units
- --output-parsed-lyrics
- --output-alignment-result
- --output-roller
Everything else created under --intermediate is treated as intermediate state.
Common workflows
Start from raw audio
Use the full chain when you want splitting, filtering, transcription, alignment, and final writing in one command.
py-roller run \
--stages s,f,t,p,a,w \
--audio ./song.mp3 \
--lyrics ./song.txt \
--output-roller ./song.lrc
Start from filtered or vocal audio
Skip splitter/filter when you already have a suitable track for transcription.
py-roller run \
--stages t,p,a,w \
--audio ./vocals.wav \
--lyrics ./song.txt \
--output-roller ./song.lrc
Start from aligner artifacts
py-roller run \
--stages a,w \
--timed-units ./song.timed_units.json \
--parsed-lyrics ./song.parsed_lyrics.json \
--output-roller ./song.lrc
Rewrite only from an existing alignment result
py-roller run \
--stages w \
--alignment-result ./song.alignment.json \
--writer-backend ass_karaoke \
--output-roller ./song.ass
Backend defaults
Default backend selection is language-aware. Note that the default language is "mul", which performed poorly in testing on both Chinese and English. If your language is directly supported, specify it with the --language flag.
Transcriber defaults
- zh -> mms_phonetic
- en -> whisperx
- mul -> wav2vec2_phoneme
Parser defaults
- zh -> zh_router_pinyin
- en -> en_arpabet
- mul -> mul_ipa
Other defaults
- aligner backend -> global_dp_v1
- writer backend -> lrc_ms
- language -> mul
- writer_spacing -> keep
- cleanup -> on-success
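The language-aware defaults above amount to a pair of lookup tables. The sketch below restates them in Python; the function name `resolve_backends` and the fallback to "mul" for unlisted languages are assumptions, not documented py-roller behavior.

```python
TRANSCRIBER_DEFAULTS = {"zh": "mms_phonetic", "en": "whisperx", "mul": "wav2vec2_phoneme"}
PARSER_DEFAULTS = {"zh": "zh_router_pinyin", "en": "en_arpabet", "mul": "mul_ipa"}


def resolve_backends(language: str = "mul") -> dict[str, str]:
    """Pick default backends for a language (hypothetical helper).

    Assumption: languages without their own entry use the "mul" defaults.
    """
    key = language if language in TRANSCRIBER_DEFAULTS else "mul"
    return {
        "transcriber": TRANSCRIBER_DEFAULTS[key],
        "parser": PARSER_DEFAULTS[key],
        "aligner": "global_dp_v1",  # not language-dependent
        "writer": "lrc_ms",         # not language-dependent
    }


print(resolve_backends("zh"))
print(resolve_backends())
```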
Writer behavior
LRC
The default writer is lrc_ms, which writes LRC lines with millisecond precision. Other supported writer backends are:
- lrc_cs: writes LRC lines with centisecond precision
- lrc_compressed: writes LRC lines with millisecond precision, but compresses consecutive lines with the same timestamp
- ass_karaoke: see below
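To make the precision difference concrete, here is a small sketch that formats the same instant as a millisecond and a centisecond LRC tag. The helper `lrc_timestamp` is hypothetical, and truncating (rather than rounding) to centiseconds is an assumption; py-roller's writers may differ in such details.

```python
def lrc_timestamp(ms: int, precision: str = "ms") -> str:
    """Format a time in milliseconds as an LRC tag (hypothetical helper)."""
    minutes, rem_ms = divmod(ms, 60_000)
    seconds, millis = divmod(rem_ms, 1000)
    if precision == "ms":
        return f"[{minutes:02d}:{seconds:02d}.{millis:03d}]"
    if precision == "cs":
        # Assumption: centiseconds are obtained by truncation.
        return f"[{minutes:02d}:{seconds:02d}.{millis // 10:02d}]"
    raise ValueError(f"unknown precision: {precision}")


print(lrc_timestamp(83_456) + "A lyric line")        # millisecond style
print(lrc_timestamp(83_456, "cs") + "A lyric line")  # centisecond style
```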
ASS karaoke
ass_karaoke writes ASS dialogue lines with karaoke timing tags.
Current defaults:
- structural / spacing lines are skipped by default
- display end time prefers matched unit timing instead of blindly extending to the next line
- unmatched lines receive a short visible duration fallback
Example:
py-roller run \
--stages w \
--alignment-result ./song.alignment.json \
--writer-backend ass_karaoke \
--output-roller ./song.ass
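For readers unfamiliar with ASS karaoke timing, the sketch below builds one Dialogue line in which each unit is preceded by a \k tag holding its duration in centiseconds. The helper names, the "Default" style, and layer 0 are illustrative assumptions; the exact lines written by ass_karaoke are not shown in this document.

```python
def ass_time(ms: int) -> str:
    """Format milliseconds as an ASS timestamp, H:MM:SS.cc."""
    cs_total = ms // 10
    hours, rem = divmod(cs_total, 360_000)
    minutes, rem = divmod(rem, 6_000)
    seconds, cs = divmod(rem, 100)
    return f"{hours}:{minutes:02d}:{seconds:02d}.{cs:02d}"


def karaoke_dialogue(start_ms: int, end_ms: int, units: list[tuple[str, int]]) -> str:
    """Build one ASS Dialogue line; units are (text, duration_ms) pairs."""
    # Each unit gets a {\k<centiseconds>} tag in front of its text.
    text = "".join(f"{{\\k{dur_ms // 10}}}{unit}" for unit, dur_ms in units)
    return f"Dialogue: 0,{ass_time(start_ms)},{ass_time(end_ms)},Default,,0,0,0,,{text}"


print(karaoke_dialogue(1000, 2500, [("hel", 500), ("lo", 1000)]))
```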
Progress reporting
The project exposes a reusable progress-reporting interface so CLI and future GUI frontends can share the same stage updates.
Current behavior:
- splitter: Demucs progress plus wrapper stage progress
- filter: phase progress
- transcriber: phase progress
- aligner: phase progress plus DP row progress
In single-task run mode, progress is shown as CLI progress bars when the terminal supports them. In batch mode, per-task progress is logged instead, so that multiple workers do not fight over one terminal.
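One way to picture a frontend-agnostic progress interface is a small protocol that stages call and frontends implement. The sketch below is purely illustrative: `ProgressReporter`, `LogReporter`, and `run_stage` are hypothetical names, not the interface that pyroller exposes.

```python
from typing import Protocol


class ProgressReporter(Protocol):
    """Anything with this method can receive stage updates."""

    def update(self, stage: str, fraction: float, message: str = "") -> None: ...


class LogReporter:
    """Collects progress as text lines, as a batch worker might log them."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def update(self, stage: str, fraction: float, message: str = "") -> None:
        self.lines.append(f"[{stage}] {fraction:.0%} {message}".rstrip())


def run_stage(name: str, steps: int, reporter: ProgressReporter) -> None:
    """Stand-in for real stage work: report after each step."""
    for done in range(1, steps + 1):
        reporter.update(name, done / steps)


reporter = LogReporter()
run_stage("aligner", 4, reporter)
print("\n".join(reporter.lines))
```

A progress-bar frontend or a GUI could implement the same `update` method without the stage code changing.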
Intermediate files and cleanup
Intermediate files live under:
--intermediate/<task-id>/splitter
--intermediate/<task-id>/filter
--intermediate/<task-id>/logs
Default intermediate root:
<system temp>/py-roller-artifacts
Cleanup policy:
- --cleanup on-success keeps successful runs tidy by removing per-task intermediate directories
- --cleanup never keeps intermediate audio and logs for inspection
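The cleanup policy can be sketched as follows. The function names are hypothetical, and leaving the directory in place after a failed run under on-success is an assumption inferred from the policy name.

```python
import shutil
import tempfile
from pathlib import Path


def default_intermediate_root() -> Path:
    """Default root: <system temp>/py-roller-artifacts."""
    return Path(tempfile.gettempdir()) / "py-roller-artifacts"


def finish_task(task_dir: Path, succeeded: bool, cleanup: str = "on-success") -> bool:
    """Apply the cleanup policy to one task directory.

    Returns True if the directory was removed (hypothetical helper).
    """
    if cleanup == "on-success" and succeeded:
        shutil.rmtree(task_dir, ignore_errors=True)
        return True
    # "never", or a failed run: keep intermediate audio and logs for inspection.
    return False


print(default_intermediate_root())
```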
Batch mode
batch uses the same stage semantics as run, but applies them to many tasks.
Directory pairing
Directory mode currently supports:
--pair-by stem
Default candidate globs:
- --audio-glob "*.mp3"
- --lyrics-glob "*.txt"
Matching is non-recursive.
Example:
py-roller batch \
--stages t,p,a,w \
--audio ./audio_dir \
--lyrics ./lyrics_dir \
--output-roller ./out_dir
Batch controls
- --jobs N: maximum number of parallel workers
- --continue-on-error: keep processing remaining tasks after failures
- --skip-existing: skip tasks whose declared final outputs already exist
- --manifest jobs.yaml: load explicit per-task paths from YAML instead of pairing by stem
Parallelism guidance
--jobs controls how many tasks run at the same time. This is separate from any model-level batch size.
Recommended starting point:
- CPU-only: --jobs 1 or --jobs 2
- single GPU: usually --jobs 1
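The interaction between --jobs and --continue-on-error can be sketched with a standard-library worker pool. This is an assumption-laden illustration: `run_batch` is a hypothetical name, and threads are used only to keep the example self-contained, whereas the real tool may use separate processes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(tasks, worker, jobs=1, continue_on_error=False):
    """Run worker(task) for each task with at most `jobs` in flight.

    Returns (results, errors), both keyed by task (hypothetical helper).
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {pool.submit(worker, task): task for task in tasks}
        for future in as_completed(futures):
            if future.cancelled():
                continue  # dropped by fail-fast before it started
            task = futures[future]
            try:
                results[task] = future.result()
            except Exception as exc:
                errors[task] = exc
                if not continue_on_error:
                    # Fail fast: cancel work that has not started yet.
                    for pending in futures:
                        pending.cancel()
    return results, errors


def demo_worker(task):
    if task == "bad":
        raise ValueError("alignment failed")
    return task.upper()


print(run_batch(["a", "bad", "c"], demo_worker, jobs=2, continue_on_error=True))
```

Note that `jobs` here only bounds task-level concurrency; any batch size inside a model is a separate setting.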
YAML manifest format
Manifest mode is useful when filenames do not match cleanly by stem.
The manifest defines per-task input and output paths only. It does not override stage selection, language, backend choice, filter settings, jobs, or other batch-level options.
Supported top-level forms:
tasks:
  - id: song01
    audio: ./audio/song01_master.mp3
    lyrics: ./lyrics/song01_final.txt
    output_roller: ./out/song01.lrc
or:
- id: song01
  audio: ./audio/song01_master.mp3
  lyrics: ./lyrics/song01_final.txt
  output_roller: ./out/song01.lrc
Allowed manifest input keys:
- audio
- lyrics
- timed_units
- parsed_lyrics
- alignment_result
Allowed manifest output keys:
- output_vocal_audio
- output_filtered_audio
- output_timed_units
- output_parsed_lyrics
- output_alignment_result
- output_roller
Optional helper key:
id
Validation rules:
- each task must be a mapping
- unknown keys are rejected
- inputs must match the selected chain start
- outputs must be valid final outputs for the selected chain
- task ids / stems must be unique
- final output paths must not conflict across tasks
- relative paths are resolved relative to the manifest file location
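Several of the validation rules above are purely structural and can be sketched on already-parsed manifest data. The function `validate_manifest` is hypothetical and covers only part of the list: it does not check inputs or outputs against the selected chain, and it checks uniqueness only for explicitly given ids, not for stems.

```python
from pathlib import Path

INPUT_KEYS = {"audio", "lyrics", "timed_units", "parsed_lyrics", "alignment_result"}
OUTPUT_KEYS = {
    "output_vocal_audio", "output_filtered_audio", "output_timed_units",
    "output_parsed_lyrics", "output_alignment_result", "output_roller",
}
ALLOWED_KEYS = INPUT_KEYS | OUTPUT_KEYS | {"id"}


def validate_manifest(data, manifest_dir):
    """Check structural rules and resolve paths against manifest_dir."""
    # Both top-level forms are accepted: {"tasks": [...]} or a bare list.
    tasks = data["tasks"] if isinstance(data, dict) else data
    seen_ids, seen_outputs, resolved = set(), set(), []
    for task in tasks:
        if not isinstance(task, dict):
            raise ValueError("each task must be a mapping")
        unknown = set(task) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"unknown keys: {sorted(unknown)}")
        task_id = task.get("id")
        if task_id is not None:
            if task_id in seen_ids:
                raise ValueError(f"duplicate task id: {task_id}")
            seen_ids.add(task_id)
        entry = {}
        for key, value in task.items():
            if key == "id":
                entry[key] = value
                continue
            # Relative paths are taken relative to the manifest location;
            # absolute paths pass through unchanged.
            path = Path(manifest_dir) / value
            if key in OUTPUT_KEYS:
                if path in seen_outputs:
                    raise ValueError(f"conflicting output path: {path}")
                seen_outputs.add(path)
            entry[key] = path
        resolved.append(entry)
    return resolved


manifest = {"tasks": [{"id": "song01", "audio": "./audio/song01_master.mp3",
                       "output_roller": "./out/song01.lrc"}]}
print(validate_manifest(manifest, "/project"))
```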
YAML config for default CLI options
Use --config to load YAML defaults.
Priority order:
built-in defaults < config YAML < explicit CLI arguments
Section model:
- shared: defaults applied to both run and batch
- run: currently no extra keys beyond shared
- batch: defaults for batch-only options such as jobs and skip_existing
Example:
shared:
  language: mul
  writer_spacing: keep
  writer_backend: lrc_ms
  intermediate: ./tmp/py-roller-artifacts
  cleanup: on-success
  transcriber_device: cpu
  splitter_backend: demucs
  splitter_demucs_model: htdemucs
  splitter_demucs_device: cpu
  splitter_demucs_jobs: 0
  splitter_demucs_overlap: 0.25
  splitter_demucs_segment: 8
  filter_chain:
    - noise_gate
    - dereverb
batch:
  jobs: 2
  audio_glob: "*.mp3"
  lyrics_glob: "*.txt"
  timed_units_glob: "*.json"
  parsed_lyrics_glob: "*.json"
  alignment_result_glob: "*.json"
  continue_on_error: true
filter_chain can be written either as a comma-separated string or as a YAML list.
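The priority order (built-in defaults < config YAML < explicit CLI arguments) behaves like successive dictionary updates. The sketch below works on already-parsed config data; `resolve_options` and `normalize_filter_chain` are hypothetical names, and treating a CLI value of None as "not given" is an assumption.

```python
BUILTIN_DEFAULTS = {
    "language": "mul",
    "writer_backend": "lrc_ms",
    "writer_spacing": "keep",
    "cleanup": "on-success",
}


def resolve_options(mode: str, config: dict, cli_args: dict) -> dict:
    """Merge option sources, lowest priority first (hypothetical helper)."""
    merged = dict(BUILTIN_DEFAULTS)
    merged.update(config.get("shared") or {})  # applies to run and batch
    merged.update(config.get(mode) or {})      # mode-specific section
    # Assumption: None means the flag was not passed on the command line.
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged


def normalize_filter_chain(value) -> list[str]:
    """Accept either a comma-separated string or a list."""
    if isinstance(value, str):
        return [name.strip() for name in value.split(",") if name.strip()]
    return list(value)


config = {"shared": {"language": "zh"}, "batch": {"jobs": 2}}
print(resolve_options("batch", config, {"language": None, "jobs": 4}))
print(normalize_filter_chain("noise_gate,dereverb"))
```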
Troubleshooting
Interruption and child process cleanup
When --continue-on-error is not set, batch mode fails fast: it stops launching further work after the first task failure. Worker cleanup also includes a Windows-specific branch for process-tree cleanup.
For older runs or already orphaned processes on Linux/macOS, the following cleanup commands are still useful:
pkill -TERM -f 'python .*pyroller'
pkill -TERM -f 'demucs.separate|demucs'
If anything still survives:
pkill -KILL -f 'python .*pyroller'
pkill -KILL -f 'demucs.separate|demucs'
Inspect candidates first with:
ps -ef | grep -E 'pyroller|demucs'