Music lyrics transcription tool using AI
🎵 Kamasi
Kamasi is a local, automated music lyrics transcription tool. It uses Demucs to isolate vocals, Faster-Whisper to transcribe them, and a local LLM served by Ollama to refine the final text.
🚀 Features
- Vocal Separation: Isolate vocals from the background music using Demucs to improve transcription accuracy.
- High-Speed Transcription: Powered by Faster-Whisper for efficient and precise speech-to-text conversion.
- LLM Refinement: Automatically fix punctuation, formatting, and transcription hallucinations using Ollama.
- Privacy-First (100% Local): All processing happens on your machine. No data or audio files are ever uploaded to the cloud.
- YAML-Driven: Fully customizable workflow via a simple config.yaml file.
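For reference, the transcription stage can be approximated with Faster-Whisper's `WhisperModel` API. This is a minimal sketch, not Kamasi's actual code: the helpers `whisper_options` and `transcribe_vocals` are hypothetical, the dictionary keys mirror the `transcription:` settings shown under Configuration, and passing `device` in separately is an assumption about how that setting is shared between stages.

```python
from typing import Optional


def whisper_options(transcription_cfg: dict, device: str) -> dict:
    """Map a `transcription:` settings dict to faster_whisper.WhisperModel kwargs."""
    # Fall back to the pairing suggested in config.yaml: float16 on GPU, int8 on CPU.
    default_compute = "float16" if device == "cuda" else "int8"
    return {
        "model_size_or_path": transcription_cfg.get("model_size", "tiny"),
        "device": device,
        "compute_type": transcription_cfg.get("compute_type", default_compute),
    }


def transcribe_vocals(audio_path: str, transcription_cfg: dict, device: str = "cpu") -> str:
    """Transcribe an (ideally vocals-only) audio file and return the raw text."""
    # Imported lazily so whisper_options() can be used without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(**whisper_options(transcription_cfg, device))
    language: Optional[str] = transcription_cfg.get("language")  # None -> auto-detect
    segments, _info = model.transcribe(audio_path, language=language)
    # `segments` is a lazy generator: decoding happens while we iterate over it.
    return "\n".join(segment.text.strip() for segment in segments)
```

The raw text returned here has little punctuation or line structure, which is what the LLM refinement stage is meant to fix.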
🛠 Installation
This project uses uv for lightning-fast Python dependency management.
- Clone the repository:
git clone https://codeberg.org/ley0x/kamasi.git
cd kamasi
- Install dependencies:
uv sync
- External Requirements:
  - FFmpeg: Required for audio processing and conversion.
  - Ollama: Required if you enable the LLM refinement stage.
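If you want to check these prerequisites before a first run, a small script like the following works. It is a hypothetical sketch that only looks for the `ffmpeg` and `ollama` executables on your `PATH`; if Ollama runs in a container or on another machine, the `ollama` binary may legitimately be absent, and what matters instead is that `ollama_url` is reachable.

```python
import shutil


def missing_requirements(refinement_enabled: bool) -> list[str]:
    """Return the external executables that cannot be found on PATH."""
    required = ["ffmpeg"]  # always needed for audio processing and conversion
    if refinement_enabled:
        required.append("ollama")  # only needed for the LLM refinement stage
    return [tool for tool in required if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_requirements(refinement_enabled=True)
    print("All requirements found." if not missing else f"Missing: {', '.join(missing)}")
```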
⚙️ Configuration
Create or edit the config.yaml file in the root directory to set your preferences:
# Note: The input file is now specified as a command-line argument
# Usage: uv run kamasi <audio_file.mp3>
audio:
  separate_vocals: true # Enable/disable Demucs
  device: "cuda" # "cuda" for an NVIDIA GPU, or "cpu"
  model: "htdemucs" # htdemucs, htdemucs_ft, mdx_extra
transcription:
  model_size: "tiny" # tiny, base, small, medium, large-v3
  language: "fr" # Language code (fr, en, etc.), or null to auto-detect
  compute_type: "int8" # float16 for GPU, int8 for CPU
refinement:
  enabled: true
  ollama_url: "http://localhost:11434"
  model_name: "mistral-nemo:12b"
  prompt: |
    Write your instructions for the LLM here.
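Before a long run, it can help to sanity-check the values against the options listed in the comments above. The sketch below is hypothetical (it is not Kamasi's `config_loader.py`) and assumes the file has already been parsed into a plain dict, for example with PyYAML's `yaml.safe_load`, and that `device` and `model` live under `audio:`. Demucs and Faster-Whisper accept more values than the comments list, so mismatches are reported as warnings rather than errors.

```python
from urllib.parse import urlparse

# Values listed in the comments of the example config.yaml. Demucs and Faster-Whisper
# accept other values too, so anything outside these sets is only a warning.
DOCUMENTED = {
    ("audio", "device"): {"cuda", "cpu"},
    ("audio", "model"): {"htdemucs", "htdemucs_ft", "mdx_extra"},
    ("transcription", "model_size"): {"tiny", "base", "small", "medium", "large-v3"},
    ("transcription", "compute_type"): {"float16", "int8"},
}


def check_config(cfg: dict) -> list[str]:
    """Return human-readable warnings for a parsed config.yaml (a plain dict)."""
    warnings = []
    for (section, key), documented in DOCUMENTED.items():
        value = (cfg.get(section) or {}).get(key)
        if value is not None and value not in documented:
            warnings.append(f"{section}.{key}={value!r} is not one of {sorted(documented)}")

    # The comments pair float16 with GPUs and int8 with CPUs.
    device = (cfg.get("audio") or {}).get("device")
    compute_type = (cfg.get("transcription") or {}).get("compute_type")
    if device == "cpu" and compute_type == "float16":
        warnings.append("compute_type float16 is meant for GPU; int8 is suggested on CPU")

    refinement = cfg.get("refinement") or {}
    if refinement.get("enabled"):
        url = urlparse(refinement.get("ollama_url") or "")
        if url.scheme not in ("http", "https") or not url.netloc:
            warnings.append("refinement.ollama_url should be an http(s) URL")
        if not refinement.get("model_name"):
            warnings.append("refinement.model_name is empty")
    return warnings
```

With the sample values shown above, `check_config` returns an empty list; switching `device` to `"cpu"` while keeping `compute_type: "float16"` would produce a warning.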
📖 Usage
Basic Usage
Transcribe an audio file using the default config:
uv run kamasi audio.mp3
uv run kamasi "path/to/song.mp3"
CLI Options
Specify a custom configuration file:
uv run kamasi --config custom.yaml audio.mp3
uv run kamasi -c custom.yaml audio.mp3
Enable verbose logging (show detailed progress):
uv run kamasi --verbose audio.mp3
uv run kamasi -v audio.mp3
Show version information:
uv run kamasi --version
List available Ollama models:
uv run kamasi --list-models
uv run kamasi --list-models -c custom.yaml
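Because `--list-models` accepts `-c`, it presumably queries the Ollama server configured in `ollama_url`. If you want the same list from your own scripts (for example, to confirm that `model_name` has been pulled), Ollama's `GET /api/tags` endpoint returns it as JSON. A minimal standard-library sketch; the function names are hypothetical:

```python
import json
import urllib.request


def model_names(tags_response: dict) -> list[str]:
    """Extract model names such as "mistral-nemo:12b" from an /api/tags payload."""
    return sorted(model["name"] for model in tags_response.get("models", []))


def list_ollama_models(ollama_url: str = "http://localhost:11434", timeout: float = 5.0) -> list[str]:
    """Ask the Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{ollama_url.rstrip('/')}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
```

`"mistral-nemo:12b" in list_ollama_models()` is a quick way to confirm that the model from the sample config is available; if it is not, `ollama pull mistral-nemo:12b` downloads it.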
Combine options:
uv run kamasi -v -c my-config.yaml audio.mp3
View help:
uv run kamasi --help
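To make the option set above concrete, here is a hypothetical `argparse` parser that mirrors the documented flags. It is an illustration, not Kamasi's actual CLI code: the default config path `config.yaml`, the hard-coded version string, and the rule that the audio file may be omitted only with `--list-models` are assumptions. It also maps `--verbose` to the logging behaviour this README describes: errors only by default, INFO and DEBUG messages with `-v`.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the CLI options documented above."""
    parser = argparse.ArgumentParser(prog="kamasi", description="Transcribe song lyrics locally.")
    parser.add_argument("audio_file", nargs="?", help="audio file to transcribe")
    parser.add_argument("-c", "--config", default="config.yaml", help="YAML configuration file")
    parser.add_argument("-v", "--verbose", action="store_true", help="show INFO and DEBUG messages")
    parser.add_argument("--list-models", action="store_true", help="list available Ollama models")
    # Version string hard-coded for illustration only.
    parser.add_argument("--version", action="version", version="%(prog)s 0.1.1")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # An audio file is only optional when we just want the model list.
    if args.audio_file is None and not args.list_models:
        parser.error("an audio file is required")
    return args


def log_level(verbose: bool) -> int:
    # Errors only by default; DEBUG (which also shows INFO) with -v/--verbose.
    return logging.DEBUG if verbose else logging.ERROR
```

A caller would then do `logging.basicConfig(level=log_level(args.verbose))` once after parsing.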
Batch Processing
Process multiple files:
for file in input/*.mp3; do
  uv run kamasi "$file"
done
The final lyrics will be saved as a .txt file in the current directory.
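The shell loop becomes less convenient once you want a summary of which files failed. A Python equivalent is sketched below; it is hypothetical, calls the same `uv run kamasi` command once per file, and passes arguments as a list so filenames with spaces need no shell quoting. Since the lyrics are written to the current directory, run it from wherever you want the `.txt` files to end up.

```python
import subprocess
from pathlib import Path


def batch_commands(input_dir: str, config: str = "config.yaml", pattern: str = "*.mp3") -> list[list[str]]:
    """Build one `uv run kamasi` command per matching file, in sorted order."""
    return [
        ["uv", "run", "kamasi", "-c", config, str(path)]
        for path in sorted(Path(input_dir).glob(pattern))
    ]


def run_batch(input_dir: str, config: str = "config.yaml") -> list[str]:
    """Run kamasi on every matching file and return the paths that failed."""
    failed = []
    for cmd in batch_commands(input_dir, config):
        # A list argument (no shell) keeps filenames with spaces intact.
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[-1])
    return failed
```

`run_batch("input", "custom.yaml")` processes every file even if one of them fails, then returns the files whose run exited with a non-zero status.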
Note: By default, only errors are shown. Use --verbose / -v to see detailed progress including INFO and DEBUG messages.
📝 Recommendations
Recommended audio separation models:
- htdemucs (default in the example config) ✅ - Best overall choice
  - Fast processing
  - Excellent vocal separation quality
  - Well-balanced
- htdemucs_ft - For maximum quality, if you don't mind roughly 4x slower processing
- mdx_extra - An alternative if htdemucs doesn't work well for your music style
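To compare these models on your own tracks before changing `model` in `config.yaml`, you can run Demucs directly. The sketch below builds a standard Demucs command line (`--two-stems vocals` keeps only a vocals stem and an accompaniment stem) and predicts where Demucs writes the vocals by default. It assumes Demucs is installed in the current environment and is not necessarily how Kamasi invokes it internally; the helper names are hypothetical.

```python
import sys
from pathlib import Path


def demucs_command(audio: str, model: str = "htdemucs", device: str = "cpu", out_dir: str = "separated") -> list[str]:
    """Build a Demucs CLI call that splits a track into vocals / no_vocals."""
    return [
        sys.executable, "-m", "demucs",
        "--two-stems", "vocals",  # keep only vocals + accompaniment
        "-n", model,              # htdemucs, htdemucs_ft, mdx_extra, ...
        "-d", device,             # "cuda" or "cpu"
        "-o", out_dir,
        audio,
    ]


def expected_vocals_path(audio: str, model: str = "htdemucs", out_dir: str = "separated") -> Path:
    # Demucs' default output layout is <out>/<model>/<track name>/<stem>.wav
    return Path(out_dir) / model / Path(audio).stem / "vocals.wav"
```

For instance, running `subprocess.run(demucs_command("song.mp3", "htdemucs_ft", "cuda"), check=True)` and then listening to `expected_vocals_path("song.mp3", "htdemucs_ft")` lets you judge the separation quality by ear.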
Recommended Ollama models:
- Llama 3.1 (8B): General-purpose corrections.
- Mistral-Nemo (12B): French-language lyrics and rap (slang).
- Qwen 2.5 (14B): Long texts and strict formatting requirements.
- Gemma 2 (9B): Creative, poetic, or abstract texts.
Example prompts:
- French songs (prompt written in French):
**Rôle** : Tu es un expert en édition musicale et correction de paroles. Voici un texte brut issu d'une transcription automatique. Il contient des erreurs phonétiques (homophones) et de ponctuation.
Tes instructions :
- Corrige l'orthographe et la grammaire.
- Déduis les mots mal transcrits en te basant sur le contexte de la phrase et les rimes probables.
- Ajoute la ponctuation et respecte les sauts de ligne (format paroles de chanson).
- **IMPORTANT** : Ne réécris pas le style et n'invente pas de nouvelles phrases. Reste fidèle à l'audio original supposé.
Ne donne aucune explication, sors uniquement le texte corrigé.
Texte à corriger :
- French songs (rap / urban; prompt written in French):
**Rôle** : Tu es un expert de la culture Hip-hop, de l'argot urbain (français/anglais) et un éditeur de paroles professionnel.
**Contexte** : Voici une transcription brute d'un morceau de rap générée par une IA. Elle contient des erreurs phonétiques, rate souvent les mots d'argot ou les noms propres, et la mise en page est inexistante.
Tes instructions :
- **Correction Phonétique Intelligente** : Corrige les mots mal entendus en te basant sur le contexte "Street" et la rime. (Exemple : Si l'audio transcrit "le ter-ter", ne corrige pas en "la terre", garde "le ter-ter").
- **Respect de la Langue** : NE CORRIGE PAS la grammaire si c'est une faute volontaire de style (ex: "J'ai pas" au lieu de "Je n'ai pas", "C'est nous les meilleurs" au lieu de "Ce sont nous..."). Laisse le verlan et l'argot tels quels.
- **Structure et Flow** : Formate le texte pour refléter le rythme. Fais des sauts de ligne courts. Essaie d'identifier et de marquer les sections : [Couplet], [Refrain], [Pont], [Outro].
- **Ad-libs** : Si tu repères des interjections d'ambiance (Yeah, Han, Skrt), mets-les entre parenthèses ou en fin de ligne, ou supprime-les si elles nuisent à la lecture.
- **Noms Propres** : Sois vigilant sur les noms de rappeurs, de marques, de villes ou de quartiers souvent cités dans le rap.
Contraintes :
- Ne donne aucune explication avant ou après le texte.
- Ne censure pas les vulgarités.
- Affiche uniquement les paroles finales formatées.
Texte brut à traiter :
- English songs:
Role: You are an expert music editor and lyrics corrector.
Context: Below is raw text generated by an automatic audio transcription tool. It contains phonetic errors (homophones), missing punctuation, and lacks formatting.
Your Instructions:
- **Correction**: Fix spelling and grammar errors caused by the transcription software.
- **Contextual Deduction**: Correct mistranscribed words based on the context of the sentence and probable rhymes.
- **Formatting**: Add proper punctuation and capitalization. Structure the text as song lyrics (short lines, stanzas, spacing between verses).
- **Fidelity**: **IMPORTANT**: Do not rewrite the style or change the meaning. Do not invent new lines. Stick as close to the phonetic audio as possible while making it readable.
- **Output**: Provide only the corrected lyrics. Do not add any conversational text or explanations.
Raw Text to Process:
- English songs (rap / urban):
**Role**: You are an expert in Hip-Hop culture, AAVE (African American Vernacular English), urban slang, and a professional lyrics editor.
**Context**: Below is a raw transcription of a Rap song. The audio tool struggled with the speed, slang, and flow.
Your Instructions:
- **Smart Phonetic Correction**: Fix words that were misheard based on "Street" context and rhyme schemes. (e.g., if the text says "trap house," do not change it to "trap mouse").
- **Respect the Dialect**: DO NOT "fix" the grammar if it is intentional slang or AAVE (e.g., keep "I ain't got no" instead of changing it to "I do not have any"). Preserve contractions and street vernacular.
- **Structure & Flow**: Format the text to reflect the rhythm (bars). Use short line breaks. Try to identify and label sections if clear: [Verse], [Chorus], [Bridge], [Outro].
- **Ad-libs**: If you detect background ad-libs (Yeah, Uh, Skrt), place them in parentheses (Yeah) or at the end of lines.
- **Cultural Accuracy**: Be vigilant with proper nouns—names of rappers, luxury brands, cities, or specific neighborhoods often mentioned in Rap.
Constraints:
- Do not provide any introductory text or summary.
- Do not censor explicit language or profanity.
- Output only the final formatted lyrics.
Raw Text to Process:
Recommended LLM temperature: between 0.2 and 0.3.
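As an illustration of what the refinement stage amounts to, the sketch below sends a prompt to Ollama's `POST /api/generate` endpoint with a low temperature. It is not Kamasi's `llm_refinement.py`: the helper names are hypothetical, appending the raw transcription after the instructions is an assumption based on how the example prompts end, and the example `config.yaml` shows no temperature key, so the value is passed directly through Ollama's `options` field.

```python
import json
import urllib.request


def build_generate_request(model_name: str, instruction: str, raw_lyrics: str, temperature: float = 0.25) -> dict:
    """Body for Ollama's POST /api/generate endpoint (non-streaming)."""
    return {
        "model": model_name,
        # The example prompts end with "Raw Text to Process:", so the transcription follows them.
        "prompt": f"{instruction.rstrip()}\n\n{raw_lyrics.strip()}",
        "stream": False,  # return one JSON object instead of a stream of chunks
        "options": {"temperature": temperature},
    }


def refine(ollama_url: str, body: dict, timeout: float = 300.0) -> str:
    """Send the request to Ollama and return the generated text."""
    req = urllib.request.Request(
        f"{ollama_url.rstrip('/')}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"].strip()
```

A low temperature keeps the model close to the transcribed words, which matches the prompts' emphasis on fidelity over rewriting.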
🛠 Project Structure
The project follows a functional programming approach for clarity and modularity:
- audio_processing.py: Vocal separation and audio cleaning.
- transcription.py: Whisper engine logic.
- llm_refinement.py: Ollama API integration.
- config_loader.py: YAML settings management.
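With this functional layout, each stage can be written as a plain function and the stages chained together. The following is a hypothetical sketch of that composition with the stage functions injected as parameters; the names and the defaults used for missing config keys are assumptions and do not reflect the actual signatures in these modules.

```python
from typing import Callable

# Each stage is a plain function from one string (a path or some text) to another.
Stage = Callable[[str], str]


def run_pipeline(audio_path: str, cfg: dict, separate: Stage, transcribe: Stage, refine: Stage) -> str:
    """Chain separation -> transcription -> refinement, skipping disabled stages.

    separate:   audio path -> vocals-only audio path
    transcribe: audio path -> raw transcription
    refine:     raw transcription -> corrected lyrics
    """
    audio_cfg = cfg.get("audio") or {}
    refinement_cfg = cfg.get("refinement") or {}
    # Defaults for missing keys are assumptions: separate vocals, skip refinement.
    source = separate(audio_path) if audio_cfg.get("separate_vocals", True) else audio_path
    text = transcribe(source)
    if refinement_cfg.get("enabled", False):
        text = refine(text)
    return text
```

Injecting the stages also makes the flow easy to test with stubs, without Demucs, Whisper, or Ollama installed.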
🧑‍💻 Development
Code Quality Tools
This project uses Ruff for linting and code formatting to maintain consistent code quality.
Install pre-commit hooks:
uvx pre-commit install
Run linting with auto-fix:
uv run ruff check --fix .
Format code:
uv run ruff format .
The pre-commit hook will automatically run ruff checks before each commit. If you need to bypass the hook temporarily, use git commit --no-verify (not recommended).
⚖️ License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file kamasi-0.1.1.tar.gz.
File metadata
- Download URL: kamasi-0.1.1.tar.gz
- Upload date:
- Size: 60.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fb29d469088894e7af2c022ad0067cbe9a3789e406c162e43b0a3d0c3ceb332d |
| MD5 | 5c8555090b86f639ba50b4b056285d59 |
| BLAKE2b-256 | 9c4a8f518386fc6d180dbb144cf46a5efe7ca75728dee3d10275d6a9a2ab82d9 |
File details
Details for the file kamasi-0.1.1-py3-none-any.whl.
File metadata
- Download URL: kamasi-0.1.1-py3-none-any.whl
- Upload date:
- Size: 14.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 12ca4c17d61e29e2099608a18afaac444cbc7182b4459fd5e5af29d2ad5b71db |
| MD5 | d7228aecbae1d55bb28252a589b322ec |
| BLAKE2b-256 | 69c8a170b81c0e56e802023e0324b97b1f487de9dbe56f544e2923e9742d2a0f |
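To check that a downloaded release file matches the SHA256 digests listed above, you can hash it locally. A minimal sketch using `hashlib` that reads the file in chunks; the helper names are hypothetical:

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA256 hex digest without loading it all into memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digests published for the 0.1.1 release files, copied from the listing above.
EXPECTED = {
    "kamasi-0.1.1.tar.gz": "fb29d469088894e7af2c022ad0067cbe9a3789e406c162e43b0a3d0c3ceb332d",
    "kamasi-0.1.1-py3-none-any.whl": "12ca4c17d61e29e2099608a18afaac444cbc7182b4459fd5e5af29d2ad5b71db",
}


def verify(path: str) -> bool:
    """True if the file's SHA256 matches the published digest for its filename."""
    return sha256_of(path) == EXPECTED[Path(path).name]
```

pip can also enforce published hashes at install time through its `--require-hashes` mode.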