BeatBot CLI — local audio feature extraction for cloud cue-point prediction
BeatBot — AI-Powered DJ Mixing Tool
BeatBot is an AI-powered mixing assistant that analyses house music tracks and automatically selects the optimal entry and exit cue points for seamless DJ transitions. A React frontend gives the user full visibility into the model's predictions and lets them trigger or override crossfades in real time.
Use Case
A user loads a playlist of house music tracks into the queue. BeatBot:
- Analyses each track with `librosa` and the `FeatureExtractor` pipeline.
- Scores every bar in the track with the trained dual LambdaRank model.
- Surfaces the best entry and exit bar for each track in the UI.
- Automatically crossfades to the next track when the exit cue approaches, or immediately on user request.
The user can inspect the scoring charts, manually drag cue points, skip the upcoming track, reorder the queue, and adjust the crossfade duration — all without touching the model.
Project Structure
BeatBot/
├── src/                              # Python back-end
│   ├── api/                          # FastAPI application
│   │   ├── main.py                   # App factory, CORS, router registration
│   │   ├── state.py                  # Shared runtime state (queue, cue cache, predict_cues)
│   │   ├── schemas.py                # Pydantic request / response models
│   │   ├── ws_manager.py             # WebSocket connection manager
│   │   └── routes/
│   │       ├── audio.py              # GET /audio/{track_id}: streams MP3
│   │       ├── tracks.py             # GET /tracks: lists library
│   │       ├── queue.py              # Queue CRUD + reorder
│   │       ├── predict.py            # POST /predict/{track_id}
│   │       ├── cues.py               # PATCH /cues/{track_id}
│   │       ├── session.py            # WebSocket /ws/session
│   │       └── transition.py         # POST /transition/now
│   ├── model/
│   │   └── lightgbm.py               # BeatBotModel: dual LambdaRank wrapper
│   ├── extractor/
│   │   └── extractor.py              # Audio → Track pipeline (librosa)
│   ├── features.py                   # FeatureExtractor: 40+ features per bar
│   ├── track.py                      # Track dataclass
│   └── annotator.py                  # Annotation helper (JAMS format)
│
├── frontend/                         # React + TypeScript UI
│   └── src/
│       ├── App.tsx                   # Root: queue state, deck routing, crossfade logic
│       ├── api/client.ts             # Typed fetch helpers for every API route
│       ├── hooks/
│       │   ├── useAudioEngine.ts     # Web Audio API playback engine
│       │   └── useWebSocket.ts       # WS client with exponential-backoff reconnect
│       ├── components/
│       │   ├── Deck.tsx              # NOW PLAYING / UP NEXT panel
│       │   ├── CueChart.tsx          # Recharts score visualisation (entry + exit)
│       │   ├── FeatureCharts.tsx     # Energy, beat strength, vocal confidence
│       │   ├── WaveformView.tsx      # WaveSurfer.js waveform with cue markers
│       │   ├── Queue.tsx             # Drag-and-drop queue list
│       │   ├── Transport.tsx         # Play / Stop / Mix Now controls
│       │   └── ErrorBoundary.tsx
│       └── types/                    # Shared TypeScript interfaces
│
├── data/
│   ├── custom/
│   │   ├── house_music_personal.csv  # Personal track library
│   │   └── annotations/              # JAMS annotation files
│   ├── M-DJCUE/                      # Academic dataset (EDM)
│   ├── models/                       # Serialised model runs (.pkl)
│   └── processed/                    # Pre-extracted feature cache
│
├── mds/                              # Design and architecture notes
├── pyproject.toml
└── makefile
The Model
BeatBot uses a Learning-to-Rank (LambdaRank) approach implemented in LightGBM (src/model/lightgbm.py).
Why Learning-to-Rank?
DJing is inherently a ranking problem, not a classification one. Some bars are perfect cue points, others are acceptable, and most are irrelevant. LambdaRank directly optimises NDCG (Normalized Discounted Cumulative Gain), which rewards pushing the best bars to the top of the ranked list.
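To make the metric concrete, here is a minimal sketch of NDCG for one track, treating its bars as the items to rank. The function names `dcg` and `ndcg` are illustrative and not taken from the BeatBot source; the exponential gain `2^rel - 1` is a common convention and is assumed here.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain using the common gain 2^rel - 1 (assumed convention)."""
    return sum((2 ** rel - 1) / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(scores, labels, k=None):
    """NDCG of the ranking induced by `scores` against graded relevance `labels`."""
    # Sort bar indices by predicted score, best first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order][:k]
    # The ideal ranking puts the highest labels first.
    ideal = sorted(labels, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A ranking that places the label-2 bar first scores 1.0; any ranking that pushes it down the list scores lower, which is the behaviour the ranker is trained to avoid.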
Dual Rankers
Two separate models are trained for the two halves of the mixing decision:
| Model | Goal | Configuration |
|---|---|---|
| Entry Ranker | Structural beginnings — intros, breakdowns | High regularisation (reg_lambda=15), shallow trees (max_depth=3) to learn general structural rules rather than overfitting |
| Exit Ranker | Structural endings — outros, post-chorus | Lower regularisation (reg_lambda=5), deeper trees (max_depth=4) to capture complex energy dynamics |
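The two configurations in the table could be written as parameter dictionaries along these lines. This is a sketch, not the contents of src/model/lightgbm.py: only `reg_lambda` and `max_depth` come from the table, while the shared `objective` and `metric` entries and the variable names are assumptions.

```python
# Settings shared by both rankers (assumed; the README only states that
# LambdaRank is used and that NDCG is the optimised metric).
COMMON_PARAMS = {"objective": "lambdarank", "metric": "ndcg"}

# Entry ranker: stronger L2 regularisation and shallower trees.
ENTRY_PARAMS = {**COMMON_PARAMS, "reg_lambda": 15, "max_depth": 3}

# Exit ranker: lighter regularisation and slightly deeper trees.
EXIT_PARAMS = {**COMMON_PARAMS, "reg_lambda": 5, "max_depth": 4}
```

Dictionaries like these can be unpacked into a LightGBM ranker constructor, for example `lightgbm.LGBMRanker(**ENTRY_PARAMS)`, assuming LightGBM is installed.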
Training Labels
Each bar in a training track is given a graded relevance label:
- `2`: Perfect cue (exact human annotation)
- `1`: Acceptable (within ±2 bars of annotation)
- `0`: Not a cue point
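The labelling scheme can be sketched as a small function. The name `relevance_labels` and its arguments are hypothetical; bars are assumed to be zero-indexed.

```python
def relevance_labels(n_bars, annotated_bars, tolerance=2):
    """Return one graded label per bar: 2 = annotated, 1 = within tolerance, 0 = other."""
    labels = [0] * n_bars
    # First mark every bar within +/- tolerance of an annotation as acceptable.
    for bar in annotated_bars:
        lo = max(0, bar - tolerance)
        hi = min(n_bars - 1, bar + tolerance)
        for i in range(lo, hi + 1):
            labels[i] = max(labels[i], 1)
    # Then mark the exact annotations as perfect, overriding the neighbourhood label.
    for bar in annotated_bars:
        if 0 <= bar < n_bars:
            labels[bar] = 2
    return labels
```

For an 8-bar excerpt with a single annotation at bar 3, `relevance_labels(8, [3])` gives `[0, 1, 1, 2, 1, 1, 0, 0]`.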
Inference
At inference time (src/api/state.py → predict_cues):
- `FeatureExtractor.extract(track)` produces a feature matrix (one row per bar).
- Both rankers score every bar.
- A positional weight discourages exit cues in the final ~15% of the track (where the model would otherwise exploit the structural similarity of outros).
- If the selected entry and exit are implausibly close, the exit score is masked within `min_sep_bars` of the entry and the best remaining candidate is chosen.
- Results are cached per track and returned to the frontend within ~200 ms.
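The post-processing steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in `predict_cues`: the function name `select_cues`, the default values, and the choice of a subtractive penalty for the positional weight are all hypothetical. Only `min_sep_bars` and the ~15% tail come from the description.

```python
import numpy as np


def select_cues(entry_scores, exit_scores, min_sep_bars=16, tail_frac=0.15, tail_penalty=1.0):
    """Pick (entry_bar, exit_bar) from per-bar ranker scores."""
    entry_scores = np.asarray(entry_scores, dtype=float)
    exit_scores = np.asarray(exit_scores, dtype=float).copy()
    n = len(exit_scores)

    # Positional weight (assumed form): subtract a penalty from exit scores
    # in the final `tail_frac` of the track.
    position = np.arange(n) / max(n - 1, 1)
    exit_scores[position > 1.0 - tail_frac] -= tail_penalty

    entry_bar = int(np.argmax(entry_scores))
    exit_bar = int(np.argmax(exit_scores))

    # If the two cues are too close, mask exit scores near the entry
    # and take the best remaining candidate.
    if abs(exit_bar - entry_bar) < min_sep_bars:
        masked = exit_scores.copy()
        lo = max(0, entry_bar - min_sep_bars + 1)
        hi = min(n, entry_bar + min_sep_bars)
        masked[lo:hi] = -np.inf
        if np.isfinite(masked).any():
            exit_bar = int(np.argmax(masked))

    return entry_bar, exit_bar
```

A subtractive penalty is used here because ranker scores can be negative, in which case multiplying by a weight below 1 would raise a score instead of lowering it.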
Model Artefacts
Trained models are saved under data/models/. Each run directory contains:
- `beatbot_model.pkl`: serialised `BeatBotModel`
- `evaluation.json`: NDCG scores and feature importances
- `figures/`: training curves and prediction plots
Feature Engineering
src/features.py computes 40+ features per bar, organised into 9 tiers:
| Tier | Features | Purpose |
|---|---|---|
| 1 – Structure | bar_pos_norm, dist_to_section, phrase_pos, duration | "Where am I in the song?" |
| 2 – Energy | energy_prev_8, energy_next_8, energy_volatility, energy_derivative, beat_strength | "How energetic is this section?" |
| 3 – Timbre | spectral_centroid, vocal_conf, harmonic_ratio, high_band_energy | "What does it sound like?" |
| 4 – Chroma | chroma_rel_0/3/7/9/11 | Key-invariant harmonic function (Tonic, Minor-3rd, Dominant…) |
| 5 – Rhythmic Grid | is_4_bar, bar_mod_8/16/32 | Phrasing alignment: mixes should land on the "1" |
| 6 – Flux | energy_flux, spectral_flux | Instantaneous change (drops, crashes) |
| 7 – Advanced Context | energy_contrast_future, is_likely_breakdown, vocal_future_8, vocal_past_8 | Look-ahead / look-behind "human" features |
| 8 – Metadata | is_section_start, beat_consistency, percussion_intensity, spectral_rolloff | Structural and rhythmic metadata |
| 9 – Composite | phrase_boundary_strength | Count of grid alignments (0–5); a strong downbeat signal |
Chroma features are key-invariant: the raw 12-bin chroma vector is rotated by the track's detected tonic so the model learns harmonic function (Dominant, Subdominant) rather than absolute pitch class.
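The rotation itself is a single array shift. The sketch below assumes the tonic is given as a pitch class from 0 to 11 (0 = C); the function name `key_invariant_chroma` is illustrative, and how BeatBot detects the tonic is not shown.

```python
import numpy as np

# Relative bins named in the feature table (chroma_rel_0/3/7/9/11).
REL_BINS = (0, 3, 7, 9, 11)


def key_invariant_chroma(chroma, tonic):
    """Rotate a 12-bin chroma vector so that index 0 corresponds to the tonic."""
    chroma = np.asarray(chroma, dtype=float)
    if chroma.shape[-1] != 12:
        raise ValueError("expected 12 chroma bins")
    # Shifting left by `tonic` moves pitch class `tonic` to index 0.
    return np.roll(chroma, -int(tonic), axis=-1)
```

For a track in D (tonic 2) with all of its energy on A (pitch class 9), the rotated vector has its energy in bin 7, the dominant, regardless of the absolute key.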
API
The backend is a FastAPI app (src/api/) served by uvicorn.
PYTHONPATH=src .venv/bin/uvicorn api.main:app --reload --app-dir src
# Runs on http://localhost:8000
Key endpoints:
| Method | Path | Description |
|---|---|---|
| `GET` | `/tracks` | List all tracks in the library |
| `GET` | `/audio/{track_id}` | Stream the MP3 file |
| `POST` | `/predict/{track_id}` | Run cue prediction; returns scores + selected cues |
| `PATCH` | `/cues/{track_id}` | Override a cue point; validates and broadcasts via WS |
| `GET`/`POST`/`DELETE` | `/queue` | Queue management |
| `PATCH` | `/queue/reorder` | Reorder two queue positions |
| `POST` | `/transition/now` | Trigger immediate crossfade |
| `WS` | `/ws/session` | Real-time push events (queue.updated, cues.accepted) |
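As a rough illustration of calling these endpoints from Python, the sketch below builds requests with the standard library only. The helper names, the example track id, and the assumption that responses are JSON are all hypothetical; the actual request and response shapes are defined in src/api/schemas.py and are not reproduced here.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # default uvicorn address from the command above


def tracks_request(base_url=BASE_URL):
    """Build the GET /tracks request."""
    return Request(f"{base_url}/tracks", method="GET")


def predict_request(track_id, base_url=BASE_URL):
    """Build the POST /predict/{track_id} request (no body is assumed)."""
    return Request(f"{base_url}/predict/{quote(str(track_id), safe='')}", method="POST")


def send(request):
    """Send a request to a running backend and decode the response as JSON (assumed format)."""
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `send(predict_request("demo-track"))` would trigger a prediction for a track with that (made-up) id.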
Frontend
The UI is a React 19 + TypeScript single-page app built with Vite 6.
cd frontend && pnpm install && pnpm dev
# Runs on http://localhost:5173
Key design decisions:
- Two physical decks (A / B) alternate roles as NOW PLAYING and UP NEXT. The `activeDeck` ref drives all routing logic so async crossfades never touch the wrong slot.
- Web Audio engine (`useAudioEngine`) handles all playback, crossfading, and elapsed-time reporting.
- WaveSurfer.js renders the waveform but is staggered 3.5 s after deck load to avoid a simultaneous double PCM-decode that triggers Chrome OOM crashes.
- WebSocket (`useWebSocket`) reconnects with exponential backoff (150 ms → 5 s cap) so uvicorn `--reload` restarts are transparent.
- Recharts charts (cue scores + feature charts) share a `syncId` for synchronised hover cursors and render a live playhead `ReferenceLine`.
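The reconnect schedule can be illustrated with a short generator. The real hook is written in TypeScript; this Python sketch only shows the delay sequence, and the doubling factor is an assumption, since only the 150 ms start and the 5 s cap are stated above.

```python
from itertools import islice


def backoff_delays(base_ms=150, cap_ms=5000, factor=2):
    """Yield reconnect delays in milliseconds, growing by `factor` up to `cap_ms`."""
    delay = base_ms
    while True:
        yield min(delay, cap_ms)
        delay = min(delay * factor, cap_ms)


# First seven delays under the assumed doubling factor.
first_delays = list(islice(backoff_delays(), 7))
```

Under these assumptions `first_delays` is `[150, 300, 600, 1200, 2400, 4800, 5000]`, so the client retries quickly after a short restart and settles at one attempt every 5 s during a longer outage.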
Running Locally
Prerequisites: Python ≥ 3.13, Node.js ≥ 20, pnpm.
# 1. Python environment
python3.13 -m venv .venv
source .venv/bin/activate
pip install -e .
# 2. Backend
PYTHONPATH=src uvicorn api.main:app --reload --app-dir src
# 3. Frontend (separate terminal)
cd frontend
pnpm install
pnpm dev
Open http://localhost:5173.
Data
| Path | Contents |
|---|---|
| `data/custom/annotations/` | JAMS files: manually annotated cue points for ~100 house tracks |
| `data/custom/house_music_personal.csv` | Track metadata (BPM, key, duration, file path) |
| `data/M-DJCUE/` | Academic EDM dataset used for additional training signal |
| `data/models/` | Serialised model runs; the active model path is configured in src/api/state.py |
| `data/processed/` | Pre-extracted feature DataFrames cached as Parquet; regenerated by src/extractor/extractor.py if missing |