
A Python tool for music emotion and perceptual feature prediction

Project description

VibeNet: Music Emotion Prediction

(Diagram: VibeNet flow)

A while ago I canceled my Spotify subscription and began building my own offline music library. But something I always missed was Spotify's ability to generate smart playlists like Morning Mix or Driving Vibes. Spotify doesn't publish how its algorithm creates these playlists, so I set out to build my own system.

Music can make us humans feel happy, sad, energetic, angry, calm, or a range of other emotions. Can it make computers feel those emotions too? Probably not, but we can teach them to recognize and quantify the qualities of music that drive those feelings. We can then use our computers to classify and organize our tracks by the emotions they evoke, and to recommend songs that match a given mood.

VibeNet is a lightweight Python package and CLI that predicts musical emotions and attributes (valence, energy, danceability, acousticness, etc.) directly from raw audio. It uses an EfficientNet student model trained via teacher-student distillation on the Free Music Archive (FMA) dataset.

What attributes are predicted?

VibeNet predicts 7 continuous attributes for each audio track: acousticness, danceability, energy, instrumentalness, liveness, speechiness, and valence.

Some features (like acousticness, instrumentalness, and liveness) are likelihoods: they represent probabilities of that characteristic being present (e.g. probability the track is acoustic). Others (like danceability, energy, valence) are continuous descriptors: they describe how much of the quality the track has.

For example, an acousticness value of 0.85 doesn't mean that 85% of the track is composed of acoustic instruments. It means that it's highly likely that the recording is acoustic and not electronic.

Conversely, an energy value of 0.15 doesn't mean that it's highly unlikely that the song is energetic. It reflects a degree of the quality itself, meaning that the track is overall perceived as having very low intensity.
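In code, this distinction matters when you consume the predictions: likelihoods answer "how probable is this characteristic?", while descriptors answer "how much of this quality does the track have?". A minimal sketch in plain Python (the attribute values below are made up for illustration, not real VibeNet output):

```python
# Hypothetical attribute values for a single track (illustrative only).
attrs = {
    "acousticness": 0.85,      # likelihood: very likely an acoustic recording
    "instrumentalness": 0.10,  # likelihood: vocals are probably present
    "liveness": 0.05,          # likelihood: probably a studio recording
    "danceability": 0.40,      # descriptor: moderately danceable
    "energy": 0.15,            # descriptor: perceived as very low intensity
    "valence": 0.30,           # descriptor: somewhat negative in mood
}

# The three likelihood-type attributes, per the descriptions above.
LIKELIHOODS = {"acousticness", "instrumentalness", "liveness"}

def describe(name, value):
    """Render a value according to its attribute type."""
    if name in LIKELIHOODS:
        # Likelihoods are probabilities that the characteristic is present.
        return f"{name}: {value:.0%} chance the characteristic is present"
    # Descriptors are degrees of the quality itself.
    return f"{name}: degree {value:.2f} on a 0-1 scale"

for name, value in attrs.items():
    print(describe(name, value))
```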

Below is a table describing each attribute in more detail:

Acousticness (Likelihood): A measure of how likely a track is to be acoustic rather than electronically produced. High values indicate recordings that rely on natural, unprocessed sound sources (e.g. solo guitar, piano ballads). Low values indicate tracks that are primarily electronic or produced with synthetic instrumentation (e.g. EDM, trap).

Instrumentalness (Likelihood): Predicts whether a track contains no vocals. Higher values suggest the track contains no vocal content (e.g. symphonies), while lower values indicate the presence of sung or spoken vocals (e.g. rap).

Liveness (Likelihood): A measure of how likely the track is to be a recording of a live performance. Higher values suggest live-performance characteristics (crowd noise, reverberation, stage acoustics), while lower values suggest a studio recording.

Danceability (Descriptor): Describes how suitable a track is for dancing. Tracks with higher values (closer to 1.0) tend to feel more danceable, while lower values (closer to 0.0) feel less so.

Energy (Descriptor): Also known as arousal. Measures the perceived intensity and activity level of a track. Higher values indicate tracks that feel fast, loud, and powerful, while lower values indicate tracks that feel calm, soft, or subdued.

Valence (Descriptor): Measures the musical positivity conveyed by a track. Higher values indicate tracks that sound more positive (e.g. happy, cheerful, euphoric), while lower values indicate tracks that sound more negative (e.g. sad, depressed, angry).

Speechiness (Descriptor): Measures the presence of spoken words in a track. Higher values indicate that the recording is more speech-like (e.g. podcasts), while lower values suggest the audio is more musical, with singing or purely instrumental content. Mid-range values often correspond to tracks that mix both elements, such as spoken-word poetry layered over music.
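Since valence corresponds to musical positivity and energy to arousal, the two descriptors together span a simple valence-arousal plane, and a coarse mood label can be read off the quadrant a track falls into. The 0.5 split and the label names in this sketch are my own illustrative choices, not part of VibeNet:

```python
def mood_quadrant(valence: float, energy: float) -> str:
    """Map a (valence, energy) pair to one of four coarse mood labels.

    The 0.5 threshold and the labels are illustrative choices,
    not something VibeNet defines.
    """
    if valence >= 0.5:
        # Positive-sounding tracks: split by intensity.
        return "happy/excited" if energy >= 0.5 else "calm/content"
    # Negative-sounding tracks: split by intensity.
    return "angry/tense" if energy >= 0.5 else "sad/subdued"

print(mood_quadrant(0.9, 0.8))  # high valence, high energy
print(mood_quadrant(0.2, 0.1))  # low valence, low energy
```

A playlist generator could group a library by these labels, e.g. putting "calm/content" tracks into a morning mix.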

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vibenet-0.0.1.tar.gz (16.4 MB)


Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vibenet-0.0.1-py3-none-any.whl (16.4 MB)


File details

Details for the file vibenet-0.0.1.tar.gz.

File metadata

  • Download URL: vibenet-0.0.1.tar.gz
  • Upload date:
  • Size: 16.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for vibenet-0.0.1.tar.gz
Algorithm Hash digest
SHA256 bfcca88a2b36131723e27e2d3480a797d5dac2a06626f87a512b914f8a4078be
MD5 2cf47b3380f9d3fd132b359c646f7693
BLAKE2b-256 650688cadbe7950d40655d7989a7f91b3e38a288025efed884274f3de72a2114

See more details on using hashes here.
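To check a downloaded archive against the published digest, the standard-library hashlib module is enough; the file path in the commented comparison assumes the archive sits in the current directory:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against the SHA256 published above before installing:
expected = "bfcca88a2b36131723e27e2d3480a797d5dac2a06626f87a512b914f8a4078be"
# print(sha256_of("vibenet-0.0.1.tar.gz") == expected)
```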

File details

Details for the file vibenet-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: vibenet-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 16.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for vibenet-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ddf735f925bc7d571dc387a4197b5ba9236c67ac9b30410bab06bd8fbffe674c
MD5 2b445e902e54b538fadaad7396abaf28
BLAKE2b-256 1ae20a2d60cefc1368c8be21508bcfc1810ba460cf72b19703199a1ef6294208

See more details on using hashes here.
