Swedish folk music audio analysis and dance style classification

Maintainers

svnoak

NeckenML Analyzer

Swedish folk music audio analysis and dance style classification using machine learning.

Overview

NeckenML Analyzer is a Python package that provides advanced audio analysis and automatic dance style classification for Swedish folk music. It uses a combination of signal processing, machine learning, and domain-specific heuristics to:

- Analyze audio features: BPM, meter (ternary/binary), swing ratio, vocal presence, articulation, bounciness, and more
- Classify dance styles: Polska, Hambo, Vals, Polka, Schottis, Snoa, and other Swedish folk dance types
- Assess authenticity: Distinguish traditional folk recordings from modern/electronic interpretations

Features

Comprehensive Audio Analysis
- Tempo and beat detection using Madmom RNN (optimized for rubato in folk music)
- Meter classification (3/4 ternary vs 2/4 or 4/4 binary)
- MusiCNN embeddings for audio texture fingerprinting
- Vocal vs instrumental detection
- Swing ratio calculation
- Articulation analysis (smooth/staccato/punchy)
- Folk-specific features (Polska vs Hambo signatures)

Machine Learning Classification
- Pre-trained RandomForest classifier included
- Hierarchical decision-making (metadata → ML → heuristics)
- Confidence scores for each prediction
- Support for model retraining with custom data

Extensible Architecture
- Abstract AudioSource interface for flexible audio acquisition
- Built-in file-based source
- Easy to implement custom sources (S3, HTTP, streaming, etc.)
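
To make the swing ratio feature more concrete, here is a minimal sketch of one common way to define it: the fraction of each beat taken up by its first subdivision. This is an illustration only, not the package's implementation; the function name `swing_ratio` and its inputs are assumptions made for the example. Under this definition 0.5 means straight subdivisions and roughly 0.67 means a triplet feel.

```python
from statistics import median


def swing_ratio(beat_times, offbeat_times):
    """Median fraction of each beat taken up by its first subdivision.

    beat_times: ascending beat onsets in seconds.
    offbeat_times: one off-beat onset per beat interval, in seconds.
    (Both input layouts are assumptions for this sketch.)
    """
    ratios = []
    for start, end, off in zip(beat_times, beat_times[1:], offbeat_times):
        length = end - start
        if length <= 0 or not (start < off < end):
            continue  # skip malformed intervals
        ratios.append((off - start) / length)
    if not ratios:
        raise ValueError("no usable beat intervals")
    return median(ratios)


# Straight eighths: the off-beat falls halfway through each beat.
straight = swing_ratio([0.0, 0.5, 1.0], [0.25, 0.75])
# Swung eighths: the off-beat falls two thirds of the way through.
swung = swing_ratio([0.0, 0.6, 1.2], [0.4, 1.0])
print(round(straight, 2), round(swung, 2))
```

Using the median rather than the mean keeps a few mistimed onsets from skewing the estimate, which matters for recordings with rubato.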

Installation

1. Install the package

For core functionality (classification with pre-computed features):

pip install neckenml-analyzer

For full audio analysis capabilities, you'll need additional dependencies that require system libraries:

# Install system dependencies (Ubuntu/Debian)
sudo apt-get update
sudo apt-get install -y libsndfile1 ffmpeg gcc g++

# Install audio analysis dependencies
pip install librosa soundfile madmom

# Note: essentia-tensorflow is not available via pip and requires manual installation
# See: https://essentia.upf.edu/installing.html

Note: The audio analysis features (AudioAnalyzer) require essentia-tensorflow, which needs to be built from source. If you only need to classify tracks using pre-computed features, the base package is sufficient.

2. Set up PostgreSQL

NeckenML Analyzer uses PostgreSQL with the pgvector extension for storing embeddings:

# Create database
createdb neckenml

# Enable pgvector extension
psql neckenml -c "CREATE EXTENSION vector;"
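
For orientation, the sketch below shows how an embedding could be serialized for a pgvector column. The table and column names (`track_embeddings`, `track_id`, `embedding`) are hypothetical and are not taken from the package's schema; only the `vector(n)` column type and the bracketed text format come from pgvector itself.

```python
EMBEDDING_DIM = 217  # embedding size quoted elsewhere in this README

# Hypothetical table definition; the package's real schema may differ.
CREATE_TABLE_SQL = (
    "CREATE TABLE IF NOT EXISTS track_embeddings ("
    "track_id text PRIMARY KEY, "
    f"embedding vector({EMBEDDING_DIM}))"
)


def to_pgvector_literal(values):
    """Format numbers in pgvector's text form, e.g. '[1.0,2.5]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


literal = to_pgvector_literal([0.0] * EMBEDDING_DIM)
# With a database driver, pass `literal` as a query parameter, for example:
#   INSERT INTO track_embeddings (track_id, embedding) VALUES (%s, %s::vector)
```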

3. Download pre-trained models

The analyzer requires Essentia's MusiCNN models (not included due to licensing):

# Create models directory
mkdir -p ~/.neckenml/models

# Download MusiCNN embedding model
wget https://essentia.upf.edu/models/feature-extractors/musicnn/msd-musicnn-1.pb \
  -O ~/.neckenml/models/msd-musicnn-1.pb

# Download voice/instrumental classifier
wget https://essentia.upf.edu/models/audio-event-recognition/voice_instrumental/voice_instrumental-musicnn-msd-1.pb \
  -O ~/.neckenml/models/voice_instrumental-musicnn-msd-1.pb
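
After downloading, a quick check that both files landed in the expected directory can save a confusing error later. The helper below is a hypothetical convenience script, not part of the package; it only uses the file names and directory from the commands above.

```python
from pathlib import Path

# File names taken from the download commands above.
REQUIRED_MODELS = (
    "msd-musicnn-1.pb",
    "voice_instrumental-musicnn-msd-1.pb",
)


def missing_models(model_dir="~/.neckenml/models"):
    """Return the required model file names not present in model_dir."""
    directory = Path(model_dir).expanduser()
    return [name for name in REQUIRED_MODELS if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```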

Quick Start

from neckenml import AudioAnalyzer, StyleClassifier
from neckenml.sources import FileAudioSource

# Set up audio source (file-based)
source = FileAudioSource(audio_dir="/path/to/your/audio/files")

# Initialize analyzer with audio source
analyzer = AudioAnalyzer(
    audio_source=source,
    model_dir="~/.neckenml/models"  # Optional, uses default if not specified
)

# Analyze an audio file (track_id should match filename without extension)
features = analyzer.analyze(track_id="my_track")

# The features dict contains:
# - bpm: Tempo in beats per minute
# - meter: 'ternary' or 'binary'
# - swing_ratio: 0.0-1.0 (0.5 = straight, 0.67 = triplet feel)
# - vocal_probability: 0.0-1.0 (vocal vs instrumental)
# - embedding: 217-dimensional feature vector
# - and many more...

# Classify dance style
classifier = StyleClassifier()
result = classifier.classify(features)

print(f"Detected style: {result['primary_style']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Secondary styles: {result['secondary_styles']}")

# Example output:
# Detected style: Polska
# Confidence: 85.0%
# Secondary styles: [('Slängpolska', 0.65)]
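
Because each prediction carries a confidence score, callers will often want to treat low-confidence results differently. The sketch below assumes only the result shape shown in the example output above (`primary_style`, `confidence`, `secondary_styles`); the `summarize` helper and the 0.6 threshold are illustrative choices, not part of the package.

```python
def summarize(result, min_confidence=0.6):
    """Return a display label, flagging low-confidence predictions."""
    style = result["primary_style"]
    confidence = result["confidence"]
    if confidence >= min_confidence:
        return style
    # Below the threshold: mark the label as uncertain and list alternatives.
    alternatives = ", ".join(name for name, _ in result.get("secondary_styles", []))
    suffix = f" (or: {alternatives})" if alternatives else ""
    return f"{style}?{suffix}"


example = {
    "primary_style": "Polska",
    "confidence": 0.85,
    "secondary_styles": [("Slängpolska", 0.65)],
}
print(summarize(example))
print(summarize({**example, "confidence": 0.4}))
```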

Advanced Usage

Artifact Persistence for Fast Re-analysis

Store expensive-to-compute artifacts once, then re-analyze instantly without touching audio files:

from neckenml import AudioAnalyzer, compute_derived_features

# Initial analysis with artifact storage.
# `analyzer` is an AudioAnalyzer instance, created as in Quick Start.
result = analyzer.analyze_file(
    file_path="/path/to/track.mp3",
    return_artifacts=True
)

features = result["features"]          # Derived features
artifacts = result["raw_artifacts"]     # Raw data to store

# Store artifacts in database (AnalysisSource.raw_data JSONB column).
# `db` stands in for your own persistence layer.
db.store(artifacts)

# Later: Fast re-analysis from stored artifacts (300x faster!)
new_features = compute_derived_features(artifacts)

# Re-classify with updated model (no audio needed!).
# `my_model` stands in for your retrained classifier.
new_features = compute_derived_features(artifacts, new_classifier=my_model)

Performance: Re-classify 1000 tracks in ~2 minutes instead of 8+ hours!

See Artifact Persistence Documentation for details.
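
The idea behind artifact persistence can be shown without the package at all: keep the expensive raw output in a JSON-serializable form, and make the derived step a cheap pure function of it. In the sketch below the `beat_times` key and the `derive_features` function are hypothetical stand-ins, not the package's actual artifact format.

```python
import json


def derive_features(artifacts):
    """Cheap, repeatable step: turn stored beat times into a tempo estimate."""
    beats = artifacts["beat_times"]
    intervals = [later - earlier for earlier, later in zip(beats, beats[1:])]
    mean_interval = sum(intervals) / len(intervals)
    return {"bpm": 60.0 / mean_interval}


# Expensive step (stand-in): pretend these came from beat tracking on audio.
artifacts = {"beat_times": [0.0, 0.5, 1.0, 1.5, 2.0]}

stored = json.dumps(artifacts)    # what would go into a JSONB column
restored = json.loads(stored)     # later, without access to the audio file
features = derive_features(restored)
print(features)
```

If the derivation logic changes, only `derive_features` needs to be re-run over the stored rows; the audio never has to be decoded again.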

Custom Audio Source

Implement the AudioSource interface for custom audio acquisition:

from neckenml.sources import AudioSource
import os

class CloudStorageAudioSource(AudioSource):
    """Fetch audio from cloud object storage"""

    def __init__(self, bucket_name, storage_client):
        self.client = storage_client
        self.bucket = bucket_name

    def fetch_audio(self, track_id: str) -> str:
        """Download audio file from cloud storage and return local path"""
        local_path = f"/tmp/{track_id}.mp3"
        self.client.download_file(
            bucket=self.bucket,
            key=f"audio/{track_id}.mp3",
            destination=local_path
        )
        return local_path

    def cleanup(self, file_path: str) -> None:
        """Clean up temporary file"""
        if os.path.exists(file_path):
            os.remove(file_path)

# Use custom source
source = CloudStorageAudioSource(bucket_name="my-music-bucket", storage_client=my_client)
analyzer = AudioAnalyzer(audio_source=source)
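
The fetch/cleanup pair suggests a lifecycle: fetch a local copy, use it, then always clean up. The self-contained sketch below illustrates that lifecycle with a stand-in base class (`AudioSourceSketch`) that mirrors the two methods shown above; it does not import the package, and the `fetched` helper is a hypothetical convenience. It also uses `tempfile.mkstemp` so that concurrent fetches of the same track do not collide on a fixed path.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from contextlib import contextmanager


class AudioSourceSketch(ABC):
    """Stand-in mirroring the fetch_audio/cleanup methods shown above."""

    @abstractmethod
    def fetch_audio(self, track_id: str) -> str: ...

    @abstractmethod
    def cleanup(self, file_path: str) -> None: ...


class TempFileSource(AudioSourceSketch):
    """Writes placeholder bytes to a temp file instead of downloading."""

    def fetch_audio(self, track_id: str) -> str:
        fd, path = tempfile.mkstemp(prefix=f"{track_id}_", suffix=".mp3")
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"placeholder audio bytes")
        return path

    def cleanup(self, file_path: str) -> None:
        if os.path.exists(file_path):
            os.remove(file_path)


@contextmanager
def fetched(source, track_id):
    """Yield a local path and always clean it up afterwards."""
    path = source.fetch_audio(track_id)
    try:
        yield path
    finally:
        source.cleanup(path)


with fetched(TempFileSource(), "my_track") as local_path:
    existed_during = os.path.exists(local_path)
exists_after = os.path.exists(local_path)
print(existed_during, exists_after)
```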

Retraining the Classifier

Train a custom model with your own labeled data:

from neckenml.training import TrainingService
import numpy as np

# Prepare training data
# You can use the Dansbart.se dataset (see Training Data section below)
# or analyze your own collection of labeled audio files
embeddings = np.array([...])  # Nx217 feature vectors from analyzer
labels = ["Polska", "Hambo", "Polska", ...]  # Dance style labels

# Train new model
trainer = TrainingService(model_path="./my_custom_model.pkl")
trainer.train_from_data(embeddings, labels)

# The classifier will automatically use the new model
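
Before training, it can help to sanity-check the data. The helper below is a hypothetical pre-flight check, not part of `TrainingService`; it assumes only that each feature vector has 217 values, as stated above, and that every class should have at least a couple of examples (the `min_per_class` default is an arbitrary choice).

```python
from collections import Counter

EXPECTED_DIM = 217  # feature vector length quoted in this README


def check_training_data(embeddings, labels, min_per_class=2):
    """Raise ValueError on shape problems; return per-class counts otherwise."""
    if len(embeddings) != len(labels):
        raise ValueError("embeddings and labels differ in length")
    for index, vector in enumerate(embeddings):
        if len(vector) != EXPECTED_DIM:
            raise ValueError(
                f"row {index} has {len(vector)} values, expected {EXPECTED_DIM}"
            )
    counts = Counter(labels)
    rare = sorted(name for name, n in counts.items() if n < min_per_class)
    if rare:
        raise ValueError(f"too few examples for: {', '.join(rare)}")
    return counts


rows = [[0.0] * EXPECTED_DIM for _ in range(4)]
print(check_training_data(rows, ["Polska", "Hambo", "Polska", "Hambo"]))
```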

Training Data

To train or improve the classifier, you need labeled audio data with dance style classifications. The following public dataset is available:

Dansbart.se Open Dataset

Dansbart.se provides a public dataset of Swedish folk music with human-validated dance style classifications:

- Content: Audio analysis features, dance style classifications, and human feedback/corrections
- License: CC BY 4.0 (free to use with attribution)
- Format: JSON via REST API
- Details: https://dansbart.se/dataset-info.html

The dataset includes:

- Audio features (BPM, meter, embeddings) generated by neckenml-analyzer
- Primary dance style classifications with confidence scores
- Human feedback and corrections (valuable ground truth for improving models)
- Track metadata and structure information

This dataset is ideal for:

- Training custom dance style classifiers
- Validating model improvements against human feedback
- Benchmarking classification accuracy
- Research on Swedish folk music analysis
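
Since human corrections are the most reliable ground truth in such a dataset, one reasonable approach is to train only on records that carry a human-validated label. The sketch below illustrates that filtering step on in-memory records. The field names (`embedding`, `predicted_style`, `human_label`) are hypothetical and do not describe the actual Dansbart.se API schema; consult the dataset documentation for the real field names.

```python
def to_training_pairs(records):
    """Keep only records that have both an embedding and a human-validated label."""
    pairs = []
    for record in records:
        label = record.get("human_label")   # hypothetical field name
        embedding = record.get("embedding")  # hypothetical field name
        if label and embedding:
            pairs.append((embedding, label))
    return pairs


# Toy records with shortened embeddings, for illustration only.
sample = [
    {"embedding": [0.1, 0.2], "predicted_style": "Polska", "human_label": "Hambo"},
    {"embedding": [0.3, 0.4], "predicted_style": "Vals", "human_label": None},
]
print(to_training_pairs(sample))
```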

Supported Dance Styles

Ternary (3/4 meter):

- Polska
- Slängpolska
- Hambo
- Vals (Waltz)
- Springlek
- Mazurka

Binary (2/4, 4/4 meter):

- Polka
- Schottis
- Snoa
- Gånglåt
- Engelska
- Marsch
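
The grouping above maps naturally onto a lookup table, which is handy when you want to narrow candidate styles once the meter is known. The snippet below is an illustrative sketch built from the two lists; `STYLES_BY_METER` and `candidate_styles` are names chosen for the example, not identifiers from the package.

```python
# Styles grouped by meter, copied from the lists above.
STYLES_BY_METER = {
    "ternary": ["Polska", "Slängpolska", "Hambo", "Vals", "Springlek", "Mazurka"],
    "binary": ["Polka", "Schottis", "Snoa", "Gånglåt", "Engelska", "Marsch"],
}


def candidate_styles(meter):
    """Return the styles compatible with a detected meter ('ternary' or 'binary')."""
    try:
        return STYLES_BY_METER[meter]
    except KeyError:
        raise ValueError(f"unknown meter: {meter!r}") from None


print(candidate_styles("ternary"))
```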

Documentation

- Installation Guide - Detailed setup instructions
- Quick Start - Getting started examples
- Extending - Custom AudioSource implementations and model training
- Scripts - CLI tools for evaluation and optimization

Architecture

NeckenML Analyzer uses a multi-stage pipeline:

1. Audio Acquisition: Flexible AudioSource interface
2. Feature Extraction: Madmom RNN for beat/rhythm analysis + Librosa for onsets
3. Embedding Generation: MusiCNN for 217-dim audio fingerprints
4. Folk Features: Domain-specific rhythm and meter analysis
5. Classification: Hierarchical decision tree (metadata → ML → heuristics)
6. Artifact Persistence: Store raw analysis outputs for instant re-classification
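
The hierarchical classification step (metadata, then ML, then heuristics) can be pictured as a chain of fallbacks in which the first sufficiently confident stage wins. The sketch below is one way to express that pattern and is not the package's actual logic; all function names, the track dictionary layout, and the 0.5 threshold are assumptions made for illustration.

```python
def classify_hierarchically(track, stages, min_confidence=0.5):
    """Try each (name, stage) in order; return the first confident answer."""
    for name, stage in stages:
        outcome = stage(track)
        if outcome is None:
            continue  # this stage has no opinion
        style, confidence = outcome
        if confidence >= min_confidence:
            return {"style": style, "confidence": confidence, "source": name}
    return {"style": None, "confidence": 0.0, "source": None}


# Hypothetical stages, ordered metadata -> ML -> heuristics.
def from_metadata(track):
    title = track.get("title", "").lower()
    return ("Hambo", 1.0) if "hambo" in title else None


def from_model(track):
    return track.get("model_prediction")  # e.g. ("Polska", 0.42) or None


def from_heuristics(track):
    return ("Vals", 0.6) if track.get("meter") == "ternary" else ("Polka", 0.6)


stages = [
    ("metadata", from_metadata),
    ("ml", from_model),
    ("heuristics", from_heuristics),
]
track = {"title": "Untitled", "meter": "ternary", "model_prediction": ("Polska", 0.42)}
print(classify_hierarchically(track, stages))
```

In this example the title gives no hint, the model's guess falls below the threshold, and the heuristic stage supplies the answer. Recording which stage produced the result makes later debugging easier.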

Requirements

- Python 3.9+
- PostgreSQL with pgvector extension
- Essentia pre-trained models (see installation instructions)

Contributing

We welcome contributions! Please see our Contributing Guide for details on:

- How to report bugs and suggest enhancements
- Development setup and coding standards
- Testing requirements and guidelines
- Pull request process

Whether you're fixing a bug, adding a feature, or improving documentation, your contributions help make Swedish folk music more accessible through technology.

License

MIT License - see LICENSE file for details.

Citation

If you use NeckenML Analyzer in your research, please cite:

@software{neckenml_analyzer,
  title = {NeckenML Analyzer: Swedish Folk Music Analysis and Classification},
  author = {NeckenML Contributors},
  year = {2025},
  url = {https://github.com/svnoak/neckenml-analyzer}
}

Acknowledgments

- Built with Essentia audio analysis library
- MusiCNN models by Jordi Pons et al.
- Training data provided by Dansbart.se under CC BY 4.0 license
- Powered by the Swedish folk music community

Release history

- 0.3.1: Mar 10, 2026
- 0.3.0: Feb 11, 2026
- 0.2.3: Jan 12, 2026
- 0.2.2: Jan 9, 2026 (this version)
- 0.2.1: Dec 26, 2025
- 0.2.0: Dec 23, 2025
- 0.1.1.post1: Dec 19, 2025
- 0.1.1: Dec 19, 2025
- 0.1.0: Dec 19, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

neckenml_analyzer-0.2.2.tar.gz (62.0 kB), uploaded Jan 9, 2026, Source

Built Distribution

neckenml_analyzer-0.2.2-py3-none-any.whl (62.5 kB), uploaded Jan 9, 2026, Python 3

File details

Details for the file neckenml_analyzer-0.2.2.tar.gz.

File metadata

Download URL: neckenml_analyzer-0.2.2.tar.gz
Upload date: Jan 9, 2026
Size: 62.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for neckenml_analyzer-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`4804649de5766178025dbbae6f84884645d1ab48b94a80ad59df56b65b8403cf`
MD5	`7f45b7807d99532d3298770db19d4ae4`
BLAKE2b-256	`330f7adf9bbd1b4c2fff7c181756349bffe5c28cd3a0557170319ac6c9497aef`
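
A downloaded archive can be checked against the published SHA256 digest with Python's standard `hashlib` module. The sketch below is a generic verification helper; the function names are chosen for the example, and the expected digest is the one listed above for the source distribution.

```python
import hashlib

# SHA256 digest published above for neckenml_analyzer-0.2.2.tar.gz.
EXPECTED_SHA256 = "4804649de5766178025dbbae6f84884645d1ab48b94a80ad59df56b65b8403cf"


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest matches the expected one."""
    return sha256_of(path) == expected.lower()
```

Calling `verify("neckenml_analyzer-0.2.2.tar.gz")` on the downloaded file should return `True` if the file is intact. Reading in chunks keeps memory use flat regardless of file size.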


Provenance

The following attestation bundles were made for neckenml_analyzer-0.2.2.tar.gz:

Publisher: release.yml on svnoak/neckenml-analyzer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: neckenml_analyzer-0.2.2.tar.gz
- Subject digest: 4804649de5766178025dbbae6f84884645d1ab48b94a80ad59df56b65b8403cf
- Sigstore transparency entry: 810253123
- Sigstore integration time: Jan 9, 2026
Source repository:
- Permalink: svnoak/neckenml-analyzer@3625dc52876cbae5fdef3e743b5940b507228466
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/svnoak
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@3625dc52876cbae5fdef3e743b5940b507228466
- Trigger Event: release

File details

Details for the file neckenml_analyzer-0.2.2-py3-none-any.whl.

File metadata

Download URL: neckenml_analyzer-0.2.2-py3-none-any.whl
Upload date: Jan 9, 2026
Size: 62.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for neckenml_analyzer-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`54fc073b6d6814bfb27cf4f65e2478d54c2c41d730391b84ee47524628a2be14`
MD5	`60481fd4fb47e2a27b4f1f86db650804`
BLAKE2b-256	`c7279cd265f6a484675eef8821e6e03b09e0e256c3316bd5dd19ac75f042b58a`


Provenance

The following attestation bundles were made for neckenml_analyzer-0.2.2-py3-none-any.whl:

Publisher: release.yml on svnoak/neckenml-analyzer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: neckenml_analyzer-0.2.2-py3-none-any.whl
- Subject digest: 54fc073b6d6814bfb27cf4f65e2478d54c2c41d730391b84ee47524628a2be14
- Sigstore transparency entry: 810253129
- Sigstore integration time: Jan 9, 2026
Source repository:
- Permalink: svnoak/neckenml-analyzer@3625dc52876cbae5fdef3e743b5940b507228466
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/svnoak
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@3625dc52876cbae5fdef3e743b5940b507228466
- Trigger Event: release
