Machine learning audio prediction experiments based on templates
Nkululeko
Overview
A project to detect speaker characteristics by machine learning experiments with a high-level interface.
The idea is to have a framework (based on e.g. sklearn and torch) that can be used by people who are not experienced programmers, as they mainly have to adapt an initialization parameter file per experiment.
- The latest features can be seen in the [ini-file options](./ini_file.md) that are used to control Nkululeko
- Below is a Hello World example that should get you started quickly.
- Here's a blog post on how to set up nkululeko on your computer.
- Here's a slide presentation about nkululeko
- Here's a video presentation about nkululeko
- Here's the 2022 LREC article on nkululeko
Here are some examples of typical output:
Confusion matrix
By default, Nkululeko displays results as a confusion matrix; for regression targets, the continuous values are binned into categories.
Epoch progression
The point when overfitting starts can sometimes be seen by looking at the results per epoch:
Feature importance
Using the explore interface, Nkululeko analyses the importance of acoustic features:
Feature distribution
And can show the distribution of specific features per category:
t-SNE plots
A t-SNE plot can give you an estimate of whether your acoustic features are useful at all:
Data distribution
Sometimes you only want to take a look at your data:
Installation
Create and activate a virtual Python environment and simply run
pip install nkululeko
Some examples for ini-files (which you use to control nkululeko) are in the tests folder.
Usage
Basically, you specify your experiment in an "ini" file (e.g. experiment.ini) and then call one of the Nkululeko interfaces to run the experiment like this:
python -m nkululeko.nkululeko --config experiment.ini
A basic configuration looks like this:
[EXP]
root = ./
name = exp_emodb
[DATA]
databases = ['emodb']
emodb = ./emodb/
emodb.split_strategy = speaker_split
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear']
[FEATS]
type = ['praat']
[MODEL]
type = svm
[EXPL]
model = tree
plot_tree = True
[PLOT]
combine_per_speaker = mode
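A configuration like the one above can be read back with Python's standard configparser; the following is a minimal sketch of parsing a subset of the example (Nkululeko's own loading code may differ). Note that list-valued options such as databases come back as plain strings and need evaluating:

```python
import ast
import configparser

# Subset of the example configuration shown above, as an inline string.
INI_TEXT = """
[EXP]
root = ./
name = exp_emodb
[DATA]
databases = ['emodb']
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear']
[MODEL]
type = svm
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

# Scalar options come back as plain strings...
print(config["EXP"]["name"])    # → exp_emodb
print(config["MODEL"]["type"])  # → svm

# ...while list-valued options are strings too and need evaluating:
databases = ast.literal_eval(config["DATA"]["databases"])
print(databases)                # → ['emodb']
```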
Read the Hello World example for initial usage with Emo-DB dataset.
Here is an overview of the interfaces:
- nkululeko.nkululeko: doing experiments
- nkululeko.demo: demo the current best model on command line
- nkululeko.test: predict a series of files with the current best model
- nkululeko.explore: perform data exploration
- nkululeko.augment: augment the current training data
Alternatively, there is a central "experiment" class that can be used by your own experiments.
There's my blog with tutorials:
- Introduction
- Nkululeko FAQ
- How to set up your first nkululeko project
- Setting up a base nkululeko experiment
- How to import a database
- Comparing classifiers and features
- Use Praat features
- Combine feature sets
- Classifying continuous variables
- Try out / demo a trained model
- Perform cross database experiments
- Meta parameter optimization
- How to set up wav2vec embedding
- How to soft-label a database
- Re-generate the progressing confusion matrix animation with a different framerate
- How to limit/filter a dataset
- Specifying database disk location
- Add dropout with MLP models
- Do cross-validation
- Combine predictions per speaker
- Run multiple experiments in one go
- Compare several MLP layer layouts with each other
- Import features from outside the software
- Explore feature importance
- Plot distributions for feature values
- Show feature importance
- Augment the training set
- Visualize clusters of acoustic features
- Visualize your data distribution
The framework is targeted at the speech domain and supports experiments where different classifiers are combined with different feature extractors.
Here's a rough UML-like sketch of the framework.
Currently, the following classifiers are implemented (integrated from sklearn):
- SVM, SVR, XGB, XGR, Tree, Tree_regressor, KNN, KNN_regressor, NaiveBayes, GMM
and the following ANNs:
- MLP, CNN (tbd)
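The classifier is selected by the type string in the ini's [MODEL] section; conceptually this amounts to a registry lookup like the following sketch. The class names and registry here are illustrative assumptions, not Nkululeko's actual dispatch code:

```python
# Hypothetical model registry: the ini's [MODEL] "type" string picks a class.
class SVMModel:
    """Stand-in for a wrapper around sklearn's SVC."""

class XGBModel:
    """Stand-in for a wrapper around XGBoost's classifier."""

class MLPModel:
    """Stand-in for a torch-based multi-layer perceptron."""

MODEL_REGISTRY = {"svm": SVMModel, "xgb": XGBModel, "mlp": MLPModel}

def make_model(model_type: str):
    """Instantiate the model class named by the ini's [MODEL] type entry."""
    try:
        return MODEL_REGISTRY[model_type]()
    except KeyError:
        raise ValueError(f"unknown model type: {model_type!r}")

print(type(make_model("svm")).__name__)  # → SVMModel
```

A registry like this keeps the mapping from config strings to implementations in one place, which is why adding a new classifier type is mostly a matter of registering one more class.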
Here's an animation that shows the progress of classification done with nkululeko
Initialization file
You could
- use a generic main Python file (like my_experiment.py),
- adapt the path to your nkululeko src,
- and then adapt an .ini file (again fitting at least the paths to src and data).
Here's an overview of the ini-file options
Hello World example
- NEW: I made a video to show you how to do this on Windows
- Set up Python on your computer, version >= 3.6
- Open a terminal/commandline/console window
- Test Python by typing `python`; Python should start with version >3 (NOT 2!). You can leave the Python interpreter by typing `exit()`
- Create a folder on your computer for this example, let's call it `nkulu_work`
- Get a copy of the Berlin emodb in audformat and unpack it into the same folder (`nkulu_work`)
- Make sure the folder is called "emodb" and contains the database files directly (not box-in-a-box)
- Also, in the `nkulu_work` folder:
  - Create a Python environment: `python -m venv venv`
  - Then, activate it:
    - under Linux / Mac: `source venv/bin/activate`
    - under Windows: `venv\Scripts\activate.bat`
    - if that worked, you should see `(venv)` in front of your prompt
  - Install the required packages in your environment: `pip install nkululeko`
  - Repeat until all error messages vanish (or fix them, or try to ignore them)...
- Now you should have two folders in your `nkulu_work` folder: emodb and venv
- Download a copy of the file exp_emodb.ini to the current working directory (`nkulu_work`)
- Run the demo: `python -m nkululeko.nkululeko --config exp_emodb.ini`
- Find the results in the newly created folder exp_emodb
  - Inspect `exp_emodb/images/run_0/emodb_xgb_os_0_000_cnf.png`
  - This is the main result of your experiment: a confusion matrix for the emodb emotional categories
- Inspect and play around with the demo configuration file that defined your experiment, then re-run.
- There are many ways to experiment with different classifiers and acoustic feature sets, all described here
Features
- Classifiers: Naive Bayes, KNN, Tree, XGBoost, SVM, MLP
- Feature extractors: Praat, Opensmile, openXBOW BoAW, TRILL embeddings, Wav2vec2 embeddings, audModel embeddings, ...
- Feature scaling
- Label encoding
- Binning (continuous to categorical)
- Online demo interface for trained models
Outlook
- Classifiers: CNN
- Feature extractors: mid-level descriptors, Mel-spectra
License
Nkululeko can be used under the MIT license
Changelog
Version 0.44.1
- bugfixing: feature importance: https://github.com/felixbur/nkululeko/issues/23
- bugfixing: loading csv database with filewise index https://github.com/felixbur/nkululeko/issues/24
Version 0.45.2
- bugfix: sample_selection in EXPL was required wrongly
Version 0.45.2
- added sample_selection for sample distribution plots
Version 0.45.1
- fixed dataframe.append bug
Version 0.45.0
- added auddim as features
- added FEATS store_format
- added device use to feat_audmodel
Version 0.44.1
- bugfixes
Version 0.44.0
- added scatter functions: tsne, pca, umap
Version 0.43.7
- added clap features
Version 0.43.6
- small bugs
Version 0.43.5
- because of difficulties with numba and audiomentations, audiomentations is now imported only when augmenting
Version 0.43.4
- added error when experiment type and predictor don't match
Version 0.43.3
- fixed further bugs and added augmentation to the test runs
Version 0.43.2
- fixed a bug when running continuous variable as classification problem
Version 0.43.1
- fixed test_runs
Version 0.43.0
- added augmentation module based on audiomentation
Version 0.42.0
- age labels should now be detected in databases
Version 0.41.0
- added feature tree plot
Version 0.40.1
- fixed a bug: additional test database was not label encoded
Version 0.40.0
- added EXPL section and first functionality
- added test module (for test databases)
Version 0.39.0
- added feature distribution plots
- added plot format
Version 0.38.3
- added demo mode with list argument
Version 0.38.2
- fixed a bug concerned with "no_reuse" evaluation
Version 0.38.1
- demo mode with file argument
Version 0.38.0
- fixed demo mode
Version 0.37.2
- mainly replaced pd.append with pd.concat
Version 0.37.1
- fixed bug preventing praat feature extraction from working
Version 0.37.0
- fixed bug: csv import not detecting multiindex
Version 0.36.3
- published as a pypi module
Version 0.36.0
- added entry nkululeko.py script
Version 0.35.0
- fixed bug that prevented scaling (normalization)
Version 0.34.2
- smaller bug fixed concerning the loss_string
Version 0.34.1
- smaller bug fixes and tried Soft_f1 loss
Version 0.34.0
- smaller bug fixes and debug outputs
Version 0.33.0
- added GMM as a model type
Version 0.32.0
- added audmodel embeddings as features
Version 0.31.0
- added models: tree and tree_reg
Version 0.30.0
- added models: bayes, knn and knn_reg
Version 0.29.2
- fixed hello world example
Version 0.29.1
- bug fix for 0.29
Version 0.29.0
- added a new FeatureExtractor class to import external data
Version 0.28.2
- removed some Pandas warnings
- added no_reuse function to database.load()
Version 0.28.1
- with database.value_counts show only the data that is actually used
Version 0.28.0
- made "label_data" configuration automatic and added "label_result"
Version 0.27.0
- added "label_data" configuration to label data with trained model (so now there can be train, dev and test set)
Version 0.26.1
- Fixed some bugs caused by the multitude of feature sets
- Added possibility to distinguish between absolute or relative paths in csv datasets
Version 0.26.0
- added the rename_speakers functionality to prevent identical speaker names in datasets
Version 0.25.1
- fixed bug where no features were chosen if none were selected
Version 0.25.0
- made selectable features universal for feature sets
Version 0.24.0
- added multiple feature sets (will simply be concatenated)
Version 0.23.0
- added selectable features for Praat interface
Version 0.22.0
- added David R. Feinberg's Praat features, praise also to parselmouth
Version 0.21.0
- Revoked 0.20.0
- Added support for only_test = True, to enable later testing of trained models with new test data
Version 0.20.0
- implemented reuse of trained and saved models
Version 0.19.0
- added "max_duration_of_sample" for datasets
Version 0.18.6
- added support for learning and dropout rate as argument
Version 0.18.5
- added support for epoch number as argument
Version 0.18.4
- added support for ANN layers as arguments
Version 0.18.3
- added reuse of test and train file sets
- added parameter to scale continuous target values: target_divide_by
Version 0.18.2
- added preference of local dataset specs to global ones
Version 0.18.1
- added regression value display for confusion matrices
Version 0.18.0
- added leave one speaker group out
Version 0.17.2
- fixed scaler, added robust
Version 0.17.0
- Added minimum duration for test samples
Version 0.16.4
- Added possibility to combine predictions per speaker (with mean or mode function)
Version 0.16.3
- Added minimal sample length for databases
Version 0.16.2
- Added k-fold-cross-validation for linear classifiers
Version 0.16.1
- Added leave-one-speaker-out for linear classifiers
Version 0.16.0
- Added random sample splits
Hashes for nkululeko-0.45.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 56760815c518d62e112324cd5f128213fdb2e6adb6564b5329bf4d4454d2b057
MD5 | 2216cd3a5fd309ddad10a863567a39e8
BLAKE2b-256 | 3d4c5b55a6b110371defc4e1f6aed3ad1794da668bb683c583d77153df0a20c5