
pydataknot

Audio feature selection, model training, and hyperparameter optimization for Max/MSP DataKnot classifiers.

Installation

Requires Python >= 3.9

We recommend creating a virtual Python environment:

python3 -m venv venv
source venv/bin/activate

Install via pip:

pip install --upgrade pip
pip install pydataknot

If you're performing hyperparameter optimization and want to use the Optuna Dashboard to inspect results:

pip install "pydataknot[dashboard]"

Usage

pydataknot includes a set of CLI tools that take as input JSON files produced by dk.classcreate in Max/MSP. To start, record a dataset using dk.classcreate and write it to a JSON file.

Feature Selection

Select a subset of audio features for classification based on Minimum Redundancy Maximum Relevance (mRMR). The goal is to select features that correlate highly with the classes (maximum relevance) while having low correlation with each other (minimum redundancy).

This command will select 12 features from the dataset in dataset.json:

pydk-select data=dataset.json num_features=12

This will copy the dataset.json file, add the selected feature indices, and save the copy to an output directory outputs/feature_selection/{current-date}/{current-time}/.
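For intuition, the greedy mRMR criterion can be sketched in a few lines. This is a simplified illustration, not pydataknot's implementation: it scores both relevance and redundancy with absolute Pearson correlation, and the function name mrmr_select is hypothetical.

```python
import numpy as np

def mrmr_select(X, y, num_features):
    """Greedy mRMR sketch: maximize |corr(feature, label)| (relevance)
    minus mean |corr(feature, already-selected)| (redundancy)."""
    n_feats = X.shape[1]
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_feats)]
    )
    selected = [int(np.argmax(relevance))]          # most relevant feature first
    while len(selected) < num_features:
        best_j, best_score = None, -np.inf
        for j in range(n_feats):
            if j in selected:
                continue
            redundancy = np.mean(
                [abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) for s in selected]
            )
            score = relevance[j] - redundancy       # relevance minus redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Toy data: feature 0 predicts the class, feature 1 duplicates it,
# feature 2 is noise. mRMR picks feature 0 first, then skips the duplicate.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000).astype(float)
f0 = y + 0.3 * rng.normal(size=1000)
X = np.stack([f0, f0.copy(), rng.normal(size=1000)], axis=1)
sel = mrmr_select(X, y, 2)
```

Note how the duplicated feature is skipped even though it is just as relevant as feature 0: its redundancy with the already-selected feature cancels out its relevance.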

Model Training

Train MLP classifiers using PyTorch.

This command will train an MLP on the dataset in dataset.json; the MLP will have two hidden layers of 16 neurons each (note the quotes around the layers list) and will be trained for 500 epochs:

pydk-train data=dataset.json mlp.hidden_layers="[16,16]" mlp.max_iter=500

Note: if you use the JSON file output by the feature selection step, training will use that subset of selected features.

Similar to feature selection, this will copy the input JSON file and save it, together with the trained model, in an output directory outputs/feature_selection/{current-date}/{current-time}/. dk.classmatch can load this JSON file and will automatically use the model trained here.

Run pydk-train --help to see a list of all possible arguments, or check out flucoma-torch for more detailed information on them.
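For a sense of the architecture that mlp.hidden_layers="[16,16]" describes, here is a shape-level forward-pass sketch with untrained random weights. This is not pydataknot's PyTorch implementation, and the input/output sizes below are made up for illustration:

```python
import numpy as np

def mlp_forward(x, layer_sizes, rng):
    """Forward pass through an MLP: ReLU on hidden layers, softmax output.
    layer_sizes = [n_inputs, *hidden_layers, n_classes]."""
    n_layers = len(layer_sizes) - 1
    for i in range(n_layers):
        # Untrained random weights: this only illustrates the shapes
        W = rng.normal(scale=0.1, size=(layer_sizes[i], layer_sizes[i + 1]))
        b = np.zeros(layer_sizes[i + 1])
        x = x @ W + b
        if i < n_layers - 1:
            x = np.maximum(x, 0.0)                  # ReLU on hidden layers only
    e = np.exp(x - x.max(axis=1, keepdims=True))    # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 24))                    # 8 examples, 24 audio features
# hidden_layers="[16,16]" -> two hidden layers of 16 neurons; 4 output classes
probs = mlp_forward(batch, [24, 16, 16, 4], rng)
```

Each row of probs is a probability distribution over the classes, which is the kind of output dk.classmatch uses to pick a class.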

Hyperparameter Search

Perform a search over MLP parameters to find an architecture and training settings optimized for your dataset. This uses Optuna under the hood.

This command will perform a hyperparameter search consisting of 1000 trials (1000 different MLPs will be trained), with each MLP trained for 100 epochs:

pydk-optimize data=dataset.json mlp.max_iter=100 n_trials=1000

This could take several hours to complete.

The best resulting model and its associated hyperparameters will be saved in the output directory outputs/optimize_classifier/{current-date}/{current-time}/. dk.classmatch can load the model file and will automatically use the model trained here.

See flucoma-torch for a more detailed listing of possible arguments.
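Conceptually, each trial samples a set of hyperparameters, trains a model, and reports a validation loss; the sampler then proposes new candidates. Here is a stripped-down sketch using plain random search (Optuna's TPE sampler is smarter) and a toy objective standing in for actual model training; all names and parameter ranges below are invented for illustration:

```python
import random

def objective(params):
    """Toy stand-in for validation loss. A real study would train an MLP
    with these hyperparameters and return its validation loss instead."""
    lr, hidden = params["learning_rate"], params["hidden_size"]
    return (lr - 0.01) ** 2 * 1e4 + abs(hidden - 32) / 32

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        # Sample one "trial": log-uniform learning rate, categorical layer size
        params = {
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "hidden_size": rng.choice([8, 16, 32, 64]),
        }
        loss = objective(params)
        if loss < best_loss:                         # keep the best trial so far
            best_params, best_loss = params, loss
    return best_params, best_loss

best, best_loss = random_search(n_trials=200)
```

With enough trials, the search converges on the region of hyperparameter space where the toy loss is lowest, which is exactly the role n_trials plays in pydk-optimize.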

Feature Optimization

The number of input features can also be included in the optimization by setting optimize_features=true. This performs mRMR feature selection with different values of num_features to test which leads to the best-performing model.

pydk-optimize data=dataset.json mlp.max_iter=100 n_trials=1000 optimize_features=true

Optuna Dashboard

To view all results in a web dashboard, you can use Optuna Dashboard. It is an optional dependency; to install it:

pip install "pydataknot[dashboard]"

During an optimization study, results are stored in an SQLite database file in the output directory corresponding to the study. Whether a .sqlite3 file is saved depends on the sqlite argument, which defaults to true.

For example:

optuna-dashboard sqlite:///outputs/optimize_classifier/2025-09-25/19-58-57/classifier_study.sqlite3
