Core functions for bioacoustic recognizers.

BriteKit

Getting Started

Introduction
License
Installation
Configuration
Downloading Recordings
Managing Training Data
Training
Testing
Tuning
Ensembling
Calibrating

More Information

Spectrograms
Backbones and Classifier Heads
Metrics (PR-AUC and ROC-AUC)
Data Augmentation
Development Environment

Reference Guides

Getting Started

Introduction

BriteKit (Bioacoustic Recognizer Technology Kit) is a Python package that facilitates the development of bioacoustic recognizers using deep learning. It provides a command-line interface (CLI) as well as a Python API, to support functions such as:

downloading recordings from Xeno-Canto, iNaturalist, and YouTube (optionally using Google AudioSet metadata)
managing training data in a SQLite database
training models
testing, tuning and calibrating models
reporting
deployment and inference

To view a list of BriteKit commands, type britekit --help. You can also get help for individual commands, e.g. britekit train --help describes the train command. When accessing BriteKit from Python, the britekit.commands namespace contains a function for each command, as documented here. The classes used by the commands can also be accessed, and are documented here.

License

BriteKit is distributed under the terms of the MIT license.

Installation

It is best to install BriteKit in a virtual environment, such as a Python venv. Once you have that set up, install the BriteKit package using pip:

pip install britekit

In Windows environments, you then need to uninstall and reinstall PyTorch:

pip uninstall -y torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

Note that cu126 refers to CUDA 12.6.

Once BriteKit is installed, initialize a working environment using the init command:

britekit init --dest=<directory path>

This creates the directories needed and installs sample files. If you omit --dest, it will create directories under the current working directory.

Configuration

Configuration parameters are documented here. After running britekit init, the file yaml/base_config.yaml contains all parameters in YAML format. Most CLI commands have a --config argument that allows you to specify the path to a YAML file that overrides selected parameters. For example, when running the train command, you could provide a YAML file containing the following:

train:
  model_type: "effnet.4"
  learning_rate: .002
  drop_rate: 0.1
  num_epochs: 20

This overrides the default values for model_type, learning_rate, drop_rate and num_epochs. When using the API, you can update configuration parameters like this:

import britekit as bk
cfg = bk.get_config()
cfg.train.model_type = "effnet.4"
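
The override behaviour described above can be pictured as a recursive merge of a small dictionary onto a larger one. The following is only an illustrative sketch with plain dictionaries: the merge_overrides helper and the default values shown are hypothetical, and this is not necessarily how BriteKit implements it.

```python
from copy import deepcopy


def merge_overrides(base: dict, overrides: dict) -> dict:
    """Return a copy of base with the values in overrides applied recursively."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections such as "train" or "audio".
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical defaults, standing in for the contents of yaml/base_config.yaml.
base = {
    "train": {"model_type": "effnet.3", "learning_rate": 0.001, "num_epochs": 10},
    "audio": {"min_freq": 0},
}
# Overrides equivalent to part of the YAML example above.
overrides = {"train": {"model_type": "effnet.4", "learning_rate": 0.002, "num_epochs": 20}}

config = merge_overrides(base, overrides)
print(config["train"]["model_type"])  # effnet.4
print(config["audio"]["min_freq"])    # 0, untouched by the override
```

Only the parameters named in the override change; every other section keeps its default value.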

Downloading Recordings

The inat, xeno and youtube commands make it easy to download recordings from iNaturalist, Xeno-Canto and YouTube, respectively. For iNaturalist it is important to provide the scientific name. For example, to download recordings of the American Green Frog (Lithobates clamitans), type:

britekit inat --name "lithobates clamitans" --output <output-path>

For Xeno-Canto, use --name for the common name or --sci for the scientific name. For YouTube, specify the ID of the corresponding video. For example, specify --id K_EsxukdNXM to download the audio from https://www.youtube.com/watch?v=K_EsxukdNXM.
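
If you have a list of YouTube URLs rather than IDs, the ID is the value of the v query parameter. A minimal sketch using only the Python standard library (the youtube_video_id helper is hypothetical and not part of BriteKit):

```python
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> str:
    """Extract the value of the v query parameter from a YouTube watch URL."""
    query = parse_qs(urlparse(url).query)
    if "v" not in query:
        raise ValueError(f"no video ID found in {url!r}")
    return query["v"][0]


print(youtube_video_id("https://www.youtube.com/watch?v=K_EsxukdNXM"))  # K_EsxukdNXM
```

The returned string is what you would pass to the --id argument.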

BriteKit also supports downloads using Google AudioSet, which is metadata that classifies sounds in YouTube videos. AudioSet was released in March 2017, so videos uploaded after that date are not included. Also, some videos that are tagged in AudioSet are no longer available. Type britekit audioset --help for more information.

Managing Training Data

Once you have a collection of recordings, the steps to prepare it for training are:

Extract spectrograms from recordings and insert them into the training database.
Curate the training spectrograms.
Create a pickle file from the training data. Then provide the path to the pickle file when running training.

Suppose we have a folder called recordings/cow. To generate spectrograms and insert them into the training database, we could type britekit extract-all --name Cow --dir recordings/cow. This will create a SQLite database in data/training.db and populate it with spectrograms using the default configuration. To browse the database, you can use DB Browser for SQLite, or a similar application. That will reveal the following tables:

Class: classes that the recognizer will be trained to identify, e.g. American Robin
Category: categories such as Bird, Mammal or Amphibian
Source: sources of recordings, e.g. Xeno-Canto or iNaturalist.
Recording: individual recordings
Segment: fixed-length sections of recordings
SpecGroup: groups of spectrograms that share spectrogram parameters
SpecValue: spectrograms, each referencing a Segment and SpecGroup
SegmentClass: associations between Segment and Class, to identify the classes that occur in a segment
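
As an alternative to a GUI browser, the tables can also be listed from Python with the standard sqlite3 module. This is a small sketch, not a BriteKit API: list_tables is a hypothetical helper, and it assumes only that the file is an ordinary SQLite database.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path: str) -> list[str]:
    """Return the names of all user tables in a SQLite database, sorted by name."""
    # closing() ensures the connection is closed; sqlite3's own context
    # manager only commits or rolls back a transaction.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Example (requires an existing database file):
# print(list_tables("data/training.db"))
```

Note that sqlite3.connect creates an empty database if the file does not exist, so check the path before calling it.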

There are commands to add or delete database records, e.g. add-cat and del-cat to add or delete a category record. In addition, specifying the --cat argument with the extract-all or extract-by-image commands will add the required category record if it does not exist. You can plot database spectrograms using plot-db, or plot spectrograms for recordings using plot-rec or plot-dir. Once you have a folder of spectrogram images, you can manually delete or copy some of them. The extract-by-image command will then extract only the spectrograms corresponding to the given images. Similarly, the del-spec command will delete spectrograms corresponding to the images in a directory.

It is important to tune spectrogram parameters such as height, width, maximum/minimum frequency and window length for your specific application. This is discussed more in the tuning section below, but for now be aware that you can set specific parameters in a YAML file to pass to an extract or plot command. For example:

audio:
  min_freq: 350
  max_freq: 4000
  win_length: .08
  spec_height: 192
  spec_width: 256

Note that the window length is specified as a fraction of a second, so .08 seconds in this example. That way the real window length does not vary if you change the sampling rate. As a rule of thumb, the sampling rate should be about 2.1 times the maximum frequency. Before training your first model, it is advisable to examine some spectrogram images and choose settings that seem reasonable as a starting point. For example, the frequency range needed for your application may be greater or less than the defaults.
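
The arithmetic behind these two points can be sketched as follows. The helper names are hypothetical, and rounding to the nearest integer is an assumption rather than a description of what BriteKit does internally.

```python
def suggested_sampling_rate(max_freq_hz: float, factor: float = 2.1) -> int:
    """Rule of thumb from the text: about 2.1 times the maximum frequency."""
    return round(max_freq_hz * factor)


def window_length_samples(win_length_sec: float, sampling_rate_hz: int) -> int:
    """Convert a window length in seconds to a whole number of samples."""
    return round(win_length_sec * sampling_rate_hz)


rate = suggested_sampling_rate(4000)        # for max_freq: 4000
print(rate)                                 # 8400
print(window_length_samples(0.08, rate))    # 672
print(window_length_samples(0.08, 16000))   # 1280
```

The window always spans 0.08 seconds; only the number of samples it contains changes with the sampling rate.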

The SpecGroup table allows you to easily experiment with different spectrogram settings. Running extract-all or extract-by-image creates spectrograms assigned to the default SpecGroup, if none is specified. Once you have curated some training data, use the reextract command to create another set of spectrograms, assigned to a different SpecGroup. That way you can keep spectrograms with different settings for easy experimentation.

Training

The pickle command creates a binary pickle file (data/training.pkl by default), which is the source of training data for the train command. Reading a binary file is much faster than querying the database, so this speeds up the training process. Also, this provides a simple way to select a SpecGroup, and/or a subset of classes for training. For training, you should always provide a config file to override some defaults. Here is an expanded version of the earlier example:

train:
  train_pickle: "data/low_freq.pkl"
  model_type: "effnet.4"
  head_type: "basic_sed"
  learning_rate: .002
  drop_rate: 0.1
  drop_path_rate: 0.1
  val_portion: 0.1
  num_epochs: 20

The model_type parameter can be "timm.x" for any model x supported by timm. However, many bioacoustic recognizers benefit from a smaller model than typical timm models. Therefore BriteKit provides a set of scalable models, such as "effnet.3" and "effnet.4", where larger numbers indicate larger models. The scalable models are:

Model	Original Name	Comments	Original Paper
dla	DLA	Slow and not good for large models, but often a good choice for very small models.	here
effnet	EfficientNetV2	Medium speed, widely used, useful for all sizes.	here
gernet	GerNet	Fast, useful for all but the smallest models.	here
hgnet	HgNetV2	Fast, useful for all but the smallest models.	not published
vovnet	VovNet	Medium-fast, useful for all sizes.	here

For very small models, say with fewer than 10 classes and just a few thousand training spectrograms, DLA and VovNet are good candidates. As model size increases, DLA becomes slower and less appropriate.

If head_type is not specified, BriteKit uses the default classifier head defined by the model. However, you can also specify any of the following head types:

Head Type	Description
basic	A basic non-SED classifier head.
effnet	The classifier head used in EfficientNetV2.
hgnet	The classifier head used in HgNetV2.
basic_sed	A basic SED head.
scalable_sed	A scalable SED head, for cases where the basic_sed head is larger than desired.

Specifying head_type="effnet" is sometimes helpful for other models such as DLA and VovNet. See the discussion of Backbones and Classifier Heads below for more information.

You can specify val_portion > 0 to run validation on a portion of the training data, or num_folds > 1 to run k-fold cross-validation. In the latter case, training output will be in logs/fold-0, logs/fold-1 etc. Otherwise output is under logs/fold-0. Output from the first training run is saved in logs/fold-0/version_0, and the version number is incremented in subsequent runs. To view graphs of the loss and learning rate, type tensorboard --logdir <log directory>. This will launch an embedded web server and display a URL that you can use to access it from a web browser.
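
For readers unfamiliar with k-fold cross-validation, the idea is that each item is used for validation in exactly one fold and for training in all the others. The sketch below illustrates this with plain index lists; k_fold_indices is a hypothetical helper, and the way BriteKit actually assigns spectrograms to folds may differ.

```python
def k_fold_indices(num_items: int, num_folds: int) -> list[tuple[list[int], list[int]]]:
    """Split range(num_items) into num_folds (train, validation) index pairs."""
    if not 2 <= num_folds <= num_items:
        raise ValueError("num_folds must be between 2 and num_items")
    # Fold i validates on every num_folds-th item, starting at index i.
    folds = [list(range(i, num_items, num_folds)) for i in range(num_folds)]
    splits = []
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return splits


for fold, (train, val) in enumerate(k_fold_indices(10, 5)):
    print(f"fold-{fold}: train on {len(train)} items, validate on {val}")
```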

Testing

To run a test, you need to annotate a set of test recordings, analyze them with your model or ensemble, and then run the rpt-test command. Annotations must be saved in a CSV file with a defined format. We recommend annotating each relevant sound (per-segment), but you can also do per-minute and per-recording annotations to save time. Per-recording annotations are defined in a CSV file with these columns:

Column	Description
recording	Just the stem of the recording name, e.g. XC12345, not XC12345.mp3.
classes	Defined classes found in the recording, separated by commas. For example: AMCR,BCCH,COYE.

Per-minute annotations are defined in a CSV file with these columns:

Column	Description
recording	Just the stem of the recording name, as above.
minute	1 for the first minute, 2 for the second, etc.
classes	Defined classes found in that minute, separated by commas.

Per-segment annotations are recommended, and are defined in a CSV file with these columns:

Column	Description
recording	Just the stem of the recording name, as above.
class	Identified class.
start_time	Where the sound starts, in seconds from the start of the recording.
end_time	Where the sound ends, in seconds from the start of the recording.
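
A per-segment annotation file can be produced with the standard csv module. The sketch below assumes the file has a header row using the column names above; the recording names, classes and times are made-up illustration values, and segment_annotations_csv is a hypothetical helper.

```python
import csv
import io
from pathlib import Path

FIELDNAMES = ["recording", "class", "start_time", "end_time"]


def segment_annotations_csv(rows: list[dict]) -> str:
    """Render per-segment annotations as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    # Path(...).stem drops the extension, giving XC12345 rather than XC12345.mp3.
    {"recording": Path("XC12345.mp3").stem, "class": "AMCR", "start_time": 1.5, "end_time": 4.0},
    {"recording": Path("XC12345.mp3").stem, "class": "BCCH", "start_time": 7.25, "end_time": 9.0},
]
print(segment_annotations_csv(rows))
```

Write the resulting text to annotations.csv in the test directory.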

Use the analyze command to analyze the recordings with your model or ensemble. For testing, be sure to specify --min_score 0. That way all predictions will be saved, not just those above a particular threshold, which is important when calculating metrics. See Metrics (PR-AUC and ROC-AUC) for more information.

It's usually best for a test to consist of a single directory of recordings, containing a file called annotations.csv. If that directory is called recordings and you run analyze specifying --output recordings/labels, you could generate test reports as follows:

britekit rpt-test -a recordings/annotations.csv -l labels -o <output-dir>

If your annotations were per-minute or per-recording, you would specify the --granularity minute or --granularity recording argument (--granularity segment is the default).
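
If you have start and end times but prefer to report per-minute annotations, the minute numbers can be derived as in the sketch below. minutes_covered is a hypothetical helper, and the treatment of sounds that end exactly on a minute boundary is an assumption.

```python
import math


def minutes_covered(start_time: float, end_time: float) -> list[int]:
    """Return the 1-based minute numbers touched by a sound, given times in seconds."""
    if start_time < 0 or end_time < start_time:
        raise ValueError("expected 0 <= start_time <= end_time")
    first = int(start_time // 60) + 1
    # A sound ending exactly on a minute boundary is not counted in the next minute.
    last = max(first, math.ceil(end_time / 60))
    return list(range(first, last + 1))


print(minutes_covered(12.0, 15.5))  # [1]
print(minutes_covered(58.0, 62.0))  # [1, 2]
```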

Tuning

Before tuning your model, you need to create a good test, as described in the previous section. Then you can use the tune command to find optimal settings for a given test. If you are only tuning inference parameters, you can run many iterations very quickly, since no training is needed. To tune training hyperparameters, many training runs are needed, which takes longer. You can also use the tune command to tune audio/spectrogram settings. In that case, every iteration extracts a new set of spectrograms, which takes even longer.

Here is a practical approach:

Review spectrogram plots with different settings, especially spec_duration, spec_width, spec_height, min_freq, max_freq and win_length. Then choose reasonable-looking initial settings. For example, if all the relevant sounds fall between 1000 and 5000 Hz, set min_freq and max_freq accordingly.
Do an initial tuning pass of the main training hyperparameters, especially model_type, head_type and num_epochs.
Based on the above, carefully tune the audio/spectrogram parameters.

This usually leads to a substantial improvement in scores (see Metrics (PR-AUC and ROC-AUC)), and then you can proceed to fine-tuning the training and inference parameters. For inference, it is usually worth tuning the audio_power parameter. If you are using a SED classifier head, it is also worth tuning segment_len and overlap. For training, it may be worth tuning the data augmentation hyperparameters, which are described in detail in the Data Augmentation section below.

To run the tune command, you would typically use a config YAML file as described earlier, plus a special tuning YAML file, as in this example:

- name: spec_width
  type: int
  bounds:
  - 256
  - 512
  step: 64

This gives the name of the parameter to tune, its data type, and the bounds and step sizes to try. In this case, we want to try spec_width values of 256, 320, 384, 448 and 512. You can also tune multiple parameters at the same time, by simply appending more definitions similar to this one. Parameters that have a choice of defined values rather than a range are specified like this:

- name: head_type
  type: categorical
  choices:
  - "effnet"
  - "hgnet"
  - "basic_sed"

When running the tune command, you can ask it to test all defined combinations based on the input, or to test a random sample. To try 100 random combinations, add the argument --tries 100. To tune audio/spectrogram parameters, add the --extract argument. To tune inference only, add the --notrain argument.
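
The difference between testing all combinations and testing a random sample can be sketched with the two parameter definitions above. This is an illustration only: values_from_bounds is a hypothetical helper, and the way the tune command actually samples combinations is not described here.

```python
import itertools
import random


def values_from_bounds(lower: int, upper: int, step: int) -> list[int]:
    """Expand inclusive integer bounds and a step into the list of values to try."""
    return list(range(lower, upper + 1, step))


search_space = {
    "spec_width": values_from_bounds(256, 512, 64),  # [256, 320, 384, 448, 512]
    "head_type": ["effnet", "hgnet", "basic_sed"],
}

# Exhaustive search: every combination of the defined values.
names = list(search_space)
all_combinations = [
    dict(zip(names, combo)) for combo in itertools.product(*search_space.values())
]
print(len(all_combinations))  # 15

# Random search: a sample of the combinations, analogous to --tries.
rng = random.Random(0)
for trial in rng.sample(all_combinations, k=4):
    print(trial)
```

The number of combinations grows multiplicatively with each added parameter, which is why a random sample is often more practical than an exhaustive search.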

Training is non-deterministic, and results for a given group of settings can vary substantially across multiple training runs. Therefore it is important to specify the --runs argument, indicating how many times training should be run for a given set of values.

As an example, to find the best spec_width value, we could type a command like this:

britekit tune -c yaml/my_train.yml -p yaml/my_tune.yml -a my_test/annotations.csv -o output/tune-spec-width --runs 5 --extract

This will perform an extract before each trial, and use the average score from 5 training runs in each case. Scores will be based on the given test, using macro-averaged ROC-AUC, although this can be changed with the --metric argument.
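
For reference, macro-averaged ROC-AUC is the unweighted mean of the per-class ROC-AUC values. The pure-Python sketch below computes ROC-AUC as the fraction of positive/negative pairs that are ranked correctly, which is a standard equivalent formulation. The labels and scores are made-up illustration data, and this is not BriteKit's own implementation.

```python
from statistics import mean


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC-AUC as the probability that a positive outranks a negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def macro_roc_auc(labels_by_class: dict, scores_by_class: dict) -> float:
    """Average the per-class ROC-AUC values, giving every class equal weight."""
    return mean(roc_auc(labels_by_class[c], scores_by_class[c]) for c in labels_by_class)


# One label and one score per segment, for each class.
labels = {"AMCR": [1, 0, 1, 0], "BCCH": [0, 1, 0, 0]}
scores = {"AMCR": [0.9, 0.2, 0.4, 0.6], "BCCH": [0.1, 0.8, 0.3, 0.2]}

print(roc_auc(labels["AMCR"], scores["AMCR"]))  # 0.75
print(roc_auc(labels["BCCH"], scores["BCCH"]))  # 1.0
print(macro_roc_auc(labels, scores))            # 0.875
```

Because each class carries equal weight, a rare class influences the macro average as much as a common one.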

Ensembling

TBD

Calibrating

TBD

More Information

Spectrograms

TBD

Backbones and Classifier Heads

TBD

Metrics (PR-AUC and ROC-AUC)

TBD

Data Augmentation

TBD

Development Environment

TBD
