Audio Alignment and Recognition in Python

Audalign

Package for processing and aligning audio files using audio fingerprinting, cross-correlation, cross-correlation with spectrograms, or visual alignment techniques.

This package offers tools to align many recordings of the same event. It has two main purposes: to accurately align recordings, and to process the audio files before alignment. All main functions are exposed in the audalign.__init__ file. The recognizers themselves are objects in the recognizer directory, and their configurations live in the config directories.

Alignments are primarily accomplished with fingerprinting. Where fingerprinting fails, correlation, correlation with spectrograms, and visual alignment techniques can be used to get a closer result. After an initial alignment is found, that alignment can be passed to fine_align, which will find smaller, relative alignments to the main one.


Each alignment technique offers a different degree of control over accuracy. Fingerprinting parameters can generally be set to get consistent results using its config's set_accuracy method. Visual alignment has many adjustable parameters and requires case-by-case tuning. Parameters for correlation focus on sample rate or scipy's find_peaks.

Noisereduce is very useful for this application and a wrapper is implemented for ease of use. Uniformly leveling prior to noise reduction using uniform_level_file boosts quiet but important sound features.

Alignment and recognition results are returned as a dictionary. If an output directory is given, silence is placed before each target file so that the files are aligned, and the padded files are written to the output directory along with an audio file containing their combined sum. Each alignment and recognition result includes a rankings key. It helps gauge the strength of an alignment but is not definitive proof. Values range from 1 to 10.
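The padding-and-sum step described above can be sketched in plain Python. This is only an illustration of the idea, not audalign's implementation: the pad_and_mix helper, the in-memory sample lists, and the shape of the shifts dictionary are all assumptions made for the example.

```python
def pad_and_mix(tracks, shifts, sample_rate):
    """Prepend silence to each track according to its shift, then sum them.

    tracks: dict of name -> list of float samples (hypothetical in-memory audio)
    shifts: dict of name -> seconds of silence to place before that track
    """
    padded = {}
    for name, samples in tracks.items():
        # Silence is just a run of zero-valued samples.
        silence = [0.0] * round(shifts[name] * sample_rate)
        padded[name] = silence + list(samples)

    # The mix is as long as the longest padded track.
    total_len = max(len(samples) for samples in padded.values())
    mix = [0.0] * total_len
    for samples in padded.values():
        for i, value in enumerate(samples):
            mix[i] += value
    return padded, mix


# Toy data: two "recordings" at a sample rate of 2 samples per second.
tracks = {"a.wav": [1.0, 2.0, 3.0], "b.wav": [2.0, 3.0]}
shifts = {"a.wav": 0.0, "b.wav": 0.5}  # b starts half a second after a
padded, mix = pad_and_mix(tracks, shifts, sample_rate=2)
print(padded["b.wav"])
print(mix)
```

Once every track starts with the right amount of silence, the files line up sample for sample, which is why a simple element-wise sum produces the combined audio.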


All formats that ffmpeg or libav support are supported here.

All fingerprints are stored in memory in the FingerprintRecognizer and must be saved to disk with the save_fingerprinted_files method in order to persist them.

Regular file recognition can also be done with Audalign, similar to dejavu.

For more details on implementation and results, see the wiki.

Installation

Install from PyPI:

Don't forget to install ffmpeg/libav (see "Getting ffmpeg set up" below)!

pip install audalign

OR

git clone https://github.com/benfmiller/audalign.git
cd audalign/
pip install .

OR

Download and extract audalign, then run the following in the extracted directory:

pip install .

Optional dependencies

  • visrecognize: additional recognizer based on spectrogram image comparison. pip install audalign[visrecognize]
  • noisereduce: wrapper utils around timsainb/noisereduce. pip install audalign[noisereduce]

Recognizers

There are currently four included recognizers, each with its own config object.

import audalign as ad

fingerprint_rec = ad.FingerprintRecognizer()
correlation_rec = ad.CorrelationRecognizer()
cor_spec_rec = ad.CorrelationSpectrogramRecognizer()
visual_rec = ad.VisualRecognizer() # requires installing the optional visrecognize dependencies

fingerprint_rec.config.set_accuracy(3)
# recognizer.config.some_item

For more info about the configuration objects, check out the wiki or the config objects themselves. They are relatively nicely commented.
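One quick way to see which settings a config object exposes is to list its public, non-callable attributes. The sketch below is generic Python and does not rely on any audalign internals: list_config_items is a hypothetical helper, and DemoConfig is a stand-in class used only so the example runs without audalign installed.

```python
def list_config_items(config):
    """Return a dict of the public, non-callable attributes of a config object."""
    items = {}
    for name in dir(config):
        if name.startswith("_"):
            continue  # skip private and dunder attributes
        value = getattr(config, name)
        if not callable(value):
            items[name] = value  # keep plain settings, drop methods
    return items


class DemoConfig:
    """Hypothetical stand-in for a recognizer config."""

    sample_rate = 44100  # class-level setting

    def __init__(self):
        self.locality = None  # instance-level setting

    def set_accuracy(self, level):
        self.accuracy = level


settings = list_config_items(DemoConfig())
for name, value in settings.items():
    print(f"{name} = {value!r}")
```

With audalign installed, the same helper could be pointed at something like fingerprint_rec.config, though the names it prints depend on the installed version.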

Recognizers are then passed to recognize and align functions.

results = ad.align("target/folder/", recognizer=fingerprint_rec)
results = ad.align("target/folder/", recognizer=correlation_rec)
results = ad.align("target/folder/", recognizer=cor_spec_rec)
results = ad.align("target/folder/", recognizer=visual_rec)
results = ad.recognize("target/file1", "target/file2", recognizer=fingerprint_rec)
results = ad.recognize("target/file1", "target/folder", recognizer=fingerprint_rec)
# or
results = ad.target_align(
    "target/files",
    "target/folder/",
    destination_path="write/alignments/to/folder",
    recognizer=fingerprint_rec
)
# or
results = ad.align_files(
    "target/file1",
    "target/file2",
    destination_path="write/alignments/to/folder",
    recognizer=correlation_rec
)

# results can then be sent to fine_align
fine_results = ad.fine_align(
    results,
    recognizer=cor_spec_rec,
)

Correlation is more precise than fingerprinting and always returns a best alignment, whereas fingerprinting can return no alignment. The max_lags setting is very important for fine aligning, and locality can be very useful for all alignments and recognitions.
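To see why correlation always yields a best alignment and why the lag limit matters, here is a minimal pure-Python sketch of cross-correlation over a bounded lag window. It is not audalign's implementation: best_lag and its max_lag parameter (measured in samples here) are hypothetical names for this illustration.

```python
def best_lag(reference, target, max_lag):
    """Return the lag in [-max_lag, max_lag] with the highest correlation score.

    A positive lag means the shared content appears `lag` samples later
    in `target` than in `reference`.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_value in enumerate(reference):
            j = i + lag
            if 0 <= j < len(target):
                score += ref_value * target[j]  # dot product at this lag
        if score > best_score:
            best, best_score = lag, score
    return best


reference = [0, 0, 1, 3, 1, 0, 0, 0]
target = [0, 0, 0, 0, 1, 3, 1, 0]  # same pulse, two samples later

print(best_lag(reference, target, max_lag=3))  # window contains the true lag
print(best_lag(reference, target, max_lag=1))  # window too small
```

Some lag always has the highest score, so a result is always returned. When the search window is too small to contain the true offset, the result is simply the best candidate inside the window, which is why the lag limit deserves care when fine aligning.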

Other Functions

# wrapper for timsainb/noisereduce, optional dependency
ad.remove_noise_file(
    "target/file",
    5, # noise start in seconds
    20, # noise end in seconds
    "destination/file",
    alt_noise_filepath="different/sound/file",
    prop_decrease=0.5, # If you want noise half reduced
)

ad.remove_noise_directory(
    "target/directory/",
    "noise/file",
    5, # noise start in seconds
    20, # noise end in seconds
    "destination/directory",
    prop_decrease=0.5, # If you want noise half reduced
)

ad.uniform_level_file(
    "target/file",
    "destination",
    mode="normalize",
    width=5,
)

ad.plot("file.wav") # Plots spectrogram with peaks overlaid
ad.convert_audio_file("audio.wav", "audio.mp3") # Also converts video files to audio files
ad.get_metadata("file.wav") # Returns metadata from ffmpeg/libav

You can easily recalculate the alignment shifts from previous results using recalc_shifts. You can then write those shifts using write_shifts_from_results, which also lets you use different source files for the alignments.

recalculated_results = ad.recalc_shifts(older_results)
ad.write_shifts_from_results(recalculated_results, "destination", "source_files_folder_or_file_list")

Fingerprinting

Fingerprinting is only used in the FingerprintRecognizer object. Alignments are not independent of each other: fingerprints created before an alignment are reused for that alignment. The exception is fine_align, where new fingerprints are always created.

Running recognitions will fingerprint all files in the recognitions not already fingerprinted.

fingerprint_rec = ad.FingerprintRecognizer()

fingerprint_rec.fingerprint_file("test_file.wav")

# or

fingerprint_rec.fingerprint_directory("audio/directory")

Fingerprints are stored in fingerprint_rec and can be saved with:

fingerprint_rec.save_fingerprinted_files("save_file.json") # or .pickle
# or loaded with
fingerprint_rec.load_fingerprinted_files("save_file.json") # or .pickle
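For intuition about how fingerprint-based alignment works in general, and why it can return no alignment at all, here is a toy sketch in the style of dejavu-like systems. It is not audalign's hashing scheme: the fingerprint and vote_offset helpers, the fan_out parameter, and the (time, frequency) peak tuples are assumptions made for illustration.

```python
from collections import Counter


def fingerprint(peaks, fan_out=2):
    """Hash pairs of nearby spectrogram peaks.

    peaks: list of (time, frequency) tuples sorted by time.
    Returns a list of (hash_key, anchor_time) tuples.
    """
    prints = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            # The key depends only on the two frequencies and their time gap,
            # so it is unchanged when the whole recording is shifted in time.
            prints.append(((f1, f2, t2 - t1), t1))
    return prints


def vote_offset(prints_a, prints_b):
    """Return (offset, votes) for the most common time difference, or None."""
    index = {}
    for key, t_a in prints_a:
        index.setdefault(key, []).append(t_a)

    votes = Counter()
    for key, t_b in prints_b:
        for t_a in index.get(key, []):
            votes[t_b - t_a] += 1

    if not votes:
        return None  # no shared hashes, so no alignment can be proposed
    return votes.most_common(1)[0]


peaks_a = [(0, 440), (3, 880), (5, 660), (9, 550)]
peaks_b = [(t + 7, f) for t, f in peaks_a]  # same peaks, 7 frames later
unrelated = [(0, 100), (2, 200)]

match = vote_offset(fingerprint(peaks_a), fingerprint(peaks_b))
no_match = vote_offset(fingerprint(peaks_a), fingerprint(unrelated))
print(match, no_match)
```

Matching hashes from the same event agree on one time offset, so that offset collects the most votes. Two recordings with no hashes in common produce no votes, which is the situation where a correlation-based recognizer can still provide a result.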

Resources and Tools

For more tools to align audio and video files, see forart/HyMPS's collection of alignment resources.

forart/HyMPS also has many other audio/video resources.

Getting ffmpeg set up

You can use ffmpeg or libav.

Mac (using homebrew):

# ffmpeg
brew install ffmpeg # recent Homebrew releases no longer accept --with-* options for core formulae

####    OR    #####

# libav
brew install libav --with-libvorbis --with-sdl --with-theora

Linux (using apt):

# ffmpeg
apt-get install ffmpeg libavcodec-extra

####    OR    #####

# libav
apt-get install libav-tools libavcodec-extra

Windows:

  1. Download and extract ffmpeg from Windows binaries provided here.
  2. Add the ffmpeg /bin folder to your PATH environment variable

OR

  1. Download and extract libav from Windows binaries provided here.
  2. Add the libav /bin folder to your PATH environment variable
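After installation, a short check can confirm that a converter is reachable on PATH. This sketch uses only the standard library. The find_audio_backend helper is a hypothetical name, and it assumes the usual executable names: ffmpeg for ffmpeg and avconv for libav.

```python
import shutil


def find_audio_backend():
    """Return (name, path) for the first converter found on PATH, or None."""
    for name in ("ffmpeg", "avconv"):
        path = shutil.which(name)  # None when the executable is not on PATH
        if path:
            return name, path
    return None


backend = find_audio_backend()
if backend is None:
    print("Neither ffmpeg nor avconv (libav) was found on PATH")
else:
    print(f"Found {backend[0]} at {backend[1]}")
```

If nothing is found on Windows, opening a new terminal after editing the PATH variable is often enough for the change to take effect.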
