
Audio module for the OpenMMLA platform


🎙️ OpenMMLA Audio


Audio module of mBox, an open multimodal learning analytics platform. For more details, see the mBox System Design.


Related Modules

Installation

Uber Server Setup

Before setting up the audio base, you need a server that hosts the InfluxDB, Redis, and Mosquitto services. See the mbox-uber module for setup instructions.
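Before moving on, it can help to confirm that the three services are reachable from the machine that will run the audio base. The following is a minimal sketch, not part of openmmla-audio: it assumes each service listens on its usual default port (8086 for InfluxDB, 6379 for Redis, 1883 for Mosquitto), which may differ from your mbox-uber configuration.

```python
import socket

# Assumption: the services use their common default ports.
# Adjust these if your uber server is configured differently.
DEFAULT_PORTS = {"InfluxDB": 8086, "Redis": 6379, "Mosquitto": 1883}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(host: str, ports: dict[str, int] = DEFAULT_PORTS) -> dict[str, bool]:
    """Map each service name to whether its port is reachable on host."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling `check_services("<your-uber-server-host>")` returns a dictionary such as `{"InfluxDB": True, ...}`. An open port only shows that something is listening there; it does not verify credentials or service health.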

Audio Base & Server Setup

  1. Clone the repository

    git clone https://github.com/ucph-ccs/mbox-audio.git
    
  2. Install required system dependencies

    Mac
    brew install ffmpeg portaudio mecab llvm
    echo 'export PATH="/opt/homebrew/opt/llvm/bin:$PATH"' >> ~/.zshrc
    echo 'export LDFLAGS="-L/opt/homebrew/opt/llvm/lib"' >> ~/.zshrc
    echo 'export CPPFLAGS="-I/opt/homebrew/opt/llvm/include"' >> ~/.zshrc
    source ~/.zshrc
    
    Ubuntu 24.04
    sudo apt update && sudo apt upgrade
    sudo apt install build-essential git ffmpeg python3-pyaudio libsndfile1 libasound-dev
    wget https://files.portaudio.com/archives/pa_stable_v190700_20210406.tgz
    tar -zxvf pa_stable_v190700_20210406.tgz
    cd portaudio
    ./configure && make
    sudo make install
    cd ..  # return to the parent directory before continuing
    
    Raspberry Pi Bullseye or later
    sudo apt-get install portaudio19-dev
    
  3. Install openmmla-audio

    Set up Conda environment
    # For Raspberry Pi
    wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
    bash Miniforge3-$(uname)-$(uname -m).sh
    
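    # For Linux
    wget "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-$(uname -m).sh"
    bash Miniconda3-latest-Linux-$(uname -m).sh

    # For Mac (the Miniconda installer is named MacOSX, not Darwin as uname reports)
    curl -LO "https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-$(uname -m).sh"
    bash Miniconda3-latest-MacOSX-$(uname -m).sh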
    
    Install Audio Base
    conda create -c conda-forge -n audio-base python==3.10.12 -y
    conda activate audio-base
    # Run one of the following, depending on your platform:
    pip install openmmla-audio[base]  # Linux and Raspberry Pi (bash)
    pip install 'openmmla-audio[base]'  # Mac (zsh needs the quotes around the brackets)
    
    Install Audio Server
    conda create -c conda-forge -n audio-server python==3.10.12 -y
    conda activate audio-server
    # Run one of the following, depending on your platform:
    pip install openmmla-audio[server]  # Linux and Raspberry Pi (bash)
    pip install 'openmmla-audio[server]'  # Mac (zsh needs the quotes around the brackets)
    
  4. Set up folder structure

    cd mbox-audio
    ./reset.sh
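After completing the steps above, a quick sanity check of the environment can save debugging time later. This is a sketch under stated assumptions rather than an official tool: it only checks that `ffmpeg` and `git` are on the PATH and that pip reports an installed `openmmla-audio` distribution. The helper names are made up for illustration.

```python
import shutil
from importlib import metadata


def missing_tools(tools=("ffmpeg", "git")) -> list[str]:
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def installed_version(package: str = "openmmla-audio"):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Missing tools:", missing_tools() or "none")
    print("openmmla-audio version:", installed_version() or "not installed")
```

Run it inside the activated Conda environment (for example `audio-base` or `audio-server`), since the package is installed per environment.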
    

Standalone Setup

If you want to run the entire mBox Audio system on a single machine (not in distributed mode), follow these steps:

  1. Set up the Uber Server on your machine following the instructions in the mbox-uber module.

  2. Install system dependencies as described in the "Audio Base & Server Setup" section above.

  3. Install openmmla-audio with all dependencies:

    conda create -c conda-forge -n mbox-audio python==3.10.12 -y
    conda activate mbox-audio
    # Run one of the following, depending on your platform:
    pip install openmmla-audio[all]  # Linux and Raspberry Pi (bash)
    pip install 'openmmla-audio[all]'  # Mac (zsh needs the quotes around the brackets)
    
  4. Set up the folder structure:

    cd mbox-audio
    ./reset.sh
    

This setup will allow you to run all components of mBox Audio on a single machine.

Usage

Real-time Audio Analyzer

Real-time Analyzer Pipeline

To run the real-time audio analyzer:

  1. Start Audio Server (optional)

    ./server.sh
    

    This script runs the distributed audio services on the audio servers. To configure your audio server cluster, refer to the nginx setup running on your uber server. Default setup: three servers (server-01.local, server-02.local, server-03.local) with five services (transcribe, separate, infer, enhance, vad).

  2. Start Audio Base

    ./run.sh [-b <num_bases>] [-s <num_synchronizers>] [-l <standalone>] [-p <speech_separation>]
    

    Default parameter settings:

    • -b: 3 (number of audio bases)
    • -s: 1 (number of audio base synchronizers)
    • -l: false (not standalone)
    • -p: false (no speech separation)

    💡 You can switch the operating mode of the audio base during runtime:

    Mode       Description
    Record     Record audio segments without recognition
    Recognize  Recognize pre-recorded segments and synchronize
    Full       Default: record and recognize simultaneously

  3. Start Control Base

    ./control.sh
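If you launch the audio base from another script, the flags of `./run.sh` listed above can be assembled programmatically. The sketch below is illustrative only: `build_run_command` is a hypothetical helper, and it assumes that the boolean flags `-l` and `-p` accept the literal strings `true` and `false`, as the documented defaults suggest.

```python
import shlex


def _as_flag(value: bool) -> str:
    # Assumption: run.sh expects the literal strings "true" / "false".
    return "true" if value else "false"


def build_run_command(num_bases: int = 3, num_synchronizers: int = 1,
                      standalone: bool = False,
                      speech_separation: bool = False) -> list[str]:
    """Build the argument list for ./run.sh from the documented flags."""
    return [
        "./run.sh",
        "-b", str(num_bases),
        "-s", str(num_synchronizers),
        "-l", _as_flag(standalone),
        "-p", _as_flag(speech_separation),
    ]


if __name__ == "__main__":
    # Print the command for a single audio base in standalone mode.
    print(shlex.join(build_run_command(num_bases=1, standalone=True)))
```

The returned list can be passed to `subprocess.run(...)` from the `mbox-audio` directory. Returning a list rather than a single string avoids shell quoting issues.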
    

Post-time Audio Analyzer

Post-time Analyzer Pipeline

To run the post-time audio analyzer:

  1. Create a speaker corpus folder: /audio_db/post-time/[audio_file_name]/
  2. Add speaker audio files named [speaker_name].wav to the corpus folder
  3. Run the analyzer:
    cd examples/
    conda activate mbox-audio
    
    # Process a single audio file (supported formats: wav, m4a, mp3).
    # If -f is omitted, all files in the /audio/post-time/origin/ folder are processed.
    python3 run_audio_post_analyzer.py -f [audio_file_name.wav] 
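Steps 1 and 2 above can also be scripted. The following sketch is an assumption-laden illustration, not project code: `prepare_speaker_corpus` is a hypothetical helper, the corpus root is taken to be `audio_db/post-time` relative to the project folder, and the corpus folder is assumed to be named after the audio file without its extension.

```python
import shutil
from pathlib import Path


def prepare_speaker_corpus(audio_file: str, speaker_wavs: dict[str, str],
                           db_root: str = "audio_db/post-time") -> Path:
    """Copy each speaker sample to <db_root>/<audio name>/<speaker_name>.wav."""
    # Assumption: the folder name is the audio file name without its extension.
    corpus_dir = Path(db_root) / Path(audio_file).stem
    corpus_dir.mkdir(parents=True, exist_ok=True)
    for speaker, wav_path in speaker_wavs.items():
        src = Path(wav_path)
        if src.suffix.lower() != ".wav":
            raise ValueError(f"{src} is not a .wav file")
        # Rename on copy so the file name matches the speaker name.
        shutil.copy(src, corpus_dir / f"{speaker}.wav")
    return corpus_dir
```

For example, `prepare_speaker_corpus("meeting.wav", {"alice": "samples/a.wav"})` would create `audio_db/post-time/meeting/alice.wav`; the file names here are placeholders.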
    

Logs & Visualization

After running the analyzers, logs and visualizations are stored in the /logs/ and /visualizations/ folders.

FAQ

Citation

If you use this code in your research, please cite the following paper:

@inproceedings{inproceedings,
  author = {Li, Zaibei and Jensen, Martin and Nolte, Alexander and Spikol, Daniel},
  year = {2024},
  month = {03},
  pages = {785-791},
  title = {Field report for Platform mBox: Designing an Open MMLA Platform},
  doi = {10.1145/3636555.3636872}
}


License

This project is licensed under the MIT License - see the LICENSE file for details.
