Audio module for the OpenMMLA platform

OpenMMLA Audio

Audio module of the mBox multimodal learning analytics system. For more details, please refer to mBox System Design.

Uber Server Setup

Before setting up the audio base, set up a server that hosts the InfluxDB, Redis, and Mosquitto services. Please refer to the mbox-uber module.
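
Before moving on, it can help to confirm that the three services are reachable from the machine that will run the audio base. The sketch below is not part of mbox-uber; it assumes the services listen on their upstream default ports (InfluxDB 8086, Redis 6379, Mosquitto 1883), that `nc` is installed, and the function names are hypothetical. Adjust the ports if your mbox-uber configuration overrides them.

```shell
# Upstream default ports of the three services (assumption: your
# mbox-uber configuration has not changed them).
default_port() {
    case "$1" in
        influxdb)  echo 8086 ;;
        redis)     echo 6379 ;;
        mosquitto) echo 1883 ;;
        *)         echo "unknown service: $1" >&2; return 1 ;;
    esac
}

# Report whether each service port on host $1 accepts TCP connections.
check_uber_services() {
    host="$1"
    for svc in influxdb redis mosquitto; do
        port="$(default_port "$svc")"
        if nc -z -w 2 "$host" "$port" 2>/dev/null; then
            echo "$svc reachable on $host:$port"
        else
            echo "$svc NOT reachable on $host:$port"
        fi
    done
}
```

For example, `check_uber_services uber-server.local`, where `uber-server.local` is a placeholder for your own uber server's hostname.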

Audio Base & Server Setup

Downloading and setting up the mbox-audio module takes three steps:
(1) Clone the repository from GitHub to your local home directory.
(2) Install required system dependencies.
(3) Install openmmla-audio.

  1. Clone the repository from GitHub

    git clone https://github.com/ucph-ccs/mbox-audio.git
    
  2. Install the required dependencies

    • Mac
      # Install ffmpeg, portaudio-19.7.0, mecab-0.996 (required by sacrebleu for NLP collection), llvm-16.0.6
      brew install ffmpeg
      brew install portaudio
      brew install mecab
      brew install llvm
      
      # Add llvm to your PATH and compiler flags (these paths assume Homebrew on Apple Silicon; on Intel Macs the prefix is /usr/local):
      echo 'export PATH="/opt/homebrew/opt/llvm/bin:$PATH"' >> ~/.zshrc
      echo 'export LDFLAGS="-L/opt/homebrew/opt/llvm/lib"' >> ~/.zshrc
      echo 'export CPPFLAGS="-I/opt/homebrew/opt/llvm/include"' >> ~/.zshrc
      source ~/.zshrc
      
    • Ubuntu 24.04
      sudo apt update && sudo apt upgrade
      sudo apt install build-essential
      sudo apt install git
      sudo apt install ffmpeg
      sudo apt install python3-pyaudio
      sudo apt update && sudo apt install -y libsndfile1
      
      # Install portaudio
      sudo apt install libasound-dev
      # Download the portaudio archive from: http://files.portaudio.com/download.html
      wget https://files.portaudio.com/archives/pa_stable_v190700_20210406.tgz
      # Extract the archive
      tar -zxvf pa_stable_v190700_20210406.tgz
      # Enter the directory and compile
      cd portaudio
      ./configure && make
      sudo make install
      
    • Raspberry Pi Bullseye or later
      # Install the PortAudio development package (needed to build PyAudio)
      sudo apt-get install portaudio19-dev
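
After installing the system dependencies, a quick check that the command-line tools are on your PATH can save a failed install later. This is a minimal sketch; `missing_cmds` is a hypothetical helper, and it only covers executables such as git and ffmpeg, not libraries such as PortAudio.

```shell
# Print each command from the argument list that is not found on PATH.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Example (prints nothing when both tools are installed):
# missing_cmds git ffmpeg
```

An empty result means every listed tool was found; any name printed still needs to be installed.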
      
  3. Install openmmla-audio in a conda environment

    • Conda
      # For Raspberry Pi
      wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
      bash Miniforge3-$(uname)-$(uname -m).sh
      
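      # For Linux
      wget "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-$(uname -m).sh"
      bash Miniconda3-latest-Linux-$(uname -m).sh
      # For Mac (Miniconda labels its macOS installers "MacOSX", while uname prints "Darwin"; curl ships with macOS)
      curl -LO "https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-$(uname -m).sh"
      bash Miniconda3-latest-MacOSX-$(uname -m).sh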
      
    • Audio Base
      conda create -c conda-forge -n audio-base python==3.10.12 -y
      conda activate audio-base
      pip install openmmla-audio
      
    • Audio Server
      conda create -c conda-forge -n audio-server python==3.10.12 -y
      conda activate audio-server
      # Quote the extras so the shell does not glob the brackets (required in zsh on Mac, safe on Linux and Raspberry Pi)
      pip install 'openmmla-audio[server]'
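
One detail worth knowing when downloading Miniconda: its installer file names use the label MacOSX for macOS, whereas `uname` prints Darwin there, so a file name built directly from `$(uname)` only resolves on Linux. The sketch below maps one to the other; the function names are hypothetical and it covers only Linux and macOS.

```shell
# Map the output of `uname` to the OS label used in Miniconda installer names.
miniconda_os() {
    case "$1" in
        Darwin) echo MacOSX ;;
        Linux)  echo Linux ;;
        *)      echo "unsupported OS: $1" >&2; return 1 ;;
    esac
}

# Build the installer file name.
# $1: OS name (defaults to `uname`), $2: architecture (defaults to `uname -m`).
miniconda_installer() {
    os="$(miniconda_os "${1:-$(uname)}")" || return 1
    echo "Miniconda3-latest-${os}-${2:-$(uname -m)}.sh"
}
```

With these defined, `curl -LO "https://repo.anaconda.com/miniconda/$(miniconda_installer)"` fetches the installer matching the current machine.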
      

Usage

After successfully installing all required libraries, you can run the audio module from the terminal.

  1. Run the real-time audio analysis system

    • Run the audio server and audio bases in distributed mode
    # Run the server scripts on the application servers that support the audio bases. Specify your audio server
    # cluster on your uber server by configuring the mbox-uber/conf/nginx.sh file, and specify any extra audio
    # upstream services in the mbox-uber/conf/nginx.conf file.
    
    # For example, our default setting runs audio services on three servers: server-01.local, server-02.local, and
    # server-03.local. Inside nginx.conf, we specify five audio services for those three servers:
    # transcribe, separate, infer, enhance, and vad.
    sudo apt install tmux -y
    ./server.sh
    
    # Run the audio bases
    # :param -b the number of audio bases to run, defaults to 3.
    # :param -s the number of audio base synchronizers to run, defaults to 1.
    # :param -l whether to run the audio bases standalone or with application servers, defaults to false.
    # :param -p whether to perform speech separation during recognition, defaults to false.
    ./run.sh
    
    # Use the control script to start/stop the session
    ./control.sh
    
  2. Run the post-time audio analyzer

    1. Create a speaker corpus folder under the /audio_db/post-time/ folder. The folder name should match the name of the audio file to be processed, [audio_file_name.wav], without the extension, e.g. /audio_db/post-time/[audio_file_name]/.
    2. Copy the speaker audio files into the speaker corpus folder. Name each file [speaker_name].wav.
    3. Run run_audio_post_analyzer.py
      cd examples/
         
      # Process a single audio file; supported formats: wav, m4a, mp3
      python3 run_audio_post_analyzer.py -f [audio_file_name.wav]
         
      # Process all audio files under the /audio/post-time/origin/ folder
      python3 run_audio_post_analyzer.py
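
Steps 1 and 2 above can be scripted. The following is a minimal sketch, not part of the package: the function names and the `AUDIO_DB_ROOT` variable are hypothetical, and it assumes the audio_db folder is resolved relative to the directory you run it from.

```shell
# Derive the speaker corpus folder for a recording: the recording's
# file name without its extension, under <root>/post-time/.
corpus_dir_for() {
    name="$(basename "$1")"
    echo "${AUDIO_DB_ROOT:-audio_db}/post-time/${name%.*}"
}

# Create the corpus folder and copy the speaker samples into it.
# $1: recording to be processed; remaining arguments: speaker files,
# each already named [speaker_name].wav.
prepare_corpus() {
    dir="$(corpus_dir_for "$1")"
    shift
    mkdir -p "$dir"
    for sample in "$@"; do
        cp "$sample" "$dir/"
    done
    echo "$dir"
}
```

For example, `prepare_corpus meeting.wav alice.wav bob.wav` creates audio_db/post-time/meeting/ and copies both speaker files into it.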
      

Visualization

After a run, logs are stored in the /logs/ folder and visualizations in the /visualizations/ folder.

Citation

If you use this code in your research, please cite the following paper:

@inproceedings{inproceedings,
author = {Li, Zaibei and Jensen, Martin and Nolte, Alexander and Spikol, Daniel},
year = {2024},
month = {03},
pages = {785-791},
title = {Field report for Platform mBox: Designing an Open MMLA Platform},
doi = {10.1145/3636555.3636872}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
