Audio module for the OpenMMLA platform
🎙️ OpenMMLA Audio
Audio module of mBox, an open multimodal learning analytics platform. For more details, please refer to the mBox System Design.
Related Modules
Installation
Uber Server Setup
Before setting up the audio base, you need to set up a server hosting the InfluxDB, Redis, and Mosquitto services. Please refer to the mbox-uber module.
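Before moving on, it can help to confirm that the uber server's services are reachable from the machine that will run the audio base. The sketch below is not part of openmmla-audio; it only opens plain TCP connections, and it assumes the services listen on their upstream default ports (InfluxDB 8086, Redis 6379, Mosquitto 1883) on a host you supply.

```python
import socket

# Assumption: the uber server keeps each service's default port.
DEFAULT_PORTS = {"InfluxDB": 8086, "Redis": 6379, "Mosquitto": 1883}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False


def check_uber_server(host: str, ports: dict = DEFAULT_PORTS) -> dict:
    """Map each service name to whether its port accepts TCP connections on `host`."""
    return {name: is_port_open(host, port) for name, port in ports.items()}
```

For example, `check_uber_server("localhost")` returns a dict mapping each service name to `True` or `False`. A successful TCP connection only shows that something is listening on the port, not that the service is configured correctly.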
Audio Base & Server Setup
- Clone the repository
git clone https://github.com/ucph-ccs/mbox-audio.git
- Install required system dependencies
Mac
brew install ffmpeg portaudio mecab llvm
# The paths below assume the Apple Silicon Homebrew prefix (/opt/homebrew); Intel Macs use /usr/local
echo 'export PATH="/opt/homebrew/opt/llvm/bin:$PATH"' >> ~/.zshrc
echo 'export LDFLAGS="-L/opt/homebrew/opt/llvm/lib"' >> ~/.zshrc
echo 'export CPPFLAGS="-I/opt/homebrew/opt/llvm/include"' >> ~/.zshrc
source ~/.zshrc
Ubuntu 24.04
sudo apt update && sudo apt upgrade
sudo apt install build-essential git ffmpeg python3-pyaudio libsndfile1 libasound-dev
# Build and install PortAudio from source
wget https://files.portaudio.com/archives/pa_stable_v190700_20210406.tgz
tar -zxvf pa_stable_v190700_20210406.tgz
cd portaudio
./configure && make
sudo make install
Raspberry Pi Bullseye or later
sudo apt-get install portaudio19-dev
- Install openmmla-audio
Set up Conda environment
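# For Raspberry Pi
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
# For Linux
wget "https://repo.anaconda.com/miniconda/Miniconda3-latest-$(uname)-$(uname -m).sh"
bash Miniconda3-latest-$(uname)-$(uname -m).sh
# For Mac (Miniconda installers are named "MacOSX", not "Darwin", so $(uname) cannot be used here)
curl -O "https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-$(uname -m).sh"
bash Miniconda3-latest-MacOSX-$(uname -m).sh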
Install Audio Base
conda create -c conda-forge -n audio-base python==3.10.12 -y
conda activate audio-base
pip install openmmla-audio[base]     # for Linux and Raspberry Pi
pip install 'openmmla-audio[base]'   # for Mac (the quotes stop zsh from expanding the brackets)
Install Audio Server
conda create -c conda-forge -n audio-server python==3.10.12 -y
conda activate audio-server
pip install openmmla-audio[server]     # for Linux and Raspberry Pi
pip install 'openmmla-audio[server]'   # for Mac (the quotes stop zsh from expanding the brackets)
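After activating the `audio-base` or `audio-server` environment, a quick sanity check can confirm the interpreter version and that the package landed in that environment. This is a minimal sketch using only the standard library, not a tool shipped with openmmla-audio; the helper names are illustrative.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# The conda environments in this guide pin Python 3.10.12.
EXPECTED_PYTHON = (3, 10)


def python_matches(info: tuple = tuple(sys.version_info), expected: tuple = EXPECTED_PYTHON) -> bool:
    """True if the major.minor part of `info` equals `expected`."""
    return tuple(info[:2]) == expected


def installed_version(dist: str = "openmmla-audio") -> Optional[str]:
    """Return the installed version of distribution `dist`, or None if it is missing."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "OK" if python_matches() else "(expected 3.10.x)")
    print("openmmla-audio", installed_version() or "NOT installed")
```

Running it with the environment's `python` prints one line per check; it does not tell you which extra (`base`, `server`, or `all`) was installed.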
- Set up folder structure
cd mbox-audio
./reset.sh
Standalone Setup
If you want to run the entire mBox Audio system on a single machine (not in distributed mode), follow these steps:
- Set up the Uber Server on your machine following the instructions in the mbox-uber module.
- Install system dependencies as described in the "Audio Base & Server Setup" section above.
- Install openmmla-audio with all dependencies:
conda create -c conda-forge -n mbox-audio python==3.10.12 -y
conda activate mbox-audio
pip install openmmla-audio[all]     # for Linux and Raspberry Pi
pip install 'openmmla-audio[all]'   # for Mac (the quotes stop zsh from expanding the brackets)
- Set up the folder structure:
cd mbox-audio
./reset.sh
This setup will allow you to run all components of mBox Audio on a single machine.
Usage
Real-time Audio Analyzer
To run the real-time audio analyzer:
- Start Audio Server (optional)
./server.sh
This script starts the distributed audio services on the audio servers. To configure your audio server cluster, refer to the nginx setup running on your uber server. Default setup: three servers (server-01.local, server-02.local, server-03.local) with five services (transcribe, separate, infer, enhance, vad).
- Start Audio Base
./run.sh [-b <num_bases>] [-s <num_synchronizers>] [-l <standalone>] [-p <speech_separation>]
Default parameter settings:
-b: 3 (number of audio bases)
-s: 1 (number of audio base synchronizers)
-l: false (not standalone)
-p: false (no speech separation)
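If you launch the audio base from another Python script, the documented defaults can be kept in one place and turned into an argument list. This is a hypothetical helper, not part of openmmla-audio, and it assumes that `-l` and `-p` take the literal strings `true`/`false`, as the defaults above suggest.

```python
from dataclasses import dataclass


@dataclass
class RunOptions:
    """Hypothetical wrapper whose defaults mirror the documented ./run.sh settings."""
    num_bases: int = 3                # -b
    num_synchronizers: int = 1        # -s
    standalone: bool = False          # -l
    speech_separation: bool = False   # -p

    def to_argv(self) -> list:
        # Assumption: -l and -p expect "true"/"false" rather than acting as bare switches.
        return [
            "./run.sh",
            "-b", str(self.num_bases),
            "-s", str(self.num_synchronizers),
            "-l", str(self.standalone).lower(),
            "-p", str(self.speech_separation).lower(),
        ]
```

For instance, `subprocess.run(RunOptions(num_bases=2).to_argv(), cwd="mbox-audio", check=True)` would start two audio bases with the remaining defaults.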
💡 You can switch the operating mode of the audio base at runtime:
Mode | Description
---|---
Record | Record audio segments without recognition
Recognize | Recognize pre-recorded segments and synchronize
Full | Default: record and recognize simultaneously
- Start Control Base
./control.sh
Post-time Audio Analyzer
To run the post-time audio analyzer:
- Create a speaker corpus folder: /audio_db/post-time/[audio_file_name]/
- Add speaker audio files named [speaker_name].wav to the corpus folder.
- Run the analyzer:
cd examples/
conda activate mbox-audio
# Process a single audio file (supported formats: wav, m4a, mp3); if -f is omitted, all files in the /audio/post-time/origin/ folder are processed
python3 run_audio_post_analyzer.py -f [audio_file_name.wav]
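The corpus layout above can be prepared and checked with a few lines of standard-library Python. This is a sketch under stated assumptions, not the analyzer's own code: the helper names are hypothetical, paths are taken relative to the repository root, and the corpus folder is assumed to be named after the recording's file name without its extension.

```python
from pathlib import Path

# Formats the post-time analyzer is documented to accept.
SUPPORTED_FORMATS = {".wav", ".m4a", ".mp3"}


def prepare_corpus(root: Path, recording: str) -> Path:
    """Hypothetical helper: create <root>/audio_db/post-time/<recording stem>/ if missing."""
    corpus = root / "audio_db" / "post-time" / Path(recording).stem
    corpus.mkdir(parents=True, exist_ok=True)
    return corpus


def speaker_names(corpus: Path) -> list:
    """Speaker names derived from the [speaker_name].wav files in the corpus folder."""
    return sorted(p.stem for p in corpus.glob("*.wav"))


def pending_recordings(origin: Path) -> list:
    """Files in `origin` with a supported extension (compared case-insensitively)."""
    return sorted(
        p for p in origin.iterdir() if p.is_file() and p.suffix.lower() in SUPPORTED_FORMATS
    )
```

For example, `pending_recordings(Path("audio/post-time/origin"))` lists the files that would be candidates when `-f` is omitted, and `speaker_names(...)` makes it easy to spot a misnamed speaker file before a long run.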
Logs & Visualization
After running the analyzers, logs and visualizations are stored in the /logs/ and /visualizations/ folders.
FAQ
Citation
If you use this code in your research, please cite the following paper:
@inproceedings{inproceedings,
author = {Li, Zaibei and Jensen, Martin and Nolte, Alexander and Spikol, Daniel},
year = {2024},
month = {03},
pages = {785-791},
title = {Field report for Platform mBox: Designing an Open MMLA Platform},
doi = {10.1145/3636555.3636872}
}
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
File details
Details for the file openmmla_audio-0.1.4.post5.tar.gz.
File metadata
- Download URL: openmmla_audio-0.1.4.post5.tar.gz
- Size: 65.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 92f9bf61c50a5a031ec5c7a7c1147ea8c2bf1c540e0b7cef9745327019b5d3a0
MD5 | 792837ed58e2889b9bdda451c1ebf622
BLAKE2b-256 | 6d322add4e760a7f972cff19014ae56ef41c74ab24a6730947fb1b7b2f726dfd
File details
Details for the file openmmla_audio-0.1.4.post5-py3-none-any.whl.
File metadata
- Download URL: openmmla_audio-0.1.4.post5-py3-none-any.whl
- Size: 80.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | c02fc21d35e6d5f160933557aa87afb1494f90b746133ffc5e5a3070be113649
MD5 | 83e96d6b4bfed1b1a950c380cf600ef6
BLAKE2b-256 | 672a5d8452ca3ac62f53a3a25499d3dab13c94a42e52a3db77f1ee1fbf3f17b9
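To check a downloaded file against the SHA256 digests listed above, a minimal standard-library sketch could look like this. The helper names are illustrative and not part of the package; the digests are copied from the tables above.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for release 0.1.4.post5.
EXPECTED_SHA256 = {
    "openmmla_audio-0.1.4.post5.tar.gz":
        "92f9bf61c50a5a031ec5c7a7c1147ea8c2bf1c540e0b7cef9745327019b5d3a0",
    "openmmla_audio-0.1.4.post5-py3-none-any.whl":
        "c02fc21d35e6d5f160933557aa87afb1494f90b746133ffc5e5a3070be113649",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is a known release file and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

pip can enforce the same check itself in hash-checking mode, by listing the package with a `--hash=sha256:...` option in a requirements file and installing with `--require-hashes`.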