Audio module for the OpenMMLA platform
OpenMMLA Audio
Audio module of the mBox - an open multimodal learning analytic platform. For more details, please refer to mBox System Design.
Uber Server Setup
Before setting up the audio base, you need to set up a server hosting the InfluxDB, Redis, and Mosquitto services. Please refer to the mbox-uber module.
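Before moving on, it can help to confirm that the three services are reachable from the machine that will run the audio base. The sketch below is a minimal check, assuming the services listen on their usual default ports (8086 for InfluxDB, 6379 for Redis, 1883 for Mosquitto). The function names are illustrative and not part of openmmla-audio.

```python
import socket

# Usual default ports of the three services (assumption: your mbox-uber
# setup has not remapped them; adjust the values if it has).
DEFAULT_PORTS = {"InfluxDB": 8086, "Redis": 6379, "Mosquitto": 1883}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host, ports=DEFAULT_PORTS):
    """Map each service name to whether its port accepts TCP connections."""
    return {name: is_port_open(host, port) for name, port in ports.items()}
```

Calling `check_services("<your-uber-server-host>")` returns a dictionary keyed by service name. A `False` entry only means the TCP port did not answer; it does not say why.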
Audio Base & Server Setup
Downloading and setting up the mbox-audio module takes four steps:
(1) Clone the repository from GitHub to your local home directory.
(2) Install required system dependencies.
(3) Install openmmla-audio.
(4) Set up folder structure.
- Clone the repository from GitHub
  git clone https://github.com/ucph-ccs/mbox-audio.git
- Install the required dependencies
  - Mac
    # Install ffmpeg, portaudio-19.7.0, mecab-0.996 (required by sacrebleu for NLP collection), and llvm-16.0.6
    brew install ffmpeg
    brew install portaudio
    brew install mecab
    brew install llvm
    # Add llvm to your PATH and compiler flags:
    echo 'export PATH="/opt/homebrew/opt/llvm/bin:$PATH"' >> ~/.zshrc
    echo 'export LDFLAGS="-L/opt/homebrew/opt/llvm/lib"' >> ~/.zshrc
    echo 'export CPPFLAGS="-I/opt/homebrew/opt/llvm/include"' >> ~/.zshrc
    source ~/.zshrc
  - Ubuntu 24.04
    sudo apt update && sudo apt upgrade
    sudo apt install build-essential
    sudo apt install git
    sudo apt install ffmpeg
    sudo apt install python3-pyaudio
    sudo apt update && sudo apt install -y libsndfile1
    # Install portaudio from source (libasound-dev provides the ALSA headers)
    sudo apt install libasound-dev
    # Download the portaudio archive from: http://files.portaudio.com/download.html
    wget https://files.portaudio.com/archives/pa_stable_v190700_20210406.tgz
    # Extract the archive
    tar -zxvf pa_stable_v190700_20210406.tgz
    # Enter the directory and compile
    cd portaudio
    ./configure && make
    sudo make install
  - Raspberry Pi Bullseye or later
    # Install the PortAudio development files needed to build pyaudio
    sudo apt-get install portaudio19-dev
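After installing the system packages, a quick check can confirm that the command-line tools and the PortAudio library are visible to Python. This is a minimal sketch using only the standard library; the helper names are illustrative, and `find_library` may report a false negative when a library lives in a non-standard prefix.

```python
import shutil
from ctypes.util import find_library


def missing_tools(tools, which=shutil.which):
    """Return the command-line tools from `tools` that are not on PATH."""
    return [tool for tool in tools if which(tool) is None]


def has_shared_library(name, finder=find_library):
    """Return True if the platform's library lookup can locate `name`."""
    return finder(name) is not None
```

For example, `missing_tools(["ffmpeg", "git"])` lists whichever of the two tools is not on PATH, and `has_shared_library("portaudio")` tells you whether the PortAudio shared library was found.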
- Install openmmla-audio in a conda environment
  - Conda
    # For Raspberry Pi
    wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
    bash Miniforge3-$(uname)-$(uname -m).sh
    # For Mac and Linux
    # Note: on macOS, $(uname) prints "Darwin" but the Miniconda installers are named "MacOSX"; replace $(uname) with MacOSX there.
    wget "https://repo.anaconda.com/miniconda/Miniconda3-latest-$(uname)-$(uname -m).sh"
    bash Miniconda3-latest-$(uname)-$(uname -m).sh
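The installer file names above are built from `uname` output. The sketch below rebuilds the same URLs in Python and shows one naming difference: Miniconda labels its macOS installers `MacOSX`, whereas `uname` (and `platform.system()`) reports `Darwin`. It assumes the naming scheme used on repo.anaconda.com and the Miniforge releases page at the time of writing; the function names are illustrative.

```python
import platform

# Miniconda uses "MacOSX" in installer names, while uname reports "Darwin".
_MINICONDA_SYSTEM_NAMES = {"Darwin": "MacOSX", "Linux": "Linux"}


def miniconda_installer_url(system=None, machine=None):
    """Build the Miniconda installer URL for the given (or current) platform."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system not in _MINICONDA_SYSTEM_NAMES:
        raise ValueError(f"unsupported system: {system}")
    os_name = _MINICONDA_SYSTEM_NAMES[system]
    return (
        "https://repo.anaconda.com/miniconda/"
        f"Miniconda3-latest-{os_name}-{machine}.sh"
    )


def miniforge_installer_url(system=None, machine=None):
    """Build the Miniforge installer URL, which uses the uname values as-is."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return (
        "https://github.com/conda-forge/miniforge/releases/latest/download/"
        f"Miniforge3-{system}-{machine}.sh"
    )
```

Called without arguments, both functions use the values reported for the current machine.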
  - Audio Base
    conda create -c conda-forge -n audio-base python==3.10.12 -y
    conda activate audio-base
    # Approach 1: install openmmla-audio
    pip install openmmla-audio
    # Approach 2: install the audio module in development mode (run from the cloned mbox-audio directory)
    pip install -e .
  - Audio Server
    conda create -c conda-forge -n audio-server python==3.10.12 -y
    conda activate audio-server
    # Approach 1: install openmmla-audio with server dependencies
    pip install openmmla-audio[server]    # for linux and raspberry pi
    pip install 'openmmla-audio[server]'  # for mac (zsh needs the quotes around the brackets)
    # Approach 2: install the audio module with server dependencies in development mode
    pip install -e .[server]    # for linux and raspberry pi
    pip install -e '.[server]'  # for mac
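To confirm which version ended up in the active conda environment, the standard library can query the installed distribution metadata. This is a minimal sketch; the helper name is illustrative and not part of openmmla-audio.

```python
from importlib import metadata


def installed_version(dist_name="openmmla-audio"):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Run `installed_version()` inside the `audio-base` or `audio-server` environment; a `None` result means pip did not install the package into that environment.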
- Set up folder structure
  cd mbox-audio
  ./reset.sh
Usage
After successfully installing all required libraries, you can run the audio module from the terminal.
- Run real-time audio analyzer
  - Audio services
    Optionally, run the server script
    ./server.sh
    on your application servers to provide audio services. To set up your audio server cluster, configure the file mbox-uber/conf/nginx.sh and specify your audio upstream services in mbox-uber/conf/nginx.conf. For example, by default, we run audio services on three servers: server-01.local, server-02.local, and server-03.local. In the nginx.conf file, we define five audio services for these servers: transcribe, separate, infer, enhance, and vad.
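To make the upstream idea concrete, the sketch below renders one nginx `upstream` block per audio service across the three default servers. It is only an illustration of the general nginx syntax: the port numbers are hypothetical placeholders, and the one-upstream-per-service layout is an assumption, so treat the actual mbox-uber/conf/nginx.conf as the reference.

```python
SERVERS = ["server-01.local", "server-02.local", "server-03.local"]

# Hypothetical port numbers, chosen only for illustration; they are not
# the values used by mbox-uber.
SERVICE_PORTS = {
    "transcribe": 5000,
    "separate": 5001,
    "infer": 5002,
    "enhance": 5003,
    "vad": 5004,
}


def render_upstream(service, servers, port):
    """Render one nginx `upstream` block listing every server for a service."""
    lines = [f"upstream {service} {{"]
    lines += [f"    server {host}:{port};" for host in servers]
    lines.append("}")
    return "\n".join(lines)


def render_all(servers=SERVERS, service_ports=SERVICE_PORTS):
    """Render the upstream blocks for all services, separated by blank lines."""
    return "\n\n".join(
        render_upstream(name, servers, port)
        for name, port in service_ports.items()
    )
```

Printing `render_all()` produces five blocks, each listing the three servers, which you can compare against your own nginx.conf when adding or removing machines.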
  - Audio base
    # :param -b the number of audio bases to run, defaults to 3.
    # :param -s the number of audio base synchronizers to run, defaults to 1.
    # :param -l whether to run the audio bases standalone (true) or with application servers (false), defaults to false.
    # :param -p whether to perform speech separation during recognition, defaults to false.
    # e.g. run 3 audio bases and 1 audio base synchronizer in distributed mode with audio services
    ./run.sh
    # e.g. run 3 audio bases and 1 audio base synchronizer in standalone mode
    ./run.sh -l true
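If you launch the audio bases from a Python script, the options above can be assembled into an argument list first. This is a minimal sketch: it assumes `-b` and `-s` take integers and that `-p` accepts `true`/`false` in the same way as `-l`, which the text above shows only for `-l`. The function name is illustrative.

```python
def build_run_command(bases=3, synchronizers=1, standalone=False, separate=False):
    """Build the argument list for ./run.sh from the documented options."""
    if bases < 1 or synchronizers < 1:
        raise ValueError("bases and synchronizers must be at least 1")
    return [
        "./run.sh",
        "-b", str(bases),
        "-s", str(synchronizers),
        "-l", "true" if standalone else "false",  # standalone vs. with servers
        "-p", "true" if separate else "false",    # speech separation (assumed syntax)
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True, cwd=<path to mbox-audio>)`, which avoids shell quoting issues.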
  - Control base
    Run the control base with
    ./control.sh
    to control the audio bases and the audio base synchronizer.
- Run post-time audio analyzer
  - Create a speaker corpus folder under the /audio_db/post-time/ folder. The folder name must match the name of the audio file to be processed, [audio_file_name.wav], without the extension, e.g. /audio_db/post-time/[audio_file_name]/.
  - Copy the speaker audio files to the speaker corpus folder. Name each file [speaker_name].wav.
  - Run run_audio_post_analyzer.py
    cd examples/
    # process a single audio file; supported formats: wav, m4a, mp3
    python3 run_audio_post_analyzer.py -f [audio_file_name.wav]
    # process all audio files under the /audio/post-time/origin/ folder
    python3 run_audio_post_analyzer.py
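The folder preparation described in the first two steps can be scripted. The sketch below creates the corpus folder named after the audio file (without its extension) and copies each speaker sample into it as [speaker_name].wav. The helper name and its arguments are illustrative; pass the path of your own post-time folder.

```python
import shutil
from pathlib import Path


def prepare_speaker_corpus(post_time_dir, audio_file, speaker_files):
    """Create the corpus folder for `audio_file` and copy speaker samples in.

    `speaker_files` maps a speaker name to the path of that speaker's .wav file.
    Returns the path of the corpus folder.
    """
    # The folder name is the audio file name without its extension.
    corpus_dir = Path(post_time_dir) / Path(audio_file).stem
    corpus_dir.mkdir(parents=True, exist_ok=True)
    for speaker_name, source in speaker_files.items():
        shutil.copy(source, corpus_dir / f"{speaker_name}.wav")
    return corpus_dir
```

For a recording named meeting.wav with two speakers, the call creates a `meeting` folder under the given post-time directory containing one .wav file per speaker.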
- Run audio analyzer in mixed mode
  You can switch the operating mode of the audio dock to “Record”, “Recognize”, or “Full”. The “Record” mode continuously records audio in segments without recognizing them. After recording, you can switch to “Recognize” mode to recognize the pre-recorded audio segments and synchronize the results via the Synchronizer. The “Full” mode, which is the default in the real-time audio analyzer, records and recognizes at the same time.
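The relationship between the three modes can be summarized in a few lines of Python. The `Mode` class below is purely hypothetical and is not the openmmla-audio API; it only encodes which activities each mode performs, as described above.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical model of the three operating modes (not the real API)."""

    RECORD = "Record"
    RECOGNIZE = "Recognize"
    FULL = "Full"

    @property
    def records(self):
        """True if the mode captures new audio segments."""
        return self in (Mode.RECORD, Mode.FULL)

    @property
    def recognizes(self):
        """True if the mode runs recognition on audio segments."""
        return self in (Mode.RECOGNIZE, Mode.FULL)
```

Read this way, a “Record” session followed by a “Recognize” session covers the same two activities that “Full” performs together.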
Visualization
After running, the logs and visualizations are stored in the /logs/ and /visualizations/ folders.
Citation
If you use this code in your research, please cite the following paper:
@inproceedings{inproceedings,
author = {Li, Zaibei and Jensen, Martin and Nolte, Alexander and Spikol, Daniel},
year = {2024},
month = {03},
pages = {785-791},
title = {Field report for Platform mBox: Designing an Open MMLA Platform},
doi = {10.1145/3636555.3636872}
}
License
This project is licensed under the MIT License - see the LICENSE file for details.