Audio data transformations library with a command-line interface.
Audio Transformers
A Python library for audio signal transformations.
Setup
Prerequisites
Requires ffmpeg and Python >= 3.10.
Installation
pip install audio-transformers
Command-Line Interface
List Available Transformations
Run:
audio transform list
Output:
Name               Description
-----------------  --------------------------------------------------
BandPass           Apply band-pass filter.
BandStop           Apply band-stop filter.
GaussianNoise      Add gaussian noise to the signal.
HighPass           Apply high-pass filter.
Inversion          Inverse waveform polarity by multiplying it by -1.
LowPass            Apply low-pass filter.
PitchShift         Pitch shift transformation.
SpeedPerturbation  Speed perturbation transformer.
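To make two of the simpler descriptions above concrete, here is a minimal pure-Python sketch of polarity inversion and additive Gaussian noise on a list of samples. It illustrates the general techniques only and does not use the library; the function names and the `sigma` and `seed` parameters are hypothetical and may not match the library's own parameters.

```python
import random


def invert(samples):
    """Flip waveform polarity by multiplying every sample by -1."""
    return [-s for s in samples]


def add_gaussian_noise(samples, sigma, seed=None):
    """Add zero-mean Gaussian noise with standard deviation `sigma`.

    `sigma` and `seed` are illustrative names, not library parameters.
    """
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, sigma) for s in samples]


if __name__ == "__main__":
    signal = [0.5, -0.25, 0.0, 0.75]
    print(invert(signal))
    print(add_gaussian_noise(signal, sigma=0.01, seed=0))
```

Use `audio transform params TRANSFORMATION` to see the parameters the library actually accepts.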
Show Transformation Parameters
Run:
audio transform params TRANSFORMATION
For example:
audio transform params PitchShift
Output:
Name             Type    Default    Description
---------------  ------  ---------  --------------------------------------
shift            float              Pitch shift in octaves.
fft_window_size  float   0.1        Short Time FFT window size in seconds.
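Because `shift` is expressed in octaves, it can help to convert a candidate value into semitones or a frequency ratio before choosing it. The sketch below is plain arithmetic and does not call the library; the helper names are hypothetical.

```python
def octaves_to_semitones(shift: float) -> float:
    """One octave spans 12 equal-tempered semitones."""
    return shift * 12.0


def octaves_to_frequency_ratio(shift: float) -> float:
    """Shifting up by one octave doubles every frequency."""
    return 2.0 ** shift


if __name__ == "__main__":
    for shift in (-1.0, 0.2, 0.5):
        semitones = octaves_to_semitones(shift)
        ratio = octaves_to_frequency_ratio(shift)
        print(f"shift={shift} octaves, {semitones} semitones, ratio={ratio:.4f}")
```

For instance, `--shift=0.5` corresponds to six semitones up, and a negative value shifts the pitch down.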
Transform Audio File
Command format variants:
audio transform file INPUT_PATH OUTPUT_PATH TRANSFORMATION *OPTIONS
audio transform file INPUT_PATH OUTPUT_PATH --config=CONFIG_PATH
For example, to specify the transformation via CLI arguments, run:
audio transform file path/to/input.opus path/to/output.wav PitchShift --shift=0.5
Output:
2024-08-20 18:07:24,897 INFO Processing file path/to/input.opus -> path/to/output.wav
5%|█████ | 21.1M/453M [00:00<00:11, 36.4Msamples/s]
Alternatively, you can specify the transformations in a config file.
For example, if task.yaml contains the following definitions:
transforms:
  - type: PitchShift
    params:
      shift: 0.2
  - type: SpeedPerturbation
    params:
      speed_factor: 0.5
You can run:
audio transform file path/to/input.opus path/to/output.wav --config=task.yaml
The resulting output.wav will have its pitch shifted by +0.2 octaves relative to input.opus and will be stretched to twice the original duration (with no additional significant pitch perturbations).
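As a rough check of what a transformation chain should produce, the sketch below walks a config with the same shape as task.yaml and estimates the combined pitch ratio and output duration. It assumes, in line with the description above, that `speed_factor: 0.5` doubles the duration and that SpeedPerturbation leaves pitch unchanged. The function `expected_effect` is hypothetical and not part of the library.

```python
# Same structure as task.yaml, written as a Python dict to stay dependency-free.
config = {
    "transforms": [
        {"type": "PitchShift", "params": {"shift": 0.2}},
        {"type": "SpeedPerturbation", "params": {"speed_factor": 0.5}},
    ]
}


def expected_effect(config, input_seconds):
    """Return (pitch_ratio, output_seconds) for the transforms in order."""
    pitch_ratio = 1.0
    duration = input_seconds
    for step in config["transforms"]:
        params = step.get("params", {})
        if step["type"] == "PitchShift":
            # A shift given in octaves multiplies frequencies by 2**shift.
            pitch_ratio *= 2.0 ** params["shift"]
        elif step["type"] == "SpeedPerturbation":
            # Assumption: playing at speed_factor times the original speed
            # divides the duration by speed_factor.
            duration /= params["speed_factor"]
    return pitch_ratio, duration


if __name__ == "__main__":
    ratio, seconds = expected_effect(config, input_seconds=10.0)
    print(f"pitch ratio ~ {ratio:.3f}, duration = {seconds} s")
```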
Transform Dataset
Command format:
audio transform files --config=FILE
The config file has additional attributes:
input_root: "path/to/INPUT/data/root"
input_pattern: "**/*.opus"
output_root: "path/to/OUTPUT/data/root"
output_pattern: "{reldir}/{name}.opus"
transforms:
  - type: PitchShift
    params:
      shift: 0.5
  - type: SpeedPerturbation
    params:
      speed_factor: 0.5
- input_root is the root directory of the input dataset
- input_pattern is the input file path pattern relative to input_root
- output_root is the root directory for output files
- output_pattern is the output file pattern relative to output_root. It is re-evaluated for each input file. You can use curly braces, as in {something}, to substitute the corresponding input file path elements. The following elements are supported:
  - {relpath} - full input path relative to the input root
  - {reldir} - input file directory relative to the input root
  - {name} - input file name without extension
  - {ext} - input file extension
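The substitution rules above can be sketched with `pathlib` and `str.format`. This is an illustration of the documented behaviour, not the library's implementation: `render_output_path` is a hypothetical helper, and whether `{ext}` includes the leading dot is an assumption (here it does not).

```python
from pathlib import PurePosixPath


def render_output_path(input_path, input_root, output_root, output_pattern):
    """Expand output_pattern for one input file and join it to output_root."""
    rel = PurePosixPath(input_path).relative_to(input_root)
    fields = {
        "relpath": rel.as_posix(),        # full path relative to input_root
        "reldir": rel.parent.as_posix(),  # directory relative to input_root
        "name": rel.stem,                 # file name without extension
        "ext": rel.suffix.lstrip("."),    # assumption: extension without the dot
    }
    return (PurePosixPath(output_root) / output_pattern.format(**fields)).as_posix()


if __name__ == "__main__":
    print(
        render_output_path(
            "path/to/INPUT/data/root/speaker1/a.opus",
            "path/to/INPUT/data/root",
            "path/to/OUTPUT/data/root",
            "{reldir}/{name}.opus",
        )
    )
```

With the pattern `{reldir}/{name}.opus`, the output tree mirrors the directory layout of the input tree.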
Public Datasets
The audio tool supports downloading public STT datasets for testing purposes.
Listing Public Datasets
Run:
audio datasets list
Output:
Name                                   Format    Size      Archive Size
-------------------------------------  --------  --------  --------------
radio_v4_and_public_speech_5percent    opus      65.8 GB   11.4 GB
audiobook_2                            opus      162.0 GB  25.8 GB
radio_2                                opus      154.0 GB  24.6 GB
public_youtube1120                     opus      237.0 GB  19.0 GB
asr_public_phone_calls_2               opus      66.0 GB   9.4 GB
public_youtube1120_hq                  opus      31.0 GB   4.9 GB
asr_public_stories_2                   opus      9.0 GB    1.4 GB
tts_russian_addresses_rhvoice_4voices  opus      80.9 GB   12.9 GB
public_youtube700                      opus      75.0 GB   12.2 GB
asr_public_phone_calls_1               opus      22.7 GB   3.2 GB
Download Public Datasets
For example, run:
audio datasets download public_lecture_1 data/lecture_dataset
Output (intermediate):
2024-08-20 18:27:23,344 INFO Downloading dataset 'public_lecture_1' (122.5 MB) to data/lecture_dataset
90%|████████████████████████████████ | 110M/123M [00:43<00:15, 3.4Mbytes/s]
Development
The project requires Poetry and Python >= 3.10.
Clone:
git clone git@github.com:stepan-anokhin/audio-transformers.git
Then:
cd audio-transformers
Install dependencies:
poetry install
Run tests:
poetry run pytest
The project uses Black code style. Run style check:
poetry run black --check --line-length 120 audio_transformers tests
Run linter:
poetry run flake8 audio_transformers tests --count --max-complexity=10 --max-line-length=120 --statistics
Project Structure
Packages:
- audio_transformers/core - implementations of audio transformations
- audio_transformers/io - input/output logic (using ffmpegio)
- audio_transformers/cli - CLI tool implementation
- audio_transformers/cli/handlers - CLI subcommand handlers
- audio_transformers/utils - miscellaneous utilities
- tests - unit tests and integration tests