Skip to main content

CNN-based audio segmentation toolkit. Allows to detect speech, music and speaker gender.

Project description

# inaSpeechSegmenter

inaSpeechSegmenter is a framework for speech segmentation in Python 3.
It provides methods for speech music segmentation allowing to split audio signal into homogenous zones of speech and music.
It provides methods for speaker gender segmentation allowing to split speech excerpts into men and women speech.

## Installation

inaSpeechSegmenter is a framework in python 3.
It can be installed using the following procedure:

### Prerequisites

inaSpeechSegmenter requires ffmpeg for decoding any type of format.
Installation of ffmpeg for ubuntu can be done using the following commandline:
```bash
$ sudo apt-get install ffmpeg
```


### Installing from from sources

```bash
# clone git repository
$ git clone https://github.com/ina-foss/inaSpeechSegmenter.git
# create a python 3 virtual environement and activate it
$ virtualenv -p python3 inaSpeechSegEnv
$ source inaSpeechSegEnv/bin/activate
# install a backend for keras (tensorflow, theano, cntk...)
$ pip install tensorflow-gpu # if you wish GPU implementation (recommended)
$ pip install tensorflow # for a CPU implementation
# install framework and dependencies
$ cd inaSpeechSegmenter
$ python setup.py install
```

### PIP installation
TODO: Not Yet managed
```bash
# create a python 3 virtual environement and activate it
$ virtualenv -p python3 inaSpeechSegEnv
$ source inaSpeechSegEnv/bin/activate
# install a backend for keras (tensorflow, theano, cntk...)
$ pip install tensorflow-gpu # if you wish GPU implementation (recommended)
$ pip install tensorflow # for a CPU implementation
# install framework and dependencies
$ pip install inaSpeechSegmenter
```

## Using inaSpeechSegmenter

### Speech Segmentation Program
Binary program ina_speech_segmenter.py may be used to segment multimedia archives encoded in any format supported by ffmpeg. It requires input media and output csv files corresponding to the segmentation. Corresponding csv may be visualised using softwares such as https://www.sonicvisualiser.org/
```bash
# get help
$ ina_speech_segmenter.py --help
usage: ina_speech_segmenter.py [-h] -i INPUT [INPUT ...] -o OUTPUT_DIRECTORY

Do Speech/Music and Male/Female segmentation. Store segmentations into CSV
files

optional arguments:
-h, --help show this help message and exit
-i INPUT [INPUT ...], --input INPUT [INPUT ...]
Input media to analyse. May be a full path to a media
(/home/david/test.mp3), a list of full paths
(/home/david/test.mp3 /tmp/mymedia.avi), or a regex
input pattern ("/home/david/myaudiobooks/*.mp3")
-o OUTPUT_DIRECTORY, --output_directory OUTPUT_DIRECTORY
Directory used to store segmentations. Resulting
segmentations have same base name as the corresponding
input media, with csv extension. Ex: mymedia.MPG will
result in mymedia.csv
```
### Using Speech Segmentation API

InaSpeechSegmentation API is very simple to use!
See the following notebook for a comprehensive example: [API Tutorial Here!](API_Tutorial.ipynb)

## Citing

inaSpeechSegmenter has been presented at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018 conference in Calgary, Canada. If you use this toolbox in your research, you can cite the following work in your publications :


```bibtex
@inproceedings{ddoukhanicassp2018,
author = {Doukhan, David and Carrive, Jean and Vallet, Félicien and Larcher, Anthony and Meignier, Sylvain},
title = {An Open-Source Speaker Gender Detection Framework for Monitoring Gender Equality},
year = {2018},
organization={IEEE},
booktitle={Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on}
}
```


## CREDITS

This work was realized in the framework of MeMAD project.
https://memad.eu/
MeMAD is an EU funded H2020 research project.
It has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 780069.


Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

inaSpeechSegmenter-0.0.1.tar.gz (6.9 kB view hashes)

Uploaded Source

Built Distribution

inaSpeechSegmenter-0.0.1-py3-none-any.whl (32.5 MB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page