Scikit-talk helps to process real-world conversational speech data.

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Natural Language
- English
Project description

===========
scikit-talk
===========

.. image:: https://img.shields.io/pypi/v/scikit_talk.svg
        :target: https://pypi.python.org/pypi/scikit_talk

.. image:: https://img.shields.io/travis/partigabor/scikit_talk.svg
        :target: https://travis-ci.com/partigabor/scikit_talk

.. image:: https://readthedocs.org/projects/scikit-talk/badge/?version=latest
        :target: https://scikit-talk.readthedocs.io/en/latest/?version=latest
        :alt: Documentation Status

Scikit-talk helps to process real-world conversational speech data.

Scikit-talk is a free and open-source Python library for working with transcriptions of conversational speech, tailored to the research community. Its ultimate aim is to make processing, analyzing, and merging such corpora faster and less cumbersome. Scikit-talk is still in development, with various modules underway; the current version features a working Preprocessor module.

Preprocessor module: builds dataframes from transcription files (e.g. .eaf, .cha, .txt). The module currently contains three functions, which perform similar tasks for different transcription formats. Each function reads differently formatted conversational speech data and returns it in a unified format that is comparable, concatenable, and easier to work with.

The functions of the Preprocessor module take two arguments: an input path and an optional output path. If an output path is given, a .csv file is written there containing a dataframe of all the transcription files that were read in. If only an input path is given, the functions return a dataframe compiled from the files.
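The input/output contract described above can be sketched roughly as follows. This is an illustration only, not scikit-talk's actual code: the function name ``txt_to_dataframe`` and the one-line-per-utterance, tab-separated file layout are assumptions made for the example.

```python
from pathlib import Path
import tempfile

import pandas as pd

COLUMNS = ["begin", "end", "speaker", "utterance", "source"]


def txt_to_dataframe(input_path, output_path=None):
    """Hypothetical reader, NOT part of scikit-talk's API.

    Assumes every .txt file under input_path holds one
    "speaker<TAB>utterance" pair per line and has no timestamps.
    """
    rows = []
    for file in sorted(Path(input_path).glob("*.txt")):
        for line in file.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue  # skip blank lines
            speaker, utterance = line.split("\t", 1)
            rows.append({
                "begin": pd.NaT,  # no timestamps in this assumed format
                "end": pd.NaT,
                "speaker": speaker,
                "utterance": utterance,
                "source": file.name,
            })
    df = pd.DataFrame(rows, columns=COLUMNS)
    if output_path is not None:
        # Output path given: write the compiled dataframe as .csv.
        df.to_csv(output_path, index=False)
        return None
    # No output path: hand the dataframe back to the caller.
    return df


# Small demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("A\thello\nB\thi there\n", encoding="utf-8")
    df = txt_to_dataframe(tmp)
    csv_path = Path(tmp, "out.csv")
    result = txt_to_dataframe(tmp, csv_path)
    csv_written = csv_path.exists()
```

Returning nothing when a .csv is written is one reading of the description above; a real implementation might just as well return the dataframe in both cases.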

The data is organized into the following columns: begin, end, speaker, utterance, source. If the corpus provides timestamps, begin and end contain them as pandas datetime values; otherwise they are NaN.
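As a rough illustration of why a shared column layout helps, the sketch below builds two small dataframes with these five columns, one with timestamps and one without, and concatenates them. All values, speaker labels, and file names are invented for the example. Note that in a datetime column pandas displays a missing value as ``NaT``, its datetime counterpart of NaN.

```python
import pandas as pd

COLUMNS = ["begin", "end", "speaker", "utterance", "source"]

# Corpus with timestamps: hypothetical offsets in milliseconds,
# converted to pandas datetime values.
timed = pd.DataFrame({
    "begin": pd.to_datetime([1200, 3050], unit="ms"),
    "end": pd.to_datetime([2900, 4100], unit="ms"),
    "speaker": ["A", "B"],
    "utterance": ["hello", "hi there"],
    "source": ["timed_corpus.eaf", "timed_corpus.eaf"],
}, columns=COLUMNS)

# Corpus without timestamps: begin and end are left missing.
untimed = pd.DataFrame({
    "begin": [pd.NaT],
    "end": [pd.NaT],
    "speaker": ["C"],
    "utterance": ["good morning"],
    "source": ["untimed_corpus.txt"],
}, columns=COLUMNS)

# Because both frames share the same columns, they can be stacked directly.
merged = pd.concat([timed, untimed], ignore_index=True)

# Count the rows that lack timestamps.
missing_begin = int(merged["begin"].isna().sum())
```

The source column makes it possible to trace each row of the merged dataframe back to the file it came from.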

We assume that the corpora and files are well-formed and adhere to the relevant standards and conventions (e.g. those of the Linguistic Data Consortium).

* Free software: MIT license
* Documentation: https://scikit-talk.readthedocs.io.
* GitHub: https://github.com/partigabor/scikit_talk

Features
--------

* Preprocessor

Credits
-------

This package uses tools from the speach_ library made by Le Tuan Anh, and the pylangacq_ library by Jackson L. Lee. This package was created with Cookiecutter_ and the audreyr/cookiecutter-pypackage_ project template.

.. _speach: https://github.com/neocl/speach
.. _pylangacq: https://github.com/jacksonllee/pylangacq
.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _audreyr/cookiecutter-pypackage: https://github.com/audreyr/cookiecutter-pypackage

=======
History
=======

0.0.223 (2021-07-27)
--------------------

First release on PyPI.

Release history
- 0.1.1: Jan 5, 2024
- 0.0.300: Sep 27, 2022
- 0.0.251: Sep 27, 2022
- 0.0.250: Sep 27, 2022
- 0.0.223: Jul 26, 2021 (this version)
- 0.0.212: Jul 26, 2021
- 0.0.211: Jul 26, 2021
- 0.0.21: Jul 26, 2021
- 0.0.3: Sep 27, 2022
- 0.0.2: Jul 26, 2021

Download files

Source Distribution
- scikit_talk-0.0.223.tar.gz (12.9 kB), uploaded Jul 26, 2021

Built Distribution
- scikit_talk-0.0.223-py2.py3-none-any.whl (7.0 kB), uploaded Jul 26, 2021, Python 2 and Python 3

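Hashes for scikit_talk-0.0.223.tar.gz

===========  ==================================================================
Algorithm    Hash digest
===========  ==================================================================
SHA256       `1b83ba48221e0f6da6fe327a4407077de176575cbeed97999cd7e05ccc7c6c03`
MD5          `bfeb6796bd950cd041ce19c013713f7c`
BLAKE2b-256  `70da0cb64711f2192affa0e7382224d6deaf509d95787cecae911809e7b2d7ef`
===========  ==================================================================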

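Hashes for scikit_talk-0.0.223-py2.py3-none-any.whl

===========  ==================================================================
Algorithm    Hash digest
===========  ==================================================================
SHA256       `fa0728ab6a7b915131e7f37f8fac138adfde49b327752a12b1601ce593d7df03`
MD5          `f4407a8ca83f63908bfb8d73bf7c0e10`
BLAKE2b-256  `8cb06f64836441d132032e0e87a3a93bee60b3484a3cf23fae225e71bcdcb5d4`
===========  ==================================================================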