
Scikit-talk helps to process real-world conversational speech data.


===========
scikit-talk
===========

.. image:: https://img.shields.io/pypi/v/scikit_talk.svg
        :target: https://pypi.python.org/pypi/scikit_talk

.. image:: https://img.shields.io/travis/partigabor/scikit_talk.svg
        :target: https://travis-ci.com/partigabor/scikit_talk

.. image:: https://readthedocs.org/projects/scikit-talk/badge/?version=latest
        :target: https://scikit-talk.readthedocs.io/en/latest/?version=latest
        :alt: Documentation Status


Scikit-talk is a free and open-source Python library that helps researchers work with transcriptions of conversational speech data. Its ultimate aim is to make processing, analyzing, and merging such corpora faster and less cumbersome. Scikit-talk is still in development, with several modules underway; the current version provides a working Preprocessor module.

Preprocessor module: builds dataframes from transcription files (e.g. ``.eaf``, ``.cha``, ``.txt``). The module currently contains three functions, which perform essentially the same task for different transcription formats: each reads in conversational speech data in its own format and returns it in a unified format, so that the results can be compared, concatenated, and worked on more easily.
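
Because every function returns the same columns, outputs from different formats can be stacked with ``pandas.concat``. The minimal sketch below checks the shared schema before concatenating; the helper name ``concat_unified`` and the toy rows are hypothetical illustrations, not part of scikit-talk's API.

```python
import pandas as pd

# Column order of the unified format described for the Preprocessor.
UNIFIED_COLUMNS = ["begin", "end", "speaker", "utterance", "source"]


def concat_unified(frames):
    """Hypothetical helper: stack dataframes that share the unified schema."""
    for i, df in enumerate(frames):
        if list(df.columns) != UNIFIED_COLUMNS:
            raise ValueError(f"frame {i} has columns {list(df.columns)}")
    return pd.concat(frames, ignore_index=True)


# Toy rows standing in for the output of two different Preprocessor functions.
nan = float("nan")
cha_df = pd.DataFrame(
    [[nan, nan, "CHI", "more juice", "a.cha"]], columns=UNIFIED_COLUMNS
)
txt_df = pd.DataFrame(
    [[nan, nan, "A", "hello", "b.txt"], [nan, nan, "B", "hi", "b.txt"]],
    columns=UNIFIED_COLUMNS,
)

merged = concat_unified([cha_df, txt_df])
# Once merged, corpora can be compared, e.g. by counting utterances per file.
print(merged.groupby("source").size())
```

Checking the columns up front makes a schema mismatch fail loudly instead of silently producing extra ``NaN`` columns after concatenation.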

Each function of the Preprocessor module takes two arguments: an input path and an optional output path. If an output path is given, the dataframe built from all the transcription files that were read in is written there as a ``.csv`` file. If only an input path is given, the function returns that dataframe.
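
The two-argument contract can be sketched as a wrapper that collects the files under the input path and then either writes a CSV or returns the dataframe. The function ``build_corpus_frame``, its ``parse_file`` and ``pattern`` parameters, and the choice to return ``None`` after writing are assumptions made for illustration; scikit-talk's real functions and their format-specific parsing may differ.

```python
from pathlib import Path
from typing import Callable, Optional

import pandas as pd

UNIFIED_COLUMNS = ["begin", "end", "speaker", "utterance", "source"]


def build_corpus_frame(
    input_path: str,
    parse_file: Callable[[Path], pd.DataFrame],
    output_path: Optional[str] = None,
    pattern: str = "*.txt",
) -> Optional[pd.DataFrame]:
    """Hypothetical sketch of the Preprocessor's input/output contract.

    parse_file turns one transcription file into rows with (some of) the
    unified columns; it stands in for the format-specific reading logic.
    """
    frames = []
    for path in sorted(Path(input_path).glob(pattern)):
        df = parse_file(path)
        df["source"] = path.name  # record which file each row came from
        frames.append(df)

    corpus = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    # Fix the column order; absent columns (e.g. begin/end) become NaN.
    corpus = corpus.reindex(columns=UNIFIED_COLUMNS)

    if output_path is not None:
        corpus.to_csv(output_path, index=False)  # write the .csv file
        return None  # assumption: nothing is returned when writing
    return corpus
```

In this sketch, swapping in a different ``parse_file`` per format mirrors how the three functions could share one workflow while differing only in how they read their input.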

The data is organized into the following columns: ``begin``, ``end``, ``speaker``, ``utterance``, and ``source``. If the corpus provides timestamps, ``begin`` and ``end`` hold them in a pandas datetime format; otherwise they contain ``NaN``.
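
To illustrate this column layout, the helper below builds a unified dataframe from row dictionaries, converting timestamps to pandas datetimes when present and leaving ``NaN`` when a corpus has none. The name ``to_unified``, the millisecond offsets, and the conversion with ``pandas.to_datetime(..., unit="ms")`` (which counts from the Unix epoch) are assumptions; the library may represent times differently.

```python
import pandas as pd

UNIFIED_COLUMNS = ["begin", "end", "speaker", "utterance", "source"]


def to_unified(rows, unit="ms"):
    """Hypothetical helper: rows are dicts; begin/end are numeric offsets if present."""
    # reindex fixes the column order and fills absent columns with NaN.
    df = pd.DataFrame(rows).reindex(columns=UNIFIED_COLUMNS)
    for col in ("begin", "end"):
        if df[col].notna().any():
            # Numeric offsets -> datetime64; any missing value becomes NaT.
            df[col] = pd.to_datetime(df[col], unit=unit)
    return df


timed = to_unified([
    {"begin": 0, "end": 1500, "speaker": "A", "utterance": "hello", "source": "call.eaf"},
])
untimed = to_unified([
    {"speaker": "B", "utterance": "hi", "source": "notes.txt"},
])
print(timed.dtypes["begin"], untimed["begin"].isna().all())
```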

The functions assume that the corpora and files are well-formed, adhering to the requirements of the relevant standards and conventions (e.g. those of the Linguistic Data Consortium).

Features
--------

* Preprocessor

Credits
-------

This package uses tools from the speach_ library made by Le Tuan Anh, and the pylangacq_ library by Jackson L. Lee. This package was created with Cookiecutter_ and the audreyr/cookiecutter-pypackage_ project template.

.. _speach: https://github.com/neocl/speach
.. _pylangacq: https://github.com/jacksonllee/pylangacq
.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _audreyr/cookiecutter-pypackage: https://github.com/audreyr/cookiecutter-pypackage

=======
History
=======

0.0.223 (2021-07-27)
--------------------

* First release on PyPI.

Download files

Source Distribution

scikit_talk-0.0.223.tar.gz (12.9 kB)

Built Distribution

scikit_talk-0.0.223-py2.py3-none-any.whl (7.0 kB; Python 2, Python 3)

File details

Details for the file scikit_talk-0.0.223.tar.gz.

File metadata

  • Download URL: scikit_talk-0.0.223.tar.gz
  • Size: 12.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.5.0 pkginfo/1.5.0.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.6

File hashes

Hashes for scikit_talk-0.0.223.tar.gz
  • SHA256: 1b83ba48221e0f6da6fe327a4407077de176575cbeed97999cd7e05ccc7c6c03
  • MD5: bfeb6796bd950cd041ce19c013713f7c
  • BLAKE2b-256: 70da0cb64711f2192affa0e7382224d6deaf509d95787cecae911809e7b2d7ef

File details

Details for the file scikit_talk-0.0.223-py2.py3-none-any.whl.

File metadata

  • Download URL: scikit_talk-0.0.223-py2.py3-none-any.whl
  • Size: 7.0 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.5.0 pkginfo/1.5.0.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.6

File hashes

Hashes for scikit_talk-0.0.223-py2.py3-none-any.whl
  • SHA256: fa0728ab6a7b915131e7f37f8fac138adfde49b327752a12b1601ce593d7df03
  • MD5: f4407a8ca83f63908bfb8d73bf7c0e10
  • BLAKE2b-256: 8cb06f64836441d132032e0e87a3a93bee60b3484a3cf23fae225e71bcdcb5d4