Vectorizing Speech Sounds in Phonetic Transcription

SoundVectors

This lightweight Python package translates speech sounds into phonological feature vectors. It is described in detail in our study "A Generative System for Translating Sounds to Phonological Feature Vectors". If you use the package, please cite this paper.

Rubehn, Arne, Jessica Nieder, and Johann-Mattis List (2024): A Generative System for Translating Sounds to Phonological Feature Vectors.

Installation

You can install the soundvectors package via pip.

pip install soundvectors
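
To confirm that the installation succeeded, you can query the installed version from Python. The following is a minimal sketch using only the standard library; the helper name installed_version is made up for this illustration and is not part of soundvectors.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # The package is not installed in the current environment.
        return None


version = installed_version("soundvectors")
print("soundvectors version:", version if version else "not installed")
```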

Requirements for running the evaluation

If you wish to reproduce the evaluation from our paper, you need some additional dependencies that the core package does not require. To install them, clone this repository and run:

$ pip install -e .[dev]

You also need to download the evaluation data from Lexibank. For this, cd into the eval directory and run:

soundvectors$ cd eval  # cd into eval directory
eval$ make download

This will clone the lexibank-analysed dataset into the eval directory.

After running the evaluation scripts, you can clear the data from your disk by running the command:

eval$ make clear

Usage

The core of this package is the SoundVectors class, which translates valid IPA symbols to their corresponding feature vectors. The recommended usage of SoundVectors is passing a callable transcription system via the keyword argument ts:

>>> from soundvectors import SoundVectors
>>> from pyclts import CLTS
>>> bipa = CLTS().bipa
>>> sv = SoundVectors(ts=bipa)
>>> sv.get_vec("t")
(1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, -1, -1, 1, 0, -1, -1, -1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0)

Alternatively, get_vec accepts a Sound object (derived from pyclts) or a string describing the sound according to IPA conventions. The resulting vectors are the same:

>>> sv.get_vec("voiceless alveolar stop consonant") == sv.get_vec("t") == sv.get_vec(bipa["t"])
True

Instead of obtaining a vector directly, you can also obtain a FeatureBundle object:

>>> feature_bundle = sv.get_vec("t", vectorize=False)  # set vectorize=False to return an object
>>> feature_bundle.cons  # feature values can be retrieved by attribute access
1

>>> feature_bundle.as_set()  # represent feature bundle as set of non-zero feature strings
frozenset({'-son', '-distr', '-cont', '-lab', '-lo', '-long', '+front', '-laryngeal', '-syl', '-delrel', '-voi', '-round', '+cons', '-velaric', '-dorsal', '-back', '-nas', '-pharyngeal', '+ant', '+cor', '-cg', '-sg', '-lat', '-hi'})

>>> str(feature_bundle)  # string representation
'+cons,-syl,-son,-cont,-delrel,-lat,-nas,-voi,-sg,-cg,-pharyngeal,-laryngeal,+cor,-dorsal,-lab,-hi,-lo,-back,+front,0_tense,-round,-velaric,-long,+ant,-distr,0_strid,0_hitone,0_hireg,0_loreg,0_rising,0_falling,0_contour,0_backshift,0_frontshift,0_opening,0_closing,0_centering,0_longdistance,0_secondrounded'
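
The string representation encodes each feature as +name, -name, or 0_name, in the same order as the vector components. As a rough illustration of that correspondence, the following sketch parses the string printed above back into feature names and numeric values. The function parse_feature_string is a hypothetical helper written for this example, not part of the soundvectors API, and it assumes the string format is exactly as shown here.

```python
def parse_feature_string(feature_string):
    """Split a string like '+cons,-syl,0_tense' into feature names and values.

    Hypothetical helper for illustration only; not part of soundvectors.
    """
    signs = {"+": 1, "-": -1}
    names, values = [], []
    for item in feature_string.split(","):
        if item.startswith("0_"):
            # Unspecified features are written as '0_<name>' and map to 0.
            names.append(item[2:])
            values.append(0)
        else:
            # Specified features carry a leading '+' or '-'.
            names.append(item[1:])
            values.append(signs[item[0]])
    return names, tuple(values)


# String representation of "t", copied from the example above.
T_STRING = "+cons,-syl,-son,-cont,-delrel,-lat,-nas,-voi,-sg,-cg,-pharyngeal,-laryngeal,+cor,-dorsal,-lab,-hi,-lo,-back,+front,0_tense,-round,-velaric,-long,+ant,-distr,0_strid,0_hitone,0_hireg,0_loreg,0_rising,0_falling,0_contour,0_backshift,0_frontshift,0_opening,0_closing,0_centering,0_longdistance,0_secondrounded"

names, vector = parse_feature_string(T_STRING)

# The non-zero entries correspond to the items listed by as_set().
nonzero = frozenset(item for item in T_STRING.split(",") if not item.startswith("0_"))

print(len(vector))                   # 39
print(vector[names.index("cor")])    # 1
print(vector[names.index("tense")])  # 0
print(len(nonzero))                  # 24
```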

>>> feature_bundle.as_vector()  # raw vector representation (equal to the return value with vectorize=True)
(1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, -1, -1, 1, 0, -1, -1, -1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0)

Finally, you can call the SoundVectors object directly (via its __call__ method) to process a collection of sounds:

>>> sv(["s", "v"])
[(1, -1, -1, -1, ..., 0),
 (1, -1, -1, 1, ..., 0)]
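
Once several sounds are vectorized, the resulting tuples can be compared with ordinary vector measures. The sketch below shows one possible comparison, cosine similarity, along with a list of the positions where two vectors differ. It is an illustration under stated assumptions, not necessarily the measure used in the paper's evaluation: both function names are made up for this example, and the two inputs are only the four leading components printed in the truncated output above, not full 39-dimensional vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for all-zero vectors")
    return dot / (norm_u * norm_v)


def differing_positions(u, v):
    """Indices at which the two vectors hold different values."""
    return [i for i, (a, b) in enumerate(zip(u, v)) if a != b]


# Toy input: the first four components of the two vectors shown above.
first = (1, -1, -1, -1)
second = (1, -1, -1, 1)

print(cosine_similarity(first, second))    # 0.5
print(differing_positions(first, second))  # [3]
```

With real output from the package, the same functions can be applied to the full tuples, since every vector has the same length.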

Evaluation

The eval directory contains the code used for the Evaluation section of the paper. To reproduce the reported results, make sure that you have installed the dependencies and downloaded the data (see above). Then run the evaluation scripts; each file corresponds to the subsection of the paper with the same name:

$ cd eval
$ python vector_similarities.py  # 4.1 & 4.2
$ python equivalence_classes.py  # 4.3
$ python distinctiveness.py  # 4.4
$ python concordanceline.py  # 4.4


