An even smaller speech recognizer

These details have not been verified by PyPI

Development Status
- 3 - Alpha
License
- OSI Approved :: MIT License
Natural Language
- English
- French
Operating System
- OS Independent
Programming Language

Reason this release was yanked:

horrible bug

Project description

SoundSwallower: an even smaller speech recognizer

"Time and change have a voice; eternity is silent. The human ear is always searching for one or the other."
Leena Krohn, Datura, or a delusion we all see

SoundSwallower is a refactored version of PocketSphinx intended primarily for embedding in web applications. The goal is not to provide a fast implementation of large-vocabulary continuous speech recognition, but rather to provide a small implementation of simple, useful speech technologies.

With that in mind, the current version is limited to finite-state grammar recognition. In addition, the eternally problematic and badly designed audio library, as well as all other external dependencies, have been removed.

Compiling SoundSwallower

Currently SoundSwallower can be built in several different ways. To build the C shared library, run CMake in the standard way:

mkdir build
cd build
cmake ..
make
make test
make install

Note that this isn't terribly useful as there is no command-line frontend. You probably want to target JavaScript or Python.

Installing the Python module and CLI

The SoundSwallower command-line interface is a Python module (soundswallower.cli) and can be installed using pip. Doing this in a virtualenv is highly recommended. You can install it from PyPI:

pip install soundswallower

Or compile from source:

pip install .

For development, you can install it in-place, but please make sure to remove any existing global installation:

pip uninstall soundswallower
pip install -e .
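
To confirm which copy of the package Python will actually import (for instance, after replacing a global installation with an in-place one), you can ask the import system where it resolves the module. The following is a minimal sketch using only the standard library; the helper name `module_origin` is hypothetical and not part of SoundSwallower.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None.

    find_spec() locates a top-level module without importing it, so this
    is safe to call even if the installation is broken.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not installed in the active environment
    return spec.origin


# Assumption: the package is importable under the name "soundswallower".
print(module_origin("soundswallower"))
```

If the printed path points outside your source checkout after `pip install -e .`, another installation is probably still shadowing the in-place one.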

The command-line interface supports JSGF grammars and word-level forced alignment for one or more input files. For example:

soundswallower --align tests/data/goforward.txt tests/data/goforward.wav
soundswallower --align-text "go forward ten meters" tests/data/goforward.wav
soundswallower --grammar tests/data/goforward.gram tests/data/goforward.wav

Note that multiple input files are not particularly useful for --align or --align-text as they will simply (try to) align the same text to each file. The output results (a list of time-aligned words) can be written to a JSON file with --output.
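
If you want to drive the command-line interface from a script, one possible approach is to run it as a subprocess and read back the JSON file it writes. This is only a sketch: it uses the `--align-text` and `--output` flags described above, the helper names are hypothetical, and the structure of the JSON is not assumed, so it is returned as parsed.

```python
import json
import subprocess
from pathlib import Path


def build_align_command(text, wav_path, output_path):
    """Build the argument list for aligning `text` to one audio file.

    Only flags mentioned in this README are used.
    """
    return [
        "soundswallower",
        "--align-text", text,
        "--output", str(output_path),
        str(wav_path),
    ]


def align_to_json(text, wav_path, output_path):
    """Run the CLI and return whatever JSON it wrote to `output_path`.

    Raises subprocess.CalledProcessError if the command fails.
    """
    subprocess.run(build_align_command(text, wav_path, output_path), check=True)
    return json.loads(Path(output_path).read_text(encoding="utf-8"))
```

For example, `align_to_json("go forward ten meters", "tests/data/goforward.wav", "goforward.json")` would run the second command shown above with an added `--output` option, assuming `soundswallower` is on your `PATH`.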

Compiling to JavaScript/WebAssembly

To build the JavaScript library, use CMake with Emscripten:

cd js
emcmake cmake ..
emmake make

This will create js/soundswallower.js and js/soundswallower.wasm in the jsbuild directory, which you can then include in your projects. Demo applications can be seen at https://github.com/dhdaines/alignment-demo and https://github.com/dhdaines/soundswallower-demo.

For more details on the JavaScript implementation and API, see js/README.md.

Creating binary distributions for Python

To build the Python extension, I suggest using build, as it will ensure that everything is done in a totally clean environment. Run this from the top-level directory:

python -m build

In all cases the resulting binary wheel (found in dist) is self-contained and should not need any other components aside from the system libraries. To create wheels that are compatible with multiple Linux distributions, see the instructions in README.manylinux.md.
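
The name of a wheel file records which Python versions and platforms it targets, which is useful when checking whether a wheel in dist is compatible with several Linux distributions. The sketch below splits a wheel filename according to the standard naming convention (name, version, optional build tag, Python tag, ABI tag, platform tag); the function and class names are hypothetical, and for real projects the `packaging` library is likely a more robust choice.

```python
from typing import List, NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tags: List[str]
    abi_tags: List[str]
    platform_tags: List[str]


def parse_wheel_filename(filename):
    """Split a wheel filename into its components.

    Each tag field may hold several dot-separated values, which is how a
    single wheel can declare compatibility with multiple manylinux policies.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        del parts[2]  # drop the optional build tag
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    name, version, python_tag, abi_tag, platform_tag = parts
    return WheelTags(
        name,
        version,
        python_tag.split("."),
        abi_tag.split("."),
        platform_tag.split("."),
    )


tags = parse_wheel_filename(
    "soundswallower-0.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(tags.platform_tags)
```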

Compiling on Windows in Visual Studio Code

The method for building distributions noted above will also work on Windows, from within a Conda environment, provided you have Visual Studio or the Visual Studio Build Tools installed. This is somewhat magic.

If you don't have Conda, then what you will need to do is:

1. Install the Visual Studio Build Tools. Unfortunately, a direct link does not seem to exist, but you can find them on Microsoft's downloads page. The 2019 version is probably the best one to use, as it is compatible with all recent versions of Windows.
2. Install the version of Python you wish to use.
3. Launch the Visual Studio command-line prompt from the Start menu. Note that if your Python is 64-bit (recommended), you must be sure to launch the "x64 Native Tools Command Prompt".
4. Create and activate a virtual environment using your Python binary, which may or may not be in your AppData directory:
```
  %USERPROFILE%\AppData\Local\Programs\Python\Python310\python -m venv py310
  py310\scripts\activate
```
5. Now you can build wheels with pip, using the same method mentioned above.

Release history

0.6.4

Feb 1, 2024

0.6.0

Jan 26, 2023

0.5.0

Dec 20, 2022

0.4.1

Nov 9, 2022

0.4.0

Nov 9, 2022

0.3.2

Jul 9, 2022

This version

0.3.1 yanked

Jul 8, 2022

Reason this release was yanked:

horrible bug

0.3.0

Jun 27, 2022

0.2.2

Jun 1, 2022

0.2.1

May 24, 2022

0.2

May 24, 2022

0.1.5

Apr 27, 2022

0.1.4

Apr 17, 2022

0.1.3

Apr 14, 2022

0.1.2 yanked

Mar 17, 2022

Reason this release was yanked:

does not compile on windows

0.1.1

Apr 22, 2020

0.1

Apr 21, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

soundswallower-0.3.1.tar.gz (10.3 MB)

Uploaded Jul 8, 2022 Source

Built Distributions

soundswallower-0.3.1-cp310-cp310-win_amd64.whl (9.5 MB)

Uploaded Jul 8, 2022 CPython 3.10 Windows x86-64

soundswallower-0.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.6 MB)

Uploaded Jul 8, 2022 CPython 3.10 manylinux: glibc 2.17+ x86-64

soundswallower-0.3.1-cp39-cp39-win_amd64.whl (9.5 MB)

Uploaded Jul 8, 2022 CPython 3.9 Windows x86-64

soundswallower-0.3.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.6 MB)

Uploaded Jul 8, 2022 CPython 3.9 manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

soundswallower-0.3.1-cp38-cp38-win_amd64.whl (9.5 MB)

Uploaded Jul 8, 2022 CPython 3.8 Windows x86-64

soundswallower-0.3.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.6 MB)

Uploaded Jul 8, 2022 CPython 3.8 manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

soundswallower-0.3.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.6 MB)

Uploaded Jul 8, 2022 CPython 3.7m manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

Hashes for soundswallower-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`1fb4324ee0030544c3e882f0b04fdbd8e18b13d8f6201abb6e86b18720301b49`
MD5	`2f7937bbbd27639e68523b5ff3ca560c`
BLAKE2b-256	`026e7d4add7c5875be6e72b31f9e370aa8af86f2836534f9f5cbeef742eb6de9`

Hashes for soundswallower-0.3.1-cp310-cp310-win_amd64.whl
Algorithm	Hash digest
SHA256	`37b01ae0460b1f6a3cb2a5d96370fd7b742bf050777cda1d235126ae5473a7a9`
MD5	`0fead7cbec566cfdd98a99cc641e66a2`
BLAKE2b-256	`ec29711355f4f44184889bef7822aa292cf31d1c086ac643b8d6c73aca411095`

Hashes for soundswallower-0.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`674804b2de95d94d48cbffe62d7780b94f0bd871655c2188fc83f01b16c11486`
MD5	`5321476c79bc8a2badf41fd27d290f01`
BLAKE2b-256	`97af0da45b07312077bcf55c692bce743b5909991847bdc3a8f9447b42abb5c5`

Hashes for soundswallower-0.3.1-cp39-cp39-win_amd64.whl
Algorithm	Hash digest
SHA256	`8169eb230b4d90cef905bc5e509d747f970d7eb06b32372fcf36a755688c1336`
MD5	`b5c24d1ab1f1c8b3fcc270d036098ac8`
BLAKE2b-256	`69aa297440c314c4cb8e86a2e490caa2530ce492b8a291ec45185b065b8aed3d`

Hashes for soundswallower-0.3.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`dff7323e26ab59cc49bd910ba98255eb02444240af6ae62e2bb374d39275bf53`
MD5	`155281097d5158c9a31ae3b6bdcf389a`
BLAKE2b-256	`9c5bd6fe01e977bc46170ef46967711dd9d277f11024c6d666636b968ff00430`

Hashes for soundswallower-0.3.1-cp38-cp38-win_amd64.whl
Algorithm	Hash digest
SHA256	`4ec41ac5b75e9d1a48b42be4afadb2e23cfacb8bbd430ea1ca6e9cd5205efafb`
MD5	`770f548f093a0005940a4aac455149c7`
BLAKE2b-256	`da9653d0497c529253a3bcdd2a8672b88bea71fa11a58b8ce8b58df8c1a5dc3b`

Hashes for soundswallower-0.3.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`e26a30fad3016bbb503a25b4859e77669a4a0ac4c12016bac84d35ca58fd35f2`
MD5	`91e592b160f5b814ae77741cf7943eb7`
BLAKE2b-256	`ca254c5120bae18363559d7b0cee0a8b824e2e64a38ae7a91710a0cba4383984`

Hashes for soundswallower-0.3.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`0bb3ee277883447a0e5c06b74c79055b329d907f5aebb2038a864a5f9e8ead52`
MD5	`62908fffade8e7c7db7d0b46830e4393`
BLAKE2b-256	`a46240010a9f9fab06cc1fc21542ccaa99d8eccd4c98edac1aae48127615ffd9`
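
The published digests can be used to check that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names are hypothetical, and the file is read in chunks so that large archives do not need to fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest to a published one, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, after downloading the source distribution you could call `matches_published_hash("soundswallower-0.3.1.tar.gz", "1fb4324ee0030544c3e882f0b04fdbd8e18b13d8f6201abb6e86b18720301b49")`, using the SHA256 value listed for that file above.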