SoundSwallower: an even smaller speech recognizer

"Time and change have a voice; eternity is silent. The human ear is always searching for one or the other."
Leena Krohn, Datura, or a delusion we all see

SoundSwallower is a very small and simple speech recognizer intended primarily for embedding in web applications. The goal is not to provide a fast implementation of large-vocabulary continuous speech recognition, but rather to provide a small implementation of simple, useful speech technologies.

With that in mind, the current version is limited to finite-state grammar recognition. In addition, the eternally problematic and badly designed audio library, as well as (almost) all other external dependencies, have been removed.
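Roughly speaking, finite-state grammar recognition means the decoder only considers word sequences that correspond to a path through a finite network of words (silence and filler words may also show up in the output). As a toy illustration of that constraint only, the sketch below checks whether a word sequence is allowed by a small hand-written grammar. The `Grammar` type, the `accepts()` function and the example grammar are all hypothetical; this is not SoundSwallower's API or its internal grammar representation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy representation (not SoundSwallower's): each state maps
# to a list of (word, next_state) arcs.
Grammar = Dict[int, List[Tuple[str, int]]]


def accepts(grammar: Grammar, start: int, finals: Set[int], words: List[str]) -> bool:
    """Return True if some path from `start` reads exactly `words` and
    ends in one of the `finals` states."""
    current = {start}
    for word in words:
        # Follow every arc labelled with this word from every active state.
        current = {
            nxt
            for state in current
            for label, nxt in grammar.get(state, [])
            if label == word
        }
        if not current:
            return False
    return bool(current & finals)


# Toy grammar for "go (forward | back) (one | ten) meters".
TOY_GRAMMAR: Grammar = {
    0: [("go", 1)],
    1: [("forward", 2), ("back", 2)],
    2: [("one", 3), ("ten", 3)],
    3: [("meters", 4)],
}
TOY_START, TOY_FINALS = 0, {4}
```

A real decoder does not check finished transcripts like this; it searches the grammar and the acoustic model together, scoring every allowed path against the audio.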

Compiling SoundSwallower

Currently SoundSwallower can be built in several different ways. To build the C library, run CMake in the standard way:

cmake -S . -B build
cmake --build build
cmake --build build --target check
sudo cmake --build build --target install

Note that this isn't terribly useful, as there is no command-line frontend, and shared libraries are not built by default (pass -DBUILD_SHARED_LIBS=ON to the first cmake command if you insist). You probably want to target JavaScript or Python instead.

Installing the Python module and CLI

The SoundSwallower command-line interface is a Python module (soundswallower.cli) and can be installed using pip. It is highly recommended to do this in a virtualenv. You can simply install it from PyPI:

pip install soundswallower

Or compile from source:

pip install .

For development, you can install it in-place, but please make sure to remove any existing global installation:

pip uninstall soundswallower
pip install -e .
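To confirm that Python picks up the in-place install rather than a leftover global one, you can ask the import system where a module would be loaded from. This is a minimal sketch using only the standard library; `module_origin()` is a hypothetical helper, not part of SoundSwallower.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file Python would import `name` from, or None if it is
    not importable. For a top-level name, only the module spec is looked
    up; the module itself is not executed."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# After `pip install -e .`, this should point into your source checkout
# rather than into site-packages (the exact path depends on how the
# editable install is implemented by the build backend).
print(module_origin("soundswallower"))
```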

The command-line interface supports JSGF grammars and word-level forced alignment for one or more input files, for example:

soundswallower --align tests/data/goforward.txt tests/data/goforward.wav
soundswallower --align-text "go forward ten meters" tests/data/goforward.wav
soundswallower --grammar tests/data/goforward.gram tests/data/goforward.wav

Note that multiple input files are not particularly useful with --align or --align-text, as the tool will simply (try to) align the same text to each file. The output (a list of time-aligned words) can be written to a JSON file with --output. To obtain phoneme-level alignments, add the --phone-align flag.

The JSON format (which has recently changed) is the same as the one used in PocketSphinx 5.0. It is more compact than it is readable, but briefly, it consists of one dictionary (an "object" in JavaScript-ese) per line, where the t attribute is the recognized text and the w attribute contains a list of word segmentations, each with its start time in b and its duration in d. Each word segmentation can optionally contain its own w attribute, holding a list of phone segmentations in the same format.
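As an illustration of the format described above, here is a minimal reader sketch using only Python's standard library. It assumes that each word and phone segment stores its label in t, as the top-level object does; the function name and the sample line are hypothetical (the sample is made up, not real SoundSwallower output), and the exact set of fields may differ between versions.

```python
import json
from typing import Iterable, Iterator, List, Tuple

# (label, start, end), with times in seconds
Segment = Tuple[str, float, float]


def read_alignments(lines: Iterable[str]) -> Iterator[Tuple[str, List[Segment], List[Segment]]]:
    """Yield (text, word_segments, phone_segments) for each JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        result = json.loads(line)
        words: List[Segment] = []
        phones: List[Segment] = []
        for word in result.get("w", []):
            # b is the start time and d the duration, so end = b + d.
            words.append((word["t"], word["b"], word["b"] + word["d"]))
            # Phone segmentations (from --phone-align) are nested in the word's own "w".
            for phone in word.get("w", []):
                phones.append((phone["t"], phone["b"], phone["b"] + phone["d"]))
        yield result["t"], words, phones


# A made-up line in the format described above (not real output).
SAMPLE = (
    '{"t": "go forward", "w": ['
    '{"t": "go", "b": 0.5, "d": 0.25, "w": ['
    '{"t": "G", "b": 0.5, "d": 0.125}, {"t": "OW", "b": 0.625, "d": 0.125}]}, '
    '{"t": "forward", "b": 0.75, "d": 0.5}]}'
)

for text, words, phones in read_alignments([SAMPLE]):
    print(text, words, phones)
```

For a file written with --output, pass the open file object instead of a list, e.g. `with open("out.json") as f:` (the filename is just an example).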

See also the full documentation of the Python API.

Compiling to JavaScript/WebAssembly

To use the JavaScript library in your projects:

npm install soundswallower

To build the JavaScript library, use CMake with Emscripten:

emcmake cmake -S . -B jsbuild
cmake --build jsbuild

This will create soundswallower.js and soundswallower.wasm in the jsbuild directory, which you can then include in your projects. You can also use npm link to link it into your node_modules folder for development. Demo applications can be seen at https://github.com/dhdaines/alignment-demo and https://github.com/dhdaines/soundswallower-demo.

To run the JavaScript tests:

cd jsbuild
npm install
npm test
npx tsc
node test_typescript.js

And in the browser:

cd jsbuild
python server.py
# Navigate to http://localhost:8000/test_web.html

For more details on the JavaScript implementation and API, see js/README.md.

See also the documentation of the JavaScript API.

Creating binary distributions for Python

To build the Python extension, I suggest using build (pip install build), as it ensures that everything is done in a clean, isolated environment. Run this from the top-level directory:

python -m build

In all cases, the resulting binary wheel (found in dist) is self-contained and should not need any components aside from the system libraries. To create wheels that are compatible with multiple Linux distributions, see the instructions in README.manylinux.md.

Compiling on Windows with Visual Studio

The method for building distributions noted above will also work on Windows, from within a Conda environment, provided you have Visual Studio or the Visual Studio Build Tools installed. This is somewhat magic.

If you don't have Conda, then what you will need to do is:

  • Install the Visual Studio Build Tools. Unfortunately, a direct link does not seem to exist, but you can find them on Microsoft's downloads page. The 2019 version is probably the best choice, as it is compatible with all recent versions of Windows.

  • Install the version of Python you wish to use.

  • Launch the Visual Studio command prompt from the Start menu. Note that if your Python is 64-bit (recommended), you must launch the "x64 Native Tools Command Prompt".

  • Create and activate a virtual environment using your Python binary, which may or may not be in your AppData directory:

      %USERPROFILE%\AppData\Local\Programs\Python\Python310\python -m venv py310
      py310\scripts\activate
    
  • Now you can build wheels using the same method described above (python -m build).

Authors

SoundSwallower is based on PocketSphinx, which is based on Sphinx-II, which is based on Sphinx, which is based on Harpy, and so on, and so on, back to somewhere around the Unix Epoch. Thanks to Kevin Lenzo for releasing CMU Sphinx under a BSD license and making this possible, and to Ravishankar Mosur, who actually wrote most of the decoder. Many others also contributed along the way; take a look at the AUTHORS file in PocketSphinx for an idea.

This document and SoundSwallower are now being developed by David Huggins-Daines.

Download files

Download the file for your platform.

Source Distribution

soundswallower-0.6.4.tar.gz (10.4 MB)

Uploaded Source

Built Distributions

soundswallower-0.6.4-cp312-cp312-win_amd64.whl (9.6 MB)

Uploaded CPython 3.12 Windows x86-64

soundswallower-0.6.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.6 MB)

Uploaded CPython 3.12 manylinux: glibc 2.17+ x86-64

soundswallower-0.6.4-cp312-cp312-macosx_10_9_universal2.whl (9.9 MB)

Uploaded CPython 3.12 macOS 10.9+ universal2 (ARM64, x86-64)

soundswallower-0.6.4-cp311-cp311-win_amd64.whl (9.6 MB)

Uploaded CPython 3.11 Windows x86-64

soundswallower-0.6.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.6 MB)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

soundswallower-0.6.4-cp311-cp311-macosx_10_9_universal2.whl (9.9 MB)

Uploaded CPython 3.11 macOS 10.9+ universal2 (ARM64, x86-64)

soundswallower-0.6.4-cp310-cp310-win_amd64.whl (9.6 MB)

Uploaded CPython 3.10 Windows x86-64

soundswallower-0.6.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.6 MB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

soundswallower-0.6.4-cp310-cp310-macosx_10_9_universal2.whl (9.9 MB)

Uploaded CPython 3.10 macOS 10.9+ universal2 (ARM64, x86-64)

soundswallower-0.6.4-cp38-cp38-win_amd64.whl (9.6 MB)

Uploaded CPython 3.8 Windows x86-64

soundswallower-0.6.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.6 MB)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

soundswallower-0.6.4-cp38-cp38-macosx_10_9_universal2.whl (9.9 MB)

Uploaded CPython 3.8 macOS 10.9+ universal2 (ARM64, x86-64)
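To help pick the right file from the list above: a wheel filename encodes the Python version, ABI and platform it is built for, following the convention in PEP 427 ({name}-{version}(-{build})?-{python}-{abi}-{platform}.whl). For example, cp312 means CPython 3.12, and a manylinux_2_17 platform tag means glibc 2.17 or newer. The packaging library's `packaging.utils.parse_wheel_filename()` handles this more thoroughly; the hypothetical `parse_wheel_name()` below is only a dependency-free sketch.

```python
from typing import FrozenSet, NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: FrozenSet[str]
    abi: FrozenSet[str]
    platform: FrozenSet[str]


def parse_wheel_name(filename: str) -> WheelTags:
    """Split {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl into its parts.
    Hyphens inside names are escaped as underscores in wheel filenames,
    so splitting on "-" is safe."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    build: Optional[str] = None
    if len(parts) == 5:
        name, version, python, abi, platform = parts
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")

    # Each tag may be a "compressed tag set": several alternatives joined by ".".
    def split(tag: str) -> FrozenSet[str]:
        return frozenset(tag.split("."))

    return WheelTags(name, version, build, split(python), split(abi), split(platform))


tags = parse_wheel_name(
    "soundswallower-0.6.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(sorted(tags.python), sorted(tags.platform))
```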

File details

Details for the file soundswallower-0.6.4.tar.gz.

File metadata

  • Download URL: soundswallower-0.6.4.tar.gz
  • Size: 10.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for soundswallower-0.6.4.tar.gz
Algorithm Hash digest
SHA256 cdbe243623ddc762d60902f10fc5ea4979a6570fd45cf6c0ea9a85415d561eae
MD5 95b86c0ebdd38c79e960ed0151b9d52b
BLAKE2b-256 d6984c089fa957c9df7f016a4c7b8a6e20fd8ac6875aa48ba3330bc1234b47e9

See more details on using hashes here.
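The SHA256 digests listed here can be checked after downloading a file. Below is a minimal sketch using only Python's standard library; the function names are illustrative and not part of SoundSwallower. Alternatively, pip's hash-checking mode can do this during installation: pin `soundswallower==0.6.4 --hash=sha256:<digest>` in a requirements file and run `pip install --require-hashes -r requirements.txt` (in that mode, every dependency must also be pinned with a hash).

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks
    so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example, assuming the source distribution was downloaded to the current directory:
# matches_published_hash(
#     "soundswallower-0.6.4.tar.gz",
#     "cdbe243623ddc762d60902f10fc5ea4979a6570fd45cf6c0ea9a85415d561eae",
# )
```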

File details

Details for the file soundswallower-0.6.4-cp312-cp312-win_amd64.whl.

File hashes

Hashes for soundswallower-0.6.4-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 93f07682c01ff7c13e02349f17a03ce24cb345fbd6193b52e08554eae3f6aae1
MD5 42c47aa0cdfb623d5e0a28218e76aee0
BLAKE2b-256 f6a22b45831bf5b928b544fd15ffb273ae1436964b0a4f604b0f5a094c31428e

File details

Details for the file soundswallower-0.6.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for soundswallower-0.6.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8797d1a9262fb9c9a291a50bf93b13f93c971b42447882ddf6cdadc095909fe6
MD5 ce010bce8c2ddedac4d602bd12013161
BLAKE2b-256 65f65a8e27e2a1921c9fd7b7cc12fa07f214d47748d20ccbb251f6efc0745e7f

File details

Details for the file soundswallower-0.6.4-cp312-cp312-macosx_10_9_universal2.whl.

File hashes

Hashes for soundswallower-0.6.4-cp312-cp312-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 e547b14286885e477319a04f15571ef80c4b8e875ffaa7b231d52aeed72d210e
MD5 52910317259517121bf4f779913475b4
BLAKE2b-256 4540cb1ba1da584a73b28c34b0e44c1a45d4f8c04c39530a0ab94f0f16007f83

File details

Details for the file soundswallower-0.6.4-cp311-cp311-win_amd64.whl.

File hashes

Hashes for soundswallower-0.6.4-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 530adfa8b287e8fd0158738b0a9d3155d4b053566e880334e7884ad7693dd1ea
MD5 962f1f40c6fb17f21030a5e1055bc959
BLAKE2b-256 7f8800beea36c9c4afeb26d5fdbf8a04f62f53364c04d36e8cee9bee41537c2e

File details

Details for the file soundswallower-0.6.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for soundswallower-0.6.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 aafee97efd81fcf495456a00b83a5a96743875e334cfa0fc8401d7dd9c935c6c
MD5 b5db54f3776a9ae6b686a00765ce503c
BLAKE2b-256 1511380c22a0b651eba736344fa528835516e2f9d7fe2dbb3a1a18e8460e0f91

File details

Details for the file soundswallower-0.6.4-cp311-cp311-macosx_10_9_universal2.whl.

File hashes

Hashes for soundswallower-0.6.4-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 82fdc8461d1f95c92ec7be99ade90e0202ad6270001e85be56786c8fbbdf1ddf
MD5 f0ab698f496da65ec39294cc915ecd68
BLAKE2b-256 e006d636d7d11c4fbc5ae80f610b54be92a401dc75c2bf286f366f048016f592

File details

Details for the file soundswallower-0.6.4-cp310-cp310-win_amd64.whl.

File hashes

Hashes for soundswallower-0.6.4-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 c274594b2e94cc60a77eb412c3f51975d173b2671a86704c4ebadc6545cfeb6a
MD5 09c5290188b0dbab9af11c8356492d1a
BLAKE2b-256 61304e2aa3272fbc37b8c50fb5a8cf586bb423ea71d47d46622c4c127a28a6e1

File details

Details for the file soundswallower-0.6.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for soundswallower-0.6.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d396708690d8a83a5874c887b95e9cce480c1e4f39fa20c7479c9f3ae4d42fe6
MD5 4480c30ca95a44b60497a33edf60cea5
BLAKE2b-256 bd51daffbf33309c51bfd5b1c71c958f73481aac0fb0237b6623014a7ad29a41

File details

Details for the file soundswallower-0.6.4-cp310-cp310-macosx_10_9_universal2.whl.

File hashes

Hashes for soundswallower-0.6.4-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 fe1e718779b153e8745f2a1012934b0aa9f46b50b00c33cabb37e93cee66af8e
MD5 6bda0b34836e7036898a27164103075b
BLAKE2b-256 f7de7e7b3094d676f739eb8a1c1f17d7e96718d285eb8e5ca277b07508d137a2

File details

Details for the file soundswallower-0.6.4-cp38-cp38-win_amd64.whl.

File hashes

Hashes for soundswallower-0.6.4-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 3c3fa3edd3c57978757f900c59658283964222ea69f2a2a2eee985b68bd11d24
MD5 ef497980ffc87ee1de9cf2be7578763a
BLAKE2b-256 f0bc06a76a1c3fa5e4123dd509671480bf46e0716b58c4b578cb601f94411921

File details

Details for the file soundswallower-0.6.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for soundswallower-0.6.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c98be85553987454b45467f216c1b81aba3a887f0d2375714d4fa44765bfd59b
MD5 83d77544c11a4a763102f83979c94de6
BLAKE2b-256 983dbc53cd94e52a045a63bb94211cd3ad1e4d95d11d5d8628b863812c5c2f34

File details

Details for the file soundswallower-0.6.4-cp38-cp38-macosx_10_9_universal2.whl.

File hashes

Hashes for soundswallower-0.6.4-cp38-cp38-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 866a2e4527a579301116c71aa18450e9c2b1d8dd475a5900e11945e9eed2da01
MD5 ee8ecb3813c568583f97ca84d4923b85
BLAKE2b-256 e593d2798f110b629708e3f8593b603b88e377021f366453dec5ea5b74273638
