Skip to main content

Forked version of [pocketsphinx-python](https://github.com/bambocher/pocketsphinx-python) which adds utility for installing models, and the StreamSpeech interface.

Project description

Pocketsphinx Python

Pocketsphinx is a part of the CMU Sphinx Open Source Toolkit For Speech Recognition.

This package provides a python interface to CMU Sphinxbase and Pocketsphinx libraries created with SWIG and Setuptools.

Supported platforms

  • Windows (untested)
  • Linux
  • Mac OS X (untested)

Install requirements

Windows requirements:

Ubuntu requirements:

sudo apt-get install -qq python python-dev python-pip build-essential swig git libpulse-dev libasound2-dev

Mac OS X requirements:

brew reinstall swig python

Installation

# Make sure we have up-to-date versions of pip, setuptools and wheel
python -m pip install --upgrade pip setuptools wheel
pip install --upgrade pocketsphinx

More binary distributions for manual installation are available here.

Installing Models

Pocketsphinx models in .tar.gz format can be installed using this packages as well.

from pocketsphinx import PocketsphinxModel, AudioFile

models = PocketsphinxModel(model_path='/some/installation/path')
# this will install the model from the give url under name 'de'
models.install_model('https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/German/cmusphinx-de-voxforge-5.2.tar.gz', 'de')

de = models.get_model('de')
# this returns a dictionary with the locations of hmm, lm and dict of the model

# we can now use the 'de' model directly with any pocketsphinx object
for phrase in LiveSpeech(model=de): print(phrase)

The default model_path is '~/pocketsphinx_models'.

Usage

LiveSpeech

It's an iterator class for continuous recognition or keyword search from a microphone.

from pocketsphinx import LiveSpeech
for phrase in LiveSpeech(): print(phrase)

An example of a keyword search:

from pocketsphinx import LiveSpeech

speech = LiveSpeech(lm=False, keyphrase='forward', kws_threshold=1e-20)
for phrase in speech:
    print(phrase.segments(detailed=True))

With your model and dictionary:

import os
from pocketsphinx import LiveSpeech, get_model_path

model_path = get_model_path()

speech = LiveSpeech(
    verbose=False,
    sampling_rate=16000,
    buffer_size=2048,
    no_search=False,
    full_utt=False,
    hmm=os.path.join(model_path, 'en-us'),
    lm=os.path.join(model_path, 'en-us.lm.bin'),
    dic=os.path.join(model_path, 'cmudict-en-us.dict')
)

for phrase in speech:
    print(phrase)

StreamSpeech

This can be used to send chunks of raw bytes to the iterator, usually when transferring audio over a socket or similar.

from pocketsphinx import StreamSpeech

f = open('somefile.wav', 'rb')

def callback():
    return f.read(2048)

for phrase in StreamSpeech(callback=callback): print(phrase)

For an example of keyword search and custom models, see LiveSpeech.

AudioFile

It's an iterator class for continuous recognition or keyword search from a file.

from pocketsphinx import AudioFile
for phrase in AudioFile(): print(phrase) # => "go forward ten meters"

An example of a keyword search:

from pocketsphinx import AudioFile

audio = AudioFile(lm=False, keyphrase='forward', kws_threshold=1e-20)
for phrase in audio:
    print(phrase.segments(detailed=True)) # => "[('forward', -617, 63, 121)]"

With your model and dictionary:

import os
from pocketsphinx import AudioFile, get_model_path, get_data_path

model_path = get_model_path()
data_path = get_data_path()

config = {
    'verbose': False,
    'audio_file': os.path.join(data_path, 'goforward.raw'),
    'buffer_size': 2048,
    'no_search': False,
    'full_utt': False,
    'hmm': os.path.join(model_path, 'en-us'),
    'lm': os.path.join(model_path, 'en-us.lm.bin'),
    'dict': os.path.join(model_path, 'cmudict-en-us.dict')
}

audio = AudioFile(**config)
for phrase in audio:
    print(phrase)

Convert frame into time coordinates:

from pocketsphinx import AudioFile

# Frames per Second
fps = 100

for phrase in AudioFile(frate=fps):  # frate (default=100)
    print('-' * 28)
    print('| %5s |  %3s  |   %4s   |' % ('start', 'end', 'word'))
    print('-' * 28)
    for s in phrase.seg():
        print('| %4ss | %4ss | %8s |' % (s.start_frame / fps, s.end_frame / fps, s.word))
    print('-' * 28)

# ----------------------------
# | start |  end  |   word   |
# ----------------------------
# |  0.0s | 0.24s | <s>      |
# | 0.25s | 0.45s | <sil>    |
# | 0.46s | 0.63s | go       |
# | 0.64s | 1.16s | forward  |
# | 1.17s | 1.52s | ten      |
# | 1.53s | 2.11s | meters   |
# | 2.12s |  2.6s | </s>     |
# ----------------------------

Pocketsphinx

It's a simple and flexible proxy class to pocketsphinx.Decode.

from pocketsphinx import Pocketsphinx
print(Pocketsphinx().decode()) # => "go forward ten meters"

A more comprehensive example:

from __future__ import print_function
import os
from pocketsphinx import Pocketsphinx, get_model_path, get_data_path

model_path = get_model_path()
data_path = get_data_path()

config = {
    'hmm': os.path.join(model_path, 'en-us'),
    'lm': os.path.join(model_path, 'en-us.lm.bin'),
    'dict': os.path.join(model_path, 'cmudict-en-us.dict')
}

ps = Pocketsphinx(**config)
ps.decode(
    audio_file=os.path.join(data_path, 'goforward.raw'),
    buffer_size=2048,
    no_search=False,
    full_utt=False
)

print(ps.segments()) # => ['<s>', '<sil>', 'go', 'forward', 'ten', 'meters', '</s>']
print('Detailed segments:', *ps.segments(detailed=True), sep='\n') # => [
#     word, prob, start_frame, end_frame
#     ('<s>', 0, 0, 24)
#     ('<sil>', -3778, 25, 45)
#     ('go', -27, 46, 63)
#     ('forward', -38, 64, 116)
#     ('ten', -14105, 117, 152)
#     ('meters', -2152, 153, 211)
#     ('</s>', 0, 212, 260)
# ]

print(ps.hypothesis())  # => go forward ten meters
print(ps.probability()) # => -32079
print(ps.score())       # => -7066
print(ps.confidence())  # => 0.04042641466841839

print(*ps.best(count=10), sep='\n') # => [
#     ('go forward ten meters', -28034)
#     ('go for word ten meters', -28570)
#     ('go forward and majors', -28670)
#     ('go forward and meters', -28681)
#     ('go forward and readers', -28685)
#     ('go forward ten readers', -28688)
#     ('go forward ten leaders', -28695)
#     ('go forward can meters', -28695)
#     ('go forward and leaders', -28706)
#     ('go for work ten meters', -28722)
# ]

Default config

If you don't pass any argument while creating an instance of the Pocketsphinx, AudioFile or LiveSpeech class, it will use next default values:

verbose = False
logfn = /dev/null or nul
audio_file = site-packages/pocketsphinx/data/goforward.raw
audio_device = None
sampling_rate = 16000
buffer_size = 2048
no_search = False
full_utt = False
hmm = site-packages/pocketsphinx/model/en-us
lm = site-packages/pocketsphinx/model/en-us.lm.bin
dict = site-packages/pocketsphinx/model/cmudict-en-us.dict

Any other option must be passed into the config as is, without using symbol -.

If you want to disable default language model or dictionary, you can change the value of the corresponding options to False:

lm = False
dict = False

Verbose

Send output to stdout:

from pocketsphinx import Pocketsphinx

ps = Pocketsphinx(verbose=True)
ps.decode()

print(ps.hypothesis())

Send output to file:

from pocketsphinx import Pocketsphinx

ps = Pocketsphinx(verbose=True, logfn='pocketsphinx.log')
ps.decode()

print(ps.hypothesis())

Compatibility

Parent classes are still available:

import os
from pocketsphinx import DefaultConfig, Decoder, get_model_path, get_data_path

model_path = get_model_path()
data_path = get_data_path()

# Create a decoder with a certain model
config = DefaultConfig()
config.set_string('-hmm', os.path.join(model_path, 'en-us'))
config.set_string('-lm', os.path.join(model_path, 'en-us.lm.bin'))
config.set_string('-dict', os.path.join(model_path, 'cmudict-en-us.dict'))
decoder = Decoder(config)

# Decode streaming data
buf = bytearray(1024)
with open(os.path.join(data_path, 'goforward.raw'), 'rb') as f:
    decoder.start_utt()
    while f.readinto(buf):
        decoder.process_raw(buf, False, False)
    decoder.end_utt()
print('Best hypothesis segments:', [seg.word for seg in decoder.seg()])

Projects using pocketsphinx-python

  • SpeechRecognition - Library for performing speech recognition, with support for several engines and APIs, online and offline.

License

The BSD License

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pocketsphinx-fork-1.0.0.tar.gz (29.1 MB view details)

Uploaded Source

File details

Details for the file pocketsphinx-fork-1.0.0.tar.gz.

File metadata

  • Download URL: pocketsphinx-fork-1.0.0.tar.gz
  • Upload date:
  • Size: 29.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.18.4 setuptools/40.6.2 requests-toolbelt/0.8.0 tqdm/4.29.1 CPython/3.6.7

File hashes

Hashes for pocketsphinx-fork-1.0.0.tar.gz
Algorithm Hash digest
SHA256 d1f29c38a2178e188c6cc197e54710378fc60b05025696ed0ab4492fa5bcf6ff
MD5 54bd146d695f25d3bd8f2c563226b505
BLAKE2b-256 adfaaa4098478dc488b935bcba0a87ca69ded86ec01f800915ae0f34c6bd8418

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page