Forked version of [pocketsphinx-python](https://github.com/bambocher/pocketsphinx-python) that adds a utility for installing models and the StreamSpeech interface.
Pocketsphinx Python
Pocketsphinx is a part of the CMU Sphinx Open Source Toolkit For Speech Recognition.
This package provides a Python interface to the CMU Sphinxbase and Pocketsphinx libraries, created with SWIG and Setuptools.
Supported platforms
- Windows (untested)
- Linux
- Mac OS X (untested)
Install requirements
Windows requirements:
Ubuntu requirements:
sudo apt-get install -qq python python-dev python-pip build-essential swig git libpulse-dev libasound2-dev
Mac OS X requirements:
brew reinstall swig python
Installation
# Make sure we have up-to-date versions of pip, setuptools and wheel
python -m pip install --upgrade pip setuptools wheel
pip install --upgrade pocketsphinx
More binary distributions for manual installation are available here.
Installing Models
Pocketsphinx models in .tar.gz format can also be installed using this package.
from pocketsphinx import PocketsphinxModel, LiveSpeech
models = PocketsphinxModel(model_path='/some/installation/path')
# this will install the model from the given URL under the name 'de'
models.install_model('https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/German/cmusphinx-de-voxforge-5.2.tar.gz', 'de')
de = models.get_model('de')
# this returns a dictionary with the locations of hmm, lm and dict of the model
# we can now use the 'de' model directly with any pocketsphinx object
for phrase in LiveSpeech(model=de): print(phrase)
The default model_path is '~/pocketsphinx_models'.
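After installing a model it can be useful to confirm that the files it points to are actually on disk. The following is a minimal sketch, not part of this package: `missing_model_files` is a hypothetical helper, and it assumes the dictionary returned by `get_model` maps names such as 'hmm', 'lm' and 'dict' to filesystem paths, as the comment in the example above suggests.

```python
import os

# The default location mentioned above, with '~' expanded to the home directory.
DEFAULT_MODEL_PATH = os.path.expanduser('~/pocketsphinx_models')


def missing_model_files(model):
    """Return the keys of `model` whose path does not exist on disk.

    `model` is assumed to be a dict such as {'hmm': ..., 'lm': ..., 'dict': ...};
    the key names are an assumption, not a documented contract.
    """
    return [key for key, path in sorted(model.items())
            if not os.path.exists(os.path.expanduser(path))]
```

With the objects from the example above, `missing_model_files(de)` would return an empty list when every path resolves.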
Usage
LiveSpeech
It's an iterator class for continuous recognition or keyword search from a microphone.
from pocketsphinx import LiveSpeech
for phrase in LiveSpeech(): print(phrase)
An example of a keyword search:
from pocketsphinx import LiveSpeech
speech = LiveSpeech(lm=False, keyphrase='forward', kws_threshold=1e-20)
for phrase in speech:
    print(phrase.segments(detailed=True))
With your model and dictionary:
import os
from pocketsphinx import LiveSpeech, get_model_path
model_path = get_model_path()
speech = LiveSpeech(
    verbose=False,
    sampling_rate=16000,
    buffer_size=2048,
    no_search=False,
    full_utt=False,
    hmm=os.path.join(model_path, 'en-us'),
    lm=os.path.join(model_path, 'en-us.lm.bin'),
    dic=os.path.join(model_path, 'cmudict-en-us.dict')
)
for phrase in speech:
    print(phrase)
StreamSpeech
This can be used to send chunks of raw bytes to the iterator, usually when transferring audio over a socket or similar.
from pocketsphinx import StreamSpeech
f = open('somefile.wav', 'rb')
def callback():
    return f.read(2048)
for phrase in StreamSpeech(callback=callback): print(phrase)
For an example of keyword search and custom models, see LiveSpeech.
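The callback pattern above works with any source that hands back bytes on demand. As a sketch under stated assumptions, the hypothetical helper `make_chunk_callback` below wraps any binary file-like object in a zero-argument callback; an `io.BytesIO` buffer stands in for the real audio source. How StreamSpeech reacts to an empty chunk at the end of the stream is not described here, so the loop below only demonstrates the callback itself.

```python
import io


def make_chunk_callback(stream, chunk_size=2048):
    """Wrap a binary file-like object in a zero-argument callback.

    `make_chunk_callback` is an illustrative helper, not part of pocketsphinx.
    """
    def callback():
        # read() returns b'' once the stream is exhausted.
        return stream.read(chunk_size)
    return callback


# Stand-in for a socket file or an open audio file: 5000 bytes of silence.
demo_stream = io.BytesIO(b'\x00' * 5000)
callback = make_chunk_callback(demo_stream)

chunk_sizes = []
while True:
    chunk = callback()
    if not chunk:
        break
    chunk_sizes.append(len(chunk))

print(chunk_sizes)  # [2048, 2048, 904]
```

For a connected socket, `sock.makefile('rb')` returns a file-like object that could be passed to the same helper.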
AudioFile
It's an iterator class for continuous recognition or keyword search from a file.
from pocketsphinx import AudioFile
for phrase in AudioFile(): print(phrase) # => "go forward ten meters"
An example of a keyword search:
from pocketsphinx import AudioFile
audio = AudioFile(lm=False, keyphrase='forward', kws_threshold=1e-20)
for phrase in audio:
    print(phrase.segments(detailed=True)) # => "[('forward', -617, 63, 121)]"
With your model and dictionary:
import os
from pocketsphinx import AudioFile, get_model_path, get_data_path
model_path = get_model_path()
data_path = get_data_path()
config = {
    'verbose': False,
    'audio_file': os.path.join(data_path, 'goforward.raw'),
    'buffer_size': 2048,
    'no_search': False,
    'full_utt': False,
    'hmm': os.path.join(model_path, 'en-us'),
    'lm': os.path.join(model_path, 'en-us.lm.bin'),
    'dict': os.path.join(model_path, 'cmudict-en-us.dict')
}
audio = AudioFile(**config)
for phrase in audio:
    print(phrase)
Convert frame into time coordinates:
from pocketsphinx import AudioFile
# Frames per Second
fps = 100
for phrase in AudioFile(frate=fps): # frate (default=100)
    print('-' * 28)
    print('| %5s |  %3s  |   %4s   |' % ('start', 'end', 'word'))
    print('-' * 28)
    for s in phrase.seg():
        print('| %4ss | %4ss | %8s |' % (s.start_frame / fps, s.end_frame / fps, s.word))
    print('-' * 28)
# ----------------------------
# | start |  end  |   word   |
# ----------------------------
# |  0.0s | 0.24s |      <s> |
# | 0.25s | 0.45s |    <sil> |
# | 0.46s | 0.63s |       go |
# | 0.64s | 1.16s |  forward |
# | 1.17s | 1.52s |      ten |
# | 1.53s | 2.11s |   meters |
# | 2.12s |  2.6s |     </s> |
# ----------------------------
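The same conversion can be applied to the tuples returned by `segments(detailed=True)`, which carry frame indices in their last two positions. The sketch below uses a hypothetical helper, `segment_times`, and divides by `float(frate)` so that the result is fractional even under Python 2, where `/` between two integers truncates. The sample tuples are copied from the keyword and decoding outputs shown in this document.

```python
def segment_times(segments, frate=100):
    """Convert (word, prob, start_frame, end_frame) tuples to seconds.

    `segment_times` is an illustrative helper, not part of pocketsphinx.
    """
    rate = float(frate)  # force true division on Python 2 as well
    return [(word, start / rate, end / rate)
            for word, _prob, start, end in segments]


segments = [('go', -27, 46, 63), ('forward', -38, 64, 116)]
for word, start, end in segment_times(segments):
    print('%-8s %.2fs - %.2fs' % (word, start, end))
```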
Pocketsphinx
It's a simple and flexible proxy class to pocketsphinx.Decoder.
from pocketsphinx import Pocketsphinx
print(Pocketsphinx().decode()) # => "go forward ten meters"
A more comprehensive example:
from __future__ import print_function
import os
from pocketsphinx import Pocketsphinx, get_model_path, get_data_path
model_path = get_model_path()
data_path = get_data_path()
config = {
    'hmm': os.path.join(model_path, 'en-us'),
    'lm': os.path.join(model_path, 'en-us.lm.bin'),
    'dict': os.path.join(model_path, 'cmudict-en-us.dict')
}
ps = Pocketsphinx(**config)
ps.decode(
    audio_file=os.path.join(data_path, 'goforward.raw'),
    buffer_size=2048,
    no_search=False,
    full_utt=False
)
print(ps.segments()) # => ['<s>', '<sil>', 'go', 'forward', 'ten', 'meters', '</s>']
print('Detailed segments:', *ps.segments(detailed=True), sep='\n') # => [
# word, prob, start_frame, end_frame
# ('<s>', 0, 0, 24)
# ('<sil>', -3778, 25, 45)
# ('go', -27, 46, 63)
# ('forward', -38, 64, 116)
# ('ten', -14105, 117, 152)
# ('meters', -2152, 153, 211)
# ('</s>', 0, 212, 260)
# ]
print(ps.hypothesis()) # => go forward ten meters
print(ps.probability()) # => -32079
print(ps.score()) # => -7066
print(ps.confidence()) # => 0.04042641466841839
print(*ps.best(count=10), sep='\n') # => [
# ('go forward ten meters', -28034)
# ('go for word ten meters', -28570)
# ('go forward and majors', -28670)
# ('go forward and meters', -28681)
# ('go forward and readers', -28685)
# ('go forward ten readers', -28688)
# ('go forward ten leaders', -28695)
# ('go forward can meters', -28695)
# ('go forward and leaders', -28706)
# ('go for work ten meters', -28722)
# ]
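The outputs above are plain Python lists of tuples, so they can be post-processed without touching the decoder. As a small sketch, the hypothetical helpers below drop the three marker tokens that appear in the sample output (`<s>`, `<sil>`, `</s>`) and pick the highest-scoring entry from an n-best list. The input data is copied from the comments above; a real model may emit other filler tokens that this set does not cover.

```python
# Marker tokens seen in the sample output above; other models may use more.
FILLERS = {'<s>', '</s>', '<sil>'}


def spoken_words(detailed_segments):
    """Return only the recognized words from (word, prob, start, end) tuples."""
    return [word for word, _prob, _start, _end in detailed_segments
            if word not in FILLERS]


def top_hypothesis(nbest):
    """Return the (text, score) pair with the highest score."""
    return max(nbest, key=lambda pair: pair[1])


detailed = [
    ('<s>', 0, 0, 24), ('<sil>', -3778, 25, 45), ('go', -27, 46, 63),
    ('forward', -38, 64, 116), ('ten', -14105, 117, 152),
    ('meters', -2152, 153, 211), ('</s>', 0, 212, 260),
]
nbest = [
    ('go forward ten meters', -28034),
    ('go for word ten meters', -28570),
    ('go forward and majors', -28670),
]

print(' '.join(spoken_words(detailed)))
print(top_hypothesis(nbest))
```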
Default config
If you don't pass any arguments when creating an instance of the Pocketsphinx, AudioFile or LiveSpeech class, it will use the following default values:
verbose = False
logfn = /dev/null or nul
audio_file = site-packages/pocketsphinx/data/goforward.raw
audio_device = None
sampling_rate = 16000
buffer_size = 2048
no_search = False
full_utt = False
hmm = site-packages/pocketsphinx/model/en-us
lm = site-packages/pocketsphinx/model/en-us.lm.bin
dict = site-packages/pocketsphinx/model/cmudict-en-us.dict
Any other option must be passed into the config as is, without the leading - symbol.
If you want to disable the default language model or dictionary, set the corresponding option to False:
lm = False
dict = False
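To make the naming rule concrete: a keyword such as `kws_threshold` corresponds to the dash-prefixed name `-kws_threshold` that the lower-level `DefaultConfig` interface expects. The helper below is purely illustrative and hypothetical; it only shows the name mapping and is not how the package itself translates its arguments.

```python
def to_dash_options(**options):
    """Map keyword-style names to their dash-prefixed equivalents.

    `to_dash_options` is an illustrative helper, not part of pocketsphinx.
    """
    return {'-' + name: value for name, value in options.items()}


print(to_dash_options(keyphrase='forward', kws_threshold=1e-20))
```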
Verbose
Send output to stdout:
from pocketsphinx import Pocketsphinx
ps = Pocketsphinx(verbose=True)
ps.decode()
print(ps.hypothesis())
Send output to file:
from pocketsphinx import Pocketsphinx
ps = Pocketsphinx(verbose=True, logfn='pocketsphinx.log')
ps.decode()
print(ps.hypothesis())
Compatibility
Parent classes are still available:
import os
from pocketsphinx import DefaultConfig, Decoder, get_model_path, get_data_path
model_path = get_model_path()
data_path = get_data_path()
# Create a decoder with a certain model
config = DefaultConfig()
config.set_string('-hmm', os.path.join(model_path, 'en-us'))
config.set_string('-lm', os.path.join(model_path, 'en-us.lm.bin'))
config.set_string('-dict', os.path.join(model_path, 'cmudict-en-us.dict'))
decoder = Decoder(config)
# Decode streaming data
buf = bytearray(1024)
with open(os.path.join(data_path, 'goforward.raw'), 'rb') as f:
    decoder.start_utt()
    while f.readinto(buf):
        decoder.process_raw(buf, False, False)
    decoder.end_utt()
print('Best hypothesis segments:', [seg.word for seg in decoder.seg()])
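One detail of the `readinto` loop is worth keeping in mind: `readinto` returns the number of bytes it wrote, and the final read may fill only part of the buffer, leaving bytes from the previous read in the tail. The sketch below shows one way to trim each chunk to the bytes actually read. `iter_chunks` is a hypothetical helper, and an `io.BytesIO` buffer stands in for the audio file.

```python
import io


def iter_chunks(stream, chunk_size=1024):
    """Yield successive chunks, trimming each to the bytes actually read.

    `iter_chunks` is an illustrative helper, not part of pocketsphinx.
    """
    buf = bytearray(chunk_size)
    while True:
        n = stream.readinto(buf)
        if not n:
            break
        # Copy only the freshly read bytes; the rest of buf may be stale.
        yield bytes(buf[:n])


demo = io.BytesIO(b'\x01' * 2500)
sizes = [len(chunk) for chunk in iter_chunks(demo)]
print(sizes)  # [1024, 1024, 452]
```

In the decoding loop above, each yielded chunk could be passed to `decoder.process_raw(chunk, False, False)`, assuming `process_raw` accepts a `bytes` object in the same way it accepts the `bytearray` used there.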
Projects using pocketsphinx-python
- SpeechRecognition - Library for performing speech recognition, with support for several engines and APIs, online and offline.
License