Leopard Speech-to-Text Engine.
Leopard Binding for Python
Made in Vancouver, Canada by Picovoice
Leopard is an on-device speech-to-text engine. Leopard is:
- Private: all voice processing runs locally.
- Accurate
- Compact and Computationally-Efficient
- Cross-Platform:
  - Linux (x86_64), macOS (x86_64, arm64), Windows (x86_64)
  - Android and iOS
  - Chrome, Safari, Firefox, and Edge
  - Raspberry Pi (3, 4, 5)
Compatibility
- Python 3.8+
- Runs on Linux (x86_64), macOS (x86_64, arm64), Windows (x86_64), and Raspberry Pi (3, 4, 5).
Installation
pip3 install pvleopard
AccessKey
Leopard requires a valid Picovoice AccessKey at initialization. The AccessKey acts as your credentials when using Leopard SDKs. You can get your AccessKey for free. Make sure to keep your AccessKey secret.
Sign up or log in to Picovoice Console to get your AccessKey.
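One common way to keep the AccessKey out of source code is to read it from an environment variable. The sketch below is only an illustration: the variable name `PICOVOICE_ACCESS_KEY` and the helper `load_access_key` are assumptions made for this example, not part of the Leopard SDK.

```python
import os

# Hypothetical variable name; choose whatever fits your deployment.
ACCESS_KEY_ENV_VAR = "PICOVOICE_ACCESS_KEY"


def load_access_key(env_var=ACCESS_KEY_ENV_VAR):
    """Return the AccessKey stored in the given environment variable."""
    access_key = os.environ.get(env_var, "").strip()
    if not access_key:
        # Fail early with a clear message instead of passing an empty key on.
        raise RuntimeError(
            "Set the %s environment variable to your Picovoice AccessKey" % env_var)
    return access_key
```

The returned string can then be passed as the `access_key` argument when creating the engine.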
Usage
Create an instance of the engine and transcribe an audio file:
import pvleopard
leopard = pvleopard.create(access_key='${ACCESS_KEY}')
transcript, words = leopard.process_file('${AUDIO_FILE_PATH}')
print(transcript)
for word in words:
    print(
        "{word=\"%s\" start_sec=%.2f end_sec=%.2f confidence=%.2f speaker_tag=%d}"
        % (word.word, word.start_sec, word.end_sec, word.confidence, word.speaker_tag))
Replace ${ACCESS_KEY} with your AccessKey obtained from Picovoice Console and ${AUDIO_FILE_PATH} with the path to an audio file.
Finally, when done, be sure to explicitly release the resources:
leopard.delete()
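If transcription raises an exception, a bare `leopard.delete()` at the end of a script is never reached. A minimal sketch of one way to guarantee the release is a small context manager. The helper `leopard_session` is hypothetical and not part of pvleopard. It only assumes the engine object exposes `delete()`, as shown above.

```python
from contextlib import contextmanager


@contextmanager
def leopard_session(create_engine):
    """Yield an engine built by create_engine and always call delete() on it."""
    engine = create_engine()
    try:
        yield engine
    finally:
        # Runs even if transcription raises, so the resources are released.
        engine.delete()


# Intended use (requires pvleopard and a valid AccessKey):
#
#   import pvleopard
#   with leopard_session(lambda: pvleopard.create(access_key='${ACCESS_KEY}')) as leopard:
#       transcript, words = leopard.process_file('${AUDIO_FILE_PATH}')
```

Passing a factory rather than an engine keeps construction inside the helper, so whichever code creates the engine is also responsible for releasing it.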
Language Model
The Leopard Python SDK comes preloaded with a default English language model (.pv file).
Default models for other supported languages can be found in lib/common.
Create custom language models using the Picovoice Console. Here you can train language models with custom vocabulary and boost words in the existing vocabulary.
Pass in the .pv file via the model_path argument:
leopard = pvleopard.create(
    access_key='${ACCESS_KEY}',
    model_path='${MODEL_FILE_PATH}')
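A mistyped model path is easier to diagnose if it is checked before the engine is created. The following is a rough sketch under stated assumptions: `create_kwargs` is a hypothetical helper, and the only SDK details it relies on are the `access_key` and `model_path` arguments shown above.

```python
import os


def create_kwargs(access_key, model_path=None):
    """Build keyword arguments for pvleopard.create, validating model_path."""
    kwargs = {"access_key": access_key}
    if model_path is not None:
        if not model_path.endswith(".pv"):
            raise ValueError("Expected a .pv model file, got: %s" % model_path)
        if not os.path.isfile(model_path):
            raise FileNotFoundError(model_path)
        kwargs["model_path"] = model_path
    # With no model_path, the SDK's preloaded default English model is used.
    return kwargs
```

The result would be used as `pvleopard.create(**create_kwargs('${ACCESS_KEY}', '${MODEL_FILE_PATH}'))`.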
Word Metadata
Along with the transcript, Leopard returns metadata for each transcribed word. Available metadata items are:
- Start Time: Indicates when the word started in the transcribed audio. Value is in seconds.
- End Time: Indicates when the word ended in the transcribed audio. Value is in seconds.
- Confidence: Leopard's confidence that the transcribed word is accurate. It is a number within [0, 1].
- Speaker Tag: If speaker diarization is enabled on initialization, the speaker tag is a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers. If speaker diarization is not enabled, the value will always be -1.
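The metadata fields listed above lend themselves to simple post-processing. The sketch below merges consecutive words from the same speaker into turns and flags low-confidence words. The `Word` namedtuple is only a stand-in with the same attribute names, so the example runs without the engine. The helpers and the 0.5 threshold are illustrative assumptions, not SDK features.

```python
from collections import namedtuple
from itertools import groupby

# Stand-in mirroring the attributes of the word objects Leopard returns.
Word = namedtuple("Word", ["word", "start_sec", "end_sec", "confidence", "speaker_tag"])


def speaker_turns(words):
    """Merge consecutive words with the same speaker_tag into (tag, start, end, text)."""
    turns = []
    for tag, group in groupby(words, key=lambda w: w.speaker_tag):
        group = list(group)
        text = " ".join(w.word for w in group)
        turns.append((tag, group[0].start_sec, group[-1].end_sec, text))
    return turns


def low_confidence(words, threshold=0.5):
    """Return words whose confidence is below the (arbitrary) threshold."""
    return [w for w in words if w.confidence < threshold]


sample = [
    Word("hello", 0.0, 0.4, 0.95, 1),
    Word("there", 0.5, 0.9, 0.40, 1),
    Word("hi", 1.2, 1.5, 0.90, 2),
]
for turn in speaker_turns(sample):
    print(turn)
```

In real use, the `words` list returned by `process_file` would be passed in place of `sample`.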
Demos
pvleoparddemo provides command-line utilities for processing audio using Leopard.