Octopus Speech-to-Index engine.

Project description

Octopus

Made in Vancouver, Canada by Picovoice

Octopus is Picovoice's Speech-to-Index engine. It indexes speech directly, without relying on a text representation. This acoustic-only approach boosts accuracy by removing the out-of-vocabulary limitation and eliminating the problem of competing hypotheses (e.g. homophones).

Compatibility

  • Python 3.5+
  • Runs on Linux (x86_64), macOS (x86_64, arm64), Windows (x86_64)

Installation

pip3 install pvoctopus

AccessKey

Octopus requires a valid Picovoice AccessKey at initialization. The AccessKey acts as your credentials when using Octopus SDKs. You can get an AccessKey for free; make sure to keep it secret. Sign up or log in to Picovoice Console to get your AccessKey.
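
One way to keep the AccessKey out of source code is to read it from an environment variable. The sketch below is only an illustration: the variable name PICOVOICE_ACCESS_KEY and the helper load_access_key are assumptions made for this example, not part of the Octopus SDK.

```python
import os

# Hypothetical variable name; pick whatever fits your deployment.
ACCESS_KEY_ENV_VAR = "PICOVOICE_ACCESS_KEY"


def load_access_key(environ=None):
    """Return the AccessKey from the environment, or raise if it is missing."""
    environ = os.environ if environ is None else environ
    access_key = environ.get(ACCESS_KEY_ENV_VAR, "").strip()
    if not access_key:
        raise RuntimeError(f"Set {ACCESS_KEY_ENV_VAR} to your Picovoice AccessKey")
    return access_key
```

The returned string can then be passed as the access_key argument when creating the engine.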

Usage

Create an instance of the engine:

import pvoctopus

access_key = ""  # AccessKey obtained from Picovoice Console (https://console.picovoice.ai/)
handle = pvoctopus.create(access_key=access_key)

Octopus works in two steps: indexing and searching. Indexing transforms audio data into a Metadata object that searches can be run against.

Octopus indexing has two modes of operation: indexing PCM audio data, or indexing an audio file.

When indexing PCM audio data, the valid audio sample rate is given by handle.sample_rate. The engine accepts 16-bit linearly-encoded PCM and operates on single-channel audio:

audio_data = [...]
metadata = handle.index(audio_data)
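
As an illustration of where audio_data might come from, the following sketch reads a 16-bit, single-channel WAV file into a list of integer samples using only the Python standard library. The helper name read_pcm is hypothetical, and the sketch assumes that a plain list of ints is an acceptable input to handle.index.

```python
import struct
import wave


def read_pcm(path, expected_sample_rate):
    """Read a 16-bit, single-channel WAV file into a list of int samples."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getnchannels() != 1:
            raise ValueError("expected single-channel audio")
        if wav_file.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav_file.getframerate() != expected_sample_rate:
            raise ValueError("unexpected sample rate")
        raw = wav_file.readframes(wav_file.getnframes())
    # WAV stores 16-bit samples as little-endian signed integers.
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))
```

With an engine instance available, this could be called as read_pcm("/path/to/my/audiofile.wav", handle.sample_rate). Rejecting a mismatched sample rate up front is a design choice of this sketch; resampling is out of scope here.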

Similarly, an audio file can be indexed by passing its absolute path. Supported file formats are mp3, flac, wav and opus:

audio_file_path = "/path/to/my/audiofile.wav"
metadata = handle.index_file(audio_file_path)

Once the Metadata object has been created, it can be used for searching:

search_term = 'picovoice'
matches = handle.search(metadata, [search_term])

Multiple search terms can be given:

matches = handle.search(metadata, ['picovoice', 'Octopus', 'rhino'])

The matches object is a dictionary where the key is the phrase, and the value is a list of Match objects. Each Match object contains the start_sec, end_sec and probability of the match:

matches = handle.search(metadata, ['avocado'])

avocado_matches = matches['avocado']
for match in avocado_matches:
    print(f"Match for `avocado`: {match.start_sec} -> {match.end_sec} ({match.probability})")
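
Because search results carry a probability, it can be useful to drop low-confidence hits before acting on them. The sketch below shows one possible filter. The Match namedtuple is a stand-in defined only for this example, assuming the attribute is spelled probability; confident_matches and the 0.5 default threshold are likewise hypothetical.

```python
from collections import namedtuple

# Stand-in for the Match objects returned by search(); only the three
# fields described above are assumed.
Match = namedtuple("Match", ["start_sec", "end_sec", "probability"])


def confident_matches(matches, min_probability=0.5):
    """Keep matches at or above min_probability, sorted by start time per phrase."""
    return {
        phrase: sorted(
            (m for m in phrase_matches if m.probability >= min_probability),
            key=lambda m: m.start_sec,
        )
        for phrase, phrase_matches in matches.items()
    }
```

Phrases with no surviving matches stay in the result with an empty list, so callers can still tell which phrases were searched.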

The Metadata object can be cached or stored to skip the indexing step on subsequent searches. This can be done with the to_bytes() and from_bytes() methods:

metadata_bytes = metadata.to_bytes()

# ... Write & load `metadata_bytes` from cache/filesystem/etc.

cached_metadata = pvoctopus.OctopusMetadata.from_bytes(metadata_bytes)
matches = handle.search(cached_metadata, ['avocado'])
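
The "write and load" step can be sketched as a small file cache. This is a minimal illustration under stated assumptions, not part of the SDK: load_or_build_index is a hypothetical helper, and build_bytes is any callable that returns the serialized index bytes.

```python
from pathlib import Path


def load_or_build_index(cache_path, build_bytes):
    """Return cached index bytes, calling build_bytes() only on a cache miss."""
    cache_path = Path(cache_path)
    if cache_path.is_file():
        return cache_path.read_bytes()
    data = build_bytes()
    # Create the cache directory on first use, then store the bytes.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(data)
    return data
```

For example, build_bytes could be lambda: handle.index_file(audio_file_path).to_bytes(), with the returned bytes passed to pvoctopus.OctopusMetadata.from_bytes. The cache is keyed only by file path here, so it will not notice if the audio file changes.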

When done, the handle's resources must be released explicitly:

handle.delete()
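
To make sure delete() runs even when indexing or searching raises an exception, the call can be wrapped in a context manager. The released helper below is a hypothetical convenience written for this example; it assumes only that the handle exposes the delete() method shown above.

```python
from contextlib import contextmanager


@contextmanager
def released(handle):
    """Yield handle and call its delete() method on exit, even after an error."""
    try:
        yield handle
    finally:
        handle.delete()
```

A typical use would be "with released(pvoctopus.create(access_key=access_key)) as handle:" followed by the indexing and search calls inside the block.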

Non-English Models

In order to search non-English phrases you need to use the corresponding model file. The model files for all supported languages are available here.

Demos

pvoctopusdemo provides command-line utilities for searching audio files using Octopus.

