framework for synchronous batch speech-to-text transcription using backends like AWS, Watson, etc.
py-transcribe
Implementation-agnostic framework for synchronous batch speech-to-text transcription with backend services such as AWS, Watson, etc.
This module itself does NOT include a full implementation or an integration with any transcription service. The intention is instead that you include a specific implementation in your project. For example, for AWS Transcribe, use [py-transcribe-aws](https://github.com/ICTLearningSciences/py-transcribe-aws).
Python Installation
pip install py-transcribe
Usage
You first need to install a concrete implementation of py-transcribe. If you are using AWS, you can install py-transcribe-aws like this:
pip install py-transcribe-aws
Once the implementation is installed, you can configure it in one of two ways:
Setting the implementation module path
Set the environment variable TRANSCRIBE_MODULE_PATH, e.g.
export TRANSCRIBE_MODULE_PATH=transcribe_aws
or pass the module path at service-creation time, e.g.
from transcribe import init_transcription_service
service = init_transcription_service(
    module_path="transcribe_aws"
)
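If your application wants to support both options at once, the following is a minimal sketch of one way to do it. The helpers `resolve_module_path` and `check_module_importable` are hypothetical and live in your own code, not in py-transcribe, and the precedence shown (explicit argument first, then the environment variable) is an assumption rather than documented library behavior.

```python
import importlib
import os
from types import ModuleType
from typing import Optional

ENV_VAR = "TRANSCRIBE_MODULE_PATH"


def resolve_module_path(explicit: Optional[str] = None) -> str:
    """Hypothetical helper: an explicit argument wins, else fall back to the env var."""
    path = explicit or os.environ.get(ENV_VAR)
    if not path:
        raise RuntimeError(f"set {ENV_VAR} or pass a module path explicitly")
    return path


def check_module_importable(path: str) -> ModuleType:
    """Fail early with a clear message if the implementation package is missing."""
    try:
        return importlib.import_module(path)
    except ImportError as err:
        raise RuntimeError(f"transcription module '{path}' is not installed") from err
```

The resolved value could then be passed on as `init_transcription_service(module_path=resolve_module_path())`.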
Basic usage
Once you're set up, basic usage looks like this:
from transcribe import (
    init_transcription_service,
    TranscribeJobRequest,
    TranscribeJobStatus
)
service = init_transcription_service()
# transcribe() blocks until every job in the batch has finished
result = service.transcribe([
    TranscribeJobRequest(
        sourceFile="/some/path/j1.wav"
    ),
    TranscribeJobRequest(
        sourceFile="/some/other/path/j2.wav"
    )
])
for j in result.jobs():
    if j.status == TranscribeJobStatus.SUCCEEDED:
        print(j.transcript)
    else:
        print(j.error)
Handling updates on large/long-running batch jobs
The main transcribe method is synchronous to hide the asynchronous, polling-based complexity of most transcription services. For any non-trivial batch of transcriptions, though, you will probably want to receive periodic updates, for example to save completed transcriptions as they finish. You can do that by passing an on_update callback as follows:
from transcribe import (
    init_transcription_service,
    TranscribeJobRequest,
    TranscribeJobStatus,
    TranscribeJobsUpdate
)
service = init_transcription_service()
# called periodically while the batch is still running
def _on_update(u: TranscribeJobsUpdate) -> None:
    for j in u.jobs():
        if j.status == TranscribeJobStatus.SUCCEEDED:
            print(f"save result: {j.transcript}")
        else:
            print(j.error)
result = service.transcribe(
    [
        TranscribeJobRequest(
            sourceFile="/some/path/j1.wav"
        ),
        TranscribeJobRequest(
            sourceFile="/some/other/path/j2.wav"
        )
    ],
    on_update=_on_update
)
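To actually save completed transcriptions rather than print them, a callback along these lines might work. This is a sketch under stated assumptions: `JobStatus`, `Job`, and `JobsUpdate` are stand-in classes so the example runs without py-transcribe or a backend installed, and the `job_id` field and `IN_PROGRESS` status are hypothetical. Only `status`, `transcript`, `error`, and `jobs()` appear in the examples above.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Callable, List, Optional, Set


class JobStatus(Enum):
    """Stand-in for TranscribeJobStatus; members other than SUCCEEDED are assumed."""
    SUCCEEDED = 1
    FAILED = 2
    IN_PROGRESS = 3


@dataclass
class Job:
    """Stand-in for a job in an update; job_id is a hypothetical field."""
    job_id: str
    status: JobStatus
    transcript: Optional[str] = None
    error: Optional[str] = None


@dataclass
class JobsUpdate:
    """Stand-in for TranscribeJobsUpdate, exposing jobs() as in the example above."""
    job_list: List[Job]

    def jobs(self) -> List[Job]:
        return self.job_list


def make_saving_callback(out_dir: Path) -> Callable[[JobsUpdate], None]:
    """Build an on_update callback that writes each finished transcript once."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved: Set[str] = set()  # ids already written, so repeat updates are cheap

    def _on_update(u: JobsUpdate) -> None:
        for j in u.jobs():
            if j.status is not JobStatus.SUCCEEDED or j.job_id in saved:
                continue
            target = out_dir / f"{j.job_id}.txt"
            target.write_text(j.transcript or "", encoding="utf-8")
            saved.add(j.job_id)

    return _on_update
```

Tracking already-saved ids in a closure keeps the callback idempotent, which matters if the same completed job shows up in several consecutive updates.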
Configuring the environment for your implementation
Most implementations also require other configuration, which you can either set in your environment or pass to init_transcription_service as config={}. See your implementation's docs for details.
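As an illustration of combining the two sources, here is a small sketch that collects named settings from the environment and applies explicit overrides on top. The helper `build_config` and the key names `EXAMPLE_REGION` and `EXAMPLE_BUCKET` are hypothetical; the real keys depend entirely on the implementation you install.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def build_config(
    keys: Iterable[str],
    overrides: Optional[Mapping[str, str]] = None,
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Hypothetical helper: read the named keys from env, then apply overrides."""
    config = {k: env[k] for k in keys if k in env}
    config.update(overrides or {})
    return config


# Example with made-up keys; pass the result as config=... when creating the service:
# service = init_transcription_service(
#     config=build_config(["EXAMPLE_REGION", "EXAMPLE_BUCKET"])
# )
```

Passing `env` as a parameter keeps the helper easy to test without touching the real process environment.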
Development
Run tests during development with
make test-all
Once ready to release, create a release tag, currently using semver-ish numbering, e.g. 1.0.0(-alpha.1)