Framework for synchronous batch speech-to-text transcription using backends such as AWS and Watson.
py-transcribe
Implementation-agnostic framework for synchronous batch speech-to-text transcription with backend services such as AWS and Watson.
This module itself does NOT include a full implementation or an integration with any transcription service. Instead, you include a specific implementation in your project. For example, for AWS Transcribe, use [py-transcribe-aws](https://github.com/ICTLearningSciences/py-transcribe-aws).
Python Installation
pip install py-transcribe
Usage
You first need to install a concrete implementation of py-transcribe. If you are using AWS, you can install py-transcribe-aws like this:
pip install py-transcribe-aws
Once the implementation is installed, you can select it in one of two ways:
Setting the implementation module path
Set the environment variable TRANSCRIBE_MODULE_PATH, e.g.
export TRANSCRIBE_MODULE_PATH=transcribe_aws
or pass the module path at service-creation time, e.g.
from transcribe import init_transcription_service

service = init_transcription_service(
    module_path="transcribe_aws"
)
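To make the two configuration routes concrete, here is a minimal sketch of how a module-path lookup of this kind could work. It is not py-transcribe's actual code: the helper names `resolve_module_path` and `load_implementation` are hypothetical, and the precedence shown (explicit argument first, then the environment variable) is an assumption.

```python
import importlib
import os
from types import ModuleType
from typing import Mapping, Optional


def resolve_module_path(
    module_path: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the implementation module path.

    Assumed precedence: explicit argument, then TRANSCRIBE_MODULE_PATH.
    """
    env = os.environ if environ is None else environ
    resolved = module_path or env.get("TRANSCRIBE_MODULE_PATH", "")
    if not resolved:
        raise ValueError(
            "no implementation configured: pass module_path "
            "or set TRANSCRIBE_MODULE_PATH"
        )
    return resolved


def load_implementation(
    module_path: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> ModuleType:
    """Import the module that the resolved path points at."""
    return importlib.import_module(resolve_module_path(module_path, environ))
```

Failing early with a clear message when neither source is set is a design choice of this sketch; it keeps a missing implementation from surfacing later as an obscure import error.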
Basic usage
Once you're set up, basic usage looks like this:
from transcribe import (
    init_transcription_service,
    TranscribeJobRequest,
    TranscribeJobStatus,
)

service = init_transcription_service()
# transcribe() is synchronous: it returns once the batch has completed
result = service.transcribe([
    TranscribeJobRequest(
        sourceFile="/some/path/j1.wav"
    ),
    TranscribeJobRequest(
        sourceFile="/some/other/path/j2.wav"
    )
])
for j in result.jobs():
    if j.status == TranscribeJobStatus.SUCCEEDED:
        print(j.transcript)
    else:
        print(j.error)
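If you want to handle successes and failures separately instead of printing them in one loop, the same status check can be wrapped in a small helper. The sketch below runs without any transcription backend: `FakeStatus` and `FakeJob` are hypothetical stand-ins that only mimic the `status`, `transcript`, and `error` attributes used above, and `partition_jobs` is not part of py-transcribe.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Iterable, List, Tuple


class FakeStatus(Enum):  # hypothetical stand-in for TranscribeJobStatus
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


@dataclass
class FakeJob:  # hypothetical stand-in for a job in result.jobs()
    status: FakeStatus
    transcript: str = ""
    error: str = ""


def partition_jobs(jobs: Iterable[Any], succeeded: Any) -> Tuple[List[Any], List[Any]]:
    """Split jobs into (succeeded, everything else), preserving order."""
    done: List[Any] = []
    other: List[Any] = []
    for j in jobs:
        (done if j.status == succeeded else other).append(j)
    return done, other


# Demo with stand-in jobs; with a real service you would pass
# result.jobs() and TranscribeJobStatus.SUCCEEDED instead.
jobs = [
    FakeJob(FakeStatus.SUCCEEDED, transcript="hello"),
    FakeJob(FakeStatus.FAILED, error="unsupported format"),
    FakeJob(FakeStatus.SUCCEEDED, transcript="world"),
]
done, other = partition_jobs(jobs, FakeStatus.SUCCEEDED)
transcripts = [j.transcript for j in done]
errors = [j.error for j in other]
```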
Handling updates on large/long-running batch jobs
The main transcribe method is synchronous, which hides the asynchronous, polling-based complexity of most transcription services. For any non-trivial batch, though, you will probably want periodic updates, for example to save transcriptions as they complete. You can get them by passing an on_update callback as follows:
from transcribe import (
    init_transcription_service,
    TranscribeJobRequest,
    TranscribeJobStatus,
    TranscribeJobsUpdate,
)

service = init_transcription_service()

# called periodically while the batch is being processed
def _on_update(u: TranscribeJobsUpdate) -> None:
    for j in u.jobs():
        if j.status == TranscribeJobStatus.SUCCEEDED:
            print(f"save result: {j.transcript}")
        else:
            print(j.error)

result = service.transcribe(
    [
        TranscribeJobRequest(
            sourceFile="/some/path/j1.wav"
        ),
        TranscribeJobRequest(
            sourceFile="/some/other/path/j2.wav"
        )
    ],
    on_update=_on_update
)
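A callback that actually saves completed transcriptions needs to avoid writing the same job twice, since it is not stated here whether each update carries all jobs or only the changed ones. The following is a minimal sketch under stated assumptions: `make_saving_callback` is a hypothetical helper, the `jobId` attribute is assumed for illustration (check your implementation for the real identifier), and the status strings and `SimpleNamespace` objects are stand-ins so the example runs on its own.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Callable, Set

SUCCEEDED = "SUCCEEDED"  # stand-in for TranscribeJobStatus.SUCCEEDED


def make_saving_callback(out_dir: Path, succeeded: Any) -> Callable[[Any], None]:
    """Build an on_update callback that writes each finished transcript once."""
    saved: Set[str] = set()

    def _on_update(update: Any) -> None:
        for j in update.jobs():
            # Skip unfinished jobs and anything written by an earlier update.
            if j.status != succeeded or j.jobId in saved:
                continue
            # jobId is an assumed attribute used only to name the output file.
            (out_dir / f"{j.jobId}.txt").write_text(j.transcript, encoding="utf-8")
            saved.add(j.jobId)

    return _on_update


# Demo with stand-in objects; a real run would pass on_update=on_update
# to service.transcribe(...).
out_dir = Path(tempfile.mkdtemp())
on_update = make_saving_callback(out_dir, SUCCEEDED)
finished = SimpleNamespace(jobId="j1", status=SUCCEEDED, transcript="hello world", error=None)
pending = SimpleNamespace(jobId="j2", status="IN_PROGRESS", transcript=None, error=None)
update = SimpleNamespace(jobs=lambda: [finished, pending])
on_update(update)
on_update(update)  # repeated update: j1 is not written again
```

Keeping the set of saved ids inside the closure means the callback carries its own state and can be handed to the service without any globals.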
Configuring the environment for your implementation
Most implementations will also require other configuration, which you can either set in your environment or pass to init_transcription_service as config={}. See your implementation docs for details.
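If you prefer to pass configuration explicitly instead of relying on the process environment, one option is to collect the relevant variables into a dict first. This is only a sketch: `config_from_env` is a hypothetical helper, and the key names `MY_IMPL_REGION` and `MY_IMPL_BUCKET` are invented placeholders, since the real keys depend entirely on your implementation.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def config_from_env(
    keys: Iterable[str],
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Copy the named variables that are set into a plain dict."""
    env = os.environ if environ is None else environ
    return {k: env[k] for k in keys if k in env}


# Demo with a fake environment and placeholder key names.
fake_env = {"MY_IMPL_REGION": "us-east-1", "UNRELATED": "x"}
config = config_from_env(["MY_IMPL_REGION", "MY_IMPL_BUCKET"], fake_env)
# With a real implementation installed, the dict would be passed as:
# service = init_transcription_service(config=config)
```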
Development
Run tests during development with
make test-all
Once ready to release, create a release tag, currently using semver-ish numbering, e.g. 1.0.0(-alpha.1)