framework for synchronous batch speech-to-text transcription using backends like AWS, Watson, etc.
py-transcribe
Implementation-agnostic framework for synchronous batch speech-to-text transcription with backend services such as AWS, Watson, etc.
This module itself does NOT include a full implementation or an integration with any transcription service. The intention instead is that you include a specific implementation in your project. For example, for AWS Transcribe, use [py-transcribe-aws](https://github.com/ICTLearningSciences/py-transcribe-aws).
Python Installation
pip install py-transcribe
Usage
You first need to install a concrete implementation of py-transcribe. If you are using AWS, you can install py-transcribe-aws like this:
pip install py-transcribe-aws
Once the implementation is installed, you can select it in one of two ways:
Setting the implementation module path
Set the environment variable TRANSCRIBE_MODULE_PATH, e.g.
export TRANSCRIBE_MODULE_PATH=transcribe_aws
or pass the module path at service-creation time, e.g.
from transcribe import init_transcription_service
service = init_transcription_service(
    module_path="transcribe_aws"
)
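To make the two options concrete, here is a minimal sketch of how a module path could be resolved and loaded. It is not py-transcribe's actual code: `resolve_module_path` and `load_implementation` are hypothetical helpers, and the precedence shown (explicit argument first, then the environment variable) is an assumption.

```python
import importlib
import os
from types import ModuleType
from typing import Optional


def resolve_module_path(module_path: Optional[str] = None) -> str:
    """Hypothetical helper: use the explicit argument if given, else the env var."""
    # Assumption: an explicit argument overrides TRANSCRIBE_MODULE_PATH.
    path = module_path or os.environ.get("TRANSCRIBE_MODULE_PATH")
    if not path:
        raise ValueError(
            "no module path given and TRANSCRIBE_MODULE_PATH is not set"
        )
    return path


def load_implementation(module_path: Optional[str] = None) -> ModuleType:
    """Hypothetical helper: import the implementation module by its dotted path."""
    return importlib.import_module(resolve_module_path(module_path))
```

Raising early when neither source provides a path keeps a missing configuration from surfacing later as an obscure import error.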
Basic usage
Once you're set up, basic usage looks like this:
from transcribe import (
    init_transcription_service,
    TranscribeJobRequest,
    TranscribeJobStatus,
)

# Uses the implementation named by TRANSCRIBE_MODULE_PATH
service = init_transcription_service()

# Blocks until every job in the batch has finished
result = service.transcribe([
    TranscribeJobRequest(
        sourceFile="/some/path/j1.wav"
    ),
    TranscribeJobRequest(
        sourceFile="/some/other/path/j2.wav"
    )
])

for j in result.jobs():
    if j.status == TranscribeJobStatus.SUCCEEDED:
        print(j.transcript)
    else:
        print(j.error)
Handling updates on large/long-running batch jobs
The main transcribe method is synchronous to hide the async/polling-based complexity of most transcribe services. But for any non-trivial batch of transcriptions, you probably do want to receive periodic updates, for example to save any completed transcriptions. You can do that by passing an on_update callback as follows:
from transcribe import (
    init_transcription_service,
    TranscribeJobRequest,
    TranscribeJobStatus,
    TranscribeJobsUpdate,
)

service = init_transcription_service()

# Called periodically while the batch is still running
def _on_update(u: TranscribeJobsUpdate) -> None:
    for j in u.jobs():
        if j.status == TranscribeJobStatus.SUCCEEDED:
            print(f"save result: {j.transcript}")
        else:
            print(j.error)

result = service.transcribe(
    [
        TranscribeJobRequest(
            sourceFile="/some/path/j1.wav"
        ),
        TranscribeJobRequest(
            sourceFile="/some/other/path/j2.wav"
        )
    ],
    on_update=_on_update
)
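For intuition about what a synchronous call with an update callback hides, the following is a self-contained sketch of a polling loop. It does not use or reproduce py-transcribe's internals: `Status`, `Job`, `transcribe_sync`, and `poll_backend` are hypothetical stand-ins invented for illustration.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    """Hypothetical job states, not the library's TranscribeJobStatus."""
    QUEUED = "QUEUED"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


@dataclass
class Job:
    """Hypothetical job record."""
    source_file: str
    status: Status = Status.QUEUED
    transcript: str = ""
    error: str = ""


def transcribe_sync(
    jobs: List[Job],
    poll_backend: Callable[[Job], None],
    on_update: Optional[Callable[[List[Job]], None]] = None,
    poll_interval: float = 0.0,
) -> List[Job]:
    """Block until all jobs finish, reporting newly finished jobs each round."""
    pending = {j.source_file: j for j in jobs}
    while pending:
        finished = []
        for job in list(pending.values()):
            poll_backend(job)  # backend mutates the job's status in place
            if job.status is not Status.QUEUED:
                finished.append(job)
                del pending[job.source_file]
        if finished and on_update is not None:
            on_update(finished)  # caller can persist results right away
        if pending:
            time.sleep(poll_interval)
    return jobs
```

The caller sees one blocking call, while the callback gives it a chance to save each completed transcript before the whole batch is done.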
Configuring the environment for your implementation
Most implementations will also require other configuration, which you can either set in your environment or pass to init_transcription_service as config={}. See your implementation docs for details.
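As an illustration of the "environment or config" choice, here is a small sketch of how an implementation could look up its settings. This is an assumption about typical behavior, not documented py-transcribe behavior: `resolve_config` is a hypothetical helper, `EXAMPLE_BACKEND_REGION` is a made-up key, and the precedence (passed config over environment) is assumed.

```python
import os
from typing import Dict, Iterable, Optional


def resolve_config(
    keys: Iterable[str], config: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Hypothetical helper: take each key from config, else from the environment."""
    config = config or {}
    resolved: Dict[str, str] = {}
    for key in keys:
        # Assumption: a value passed in config wins over the environment.
        value = config.get(key, os.environ.get(key))
        if value is not None:
            resolved[key] = value
    return resolved


# EXAMPLE_BACKEND_REGION is a made-up setting name used only for this sketch.
settings = resolve_config(
    ["EXAMPLE_BACKEND_REGION"], config={"EXAMPLE_BACKEND_REGION": "us-east-1"}
)
```

Keys that are set in neither place are simply left out, so the implementation can decide which ones are mandatory.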
Development
Run tests during development with
make test-all
Once ready to release, create a release tag, currently using semver-ish numbering, e.g. 1.0.0(-alpha.1)