framework for synchronous batch speech-to-text transcription using backends like AWS, Watson, etc.

py-transcribe

Implementation-agnostic framework for synchronous batch speech-to-text transcription with backend services such as AWS, Watson, etc.

This module does NOT include a full implementation or an integration with any transcription service. Instead, the intention is that you include a specific implementation in your project. For example, for AWS Transcribe, use py-transcribe-aws (https://github.com/ICTLearningSciences/py-transcribe-aws).

Python Installation

pip install py-transcribe

Usage

You first need to install a concrete implementation of py-transcribe. If you are using AWS, you can install py-transcribe-aws like this:

pip install py-transcribe-aws

Once the implementation is installed, you can configure it in one of two ways:

Setting the implementation module path

Set the environment variable TRANSCRIBE_MODULE_PATH, e.g.

export TRANSCRIBE_MODULE_PATH=transcribe_aws

or pass the module path at service-creation time, e.g.

from transcribe import init_transcription_service


service = init_transcription_service(
    module_path="transcribe_aws"
)
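
The two configuration routes above can be pictured with a small, self-contained sketch. It is not py-transcribe's actual code: the helper names resolve_module_path and load_implementation are hypothetical, and the precedence shown (explicit argument first, then the environment variable) is an assumption rather than documented behavior. Only the TRANSCRIBE_MODULE_PATH variable name and the module_path argument come from the text above.

```python
import importlib
import os
from types import ModuleType
from typing import Optional


def resolve_module_path(module_path: Optional[str] = None) -> str:
    """Pick a module path (hypothetical helper, not part of py-transcribe).

    Assumption: an explicit argument wins over TRANSCRIBE_MODULE_PATH.
    """
    path = module_path or os.environ.get("TRANSCRIBE_MODULE_PATH")
    if not path:
        raise ValueError(
            "Set TRANSCRIBE_MODULE_PATH or pass module_path explicitly"
        )
    return path


def load_implementation(module_path: Optional[str] = None) -> ModuleType:
    """Import the module named by the resolved path using the standard library."""
    return importlib.import_module(resolve_module_path(module_path))
```

With py-transcribe-aws installed, load_implementation("transcribe_aws") would import that package; any importable module name works the same way, which is what lets the framework stay implementation-agnostic.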

Basic usage

Once you're set up, basic usage looks like this:

from transcribe import (
    init_transcription_service,
    TranscribeJobRequest,
    TranscribeJobStatus,
)


# No module_path is passed, so TRANSCRIBE_MODULE_PATH must be set.
service = init_transcription_service()
# transcribe is synchronous: it returns once the batch is done.
result = service.transcribe([
    TranscribeJobRequest(
        sourceFile="/some/path/j1.wav"
    ),
    TranscribeJobRequest(
        sourceFile="/some/other/path/j2.wav"
    )
])
for j in result.jobs():
    if j.status == TranscribeJobStatus.SUCCEEDED:
        print(j.transcript)
    else:
        print(j.error)

Handling updates on large/long-running batch jobs

The main transcribe method is synchronous, which hides the asynchronous, polling-based complexity of most transcription services. For any non-trivial batch of transcriptions, though, you will probably want periodic updates, for example to save completed transcriptions as they arrive. You can get them by passing an on_update callback as follows:

from transcribe import (
    init_transcription_service,
    TranscribeJobRequest,
    TranscribeJobStatus,
    TranscribeJobsUpdate,
)


service = init_transcription_service()


# Called with the current state of the batch on each update.
def _on_update(u: TranscribeJobsUpdate) -> None:
    for j in u.jobs():
        if j.status == TranscribeJobStatus.SUCCEEDED:
            print(f"save result: {j.transcript}")
        else:
            print(j.error)


result = service.transcribe(
    [
        TranscribeJobRequest(
            sourceFile="/some/path/j1.wav"
        ),
        TranscribeJobRequest(
            sourceFile="/some/other/path/j2.wav"
        )
    ],
    on_update=_on_update
)
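
For long batches, a callback that reports progress can be more useful than one that prints every job. The sketch below tallies jobs by status. It relies only on the update.jobs() and job.status members shown above; the function names summarize_update and on_update_progress are hypothetical, and whether each update carries every job or only the changed ones is an assumption to check against your implementation.

```python
from collections import Counter
from typing import Any


def summarize_update(update: Any) -> Counter:
    """Count jobs per status (hypothetical helper).

    Uses duck typing: anything with a jobs() method whose items
    expose a hashable .status attribute will work.
    """
    return Counter(job.status for job in update.jobs())


def on_update_progress(update: Any) -> None:
    """Print a one-line summary such as 'SUCCEEDED=2, FAILED=1'."""
    counts = summarize_update(update)
    print(", ".join(f"{status}={n}" for status, n in counts.items()))
```

Passing on_update=on_update_progress to service.transcribe would then log one summary line per update instead of one line per job.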

Configuring the environment for your implementation

Most implementations will also require other configuration, which you can either set in your environment or pass to init_transcription_service as config={}. See your implementation docs for details.
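
As a rough illustration of the second option, the sketch below gathers selected environment variables into a dict that could be passed as config. The helper collect_config and the setting name SOME_SETTING are hypothetical; the keys an implementation actually expects are defined in that implementation's docs, not here.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def collect_config(
    names: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Copy the named settings that are present into a plain dict.

    Hypothetical helper: reads from os.environ unless another mapping
    is supplied, and silently skips names that are not set.
    """
    source = os.environ if env is None else env
    return {name: source[name] for name in names if name in source}


# Possible use (SOME_SETTING is a placeholder, not a real key):
# service = init_transcription_service(config=collect_config(["SOME_SETTING"]))
```

Building the dict explicitly keeps the required settings visible in one place, which can make a missing value easier to spot than an unset environment variable.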

Development

Run tests during development with

make test-all

Once ready to release, create a release tag. Tags currently use semver-ish numbering, e.g. 1.0.0(-alpha.1).
