
framework for synchronous batch speech-to-text transcription using backends like AWS, Watson, etc.


py-transcribe-aws

A simple library for running batch transcription jobs on AWS Transcribe. It is implemented on the py-transcribe framework, which keeps your code transcription-platform agnostic and easy to test.

Python Installation

pip install py_transcribe_aws

Usage

Setting the implementation module path

Set the environment variable TRANSCRIBE_MODULE_PATH, e.g.

export TRANSCRIBE_MODULE_PATH=transcribe_aws

or pass the module path at service-creation time, e.g.

from transcribe import init_transcription_service


service = init_transcription_service(
    module_path="transcribe_aws"
)

Basic usage

Your code generally should not need to access any of the implementations in this module directly. See py-transcribe for docs on usage of the framework.

ENV/config vars

The following config vars can be set in the environment or passed in code, e.g. init_transcription_service(config={}). Most vars have two accepted names; when both are set, the name with the TRANSCRIBE_ prefix takes precedence.

TRANSCRIBE_AWS_REGION|AWS_REGION

(required)

The region hosting the S3 bucket to which source audio (or video) files will be uploaded for transcription.

TRANSCRIBE_AWS_ACCESS_KEY_ID|AWS_ACCESS_KEY_ID

(required)

TRANSCRIBE_AWS_SECRET_ACCESS_KEY|AWS_SECRET_ACCESS_KEY

(required)

TRANSCRIBE_AWS_S3_BUCKET_SOURCE

(required)

The bucket to which source files are uploaded before being passed to AWS Transcribe.
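The precedence rule described above can be sketched in a few lines. This is a minimal illustration, not the library's own lookup code: `CONFIG_VARS` and `resolve_config` are hypothetical names, and the sketch assumes an empty value counts as unset.

```python
import os
from typing import Dict, Mapping, Optional

# Prefixed name -> unprefixed fallback (None where only one name is documented).
# Hypothetical table for illustration; the library may organize this differently.
CONFIG_VARS: Dict[str, Optional[str]] = {
    "TRANSCRIBE_AWS_REGION": "AWS_REGION",
    "TRANSCRIBE_AWS_ACCESS_KEY_ID": "AWS_ACCESS_KEY_ID",
    "TRANSCRIBE_AWS_SECRET_ACCESS_KEY": "AWS_SECRET_ACCESS_KEY",
    "TRANSCRIBE_AWS_S3_BUCKET_SOURCE": None,
}


def resolve_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the effective value of each var, preferring the TRANSCRIBE_ name."""
    env = os.environ if env is None else env
    resolved: Dict[str, str] = {}
    missing = []
    for prefixed, fallback in CONFIG_VARS.items():
        # The prefixed name wins; fall back to the unprefixed name if there is one.
        value = env.get(prefixed) or (env.get(fallback) if fallback else None)
        if value:
            resolved[prefixed] = value
        else:
            missing.append(prefixed)
    if missing:
        # All four vars are marked (required) above.
        raise ValueError("missing required config vars: " + ", ".join(missing))
    return resolved
```

Calling `resolve_config()` with no arguments reads from the process environment; passing a plain dict makes the lookup easy to exercise in tests without touching `os.environ`.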

AWS Configuration

Using Terraform

This repo includes a Terraform module that sets up all the infrastructure needed to run transcribe.

You can include the Terraform module like this:

module "transcribe_aws" {
    source                  = "git::https://github.com/ICTLearningSciences/py-transcribe-aws.git?ref=tags/{CHANGE_TO_LATEST_VERSION}"
    transcribe_namespace    = "YOUR_NAMESPACE"
}

The module then exposes all the (sensitive) env vars needed to run transcribe in an output map, which you can use like this:

resource "some_server_type" {
    # set TRANSCRIBE_AWS_ACCESS_KEY_ID, TRANSCRIBE_AWS_SECRET_ACCESS_KEY, etc. in some server-resource env
    env = module.transcribe_aws.transcribe_env_vars  
}

If You're Setting up Permissions Manually...

If you are setting up AWS infrastructure manually (as opposed to using the Terraform module above), the AWS IAM identity used must have permission to read, write, and delete objects in the configured source bucket and to use AWS Transcribe.

A minimal(ish) policy to allow the above might look like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:*Object"],
            "Resource": "arn:aws:s3:::${YOUR_S3_BUCKET_NAME}/*"
        },
        {
            "Effect": "Allow",
            "Action": ["transcribe:*"],
            "Resource": "*"
        }
    ]
}
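If you generate the policy from a script, the bucket-name substitution can be sketched as follows. `build_transcribe_policy` is a hypothetical helper written for this example, not part of this package; it only fills the bucket name into the policy shown above and does not call AWS.

```python
import json
from typing import Any, Dict


def build_transcribe_policy(bucket_name: str) -> Dict[str, Any]:
    """Build the minimal(ish) policy above for the given source bucket."""
    if not bucket_name:
        raise ValueError("bucket_name must not be empty")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Get/Put/Delete on objects in the source bucket only
                "Effect": "Allow",
                "Action": ["s3:*Object"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # Access to AWS Transcribe itself
                "Effect": "Allow",
                "Action": ["transcribe:*"],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-transcribe-source" is a placeholder bucket name.
    print(json.dumps(build_transcribe_policy("my-transcribe-source"), indent=4))
```

The resulting JSON can be pasted into the IAM console or handed to whatever tooling you use to attach policies.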

Development

Run tests during development with

make test-all

Once ready to release, create a release tag, currently using semver-ish numbering, e.g. 1.0.0(-alpha.1)
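A quick sanity check for tag names before pushing a release might look like the sketch below. The exact tag grammar is an assumption inferred from the example above (three numeric parts plus an optional dot-separated pre-release suffix); `is_release_tag` is a hypothetical helper, not part of the repo's tooling.

```python
import re

# Assumed "semver-ish" shape: MAJOR.MINOR.PATCH with an optional
# pre-release suffix such as -alpha.1
RELEASE_TAG = re.compile(r"\d+\.\d+\.\d+(-[0-9A-Za-z]+(\.[0-9A-Za-z]+)*)?")


def is_release_tag(tag: str) -> bool:
    """Return True if the whole tag matches the assumed release-tag pattern."""
    return RELEASE_TAG.fullmatch(tag) is not None
```

For example, `is_release_tag("1.0.0-alpha.1")` accepts the pre-release form, while a tag like `v1.0` is rejected.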
