Framework for synchronous batch speech-to-text transcription using backends such as AWS and Watson.
py-transcribe-aws
A simple library for running batch transcription jobs in AWS. It is implemented on the py-transcribe framework, which keeps your code agnostic of the transcription platform and easy to test.
Python Installation
pip install py_transcribe_aws
Usage
Setting the implementation module path
Set ENV var TRANSCRIBE_MODULE_PATH, e.g.
export TRANSCRIBE_MODULE_PATH=transcribe_aws
or pass the module path at service-creation time, e.g.
from transcribe import init_transcription_service

service = init_transcription_service(
    module_path="transcribe_aws"
)
Basic usage
Your code generally should not need to access any of the implementations in this module directly. See py-transcribe for docs on usage of the framework.
ENV/config vars
The following config vars can be set in the environment or passed in code, e.g. init_transcription_service(config={}). Most vars accept two names; when both are set, the name with the TRANSCRIBE_ prefix takes precedence.
TRANSCRIBE_AWS_REGION|AWS_REGION
(required)
The region hosting the S3 bucket to which source audio (or video) files will be uploaded for transcription
TRANSCRIBE_AWS_ACCESS_KEY_ID|AWS_ACCESS_KEY_ID
(required)
TRANSCRIBE_AWS_SECRET_ACCESS_KEY|AWS_SECRET_ACCESS_KEY
(required)
TRANSCRIBE_AWS_S3_BUCKET_SOURCE
(required)
Bucket to which source files are uploaded before being passed to AWS Transcribe
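The precedence rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the library's actual code: the helpers resolve_var and resolve_aws_config are hypothetical names, and treating an empty value as unset is an assumption.

```python
import os
from typing import Dict, Mapping, Optional

# (preferred TRANSCRIBE_-prefixed name, fallback name or None), taken from the list above
CONFIG_VARS = [
    ("TRANSCRIBE_AWS_REGION", "AWS_REGION"),
    ("TRANSCRIBE_AWS_ACCESS_KEY_ID", "AWS_ACCESS_KEY_ID"),
    ("TRANSCRIBE_AWS_SECRET_ACCESS_KEY", "AWS_SECRET_ACCESS_KEY"),
    ("TRANSCRIBE_AWS_S3_BUCKET_SOURCE", None),
]


def resolve_var(
    env: Mapping[str, str], preferred: str, fallback: Optional[str] = None
) -> Optional[str]:
    """Return the prefixed value if set, else the fallback value, else None."""
    value = env.get(preferred)
    if value:  # assumption: an empty string counts as unset
        return value
    return (env.get(fallback) or None) if fallback else None


def resolve_aws_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: resolve all required vars or fail listing the missing ones."""
    env = os.environ if env is None else env
    resolved: Dict[str, str] = {}
    missing = []
    for preferred, fallback in CONFIG_VARS:
        value = resolve_var(env, preferred, fallback)
        if value is None:
            missing.append(preferred)
        else:
            resolved[preferred] = value
    if missing:
        raise ValueError("missing required config vars: " + ", ".join(missing))
    return resolved
```

Running a check like this at startup surfaces a missing credential or bucket name before any upload is attempted.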
AWS Configuration
Using Terraform
This repo includes a Terraform module that sets up all the infrastructure needed to run transcribe.
You can include the Terraform module like this:
module "transcribe_aws" {
  source               = "git::https://github.com/ICTLearningSciences/py-transcribe-aws.git?ref=tags/{CHANGE_TO_LATEST_VERSION}"
  transcribe_namespace = "YOUR_NAMESPACE"
}
The module then exposes all the (sensitive) env vars needed to run transcribe in an output map, which you can use like this:
resource "some_server_type" {
  # set TRANSCRIBE_AWS_ACCESS_KEY_ID, TRANSCRIBE_AWS_SECRET_ACCESS_KEY, etc. in some server-resource env
  env = module.transcribe_aws.transcribe_env_vars
}
If You're Setting up Permissions Manually...
If you are setting up AWS infrastructure manually (as opposed to using the Terraform module above), the AWS IAM identity used must have permission to read, write, and delete objects in the configured source bucket and to use AWS Transcribe.
A minimal(ish) policy to allow the above might look like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:*Object"],
      "Resource": "arn:aws:s3:::${YOUR_S3_BUCKET_NAME}/*"
    },
    {
      "Effect": "Allow",
      "Action": ["transcribe:*"],
      "Resource": "*"
    }
  ]
}
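If you prefer to generate the policy document rather than edit the placeholder by hand, the sketch below fills in the bucket name with Python's standard json module. The function name build_transcribe_policy is hypothetical and is not part of this library; the statements simply mirror the policy above.

```python
import json


def build_transcribe_policy(bucket_name: str) -> str:
    """Hypothetical helper: render the minimal policy above for one source bucket."""
    if not bucket_name:
        raise ValueError("bucket_name must be a non-empty string")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # the wildcard covers object actions such as GetObject, PutObject, DeleteObject
                "Effect": "Allow",
                "Action": ["s3:*Object"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                "Effect": "Allow",
                "Action": ["transcribe:*"],
                "Resource": "*",
            },
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # "my-transcribe-source" is a placeholder bucket name
    print(build_transcribe_policy("my-transcribe-source"))
```

The printed JSON can be pasted into the IAM console or saved to a file for whatever tooling you use to attach policies.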
Development
Run tests during development with
make test-all
Once ready to release, create a release tag, currently using semver-ish numbering, e.g. 1.0.0(-alpha.1)
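As a rough illustration of the "semver-ish" tag shape mentioned above, the sketch below checks a tag against a simple pattern before you push it. The pattern and the function name is_release_tag are assumptions made for this example; the project does not document an exact tag grammar.

```python
import re

# Assumed shape: MAJOR.MINOR.PATCH with an optional dot-separated pre-release suffix
TAG_PATTERN = re.compile(r"^\d+\.\d+\.\d+(-[0-9A-Za-z]+(\.[0-9A-Za-z]+)*)?$")


def is_release_tag(tag: str) -> bool:
    """Hypothetical check: True if the tag looks like 1.0.0 or 1.0.0-alpha.1."""
    return TAG_PATTERN.match(tag) is not None
```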