tpro processes transcripts from speech-to-text services and outputs to various formats.

tpro

Transcript Processing! tpro takes JSON-formatted transcripts produced by various speech-to-text services and converts them to various standardized formats.

Installation and Usage

Non-pip Requirement: Stanford NER JAR

  • download and unzip the Stanford NER distribution
  • put these files in /usr/local/bin/:
    • stanford-ner.jar
    • classifiers/english.all.3class.distsim.crf.ser.gz
  • you might have to update Java on Linux
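
Before running tpro, it may help to confirm that the two Stanford NER files listed above are where they are expected. The following is a minimal sketch, not part of tpro itself: the helper name `missing_ner_files` and the constant `REQUIRED_NER_FILES` are hypothetical, and the default directory is simply the one named in the list above.

```python
from pathlib import Path

# File names copied from the requirement list above.
REQUIRED_NER_FILES = (
    "stanford-ner.jar",
    "classifiers/english.all.3class.distsim.crf.ser.gz",
)


def missing_ner_files(base_dir="/usr/local/bin"):
    """Return the required Stanford NER files that are absent from base_dir.

    Hypothetical helper for illustration; tpro does not ship this function.
    """
    base = Path(base_dir)
    return [name for name in REQUIRED_NER_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_ner_files()
    if missing:
        print("Missing Stanford NER files:", ", ".join(missing))
    else:
        print("All Stanford NER files found.")
```

An empty list means both files are in place. The check only looks at file existence; it does not verify the Java version.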

Pip

$ pip install tpro

Usage

$ tpro --help

Usage: tpro [OPTIONS] TRANSCRIPT_DATA_PATH OUTPUT_PATH
        [amazon|gentle|speechmatics|google] [universal|vo]

Options:
  -p, --print-output    pretty print the transcript, breaks pipeability
  --language-code TEXT  specify language, defaults to en-US.
  --help                Show this message and exit.
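
For scripted use, the command line above can be assembled from Python. This is a rough sketch that relies only on the arguments and options shown in the help text; `build_tpro_command` and `run_tpro` are hypothetical helper names, not part of tpro. The output format is passed through unchecked, because the help text and the example in this document spell it differently.

```python
import subprocess

# Service names copied from the usage text above.
SERVICES = ("amazon", "gentle", "speechmatics", "google")


def build_tpro_command(transcript_path, output_path, service, output_format,
                       language_code=None, pretty=False):
    """Build the argument list for a tpro call (hypothetical helper)."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    command = ["tpro"]
    if pretty:
        # -p / --print-output pretty prints the transcript.
        command.append("--print-output")
    if language_code is not None:
        command += ["--language-code", language_code]
    command += [str(transcript_path), str(output_path), service, output_format]
    return command


def run_tpro(*args, **kwargs):
    """Run tpro; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_tpro_command(*args, **kwargs), check=True)
```

Keeping the command construction separate from the call makes the argument order easy to inspect without tpro installed.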

Example

$ cat transcript.json 

  { "job": {
      "lang": "en",
      "user_id": 2152310,
      "name": "recording.mp4",
      "duration": 7,
      "created_at": "Mon Nov 12 14:57:06 2018",
      "id": 9871364
    },
    "speakers": [
      {
        "duration": "6.87",
        "confidence": null,
        "name": "M2",
        "time": "5.98"
      }
    ],
    "words": [
      {
        "duration": "0.13",
        "confidence": "0.670",
        "name": "Hello",
        "time": "5.98"
      },
      {
        "duration": "0.45",
        "confidence": "1.000",
        "name": "there",
        "time": "6.14"
      }
    ]
  }

$ tpro transcript.json converted_transcript.json speechmatics universal_transcript

[
    {
        "start": 5.98,
        "end": 6.11,
        "confidence": 0.67,
        "word": "Hello",
        "always_capitalized": false,
        "punc_after": false,
        "punc_before": false
    },
    {
        "start": 6.14,
        "end": 6.59,
        "confidence": 1.0,
        "word": "there",
        "always_capitalized": false,
        "punc_after": false,
        "punc_before": false
    }
]

☝☝☝ There's your transcript, which was saved to converted_transcript.json.
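
The field mapping in the example above can be sketched in a few lines of Python. This is an illustration inferred from the sample input and output only, not tpro's actual implementation: the function name `speechmatics_to_universal` is hypothetical, the end time is assumed to be start plus duration, and the three boolean fields are set to false as placeholders rather than computed.

```python
import json


def speechmatics_to_universal(transcript):
    """Convert a Speechmatics-style transcript dict to a list of word dicts.

    Hypothetical sketch based on the sample above, not tpro's own code.
    """
    words = []
    for item in transcript["words"]:
        start = float(item["time"])
        words.append({
            "start": start,
            # Assumed: end = start + duration, rounded to hide float noise.
            "end": round(start + float(item["duration"]), 2),
            "confidence": float(item["confidence"]),
            "word": item["name"],
            # Placeholders: this sketch does not compute these fields.
            "always_capitalized": False,
            "punc_after": False,
            "punc_before": False,
        })
    return words


# The "words" portion of the sample transcript above.
sample = {
    "words": [
        {"duration": "0.13", "confidence": "0.670", "name": "Hello", "time": "5.98"},
        {"duration": "0.45", "confidence": "1.000", "name": "there", "time": "6.14"},
    ]
}

print(json.dumps(speechmatics_to_universal(sample), indent=4))
```

The sketch ignores the `job` and `speakers` sections of the input, since the sample output does not appear to use them.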

STT Services

Planned

Output Formats

Planned

  • Word (.doc, .docx)
  • text files
  • SRT (subtitles)
  • Draft.js JSON
