tpro processes transcripts from speech-to-text services and outputs to various formats.
tpro
Transcript Processing! tpro takes JSON-formatted transcripts produced by
various speech-to-text services and converts them to various standardized
formats.
Installation and Usage
Non-pip Requirement: Stanford NER JAR
- download and unzip the Stanford NER archive
- put these files in /usr/local/bin/:
- stanford-ner.jar
- classifiers/english.all.3class.distsim.crf.ser.gz
- you might have to update Java on Linux
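To confirm the two files listed above are in place before running tpro, a small check like the following can help. This is only a sketch: the helper name `missing_ner_files` and the constants are made up for illustration and are not part of tpro; the directory and file names are the ones given above.

```python
from pathlib import Path

# Directory and file names are taken from the installation notes above.
NER_DIR = Path("/usr/local/bin")
REQUIRED_NER_FILES = (
    "stanford-ner.jar",
    "classifiers/english.all.3class.distsim.crf.ser.gz",
)


def missing_ner_files(base_dir=NER_DIR):
    """Return the required Stanford NER files that are absent under base_dir.

    Hypothetical helper for illustration; tpro does not ship this function.
    """
    base = Path(base_dir)
    return [name for name in REQUIRED_NER_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_ner_files()
    if missing:
        print("Missing Stanford NER files in", NER_DIR)
        for name in missing:
            print(" -", name)
    else:
        print("Stanford NER files found in", NER_DIR)
```

Passing a different `base_dir` lets you run the same check against a directory other than /usr/local/bin/.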
Pip
$ pip install tpro
Usage
$ tpro --help
Usage: tpro [OPTIONS] JSON_PATH_OR_DATA [amazon|gentle|speechmatics]
[universal_transcript|viral_overlay]
Options:
-s, --save TEXT save to file
--help Show this message and exit.
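If you want to call the command from a Python script, one option is to build the argument list from the usage line above and hand it to `subprocess`. This is a rough sketch, not an official tpro API: `build_tpro_command` and `run_tpro` are hypothetical names, and the only things taken from tpro itself are the argument order, the service and format choices, and the `--save` option shown in the help output.

```python
import subprocess

# Choices copied from the usage line printed by `tpro --help`.
SERVICES = ("amazon", "gentle", "speechmatics")
OUTPUT_FORMATS = ("universal_transcript", "viral_overlay")


def build_tpro_command(json_path_or_data, service, output_format, save_to=None):
    """Build the argv list for a tpro call (hypothetical helper, not part of tpro)."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    cmd = ["tpro"]
    if save_to is not None:
        # -s/--save takes the file to write to, per the help output.
        cmd += ["--save", save_to]
    cmd += [json_path_or_data, service, output_format]
    return cmd


def run_tpro(json_path_or_data, service, output_format, save_to=None):
    """Run tpro and return whatever it prints to stdout (requires tpro on PATH)."""
    cmd = build_tpro_command(json_path_or_data, service, output_format, save_to)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the first argument is raw JSON rather than a path.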
Example
$ tpro '{
"job": {
"lang": "en",
"user_id": 2152310,
"name": "recording.mp4",
"duration": 7,
"created_at": "Mon Nov 12 14:57:06 2018",
"id": 9871364
},
"speakers": [
{
"duration": "6.87",
"confidence": null,
"name": "M2",
"time": "5.98"
}
],
"words": [
{
"duration": "0.13",
"confidence": "0.670",
"name": "Hello",
"time": "5.98"
},
{
"duration": "0.45",
"confidence": "1.000",
"name": "there",
"time": "6.14"
}
]
}' speechmatics universal_transcript
[
{
"start": 5.98,
"end": 6.11,
"confidence": 0.67,
"word": "Hello",
"always_capitalized": false,
"punc_after": false,
"punc_before": false
},
{
"start": 6.14,
"end": 6.59,
"confidence": 1.0,
"word": "there",
"always_capitalized": false,
"punc_after": false,
"punc_before": false
}
]
$
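The mapping in the example above can be approximated in a few lines of Python. This is a sketch inferred only from that one input/output pair, not tpro's actual implementation: the function name `speechmatics_to_universal` is hypothetical, the end time is assumed to be start plus duration rounded to two decimals, and the three boolean flags are hard-coded to false because the example shows nothing about how tpro derives them.

```python
import json


def speechmatics_to_universal(transcript):
    """Convert Speechmatics-style words to universal-transcript-style entries.

    Hypothetical helper inferred from the README example; not tpro's own code.
    """
    entries = []
    for word in transcript["words"]:
        start = float(word["time"])
        # Assumption: end = start + duration, rounded to two decimal places.
        end = round(start + float(word["duration"]), 2)
        entries.append({
            "start": start,
            "end": end,
            "confidence": float(word["confidence"]),
            "word": word["name"],
            # Placeholders: the example only ever shows false for these flags.
            "always_capitalized": False,
            "punc_after": False,
            "punc_before": False,
        })
    return entries


# The "words" section of the example input above.
sample = {
    "words": [
        {"duration": "0.13", "confidence": "0.670", "name": "Hello", "time": "5.98"},
        {"duration": "0.45", "confidence": "1.000", "name": "there", "time": "6.14"},
    ]
}

if __name__ == "__main__":
    print(json.dumps(speechmatics_to_universal(sample), indent=2))
```

Note that the Speechmatics fields are strings while the output fields are numbers, so the conversion is mostly a rename plus a type change.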
STT Services
- Amazon
- Gentle
- Speechmatics
Planned
Output Formats
- Universal Transcript (JSON)
- viraloverlay (JSON)
Planned
- Word (.doc, .docx)
- text files
- SRT (subtitles)