
WEBVTT to text converter

Project description

VttFormatter

Converts WebVTT files into plain text, removing timestamps, cue identifiers and NOTE comments, and formatting the text into paragraphs.

VTT_formatter is a Python package that can be run from the command line or through an interface such as a Jupyter Notebook, either locally or on Azure Notebooks.

Full instructions for using VTT_formatter in a Jupyter Notebook, either on Azure Notebooks or locally with Anaconda, can be found on the wiki.

Example Input/Output

Input

WEBVTT

NOTE duration:"00:00:32.5820000"

NOTE language:en-us

NOTE Confidence: 0.69450831413269

ef04c7c2-a59e-463f-9d27-b5b1259d6777
00:00:03.300 --> 00:00:06.870
Hello.

NOTE Confidence: 0.621036410331726

8a017ebb-1722-4e7f-8984-fc6da39c3489
00:00:08.100 --> 00:00:09.620
Hi there.

NOTE Confidence: 0.713402450084686

d9a1567a-1ebe-40ce-983a-98436bcabcfe
00:00:19.240 --> 00:00:20.240
Can you hear me properly?

NOTE Confidence: 0.536461710929871

b8e0fa64-8c2f-4070-9b0f-922a50f3fcde
00:00:21.930 --> 00:00:23.490
Yeah.

NOTE Confidence: 0.889019846916199

88910870-8af9-48f5-bcc4-a501eda95d3f
00:00:24.670 --> 00:00:28.778
But now my headphones are playing
up, I can still hear you though.

NOTE Confidence: 0.889019846916199

7d633414-089b-4813-9617-9533f5f215c0
00:00:28.778 --> 00:00:32.570
Well, I mean it is crackling. It 
will still be recording the audio.
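Each block in the input above is either the WEBVTT header, a NOTE comment, or a cue: an optional identifier line (here a UUID), a timing line of the form `start --> end`, and one or more lines of text. As a minimal sketch of how the timestamps, identifiers and NOTE blocks can be separated from the cue text, the Python below uses only the standard `re` module. It assumes blocks are separated by blank lines, as in this example, and does not handle every WebVTT feature (inline tags such as `<v Speaker>` are left in the text). `parse_cues` and `parse_timestamp` are hypothetical names, not part of vttformatter's API.

```python
import re
from typing import List, Tuple

# Matches a cue timing line such as "00:00:03.300 --> 00:00:06.870".
# WebVTT allows the hours field to be omitted, so it is optional here.
TIMING = re.compile(
    r"^((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})"
)


def parse_timestamp(ts: str) -> float:
    """Convert 'hh:mm:ss.ttt' or 'mm:ss.ttt' to seconds."""
    seconds = 0.0
    for part in ts.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_cues(vtt: str) -> List[Tuple[float, float, str]]:
    """Return (start, end, text) per cue (HYPOTHETICAL helper, not the package API)."""
    cues = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        # Skip empty input, the WEBVTT header block and NOTE comment blocks.
        if not lines or lines[0].startswith(("WEBVTT", "NOTE")):
            continue
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                # Any line before the timing line is the cue identifier and is
                # dropped; the lines after it are the text, joined with spaces.
                text = " ".join(t.strip() for t in lines[i + 1:])
                cues.append((parse_timestamp(match.group(1)),
                             parse_timestamp(match.group(2)), text))
                break
    return cues
```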

Output

Hello.

Hi there.

Can you hear me properly?

Yeah.

But now my headphones are playing up, I can still hear you though. Well, I mean 
it is crackling. It will still be recording the audio.
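In this example the last two cues are merged into one paragraph while every earlier cue stands alone. The page does not say which rule the package applies. One rule consistent with the example is a time gap: the fifth cue ends at 00:00:28.778, exactly when the sixth starts, whereas every other pair of consecutive cues is more than a second apart (the last two cues also share the same Confidence value, which is another possible signal). The sketch below assumes the gap rule, with a hypothetical `max_gap` threshold, and wraps each paragraph at 80 columns with `textwrap.fill`; with the cues from the example it reproduces the output shown. The function name and defaults are illustrative, not vttformatter's actual behaviour.

```python
import textwrap
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)


def cues_to_paragraphs(cues: List[Cue], max_gap: float = 0.5,
                       width: int = 80) -> str:
    """Join cues into paragraphs; a gap >= max_gap seconds starts a new one.

    ASSUMPTION: the gap-based rule and the 0.5 s default are hypothetical,
    chosen only to be consistent with the example above.
    """
    paragraphs: List[List[str]] = []
    previous_end = None
    for start, end, text in cues:
        if previous_end is None or start - previous_end >= max_gap:
            paragraphs.append([])  # first cue, or gap too large: new paragraph
        paragraphs[-1].append(text)
        previous_end = end
    # Wrap each paragraph and separate paragraphs with a blank line.
    return "\n\n".join(textwrap.fill(" ".join(p), width=width)
                       for p in paragraphs)
```

Raising `max_gap` merges more cues into each paragraph; setting it to 0 puts every non-overlapping cue in its own paragraph.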

Simple usage

The screenshot below shows a simple use of the VTT formatter in a Jupyter Notebook. It reads in the specified file and creates a new .txt file in the same directory as the original.
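The file handling described here (read a .vtt file, write a .txt file with the same name in the same directory) can be sketched with the standard `pathlib` module. This is a minimal sketch: `convert_file` is a hypothetical helper, and `to_text` stands in for the package's conversion function, whose real name and signature are not given on this page.

```python
from pathlib import Path
from typing import Callable


def convert_file(vtt_path: str, to_text: Callable[[str], str]) -> Path:
    """Write to_text(contents) to a .txt file beside the .vtt file.

    HYPOTHETICAL helper: to_text is a placeholder for whatever conversion
    function vttformatter actually exposes.
    """
    source = Path(vtt_path)
    target = source.with_suffix(".txt")  # same directory, same file stem
    text = to_text(source.read_text(encoding="utf-8"))
    target.write_text(text, encoding="utf-8")
    return target
```

For example, `convert_file("meeting.vtt", to_text=some_converter)` would write `meeting.txt` next to `meeting.vtt`; both names are placeholders.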

Further information can be found in the notebook here.

Installation

The simplest way to install vttformatter is with pip, from PyPI:

pip install vttformatter

Alternatively, you can download the latest release from GitHub and install it directly:

cd vttformatter
pip install -e .

which installs an editable (-e) version of vttformatter in your current Python environment.

Or clone the latest version from GitHub with

git clone git@github.com:georgiewellock/VTT_formatter.git

and install the same way.

cd VTT_formatter
pip install -e .

Tests

Unit tests are available in the top-level tests directory. They can be run using

pytest

or

python -m unittest discover

from the top-level directory.

Contributing

Bug reports and feature requests

If you think you have found a bug, please report it on the Issue Tracker. This is also the place to propose ideas for new features or to ask questions about the design of the VTT formatter. Poor documentation is considered a bug, but please be as specific as possible when asking for improvements.


Download files

Download the file for your platform.

Source Distribution

vttformatter-2.10.tar.gz (5.2 kB)

Uploaded Source

Built Distribution


vttformatter-2.10-py3-none-any.whl (6.2 kB)

Uploaded Python 3

File details

Details for the file vttformatter-2.10.tar.gz.

File metadata

  • Download URL: vttformatter-2.10.tar.gz
  • Upload date:
  • Size: 5.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for vttformatter-2.10.tar.gz
Algorithm Hash digest
SHA256 5ef65ace698d2c13fd6bec0883c8ef0a8e6941b23a5866f47ab451ecff5eb87c
MD5 018d622d1554dcd268e13bd94e571b43
BLAKE2b-256 8329bc303ab971f85636963af47e8250a212d7476ddedf9c7f244633fa866f31
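To check a downloaded file against the published SHA256 digest, you can compute the digest locally with Python's standard `hashlib` module. A minimal sketch; `sha256_of` is a hypothetical helper name, and the digest constant is copied from the table above.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Published SHA256 for vttformatter-2.10.tar.gz.
EXPECTED = "5ef65ace698d2c13fd6bec0883c8ef0a8e6941b23a5866f47ab451ecff5eb87c"
```

If the sdist was saved as vttformatter-2.10.tar.gz in the current directory, `sha256_of("vttformatter-2.10.tar.gz") == EXPECTED` should be True for an unmodified download. The `pip hash` command prints the same SHA256 digest without any Python code.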


File details

Details for the file vttformatter-2.10-py3-none-any.whl.

File metadata

  • Download URL: vttformatter-2.10-py3-none-any.whl
  • Upload date:
  • Size: 6.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for vttformatter-2.10-py3-none-any.whl
Algorithm Hash digest
SHA256 e36a45d5ed68486e46b1bcb3e4ea33dab2fb9c80445816a375bfcf92e1cf7007
MD5 5127ea7a564fae595c60a920f3dd8cb6
BLAKE2b-256 da0515a18dd082599e638424680ddfb7ab47beb3d624749bc75d62f9ca9badcf

