Closed caption converter

py-caption
==========

|Build Status|

``pycaption`` is a caption reading/writing module. Use one of the given
Readers to read content into a CaptionSet object,
and then use one of the Writers to output the CaptionSet into
captions of your desired format.

Requires Python 2.7.

Turn a caption into multiple caption outputs:

::

    import pycaption.transcript
    from pycaption import CaptionConverter, SRTReader, SAMIWriter, DFXPWriter

    srt_caps = u'''1
    00:00:09,209 --> 00:00:12,312
    This is an example SRT file,
    which, while extremely short,
    is still a valid SRT file.
    '''

    # Read once, then write the same captions in several formats.
    converter = CaptionConverter()
    converter.read(srt_caps, SRTReader())
    print(converter.write(SAMIWriter()))
    print(converter.write(DFXPWriter()))
    print(converter.write(pycaption.transcript.TranscriptWriter()))

Not sure what format the caption is in? Detect it:

::

    from pycaption import detect_format, SAMIWriter

    caps = u'''1
    00:00:01,500 --> 00:00:12,345
    Small caption'''

    # detect_format returns a reader class, or a falsy value if nothing matches.
    reader = detect_format(caps)
    if reader:
        print(SAMIWriter().write(reader().read(caps)))

Or if you expect to have only a subset of the supported input formats:

::

    from pycaption import SRTReader, DFXPReader, SCCReader, SAMIWriter

    caps = u'''1
    00:00:01,500 --> 00:00:12,345
    Small caption'''

    # Try only the readers you expect, in order of preference.
    if SRTReader().detect(caps):
        print(SAMIWriter().write(SRTReader().read(caps)))
    elif DFXPReader().detect(caps):
        print(SAMIWriter().write(DFXPReader().read(caps)))
    elif SCCReader().detect(caps):
        print(SAMIWriter().write(SCCReader().read(caps)))
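
The ``if``/``elif`` chain above generalizes to a loop over candidate readers. The sketch below is not pycaption's implementation: ``ToySRTReader`` and ``ToyWebVTTReader`` are hypothetical stand-ins that only mimic the ``detect()`` method used above, so the example runs without pycaption installed.

```python
import re


class ToySRTReader(object):
    """Hypothetical stand-in: recognizes SRT by its 'HH:MM:SS,mmm --> ' cue line."""

    def detect(self, content):
        return re.search(r"\d{2}:\d{2}:\d{2},\d{3} --> ", content) is not None


class ToyWebVTTReader(object):
    """Hypothetical stand-in: recognizes WebVTT by its leading 'WEBVTT' signature."""

    def detect(self, content):
        return content.lstrip().startswith("WEBVTT")


def first_matching_reader(content, reader_classes):
    """Return the first reader class whose detect() accepts the content, else None."""
    for reader_class in reader_classes:
        if reader_class().detect(content):
            return reader_class
    return None


caps = u"1\n00:00:01,500 --> 00:00:12,345\nSmall caption"
reader = first_matching_reader(caps, [ToyWebVTTReader, ToySRTReader])
print(reader.__name__ if reader else "unknown format")
```

Passing the candidates as a list keeps the preference order explicit and makes it easy to restrict detection to the formats you expect.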

Supported Formats
-----------------

Read:

- DFXP/TTML
- SAMI
- SCC
- SRT
- WebVTT

Write:

- DFXP/TTML
- SAMI
- SRT
- Transcript
- WebVTT

See the `examples
folder <https://github.com/pbs/pycaption/tree/master/examples/>`__ for
example captions that currently can be read correctly.

Python Usage
------------

Example: Convert from SAMI to DFXP

::

    from pycaption import SAMIReader, DFXPWriter

    sami = u'''<SAMI><HEAD><TITLE>NOVA3213</TITLE><STYLE TYPE="text/css">
    <!--
    P { margin-left: 1pt;
    margin-right: 1pt;
    margin-bottom: 2pt;
    margin-top: 2pt;
    text-align: center;
    font-size: 10pt;
    font-family: Arial;
    font-weight: normal;
    font-style: normal;
    color: #ffffff; }

    .ENCC {Name: English; lang: en-US; SAMI_Type: CC;}
    .FRCC {Name: French; lang: fr-cc; SAMI_Type: CC;}

    --></STYLE></HEAD><BODY>
    <SYNC start="9209"><P class="ENCC">
    ( clock ticking )
    </P><P class="FRCC">
    FRENCH LINE 1!
    </P></SYNC>
    <SYNC start="12312"><P class="ENCC">&nbsp;</P></SYNC>
    <SYNC start="14848"><P class="ENCC">
    MAN:<br/>
    <span style="text-align:center;font-size:10">When <i>we</i> think</span><br/>
    of E equals m c-squared,
    </P><P class="FRCC">
    FRENCH LINE 2?
    </P></SYNC>'''

    print(DFXPWriter().write(SAMIReader().read(sami)))

Which will output the following:

::

    <?xml version="1.0" encoding="utf-8"?>
    <tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling">
     <head>
      <styling>
       <style id="p" tts:color="#fff" tts:fontfamily="Arial" tts:fontsize="10pt" tts:textAlign="center"/>
      </styling>
     </head>
     <body>
      <div xml:lang="fr-cc">
       <p begin="00:00:09.209" end="00:00:14.848" style="p">
        FRENCH LINE 1!
       </p>
       <p begin="00:00:14.848" end="00:00:18.848" style="p">
        FRENCH LINE 2?
       </p>
      </div>
      <div xml:lang="en-US">
       <p begin="00:00:09.209" end="00:00:12.312" style="p">
        ( clock ticking )
       </p>
       <p begin="00:00:14.848" end="00:00:18.848" style="p">
        MAN:<br/>
        <span tts:fontsize="10" tts:textAlign="center">When</span> <span tts:fontStyle="italic">we</span> think<br/>
        of E equals m c-squared,
       </p>
      </div>
     </body>
    </tt>
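
In the example, the SAMI ``start`` values are milliseconds (``9209``), while the DFXP ``begin``/``end`` attributes are clock times (``00:00:09.209``). The helper below is a minimal sketch of that conversion; ``ms_to_clock`` is a hypothetical name used for illustration and is not part of pycaption.

```python
def ms_to_clock(milliseconds, separator="."):
    """Format a millisecond count as HH:MM:SS<separator>mmm."""
    hours, remainder = divmod(milliseconds, 3600 * 1000)
    minutes, remainder = divmod(remainder, 60 * 1000)
    seconds, millis = divmod(remainder, 1000)
    return "%02d:%02d:%02d%s%03d" % (hours, minutes, seconds, separator, millis)


print(ms_to_clock(9209))       # 00:00:09.209 (DFXP style)
print(ms_to_clock(14848))      # 00:00:14.848
print(ms_to_clock(9209, ","))  # 00:00:09,209 (SRT style)
```

The same value written with a comma separator matches the timestamp style used in the SRT examples earlier in this README.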

Extensibility
-------------

Different readers and writers are easy to add if you would like to:

- Read/Write a previously unsupported format
- Read/Write a supported format in a different way (more styling?)

Simply follow the format of a current Reader or Writer, and edit to your
heart's desire.
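
As a rough illustration of the pattern, the sketch below defines a writer-like class with a single ``write()`` method, mirroring the ``Writer().write(caption_set)`` calls used throughout this README. Everything in it is hypothetical: ``ToyCaption``, the plain dict used in place of a CaptionSet, and ``PlainTextWriter`` are stand-ins for illustration, and pycaption's real base classes and CaptionSet methods may differ.

```python
from collections import namedtuple

# Hypothetical stand-in for a caption: start/end in milliseconds plus text.
ToyCaption = namedtuple("ToyCaption", ["start_ms", "end_ms", "text"])


class PlainTextWriter(object):
    """Hypothetical writer: one '[start-end] text' line per caption, grouped by language."""

    def write(self, captions_by_lang):
        lines = []
        for lang in sorted(captions_by_lang):
            lines.append("# %s" % lang)
            for cap in captions_by_lang[lang]:
                lines.append("[%d-%d] %s" % (cap.start_ms, cap.end_ms, cap.text))
        return "\n".join(lines)


# A dict keyed by language stands in for a multi-language CaptionSet.
toy_set = {
    "en-US": [ToyCaption(9209, 12312, "( clock ticking )")],
    "fr-cc": [ToyCaption(9209, 14848, "FRENCH LINE 1!")],
}
output = PlainTextWriter().write(toy_set)
print(output)
```

For a real reader or writer, start from the source of an existing one, as suggested above, rather than from this toy data model.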

SAMI Reader / Writer :: `spec <http://msdn.microsoft.com/en-us/library/ms971327.aspx>`__
----------------------------------------------------------------------------------------

Microsoft Synchronized Accessible Media Interchange. Supports multiple
languages.

Supported Styling:

- text-align
- italics
- font-size
- font-family
- color

If the SAMI file is not valid XML (e.g. it has unclosed tags), the reader
will still attempt to read it.

DFXP/TTML Reader / Writer :: `spec <http://www.w3.org/TR/ttaf1-dfxp/>`__
------------------------------------------------------------------------

The W3C standard. Supports multiple languages.

Supported Styling:

- text-align
- italics
- font-size
- font-family
- color

SRT Reader / Writer :: `spec <http://matroska.org/technical/specs/subtitles/srt.html>`__
----------------------------------------------------------------------------------------

SubRip captions. If given multiple languages to write, the writer outputs
all of them, joined together by a 'MULTI-LANGUAGE SRT' line.

Supported Styling:

- None

The reader assumes the input language is English. To change it:

::

    pycaps = SRTReader().read(srt_content, lang='fr')

SCC Reader :: `spec <http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_FORMAT.HTML>`__
-----------------------------------------------------------------------------------------------

Scenarist Closed Caption format. Assumes Channel 1 input.

Supported Styling:

- italics

By default, the SCC Reader does not simulate roll-up captions. To enable
roll-ups:

::

    pycaps = SCCReader().read(scc_content, simulate_roll_up=True)

The reader also assumes the input language is English. To change it:

::

    pycaps = SCCReader().read(scc_content, lang='fr')

The reader also accepts an offset (measured in seconds) to apply to the
timestamps. For example, if the SCC file is 45 seconds ahead of the
video:

::

    pycaps = SCCReader().read(scc_content, offset=45)

The SCC Reader handles both dropframe and non-dropframe captions, and
will auto-detect which format the captions are in.
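
How the reader makes this distinction internally is not described here. The sketch below assumes the common SCC convention that drop-frame timecodes put a semicolon before the frame count (``HH:MM:SS;FF``) and non-drop-frame timecodes use a colon (``HH:MM:SS:FF``). ``is_drop_frame`` is a hypothetical helper, not a pycaption function, and the sample lines are illustrative.

```python
import re

# An SCC caption line starts with a timecode: HH:MM:SS:FF or HH:MM:SS;FF.
TIMECODE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})([:;])(\d{2})")


def is_drop_frame(scc_line):
    """Return True/False for a timecoded line, or None if the line has no timecode."""
    match = TIMECODE.match(scc_line)
    if match is None:
        return None
    # Assumption: a semicolon before the frame count marks drop-frame.
    return match.group(4) == ";"


print(is_drop_frame("00:00:09;06\t9420 9420"))  # True
print(is_drop_frame("00:00:09:06\t9420 9420"))  # False
print(is_drop_frame("Scenarist_SCC V1.0"))      # None
```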

Transcript Writer
-----------------

Text stripped of styling, arranged in sentences.

Supported Styling:

- None

The transcript writer uses natural sentence boundary detection
algorithms to create the transcript.
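
To show what "arranged in sentences" means in practice, here is a deliberately naive sketch that joins caption lines and splits them after sentence-ending punctuation. It is only a stand-in: the writer's actual sentence boundary detection is more sophisticated, and ``to_sentences`` is a hypothetical helper, not a pycaption function.

```python
import re


def to_sentences(caption_texts):
    """Join caption lines into one string and split it after '.', '!' or '?'."""
    # Collapse line breaks and repeated whitespace inside each caption first.
    text = " ".join(" ".join(t.split()) for t in caption_texts)
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p for p in parts if p]


sentences = to_sentences([
    "This is an example SRT file,",
    "which, while extremely short,",
    "is still a valid SRT file.",
    "Small caption!",
])
for sentence in sentences:
    print(sentence)
```

A regex like this breaks on abbreviations such as "e.g." or "Dr.", which is one reason a real sentence tokenizer is preferable.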

WebVTT Reader / Writer :: `spec <http://dev.w3.org/html5/webvtt/>`__
--------------------------------------------------------------------

Web Video Text Tracks format.

Supported Styling:

- None (yet)


License
-------

This module is Copyright 2012 PBS.org and is available under the `Apache
License, Version 2.0 <http://www.apache.org/licenses/LICENSE-2.0>`__.

.. |Build Status| image:: https://travis-ci.org/pbs/pycaption.png?branch=master
   :target: https://travis-ci.org/pbs/pycaption
