Subtitles extremely clean

Project page: https://github.com/ratoaq2/cleanit

CleanIt is a command-line tool (written in Python) that helps you keep your subtitles clean. You specify rules that detect subtitle entries to remove or patterns to replace. Rules can use simple text matching or regular expressions.

Usage

CLI

Clean subtitles:

$ cleanit --config my-config.yml my-subtitle.srt
Collected 1 subtitles
Saving <Subtitle [my-subtitle.srt]>
Saved <Subtitle [my-subtitle.srt]>

Library

Clean the subtitle at a given path using a given configuration file:

from cleanit.api import clean_subtitle, save_subtitle
from cleanit.config import Config
from cleanit.subtitle import Subtitle

subtitle = Subtitle('/subtitle/path')
config = Config.from_file('/config/path')
if clean_subtitle(subtitle, config.rules):
    save_subtitle(subtitle)
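
The same calls can be applied to a whole directory. The sketch below is only an illustration and assumes the API behaves as in the snippet above. `find_subtitles` and `clean_all` are hypothetical helper names, not part of CleanIt, and only `.srt` files are considered.

```python
from pathlib import Path


def find_subtitles(root):
    """Return every .srt file below root, sorted for a stable order."""
    return sorted(p for p in Path(root).rglob('*.srt') if p.is_file())


def clean_all(root, config_path):
    """Clean each subtitle under root and return the paths that were saved.

    Hypothetical helper: it only chains the calls shown in the example above.
    """
    # Imported here so find_subtitles stays usable without CleanIt installed.
    from cleanit.api import clean_subtitle, save_subtitle
    from cleanit.config import Config
    from cleanit.subtitle import Subtitle

    config = Config.from_file(str(config_path))
    saved = []
    for path in find_subtitles(root):
        subtitle = Subtitle(str(path))
        # Assumption: clean_subtitle returns a truthy value when the subtitle
        # changed, as the 'if' in the example above suggests.
        if clean_subtitle(subtitle, config.rules):
            save_subtitle(subtitle)
            saved.append(path)
    return saved
```

Collecting the paths first keeps the file discovery separate from the cleaning step, so the list can be inspected before anything is overwritten.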

YAML Configuration file

The YAML configuration file has two main sections: templates and groups.

  • Templates: reusable configuration snippets that several groups can share.

  • Groups: where you define your rules.
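
One way to picture how a template, a group and a single rule combine is a layered dictionary merge. This is a sketch under an assumed precedence (rule over group over template), inferred from the sample configuration in this section. `effective_settings` is a hypothetical name and not CleanIt's actual implementation.

```python
def effective_settings(template, group, rule):
    """Merge settings so that rule-level keys win over group-level keys,
    which in turn win over template keys (assumed precedence)."""
    # Defaults marked with '*' in the reference comment of the sample config.
    merged = {'type': 'text', 'match': 'contains', 'whitelist': False}
    for layer in (template, group, rule):
        merged.update(layer)
    return merged


common = {'type': 'text', 'match': 'contains'}

# A group that uses the 'common' template but switches to regex.
tidy_rule = effective_settings(common, {'type': 'regex'}, {'replacement': ' '})
print(tidy_rule)

# A rule that overrides only the match mode.
exact_rule = effective_settings(common, {}, {'match': 'exact'})
print(exact_rule)
```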

# Reference:
#   type: [text*, regex]
#   match: [contains*, exact, startswith, endswith]
#   flags: [ignorecase, dotall, multiline, locale, unicode, verbose]
#   whitelist: no*
#   rules:
#   - sometext
#   - (\b)(\d{1,2})x(\d{1,2})(\b): {replacement: \1S\2E\3\4, type: regex, match: contains, flags: [unicode], whitelist: no}


templates:
  common:
    type: text
    match: contains

groups:
  # Groups can have any name. This one, 'blacklist', holds all the rules that remove subtitle entries.
  blacklist:
    template: common
    rules:
      # Removes any subtitle entry that contains the word FooBar
      - FooBar

      # Removes any subtitle entry that contains the pattern S00E00
      # Example:
      #   My Series S01E02
      - \bs\d{2}\s?e\d{2}\b: {type: regex, flags: ignorecase}

      # Removes any subtitle entry that is exactly the word: 'Ah' or 'Oh' (with 1 or more h)
      # Example:
      #   Ohhh!
      - ((Ah+)|(Oh+))\W?: {match: exact}

  # The group 'tidy' has all rules to replace certain patterns in your subtitles.
  tidy:
    template: common
    type: regex
    rules:
      # Description: Replace runs of whitespace with a single space
      # Example:
      #   Foo     bar.
      # to
      #   Foo bar.
      - \s{2,}: ' '

      # Description: Add a space after a '-' that starts a line. A leading tag, such as <i> or <b>, is kept.
      # Example:
      #   <i>-Francine, what has happened?
      #   -What has happened? You tell me!</i>
      # to
      #   <i>- Francine, what has happened?
      #   - What has happened? You tell me!</i>
      - '(?:^(|(?:\<\w\>)))-([''"]?\w+)': { replacement: '\1- \2', flags: [multiline, unicode] }

* The default value if none is defined
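
To see what the replacement patterns in the sample configuration do, they can be tried directly with Python's standard `re` module. This only demonstrates the regular expressions themselves and does not go through CleanIt. The variable names are chosen for illustration.

```python
import re

# Reference rule: rewrite '1x02' style episode numbers as 'S1E02'.
episode = re.compile(r'(\b)(\d{1,2})x(\d{1,2})(\b)')
episode_result = episode.sub(r'\1S\2E\3\4', 'My Series 1x02')
print(episode_result)

# 'tidy' rule: collapse two or more whitespace characters into one space.
spaces_result = re.sub(r'\s{2,}', ' ', 'Foo     bar.')
print(spaces_result)

# 'tidy' rule: add a space after a leading '-', keeping a one-letter tag.
# In the YAML, '' inside single quotes stands for one literal quote character.
dash = re.compile(r'''(?:^(|(?:\<\w\>)))-(['"]?\w+)''', re.MULTILINE)
text = '<i>-Francine, what has happened?\n-What has happened? You tell me!</i>'
dash_result = dash.sub(r'\1- \2', text)
print(dash_result)
```

Note that `\<\w\>` matches only single-letter tags such as `<i>` or `<b>`, so a longer tag in front of the dash would leave the line unchanged.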

If no configuration file is specified, CleanIt tries to load one from ~/.config/cleanit/config.yml.
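
The fallback can be sketched as follows. `resolve_config_path` is a hypothetical helper written for illustration. It mirrors the behaviour described here, not CleanIt's actual code.

```python
from pathlib import Path

# Default location named in this README.
DEFAULT_CONFIG = Path('~/.config/cleanit/config.yml')


def resolve_config_path(explicit=None):
    """Return the explicit path if one is given, else the expanded default."""
    chosen = Path(explicit) if explicit else DEFAULT_CONFIG
    # expanduser() turns a leading '~' into the current user's home directory.
    return chosen.expanduser()


print(resolve_config_path('my-config.yml'))
print(resolve_config_path())
```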

Changelog

0.2.1

release date: 2016-02-28

  • Adding guess encoding back without python-magic dependency.

0.2

release date: 2016-02-27

  • Removing chardet and python-magic dependencies. Either encoding is specified or it should be guessed by pysrt.

0.1

release date: 2015-10-16

  • Initial release
