text-validator

A pluggable command-line tool for validating the formatting and orthography of text files.

You configure your validator plugins with a TOML file like:

["text_validator.plugins.whitespace"]
CHECK_CRLF = true
CHECK_TABS = true
CHECK_TRAILING_WHITESPACE = true
CHECK_NO_EOF_NEWLINE = true

["text_validator.plugins.unicode"]
CONFIRM_UTF_8_NFC = true

["text_validator.plugins.ref_line_format"]
REF_REGEX = "(\\d+|EP|SB)\\.\\d+(\\.\\d+)?$"  # example from AF

["text_validator.plugins.characters"]
REPLACE_CHARS = [
    # bad character, suggested replacement
    ["\u02BC", "\u2019"],
    ["\u1FBF", "\u2019"],
    ["\u037E", "\u003B"],
    ["\u0387", "\u00B7"],
    ["\u0374", "\u02B9"],
    ["\u03D5", "\u03C6"],
    ["\u03D1", "\u03B8"],
]

and they'll validate the texts you give them:

tests/test_0001.txt:1:line ends with CRLF
tests/test_0001.txt:2:line ends with CRLF
tests/test_0002.txt:1:no newline at end of file
tests/test_0003.txt:1:line contains a tab
tests/test_0004.txt:1:trailing whitespace
tests/test_0006.txt:1:not NFC
tests/test_0007.txt:2:BLANK LINE
tests/test_0008.txt:1:BAD WHITESPACE
tests/test_0008.txt:2:BAD WHITESPACE
tests/test_0009.txt:4:BAD REFERENCE FORM
tests/test_0009.txt:5:BAD REFERENCE FORM
tests/test_0010.txt:2:29:bad U+02BC; consider replacing with U+2019
tests/test_0010.txt:3:29:bad U+1FBF; consider replacing with U+2019
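
To make the report format concrete, here is a minimal sketch of how checks like these could be written in plain Python. It is not text-validator's own code: the names check_line, check_chars and validate_text are hypothetical, the REPLACE_CHARS table is a two-entry subset of the config above, and the 1-based column numbering is an assumption. Only the message strings are copied from the sample output.

```python
import unicodedata

# Hypothetical two-entry subset of the REPLACE_CHARS table in the config above:
# (bad character, suggested replacement).
REPLACE_CHARS = [
    ("\u02BC", "\u2019"),
    ("\u1FBF", "\u2019"),
]


def check_line(line):
    """Return whitespace and normalization messages for one raw line."""
    messages = []
    if line.endswith("\r\n"):
        messages.append("line ends with CRLF")
    body = line.rstrip("\r\n")  # drop the line terminator before other checks
    if "\t" in body:
        messages.append("line contains a tab")
    if body != body.rstrip():
        messages.append("trailing whitespace")
    if unicodedata.normalize("NFC", body) != body:
        messages.append("not NFC")
    return messages


def check_chars(line):
    """Return (column, message) pairs for each discouraged character."""
    found = []
    # Assumption: columns are counted from 1, in code points.
    for column, char in enumerate(line, start=1):
        for bad, good in REPLACE_CHARS:
            if char == bad:
                message = (
                    f"bad U+{ord(bad):04X}; "
                    f"consider replacing with U+{ord(good):04X}"
                )
                found.append((column, message))
    return found


def validate_text(name, text):
    """Return report lines in a 'name:line:message' style for one text."""
    reports = []
    lines = text.splitlines(keepends=True)
    for number, line in enumerate(lines, start=1):
        for message in check_line(line):
            reports.append(f"{name}:{number}:{message}")
        for column, message in check_chars(line):
            reports.append(f"{name}:{number}:{column}:{message}")
    if text and not text.endswith("\n"):
        reports.append(f"{name}:{len(lines)}:no newline at end of file")
    return reports
```

For example, validate_text("demo.txt", "abc\r\n") returns a single report, demo.txt:1:line ends with CRLF. The real plugins are switched on and off through the TOML options shown earlier. This sketch runs every check unconditionally to stay short.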

To install:

pip install text-validator

Then you can either run it from the command line:

validate-text tests/config_004.toml tests/test_0007.txt tests/test_0008.txt tests/test_0009.txt

or programmatically from Python, either with the helper function validate:

from text_validator.main import validate

validate("tests/config_003.toml", ["tests/test_0005.txt", "tests/test_0006.txt"])

or by working directly with a Suite instance:

from text_validator.base import Suite

suite = Suite()
suite.load_toml("tests/config_002.toml")
suite.validate_files(["tests/test_0005.txt", "tests/test_0006.txt"])
