Validation, sanitization and metrics for Markdown manuscripts.

Project description

manuscript-tools

A Python toolkit for validating, sanitizing and measuring Markdown manuscripts. Built for authors who automate their publishing workflow.

German version

What it does

ms-check scans your manuscript for style violations. It ships with two built-in rules (typographic dashes and invisible Unicode characters) and supports custom rules written as plain Python callables.

ms-sanitize fixes encoding issues, normalizes Unicode (NFKC), strips invisible control characters, replaces problematic whitespace and ensures consistent line endings. Supports dry-run and backup modes.
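The text-level steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: the function name sanitize_text, the set of invisible characters and the order of the steps are all assumptions.

```python
import re
import unicodedata

# Assumption: an illustrative set of invisible characters (zero-width space,
# zero-width joiners, word joiner, BOM, soft hyphen). The package's list may differ.
_INVISIBLE = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff\u00ad]")
# C0 control characters except tab, LF and CR, plus DEL.
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def sanitize_text(text: str) -> str:
    """Hypothetical helper mirroring the steps described above."""
    # NFKC folds compatibility characters, e.g. ligatures and no-break spaces.
    text = unicodedata.normalize("NFKC", text)
    text = _INVISIBLE.sub("", text)
    text = _CONTROL.sub("", text)
    # Normalize Windows (CRLF) and old Mac (CR) line endings to LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(repr(sanitize_text("\ufb01\u200bne\u00a0day\r\n")))  # 'fine day\n'
```

Keeping this logic as a pure string-to-string function makes it easy to test without touching the file system.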

ms-metrics reports word counts, line counts and character counts per file and in total. Uses regex-based word boundary matching instead of naive whitespace splitting, so Markdown syntax characters are not counted as words.
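To illustrate why regex-based matching differs from whitespace splitting, here is a small sketch. The pattern and the count_words name are assumptions chosen for the example; the regex actually used by ms-metrics may differ.

```python
import re

# Assumption: a "word" is a run of letters or digits, optionally joined by
# apostrophes or hyphens. Markdown markers such as "##", "-" or "*" never match.
_WORD = re.compile(r"[^\W_]+(?:['’\-][^\W_]+)*")


def count_words(text: str) -> int:
    """Hypothetical word counter based on regex matching."""
    return len(_WORD.findall(text))


sample = "## Chapter 1\n\n- It's a *well-known* fact --- really.\n"
print(len(sample.split()))  # 10: naive splitting counts "##", "-" and "---"
print(count_words(sample))  # 7
```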

Requirements

  • Python 3.11+
  • Poetry

Installation

git clone <your-repo-url> manuscript-tools
cd manuscript-tools
make install

Or manually:

poetry install

Usage

All commands accept a path (a file or directory; default: manuscript/) plus --include and --exclude glob patterns.
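As a rough mental model of how include and exclude patterns interact, the sketch below filters a list of relative paths with fnmatch. The select_files helper, the "*.md" default and the choice to match against paths relative to the manuscript directory are assumptions made for illustration; the package's own matching rules may differ.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def select_files(paths, include="*.md", exclude=None):
    """Hypothetical filter: keep paths that match include and do not match exclude."""
    selected = []
    for p in paths:
        rel = PurePosixPath(p).as_posix()
        if not fnmatchcase(rel, include):
            continue
        if exclude and fnmatchcase(rel, exclude):
            continue
        selected.append(rel)
    return selected


files = ["chapter-01.md", "drafts/old.md", "notes.txt"]
print(select_files(files, exclude="drafts/*"))  # ['chapter-01.md']
```

Note that in fnmatch-style patterns, `*` also matches path separators, so "*.md" matches files in subdirectories as well.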

Style checks

make check
# or with custom path
make check MANUSCRIPT=chapters/
# or directly
poetry run ms-check manuscript/ --exclude 'drafts/*'

For each file, the output lists the rule name, violation message and line number of every violation. The exit code is 1 if any violations are found.

Sanitization

# Preview changes without writing
make sanitize-dry

# Apply changes with backup
make sanitize-backup

# Apply changes in-place (no backup)
make sanitize
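The three modes above differ only in what happens once a change is detected. A minimal sketch of that file-level logic, assuming a hypothetical apply_to_file helper and a backup name formed by appending .bak to the full file name (the package may name backups differently):

```python
import shutil
from pathlib import Path


def apply_to_file(path: Path, transform, *, dry_run=False, backup=False) -> bool:
    """Hypothetical wrapper; returns True if the file content would change."""
    # Read and write bytes so Python does not translate line endings itself.
    original = path.read_bytes().decode("utf-8")
    cleaned = transform(original)
    if cleaned == original:
        return False
    if dry_run:
        return True  # report the change, write nothing
    if backup:
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_bytes(cleaned.encode("utf-8"))
    return True
```

Here transform is any string-to-string function, for example one that normalizes line endings.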

Text metrics

make metrics

Output:

chapter-01.md                     3,412 words   187 lines
chapter-02.md                     2,891 words   154 lines
----------------------------------------
Total                             6,303 words   341 lines    38,219 chars

Combined validation

# Runs sanitize dry-run followed by style check
make validate

Writing custom rules

A rule is any callable with the signature (text: str, path: Path) -> list[StyleViolation].

from pathlib import Path
from manuscript_tools.checker import check_file
from manuscript_tools.models import StyleViolation


def rule_no_todos(text: str, path: Path) -> list[StyleViolation]:
    return [
        StyleViolation(file=path, rule="no-todos", message="TODO found", line=i)
        for i, line in enumerate(text.splitlines(), start=1)
        if "TODO" in line
    ]


report = check_file(Path("chapter.md"), rules=[rule_no_todos])
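Because a rule is just a callable, closures work too, which allows parameterized rules. The sketch below uses a stand-in StyleViolation dataclass with the same fields as in the example above so that it runs without the package installed; the forbid_phrase factory is a hypothetical name, not part of the package.

```python
from dataclasses import dataclass
from pathlib import Path


# Stand-in for illustration only. In a real project, import StyleViolation
# from manuscript_tools.models instead.
@dataclass(frozen=True)
class StyleViolation:
    file: Path
    rule: str
    message: str
    line: int


def forbid_phrase(phrase: str):
    """Hypothetical rule factory: returns a rule flagging lines that contain phrase."""

    def rule(text: str, path: Path) -> list[StyleViolation]:
        return [
            StyleViolation(file=path, rule="forbid-phrase", message=f"{phrase!r} found", line=i)
            for i, line in enumerate(text.splitlines(), start=1)
            if phrase in line
        ]

    return rule


rule_no_fixme = forbid_phrase("FIXME")
violations = rule_no_fixme("Intro\nFIXME: rewrite\n", Path("chapter.md"))
print([v.line for v in violations])  # [2]
```

Calling a rule directly on a string, as in the last lines, is also a convenient way to unit-test it.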

Development

# Install with dev dependencies
make install-dev

# Run tests
make test

# Run tests verbose
make test-v

# Lint
make lint

# Auto-fix lint issues
make lint-fix

# Format code
make format

# Full CI pipeline (lint + format check + tests)
make ci

Project structure

src/manuscript_tools/
    __init__.py
    models.py       # Data classes (StyleViolation, FileReport, SanitizeResult, ...)
    io.py           # File discovery and reading
    checker.py      # Style validation with pluggable rules
    sanitizer.py    # Text sanitization (pure logic + file-level operation)
    metrics.py      # Word counting and text statistics
    cli.py          # CLI entry points (ms-check, ms-sanitize, ms-metrics)
tests/
    test_checker.py
    test_sanitizer.py
    test_metrics.py
Makefile            # All tasks in one place
pyproject.toml      # Poetry config, dependencies, tool settings

Makefile targets

Run make or make help for a complete list:

Target            Description
install           Install project with all dependencies
install-dev       Install with dev dependencies
check             Run style checks on manuscript
sanitize          Sanitize manuscript files in-place
sanitize-dry      Sanitize dry-run (preview only)
sanitize-backup   Sanitize with .bak backup files
metrics           Show word counts and text metrics
validate          Full validation pipeline (sanitize dry-run + check)
test              Run all tests
ci                Full CI pipeline (lint + format check + tests)
clean             Remove build artifacts and caches
build             Build distribution package

All manuscript targets accept MANUSCRIPT=path, INCLUDE=pattern and EXCLUDE=pattern variables.

License

MIT

Download files

Source Distribution

manuscript_tools-0.1.0.tar.gz (7.3 kB view details)

Uploaded Source

Built Distribution

manuscript_tools-0.1.0-py3-none-any.whl (10.3 kB view details)

Uploaded Python 3

File details

Details for the file manuscript_tools-0.1.0.tar.gz.

File metadata

  • Download URL: manuscript_tools-0.1.0.tar.gz
  • Upload date:
  • Size: 7.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.11.13 Linux/6.8.0-101-lowlatency

File hashes

Hashes for manuscript_tools-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        74961dbb3b90a1b40de5264020cb39c0457eb724669b0a8dfd33213b64cab5ef
MD5           a34cf7659cbd51f4f64c52513cd16914
BLAKE2b-256   480ac707b66546d716301870e738ec5ee14c07824c1e60e922ca0e7a17c6d411

File details

Details for the file manuscript_tools-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: manuscript_tools-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 10.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.11.13 Linux/6.8.0-101-lowlatency

File hashes

Hashes for manuscript_tools-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        e5ac629f25e886b1c762352ca5cedf8e4371b3c8bbc3c0ee7ce6759a46335dfd
MD5           5589a24bc15d54238563c8f7234954e7
BLAKE2b-256   051096e4246fd88ad72a3de9ff3bd4b6aafd39c5b2c557c21537eea6e3cd6379
