
HTML page splitter (preserves tags), text page splitter (natural breaks), and chapter detector for pure text.

Project description


pagesmith

Split HTML into pages while preserving HTML tags and respecting the original document structure. Parsing is done with the fast lxml parser.
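To show what "preserving HTML tags" across a page break involves, here is a minimal sketch. It is not pagesmith's implementation or API: it swaps lxml for the standard-library html.parser so it runs without dependencies, and the names split_html and TagPreservingSplitter are hypothetical. When a page fills up, every tag still open is closed at the end of that page and reopened at the start of the next, so each page is well-formed HTML on its own.

```python
from html import escape
from html.parser import HTMLParser
import re

# Elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagPreservingSplitter(HTMLParser):
    """Hypothetical helper: builds pages of roughly target_len visible characters.

    Comments, doctypes and processing instructions are simply dropped.
    """

    def __init__(self, target_len: int) -> None:
        super().__init__(convert_charrefs=True)
        self.target_len = target_len
        self.pages: list[str] = []
        self.current: list[str] = []                 # HTML fragments of the page being built
        self.open_tags: list[tuple[str, str]] = []   # (tag name, rendered start tag)
        self.text_len = 0                            # visible characters on the current page

    def handle_starttag(self, tag, attrs):
        rendered = "<" + tag + "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        ) + ">"
        self.current.append(rendered)
        if tag not in VOID_TAGS:
            self.open_tags.append((tag, rendered))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Find the matching open tag and close anything still open inside it.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                for inner, _ in reversed(self.open_tags[i:]):
                    self.current.append(f"</{inner}>")
                del self.open_tags[i:]
                return
        # A stray end tag with no matching start tag is dropped.

    def handle_data(self, data):
        # Split into words and whitespace runs; only ever break before a word.
        for piece in re.split(r"(\s+)", data):
            if not piece:
                continue
            if (not piece.isspace() and self.text_len > 0
                    and self.text_len + len(piece) > self.target_len):
                self.break_page()
            self.current.append(escape(piece, quote=False))
            self.text_len += len(piece)

    def break_page(self):
        # Close open tags innermost first, then reopen them on the next page.
        closing = "".join(f"</{tag}>" for tag, _ in reversed(self.open_tags))
        self.pages.append("".join(self.current) + closing)
        self.current = [rendered for _, rendered in self.open_tags]
        self.text_len = 0


def split_html(html: str, target_len: int = 1000) -> list[str]:
    splitter = TagPreservingSplitter(target_len)
    splitter.feed(html)
    splitter.close()
    # Keep the last page if it has text, or if the input produced no pages at all.
    if splitter.text_len > 0 or not splitter.pages:
        splitter.pages.append("".join(splitter.current))
    return splitter.pages
```

For example, split_html("<div><b>one two three</b></div>", 7) returns two pages, "<div><b>one two </b></div>" and "<div><b>three</b></div>": the bold and div tags are closed before the break and reopened after it.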

Split pure text into pages at natural break points such as paragraphs or sentences.
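A minimal sketch of splitting at natural break points, assuming a simple preference order: a blank line (paragraph), then a sentence end, then any whitespace, with a hard cut only as a last resort. The names split_text and find_break and the "ignore breaks in the first half of the window" rule are illustrative assumptions, not pagesmith's API or algorithm.

```python
import re

# Candidate break points, from most to least preferred:
# a blank line (paragraph), the end of a sentence, any whitespace.
BREAK_PATTERNS = (
    r"\n[ \t]*\n",
    r"[.!?][\"')\]]*\s",
    r"\s",
)


def find_break(window: str) -> int:
    """Return the offset just past the best break in window, or 0 if none.

    Breaks in the first half of the window are ignored so pages do not
    come out much shorter than the target.
    """
    min_pos = len(window) // 2
    for pattern in BREAK_PATTERNS:
        ends = [m.end() for m in re.finditer(pattern, window) if m.end() >= min_pos]
        if ends:
            return ends[-1]
    return 0


def split_text(text: str, target_len: int = 1000) -> list[str]:
    pages = []
    start = 0
    while len(text) - start > target_len:
        window = text[start:start + target_len]
        # Fall back to a hard cut when the window has no usable break.
        cut = find_break(window) or target_len
        pages.append(text[start:start + cut])
        start += cut
    pages.append(text[start:])
    # Trim whitespace left at page edges and drop empty pages.
    return [page.strip() for page in pages if page.strip()]
```

With a 30-character target, "First para here.\n\nSecond para is a bit longer. It has two sentences." splits first at the paragraph break and then at the sentence end, giving three pages. Stripping page edges means the pages cannot be joined back into the exact original text; a real splitter would likely keep offsets instead.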

Detect chapters in pure text to create a Table of Contents.
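One simple way to detect chapters is to look for heading lines such as "Chapter 1" or "CHAPTER II: The Storm" and record where each starts, which is enough to build a Table of Contents. The sketch below is a heuristic of my own, assuming English headings with Arabic or Roman numerals; detect_chapters and Chapter are hypothetical names, and pagesmith's detector may use different rules.

```python
import re
from typing import NamedTuple


class Chapter(NamedTuple):
    title: str
    position: int  # character offset of the heading in the text


# A heading line: "Chapter" followed by an Arabic or Roman numeral,
# optionally followed by a title, e.g. "CHAPTER II: The Storm".
CHAPTER_RE = re.compile(
    r"^[ \t]*(chapter[ \t]+(?:\d+|[ivxlcdm]+)\b[^\n]*)$",
    re.IGNORECASE | re.MULTILINE,
)


def detect_chapters(text: str, max_title_len: int = 80) -> list[Chapter]:
    chapters = []
    for match in CHAPTER_RE.finditer(text):
        title = match.group(1).strip()
        # Long lines are more likely prose that happens to start with "Chapter".
        if len(title) <= max_title_len:
            chapters.append(Chapter(title, match.start(1)))
    return chapters
```

Because the pattern is anchored to the start of a line, a mention such as "See chapter 3 for details." in the middle of a sentence is not treated as a heading.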

Documentation

Pagesmith

Developers

Do not forget to run:

. ./activate.sh

For development, you need uv installed.

Use pre-commit hooks for code quality:

pre-commit install

Allure test report

Scripts

Install invoke preferably with pipx:

pipx install invoke

For a list of available scripts run:

invoke --list

For more information about a script run:

invoke <script> --help

Coverage report

Created with cookiecutter using template



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pagesmith-2.1.1.tar.gz (142.5 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pagesmith-2.1.1-py3-none-any.whl (19.0 kB)

Uploaded Python 3

File details

Details for the file pagesmith-2.1.1.tar.gz.

File metadata

  • Download URL: pagesmith-2.1.1.tar.gz
  • Upload date:
  • Size: 142.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for pagesmith-2.1.1.tar.gz
Algorithm Hash digest
SHA256 4b414b39b981af46775634e85793cd31dbdba5ed247930db566ae164e6c403ba
MD5 662e0c386cf0eccb0ca343b560fc4b16
BLAKE2b-256 b414035456dd760ce4b647796b5564cc47999b995bd87143ac790af61dd1e7ed
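To check a downloaded file against the SHA256 digest listed above, a short standard-library sketch is enough. The helper names sha256_of and verify are illustrative, and the file path is whatever location you saved the download to; the same check applies to the wheel with its own digest.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 published above for pagesmith-2.1.1.tar.gz.
EXPECTED_SDIST_SHA256 = "4b414b39b981af46775634e85793cd31dbdba5ed247930db566ae164e6c403ba"


def verify(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    # Compare case-insensitively, since digests are sometimes shown in upper case.
    return sha256_of(path) == expected.strip().lower()
```

pip can also enforce this itself in hash-checking mode, where each requirements line carries one --hash=sha256:... option per file pip might download.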


File details

Details for the file pagesmith-2.1.1-py3-none-any.whl.

File metadata

  • Download URL: pagesmith-2.1.1-py3-none-any.whl
  • Upload date:
  • Size: 19.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for pagesmith-2.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 92132abe48ac18234653dc41fd5c516b567fc0e6f9a7d87aa9bef8b357c4e1ca
MD5 11dfdccd6a0b652a47c1e5eb5614cfc2
BLAKE2b-256 932d69ef65414ba26deaf46eb2b754aa4ebcfae1461626e12f05108879f9e40c

