Split Markdown files at headings

mdsplit

mdsplit is a Python command-line tool that splits Markdown files into chapters at a given heading level.

Each chapter (or subchapter) is written to its own file, which is named after the heading title. These files are written to subdirectories representing the document's structure.
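
The mapping from headings to output paths can be sketched as follows. This is a minimal illustration, not mdsplit's actual code: the helper name `chapter_path` is hypothetical, and any sanitizing of characters that are invalid in file names is left out.

```python
from pathlib import Path


def chapter_path(out_dir, heading_titles):
    """Map a chain of heading titles (outermost first) to an output path.

    Hypothetical helper: parent headings become directories and the
    innermost heading becomes the file name with a .md suffix.
    """
    *parents, title = heading_titles
    return Path(out_dir).joinpath(*parents, title + ".md")


# A level 1 chapter lands directly in the output folder.
top_level = chapter_path("out", ["Heading 1"]).as_posix()
# A level 2 subchapter lands in a folder named after its parent chapter.
nested = chapter_path("out", ["HeadingTwo", "Heading 2.1"]).as_posix()

print(top_level)  # out/Heading 1.md
print(nested)     # out/HeadingTwo/Heading 2.1.md
```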

Optionally you can create:

  • table of contents (toc.md) for each input file
  • navigation footers (links to table of contents, previous page, next page)

Note:

  • Code blocks (```) are detected, and headings inside them are ignored
  • The output is guaranteed to be identical to the input (except for the separation into multiple files, of course)
    • This means no touching of whitespace and no changing - to * in your lists, as some visual Markdown editors tend to do
  • Text before the first heading is written to a file with the same name as the Markdown file
  • Chapters with the same heading name are written to the same file
  • Reading from stdin is supported
  • Can easily handle large files, e.g. a 1 GB file is split into 30k files in 35 seconds on a 2015 Thinkpad (with an SSD)
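
To illustrate the first three notes, here is a rough sketch of splitting at headings while skipping code blocks and leaving every line untouched. It is an assumption-laden illustration, not the tool's implementation: `split_chapters` and the `HEADING` pattern are hypothetical names, only `#`-style headings and backtick fences are recognized, and chapters are kept in memory instead of being written to files.

```python
import re

# Hypothetical pattern: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_chapters(lines, max_level=1):
    """Group lines into (title, lines) chapters at headings up to max_level.

    The first chapter has the title None and collects any text that
    appears before the first heading.
    """
    chapters = [(None, [])]
    in_fence = False
    for line in lines:
        # A line starting with ``` opens or closes a code block.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match and len(match.group(1)) <= max_level:
            chapters.append((match.group(2), []))
        # Lines are stored unmodified, so the input can be reproduced exactly.
        chapters[-1][1].append(line)
    return chapters


text = "intro\n# A\ntext\n```\n# not a heading\n```\n# B\nmore\n"
chapters = split_chapters(text.splitlines(keepends=True))
titles = [title for title, _ in chapters]
rejoined = "".join("".join(body) for _, body in chapters)

print(titles)            # [None, 'A', 'B']
print(rejoined == text)  # True
```

Because the lines are never rewritten, concatenating the chapters in order gives back the original text byte for byte, which is the property the second note describes.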

Limitations:

Command-line arguments (output of mdsplit --help):

positional arguments:
  input                 path to input file/folder (omit or set to '-' to read from stdin)

options:
  -h, --help            show this help message and exit
  -e ENCODING, --encoding ENCODING
                        force a specific encoding, default: python's default platform encoding
  -l {1,2,3,4,5,6}, --max-level {1,2,3,4,5,6}
                        maximum heading level to split, default: 1
  -t, --table-of-contents
                        generate a table of contents (one 'toc.md' per input file)
  -n, --navigation      add a navigation footer on each page (links to toc, previous page, next page)
  -o OUTPUT, --output OUTPUT
                        path to output folder (must not exist)
  -f, --force           write into output folder even if it already exists
  -v, --verbose

Similar projects:

You may also be interested in https://github.com/alandefreitas/mdsplit (C++-based).

Installation

Either use pip:

pip install mdsplit
mdsplit

Or simply download mdsplit.py and run it (it has no dependencies other than Python itself):

python3 mdsplit.py

Usage

Show documentation and supported arguments:

mdsplit --help

Split a file at level 1 headings, e.g. # This Heading, and write results to an output folder based on the input name:

mdsplit in.md

%%{init: {'themeVariables': { 'fontFamily': 'Monospace', 'text-align': 'left'}}}%%
flowchart LR
    subgraph in.md
        SRC[# Heading 1<br>lorem ipsum<br><br># HeadingTwo<br>dolor sit amet<br><br>## Heading 2.1<br>consetetur sadipscing elitr]
    end
    SRC --> MDSPLIT(mdsplit in.md)
    MDSPLIT --> SPLIT_A
    MDSPLIT --> SPLIT_B
    subgraph in/HeadingTwo.md
        SPLIT_B[# HeadingTwo<br>dolor sit amet<br><br>## Heading 2.1<br>consetetur sadipscing elitr]
    end
    subgraph in/Heading 1.md
        SPLIT_A[# Heading 1<br>lorem ipsum<br><br>]
    end
    style SRC text-align:left
    style SPLIT_A text-align:left
    style SPLIT_B text-align:left
    style MDSPLIT fill:#000,color:#0F0

Split a file at heading levels 1 and 2, e.g. # This Heading and ## That Heading, and write to a specific output directory:

mdsplit in.md --max-level 2 --output out

%%{init: {'themeVariables': { 'fontFamily': 'Monospace', 'text-align': 'left'}}}%%
flowchart LR
    subgraph in.md
        SRC[# Heading 1<br>lorem ipsum<br><br># HeadingTwo<br>dolor sit amet<br><br>## Heading 2.1<br>consetetur sadipscing elitr]
    end
    SRC --> MDSPLIT(mdsplit in.md -l 2 -o out)
    subgraph out/HeadingTwo/Heading 2.1.md
        SPLIT_C[## Heading 2.1<br>consetetur sadipscing elitr]
    end
    subgraph out/HeadingTwo.md
        SPLIT_B[# HeadingTwo<br>dolor sit amet<br><br>]
    end
    subgraph out/Heading 1.md
        SPLIT_A[# Heading 1<br>lorem ipsum<br><br>]
    end
    MDSPLIT --> SPLIT_A
    MDSPLIT --> SPLIT_B
    MDSPLIT --> SPLIT_C
    style SRC text-align:left
    style SPLIT_A text-align:left
    style SPLIT_B text-align:left
    style SPLIT_C text-align:left
    style MDSPLIT fill:#000,color:#0F0

Split Markdown from stdin:

cat in.md | mdsplit --output out
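
The convention that an omitted input or '-' means stdin can be sketched in a few lines of Python. This is only an illustration of the convention, not mdsplit's own code: `read_input` is a hypothetical helper, and the demonstration swaps in an in-memory stream in place of a real pipe.

```python
import io
import sys


def read_input(path=None, encoding=None):
    """Return Markdown text from a file, or from stdin if path is None or '-'.

    With encoding=None, open() falls back to the platform default encoding.
    """
    if path is None or path == "-":
        return sys.stdin.read()
    with open(path, encoding=encoding) as handle:
        return handle.read()


# Demonstration: simulate piped input by temporarily replacing sys.stdin.
original_stdin = sys.stdin
sys.stdin = io.StringIO("# Heading 1\nlorem ipsum\n")
try:
    text = read_input("-")
finally:
    sys.stdin = original_stdin

print(repr(text))  # '# Heading 1\nlorem ipsum\n'
```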

Development (Ubuntu 24.04)

Add the deadsnakes PPA and install additional Python versions for testing

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install python3.9-distutils python3.9-venv
...

Install poetry

Prepare virtual environment and download dependencies

poetry install

Run tests (for the default Python version)

poetry run pytest

Run tests for all supported Python versions

poetry run tox

Release new version

poetry build
poetry publish


Download files

Source Distribution

mdsplit-0.5.0.tar.gz (6.9 kB)

Uploaded Source

Built Distribution

mdsplit-0.5.0-py3-none-any.whl (8.1 kB)

Uploaded Python 3

File details

Details for the file mdsplit-0.5.0.tar.gz.

File metadata

  • Download URL: mdsplit-0.5.0.tar.gz
  • Upload date:
  • Size: 6.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.12.3 Linux/6.8.0-47-generic

File hashes

Hashes for mdsplit-0.5.0.tar.gz
Algorithm Hash digest
SHA256 81062448e549645052828f265e1dd358d8da764a3318106fc208b363d4c6a380
MD5 cbcf0db482f8c7a8191a41003edba4f9
BLAKE2b-256 7d6e17e6015c588fdc2bf30332d856325eb433b384b01e9e3f59ef53596bc46b

File details

Details for the file mdsplit-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: mdsplit-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 8.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.12.3 Linux/6.8.0-47-generic

File hashes

Hashes for mdsplit-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5d83abbd8e43b408a9c5f8d5a71c8329cbbec2d6614ac729b6edcf5d179ec137
MD5 83acf0cd59a6cdaeb30a3fd8b8788de7
BLAKE2b-256 072cc8c4e2280984ceed3e435e46c2ae8e9aaebcc190b0cd07af62971e832fbb
