Break down docs, build up knowledge.
Break down your docs. Build up your knowledge.
A Markdown text splitter for modular docs and maximum flexibility.
What is SplitmeAI?
SplitmeAI is a Python module for managing large Markdown files, particularly when creating and maintaining structured static documentation sites built with tools such as MkDocs.
Key Features:
- Section Splitting: Breaks down large Markdown files into smaller, manageable sections based on specified heading levels.
- Hierarchy Preservation: Maintains parent heading context within each split file.
- Filename Sanitization: Generates clean, unique filenames for each section, ensuring compatibility and readability.
- Reference Link Management: Extracts and appends reference-style links used within each section.
- Reference Link Conversion: Converts all inline links to reference-style links for improved readability and maintainability.
- Link Validation: Checks every link within a Markdown file and reports whether it is valid or broken.
- Thematic Break Handling: Recognizes and handles thematic breaks (---, ***, ___) for intelligent content segmentation.
- MkDocs Integration: Automatically generates an mkdocs.yml configuration file based on the split sections.
- CLI Support: Provides a user-friendly Command-Line Interface for seamless operation.
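To make the first two features concrete, here is a minimal sketch of heading-based splitting that also records parent headings. It is an illustration only, not SplitmeAI's implementation: the function name `split_by_heading`, the dictionary layout, and the choice to skip text that appears before the first target-level heading are all assumptions made for this example.

```python
import re

# Hypothetical helper names; none of this is SplitmeAI's real API.
HEADING_RE = re.compile(r"^(#{1,6})[ \t]+(.*\S)[ \t]*$")
FENCE = "`" * 3  # opening/closing marker of a fenced code block


def split_by_heading(text: str, level: int = 2) -> list[dict]:
    """Split Markdown text into sections that start at headings of `level`.

    Each section stores the titles of its parent headings so the context
    survives once the section is written to its own file.
    """
    sections: list[dict] = []
    parents: dict[int, str] = {}  # heading depth -> latest title at that depth
    current = None
    in_fence = False

    for line in text.splitlines():
        # Lines starting with '#' inside fenced code are not headings.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)

        if match:
            depth, title = len(match.group(1)), match.group(2)
            if depth < level:
                # A shallower heading closes the current section and
                # becomes parent context for the sections that follow.
                parents = {d: t for d, t in parents.items() if d < depth}
                parents[depth] = title
                current = None
                continue
            if depth == level:
                current = {
                    "title": title,
                    "parents": [parents[d] for d in sorted(parents)],
                    "lines": [line],
                }
                sections.append(current)
                continue
        # Body text and deeper headings stay with the open section;
        # text outside any section is skipped in this sketch.
        if current is not None:
            current["lines"].append(line)
    return sections
```

For a document with a top-level `# Guide` heading followed by `## Install` and `## Usage`, this returns two sections, each carrying `["Guide"]` as its parent context.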
Quick Start
Installation
Install from PyPI using one of the package managers below.
pip
Use pip (recommended for most users):
pip install -U splitme-ai
pipx
Install in an isolated environment with pipx:
pipx install splitme-ai
uv
For the fastest installation, use uv:
uv tool install splitme-ai
Usage
Using the CLI
Let's take a look at some examples of how to use the splitme-ai CLI.
Splitting a Markdown File
Example 1: Split a Markdown file on heading level 2 (default setting):
splitme-ai \
--split.i docs/examples/data/README-AI.md \
--split.settings.o docs/examples/output-h2
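Each section written to the output directory needs a clean, unique filename (see Filename Sanitization under Key Features). The sketch below shows one plausible way to derive such names from heading titles. The function `sanitize_filename`, the hyphenated lowercase style, the numeric suffix for duplicates, and the `section` fallback are assumptions for illustration; SplitmeAI's actual naming rules may differ.

```python
import re
import unicodedata


def sanitize_filename(title: str, used: set[str]) -> str:
    """Turn a heading title into a lowercase, hyphenated, unique .md filename.

    `used` collects the slugs handed out so far so duplicates get a suffix.
    """
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-") or "section"

    candidate, counter = slug, 1
    while candidate in used:
        counter += 1
        candidate = f"{slug}-{counter}"
    used.add(candidate)
    return f"{candidate}.md"
```

Passing the same `used` set for every section in a run keeps two headings with the same title from overwriting each other.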
Example 2: Split on heading level 2 and generate an mkdocs.yml configuration file:
splitme-ai \
--split.i docs/examples/data/README-AI.md \
--split.settings.o docs/examples/output-h2 \
--split.settings.mkdocs
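For readers unfamiliar with MkDocs, the following sketch shows roughly what a generated configuration could contain: a `site_name` and a `nav` list with one entry per split file. The function `build_mkdocs_config` and the page list are hypothetical, and the file SplitmeAI actually writes may include other settings.

```python
import json


def build_mkdocs_config(site_name: str, pages: list[tuple[str, str]]) -> str:
    """Return a minimal mkdocs.yml body with one nav entry per split file.

    `pages` holds (title, relative_path) pairs in display order.
    json.dumps is used for quoting because a JSON string is also a valid
    YAML double-quoted scalar, which keeps titles with colons safe.
    """
    lines = [f"site_name: {json.dumps(site_name)}", "nav:"]
    for title, path in pages:
        lines.append(f"  - {json.dumps(title)}: {json.dumps(path)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_mkdocs_config("My Docs", [("Quick Start", "quick-start.md")]))
```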
Example 3: Split on heading level 3:
splitme-ai \
--split.i docs/examples/data/README-AI.md \
--split.settings.o docs/examples/output-h3 \
--split.settings.hl "###"
Example 4: Split on heading level 4:
splitme-ai \
--split.i docs/examples/data/README-AI.md \
--split.settings.o docs/examples/output-h4 \
--split.settings.hl "####"
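The heading-level setting is given as a marker string such as "###". As a rough sketch of how a marker like that can be turned into a matcher for headings of exactly that level, consider the following. The helper `heading_pattern` is hypothetical and says nothing about how SplitmeAI parses the option internally.

```python
import re


def heading_pattern(marker: str) -> re.Pattern:
    """Compile a regex matching headings of exactly the level in `marker`."""
    if not re.fullmatch(r"#{1,6}", marker):
        raise ValueError(f"expected 1 to 6 '#' characters, got {marker!r}")
    # (?!#) rejects deeper headings: "###" must not match "#### Title".
    return re.compile(
        rf"^{re.escape(marker)}(?!#)[ \t]+(.+?)[ \t]*$", re.MULTILINE
    )


if __name__ == "__main__":
    text = "## A\n### B\n#### C\n### D\n"
    print(heading_pattern("###").findall(text))
```

Note that this simple pattern does not skip headings inside fenced code blocks; a real splitter has to track fences as well.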
Converting Reference Links
Example 5: Convert inline links to reference-style links:
splitme-ai --reflinks.i tests/data/pydantic.md --reflinks.o with_reflinks.md
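To illustrate what converting inline links to reference-style links means, here is a small sketch. It numbers each distinct URL and appends the definitions at the end of the text. The function `to_reference_links` and the numeric labels are assumptions for this example; the labels and layout SplitmeAI produces may differ, and this regex deliberately ignores images, links with titles, and URLs containing parentheses.

```python
import re

# Inline links only: the lookbehind skips images such as ![alt](src).
INLINE_LINK_RE = re.compile(r"(?<!!)\[([^\]]+)\]\((\S+?)\)")


def to_reference_links(text: str) -> str:
    """Rewrite [label](url) as [label][n] and append numbered definitions."""
    numbers: dict[str, int] = {}  # url -> reference number, in first-seen order

    def replace(match: re.Match) -> str:
        label, url = match.group(1), match.group(2)
        # Reuse the number if this URL was already seen.
        number = numbers.setdefault(url, len(numbers) + 1)
        return f"[{label}][{number}]"

    body = INLINE_LINK_RE.sub(replace, text)
    if not numbers:
        return text  # nothing to convert
    definitions = "\n".join(f"[{n}]: {url}" for url, n in numbers.items())
    return f"{body.rstrip()}\n\n{definitions}\n"
```

Because repeated URLs share one definition, a long document ends up with a single, deduplicated link list at the bottom.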
Validating Links
Example 6: Validate all links in a Markdown file:
splitme-ai --validate-links.i tests/data/pydantic.md
The output reports whether each link is valid or broken.
Scanning markdown file tests/data/pydantic.md for broken links...
Markdown Link Check Results:
--------------------------------------------------------------------------------
✓ Line 2: [
✓ Line 3: [
✓ Line 4: [
✓ Line 5: [
✓ Line 6: [
✓ Line 7: [
✓ Line 8: [
✓ Line 9: [
✓ Line 18: [Learn more](https://pydantic.dev/articles/logfire-announcement)
✓ Line 24: [pydantic V1.10 Documentation](https://docs.pydantic.dev/)
✓ Line 24: [`1.10.X-fixes` git branch](https://github.com/pydantic/pydantic/tree/1.10.X-fixes)
✓ Line 28: [documentation](https://docs.pydantic.dev/)
✓ Line 34: [Install](https://docs.pydantic.dev/install/)
Summary: 0 broken links out of 13 total links.
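A report of this shape can be produced by scanning each line for inline links and testing every URL. The sketch below mirrors the output format shown above but is not SplitmeAI's code: `check_links` is a hypothetical name, the ✗ mark for broken links is an assumption (the sample output only shows ✓), and the actual reachability test is passed in as a callable so the example stays offline.

```python
import re
from typing import Callable

LINK_RE = re.compile(r"\[[^\]]*\]\((\S+?)\)")


def check_links(text: str, is_alive: Callable[[str], bool]) -> list[str]:
    """Return one report line per inline link, followed by a summary line.

    `is_alive` decides whether a URL is reachable; in practice it might
    issue an HTTP request, here it is left to the caller.
    """
    report: list[str] = []
    broken = total = 0
    for number, line in enumerate(text.splitlines(), start=1):
        for match in LINK_RE.finditer(line):
            total += 1
            ok = is_alive(match.group(1))
            if not ok:
                broken += 1
            mark = "✓" if ok else "✗"  # ✗ is an assumed marker
            report.append(f"{mark} Line {number}: {match.group(0)}")
    report.append(f"Summary: {broken} broken links out of {total} total links.")
    return report


if __name__ == "__main__":
    sample = "intro\n[a](https://ok.example) and [b](https://bad.example)\n"
    # Stand-in check: treat any URL containing "bad" as broken.
    print("\n".join(check_links(sample, lambda url: "bad" not in url)))
```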
View the output of all examples above here.
[!NOTE] Explore the [Official Documentation][docs] for more detailed guides and examples.
Roadmap
- Implement reference link conversion and management.
- Enhance CLI usability and user experience.
- Integrate AI-powered content analysis and segmentation.
- Add robust chunking and splitting algorithms for LLM applications.
- Add support for additional static site generators.
- Add support for additional input and output formats.
Contributing
Contributions are welcome! For bug reports, feature requests, or questions, please open an issue or submit a pull request on GitHub.
License
Copyright © 2024-2025 splitme-ai.
Released under the MIT license.
File details
Details for the file splitme_ai-0.1.10.tar.gz.
File metadata
- Download URL: splitme_ai-0.1.10.tar.gz
- Upload date:
- Size: 152.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f8dbb1c533a2f93211c18548ecccc400c6517901c0a854d0cda9a4c53862f020 |
| MD5 | cbb0d8b9108864307dce1c58402310d7 |
| BLAKE2b-256 | 273d6b10880bea8f04770f1f9a1e840753147ec980872e2315228fb1b6bc2ad9 |
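If you download the archive manually, you can compare it against the published SHA256 digest before installing. The following is a generic sketch using Python's standard `hashlib` module; the function name `sha256_matches` is made up for this example, and the local file path is assumed to be wherever you saved the download.

```python
import hashlib


def sha256_matches(path: str, expected_hex: str, chunk_size: int = 65536) -> bool:
    """Return True if the file at `path` has the expected SHA256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


if __name__ == "__main__":
    expected = "f8dbb1c533a2f93211c18548ecccc400c6517901c0a854d0cda9a4c53862f020"
    # Assumes the sdist was saved to the current directory.
    print(sha256_matches("splitme_ai-0.1.10.tar.gz", expected))
```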
Provenance
The following attestation bundles were made for splitme_ai-0.1.10.tar.gz:
Publisher: ci.yml on eli64s/splitme-ai
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: splitme_ai-0.1.10.tar.gz
- Subject digest: f8dbb1c533a2f93211c18548ecccc400c6517901c0a854d0cda9a4c53862f020
- Sigstore transparency entry: 160665154
- Sigstore integration time:
- Permalink: eli64s/splitme-ai@4ad812214ca532f0acd6ecb8977270ccd853a106
- Branch / Tag: refs/tags/v0.1.10
- Owner: https://github.com/eli64s
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@4ad812214ca532f0acd6ecb8977270ccd853a106
- Trigger Event: push
File details
Details for the file splitme_ai-0.1.10-py3-none-any.whl.
File metadata
- Download URL: splitme_ai-0.1.10-py3-none-any.whl
- Upload date:
- Size: 24.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a3d01a5328c5b35dee07a0dbc6b2a64f275c84390273e10fc69de808ca49be1f |
| MD5 | 07fc7ecbf802a30d0a006762709f10ac |
| BLAKE2b-256 | a0bf2b82cbafa72a75fc882650b19a85620cb2bd89ba84250f2e5ff3ae7d0f30 |
Provenance
The following attestation bundles were made for splitme_ai-0.1.10-py3-none-any.whl:
Publisher: ci.yml on eli64s/splitme-ai
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: splitme_ai-0.1.10-py3-none-any.whl
- Subject digest: a3d01a5328c5b35dee07a0dbc6b2a64f275c84390273e10fc69de808ca49be1f
- Sigstore transparency entry: 160665157
- Sigstore integration time:
- Permalink: eli64s/splitme-ai@4ad812214ca532f0acd6ecb8977270ccd853a106
- Branch / Tag: refs/tags/v0.1.10
- Owner: https://github.com/eli64s
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@4ad812214ca532f0acd6ecb8977270ccd853a106
- Trigger Event: push