An original recursive character splitter with a simple local UI for markdown chunk zip files

Project description

RecursiveCharacterTextSplitter

A small, original, recursive character splitter with a simple local UI for taking a .zip of Markdown chunks and splitting those chunks into smaller Markdown files.
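The zip-in, zip-out flow described above can be sketched with the standard library alone. This is a minimal illustration, not the package's own code: the function name `resplit_zip`, the `_partNNN.md` naming scheme, and the manifest fields are all assumptions made for the example, and the splitter is passed in as a plain callable.

```python
import io
import json
import zipfile
from typing import Callable, List


def resplit_zip(zip_bytes: bytes, split: Callable[[str], List[str]]) -> bytes:
    """Re-split every .md file in a zip and return a new zip as bytes.

    Hypothetical helper: output names and manifest layout are assumptions.
    """
    manifest = []
    out_buffer = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as src, \
            zipfile.ZipFile(out_buffer, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in sorted(src.namelist()):
            if not name.lower().endswith(".md"):
                continue  # skip directories and non-Markdown entries
            text = src.read(name).decode("utf-8")
            stem = name[:-3]  # drop the ".md" suffix
            for index, chunk in enumerate(split(text), start=1):
                out_name = f"{stem}_part{index:03d}.md"
                dst.writestr(out_name, chunk)
                manifest.append(
                    {"source": name, "file": out_name, "chars": len(chunk)}
                )
        # One JSON manifest describing every chunk that was written.
        dst.writestr("manifest.json", json.dumps(manifest, indent=2))
    return out_buffer.getvalue()
```

Working on bytes in memory keeps the sketch easy to wire into an upload/download UI, since no temporary files are needed.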

What it does

  • Accepts a .zip produced by a Markdown header splitter, or any other .zip containing .md files
  • Recursively splits each Markdown file on natural separators, trying the coarsest first: blank lines, then line breaks, sentence endings, spaces, and finally raw character windows
  • Lets you choose chunk_size and chunk_overlap
  • Outputs a new downloadable .zip containing the re-split Markdown chunks and a manifest JSON file
  • Keeps the implementation lightweight and original instead of copying third-party splitter code
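The recursive strategy in the list above can be sketched roughly as follows. This is an illustrative reimplementation under stated assumptions, not the code in `core.py`: the separator list, the function names, and the way overlap is trimmed so that chunks never exceed `chunk_size` are all choices made for this example.

```python
from typing import List, Sequence

# Assumed separator order, coarsest first; the package's actual list is not shown.
DEFAULT_SEPARATORS = ("\n\n", "\n", ". ", " ")


def _split_keep_separator(text: str, sep: str) -> List[str]:
    """Split on sep, keeping the separator attached to the preceding piece."""
    parts = text.split(sep)
    pieces = [p + sep for p in parts[:-1]] + [parts[-1]]
    return [p for p in pieces if p]


def _pieces(text: str, chunk_size: int, separators: Sequence[str]) -> List[str]:
    """Break text into pieces of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text]
    if not separators:
        # Last resort: raw character windows.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    out: List[str] = []
    for piece in _split_keep_separator(text, separators[0]):
        # Pieces that are still too long are retried with finer separators.
        out.extend(_pieces(piece, chunk_size, separators[1:]))
    return out


def split_text(
    text: str,
    chunk_size: int = 1000,
    chunk_overlap: int = 100,
    separators: Sequence[str] = DEFAULT_SEPARATORS,
) -> List[str]:
    """Greedily merge small pieces into chunks of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")
    chunks: List[str] = []
    current = ""
    for piece in _pieces(text, chunk_size, tuple(separators)):
        if current and len(current) + len(piece) > chunk_size:
            chunks.append(current)
            # Carry the tail of the finished chunk forward as overlap,
            # trimmed so that the next piece still fits within chunk_size.
            keep = min(chunk_overlap, chunk_size - len(piece))
            current = current[-keep:] if keep > 0 else ""
        current += piece
    if current:
        chunks.append(current)
    return chunks
```

A call might look like `split_text(markdown_text, chunk_size=500, chunk_overlap=50)`. With `chunk_overlap=0` the chunks concatenate back to the original text, because separators stay attached to the piece they end.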

Install

pip install RecursiveCharacterTextSplitter

Run

RecursiveCharacterTextSplitter

This launches a local Streamlit app in your browser.
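One common way for a console command to start a Streamlit app is to run `streamlit run` in a subprocess. The sketch below shows that pattern only; whether `cli.py` works this way is an assumption, and the names `build_streamlit_command` and `main` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_streamlit_command(app_path: Path) -> List[str]:
    """Build the command that runs a Streamlit script with the current interpreter."""
    return [sys.executable, "-m", "streamlit", "run", str(app_path)]


def main() -> int:
    """Hypothetical entry point: launch app.py located next to this module."""
    app_path = Path(__file__).with_name("app.py")
    # Blocks until the Streamlit server exits and returns its exit code.
    return subprocess.call(build_streamlit_command(app_path))
```

Using `sys.executable -m streamlit` rather than a bare `streamlit` command keeps the launch inside the same environment the package was installed into.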

Notes

  • This package does not vendor or copy code from LangChain or any other third-party project.
  • The idea of recursive text splitting is common, but you should still do your own final name, trademark, licensing, and patent review before publishing publicly.
  • PyPI package-name availability can change over time, so confirm the final project name before upload.

File structure

RecursiveCharacterTextSplitter_pypi/
  README.md
  LICENSE
  pyproject.toml
  recursivecharactertextsplitter/
    __init__.py
    app.py
    cli.py
    core.py
  dist/
    recursivecharactertextsplitter-0.1.0.tar.gz
    recursivecharactertextsplitter-0.1.0-py3-none-any.whl

Download files

Source Distribution

recursivecharactertextsplitter-0.1.0.tar.gz (6.3 kB)

Built Distribution

File details

Details for the file recursivecharactertextsplitter-0.1.0.tar.gz.

File hashes

Hashes for recursivecharactertextsplitter-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       058f70db4e470798edd69b811e9f5a50cc2855794315d7c7d0b7a2951e0b78f9
MD5          6ac482ea5221da211a3192efc89512c7
BLAKE2b-256  506f882b6d8de860fba028b8ecc9b4a706e47ea7b5bdbb468f0dcb67b9c6be32
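A downloaded file can be checked against the SHA256 digest listed above using Python's `hashlib`. This is a minimal sketch; the helper names `sha256_of` and `matches_published_digest` are hypothetical, and the file path is whatever location you saved the download to.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], block_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_digest(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, the expected value would be the SHA256 row of the table, passed as a string together with the path to the downloaded `.tar.gz`.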

File details

Details for the file recursivecharactertextsplitter-0.1.0-py3-none-any.whl.

File hashes

Hashes for recursivecharactertextsplitter-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       f7ba67609f4918c9629a7fbe5023550452e2e718c794ee2b9f02b7fd8f8fa25c
MD5          e38f4ee446a3aba86b1c7ebf9d8362ed
BLAKE2b-256  56df0f5c18e7a4f4c2c94d41db07ff0704aaea002b03f49a514ec3841e9cbdb2
