A small package that reads, processes, and recursively splits text-based file types (such as .txt, .pdf, and .docx) into overlapping chunks of a predefined size.

Project description

simple_file_splitters

Overview:

This library provides a simple and efficient way to split documents into overlapping chunks based on specified separators. It currently supports .docx, .pdf, and .txt files. The resulting chunks can be used, for example, as input for creating embeddings.
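To illustrate what "overlapping chunks" means, here is a minimal sketch in plain Python. It is not this library's API, which this page does not show. The function name `overlapping_chunks` and its parameters are hypothetical, and the sketch uses a fixed-size character window rather than the separator-based recursive splitting the library performs.

```python
def overlapping_chunks(text, chunk_size, overlap):
    """Cut text into windows of chunk_size characters.

    Hypothetical helper for illustration only: each window starts
    (chunk_size - overlap) characters after the previous one, so
    consecutive windows share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(overlapping_chunks("abcdefghij", chunk_size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

A separator-aware splitter would instead prefer to cut at natural boundaries such as paragraph or line breaks, and fall back to smaller separators only when a piece is still too long.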

Installation:

This library requires the following packages:

  • pandas
  • langchain_text_splitters
  • langchain_community
  • docx2python
  • PyMuPDF

You can install this library and its dependencies with: pip install simple_file_splitters

Download files

Source Distribution

simple_file_splitters-0.0.1.tar.gz (9.3 kB view details)

Uploaded Source

Built Distribution

simple_file_splitters-0.0.1-py3-none-any.whl (10.1 kB view details)

Uploaded Python 3

File details

Details for the file simple_file_splitters-0.0.1.tar.gz.

File metadata

  • Download URL: simple_file_splitters-0.0.1.tar.gz
  • Size: 9.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.5

File hashes

Hashes for simple_file_splitters-0.0.1.tar.gz

  • SHA256: 651687d243f496fc4dc8e2936eed498affebde1e76f356f995b494451a9c0e3b
  • MD5: 1a184c2b1105b1458ceafbeccc2bc630
  • BLAKE2b-256: fa73107522e71803d9126920fb37465bc68ffdf404b460fc6e001f3fd5eaf8d3
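A downloaded file can be checked against the published SHA256 digest using Python's standard `hashlib` module. The following is a minimal sketch. The helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, block_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size blocks until read() returns empty bytes (end of file).
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the archive was saved in the current directory):
# matches_published_hash(
#     "simple_file_splitters-0.0.1.tar.gz",
#     "651687d243f496fc4dc8e2936eed498affebde1e76f356f995b494451a9c0e3b",
# )
```

If the function returns False, the file differs from the one that was uploaded and should not be installed.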

File details

Details for the file simple_file_splitters-0.0.1-py3-none-any.whl.

File hashes

Hashes for simple_file_splitters-0.0.1-py3-none-any.whl

  • SHA256: 8bdc0122a54a471b94c2b322c0258ef6a1fb5387ad0c4376f2b88212d0cd0694
  • MD5: 995725f07a60bde36ef2af7b22d2ff45
  • BLAKE2b-256: 71c1dd09e6baed5c2c3ea6b1828bb32be26d3323d02a07db91b1c13a8a5a8f42
