A small package to quickly read, process, and recursively split several text-based file types (such as .txt, .pdf, and .docx) into overlapping chunks of predefined size.

Project description

simple_file_splitters

Overview:

This library provides a simple and efficient way to split documents into overlapping chunks based on specified separators. It currently supports .docx, .pdf, and .txt files. The resulting chunks can be used, for example, as input for creating embeddings.
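
To illustrate what "overlapping chunks based on specified separators" means, here is a minimal pure-Python sketch. It is not this library's API: the function names `split_recursively` and `chunk_with_overlap`, their parameters, and the default separators are hypothetical and chosen only for illustration. The sketch tries coarse separators first, falls back to finer ones, and starts each new chunk with the last few characters of the previous one.

```python
from typing import List, Sequence


def split_recursively(text: str, max_len: int, separators: Sequence[str]) -> List[str]:
    """Break text into pieces of at most max_len characters, coarse separators first."""
    if len(text) <= max_len:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to a hard cut every max_len characters.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    head, rest = separators[0], separators[1:]
    pieces: List[str] = []
    for part in text.split(head):
        pieces.extend(split_recursively(part, max_len, rest))
    return pieces


def chunk_with_overlap(
    text: str,
    chunk_size: int = 200,
    overlap: int = 20,
    separators: Sequence[str] = ("\n\n", "\n", " "),
) -> List[str]:
    """Hypothetical helper: merge pieces into chunks that share `overlap` characters."""
    if not 0 <= overlap < chunk_size - 1:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size - 1")
    # Leave room for the overlap tail plus one joining space in every chunk.
    pieces = split_recursively(text, chunk_size - overlap - 1, separators)
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        chunks.append(current)
        # Start the next chunk with the tail of the one just finished.
        tail = current[-overlap:] if overlap else ""
        current = f"{tail} {piece}" if tail else piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "Alpha beta gamma delta.\n\nEpsilon zeta eta theta iota kappa lambda mu."
    for chunk in chunk_with_overlap(sample, chunk_size=30, overlap=8):
        print(repr(chunk))
```

Note that this sketch replaces the original separators with single spaces when merging pieces, which keeps the code short but does not preserve the exact whitespace of the input.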

Installation:

This library depends on the following packages:

  • pandas
  • langchain_text_splitters
  • langchain_community
  • docx2python
  • PyMuPDF

You can install this library and its dependencies with:

pip install simple_file_splitters
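
After installing, a quick way to confirm that the listed dependencies are importable is sketched below. The import names in `REQUIRED_MODULES` are assumptions: in particular, PyMuPDF is commonly imported as `fitz`, and the helper `missing_dependencies` is a hypothetical name, not part of this library.

```python
from importlib.util import find_spec
from typing import Dict, List

# Package name -> assumed top-level import name (PyMuPDF is commonly imported as "fitz").
REQUIRED_MODULES: Dict[str, str] = {
    "pandas": "pandas",
    "langchain_text_splitters": "langchain_text_splitters",
    "langchain_community": "langchain_community",
    "docx2python": "docx2python",
    "PyMuPDF": "fitz",
}


def missing_dependencies(modules: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the package names whose import module cannot be found."""
    return [package for package, module in modules.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```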

Download files

Source Distribution

simple_file_splitters-0.0.2.tar.gz (9.4 kB view details)

Uploaded Source

Built Distribution

simple_file_splitters-0.0.2-py3-none-any.whl (10.1 kB view details)

Uploaded Python 3

File details

Details for the file simple_file_splitters-0.0.2.tar.gz.

File metadata

  • Download URL: simple_file_splitters-0.0.2.tar.gz
  • Upload date:
  • Size: 9.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.5

File hashes

Hashes for simple_file_splitters-0.0.2.tar.gz

  Algorithm     Hash digest
  SHA256        ababf47784d24ac72e8e03929e9f4c16b1639677cff416af31865e7eacfcdb99
  MD5           efcb05fe68b3fa0d7ca03395b1e220f1
  BLAKE2b-256   e490f11745714fe4320deeb9b2cb3a2263773996af790d91ad94cae51853395d

File details

Details for the file simple_file_splitters-0.0.2-py3-none-any.whl.

File hashes

Hashes for simple_file_splitters-0.0.2-py3-none-any.whl

  Algorithm     Hash digest
  SHA256        a09a7d6e4d641ace974023b397b7d13631d594e9e1c21d6d959efa3139172122
  MD5           5423ca3aa46441994a1df68edee57be4
  BLAKE2b-256   f236423f6df3e38fec094a529ed9aaedd4905a6082b55568753b7bb61afa6686
