[DEV] Chunking components for the Sayou Data Platform

Sayou Chunking

Flexible text segmentation strategies for preprocessing large documents before embedding or RAG.


💡 Why Sayou Chunking?

sayou_chunking turns raw text into manageable, context-preserving chunks, a critical step for embedding pipelines and RAG.

  • Multiple Strategies: Fixed-length, recursive, semantic, structure-based.
  • LLM-Aware Chunking: Uses LLM cues to split logically.
  • Interoperable: Produces DataAtoms usable by Sayou Refinery or Assembler.
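To make the simplest of these strategies concrete, here is a minimal sketch of fixed-length chunking with overlap in plain Python. It does not use the sayou_chunking API, which this page does not document; the function name `fixed_length_chunks` and its `size` and `overlap` parameters are assumptions chosen for illustration.

```python
def fixed_length_chunks(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of at most `size` characters.

    Each window repeats the last `overlap` characters of the previous one,
    so context that straddles a boundary is not lost.
    Hypothetical helper, not part of sayou_chunking.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for piece in fixed_length_chunks("abcdefghij", size=4, overlap=1):
        print(piece)
```

Recursive, semantic, and structure-based strategies differ mainly in where they place boundaries (separators, meaning shifts, headings) rather than in this windowing loop.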

🚀 Quick Start

pip install sayou-chunking

🏗️ Core Concepts

  • Splitter: Defines the splitting strategy.
  • Plugins: Extend chunking logic for domain-specific use cases.
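One way the Splitter and Plugins concepts could fit together is sketched below. This is an assumption about the general pattern, not the package's actual interface: the `Splitter` protocol, `ParagraphSplitter`, `register`, and `chunk` are all hypothetical names introduced here.

```python
from typing import Callable, Protocol


class Splitter(Protocol):
    """Hypothetical interface: any object with a split() method that returns chunks."""

    def split(self, text: str) -> list[str]: ...


class ParagraphSplitter:
    """Example strategy: split on blank lines and drop empty pieces."""

    def split(self, text: str) -> list[str]:
        return [p.strip() for p in text.split("\n\n") if p.strip()]


# Hypothetical plugin registry: maps a strategy name to a splitter factory.
SPLITTERS: dict[str, Callable[[], Splitter]] = {}


def register(name: str, factory: Callable[[], Splitter]) -> None:
    """Make a splitter available under `name` (this is the 'plugin' hook)."""
    SPLITTERS[name] = factory


def chunk(text: str, strategy: str) -> list[str]:
    """Look up the named strategy and apply it to the text."""
    return SPLITTERS[strategy]().split(text)


register("paragraph", ParagraphSplitter)

if __name__ == "__main__":
    print(chunk("First paragraph.\n\nSecond paragraph.", "paragraph"))
```

Under this design, a domain-specific plugin only needs to implement `split()` and register itself; callers select it by name without changing their own code.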

📜 License

Apache 2.0 License © 2025 Sayouzone

Download files

Source Distribution

sayou_chunking_dev-0.0.1.tar.gz (12.6 kB)

Uploaded Source

Built Distribution

sayou_chunking_dev-0.0.1-py3-none-any.whl (16.7 kB)

Uploaded Python 3

File details

Details for the file sayou_chunking_dev-0.0.1.tar.gz.

File metadata

  • Download URL: sayou_chunking_dev-0.0.1.tar.gz
  • Upload date:
  • Size: 12.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sayou_chunking_dev-0.0.1.tar.gz
Algorithm Hash digest
SHA256 427f4802dc90cc0f83c2efe6c20b669a010d8c6904dcdc24ba4937e82a85539a
MD5 c41d78b2a5aee2a271cc693defa6b8a6
BLAKE2b-256 95d136fd28cb8b9c260bd5bf52a61643f8b172db993090c95bbdf3d89b3af3f2

File details

Details for the file sayou_chunking_dev-0.0.1-py3-none-any.whl.

File hashes

Hashes for sayou_chunking_dev-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e098e9d53280800e3693a8fdcfc25add1fed5ac68f3e243073c1fc106cdd1ccd
MD5 8a5af91cf26dc1e7818f39347f2e5dda
BLAKE2b-256 4ed9d6b4a8de2985ffc56cc869e88f31867e41bd6521cb5c8c5cee8e9458c377
