Chunking utilities for GraphRAG

GraphRAG Chunking

This package provides a collection of text chunkers, a core configuration model, and a factory for creating chunker instances.

Examples

Basic sentence chunking with nltk

The SentenceChunker class splits text into individual sentences by identifying sentence boundaries. It takes input text and returns a list where each element is a separate sentence, making it easy to process text at the sentence level.
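To make the idea of boundary-based splitting concrete, here is a minimal sketch that uses only the standard library. It is an assumption-laden stand-in, not the package's SentenceChunker: the function name split_sentences is hypothetical, and a naive regular expression replaces the nltk sentence tokenizer, so abbreviations such as "e.g." would be split incorrectly.

```python
import re

# Illustrative stand-in only: a naive regex splitter, NOT the package's
# nltk-based SentenceChunker. `split_sentences` is a hypothetical name.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' whenever whitespace follows."""
    parts = _SENTENCE_END.split(text.strip())
    # Drop the empty string produced by empty input.
    return [part for part in parts if part]


sentences = split_sentences(
    "GraphRAG builds a graph. It indexes text! Does it chunk first?"
)
for sentence in sentences:
    print(sentence)
```

Each list element is one sentence with its closing punctuation kept, which matches the "one element per sentence" shape described above.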

Open the notebook to explore the basic sentence chunking example code.

Token chunking

The TokenChunker splits text into fixed-size chunks based on token count rather than sentence boundaries. It uses a tokenizer to encode text into tokens, then creates chunks of a specified size with configurable overlap between chunks.
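The fixed-size window with overlap can be sketched as follows. This is not the TokenChunker implementation: chunk_tokens is a hypothetical helper, and whitespace splitting stands in for a real tokenizer's encode step. The sketch assumes the window advances by size minus overlap tokens each step.

```python
def chunk_tokens(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Group tokens into windows of `size`, sharing `overlap` tokens between neighbours.

    Hypothetical helper for illustration; not part of the package's API.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window advances each time
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window has reached the end of the input.
        if start + size >= len(tokens):
            break
    return chunks


# Whitespace splitting is an assumed stand-in for a real tokenizer.
tokens = "a b c d e f g h i j".split()
for chunk in chunk_tokens(tokens, size=4, overlap=1):
    print(chunk)
```

With size=4 and overlap=1, the last token of each chunk reappears as the first token of the next, so context at chunk boundaries is not lost.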

Open the notebook to explore the token chunking example code.

Using the factory via helper util

The create_chunker factory function provides a configuration-driven way to instantiate chunkers: it accepts a ChunkingConfig object that specifies the chunking strategy and its parameters. Separating chunker configuration from direct instantiation keeps calling code more flexible and easier to maintain.
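The configuration-driven pattern itself can be illustrated with a small self-contained sketch. Everything named here (SketchConfig, create_sketch_chunker, the strategy strings, and the field names) is hypothetical and chosen for illustration; the real ChunkingConfig fields and create_chunker signature may differ, so consult the package's notebook for actual usage.

```python
import re
from dataclasses import dataclass
from typing import Callable

# All names below are hypothetical; they mirror the pattern, not the package API.
Chunker = Callable[[str], list[str]]


@dataclass(frozen=True)
class SketchConfig:
    strategy: str = "tokens"  # assumed values: "tokens" or "sentence"
    size: int = 4             # window size, used by the "tokens" strategy
    overlap: int = 0          # shared tokens between neighbouring windows


def _sentence_chunker(config: SketchConfig) -> Chunker:
    def chunk(text: str) -> list[str]:
        # Naive regex split; a stand-in for a real sentence tokenizer.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return chunk


def _token_chunker(config: SketchConfig) -> Chunker:
    if config.size <= 0 or not 0 <= config.overlap < config.size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = config.size - config.overlap

    def chunk(text: str) -> list[str]:
        words = text.split()  # whitespace split stands in for a tokenizer
        out: list[str] = []
        for start in range(0, len(words), step):
            out.append(" ".join(words[start:start + config.size]))
            if start + config.size >= len(words):
                break
        return out
    return chunk


# Registry mapping a strategy name to the function that builds its chunker.
_REGISTRY: dict[str, Callable[[SketchConfig], Chunker]] = {
    "sentence": _sentence_chunker,
    "tokens": _token_chunker,
}


def create_sketch_chunker(config: SketchConfig) -> Chunker:
    """Look up the builder for config.strategy and return a ready chunker."""
    try:
        builder = _REGISTRY[config.strategy]
    except KeyError:
        raise ValueError(f"unknown strategy: {config.strategy!r}") from None
    return builder(config)


chunker = create_sketch_chunker(SketchConfig(strategy="tokens", size=3, overlap=1))
print(chunker("one two three four five"))
```

Because callers only build a config object, switching from token to sentence chunking is a one-field change rather than a change to the code that constructs the chunker.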

Open the notebook to explore the factory helper util example code.

Download files

Source Distribution

graphrag_chunking-3.0.5.tar.gz (7.2 kB)

Uploaded Source

Built Distribution

graphrag_chunking-3.0.5-py3-none-any.whl (9.7 kB)

Uploaded Python 3

File details

Details for the file graphrag_chunking-3.0.5.tar.gz.

File metadata

  • Download URL: graphrag_chunking-3.0.5.tar.gz
  • Size: 7.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.5

File hashes

Hashes for graphrag_chunking-3.0.5.tar.gz
Algorithm    Hash digest
SHA256       05cd876256ad6f2212158f11beb6d44f7e48f5cdaa567c3114582ff933f78b34
MD5          0a1e63cd4992dc2c83065c573ec7fb75
BLAKE2b-256  b9a35f4b2da9e6b03d3d33e4b175e61900d8b0504af39f16be2c60706b30cf50

File details

Details for the file graphrag_chunking-3.0.5-py3-none-any.whl.

File hashes

Hashes for graphrag_chunking-3.0.5-py3-none-any.whl
Algorithm    Hash digest
SHA256       ad820bc273b8e5b15017c37711ff98cdf838205c8c5f12887c677362b2036ecc
MD5          3940d0c1db76d19fb0e6b2ce7e668629
BLAKE2b-256  0c5bf6ea045792cae76babefe80ac0acbaea2d34554a131476b93f5a50aa62b1
