Chunking utilities for GraphRAG

GraphRAG Chunking

This package contains a collection of text chunkers, a core config model, and a factory for acquiring instances.

Examples

Basic sentence chunking with nltk

The SentenceChunker class splits text into individual sentences, using nltk to identify sentence boundaries. It takes input text and returns a list in which each element is one sentence.

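# Import path is an assumption; adjust it to the installed package layout.
from graphrag_chunking.sentence_chunker import SentenceChunker

chunker = SentenceChunker()
chunks = chunker.chunk("This is a test. Another sentence.")
print(chunks)  # ["This is a test.", "Another sentence."]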
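To make the idea of sentence-boundary splitting concrete, here is a minimal, dependency-free sketch. It is not the package's implementation: it swaps nltk for a simple regular expression that breaks after `.`, `!` or `?` followed by whitespace, so it will mis-split abbreviations such as "e.g." where nltk would not. The function name `split_sentences` is hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive regex stand-in for nltk sentence tokenization (illustration only)."""
    # Split on whitespace that directly follows sentence-ending punctuation.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings, which appear when the input is empty.
    return [part for part in parts if part]


sentences = split_sentences("This is a test. Another sentence.")
print(sentences)  # ['This is a test.', 'Another sentence.']
```

The punctuation stays attached to each sentence because the split happens on the whitespace, not on the punctuation itself.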

Token chunking

The TokenChunker splits text into fixed-size chunks based on token count rather than sentence boundaries. It encodes the text into tokens with the supplied encode function, groups the tokens into chunks of the specified size with a configurable overlap between consecutive chunks, and decodes each chunk back to text.

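import tiktoken
# Import path is an assumption; adjust it to the installed package layout.
from graphrag_chunking.token_chunker import TokenChunker

tokenizer = tiktoken.get_encoding("o200k_base")
chunker = TokenChunker(size=3, overlap=0, encode=tokenizer.encode, decode=tokenizer.decode)
chunks = chunker.chunk("This is a random test fragment of some text")
print(chunks)  # ["This is a", " random test fragment", " of some text"]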
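The example above uses overlap=0, so it may help to see what a non-zero overlap does. The sketch below is a generic sliding-window chunker, not the package's code: it assumes each window starts `size - overlap` tokens after the previous one, and it uses a toy whitespace tokenizer in place of tiktoken so it runs without extra dependencies. The names `chunk_by_tokens`, `toy_encode` and `toy_decode` are hypothetical.

```python
from collections.abc import Callable


def chunk_by_tokens(
    text: str,
    size: int,
    overlap: int,
    encode: Callable[[str], list[str]],
    decode: Callable[[list[str]], str],
) -> list[str]:
    """Sliding-window chunking; assumes a stride of size - overlap tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    tokens = encode(text)
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(decode(tokens[start:start + size]))
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + size >= len(tokens):
            break
    return chunks


def toy_encode(text: str) -> list[str]:
    # Stand-in tokenizer: one token per whitespace-separated word.
    return text.split()


def toy_decode(tokens: list[str]) -> str:
    return " ".join(tokens)


text = "This is a random test fragment of some text"
no_overlap = chunk_by_tokens(text, 3, 0, toy_encode, toy_decode)
with_overlap = chunk_by_tokens(text, 3, 1, toy_encode, toy_decode)
print(no_overlap)    # ['This is a', 'random test fragment', 'of some text']
print(with_overlap)  # ['This is a', 'a random test', 'test fragment of', 'of some text']
```

With an overlap of 1, the last token of each chunk is repeated as the first token of the next, which preserves some context across chunk boundaries at the cost of producing more chunks.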

Using the factory via helper util

The create_chunker factory function instantiates a chunker from a ChunkingConfig object that specifies the chunking strategy and its parameters; the tokenizer's encode and decode functions are passed alongside the config. Keeping configuration separate from instantiation means the strategy can change without changing the calling code.

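import tiktoken
# Import paths are assumptions; adjust them to the installed package layout.
from graphrag_chunking.chunker_factory import create_chunker
from graphrag_chunking.chunking_config import ChunkingConfig

tokenizer = tiktoken.get_encoding("o200k_base")
config = ChunkingConfig(
    strategy="tokens",
    size=3,
    overlap=0,
)
chunker = create_chunker(config, tokenizer.encode, tokenizer.decode)
...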
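For readers unfamiliar with the pattern, a configuration-driven factory can be sketched in a few lines. This is an illustration of the general idea only, not how create_chunker is implemented: `ToyChunkingConfig`, `create_toy_chunker` and the registry are hypothetical, the "sentence" strategy name is an assumption (only "tokens" appears above), and the chunkers are word-based and regex-based stand-ins.

```python
import re
from dataclasses import dataclass
from typing import Callable

Chunker = Callable[[str], list[str]]


@dataclass
class ToyChunkingConfig:
    """Hypothetical config: which strategy to use and its parameters."""
    strategy: str = "tokens"
    size: int = 3
    overlap: int = 0


def _make_token_chunker(config: ToyChunkingConfig) -> Chunker:
    step = config.size - config.overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")

    def chunk(text: str) -> list[str]:
        words = text.split()  # whitespace words stand in for real tokens
        return [" ".join(words[i:i + config.size]) for i in range(0, len(words), step)]

    return chunk


def _make_sentence_chunker(config: ToyChunkingConfig) -> Chunker:
    def chunk(text: str) -> list[str]:
        # Naive regex split; a real implementation would use nltk.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    return chunk


# The registry maps a strategy name to the function that builds its chunker.
_REGISTRY: dict[str, Callable[[ToyChunkingConfig], Chunker]] = {
    "tokens": _make_token_chunker,
    "sentence": _make_sentence_chunker,  # assumed strategy name
}


def create_toy_chunker(config: ToyChunkingConfig) -> Chunker:
    try:
        builder = _REGISTRY[config.strategy]
    except KeyError:
        raise ValueError(f"Unknown strategy: {config.strategy!r}") from None
    return builder(config)


toy_chunker = create_toy_chunker(ToyChunkingConfig(strategy="tokens", size=3, overlap=0))
factory_chunks = toy_chunker("This is a random test fragment of some text")
print(factory_chunks)  # ['This is a', 'random test fragment', 'of some text']
```

Switching to sentence chunking then only requires a different config value, for example `ToyChunkingConfig(strategy="sentence")`, while the calling code stays the same.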

Download files

Source Distribution

graphrag_chunking-3.0.1.tar.gz (6.1 kB)

Uploaded Source

Built Distribution

graphrag_chunking-3.0.1-py3-none-any.whl (9.9 kB)

Uploaded Python 3

File details

Details for the file graphrag_chunking-3.0.1.tar.gz.

File metadata

  • Download URL: graphrag_chunking-3.0.1.tar.gz
  • Size: 6.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.4

File hashes

Hashes for graphrag_chunking-3.0.1.tar.gz
Algorithm     Hash digest
SHA256        52eaed9c405ae6b4244297a3f9ed99f7d64737452bef8c2f15a501d1f828e8d5
MD5           b8dc4f8ad60da0c671693f6ae96071b3
BLAKE2b-256   0fb0d22f608c54bda74b99c275ae19c570393e6a820eb00ff4875aef34335026

File details

Details for the file graphrag_chunking-3.0.1-py3-none-any.whl.

File hashes

Hashes for graphrag_chunking-3.0.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        eac20da10b6be0e9e411bde8b3de1095679655cdae20e2db70736a2ac071a416
MD5           e3d45fb44c2ce933eb1d7200cca5051e
BLAKE2b-256   e632a57aa1d6cfba9c1f1a81fef3215ff9ba195766b00c8a8f5dc04b0e328d9d
