Autonomous Retrieval Optimization for RAG

Maintainers

s8ilabs

Project description

AutoChunks

The Intelligent Data Optimization Layer for RAG Engineering

AutoChunks is a specialized engine designed to eliminate the guesswork from Retrieval-Augmented Generation (RAG). By treating chunking as an optimization problem rather than a set of heuristics, it empirically identifies the best-performing chunking strategy for your specific documents and retrieval models.

From Heuristics to Evidence

Most RAG pipelines today rely on arbitrary settings, such as a 512-token chunk size with 10% overlap. These values are often chosen without validation, leading to:

Fragmented Context: Related information is split across multiple retrieval units.
Semantic Noise: Poorly defined boundaries dilute the signal-to-noise ratio in LLM prompts.
Retrieval Gaps: Critical information that falls into "dead zones" between chunks cannot be retrieved intact, causing recall failures.
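For reference, the fixed-size baseline described above (512 tokens, 10% overlap) amounts to a sliding window. The sketch below is not AutoChunks code: it treats whitespace-separated words as tokens, which is an assumption (real pipelines usually count tokens with the embedding model's tokenizer), and `sliding_window_chunks` is a hypothetical name.

```python
def sliding_window_chunks(text, chunk_size=512, overlap_ratio=0.10):
    """Split text into fixed-size windows of whitespace 'tokens' with overlap.

    Assumption: whitespace-separated words stand in for model tokens.
    """
    tokens = text.split()
    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap  # how far each new window advances
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks
```

Because the window advances by `chunk_size - overlap` tokens regardless of content, a passage that crosses a window boundary is split unless the overlap happens to cover it, which is how fragmented context and dead zones arise.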

AutoChunks replaces trial-and-error with a data-driven tournament. It generates adversarial synthetic ground truth from your documents and pits 15+ chunking strategies against each other to empirically find the best-performing one for your corpus.

Core Pillars

The Vectorized Tournament

AutoChunks runs an exhaustive parallel search across multiple strategy families—Recursive, Semantic, Layout-Aware, and Hybrid. Every candidate is evaluated in a high-speed NumPy-accelerated retrieval simulation, measuring performance across hundreds of queries in seconds.
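A vectorized retrieval simulation of this kind can be sketched in a few lines of NumPy. This is an illustrative sketch rather than AutoChunks' implementation: it assumes queries and chunks are already embedded, that each query has exactly one relevant chunk, and the function name `retrieval_metrics` is hypothetical. All queries are scored against all chunks with a single matrix product, and hit rate at rank 1, recall@k, and mean reciprocal rank (MRR) are derived from the rank of the relevant chunk.

```python
import numpy as np

def retrieval_metrics(query_emb, chunk_emb, relevant, k=5):
    """Score one chunking candidate with cosine-similarity retrieval.

    query_emb: (Q, D) array, chunk_emb: (C, D) array,
    relevant: (Q,) integer array with the index of each query's relevant chunk.
    """
    # L2-normalize rows so that dot products equal cosine similarities.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = chunk_emb / np.linalg.norm(chunk_emb, axis=1, keepdims=True)
    scores = q @ c.T  # (Q, C) similarity matrix in one vectorized step

    # 0-based rank of the relevant chunk = number of chunks scoring strictly higher.
    rel_scores = scores[np.arange(len(relevant)), relevant]
    ranks = (scores > rel_scores[:, None]).sum(axis=1)

    return {
        "hit_rate@1": float(np.mean(ranks == 0)),
        f"recall@{k}": float(np.mean(ranks < k)),
        "mrr": float(np.mean(1.0 / (ranks + 1))),
    }
```

Ties are resolved in favor of the relevant chunk (only strictly higher scores push it down), which is a simplifying choice; a stricter simulation could count ties against it.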

Adversarial Synthetic QA

The system performs a structural audit of your documents to generate "needle-in-a-haystack" question-answer pairs. The goal is to optimize your chunking strategy against realistic search intent rather than against arbitrary text splits.
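The labeling half of this process, deciding which chunks actually contain a generated answer, can be sketched as follows. AutoChunks' own question generation is not described here, so this sketch assumes each synthetic QA pair comes with the character span of its answer (the "needle") and that chunks are verbatim, in-order substrings of the source text; the helper names `chunk_offsets` and `relevant_chunks` are hypothetical. A chunk counts as relevant only if it contains the entire span.

```python
def chunk_offsets(text, chunks):
    """Locate each chunk in the source text and return (start, end) character offsets.

    Assumes chunks are verbatim substrings appearing in order (overlap allowed).
    """
    offsets, cursor = [], 0
    for chunk in chunks:
        start = text.find(chunk, cursor)
        if start == -1:
            raise ValueError("chunk not found in source text")
        offsets.append((start, start + len(chunk)))
        cursor = start + 1  # allow the next chunk to overlap this one
    return offsets

def relevant_chunks(answer_span, offsets):
    """Indices of chunks that fully contain the answer span (the 'needle')."""
    a_start, a_end = answer_span
    return [i for i, (s, e) in enumerate(offsets) if s <= a_start and a_end <= e]
```

If a strategy cuts the needle in half, no chunk is relevant and that query is a guaranteed recall failure for the strategy; a boundary-aligned strategy keeps the needle intact in one chunk.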

Optimization Goals

Align your data engineering with your business objectives. Choose from intent-based presets that guide the engine toward specific outcomes:

Balanced Ranking: Optimizes for general-purpose retrieval quality.
Speed and Precision: Minimizes LLM reading time by prioritizing Rank #1 hits.
Comprehensive Retrieval: Prioritizes recall for compliance or legal use cases.
Cost Efficiency: Minimizes vector storage and inference costs for massive datasets.
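One plausible way to turn presets like these into a single tournament score is a weighted sum over retrieval metrics. The weights, the preset keys other than `"balanced"`, and the `chunk_cost` metric (a chunk count normalized to [0, 1]) below are all hypothetical; they only illustrate how the same benchmark results can produce different winners under different objectives.

```python
# Hypothetical preset weights: illustrative only, not AutoChunks' actual presets.
OBJECTIVE_WEIGHTS = {
    "balanced":  {"mrr": 0.5, "recall@5": 0.5},
    "precision": {"hit_rate@1": 1.0},             # favor Rank #1 hits
    "recall":    {"recall@5": 1.0},               # favor finding every relevant chunk
    "cost":      {"mrr": 0.5, "recall@5": 0.5, "chunk_cost": -0.5},  # penalize many chunks
}

def score(metrics, objective):
    """Weighted sum of the metrics the chosen objective cares about."""
    weights = OBJECTIVE_WEIGHTS[objective]
    return sum(w * metrics[name] for name, w in weights.items())

def pick_winner(results, objective):
    """results maps strategy name -> metrics dict; return the best-scoring strategy."""
    return max(results, key=lambda name: score(results[name], objective))
```

Keeping the metrics separate from the weighting means a benchmark only has to be run once; switching objectives just re-ranks the existing results.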

Advanced Feature Set

Hybrid Semantic-Statistical Chunker: Uses real-time embedding distance analysis to detect topic shifts while maintaining strict token limits.
Framework Bridges: Native adapters for LangChain, LlamaIndex, and Haystack, allowing you to benchmark and optimize your existing framework code directly.
Layout-Aware Processing: High-fidelity extraction that respects the nested structures of PDFs, HTML sections, and Markdown hierarchies.
Fidelity Inspector: A visual debugging dashboard to qualitatively verify how different strategies fragment complex documents.
Enterprise Security: Air-gap compatible. Supports local model deployment, SHA-256 binary fingerprinting for data privacy, and SecretStr protection for all cloud credentials.
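The idea behind a hybrid semantic-statistical chunker can be sketched as follows: start a new chunk when the cosine distance between consecutive sentence embeddings exceeds a threshold (a likely topic shift) or when adding the next sentence would exceed the token budget. This is a generic sketch, not AutoChunks' implementation; the function name, the `threshold` and `max_tokens` defaults, and the assumption that sentences arrive pre-split, pre-embedded, and with known token counts are all illustrative.

```python
import numpy as np

def hybrid_chunk(sentences, embeddings, token_counts, max_tokens=256, threshold=0.35):
    """Group sentences into chunks, splitting on topic shifts or on the token budget."""
    if not sentences:
        return []
    emb = np.asarray(embeddings, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Cosine distance between each sentence and the one that follows it.
    distances = 1.0 - np.sum(emb[:-1] * emb[1:], axis=1)

    chunks, current, current_tokens = [], [], 0
    for i, sentence in enumerate(sentences):
        topic_shift = i > 0 and distances[i - 1] > threshold
        over_budget = current_tokens + token_counts[i] > max_tokens
        if current and (topic_shift or over_budget):
            chunks.append(" ".join(current))  # close the current chunk
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += token_counts[i]
    if current:
        chunks.append(" ".join(current))
    return chunks
```

A single sentence longer than `max_tokens` still becomes its own oversized chunk in this sketch; a production chunker would need a fallback split for that case.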

Quick Start

Installation

pip install autochunks

To install the dependencies from a clone of the GitHub repository instead:

pip install -r requirements.txt

Note: For GPU acceleration with Local Embeddings or Ragas, please refer to the Getting Started guide.

Launch the Dashboard

The most effective way to optimize your data is through the visual interactive dashboard.

python -m autochunk.web.server

Navigate to http://localhost:8000 to begin your first optimization run.

Python API

from autochunk import AutoChunker

# Initialize in Light Mode for rapid iteration
optimizer = AutoChunker(mode="light")

# Discover the optimal plan for your dataset
plan, report = optimizer.optimize(
    documents_path="./my_data_folder",
    objective="balanced"
)

# Apply the winning strategy
chunks = plan.apply("./new_documents", optimizer)

Documentation and Resources

Developed for the RAG and LLM Community. AutoChunks is released under the Apache License 2.0.

Release history

0.0.12 (Feb 3, 2026)
0.0.9 (Feb 3, 2026)
0.0.8 (Feb 2, 2026), this version

Download files

Source Distribution

autochunks-0.0.8.tar.gz (72.8 kB)

Uploaded Feb 2, 2026 Source

Built Distribution

autochunks-0.0.8-py3-none-any.whl (91.7 kB)

Uploaded Feb 2, 2026 Python 3

File details

Details for the file autochunks-0.0.8.tar.gz.

File metadata

Download URL: autochunks-0.0.8.tar.gz
Upload date: Feb 2, 2026
Size: 72.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for autochunks-0.0.8.tar.gz
Algorithm	Hash digest
SHA256	`a7d796d1b62dcd3c084436879831e6eb0636960ca4e57d18bd759e13ec4ab48a`
MD5	`d452ed2f446913c7765aead22ddca824`
BLAKE2b-256	`404daf55c5dbcf42114358cad5c850d3354c360bd8348c00a016868e596e4e35`
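To check a downloaded archive against the SHA256 digest above, you can stream the file through Python's standard `hashlib` module. The helper name `sha256_of_file` is illustrative; the expected digest is the one listed for autochunks-0.0.8.tar.gz.

```python
import hashlib

def sha256_of_file(path, block_size=65536):
    """Stream a file through SHA-256 in fixed-size blocks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

EXPECTED = "a7d796d1b62dcd3c084436879831e6eb0636960ca4e57d18bd759e13ec4ab48a"
# After downloading autochunks-0.0.8.tar.gz into the current directory:
# assert sha256_of_file("autochunks-0.0.8.tar.gz") == EXPECTED
```

pip can also enforce this in hash-checking mode: pin `autochunks==0.0.8` in a requirements file with one `--hash=sha256:<digest>` option per distribution file listed here, then install with `pip install --require-hashes -r requirements.txt` (in this mode every dependency must be pinned with a hash as well).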

Provenance

The following attestation bundles were made for autochunks-0.0.8.tar.gz:

Publisher: publish.yml on s8ilabs/AutoChunks

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: autochunks-0.0.8.tar.gz
- Subject digest: a7d796d1b62dcd3c084436879831e6eb0636960ca4e57d18bd759e13ec4ab48a
- Sigstore transparency entry: 903844130
- Sigstore integration time: Feb 2, 2026
Source repository:
- Permalink: s8ilabs/AutoChunks@4f7ef796472ed9359448f73bcb6d81ab87f5b532
- Branch / Tag: refs/tags/v0.0.8
- Owner: https://github.com/s8ilabs
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4f7ef796472ed9359448f73bcb6d81ab87f5b532
- Trigger Event: push

File details

Details for the file autochunks-0.0.8-py3-none-any.whl.

File metadata

Download URL: autochunks-0.0.8-py3-none-any.whl
Upload date: Feb 2, 2026
Size: 91.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for autochunks-0.0.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`817ce4042415c81db012feada87db6691eb6e2f3a8a98f32069ea8fa4ad0466d`
MD5	`c3d12fc1ff11eb85bc23a2b87bae0594`
BLAKE2b-256	`44147018e62dbb1983cbf5b0b18ae908c5bafe3c1d414176d6f87484f9968369`

Provenance

The following attestation bundles were made for autochunks-0.0.8-py3-none-any.whl:

Publisher: publish.yml on s8ilabs/AutoChunks

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: autochunks-0.0.8-py3-none-any.whl
- Subject digest: 817ce4042415c81db012feada87db6691eb6e2f3a8a98f32069ea8fa4ad0466d
- Sigstore transparency entry: 903844206
- Sigstore integration time: Feb 2, 2026
Source repository:
- Permalink: s8ilabs/AutoChunks@4f7ef796472ed9359448f73bcb6d81ab87f5b532
- Branch / Tag: refs/tags/v0.0.8
- Owner: https://github.com/s8ilabs
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4f7ef796472ed9359448f73bcb6d81ab87f5b532
- Trigger Event: push
