Autonomous Retrieval Optimization for RAG
AutoChunks
The Intelligent Data Optimization Layer for RAG Engineering
AutoChunks is a specialized engine designed to eliminate the guesswork from Retrieval-Augmented Generation (RAG). By treating chunking as an optimization problem rather than a set of heuristics, it empirically discovers the most performant data structures for your specific documents and retrieval models.
🚀 Key Features
- The Vectorized Tournament: Runs parallel searches across 15+ strategy families (including Recursive, Semantic, and Layout-Aware) using NumPy-accelerated simulation.
- Adversarial Synthetic QA: Automatically generates "needle-in-a-haystack" QA pairs to test your data structure against real-world search intent.
- Multi-Objective Optimization: Steer the search toward the goal that matters for your workload, such as Speed and Precision, Cost Efficiency, or Comprehensive Recall.
- Framework Native: Built-in bridges for LangChain, LlamaIndex, and Haystack.
- Enterprise Ready: Air-gap compatible, local model support, and SHA-256 binary fingerprinting.
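To make the tournament idea concrete, here is a minimal sketch of treating chunking as an empirical search. It is not AutoChunks' implementation: the function names (`fixed_size_chunks`, `embed`, `hit_rate`, `tournament`), the bag-of-words embedding, and the hit-rate metric are all assumptions chosen for illustration. Each candidate strategy chunks the same text, a small set of QA pairs is retrieved against the chunks with a NumPy similarity matrix, and the strategy with the best hit rate wins.

```python
import numpy as np


def fixed_size_chunks(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into windows of `size` words, stepping by `size - overlap`."""
    words = text.split()
    step = max(1, size - overlap)
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]


def embed(texts: list[str], vocab: dict[str, int]) -> np.ndarray:
    """Toy bag-of-words embedding (an assumption, not a real model), L2-normalised per row."""
    mat = np.zeros((len(texts), len(vocab)))
    for row, text in enumerate(texts):
        for word in text.lower().split():
            if word in vocab:
                mat[row, vocab[word]] += 1.0
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    return mat / np.where(norms == 0, 1.0, norms)


def hit_rate(chunks: list[str], qa_pairs: list[tuple[str, str]],
             vocab: dict[str, int], k: int = 1) -> float:
    """Fraction of questions whose answer text appears in one of the top-k chunks."""
    chunk_vecs = embed(chunks, vocab)
    query_vecs = embed([question for question, _ in qa_pairs], vocab)
    sims = query_vecs @ chunk_vecs.T              # shape: (n_queries, n_chunks)
    top_k = np.argsort(-sims, axis=1)[:, :k]      # indices of the best-scoring chunks
    hits = [any(answer in chunks[j] for j in row)
            for (_, answer), row in zip(qa_pairs, top_k)]
    return float(np.mean(hits))


def tournament(text: str, qa_pairs: list[tuple[str, str]],
               candidates: dict[str, dict], k: int = 1) -> tuple[str, dict[str, float]]:
    """Score every candidate chunking configuration and return the winner with all scores."""
    vocab = {w: i for i, w in enumerate(sorted(set(text.lower().split())))}
    scores = {name: hit_rate(fixed_size_chunks(text, **params), qa_pairs, vocab, k)
              for name, params in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

A call such as `tournament(text, qa_pairs, {"small": {"size": 64}, "large": {"size": 256, "overlap": 32}})` returns the winning name and the per-candidate scores. A real run would replace the toy embedding with your retrieval model and the hand-written QA pairs with generated ones.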
📦 Installation
Install the stable version from PyPI:
pip install autochunks
For GPU acceleration or RAGAS semantic evaluation, see the Advanced Installation Guide.
🛠️ Usage
Launch the Dashboard
The easiest way to optimize is through the interactive visual dashboard:
autochunks serve
Navigate to http://localhost:8000 to start your first optimization run.
CLI Optimization
Search for the best plan directly from the terminal:
autochunks optimize --docs ./my_data_folder --mode light --objective balanced
Python API
from autochunk import AutoChunker
# Initialize and Discover the optimal plan
optimizer = AutoChunker(mode="light")
plan, report = optimizer.optimize(documents="./my_data", objective="balanced")
# Apply the winning strategy
chunks = plan.apply("./new_documents", optimizer)
Development
If you want to contribute or build from source:
- Clone the repository:
git clone https://github.com/s8ilabs/AutoChunks.git
cd AutoChunks
- Set up a virtual environment:
python -m venv venv
source venv/bin/activate # venv\Scripts\activate on Windows
- Install in editable mode:
pip install -e .
- Run the tests:
pytest tests/
📖 Documentation and Resources
- Full Documentation Portal
- PyPI Project Page
- Getting Started Guide
- The Optimization Lifecycle
- Metric Definitions and Scoring
Developed with ❤️ for the RAG and LLM Community. AutoChunks is released under the Apache License 2.0.
File details
Details for the file autochunks-0.0.12.tar.gz.
File metadata
- Download URL: autochunks-0.0.12.tar.gz
- Upload date:
- Size: 91.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 606317ec400cf0874b91cc4123cbb962b59b76daf54deebeba99aa032417049c |
| MD5 | 3ccad8eb42330e6725601872f01569e4 |
| BLAKE2b-256 | bde2143bdbf41b1863231ed1e62714054637fe60790581d1ec0e61f347afb11d |
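The SHA256 digest listed for a file can be checked locally after downloading it. The following is a minimal sketch using only the Python standard library. The helper names `sha256_of` and `matches_digest` are hypothetical, and the file path in the usage note is assumed to point at wherever you saved the download.

```python
import hashlib


def sha256_of(path: str, block_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in blocks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 digest with a published hex digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_digest("autochunks-0.0.12.tar.gz", "606317ec400cf0874b91cc4123cbb962b59b76daf54deebeba99aa032417049c")` should return `True` for an unmodified copy of the source distribution.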
Provenance
The following attestation bundles were made for autochunks-0.0.12.tar.gz:
Publisher: publish.yml on s8ilabs/AutoChunks
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: autochunks-0.0.12.tar.gz
- Subject digest: 606317ec400cf0874b91cc4123cbb962b59b76daf54deebeba99aa032417049c
- Sigstore transparency entry: 907413348
- Sigstore integration time:
- Permalink: s8ilabs/AutoChunks@a8c4ae4020e53a99487da0c70e87956d5cc577f4
- Branch / Tag: refs/tags/v0.0.12
- Owner: https://github.com/s8ilabs
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a8c4ae4020e53a99487da0c70e87956d5cc577f4
- Trigger Event: push
File details
Details for the file autochunks-0.0.12-py3-none-any.whl.
File metadata
- Download URL: autochunks-0.0.12-py3-none-any.whl
- Upload date:
- Size: 117.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5304af2136424b3971cc311430490501ddb0b7e050b2ad5bc8553fddc2eca71c |
| MD5 | 76ffa1a7cf56f84944a79cb0f4ca7adb |
| BLAKE2b-256 | 28ffd648b16231bdd6cd3d728d41ebb4e9eac2ec6987f1ec4c4d395cae2eb0c2 |
Provenance
The following attestation bundles were made for autochunks-0.0.12-py3-none-any.whl:
Publisher: publish.yml on s8ilabs/AutoChunks
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: autochunks-0.0.12-py3-none-any.whl
- Subject digest: 5304af2136424b3971cc311430490501ddb0b7e050b2ad5bc8553fddc2eca71c
- Sigstore transparency entry: 907413370
- Sigstore integration time:
- Permalink: s8ilabs/AutoChunks@a8c4ae4020e53a99487da0c70e87956d5cc577f4
- Branch / Tag: refs/tags/v0.0.12
- Owner: https://github.com/s8ilabs
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a8c4ae4020e53a99487da0c70e87956d5cc577f4
- Trigger Event: push