
Autonomous Retrieval Optimization for RAG


AutoChunks

The Intelligent Data Optimization Layer for RAG Engineering


AutoChunks is a specialized engine designed to eliminate the guesswork from Retrieval-Augmented Generation (RAG). By treating chunking as an optimization problem rather than a set of heuristics, it empirically discovers the most performant data structures for your specific documents and retrieval models.
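To make the "chunking as an optimization problem" idea concrete, here is a minimal toy sketch. It is not the AutoChunks implementation: the helper names (`chunk_words`, `overlap`, `hit_rate`, `best_chunk_size`), the word-overlap retriever, and the hit-rate metric are all assumptions chosen for illustration. It tries several fixed chunk sizes and keeps the one whose top-ranked chunk most often contains the expected answer.

```python
# Toy illustration only; every name here is hypothetical and
# none of it reflects the AutoChunks internals.

def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def overlap(query: str, chunk: str) -> int:
    """Count distinct lowercase query words that also occur in the chunk."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def hit_rate(chunks: list[str], qa_pairs: list[tuple[str, str]]) -> float:
    """Fraction of questions whose top-ranked chunk contains the answer."""
    hits = 0
    for question, answer in qa_pairs:
        # Rank chunks by word overlap and keep the best one.
        best = max(chunks, key=lambda c: overlap(question, c))
        hits += answer.lower() in best.lower()
    return hits / len(qa_pairs)


def best_chunk_size(text: str, qa_pairs: list[tuple[str, str]], candidate_sizes: list[int]):
    """Score every candidate size and return (best_size, scores_by_size)."""
    scores = {s: hit_rate(chunk_words(text, s), qa_pairs) for s in candidate_sizes}
    best = max(scores, key=scores.get)  # ties go to the earliest candidate
    return best, scores
```

A hit-rate-only score tends to favor very large chunks, since a chunk that spans the whole document always contains the answer. A realistic objective would also account for chunk size, latency, or cost.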

[Figure: AutoChunks Architecture]


🚀 Key Features

  • The Vectorized Tournament: Runs parallel searches across 15+ strategy families (Recursive, Semantic, Layout-Aware) using NumPy-accelerated simulation.
  • Adversarial Synthetic QA: Automatically generates "needle-in-a-haystack" QA pairs to test your data structure against real-world search intent.
  • Multi-Objective Optimization: Steers the search toward a chosen goal, such as Speed and Precision, Cost Efficiency, or Comprehensive Recall.
  • Framework Native: Built-in bridges for LangChain, LlamaIndex, and Haystack.
  • Enterprise Ready: Air-gap compatible, local model support, and SHA-256 binary fingerprinting.
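As an illustration of the SHA-256 fingerprinting mentioned in the last bullet, the sketch below hashes a file's raw bytes with Python's standard `hashlib`. How AutoChunks computes or uses its fingerprints is not described here, so the function names and the block-wise reading strategy are assumptions.

```python
import hashlib
from pathlib import Path


def fingerprint_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest of an in-memory byte string."""
    return hashlib.sha256(data).hexdigest()


def fingerprint_file(path, block_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size blocks.

    Hypothetical helper: reading block by block keeps memory use flat
    even for large binaries such as PDFs.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Because the digest depends only on file content, two byte-identical documents get the same fingerprint regardless of file name, which is what makes such a value usable as a cache key.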

📦 Installation

Install the stable version from PyPI:

pip install autochunks

For GPU acceleration or RAGAS semantic evaluation, see the Advanced Installation Guide.
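One way to confirm which version ended up in the active environment is to query the package metadata from Python. This sketch uses only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `autochunks` is taken from the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str = "autochunks"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print(installed_version())
```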


🛠️ Usage

Launch the Dashboard

The easiest way to optimize is through the interactive visual dashboard:

autochunks serve

Navigate to http://localhost:8000 to start your first optimization run.
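If the page does not load, a quick reachability check can tell you whether anything is listening at that address. The sketch below is a generic standard-library probe, not part of AutoChunks; the function name `dashboard_is_up` is hypothetical, and the default URL is the one given above.

```python
import urllib.error
import urllib.request


def dashboard_is_up(url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if any HTTP response comes back from `url`."""
    # An empty ProxyHandler bypasses system proxies, which should not
    # be involved when probing a local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status code.
        return True
    except OSError:
        # Connection refused, timeout, or DNS failure (URLError is an OSError).
        return False


if __name__ == "__main__":
    print("dashboard reachable:", dashboard_is_up())
```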

CLI Optimization

Search for the best plan directly from the terminal:

autochunks optimize --docs ./my_data_folder --mode light --objective balanced
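To launch the same command from a script, for example in a scheduled job, one option is a thin `subprocess` wrapper. The sketch below only uses the subcommand and flags shown above. The helper names are hypothetical, and values other than `light` and `balanced` are not confirmed by this page.

```python
import subprocess


def build_optimize_command(docs: str, mode: str = "light", objective: str = "balanced") -> list[str]:
    """Build the argument list for the `autochunks optimize` command shown above."""
    return ["autochunks", "optimize", "--docs", docs, "--mode", mode, "--objective", objective]


def run_optimize(docs: str, mode: str = "light", objective: str = "balanced") -> int:
    """Run the CLI and return its exit code (assumes `autochunks` is on PATH)."""
    completed = subprocess.run(build_optimize_command(docs, mode, objective), check=False)
    return completed.returncode
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the document path contains spaces.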

Python API

from autochunk import AutoChunker

# Initialize the optimizer and discover the optimal plan
optimizer = AutoChunker(mode="light")
plan, report = optimizer.optimize(documents="./my_data", objective="balanced")

# Apply the winning strategy
chunks = plan.apply("./new_documents", optimizer)

Development

If you want to contribute or build from source:

  1. Clone the repository:

    git clone https://github.com/s8ilabs/AutoChunks.git
    cd AutoChunks
    
  2. Set up a virtual environment:

    python -m venv venv
    source venv/bin/activate  # venv\Scripts\activate on Windows
    
  3. Install in editable mode:

    pip install -e .
    
  4. Run the tests:

    pytest tests/
    

📖 Documentation and Resources


Developed with ❤️ for the RAG and LLM Community. AutoChunks is released under the Apache License 2.0.
