
Autonomous Retrieval Optimization for RAG


AutoChunks

The Intelligent Data Optimization Layer for RAG Engineering


AutoChunks is a specialized engine designed to eliminate the guesswork from Retrieval-Augmented Generation (RAG). By treating chunking as an optimization problem rather than a set of heuristics, it empirically discovers the most performant data structures for your specific documents and retrieval models.
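To make the idea of chunking as an optimization problem concrete, here is a toy sketch in plain Python. It is not AutoChunks code: the names `tokenize`, `chunk_fixed`, `hit_rate`, and `tournament` are hypothetical, a bag-of-words cosine stands in for a real embedding model, hand-written question/answer pairs stand in for generated ones, and only fixed-size chunking is searched. Each candidate configuration is scored by how often the top-ranked chunk contains the expected answer.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_fixed(words, size, overlap=0):
    """Split a token list into windows of `size` tokens sharing `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hit_rate(chunks, qa_pairs):
    """Fraction of questions whose top-1 retrieved chunk contains the answer."""
    bags = [Counter(chunk) for chunk in chunks]
    hits = 0
    for question, answer in qa_pairs:
        query = Counter(tokenize(question))
        best = max(range(len(chunks)), key=lambda i: cosine(query, bags[i]))
        needle = " " + " ".join(tokenize(answer)) + " "
        if needle in " " + " ".join(chunks[best]) + " ":
            hits += 1
    return hits / len(qa_pairs)


def tournament(text, qa_pairs, candidates):
    """Score every (size, overlap) candidate; best hit rate first, smaller size breaks ties."""
    words = tokenize(text)
    results = [
        {"size": size, "overlap": overlap,
         "hit_rate": hit_rate(chunk_fixed(words, size, overlap), qa_pairs)}
        for size, overlap in candidates
    ]
    return sorted(results, key=lambda r: (-r["hit_rate"], r["size"]))


# Illustrative data, invented for this sketch.
DOC = (
    "The billing service retries failed payments three times before alerting support. "
    "The search service rebuilds its index every night at two. "
    "The auth service rotates signing keys every thirty days. "
    "The export service writes reports to cold storage each Friday."
)
QA = [
    ("how often does the auth service rotate signing keys", "thirty days"),
    ("when does the search service rebuild its index", "night at two"),
    ("how many times does billing retry failed payments", "three times"),
]

ranking = tournament(DOC, QA, [(4, 0), (8, 2), (16, 4), (64, 0)])
for row in ranking:
    print(row)
```

Breaking ties in favor of the smaller chunk size is one possible way to encode a cost-conscious objective; a recall-oriented objective might rank the same candidates differently.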

[Figure: AutoChunks Architecture]


🚀 Key Features

  • The Vectorized Tournament: Runs parallel searches across 15+ strategy families (including Recursive, Semantic, and Layout-Aware) using NumPy-accelerated simulation.
  • Adversarial Synthetic QA: Automatically generates "needle-in-a-haystack" QA pairs to test your data structure against real-world search intent.
  • Multi-Objective Optimization: Tune the search toward the goal that matters for your workload, such as Speed and Precision, Cost Efficiency, or Comprehensive Recall.
  • Framework Native: Built-in bridges for LangChain, LlamaIndex, and Haystack.
  • Enterprise Ready: Air-gap compatible, local model support, and SHA-256 binary fingerprinting.
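As an illustration of what SHA-256 binary fingerprinting generally means, the sketch below hashes a file's raw bytes with Python's standard `hashlib`. How AutoChunks computes or uses its fingerprints is not described here, so the function name `fingerprint` and the idea of using the digest to detect unchanged documents are assumptions.

```python
import hashlib
from pathlib import Path


def fingerprint(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file's raw bytes, read in blocks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Reading in blocks keeps memory use flat for large documents.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Two files with identical bytes always yield the same digest, so a stored fingerprint can serve as a cheap check for whether a document changed since the last run.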

📦 Installation

Install the stable version from PyPI:

pip install autochunks

For GPU acceleration or RAGAS semantic evaluation, see the Advanced Installation Guide.


🛠️ Usage

Launch the Dashboard

The easiest way to optimize is through the interactive visual dashboard:

autochunks serve

Navigate to http://localhost:8000 to start your first optimization run.

CLI Optimization

Search for the best plan directly from the terminal:

autochunks optimize --docs ./my_data_folder --mode light --objective balanced
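If the optimization needs to run from a script or CI job, the same command can be invoked through Python's standard `subprocess` module. This is a minimal sketch: the helper names `build_optimize_command` and `run_optimize` are hypothetical, and only the flags and values shown in the command above are used, since other accepted values are not documented here.

```python
import subprocess


def build_optimize_command(docs, mode="light", objective="balanced"):
    """Assemble the argument list for the `autochunks optimize` command shown above."""
    return [
        "autochunks", "optimize",
        "--docs", str(docs),
        "--mode", mode,
        "--objective", objective,
    ]


def run_optimize(docs, mode="light", objective="balanced"):
    """Run the CLI and raise CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(build_optimize_command(docs, mode, objective), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the documents path contains spaces.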

Python API

from autochunk import AutoChunker

# Initialize the optimizer and discover the optimal plan
optimizer = AutoChunker(mode="light")
plan, report = optimizer.optimize(documents="./my_data", objective="balanced")

# Apply the winning strategy
chunks = plan.apply("./new_documents", optimizer)

Development

If you want to contribute or build from source:

  1. Clone the repository:

    git clone https://github.com/s8ilabs/AutoChunks.git
    cd AutoChunks
    
  2. Set up a virtual environment:

    python -m venv venv
    source venv/bin/activate  # venv\Scripts\activate on Windows
    
  3. Install in editable mode:

    pip install -e .
    
  4. Run the tests:

    pytest tests/
    

📖 Documentation and Resources


Developed with ❤️ for the RAG and LLM Community. AutoChunks is released under the Apache License 2.0.
