
Adaptron

End-to-end LLM Fine-tuning Framework


Overview

Adaptron is a plugin-based framework that takes you from raw documents to a deployed, fine-tuned language model. It orchestrates six pipeline stages -- Ingest, Understand, Synthesize, Train, Evaluate, Deploy -- and exposes the entire workflow through a Python API, a CLI, a FastAPI backend, and a Next.js web UI.

Features

  • Multi-format ingestion -- PDF, DOCX, CSV, and SQL data sources
  • Semantic understanding -- chunking, entity extraction, quality scoring, schema inference
  • Instruction synthesis -- template-based training data generation
  • Multiple training strategies -- QLoRA (Unsloth+PEFT), Full Fine-Tuning, Continual Pre-Training, Distillation, DPO alignment
  • Domain evaluation -- automated scoring of fine-tuned models
  • One-click deployment -- Ollama, GGUF export, HuggingFace Hub push
  • Training Strategy Wizard -- answer seven questions and it automatically picks the best training mode and base model
  • Playground -- chat with deployed models, compare outputs side-by-side, toggle RAG augmentation
  • Plugin system -- register custom ingesters, trainers, deployers, or any stage component
  • Event-driven pipeline -- real-time progress via EventBus and WebSocket

Quick Start

# Install with all optional dependencies
pip install -e ".[all]"

# Initialize a project
adaptron init --project-dir my-project
cd my-project

# Edit adaptron.yaml, then run the pipeline
adaptron run

Installation

Requires Python 3.11+.

# Core only (config, CLI, pipeline orchestrator)
pip install -e .

# With specific extras
pip install -e ".[train]"       # torch, transformers, PEFT, TRL, bitsandbytes
pip install -e ".[ingest]"      # pypdf, python-docx, pandas, sqlalchemy
pip install -e ".[understand]"  # spacy, sentence-transformers
pip install -e ".[rag]"         # chromadb, sentence-transformers
pip install -e ".[api]"         # fastapi, uvicorn, websockets
pip install -e ".[deploy]"      # huggingface-hub

# Everything
pip install -e ".[all]"

# Development (pytest, ruff, coverage)
pip install -e ".[dev]"

Usage

Python API

from adaptron.core.config import PipelineConfig, WizardAnswers

# Let the wizard pick the best strategy
answers = WizardAnswers(
    primary_goal="qa_docs",
    data_sources=["docs"],
    data_freshness="static",
    hardware="mid",
    timeline="medium",
    accuracy="professional",
    model_size="small",
)
config = PipelineConfig.from_wizard(answers)

print(config.base_model)       # e.g. Qwen/Qwen2.5-7B-Instruct
print(config.training_modes)   # e.g. ['qlora', 'rag']

# Or load from YAML
config = PipelineConfig.from_yaml("adaptron.yaml")
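
The wizard's answer-to-strategy mapping can be illustrated with a toy sketch. The rules below are hypothetical, chosen only to show the shape of the mapping; Adaptron's actual selection logic lives in PipelineConfig.from_wizard:

```python
# Toy sketch of an answers -> training-modes mapping. The rules here are
# hypothetical; Adaptron's real logic is in PipelineConfig.from_wizard.
def pick_training_modes(primary_goal: str, hardware: str) -> list[str]:
    # Parameter-efficient QLoRA suits low/mid hardware; high-end can do full FT.
    modes = ["qlora"] if hardware in ("low", "mid") else ["full_ft"]
    # Document QA benefits from retrieval augmentation at inference time.
    if primary_goal == "qa_docs":
        modes.append("rag")
    return modes

print(pick_training_modes("qa_docs", "mid"))  # ['qlora', 'rag']
```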

CLI Commands

adaptron version              # Show version
adaptron init                 # Create adaptron.yaml with defaults
adaptron run                  # Execute the pipeline
adaptron wizard               # Interactive training strategy wizard
adaptron playground           # Chat with a fine-tuned model via Ollama
adaptron playground --rag     # Chat with RAG context augmentation

Web UI

# Start the FastAPI backend
uvicorn adaptron.api.main:create_app --factory --reload

# In a separate terminal, start the Next.js frontend
cd web && npm install && npm run dev

Then open http://localhost:3000 to access the Wizard, Dashboard, and Playground.

Architecture

                         adaptron.yaml
                              |
                              v
  +--------+    +------------+    +------------+    +---------+    +----------+    +--------+
  | Ingest | -> | Understand | -> | Synthesize | -> |  Train  | -> | Evaluate | -> | Deploy |
  +--------+    +------------+    +------------+    +---------+    +----------+    +--------+
   PDF, DOCX     Chunker,          Instruction       QLoRA,         Domain          Ollama,
   CSV, SQL      Entities,         templates         FullFT,        scoring         GGUF,
                 Quality,                            CPT, DPO,                      HF Hub
                 Schema                              Distill

  All stages are plugins registered in the global PluginRegistry.
  The PipelineOrchestrator runs them sequentially, emitting events
  via EventBus that the WebSocket API streams to the frontend.
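
The event flow described above can be sketched with a minimal publish/subscribe bus. This is a self-contained illustration of the mechanism, not Adaptron's actual EventBus or orchestrator API:

```python
# Minimal publish/subscribe sketch of the progress-event flow; Adaptron's
# real EventBus and PipelineOrchestrator may expose a different API.
from collections import defaultdict
from typing import Any, Callable

class MiniEventBus:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], Any]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], Any]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(payload)

def run_stages(stages: list[str], bus: MiniEventBus) -> None:
    # Run stages sequentially, emitting a progress event after each one --
    # the same pattern the orchestrator uses to stream updates over WebSocket.
    for i, stage in enumerate(stages, start=1):
        bus.publish("stage_complete", {"stage": stage, "progress": i / len(stages)})

events: list[dict] = []
bus = MiniEventBus()
bus.subscribe("stage_complete", events.append)
run_stages(["ingest", "understand", "synthesize", "train", "evaluate", "deploy"], bus)
```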

Plugin System

Every pipeline component is a plugin. Register your own with the @register_plugin decorator:

from adaptron.core.registry import register_plugin
from adaptron.ingest.base import BaseIngester
from adaptron.ingest.models import DataSource, RawDocument

@register_plugin("ingester", "my_custom")
class MyCustomIngester(BaseIngester):
    def ingest(self, source: DataSource) -> list[RawDocument]:
        # Your custom ingestion logic
        ...

    def supported_types(self) -> list[str]:
        return ["my_custom"]

Retrieve plugins at runtime:

from adaptron.core.registry import global_registry

ingester_cls = global_registry.get("ingester", "my_custom")
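
Under the hood, a decorator-based registry like this typically files each class under a (kind, name) key. A self-contained sketch of the pattern (not Adaptron's actual PluginRegistry internals):

```python
# Sketch of the decorator-registry pattern; Adaptron's PluginRegistry
# internals may differ.
class MiniRegistry:
    def __init__(self) -> None:
        self._plugins: dict[tuple[str, str], type] = {}

    def register(self, kind: str, name: str):
        def decorator(cls: type) -> type:
            self._plugins[(kind, name)] = cls  # file the class under (kind, name)
            return cls                         # return the class unchanged
        return decorator

    def get(self, kind: str, name: str) -> type:
        return self._plugins[(kind, name)]

registry = MiniRegistry()

@registry.register("ingester", "my_custom")
class MyCustomIngester:
    pass
```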

Pipeline Stages

| Stage      | Module              | Plugins                                        | Description                                                             |
|------------|---------------------|------------------------------------------------|-------------------------------------------------------------------------|
| Ingest     | adaptron.ingest     | pdf, docx, csv, sql                            | Extract text and metadata from data sources                             |
| Understand | adaptron.understand | chunker, entities, quality, schema             | Semantic chunking, entity extraction, quality scoring, schema inference |
| Synthesize | adaptron.synthesize | instruction                                    | Generate instruction-response training pairs from chunks                |
| Train      | adaptron.train      | qlora, full_ft, cpt, distill, alignment (DPO)  | Fine-tune or align the base model                                       |
| Evaluate   | adaptron.evaluate   | domain                                         | Score model outputs against domain-specific criteria                    |
| Deploy     | adaptron.deploy     | ollama, gguf, huggingface                      | Export and deploy the fine-tuned model                                  |

Playground

The playground lets you interact with deployed models:

  • Chat mode -- streaming conversation with any Ollama-hosted model
  • RAG toggle -- augment prompts with relevant context from ChromaDB
  • Comparison mode -- send the same prompt to two models side-by-side (web UI)
  • CLI access -- adaptron playground --model adaptron-mymodel --rag
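
The RAG toggle boils down to retrieve-then-prepend. A toy sketch using word overlap in place of Adaptron's actual ChromaDB + sentence-transformer retrieval:

```python
# Toy retrieve-then-prepend sketch of the RAG toggle. Adaptron retrieves
# from ChromaDB with embeddings; the word-overlap scoring here is only
# for illustration.
def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    q = set(query.lower().split())
    # Rank chunks by how many query words they share, best first.
    ranked = sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)
    return ranked[:k]

def augment_prompt(query: str, chunks: list[str]) -> str:
    context = "\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"
```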

Configuration

Adaptron uses a YAML configuration file (adaptron.yaml):

# Wizard answers -- drive automatic strategy selection
wizard:
  primary_goal: qa_docs          # qa_docs | erp_edw | report_gen | specialist
  data_sources:
    - docs                       # docs | erp | edw
  data_freshness: static         # static | monthly | daily | realtime
  hardware: mid                  # low | mid | high | cloud
  timeline: medium               # fast | medium | long | unlimited
  accuracy: professional         # professional | enterprise | mission
  model_size: small              # tiny | small | medium | large

# Manual overrides
overrides:
  epochs: 3
  learning_rate: 0.0002
  batch_size: 4
  lora_rank: 64
  max_seq_length: 2048
  quantization: Q4_K_M

data:
  input_dir: ./data
  output_dir: ./output

deploy:
  targets:
    - gguf
    - ollama
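
Overrides are merged over the strategy the wizard selects. A sketch of that merge using a frozen dataclass; the field names mirror the YAML above, but the default values and merge helper are hypothetical, not Adaptron's real implementation:

```python
# Sketch of merging the YAML `overrides:` mapping over training defaults.
# Field names mirror the config above; the default values are hypothetical.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TrainDefaults:
    epochs: int = 1
    learning_rate: float = 2e-4
    batch_size: int = 8
    lora_rank: int = 16
    max_seq_length: int = 2048

def apply_overrides(defaults: TrainDefaults, overrides: dict) -> TrainDefaults:
    # replace() copies the dataclass with only the listed fields changed
    return replace(defaults, **overrides)
```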

Development

# Clone and install in dev mode
git clone <repo-url> && cd Adaptron
pip install -e ".[all,dev]"

# Run the test suite
pytest --tb=short -q

# Lint
ruff check adaptron/ tests/

License

MIT License. See LICENSE for details.
