Adaptron
End-to-end LLM Fine-tuning Framework
Overview
Adaptron is a plugin-based framework that takes you from raw documents to a deployed, fine-tuned language model. It orchestrates six pipeline stages -- Ingest, Understand, Synthesize, Train, Evaluate, Deploy -- and exposes the entire workflow through a Python API, a CLI, a FastAPI backend, and a Next.js web UI.
Features
- Multi-format ingestion -- PDF, DOCX, CSV, and SQL data sources
- Semantic understanding -- chunking, entity extraction, quality scoring, schema inference
- Instruction synthesis -- template-based training data generation
- Multiple training strategies -- QLoRA (Unsloth+PEFT), Full Fine-Tuning, Continual Pre-Training, Distillation, DPO alignment
- Domain evaluation -- automated scoring of fine-tuned models
- One-click deployment -- Ollama, GGUF export, HuggingFace Hub push
- Training Strategy Wizard -- answer seven questions and it picks the best training mode and base model automatically
- Playground -- chat with deployed models, compare outputs side-by-side, toggle RAG augmentation
- Plugin system -- register custom ingesters, trainers, deployers, or any stage component
- Event-driven pipeline -- real-time progress via EventBus and WebSocket
Quick Start
# Install with all optional dependencies
pip install -e ".[all]"
# Initialize a project
adaptron init --project-dir my-project
cd my-project
# Edit adaptron.yaml, then run the pipeline
adaptron run
Installation
Requires Python 3.11+.
# Core only (config, CLI, pipeline orchestrator)
pip install -e .
# With specific extras
pip install -e ".[train]" # torch, transformers, PEFT, TRL, bitsandbytes
pip install -e ".[ingest]" # pypdf, python-docx, pandas, sqlalchemy
pip install -e ".[understand]" # spacy, sentence-transformers
pip install -e ".[rag]" # chromadb, sentence-transformers
pip install -e ".[api]" # fastapi, uvicorn, websockets
pip install -e ".[deploy]" # huggingface-hub
# Everything
pip install -e ".[all]"
# Development (pytest, ruff, coverage)
pip install -e ".[dev]"
Usage
Python API
from adaptron.core.config import PipelineConfig, WizardAnswers
# Let the wizard pick the best strategy
answers = WizardAnswers(
    primary_goal="qa_docs",
    data_sources=["docs"],
    data_freshness="static",
    hardware="mid",
    timeline="medium",
    accuracy="professional",
    model_size="small",
)
config = PipelineConfig.from_wizard(answers)
print(config.base_model) # e.g. Qwen/Qwen2.5-7B-Instruct
print(config.training_modes) # e.g. ['qlora', 'rag']
# Or load from YAML
config = PipelineConfig.from_yaml("adaptron.yaml")
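To build intuition for what the wizard does with these answers, here is an illustrative, self-contained sketch of answer-driven strategy selection. It is not Adaptron's actual `from_wizard` logic; the `ToyWizardAnswers` class, the `pick_strategy` function, and the mapping rules are hypothetical simplifications.

```python
from dataclasses import dataclass

@dataclass
class ToyWizardAnswers:
    hardware: str        # "low" | "mid" | "high" | "cloud"
    data_freshness: str  # "static" | "monthly" | "daily" | "realtime"

def pick_strategy(answers: ToyWizardAnswers) -> list[str]:
    modes = []
    # Constrained hardware favors parameter-efficient fine-tuning (QLoRA).
    modes.append("full_ft" if answers.hardware in ("high", "cloud") else "qlora")
    # Frequently changing data favors retrieval over repeated retraining.
    if answers.data_freshness in ("daily", "realtime"):
        modes.append("rag")
    return modes

print(pick_strategy(ToyWizardAnswers(hardware="mid", data_freshness="daily")))
# → ['qlora', 'rag']
```

The real wizard weighs all seven answers, but the shape is the same: declarative answers in, a concrete strategy list out.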
CLI Commands
adaptron version # Show version
adaptron init # Create adaptron.yaml with defaults
adaptron run # Execute the pipeline
adaptron wizard # Interactive training strategy wizard
adaptron playground # Chat with a fine-tuned model via Ollama
adaptron playground --rag # Chat with RAG context augmentation
Web UI
# Start the FastAPI backend
uvicorn adaptron.api.main:create_app --factory --reload
# In a separate terminal, start the Next.js frontend
cd web && npm install && npm run dev
Then open http://localhost:3000 to access the Wizard, Dashboard, and Playground.
Architecture
adaptron.yaml
|
v
+-------+ +------------+ +------------+ +-------+ +----------+ +--------+
|Ingest | ->| Understand | ->| Synthesize | ->| Train | ->| Evaluate | ->| Deploy |
+-------+ +------------+ +------------+ +-------+ +----------+ +--------+
PDF,DOCX Chunker, Instruction QLoRA, Domain Ollama,
CSV,SQL Entities, templates FullFT, scoring GGUF,
Quality, CPT,DPO, HF Hub
Schema Distill
All stages are plugins registered in the global PluginRegistry.
The PipelineOrchestrator runs them sequentially, emitting events
via EventBus that the WebSocket API streams to the frontend.
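The EventBus pattern described above can be sketched as a minimal publish/subscribe mechanism. The class below is a self-contained illustration, not Adaptron's implementation; the `stage.progress` topic name and payload shape are assumptions.

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """Minimal pub/sub: stages emit events, subscribers (e.g. a WebSocket
    forwarder) receive them."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def emit(self, topic: str, payload: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(payload)

bus = EventBus()
events: list[dict] = []
bus.subscribe("stage.progress", events.append)
bus.emit("stage.progress", {"stage": "ingest", "percent": 50})
print(events)  # [{'stage': 'ingest', 'percent': 50}]
```

In the real pipeline, the WebSocket API would play the role of `events.append`, forwarding each progress payload to the frontend.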
Plugin System
Every pipeline component is a plugin. Register your own with the @register_plugin decorator:
from adaptron.core.registry import register_plugin
from adaptron.ingest.base import BaseIngester
from adaptron.ingest.models import DataSource, RawDocument
@register_plugin("ingester", "my_custom")
class MyCustomIngester(BaseIngester):
    def ingest(self, source: DataSource) -> list[RawDocument]:
        # Your custom ingestion logic
        ...

    def supported_types(self) -> list[str]:
        return ["my_custom"]
Retrieve plugins at runtime:
from adaptron.core.registry import global_registry
ingester_cls = global_registry.get("ingester", "my_custom")
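The decorator-based registry pattern behind `register_plugin` and `global_registry` can be shown with a self-contained sketch. The names mirror Adaptron's, but this implementation is hypothetical and illustrative only.

```python
class PluginRegistry:
    """Maps (kind, name) pairs to plugin classes."""

    def __init__(self) -> None:
        self._plugins: dict[tuple[str, str], type] = {}

    def register(self, kind: str, name: str):
        # Returns a class decorator that records the class under (kind, name).
        def decorator(cls: type) -> type:
            self._plugins[(kind, name)] = cls
            return cls
        return decorator

    def get(self, kind: str, name: str) -> type:
        return self._plugins[(kind, name)]

registry = PluginRegistry()

@registry.register("ingester", "my_custom")
class MyCustomIngester:
    def supported_types(self) -> list[str]:
        return ["my_custom"]

assert registry.get("ingester", "my_custom") is MyCustomIngester
```

Because registration happens at class-definition time, simply importing a module containing decorated classes is enough to make its plugins available.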
Pipeline Stages
| Stage | Module | Plugins | Description |
|---|---|---|---|
| Ingest | adaptron.ingest | pdf, docx, csv, sql | Extract text and metadata from data sources |
| Understand | adaptron.understand | chunker, entities, quality, schema | Semantic chunking, entity extraction, quality scoring, schema inference |
| Synthesize | adaptron.synthesize | instruction | Generate instruction-response training pairs from chunks |
| Train | adaptron.train | qlora, full_ft, cpt, distill, alignment (DPO) | Fine-tune or align the base model |
| Evaluate | adaptron.evaluate | domain | Score model outputs against domain-specific criteria |
| Deploy | adaptron.deploy | ollama, gguf, huggingface | Export and deploy the fine-tuned model |
Playground
The playground lets you interact with deployed models:
- Chat mode -- streaming conversation with any Ollama-hosted model
- RAG toggle -- augment prompts with relevant context from ChromaDB
- Comparison mode -- send the same prompt to two models side-by-side (web UI)
- CLI access --
adaptron playground --model adaptron-mymodel --rag
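Conceptually, the RAG toggle prepends retrieved context to the user's prompt before it reaches the model. The sketch below illustrates that idea with retrieval stubbed out; Adaptron's actual retrieval uses ChromaDB, and the `augment_prompt` helper and prompt template here are hypothetical.

```python
def augment_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Prepend retrieved context chunks to a question (toy RAG augmentation)."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Answer using the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# In the real playground, the chunks would come from a vector-store query.
prompt = augment_prompt(
    "What is Adaptron?",
    ["Adaptron is a fine-tuning framework."],
)
print(prompt)
```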
Configuration
Adaptron uses a YAML configuration file (adaptron.yaml):
# Wizard answers -- drive automatic strategy selection
wizard:
  primary_goal: qa_docs    # qa_docs | erp_edw | report_gen | specialist
  data_sources:
    - docs                 # docs | erp | edw
  data_freshness: static   # static | monthly | daily | realtime
  hardware: mid            # low | mid | high | cloud
  timeline: medium         # fast | medium | long | unlimited
  accuracy: professional   # professional | enterprise | mission
  model_size: small        # tiny | small | medium | large

# Manual overrides
overrides:
  epochs: 3
  learning_rate: 0.0002
  batch_size: 4
  lora_rank: 64
  max_seq_length: 2048
  quantization: Q4_K_M

data:
  input_dir: ./data
  output_dir: ./output

deploy:
  targets:
    - gguf
    - ollama
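The `overrides:` block suggests a merge of manual values onto wizard-derived defaults. A minimal sketch of that merge, assuming last-write-wins semantics; the default values below are illustrative, not Adaptron's actual defaults.

```python
# Illustrative wizard-derived defaults (hypothetical values).
DEFAULTS = {"epochs": 1, "learning_rate": 2e-4, "batch_size": 8, "lora_rank": 16}

def apply_overrides(defaults: dict, overrides: dict) -> dict:
    """Return a new config where manual overrides win over defaults."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged

config = apply_overrides(DEFAULTS, {"epochs": 3, "lora_rank": 64})
print(config["epochs"], config["lora_rank"])  # 3 64
```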
Development
# Clone and install in dev mode
git clone <repo-url> && cd Adaptron
pip install -e ".[all,dev]"
# Run the test suite
pytest --tb=short -q
# Lint
ruff check adaptron/ tests/
License
MIT License. See LICENSE for details.