Autonomous LLM training research stack - build and ship autonomous research systems
Autonomous Research Stack
Build and ship autonomous LLM training research systems
Version: v0.7.3 | License: MIT | Python: 3.11+
An autonomous research stack for continuously improving LLM training through automated experimentation. Inspired by Karpathy's autoresearch, designed for single-GPU research labs.
Quick Start
# Install
pip install autoresearch-stack
# Or from source
git clone https://github.com/iknowkungfubar/autoresearch-stack.git
cd autoresearch-stack
pip install -e .
# Configure (at least one API key)
export ANTHROPIC_API_KEY=sk-ant-...
# or: export OPENAI_API_KEY=sk-...
# Run the data pipeline
autoresearch --prepare-only
# Run 10 autonomous experiments
autoresearch --experiments 10
# Run with custom config
autoresearch -c my_config.yaml -i training_data.txt --experiments 100
# Python module syntax also works
python -m autoresearch --help
Demo: Numpy Training (no GPU required)
# Test the training pipeline without PyTorch
python train_any_llm.py --demo
This runs a complete training loop using the numpy demo model, exercising the curriculum scheduler, loss tracking, and convergence detection — no GPU or PyTorch needed.
Features
Data Pipeline
| Module | What it does |
|---|---|
| data_intelligence.py | Corpus cleaning, noise detection, text repair |
| synthetic_data.py | LLM-powered generation with Evol-Instruct |
| curriculum.py | Adaptive scheduling (linear, exponential, step, adaptive) |
| storage.py | SQLite experiment database with JSONL fallback |
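The linear, exponential, and step schedules named for curriculum.py can be illustrated with a small sketch. This is not the module's actual API: the function name `curriculum_fraction` and the `n_stages` and `rate` parameters are assumptions made for illustration, and the adaptive schedule is omitted because its rule is not described here.

```python
import math


def curriculum_fraction(step: int, total_steps: int, kind: str = "linear",
                        n_stages: int = 4, rate: float = 5.0) -> float:
    """Return the share of the difficulty range unlocked at `step` (0.0 to 1.0).

    Hypothetical helper for illustration; not taken from curriculum.py.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Normalised training progress, clamped to [0, 1].
    t = min(max(step / total_steps, 0.0), 1.0)
    if kind == "linear":
        return t
    if kind == "exponential":
        # Slow start, fast finish; expm1 keeps precision near zero.
        return math.expm1(rate * t) / math.expm1(rate)
    if kind == "step":
        # Difficulty rises in n_stages equal jumps.
        return min(math.floor(t * n_stages) / n_stages, 1.0)
    raise ValueError(f"unknown schedule: {kind}")
```

A caller might use the returned fraction to decide how much of a difficulty-sorted dataset is eligible for sampling at a given training step.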
Experiment Engine
| Module | What it does |
|---|---|
| memory.py | Vector store with semantic search (ChromaDB optional) |
| prioritization.py | Bandit-based selection (UCB1, epsilon-greedy, Thompson) |
| hypothesis.py | LLM-driven hypothesis generation with rule-based fallback |
| feedback.py | Reward computation, failure classification (13 types) |
| multi_agent.py | Multi-agent architecture (research, hypothesis, execution, evaluation) |
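As an illustration of the bandit-based selection listed for prioritization.py, here is a minimal sketch of the textbook UCB1 rule. The function name `ucb1_select` and its arguments are assumptions; the module's real interface may differ.

```python
import math


def ucb1_select(counts: list[int], total_rewards: list[float],
                c: float = math.sqrt(2)) -> int:
    """Pick the index of the next arm (e.g. an experiment category) to try.

    Hypothetical helper: counts[i] is how often arm i was tried and
    total_rewards[i] is the sum of rewards it earned.
    """
    # Try every arm once before comparing confidence bounds.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    # Mean reward plus an exploration bonus that shrinks as an arm is tried more.
    scores = [
        total_rewards[a] / counts[a] + c * math.sqrt(math.log(total) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

With the default `c`, the bonus equals the standard `sqrt(2 ln t / n)` term, so rarely tried arms are revisited even when their average reward so far is low.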
Infrastructure
| Module | What it does |
|---|---|
| sandbox.py | Safe code execution with AST-based validation |
| checkpoint.py | State persistence and resume |
| monitor.py | Real-time status and progress bars |
| daemon.py | Background execution with health checks and auto-restart |
| distribute.py | Multi-node cluster management (Docker/K8s) |
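To give a sense of what AST-based validation can look like before code is executed, the sketch below walks a parsed syntax tree and reports disallowed constructs. The allow and deny lists and the name `validate_source` are assumptions for illustration, not the rules sandbox.py actually enforces, and a static check like this is only one layer of a sandbox.

```python
import ast

# Illustrative policy only; the real sandbox may use different lists.
BANNED_CALLS = {"eval", "exec", "compile", "open", "__import__"}
ALLOWED_IMPORTS = {"math", "random", "statistics"}


def validate_source(source: str) -> list[str]:
    """Return a list of policy violations found in `source` (empty if none)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems: list[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    problems.append(f"import of '{alias.name}' is not allowed")
        elif isinstance(node, ast.ImportFrom):
            module = (node.module or "").split(".")[0]
            if module not in ALLOWED_IMPORTS:
                problems.append(f"import from '{node.module}' is not allowed")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BANNED_CALLS:
                problems.append(f"call to '{node.func.id}' is not allowed")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            # Dunder attribute access is a common sandbox escape route.
            problems.append(f"access to '{node.attr}' is not allowed")
    return problems
```

Code would only be passed on to execution when the returned list is empty; anything else can be logged as a rejected experiment.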
LLM Integration
| Module | What it does |
|---|---|
| providers.py | 17+ LLM providers (Anthropic, OpenAI, OpenRouter, Ollama, vLLM, etc.) |
| orchestrators.py | 7 agent orchestrators (CrewAI, AutoGen, LangChain, etc.) |
| train_any_llm.py | Training abstraction (numpy demo + optional PyTorch) |
Reporting & Analysis
| Module | What it does |
|---|---|
| report.py | Markdown experiment reports with comparison |
| figures.py | Matplotlib visualizations with graceful fallback |
| stats.py | Summary statistics and convergence analysis |
| paper.py | Research paper generation (Markdown/LaTeX) |
| peer_review.py | Peer review simulation (5 reviewer profiles) |
Configuration
All configuration lives in config.yaml. Environment variables override YAML values:
export ANTHROPIC_API_KEY=sk-ant-... # API key (never put in config file!)
export EXPERIMENT_BUDGET=1000 # Override max experiments
export LEARNING_RATE=0.0005 # Override model LR
export SYNTHETIC_USE_LLM=true # Enable LLM data generation
export MEMORY_ENABLED=true # Enable vector memory
Provider Support
Cloud: Anthropic (Claude), OpenAI (GPT-4/4o), OpenRouter, Google Vertex AI, Azure OpenAI, Mistral AI, Cohere, Zen AI
Local: Ollama, vLLM, LM Studio, llama.cpp, LiteLLM, KoboldCPP, LocalAI, Text Generation WebUI
Orchestrators: OpenCode, OpenCrew, AgentForge, CrewAI, AutoGen, LangChain, LlamaIndex
The Metric
val_bpb (validation bits per byte) is the single optimization target. Lower is better.
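For readers unfamiliar with the metric, bits per byte is usually obtained by converting the summed cross-entropy loss from nats to bits and dividing by the number of UTF-8 bytes in the evaluated text, which makes it independent of the tokenizer. The sketch below assumes that usual definition; the function names are illustrative and not taken from this package.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_nll_nats / (math.log(2) * total_bytes)


def bits_per_byte_from_mean_loss(mean_loss_nats: float, n_tokens: int,
                                 total_bytes: int) -> float:
    """Same metric, starting from a mean per-token loss as most trainers report."""
    return bits_per_byte(mean_loss_nats * n_tokens, total_bytes)
```

For example, a mean validation loss of 1.0 nat per token over 1,000 tokens that cover 4,000 bytes of text works out to roughly 0.36 bits per byte.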
Project Constraints (Never Changed)
- val_bpb is the ONLY metric
- ONE change per experiment
- Revert on regression
- Single-GPU focused
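Taken together, the constraints above describe a simple keep-or-revert loop. The sketch below is one way to express it under stated assumptions: `run_trials`, `Trial`, and the `evaluate` callback (which stands in for a full training run that returns val_bpb) are hypothetical names, not the package's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Trial:
    name: str
    val_bpb: float
    kept: bool


def run_trials(baseline_config: dict, baseline_bpb: float,
               changes: list[tuple[str, dict]],
               evaluate: Callable[[dict], float]) -> tuple[dict, float, list[Trial]]:
    """Apply one change at a time, keeping it only if val_bpb improves."""
    best_config = dict(baseline_config)
    best_bpb = baseline_bpb
    history: list[Trial] = []
    for name, change in changes:
        # ONE change per experiment: reject multi-setting edits outright.
        if len(change) != 1:
            raise ValueError("each experiment must change exactly one setting")
        candidate = {**best_config, **change}
        bpb = evaluate(candidate)
        kept = bpb < best_bpb  # val_bpb is the only metric; lower is better
        if kept:
            best_config, best_bpb = candidate, bpb
        # On regression the candidate is simply discarded (the revert).
        history.append(Trial(name, bpb, kept))
    return best_config, best_bpb, history
```

Because each candidate is built from the current best configuration, a rejected change leaves no trace in later experiments.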
Development Status
| Version | Status | Tests | Coverage | Type Safety |
|---------|--------|-------|----------|-------------|
| v0.7.3 | Current | 148 ✅ | 73% | 0 mypy errors |
| v0.7.2 | Shipped | 104 ✅ | 57% | 43 errors |
| v0.7.0 | Shipped | 53 ✅ | - | - |
Testing
# Run all tests
pytest tests/ -q
# With coverage
pytest tests/ -q --cov=./
# Run specific test file
pytest tests/test_providers.py -v
Docker
docker build -t autoresearch-stack .
docker run --rm -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" autoresearch-stack
# Multi-node cluster
docker compose up
References
- Karpathy autoresearch — val_bpb metric
- Ouroboros — Self-modifying systems
- AI Scientist — Paper generation
Contributing
Contributions are welcome! Please read CONTRIBUTING.md for detailed guidelines on our development process, coding standards, PR workflow, and code of conduct.
License
MIT — see LICENSE for details.