Create, retrain, ablate, and convert models for local Ollama
ollama-forge
Get models from Hugging Face, convert them, add adapters, and run them in Ollama — without needing deep expertise. One place for fetch, convert, adapters, and recipes.
Install: `pip install ollama-forge` or `uv tool install ollama-forge` (PyPI). From this repo: `uv sync` then `uv run ollama-forge`; or `uv tool install .` to use the CLI from anywhere.
Quick start:
ollama-forge fetch TheBloke/Llama-2-7B-GGUF --name my-model && ollama run my-model
Or the shortest path: ollama-forge start --name my-model then ollama run my-model.
Documentation (Wiki)
Detailed guides live in the wiki/:
| Topic | Description |
|---|---|
| Installation | Setup, check, doctor, setup-llama-cpp |
| Quick Start | start / quickstart, profiles, task presets |
| Auto & Plan | Auto-detect source, dry-run planner |
| Fetch & Convert | GGUF from HF, GGUF file → Ollama |
| Recipes | One-file YAML/JSON build |
| Modelfile | Ollama Modelfile basics |
| Adapters | LoRA: search, recommend, fetch-adapter, retrain |
| Training Data | JSONL validate, prepare, train script |
| Retrain Pipeline | Data → adapter → Ollama |
| Abliterate | Refusal removal |
| Security Eval | LLM security evaluation: prompt sets, KPIs, UI |
| Downsizing | Teacher → student distillation |
| Hugging Face without GGUF | When the repo has no GGUF |
| Quantization | Smaller/faster GGUF (Q4_K_M, Q8_0, etc.) |
| CI / Automation | Example GitHub Actions |
| Command Reference | All commands |
Why ollama-forge
- One place — Fetch, convert, adapters, recipes; no scattered scripts.
- Simple — Clear commands and docs; try things without being an ML expert.
- Local-first — Get models running in Ollama on your machine.
Setup (one-time)
- Python 3.10+ — From PyPI: `pip install ollama-forge` or `uv tool install ollama-forge`. From the repo: `uv sync` then `uv run ollama-forge`; use `uv tool install .` from the repo root to put `ollama-forge` on your PATH.
- Ollama — Install and ensure `ollama` is on your PATH.
- Verify — `ollama-forge check` to see what's installed; `ollama-forge doctor` for diagnosis, `doctor --fix` to apply safe fixes. See Installation for optional llama.cpp (finetune/quantize).
- Optional extras — `pip install ollama-forge[net]` adds `requests` for HTTP paths (proxy, security-eval, download-lists); `ollama-forge[abliterate]` for abliterate run/proxy (see Abliterate).
- Optional — Run Ruff and tests before commit/push: `git config core.hooksPath .githooks`. See .githooks/README.md. To fix lint before pushing without hooks: `./scripts/lint-fix.sh`.
Commands at a glance
| What you want | Command |
|---|---|
| Easiest one-command start | start or quickstart [--name my-model] |
| Auto-detect source and run | auto <source> [--name my-model] |
| Preview operations (dry-run) | plan <quickstart\|auto\|doctor-fix\|adapters-apply> ... |
| GGUF from HF → Ollama | fetch <repo_id> --name <name> |
| GGUF file → Ollama | convert --gguf <path> --name <name> |
| Find / use adapters | adapters search, adapters recommend, fetch-adapter, retrain |
| One-file config build | build recipe.yaml |
| Check / fix environment | check, doctor [--fix] |
| Install llama.cpp | setup-llama-cpp |
Full list: Command Reference. Run ollama-forge --help for options.
Simplest workflows
Beginner (one command):
uv run ollama-forge start --name my-model
ollama run my-model
Uses default model + balanced profile. Use --profile fast|balanced|quality|low-vram and --task chat|coding|creative. See Quick Start.
Auto (any source): Recipe, GGUF path, HF repo, base model, or adapter — the tool detects and runs the right flow:
uv run ollama-forge auto ./recipe.yaml
uv run ollama-forge auto TheBloke/Llama-2-7B-GGUF --name my-model
uv run ollama-forge auto llama3.2 --name my-assistant --system "You are helpful."
See Auto & Plan.
Fetch from Hugging Face: When the repo has GGUF files:
uv run ollama-forge fetch TheBloke/Llama-2-7B-GGUF --name my-model
ollama run my-model
Use --quant Q4_K_M to pick size. For gated or private repos, set HF_TOKEN or run huggingface-cli login. See Fetch & Convert.
Local GGUF: uv run ollama-forge convert --gguf /path/to/model.gguf --name my-model. Optional --quantize Q4_K_M (needs llama.cpp on PATH). See Quantization.
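Registering a GGUF with Ollama goes through a Modelfile under the hood; a minimal hand-written equivalent looks like this (standard Ollama Modelfile syntax, path and values illustrative):

```
FROM /path/to/model.gguf
PARAMETER temperature 0.7
SYSTEM You are a helpful assistant.
```

The convert command saves you from writing and registering this by hand; see the Modelfile wiki page for the directives it supports.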
Recipe (one file): uv run ollama-forge build recipe.yaml. See Recipes for format and examples. Sampling options (temperature, top_p, repeat_penalty) are available on fetch, convert, build, and create-from-base (Modelfile, Recipes).
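A recipe bundles the source, name, and sampling options described above into one file; a hypothetical sketch (field names are illustrative assumptions — see the Recipes wiki page for the actual schema):

```yaml
# Hypothetical recipe.yaml — field names are illustrative, not the tool's schema
name: my-model
source: TheBloke/Llama-2-7B-GGUF   # HF repo, GGUF path, or base model
quant: Q4_K_M
system: "You are a concise assistant."
params:
  temperature: 0.7
  top_p: 0.9
  repeat_penalty: 1.1
```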
Adapters: adapters search "llama lora", then fetch-adapter <repo> --base <base> --name <name>, or retrain --base <base> --adapter <path> --name <name>. See Adapters.
Training data → model: Validate JSONL, prepare for trainer, generate script: train --data ./data/ --base llama3.2 --name my-model --write-script train.sh. See Training Data and Retrain Pipeline.
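Chat-style training data in JSONL is conventionally one JSON object per line; a hypothetical two-example file (the exact field names the validator expects are documented in the Training Data wiki page):

```jsonl
{"messages": [{"role": "user", "content": "What is GGUF?"}, {"role": "assistant", "content": "A binary format for quantized LLM weights used by llama.cpp."}]}
{"messages": [{"role": "user", "content": "Name one quant type."}, {"role": "assistant", "content": "Q4_K_M."}]}
```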
Other topics
- Hugging Face repo without GGUF — Convert with llama.cpp first, then `convert`. Wiki.
- Refusal removal (abliterate) — `abliterate compute-dir`; optional deps: `uv sync --extra abliterate`. For agents with tool support use the lightweight proxy: `abliterate proxy --name <name>`. Wiki.
- Downsizing (distillation) — `downsize --teacher <hf> --student <hf> --name <name>`. Wiki.
- LLM security evaluation — Run prompt sets against Ollama/serve, score refusal/compliance, get ASR and KPIs: `security-eval run <prompt_set>`. Optional UI: `uv sync --extra security-eval-ui` then `security-eval ui`. Wiki: Security Eval.
- CI — Example GitHub Actions in CI / Automation.
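The security evaluation above reports ASR; conventionally, attack success rate is the fraction of prompts where the model complied rather than refused. A minimal sketch of that arithmetic (the per-prompt result format here is an assumption, not the tool's actual output):

```python
# Hypothetical per-prompt results; "complied" means the refusal scorer
# judged the response as answering the adversarial prompt.
results = [
    {"prompt_id": 1, "complied": True},
    {"prompt_id": 2, "complied": False},
    {"prompt_id": 3, "complied": False},
    {"prompt_id": 4, "complied": True},
]

def attack_success_rate(results):
    """ASR = complied prompts / total prompts (0.0 for an empty run)."""
    if not results:
        return 0.0
    return sum(r["complied"] for r in results) / len(results)

print(attack_success_rate(results))  # 0.5
```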