Fuse
Train small LLMs and deploy them for fast structured extraction on CPU.
Fuse lets you pull any GGUF model from HuggingFace, run zero-shot structured extraction with dynamic schemas, fine-tune with LoRA via Unsloth/HuggingFace, and export to GGUF for fast CPU inference. No predefined Pydantic models required.
Install
With uv (recommended)
uv add fusellm
With training support:
uv add "fusellm[training]"
Run without installing
# One-shot extraction — no install needed
uvx fusellm extract "Sarah Chen is a 34-year-old architect at Stripe" \
--model bartowski/Llama-3.2-1B-Instruct-GGUF \
--fields "name:str,age:int,job_title:str"
# Or with a config file
uvx fusellm extract "SpaceX was founded in 2002" \
--config extract_company.yaml
With pip
pip install fusellm
pip install "fusellm[training]"
Quick Start
Pull a model from HuggingFace and extract
import fuse
# Auto-downloads the best Q4 GGUF from HuggingFace Hub
backend = fuse.LlamaCppBackend(model_name="bartowski/Llama-3.2-1B-Instruct-GGUF")
extractor = fuse.Extractor(backend)
# Zero-shot structured extraction — no Pydantic model needed
result = extractor.extract_from_fields(
    "Sarah Chen is a 34-year-old software architect at Stripe.",
    {"name": str, "age": int, "job_title": str, "company": str}
)
# {'name': 'Sarah Chen', 'age': 34, 'job_title': 'software architect', 'company': 'Stripe'}
Use a local GGUF model
backend = fuse.LlamaCppBackend(model_path="./models/llama-3.2-1b-q4.gguf")
extractor = fuse.Extractor(backend)
result = extractor.extract_from_fields(
    "John is 30 years old and knows Python and Rust",
    {"name": str, "age": int, "skills": list[str]}
)
# {'name': 'John', 'age': 30, 'skills': ['Python', 'Rust']}
Config-driven extraction
config = fuse.InferenceConfig(
    model_name="bartowski/Phi-4-mini-instruct-GGUF",
    n_ctx=4096,
    n_threads=8,
    temperature=0.0,
)
backend = fuse.LlamaCppBackend.from_config(config)
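A config-built backend plugs into the same Extractor API shown above; the text and output below are illustrative only:
extractor = fuse.Extractor(backend)
result = extractor.extract_from_fields(
    "Acme Corp reported $4.2M in revenue for Q3 2024.",
    {"company": str, "revenue": str, "quarter": str}
)
# e.g. {'company': 'Acme Corp', 'revenue': '$4.2M', 'quarter': 'Q3 2024'}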
Extract from a JSON schema
schema = fuse.SchemaBuilder.from_json_schema({
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "skills": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "age"],
})
result = extractor.extract("John is 30 and knows Rust", schema)
Let the LLM infer the schema
result = extractor.extract_from_description(
    "The Series A raised $15M from Sequoia, following a $2.5M seed from YC.",
    "Extract monetary amounts, funding round type, and investor names"
)
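Because the model infers the field names itself, the exact keys vary by model and run; an illustrative (not guaranteed) result shape:
# Illustrative only; field names are chosen by the model and will differ:
# {'amounts': ['$15M', '$2.5M'], 'round_type': 'Series A', 'investors': ['Sequoia', 'YC']}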
CLI
Extract with a config file
fuse extract "Sarah Chen is a 34-year-old architect at Stripe" \
--config examples/extract_person.yaml
extract_person.yaml:
model:
  model_name: "bartowski/Llama-3.2-1B-Instruct-GGUF"
  n_ctx: 2048
  temperature: 0.0
fields:
  name: str
  age: int
  job_title: str
  company: str
prompt_format: llama
max_tokens: 256
Extract with inline flags
# HuggingFace model — auto-downloads
fuse extract "SpaceX was founded in 2002" \
--model bartowski/Phi-4-mini-instruct-GGUF \
--fields "company:str,year:int,industry:str"
# Local GGUF model
fuse extract "John is 30" \
--model ./model.gguf \
--fields "name:str,age:int"
# Using a JSON schema file
fuse extract "John is 30 and knows Python" \
--model bartowski/Llama-3.2-1B-Instruct-GGUF \
--schema schema.json
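The schema.json file mirrors the JSON Schema shape accepted by SchemaBuilder.from_json_schema in the Python API; a minimal sketch of writing one, assuming the CLI accepts the same format:
import json

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "skills": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "age"],
}

# Write the schema to disk for use with `fuse extract --schema schema.json`
with open("schema.json", "w") as f:
    json.dump(schema, f, indent=2)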
Train
fuse train --config examples/train_extraction.yaml
Quantize to GGUF
fuse quantize --model ./output --output model.gguf --method q4_0
Supported Models
Any GGUF model on HuggingFace works. Some good small models for CPU extraction:
| Model | Size | HuggingFace Repo |
|---|---|---|
| Llama 3.2 1B Instruct | ~1GB Q4 | bartowski/Llama-3.2-1B-Instruct-GGUF |
| Llama 3.2 3B Instruct | ~2GB Q4 | bartowski/Llama-3.2-3B-Instruct-GGUF |
| Qwen 2.5 1.5B Instruct | ~1GB Q4 | bartowski/Qwen2.5-1.5B-Instruct-GGUF |
| Phi-4 Mini Instruct | ~2.5GB Q4 | bartowski/Phi-4-mini-instruct-GGUF |
Models are auto-downloaded and cached to ~/.cache/fuse/models/.
Development
uv sync --extra dev
uv run nox # All CI checks
uv run nox -s lint # Ruff lint + format
uv run nox -s typecheck # ty type check
uv run nox -s tests # Pytest across Python 3.11-3.13
License
MIT