Distill frontier model intelligence into tiny, local specialist models

These details have not been verified by PyPI

Project links

Project description

Crasis

Train once. Run forever. Pay nothing.

"I had 847 unread emails and was paying $34/month to run GPT-4 over my inbox to tell me which ones needed attention. Four minutes later I had an 11MB model that does the same job locally, processes my entire inbox in 11 seconds, costs nothing per inference, and my emails never leave my laptop. Crasis is the tool I built to do that."

Email Urgency Demo

The Problem

Current AI agents are brilliant generalists doing the work of specialists. You are burning tokens — and surrendering privacy — to answer questions like:

"Is this WhatsApp message asking about my pricing?"
"Does this email need a reply today?"
"Is this customer angry?"

A frontier model answering those questions is a nuclear weapon aimed at a mailbox. You're paying $20–50/month, waiting 2–5 seconds per query, and sending your private data to a cloud server — for a yes/no answer a 20MB model could give you in under 100ms, locally, for free, forever.

The token model is extractive by design. Every query is a toll. The provider is incentivized for you to stay dependent. There is no learning curve benefit passed to you. Ever.

Crasis breaks that.

The Solution

Crasis uses a frontier model once — to understand your problem and generate synthetic training data. That intelligence is then distilled into a tiny specialist model that lives on your device.

After that, the frontier model is never called again.

You describe your problem in plain English
            ↓
Crasis calls a frontier model once → generates training data
            ↓
Crasis trains a tiny specialist (4–160MB) on your hardware
            ↓
Specialist deploys locally — runs forever, zero API cost

The frontier model is the architect. The specialist is the worker. You only need the architect once.

The Numbers

	Frontier API	Crasis Specialist
Model size	4GB+	4–160MB
Cost per query	$0.001–0.01	$0.00
Latency	2–5 seconds	<100ms
Works offline	No	Yes
Data leaves device	Yes	Never
Gets cheaper over time	No	Already free
Accuracy on narrow tasks - synthetic data	~97%	~95–99%

See the SCORECARD for numbers on holdout/realistic data.

Pre-Built Specialists

Ten specialists, ready to pull. These cover the tasks people are most commonly paying frontier models to handle. Download, deploy, never pay for them again.

crasis pull whatsapp-triage      # Pricing/availability inquiry detector
crasis pull email-urgency        # Reply-now vs read-later classifier
crasis pull sentiment-gate       # High-arousal anger detection
crasis pull meeting-parser       # Extract who/when/what from scheduling messages
crasis pull pricing-detector     # Is this message asking about cost?
crasis pull spam-filter          # Personalizable noise classifier
crasis pull support-router       # Multi-class ticket categorization
crasis pull social-classifier    # Is this mention worth responding to?
crasis pull invoice-intent       # Payment/billing message detection
crasis pull availability-handler # Scheduling request → calendar link trigger

Each specialist:

Ships as an ONNX model — runs anywhere, no GPU required
Is 4–160MB on disk (most are under 11MB)
Classifies in under 100ms on a laptop CPU
Was trained on 3,000–10,000 synthetic examples generated from distillable frontier models
Comes with a spec, eval results, and example inference code

Evaluating Accuracy

Pre-built specialists ship with hand-authored holdout fixtures — real-world-style examples that were not generated by the training pipeline. These give an honest accuracy number, separate from the synthetic training metrics.

# Run holdout eval on any specialist
crasis eval -s specialists/spam-filter/spec.yaml \
    -m ./models/spam-filter-onnx \
    --holdout tests/fixtures/spam-filter.jsonl

Output shows both numbers side by side:

  Accuracy (synthetic) : 0.9920   ← how well it learned the training data
  Accuracy (holdout)   : 0.7000   ← how well it works on real text
  Synthetic-real gap   : +0.2920  ← the honest gap

Full results for all 10 specialists are in SCORECARD.md. The two-number format (synthetic + holdout) is how Crasis reports accuracy — because publishing only synthetic accuracy would be misleading.

Build Your Own

Three steps. One coffee.

1. Write a Spec

Describe your problem in plain English. A spec is not code — it's a contract.

# specs/refund-detector.yaml
crasis_spec: v1
name: refund-detector
description: "Detect when a customer message is explicitly requesting a refund"

task:
  type: binary_classification
  trigger: "Customer is asking for their money back"
  ignore: "Customer is complaining but not requesting a refund"

constraints:
  max_model_size_mb: 20
  max_inference_ms: 100
  connectivity: none

quality:
  min_accuracy: 0.95
  eval_on: [ambiguous_phrasing, angry_tone, multiple_languages]

training:
  strategy: synthetic
  volume: 5000

2. Generate Training Data

Crasis calls OpenRouter with enforce_distillable_text: true — routing only to models whose licenses explicitly permit their outputs to be used for training. No ToS violations. No banned keys. Clean provenance on every sample.

export OPENROUTER_API_KEY=sk-or-v1-...
crasis generate --spec specs/refund-detector.yaml --count 5000

Generates ~5,000 labeled examples. Takes ~45 minutes. Costs ~$15 in API credits.

3. Train the Specialist

Runs locally on your GPU. RTX 4060 completes a BERT-Tiny distillation in under 30 minutes.

crasis train --spec specs/refund-detector.yaml --data ./data/refund-detector/train.jsonl

Outputs a ~4.3MB ONNX model. Deploy it anywhere.

crasis export --spec specs/refund-detector.yaml --model ./models/refund-detector

Or: One Command

The three steps above are the manual path — useful for debugging or custom pipelines. For most use cases, crasis build runs the full pipeline in sequence:

crasis build --spec specs/refund-detector.yaml

Generate → train → eval → export. One command, one deployable ONNX package.

Inference

from crasis import Specialist

# Load once
model = Specialist.load("./models/refund-detector-onnx")

# Classify forever — no API calls, no latency, no cost
result = model.classify("I want my money back, this is ridiculous")
# → {"label": "positive", "confidence": 0.97, "latency_ms": 43}

result = model.classify("Your product is terrible but I'll keep it")
# → {"label": "negative", "confidence": 0.94, "latency_ms": 38}

Architecture

trainonce.dev / Crasis Studio   ← Hosted UI, custom builder
        │
Crasis CLI (this repo)          ← FOSS, MIT, runs anywhere
        │
        ├── Spec Parser         ← Converts plain English to training contract
        ├── Data Factory        ← OpenRouter + enforce_distillable_text
        ├── Training Pipeline   ← BERT-Tiny / BERT-Mini / BERT-Small distillation
        ├── Eval Harness        ← Validates against spec quality gates
        ├── ONNX Exporter       ← Universal deployment target

The Specialist Swarm (advanced)

At scale, a single conductor (frontier model) routes tasks to a swarm of local specialists. The frontier model handles edge cases and novel inputs. Specialists handle the 95% routine case — instantly, locally, for free.

Conductor (frontier model — called rarely)
    │
    ├── WhatsApp message → whatsapp-triage specialist
    ├── Email arrives → email-urgency specialist  
    ├── Support ticket → support-router specialist
    └── Novel input → conductor handles, logs for future specialist

The swarm grows as you encounter new patterns. Each new specialist makes the conductor cheaper to run. Costs decrease as capability increases. This is the opposite of how token costs scale today.

Roadmap

Now — FOSS Core

Spec format v1
OpenRouter data factory with enforce_distillable_text
BERT-Tiny / BERT-Mini training pipeline
ONNX export and local inference
Ten pre-built specialists

Soon — Crasis Studio

Don't want to manage API keys, clean training data, or run your own GPU?

Crasis Studio handles the full pipeline. Describe your problem, receive a deployable specialist. Pay for the build once. Run it forever.

Pricing is based on training complexity — number of samples, task type, and compute time. Transparent, quoted upfront, no subscriptions for the build itself.

Later — Enterprise

On-premise Crasis deployment for regulated environments. HIPAA, GDPR, ITAR-adjacent workflows. Private specialist registries. Audit trails. SLAs. Contact us.

Why This Is Legal

Crasis uses enforce_distillable_text: true on all OpenRouter calls. This flag routes exclusively to models whose authors have explicitly permitted their outputs to be used for training and distillation — Llama variants, Nemotron, DeepSeek, and others.

You are not distilling Claude or GPT-4. You are using openly-licensed models as teachers, exactly as their authors intended. Every training sample has clean provenance. The EU AI Act (August 2026) requires exactly this kind of auditability. Crasis provides it by design.

Hardware Requirements

To build specialists:

Any GPU with 6GB+ VRAM (RTX 4060 completes in ~30 minutes)
Or: CPU-only training (~4 hours for small specialists)
Or: Crasis Studio handles this for you

To run specialists:

Any laptop, Raspberry Pi, mobile device, or edge compute target
ONNX runtime — available everywhere
No GPU required for inference
Minimum RAM: ~50MB per specialist

Contributing

The ten pre-built specialists are the beginning. If you train a specialist for a task not covered here, we want it in this repo.

To contribute a specialist:

Train against a spec with min_accuracy: 0.93 or higher
Include the spec, eval results, and at least 100 held-out test examples
Open a PR — specialists that pass eval are merged and published to the Crasis specialist registry

The goal: 100 community specialists by end of 2026. Every one of them a task nobody needs to pay tokens for anymore.

The Bigger Picture

The frontier model era solved the hardest problem in AI: general reasoning at scale. That problem is solved. The next problem is different — how do you take that general capability and make it fast, cheap, private, and permanent for the 95% of tasks that don't require general reasoning?

That's what Crasis is for.

Frontier models are brilliant generalists. Specialists are trained workers. You wouldn't hire a Harvard MBA to answer your phone every time it rings. You'd train a receptionist once and let them handle it forever.

Stop renting intelligence. Own it.

License

MIT. Build cool things. Own your intelligence. Stop paying for tokens you don't need.

Crasis is built by Novitas Ventures LLC. The hosted pipeline and Studio are available at trainonce.dev.

Named after the linguistic process of crasis — the contraction of two elements into one. That's what we do to intelligence: compress the full capability of a frontier model into a single-purpose specialist.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.3.0

Mar 13, 2026

1.2.0

Mar 10, 2026

This version

1.1.2

Mar 10, 2026

1.1.0

Mar 10, 2026

0.1.1

Mar 7, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

crasis-1.1.2.tar.gz (54.1 kB view details)

Uploaded Mar 10, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

crasis-1.1.2-py3-none-any.whl (41.6 kB view details)

Uploaded Mar 10, 2026 Python 3

File details

Details for the file crasis-1.1.2.tar.gz.

File metadata

Download URL: crasis-1.1.2.tar.gz
Upload date: Mar 10, 2026
Size: 54.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for crasis-1.1.2.tar.gz
Algorithm	Hash digest
SHA256	`0afacca11cb9ff5a3de473b8d8bdc7af7dccd766f0eb14d7d77882c118fc5678`
MD5	`915c695f06617b481d73a933a84e2d7a`
BLAKE2b-256	`92bb5b012e91217866756583d58aa7fe78453a27cc6ce69a1c85f39162cb6dfd`

See more details on using hashes here.

File details

Details for the file crasis-1.1.2-py3-none-any.whl.

File metadata

Download URL: crasis-1.1.2-py3-none-any.whl
Upload date: Mar 10, 2026
Size: 41.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for crasis-1.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8d74e39ed788ab6006b1a33d0131568c84e81bb3da9978ca7df0bbf664a7225e`
MD5	`7a97c5b612bb25fbae2e545fafd29d18`
BLAKE2b-256	`273fc0827b3e2b568a01dc69a76981e33fe43ef1e14c05ecdc1b4c27e0908acb`

See more details on using hashes here.

crasis 1.1.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Crasis

Train once. Run forever. Pay nothing.

The Problem

The Solution

The Numbers

Pre-Built Specialists

Evaluating Accuracy

Build Your Own

1. Write a Spec

2. Generate Training Data

3. Train the Specialist

Or: One Command

Inference

Architecture

Roadmap

Why This Is Legal

Hardware Requirements

Contributing

The Bigger Picture

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes