Skip to main content

Distill frontier model intelligence into tiny, local specialist models

Project description

Crasis

Train once. Run forever. Pay nothing.

"I had 847 unread emails and was paying $34/month to run GPT-4 over my inbox to tell me which ones needed attention. Four minutes later I had an 11MB model that does the same job locally, processes my entire inbox in 11 seconds, costs nothing per inference, and my emails never leave my laptop. Crasis is the tool I built to do that."

Email Urgency Demo


The Problem

Current AI agents are brilliant generalists doing the work of specialists. You are burning tokens — and surrendering privacy — to answer questions like:

  • "Is this WhatsApp message asking about my pricing?"
  • "Does this email need a reply today?"
  • "Is this customer angry?"

A frontier model answering those questions is a nuclear weapon aimed at a mailbox. You're paying $20–50/month, waiting 2–5 seconds per query, and sending your private data to a cloud server — for a yes/no answer a 20MB model could give you in under 100ms, locally, for free, forever.

The token model is extractive by design. Every query is a toll. The provider is incentivized for you to stay dependent. There is no learning curve benefit passed to you. Ever.

Crasis breaks that.


The Solution

Crasis uses a frontier model once — to understand your problem and generate synthetic training data. That intelligence is then distilled into a tiny specialist model that lives on your device.

After that, the frontier model is never called again.

You describe your problem in plain English
            ↓
Crasis calls a frontier model once → generates training data
            ↓
Crasis trains a tiny specialist (4–160MB) on your hardware
            ↓
Specialist deploys locally — runs forever, zero API cost

The frontier model is the architect. The specialist is the worker. You only need the architect once.


The Numbers

Frontier API Crasis Specialist
Model size 4GB+ 4–160MB
Cost per query $0.001–0.01 $0.00
Latency 2–5 seconds <100ms
Works offline No Yes
Data leaves device Yes Never
Gets cheaper over time No Already free
Accuracy on narrow tasks - synthetic data ~97% ~95–99%

See the SCORECARD for numbers on holdout/realistic data.


Pre-Built Specialists

Ten specialists, ready to pull. These cover the tasks people are most commonly paying frontier models to handle. Download, deploy, never pay for them again.

crasis pull whatsapp-triage      # Pricing/availability inquiry detector
crasis pull email-urgency        # Reply-now vs read-later classifier
crasis pull sentiment-gate       # High-arousal anger detection
crasis pull meeting-parser       # Extract who/when/what from scheduling messages
crasis pull pricing-detector     # Is this message asking about cost?
crasis pull spam-filter          # Personalizable noise classifier
crasis pull support-router       # Multi-class ticket categorization
crasis pull social-classifier    # Is this mention worth responding to?
crasis pull invoice-intent       # Payment/billing message detection
crasis pull availability-handler # Scheduling request → calendar link trigger

Each specialist:

  • Ships as an ONNX model — runs anywhere, no GPU required
  • Is 4–160MB on disk (most are under 11MB)
  • Classifies in under 100ms on a laptop CPU
  • Was trained on 3,000–10,000 synthetic examples generated from distillable frontier models
  • Comes with a spec, eval results, and example inference code

Evaluating Accuracy

Pre-built specialists ship with hand-authored holdout fixtures — real-world-style examples that were not generated by the training pipeline. These give an honest accuracy number, separate from the synthetic training metrics.

# Run holdout eval on any specialist
crasis eval -s specialists/spam-filter/spec.yaml \
    -m ./models/spam-filter-onnx \
    --holdout tests/fixtures/spam-filter.jsonl

Output shows both numbers side by side:

  Accuracy (synthetic) : 0.9920   ← how well it learned the training data
  Accuracy (holdout)   : 0.7000   ← how well it works on real text
  Synthetic-real gap   : +0.2920  ← the honest gap

Full results for all 10 specialists are in SCORECARD.md. The two-number format (synthetic + holdout) is how Crasis reports accuracy — because publishing only synthetic accuracy would be misleading.


Build Your Own

Three steps. One coffee.

1. Write a Spec

Describe your problem in plain English. A spec is not code — it's a contract.

# specs/refund-detector.yaml
crasis_spec: v1
name: refund-detector
description: "Detect when a customer message is explicitly requesting a refund"

task:
  type: binary_classification
  trigger: "Customer is asking for their money back"
  ignore: "Customer is complaining but not requesting a refund"

constraints:
  max_model_size_mb: 20
  max_inference_ms: 100
  connectivity: none

quality:
  min_accuracy: 0.95
  eval_on: [ambiguous_phrasing, angry_tone, multiple_languages]

training:
  strategy: synthetic
  volume: 5000

2. Generate Training Data

Crasis calls OpenRouter with enforce_distillable_text: true — routing only to models whose licenses explicitly permit their outputs to be used for training. No ToS violations. No banned keys. Clean provenance on every sample.

export OPENROUTER_API_KEY=sk-or-v1-...
crasis generate --spec specs/refund-detector.yaml --count 5000

Generates ~5,000 labeled examples. Takes ~45 minutes. Costs ~$15 in API credits.

3. Train the Specialist

Runs locally on your GPU. RTX 4060 completes a BERT-Tiny distillation in under 30 minutes.

crasis train --spec specs/refund-detector.yaml --data ./data/refund-detector/train.jsonl

Outputs a ~4.3MB ONNX model. Deploy it anywhere.

crasis export --spec specs/refund-detector.yaml --model ./models/refund-detector

Or: One Command

The three steps above are the manual path — useful for debugging or custom pipelines. For most use cases, crasis build runs the full pipeline in sequence:

crasis build --spec specs/refund-detector.yaml

Generate → train → eval → export. One command, one deployable ONNX package.


Inference

from crasis import Specialist

# Load once
model = Specialist.load("./models/refund-detector-onnx")

# Classify forever — no API calls, no latency, no cost
result = model.classify("I want my money back, this is ridiculous")
# → {"label": "positive", "confidence": 0.97, "latency_ms": 43}

result = model.classify("Your product is terrible but I'll keep it")
# → {"label": "negative", "confidence": 0.94, "latency_ms": 38}

Architecture

trainonce.dev / Crasis Studio   ← Hosted UI, custom builder
        │
Crasis CLI (this repo)          ← FOSS, MIT, runs anywhere
        │
        ├── Spec Parser         ← Converts plain English to training contract
        ├── Data Factory        ← OpenRouter + enforce_distillable_text
        ├── Training Pipeline   ← BERT-Tiny / BERT-Mini / BERT-Small distillation
        ├── Eval Harness        ← Validates against spec quality gates
        ├── ONNX Exporter       ← Universal deployment target

The Specialist Swarm (advanced)

At scale, a single conductor (frontier model) routes tasks to a swarm of local specialists. The frontier model handles edge cases and novel inputs. Specialists handle the 95% routine case — instantly, locally, for free.

Conductor (frontier model — called rarely)
    │
    ├── WhatsApp message → whatsapp-triage specialist
    ├── Email arrives → email-urgency specialist  
    ├── Support ticket → support-router specialist
    └── Novel input → conductor handles, logs for future specialist

The swarm grows as you encounter new patterns. Each new specialist makes the conductor cheaper to run. Costs decrease as capability increases. This is the opposite of how token costs scale today.


Roadmap

Now — FOSS Core

  • Spec format v1
  • OpenRouter data factory with enforce_distillable_text
  • BERT-Tiny / BERT-Mini training pipeline
  • ONNX export and local inference
  • Ten pre-built specialists

Soon — Crasis Studio

Don't want to manage API keys, clean training data, or run your own GPU?

Crasis Studio handles the full pipeline. Describe your problem, receive a deployable specialist. Pay for the build once. Run it forever.

Pricing is based on training complexity — number of samples, task type, and compute time. Transparent, quoted upfront, no subscriptions for the build itself.

Later — Enterprise

On-premise Crasis deployment for regulated environments. HIPAA, GDPR, ITAR-adjacent workflows. Private specialist registries. Audit trails. SLAs. Contact us.


Why This Is Legal

Crasis uses enforce_distillable_text: true on all OpenRouter calls. This flag routes exclusively to models whose authors have explicitly permitted their outputs to be used for training and distillation — Llama variants, Nemotron, DeepSeek, and others.

You are not distilling Claude or GPT-4. You are using openly-licensed models as teachers, exactly as their authors intended. Every training sample has clean provenance. The EU AI Act (August 2026) requires exactly this kind of auditability. Crasis provides it by design.


Hardware Requirements

To build specialists:

  • Any GPU with 6GB+ VRAM (RTX 4060 completes in ~30 minutes)
  • Or: CPU-only training (~4 hours for small specialists)
  • Or: Crasis Studio handles this for you

To run specialists:

  • Any laptop, Raspberry Pi, mobile device, or edge compute target
  • ONNX runtime — available everywhere
  • No GPU required for inference
  • Minimum RAM: ~50MB per specialist

Contributing

The ten pre-built specialists are the beginning. If you train a specialist for a task not covered here, we want it in this repo.

To contribute a specialist:

  1. Train against a spec with min_accuracy: 0.93 or higher
  2. Include the spec, eval results, and at least 100 held-out test examples
  3. Open a PR — specialists that pass eval are merged and published to the Crasis specialist registry

The goal: 100 community specialists by end of 2026. Every one of them a task nobody needs to pay tokens for anymore.


The Bigger Picture

The frontier model era solved the hardest problem in AI: general reasoning at scale. That problem is solved. The next problem is different — how do you take that general capability and make it fast, cheap, private, and permanent for the 95% of tasks that don't require general reasoning?

That's what Crasis is for.

Frontier models are brilliant generalists. Specialists are trained workers. You wouldn't hire a Harvard MBA to answer your phone every time it rings. You'd train a receptionist once and let them handle it forever.

Stop renting intelligence. Own it.


License

MIT. Build cool things. Own your intelligence. Stop paying for tokens you don't need.


Crasis is built by Novitas Ventures LLC. The hosted pipeline and Studio are available at trainonce.dev.

Named after the linguistic process of crasis — the contraction of two elements into one. That's what we do to intelligence: compress the full capability of a frontier model into a single-purpose specialist.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

crasis-1.1.2.tar.gz (54.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

crasis-1.1.2-py3-none-any.whl (41.6 kB view details)

Uploaded Python 3

File details

Details for the file crasis-1.1.2.tar.gz.

File metadata

  • Download URL: crasis-1.1.2.tar.gz
  • Upload date:
  • Size: 54.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for crasis-1.1.2.tar.gz
Algorithm Hash digest
SHA256 0afacca11cb9ff5a3de473b8d8bdc7af7dccd766f0eb14d7d77882c118fc5678
MD5 915c695f06617b481d73a933a84e2d7a
BLAKE2b-256 92bb5b012e91217866756583d58aa7fe78453a27cc6ce69a1c85f39162cb6dfd

See more details on using hashes here.

File details

Details for the file crasis-1.1.2-py3-none-any.whl.

File metadata

  • Download URL: crasis-1.1.2-py3-none-any.whl
  • Upload date:
  • Size: 41.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for crasis-1.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 8d74e39ed788ab6006b1a33d0131568c84e81bb3da9978ca7df0bbf664a7225e
MD5 7a97c5b612bb25fbae2e545fafd29d18
BLAKE2b-256 273fc0827b3e2b568a01dc69a76981e33fe43ef1e14c05ecdc1b4c27e0908acb

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page