The Omni-Backend Tokenizer (CPU/CUDA/ROCm)
XERV Crayon v4.0
The Omni-Backend Tokenizer for Specialized AI
Why force a single bloated vocabulary on every problem?
Crayon is a next-generation tokenizer designed for specialization. Hot-swap vocabulary profiles ("Cartridges") optimized for your domain: Quantum Physics, Rust Programming, Financial Law, or anything in between.
Key Features
| Feature | Description |
|---|---|
| Cartridge System | Instantly hot-swap specialized vocabularies (science, code, multilingual) |
| Omni-Backend | Auto-detects and runs on CPU (AVX2), NVIDIA (CUDA), or AMD (ROCm) |
| Native GPU Kernels | "Bare Metal" C++/HIP kernels (no wrappers) for >100M tokens/sec |
| Zero-Copy Mapping | DAT files loaded via mmap for instant startup and minimal RAM |
| Zero-Disk Streaming | Build profiles directly from Hugging Face, with no multi-GB downloads |
| Offline Resilience | Seamless local bootstrap fallback; works offline out of the box |
Benchmarks: The Numbers Speak
100% HONEST. NO SUGARCOATING. DATA-DRIVEN.
Run `python benchmark_competitive.py` to reproduce these results yourself.
Speed Comparison (Omni-Backend)
| Tokenizer | Tokens/sec | vs CRAYON (CPU) |
|---|---|---|
| CRAYON (CPU, AVX2) | 21,863,777 | baseline |
| CRAYON (CUDA, A100) | 140,000,000+ | 6.4x faster |
| tiktoken (GPT-4) | 524,469 | 41x slower |
| HF LLaMA (SP-BPE) | 281,558 | 77x slower |
| HF GPT-2 (BPE) | 237,117 | 92x slower |
| HF BERT (WordPiece) | 202,269 | 108x slower |
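The "vs CRAYON (CPU)" column appears to be the CPU baseline throughput divided by each competitor's throughput, rounded down (for example, 21,863,777 / 524,469 ≈ 41.7, listed as 41x). A minimal sketch of that arithmetic plus a simple best-of-N timing harness; `tokens_per_second` and `times_slower` are hypothetical helpers, not functions from `benchmark_competitive.py`, whose methodology is not shown here.

```python
import math
import time
from typing import Callable, Iterable, List


def tokens_per_second(tokenize: Callable[[str], List], texts: Iterable[str], repeats: int = 3) -> float:
    """Best-of-`repeats` throughput of `tokenize` over `texts` (hypothetical harness)."""
    texts = list(texts)
    best = math.inf
    n_tokens = 0
    for _ in range(repeats):
        start = time.perf_counter()
        n_tokens = sum(len(tokenize(t)) for t in texts)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero-length interval on very coarse clocks
    return n_tokens / best if best > 0 else math.inf


def times_slower(baseline_tps: float, other_tps: float) -> int:
    """How many times slower `other` is than the baseline, rounded down."""
    return math.floor(baseline_tps / other_tps)


if __name__ == "__main__":
    crayon_cpu = 21_863_777  # CRAYON (CPU, AVX2) row of the speed table
    for name, tps in [("tiktoken", 524_469), ("HF LLaMA", 281_558),
                      ("HF GPT-2", 237_117), ("HF BERT", 202_269)]:
        print(f"{name}: {times_slower(crayon_cpu, tps)}x slower")
    # Whitespace splitting as a trivial stand-in tokenizer
    print(f"str.split: {tokens_per_second(str.split, ['the quick brown fox'] * 10_000):,.0f} tokens/sec")
```

Taking the best of several runs reduces noise from other processes; a real comparison would also need identical input text and warm-up runs for every tokenizer.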
CPU Optimization Verification
Measured on Intel Core i3-7020U (low-power laptop CPU)
| Metric | Result |
|---|---|
| AVX2 Status | Active (Simd-Ops v4) |
| Load Time | 0.54 ms (instant hot-swap) |
| Throughput | 21.1M tokens/sec |
Quick Start: The "Omni-Backend"
Run on any hardware with a single line of code. Crayon automatically detects whether AVX2, CUDA, or ROCm is available.
1. Hardware-Aware Initialization
```python
from crayon.core.vocabulary import CrayonVocab

# CPU (Intel/AMD): AVX2/AVX-512 native
vocab = CrayonVocab(device="cpu")

# NVIDIA GPUs (all Tensor Core architectures)
vocab = CrayonVocab(device="cuda")

# AMD GPUs (Instinct/Radeon via HIP/ROCm)
vocab = CrayonVocab(device="rocm")
```
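Crayon's actual detection logic is not shown in this README. A minimal sketch of how an automatic device choice could fall back across backends, assuming a CUDA, then ROCm, then CPU preference order; `resolve_device` and `cpu_has_avx2` are hypothetical helpers, and whether `CrayonVocab` accepts an `"auto"` value is not confirmed here.

```python
from pathlib import Path
from typing import Iterable

# Assumed preference order; Crayon's real policy is not documented here.
PREFERENCE = ("cuda", "rocm", "cpu")


def cpu_has_avx2(cpuinfo: Path = Path("/proc/cpuinfo")) -> bool:
    """Linux-only AVX2 probe: look for the 'avx2' CPU flag; False if unreadable."""
    try:
        text = cpuinfo.read_text()
    except OSError:
        return False
    return any(line.startswith("flags") and "avx2" in line.split()
               for line in text.splitlines())


def resolve_device(requested: str, available: Iterable[str]) -> str:
    """Map a requested device ('auto', 'cpu', 'cuda', 'rocm') to a usable backend."""
    usable = set(available) | {"cpu"}  # the CPU path is assumed to always exist
    if requested == "auto":
        return next(d for d in PREFERENCE if d in usable)
    if requested not in PREFERENCE:
        raise ValueError(f"unknown device {requested!r}")
    if requested not in usable:
        raise RuntimeError(f"{requested} backend requested but not available")
    return requested
```

An explicit request for a missing GPU backend raises here rather than silently falling back, so a misconfigured machine is noticed instead of quietly running on the CPU.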
2. The "Context Manager" Hot-Swap
Instantly switch between specialized vocabularies within the same script without reloading the model.
```python
from crayon.core.vocabulary import CrayonVocab

vocab = CrayonVocab(device="cpu")
vocab.load_profile("lite")
# ... standard tokenization ...

# Temporarily switch to the 'code' profile for a function block
with vocab.using_profile("code"):
    tokens = vocab.tokenize("def fast_inverse_sqrt(x):")
    # Uses the compact Code vocabulary here

# Automatically reverts to 'lite' here
```
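One plausible way a `using_profile`-style context manager can restore the previous profile, even when the block raises, is `contextlib.contextmanager` with `try`/`finally`. This is a sketch under that assumption; `ProfileVocab` is a hypothetical stand-in, not Crayon's `CrayonVocab`.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List, Optional


class ProfileVocab:
    """Hypothetical stand-in that tracks which vocabulary profile is active."""

    def __init__(self, profiles: Dict[str, List[str]]):
        self._profiles = profiles
        self.active: Optional[str] = None

    def load_profile(self, name: str) -> None:
        if name not in self._profiles:
            raise KeyError(f"unknown profile {name!r}")
        self.active = name

    @property
    def vocabulary(self) -> List[str]:
        return self._profiles[self.active] if self.active else []

    @contextmanager
    def using_profile(self, name: str) -> Iterator["ProfileVocab"]:
        previous = self.active
        self.load_profile(name)  # validated before anything changes
        try:
            yield self
        finally:
            # Restore the profile that was active before the `with` block,
            # whether the block finished normally or raised.
            self.active = previous


if __name__ == "__main__":
    vocab = ProfileVocab({"lite": ["the", "a"], "code": ["def", "("]})
    vocab.load_profile("lite")
    with vocab.using_profile("code"):
        print(vocab.active)  # code
    print(vocab.active)  # lite
```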
3. Basic Example
```python
import json
import mmap

from crayon.c_ext.dat_builder import DATBuilder
from crayon.c_ext import crayon_cpu  # Auto-renamed from crayon_fast

# Load any trained vocabulary (a JSON list of tokens)
with open("trained_vocab_code.json", "r", encoding="utf-8") as f:
    vocab_list = json.load(f)

# Compile to DAT (one-time, takes a few seconds)
builder = DATBuilder()
builder.build(vocab_list)
builder.save("vocab_code.dat")

# Load into the C++ engine via memory mapping (instant, <1 ms)
with open("vocab_code.dat", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    crayon_cpu.load_dat(mm)

# Ultra-fast tokenization
code = 'fn main() { println!("Hello, World!"); }'
tokens = crayon_cpu.tokenize(code)
print(f"Tokens: {tokens}")
```
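`DATBuilder.build` is shown taking a plain Python list loaded from JSON. A sketch of a pre-flight check for such a file, assuming (not confirmed here) that each token's ID is its position in the list, so duplicate or non-string entries would make IDs ambiguous; `load_vocab_list` is a hypothetical helper.

```python
import json
from typing import Dict, List, Tuple


def load_vocab_list(path: str) -> Tuple[List[str], Dict[str, int]]:
    """Read a JSON list of tokens and return (tokens, token -> id), assuming id = list index."""
    with open(path, "r", encoding="utf-8") as f:
        tokens = json.load(f)
    if not isinstance(tokens, list):
        raise ValueError("vocabulary file must contain a JSON list")
    token_to_id: Dict[str, int] = {}
    for idx, tok in enumerate(tokens):
        if not isinstance(tok, str) or not tok:
            raise ValueError(f"entry {idx} is not a non-empty string: {tok!r}")
        if tok in token_to_id:
            raise ValueError(f"duplicate token {tok!r} at {token_to_id[tok]} and {idx}")
        token_to_id[tok] = idx
    return tokens, token_to_id
```

Running this before the one-time compile step surfaces a malformed vocabulary as a readable Python error instead of a failure inside the native builder.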
Installation
```bash
git clone https://github.com/Xerv-AI/crayon.git
cd crayon
pip install -e .
```
Build the Extensions
The command is the same in PowerShell (Windows) and Bash (Linux/macOS):
```
python setup.py build_ext --inplace
```
Note: The setup script auto-detects `nvcc` and `hipcc`. If either is found, the corresponding GPU backend is built automatically.
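Before building, you can check which GPU compilers are on `PATH`, roughly mirroring the `nvcc`/`hipcc` detection described above. This is a sketch; the setup script's exact rules (for example, whether it also consults `CUDA_HOME` or `ROCM_PATH`) are not shown in this README, and `detect_gpu_toolchains` is a hypothetical helper.

```python
import shutil
from typing import Dict, Optional


def detect_gpu_toolchains() -> Dict[str, Optional[str]]:
    """Return the path of each GPU compiler found on PATH, or None if missing."""
    return {
        "cuda": shutil.which("nvcc"),   # NVIDIA CUDA compiler driver
        "rocm": shutil.which("hipcc"),  # AMD HIP/ROCm compiler driver
    }


if __name__ == "__main__":
    for backend, path in detect_gpu_toolchains().items():
        print(f"{backend}: {'found at ' + path if path else 'not found'}")
```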
Omni-Backend Architecture (v4.0)
Crayon now uses a multi-backend implementation built on the following pipeline:
```
┌─────────────┐     ┌──────────────┐     ┌─────────────┐     ┌──────────────┐
│ vocab.json  │ ──▶ │  DATBuilder  │ ──▶ │  vocab.dat  │ ──▶ │ Omni-Engine  │
│   (List)    │     │   (Python)   │     │  (Binary)   │     │ CPU/CUDA/HIP │
└─────────────┘     └──────────────┘     └─────────────┘     └──────────────┘
```
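The README does not describe `DATBuilder`'s internals. Reading "DAT" as a double-array trie (an assumption), a minimal pure-Python sketch of the idea: each state `s` moves to `t = base[s] + code` on an input byte, and the move is valid only if `check[t] == s`. Tokenization here is greedy longest match over UTF-8 bytes, which is also an assumption; `build_double_array` and `tokenize` are hypothetical names, not Crayon's API.

```python
from collections import deque
from typing import Dict, List, Tuple

END = -1  # dict key marking "a token ends at this trie node" (bytes are 0..255)


def build_double_array(vocab: List[str]) -> Tuple[List[int], List[int], List[int]]:
    """Compile a token list into (base, check, value) arrays; token id = list index."""
    # 1. Ordinary dict-of-dicts trie over UTF-8 bytes.
    trie: Dict = {}
    for tid, tok in enumerate(vocab):
        node = trie
        for byte in tok.encode("utf-8"):
            node = node.setdefault(byte, {})
        node[END] = tid

    # 2. Lay the trie out in flat arrays, breadth-first. State 0 is the root.
    base, check, value = [0] * 256, [-1] * 256, [-1] * 256
    check[0] = 0  # mark the root slot as occupied
    queue = deque([(0, trie)])
    while queue:
        s, node = queue.popleft()
        value[s] = node.get(END, -1)
        codes = sorted(k + 1 for k in node if k != END)  # byte b -> code b + 1
        if not codes:
            continue
        b = 1
        while True:  # find the smallest offset whose target slots are all free
            need = b + codes[-1] + 1
            if need > len(check):
                grow = need - len(check) + 256
                base += [0] * grow
                check += [-1] * grow
                value += [-1] * grow
            if all(check[b + c] == -1 for c in codes):
                break
            b += 1
        base[s] = b
        for c in codes:
            check[b + c] = s
            queue.append((b + c, node[c - 1]))
    return base, check, value


def tokenize(text: str, base: List[int], check: List[int], value: List[int]) -> List[int]:
    """Greedy longest-match tokenization; raises if no token covers a position."""
    data = text.encode("utf-8")
    ids, i = [], 0
    while i < len(data):
        s, best_id, best_len = 0, -1, 0
        for j in range(i, len(data)):
            t = base[s] + data[j] + 1
            if t >= len(check) or check[t] != s:
                break  # no transition for this byte
            s = t
            if value[s] != -1:
                best_id, best_len = value[s], j - i + 1
        if best_len == 0:
            raise ValueError(f"no token matches at byte offset {i}")
        ids.append(best_id)
        i += best_len
    return ids


if __name__ == "__main__":
    arrays = build_double_array(["a", "b", "ab", "abc", "c"])
    print(tokenize("abcab", *arrays))  # [3, 2]
```

Each lookup step is two array reads and a comparison, which is why this layout suits a native engine: the arrays can be stored flat and read directly from a memory-mapped file.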
| Component | File | Accelerators |
|---|---|---|
| CPU Backend | `c_ext/cpu_engine.cpp` | AVX-512 / AVX2 (Intel/AMD) |
| CUDA Backend | `c_ext/gpu_engine_cuda.cu` | Tensor Cores (NVIDIA Tesla/Ampere) |
| ROCm Backend | `c_ext/rocm_engine.cpp` | CDNA2 / RDNA3 (AMD Instinct/Radeon) |
| Zero-Copy Loader | mmap + buffer protocol | Instant startup (0.5 ms) |
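The zero-copy idea can be sketched with the standard library: write the trie arrays once as raw integers, then map the file read-only and view it through `memoryview.cast`, so nothing is copied or parsed at load time. The file layout below (a 4-byte magic and a count, then the `base` and `check` arrays) is hypothetical and is not Crayon's `.dat` format; `save_arrays` and `MappedArrays` are illustrative names.

```python
import array
import mmap
import struct
from typing import List

MAGIC = b"DAT0"                 # hypothetical file signature
HEADER = struct.Struct("=4sI")  # magic + number of states (native order, no padding)


def save_arrays(path: str, base: List[int], check: List[int]) -> None:
    """Write the header followed by base and check as native C ints."""
    if len(base) != len(check):
        raise ValueError("base and check must have the same length")
    with open(path, "wb") as f:
        f.write(HEADER.pack(MAGIC, len(base)))
        array.array("i", base).tofile(f)
        array.array("i", check).tofile(f)


class MappedArrays:
    """Read-only, zero-copy view of a file written by save_arrays."""

    def __init__(self, path: str):
        with open(path, "rb") as f:
            # The mapping stays valid after the file object is closed.
            self._mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        magic, n = HEADER.unpack_from(self._mm, 0)
        if magic != MAGIC:
            self._mm.close()
            raise ValueError("not a DAT0 file")
        self._view = memoryview(self._mm)
        ints = self._view[HEADER.size:].cast("i")  # reinterpret bytes as C ints, no copy
        self.base = ints[:n]
        self.check = ints[n:2 * n]
        ints.release()

    def close(self) -> None:
        # Every memoryview must be released before the mmap itself can be closed.
        self.base.release()
        self.check.release()
        self._view.release()
        self._mm.close()

    def __enter__(self) -> "MappedArrays":
        return self

    def __exit__(self, *exc) -> None:
        self.close()
```

Because the operating system pages the file in on demand, opening it costs roughly the same regardless of vocabulary size; the trade-off is that the file is tied to the native integer size and byte order of the machine that wrote it.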
Available Cartridges
Five production-ready profiles are defined in `src/crayon/core/profiles.py`:
| Profile | Vocab Size | Optimized For | Sources |
|---|---|---|---|
| lite | 50k | Speed & Mobile | WikiText, RainDrop |
| science | 250k | Reasoning (LaTeX, Quantum, Grad Math) | GRAD, Physics-700 |
| code | 250k | Syntax (Python, Rust, C++, JS) | CodeParrot, The Stack |
| multilingual | 250k | Global (EU langs, Chinese, Hindi) | OSCAR, Wikipedia |
| arts_commerce | 250k | Business (Legal, Finance, Lit) | PG19, Fin Phrasebank |
```python
vocab = CrayonVocab.load_profile("science")
vocab = CrayonVocab.load_profile("multilingual")
```
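A sketch of how the five cartridges in the table could be described as data and validated before loading. Names, sizes, and sources come from the table (50k read as 50,000, matching the verification output's "Vocabulary Size: 50,000 tokens"); the `Cartridge` dataclass and `get_cartridge` helper are hypothetical and do not reflect the actual contents of `src/crayon/core/profiles.py`.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Cartridge:
    name: str
    vocab_size: int
    optimized_for: str
    sources: Tuple[str, ...]


CARTRIDGES: Dict[str, Cartridge] = {
    c.name: c
    for c in (
        Cartridge("lite", 50_000, "Speed & Mobile", ("WikiText", "RainDrop")),
        Cartridge("science", 250_000, "Reasoning (LaTeX, Quantum, Grad Math)", ("GRAD", "Physics-700")),
        Cartridge("code", 250_000, "Syntax (Python, Rust, C++, JS)", ("CodeParrot", "The Stack")),
        Cartridge("multilingual", 250_000, "Global (EU langs, Chinese, Hindi)", ("OSCAR", "Wikipedia")),
        Cartridge("arts_commerce", 250_000, "Business (Legal, Finance, Lit)", ("PG19", "Fin Phrasebank")),
    )
}


def get_cartridge(name: str) -> Cartridge:
    """Look up a cartridge by name, listing the valid names on a typo."""
    try:
        return CARTRIDGES[name]
    except KeyError:
        valid = ", ".join(sorted(CARTRIDGES))
        raise ValueError(f"unknown cartridge {name!r}; expected one of: {valid}") from None
```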
Verify on Google Colab
Want to test the CUDA backend for free?
- Open the notebook.
- Change the runtime type to T4 GPU.
- Run the cells to verify that `crayon_cuda` compiles and processes more than 100M tokens/sec.
Testing & Verification
```bash
# Full verification (benchmarks + tests)
python verify_dat_engine.py

# Benchmark all backends
python benchmark_competitive.py
```
Example verification output:
```
============================================================
XERV CRAYON V2.0 - HYPER-PRODUCTION DAT ENGINE VERIFICATION
============================================================
Vocabulary Size: 50,000 tokens
DAT Nodes: 163,000+
Throughput: 14,255,305 tokens/sec
STATUS: HYPER-PRODUCTION READY
```
Citation
```bibtex
@techreport{xerv2026crayon,
  title={XERV Crayon: A First-Principles Analysis of Production-Grade Tokenization},
  author={Pal, Soham and Xerv Research},
  year={2026},
  institution={Xerv Research Engineering Division}
}
```
License
Copyright (c) 2025-2026 Xerv Research. Released under the MIT License.
Built by Xerv Research Engineering Division.
Star this repo if Crayon helps your project!
Download files
Source Distribution
Built Distribution
File details
Details for the file xerv_crayon-4.0.4.tar.gz.
File metadata
- Download URL: xerv_crayon-4.0.4.tar.gz
- Upload date:
- Size: 5.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3a689c2c26e1b2bb96df18b491e3f182bc3a24441c8e702b6e10c700992166d8 |
| MD5 | eb5f2036565922e79e889ec856460500 |
| BLAKE2b-256 | d82aa9121f6b06781dfb30e23491e5cdcadbac2e2c0f45042ce4cfe10d1d4e15 |
File details
Details for the file xerv_crayon-4.0.4-cp313-cp313-win_amd64.whl.
File metadata
- Download URL: xerv_crayon-4.0.4-cp313-cp313-win_amd64.whl
- Upload date:
- Size: 6.0 MB
- Tags: CPython 3.13, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 722214270714767611200363a8e654b86b51b10b8eaa0aec4640b8c97fcda183 |
| MD5 | ae664c01d6faea33517137614f24d702 |
| BLAKE2b-256 | 68d724ca4be64b096d8dc9592f62009952c196085c78d3efb093b18e1bca6e4c |
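To check a downloaded file against the digests listed above, `hashlib` can compute all three in one pass; BLAKE2b-256 is BLAKE2b with a 32-byte digest. A minimal sketch; `file_digests` and `matches` are illustrative helpers, and the expected value must be copied from the table for the exact file you downloaded.

```python
import hashlib
from typing import Dict


def file_digests(path: str, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Compute SHA256, MD5 and BLAKE2b-256 hex digests of a file in one pass."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        # Read in chunks so large wheels are never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def matches(path: str, algorithm: str, expected_hex: str) -> bool:
    """Compare one digest against a published value (case-insensitive)."""
    return file_digests(path)[algorithm] == expected_hex.strip().lower()
```

For example, `matches("xerv_crayon-4.0.4.tar.gz", "SHA256", "3a689c2c…")` with the full SHA256 value from the source-distribution table should return `True` for an intact download.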