Omni-Backend Tokenizer - CPU (AVX2/512), CUDA (NVIDIA), ROCm (AMD) with automatic hardware detection

🖍️ XERV Crayon v5.0.1

The Omni-Backend Tokenizer for Specialized AI

Why force a single bloated vocabulary on every problem?
Crayon is a next-generation tokenizer designed for specialization. Hot-swap vocabulary profiles ("Cartridges") optimized for your domain—Quantum Physics, Rust Programming, Financial Law, or anything in between.

🚀 Key Features

Feature	Description
💾 Cartridge System	Instantly hot-swap specialized vocabularies (`science`, `code`, `multilingual`)
🚀 Omni-Backend	Auto-detects & runs on CPU (AVX2), NVIDIA (CUDA), or AMD (ROCm)
⚡ Hyper-Fast Trainer	C++17 Linked-List BPE trains vocabularies in seconds (100x faster)
⚡ Native GPU Kernels	"Bare Metal" C++/CUDA/HIP kernels (no wrappers) for >10M tokens/sec
🗺️ Zero-Copy Mapping	DAT files loaded via `mmap` for instant startup & minimal RAM
🌊 Zero-Disk Streaming	Build profiles directly from Hugging Face—no multi-GB downloads
🛡️ Offline Resilience	Seamless local bootstrap fallback. Works offline out-of-the-box
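
For readers unfamiliar with what a BPE trainer computes, here is a minimal pure-Python sketch of byte-pair-encoding merge learning. It is a textbook illustration only, not Crayon's C++ linked-list implementation; the function name `train_bpe` and its parameters are assumptions made for this example.

```python
from collections import Counter


def train_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merges from a list of words (illustrative only)."""
    # Represent each distinct word as a tuple of symbols, with its frequency.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by pair ordering for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merged = best[0] + best[1]
        # Rewrite every word, replacing occurrences of the chosen pair.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_corpus[tuple(out)] += freq
        corpus = new_corpus
        merges.append(best)
    return merges


merges = train_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
```

This naive version recounts all pairs on every merge, which is the cost that incremental data structures such as linked lists are typically used to avoid.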

📊 Benchmarks — Production Results

DATA-DRIVEN. NO HYPE. 100% VERIFIED.

🔥 CPU Performance (Intel i3-7020U AVX2)

Even on modest consumer hardware, Crayon's SIMD-accelerated engine outperforms the tokenizers below by roughly 67x to 209x in tokens/sec.

Tokenizer	Tokens/Sec	Relative to CRAYON (Science)
CRAYON (Science)	40,808,299	1.0x (Baseline)
CRAYON (Code)	34,742,588	1.2x slower
Tiktoken (GPT-4)	608,610	67.0x slower
HF LLaMA	343,282	118.8x slower
HF GPT-2	307,563	132.6x slower
HF BERT	195,108	209.1x slower
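
The README does not show how these tokens/sec figures were collected. As a rough sketch of one way to measure such a number, the snippet below times a tokenizer callable over a batch of documents. `measure_throughput` is a hypothetical helper, and `str.split` is only a stand-in for a real tokenizer.

```python
import time


def measure_throughput(tokenize, docs, repeats=3):
    """Return the best observed tokens/sec for `tokenize` over `docs`."""
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        total_tokens = sum(len(tokenize(d)) for d in docs)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, total_tokens / elapsed)
    return best


docs = ["the quick brown fox jumps over the lazy dog"] * 10_000
rate = measure_throughput(str.split, docs)
print(f"{rate:,.0f} tokens/sec (whitespace-split stand-in)")
```

Taking the best of several repeats reduces noise from warm-up and scheduling; swapping in another tokenizer only requires passing a different callable.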

⚡ GPU Performance (Tesla T4)

⚡ Installation Summary (T4 GPU Environment)

======================================================================
XERV CRAYON V4.1.9 INSTALLATION AND BENCHMARKS
======================================================================
[1/7] Checking environment...
      PyTorch: 2.9.0+cu126
      CUDA: 12.6 (Tesla T4)
      * Smart Build: Will compile ONLY for this GPU architecture
      NVCC: /usr/local/cuda/bin/nvcc

[2/7] Installing build dependencies...
      Done (ninja, packaging, wheel)

[3/7] Cleaning previous installations...

[4/7] Cloning source code...
      __version__ = "4.1.9"

[5/7] Compiling and Installing (Streaming Logs)...
----------------------------------------------------------------------
[CRAYON-BUILD] Detected GPU: SM 7.5 -> Compiling for sm_75 ONLY
[CRAYON-BUILD] Configuring CUDA extension (max_jobs=1)

building 'crayon.c_ext.crayon_cpu' extension
[1/1] c++ -O3 -march=native -mavx2 -fPIC -std=c++17
Successfully built crayon_cpu.so

building 'crayon.c_ext.crayon_cuda' extension
[1/1] nvcc -O3 -std=c++17 --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75
Successfully built crayon_cuda.so

Successfully installed xerv-crayon-4.1.9
----------------------------------------------------------------------

[6/7] Verifying installation...
      Success! Installed version: 4.1.9
      Backends: {'cpu': True, 'cuda': True, 'rocm': False}

🔥 Performance Results (T4 GPU vs Tiktoken)

CRAYON (CUDA Backend - Tesla T4):

Active Device: CUDA
Backend: cuda_extension

Batch Throughput (XERV CRAYON):
     1,000 docs:      748,048 docs/sec |      9,724,621 tokens/sec
    10,000 docs:      639,239 docs/sec |      8,310,109 tokens/sec
    50,000 docs:      781,129 docs/sec |     10,154,678 tokens/sec

Tiktoken (cl100k_base - CPU):

Tiktoken Batch Throughput (cl100k_base encoding):
     1,000 docs:       87,307 docs/sec |        873,068 tokens/sec
    10,000 docs:       81,658 docs/sec |        816,576 tokens/sec
    50,000 docs:      107,583 docs/sec |      1,075,829 tokens/sec

📈 Performance Comparison Table

Batch Size	CRAYON Docs/Sec	CRAYON Tokens/Sec	Tiktoken Docs/Sec	Tiktoken Tokens/Sec	Speedup (Tokens/Sec)
1,000	748,048	9,724,621	87,307	873,068	11.1x ✨
10,000	639,239	8,310,109	81,658	816,576	10.2x ✨
50,000	781,129	10,154,678	107,583	1,075,829	9.4x ✨

Average Speedup: about 10.2x over tiktoken in tokens/sec (Crayon on a Tesla T4 GPU, tiktoken on CPU)
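
The speedup column is the ratio of the two tokens/sec columns. The short check below recomputes it from the figures in the table; nothing here calls Crayon or tiktoken.

```python
# Tokens/sec figures copied from the comparison table: (crayon, tiktoken).
results = {
    1_000: (9_724_621, 873_068),
    10_000: (8_310_109, 816_576),
    50_000: (10_154_678, 1_075_829),
}

speedups = {n: crayon / tiktoken for n, (crayon, tiktoken) in results.items()}
for n, s in speedups.items():
    print(f"{n:>6,} docs: {s:.1f}x")

average = sum(speedups.values()) / len(speedups)
print(f"average: {average:.2f}x")
```

Because the two tokenizers emit different numbers of tokens per document, the docs/sec ratio is lower than the tokens/sec ratio (about 8.6x at 1,000 docs).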

🎯 Key Achievements

✅ Up to 10.2M tokens/sec on a mid-tier GPU (Tesla T4, 50,000-document batch)
✅ Smart compilation - Only builds for detected GPU architecture
✅ Zero-copy memory mapping - Instant profile loading (<1ms)
✅ Production-grade stability - Handles 50K+ document batches
✅ Consistent performance - 8.3M to 10.2M tokens/sec across batch sizes

⚡ Quick Start: The "Omni-Backend"

Run on any hardware with a single line of code. Crayon automatically detects AVX2, CUDA, or ROCm presence.

1. Hardware-Aware Initialization

from crayon.core.vocabulary import CrayonVocab

# 🔵 CPU (Intel/AMD) - AVX2/AVX-512 Native
vocab = CrayonVocab(device="cpu")

# 🟢 NVIDIA GPUs (All Tensor Core Architectures)
vocab = CrayonVocab(device="cuda")

# 🔴 AMD GPUs (Instinct/Radeon HIP/ROCm)
vocab = CrayonVocab(device="rocm")
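
The README does not describe how automatic detection chooses among backends. A minimal sketch of one plausible fallback policy is shown below, using an availability map shaped like the `Backends:` line in the install log. `pick_device` and its preference order are assumptions for illustration, not Crayon's actual logic.

```python
def pick_device(available, preference=("cuda", "rocm", "cpu")):
    """Return the first preferred backend marked available (hypothetical helper)."""
    for name in preference:
        if available.get(name):
            return name
    raise RuntimeError("no usable backend found")


# Availability map in the same shape as the install log's "Backends" line.
backends = {"cpu": True, "cuda": True, "rocm": False}
print(pick_device(backends))
```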

2. The "Context Manager" Hot-Swap

Instantly switch between specialized vocabularies within the same script without reloading the model.

vocab = CrayonVocab(device="cpu")
vocab.load_profile("lite")

# ... standard tokenization ...

# ⚡ TEMPORARY SWITCH to 'code' profile for a function block
with vocab.using_profile("code"):
    tokens = vocab.tokenize("def fast_inverse_sqrt(x):")
    # Uses the compact Code vocabulary here
    
# 🔥 AUTOMATICALLY REVERT to 'lite' here
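
To show the pattern behind a temporary profile switch, here is a self-contained sketch built on `contextlib.contextmanager`. `ToyVocab` is a stand-in class written for this example; Crayon's real `using_profile` may be implemented differently.

```python
from contextlib import contextmanager


class ToyVocab:
    """Stand-in for a vocabulary object; not Crayon's real class."""

    def __init__(self, profile="lite"):
        self.profile = profile

    def load_profile(self, name):
        self.profile = name

    @contextmanager
    def using_profile(self, name):
        previous = self.profile
        self.load_profile(name)
        try:
            yield self
        finally:
            # Restore the earlier profile even if the block raised.
            self.load_profile(previous)


vocab = ToyVocab("lite")
with vocab.using_profile("code"):
    inside = vocab.profile
after = vocab.profile
print(inside, after)
```

The `try`/`finally` is the important part: the previous profile is restored whether the block exits normally or with an exception.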

3. Basic Example

import json
import mmap
from crayon.c_ext.dat_builder import DATBuilder
from crayon.c_ext import crayon_cpu # Auto-renamed from crayon_fast

# Load any trained vocabulary
with open("trained_vocab_code.json", "r") as f:
    vocab_list = json.load(f)

# Compile to DAT (one-time, few seconds)
builder = DATBuilder()
builder.build(vocab_list)
builder.save("vocab_code.dat")

# Load into C++ engine via memory mapping (instant, <1ms)
with open("vocab_code.dat", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    crayon_cpu.load_dat(mm)

# Ultra-fast tokenization 🚀
code = 'fn main() { println!("Hello, World!"); }'
tokens = crayon_cpu.tokenize(code)
print(f"Tokens: {tokens}")
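
The memory-mapping step can be tried without Crayon installed. The sketch below writes a toy binary table, maps it read-only, and reads a single entry straight from the mapping. The file layout (little-endian 32-bit integers) is invented for this example and is not the real `.dat` format.

```python
import mmap
import os
import struct
import tempfile

# Write a toy binary table of 32-bit integers (a stand-in for a .dat file).
values = [7, 42, 1000, 65536]
path = os.path.join(tempfile.mkdtemp(), "toy.dat")
with open(path, "wb") as f:
    f.write(struct.pack(f"<{len(values)}i", *values))

# Map the file read-only; the OS loads pages lazily on first access.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Read the third entry directly from the mapping, without reading the whole file.
third = struct.unpack_from("<i", mm, 2 * 4)[0]
print(third)
mm.close()
```

The mapping stays valid after the file object is closed, which is why the Basic Example can pass `mm` to the engine from inside a `with open(...)` block.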

📦 Installation

pip install xerv-crayon

Google Colab / Linux Installation

Because Crayon includes high-performance C++ extensions, it compiles natively in your environment:

# Run this in a Colab cell
!pip install xerv-crayon

Build the Extensions

PowerShell (Windows) or Bash (Linux/Mac):

python setup.py build_ext --inplace

Note: The setup script auto-detects nvcc and hipcc. If found, GPU backends are built automatically.
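
As an illustration of this kind of toolchain detection, the sketch below checks whether `nvcc` and `hipcc` are on `PATH` using the standard library. `detect_gpu_toolchains` is a hypothetical helper; the actual checks in Crayon's setup script are not shown in this README.

```python
import shutil


def detect_gpu_toolchains():
    """Report which GPU compilers are on PATH (illustrative, not Crayon's setup.py)."""
    return {
        "cuda": shutil.which("nvcc") is not None,
        "rocm": shutil.which("hipcc") is not None,
    }


found = detect_gpu_toolchains()
print(found)
```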

🏎️ Omni-Backend Architecture (v4.0)

Crayon now uses a "God Tier" multi-backend implementation combining:

┌─────────────┐      ┌──────────────┐      ┌─────────────┐      ┌──────────────┐
│ vocab.json  │ ──▶  │ DATCompiler  │ ──▶  │  vocab.dat  │ ──▶  │ Omni-Engine  │
│   (List)    │      │ (C++ Fast)   │      │  (Binary)   │      │ CPU/CUDA/HIP │
└─────────────┘      └──────────────┘      └─────────────┘      └──────────────┘

Component	File	Accelerators
CPU Backend	`c_ext/cpu_engine.cpp`	AVX-512 / AVX2 (Intel/AMD)
CUDA Backend	`c_ext/gpu_engine_cuda.cu`	Tensor Cores (NVIDIA Tesla/Ampere)
ROCm Backend	`c_ext/rocm_engine.cpp`	CDNA2 / RDNA3 (AMD Instinct/Radeon)
Zero-Copy Loader	`mmap` + buffer protocol	Instant startup (0.5ms)
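
The README does not spell out the matching algorithm used by the engine. Assuming "DAT" refers to a double-array trie and that matching is greedy longest-match (both are assumptions), the lookup can be sketched with a plain nested-dict trie. `build_trie` and `tokenize` are illustrative names, and a real double-array trie would store the same transitions in flat integer arrays.

```python
def build_trie(vocab):
    """Build a nested-dict trie; the key None marks the end of a token and holds its id."""
    root = {}
    for token_id, token in enumerate(vocab):
        node = root
        for ch in token:
            node = node.setdefault(ch, {})
        node[None] = token_id
    return root


def tokenize(text, root, unk_id=-1):
    """Greedy longest-match: at each position take the longest vocabulary entry."""
    ids, i = [], 0
    while i < len(text):
        # Default to consuming one unknown character if nothing matches.
        node, match_id, match_end = root, unk_id, i + 1
        j = i
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                match_id, match_end = node[None], j
        ids.append(match_id)
        i = match_end
    return ids


vocab = ["d", "e", "f", "de", "def", " ", "x"]
trie = build_trie(vocab)
print(tokenize("def x", trie))
```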

🧩 Available Cartridges

6 production-ready profiles defined in src/crayon/core/profiles.py:

Profile	Size	Optimized For	Sources
`standard`	57k	General English (V5 Default)	Lite + Top 10k subwords
`lite`	50k	Speed & Mobile	WikiText, RainDrop
`science`	250k	Reasoning (LaTeX, Quantum, Grad Math)	GRAD, Physics-700
`code`	250k	Syntax (Python, Rust, C++, JS)	CodeParrot, The Stack
`multilingual`	250k	Global (EU langs, Chinese, Hindi)	OSCAR, Wikipedia
`arts_commerce`	250k	Business (Legal, Finance, Lit)	PG19, Fin Phrasebank

from crayon import CrayonVocab
vocab = CrayonVocab(device="auto")
vocab.load_profile("science")
vocab.load_profile("multilingual")  # replaces the active profile

☁️ Verify on Google Colab

✅ Quick Verify Snippet

from crayon import CrayonVocab

# Initialize with Auto-Backend (AVX2/CUDA/ROCm)
tokenizer = CrayonVocab(device="auto")

# 1. Test Standard subword-heavy profile
tokenizer.load_profile("standard")
print(tokenizer.tokenize("that is a test for the standard profile"))

# 2. Test Code specialized profile
tokenizer.load_profile("code")
print(tokenizer.tokenize("def fast_inverse_sqrt(x):"))

🧪 Testing & Verification

# Full verification (Benchmarks + Tests)
python verify_dat_engine.py

# Benchmark all backends
python benchmark_competitive.py

============================================================
XERV CRAYON V4.1.9 - HYPER-PRODUCTION DAT ENGINE VERIFICATION
============================================================
Vocabulary Size: 250,000 tokens
DAT Nodes: 370,000+
Throughput: 40,808,299 tokens/sec
STATUS: ✅ HYPER-PRODUCTION READY

📜 Citation

@techreport{xerv2026crayon,
  title={XERV Crayon: A First-Principles Analysis of Production-Grade Tokenization},
  author={Pal, Soham and Xerv Research},
  year={2026},
  institution={Xerv Research Engineering Division}
}

📄 License

Built with 💙 by Xerv Research Engineering Division

⭐ Star this repo if Crayon helps your project!
