
Omni-Backend Tokenizer - CPU (AVX2/512), CUDA (NVIDIA), ROCm (AMD) with automatic hardware detection

🖍️ XERV Crayon v5.0.1

The Omni-Backend Tokenizer for Specialized AI

PyPI version License: MIT Python 3.12+ CUDA ROCm AVX2

Why force a single bloated vocabulary on every problem?
Crayon is a next-generation tokenizer designed for specialization. Hot-swap vocabulary profiles ("Cartridges") optimized for your domain: Quantum Physics, Rust Programming, Financial Law, or anything in between.


🚀 Key Features

| Feature | Description |
| --- | --- |
| 💾 Cartridge System | Instantly hot-swap specialized vocabularies (science, code, multilingual) |
| 🚀 Omni-Backend | Auto-detects & runs on CPU (AVX2), NVIDIA (CUDA), or AMD (ROCm) |
| ⚡ Hyper-Fast Trainer | C++17 Linked-List BPE trains vocabularies in seconds (100x faster) |
| ⚡ Native GPU Kernels | "Bare Metal" C++/CUDA/HIP kernels (no wrappers) for >10M tokens/sec |
| 🗺️ Zero-Copy Mapping | DAT files loaded via mmap for instant startup & minimal RAM |
| 🌊 Zero-Disk Streaming | Build profiles directly from Hugging Face, with no multi-GB downloads |
| 🛡️ Offline Resilience | Seamless local bootstrap fallback. Works offline out-of-the-box |
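
For readers unfamiliar with BPE (byte-pair encoding), the training loop that the Hyper-Fast Trainer accelerates can be sketched in a few lines of plain Python. This is only an illustration of the general algorithm: the function name `train_bpe` and the tie-breaking rule are assumptions made for this sketch, and it does not reflect Crayon's C++17 linked-list implementation.

```python
from collections import Counter


def train_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words.

    Illustrative only: a naive O(corpus) rescan per merge, not the
    linked-list approach used by an optimized trainer.
    """
    # Each word starts out as a tuple of single characters.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Pick the most frequent pair; ties are broken by the pair itself
        # so that the result is deterministic (an assumption of this sketch).
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        # Rewrite the corpus, fusing each occurrence of the chosen pair.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges


merges = train_bpe(["abab", "abab", "cd"], num_merges=2)
print(merges)
```

Each merge rule becomes one new vocabulary entry, so the number of merges controls the vocabulary size.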

📊 Benchmarks: Production Results

DATA-DRIVEN. NO HYPE. 100% VERIFIED.

🔥 CPU Performance (Intel i3-7020U AVX2)

Even on modest consumer hardware, Crayon's SIMD-accelerated engine runs roughly 67x to 209x faster than the other tokenizers measured below.

| Tokenizer | Tokens/Sec | Relative to CRAYON (Science) |
| --- | --- | --- |
| CRAYON (Science) | 40,808,299 | 1.0x (Baseline) |
| CRAYON (Code) | 34,742,588 | 1.2x slower |
| Tiktoken (GPT-4) | 608,610 | 67.0x slower |
| HF LLaMA | 343,282 | 118.8x slower |
| HF GPT-2 | 307,563 | 132.6x slower |
| HF BERT | 195,108 | 209.1x slower |

⚡ GPU Performance (Tesla T4)

⚡ Installation Summary (T4 GPU Environment)

======================================================================
XERV CRAYON V4.1.9 INSTALLATION AND BENCHMARKS
======================================================================
[1/7] Checking environment...
      PyTorch: 2.9.0+cu126
      CUDA: 12.6 (Tesla T4)
      * Smart Build: Will compile ONLY for this GPU architecture
      NVCC: /usr/local/cuda/bin/nvcc

[2/7] Installing build dependencies...
      Done (ninja, packaging, wheel)

[3/7] Cleaning previous installations...

[4/7] Cloning source code...
      __version__ = "4.1.9"

[5/7] Compiling and Installing (Streaming Logs)...
----------------------------------------------------------------------
[CRAYON-BUILD] Detected GPU: SM 7.5 -> Compiling for sm_75 ONLY
[CRAYON-BUILD] Configuring CUDA extension (max_jobs=1)

building 'crayon.c_ext.crayon_cpu' extension
[1/1] c++ -O3 -march=native -mavx2 -fPIC -std=c++17
Successfully built crayon_cpu.so

building 'crayon.c_ext.crayon_cuda' extension
[1/1] nvcc -O3 -std=c++17 --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75
Successfully built crayon_cuda.so

Successfully installed xerv-crayon-4.1.9
----------------------------------------------------------------------

[6/7] Verifying installation...
      Success! Installed version: 4.1.9
      Backends: {'cpu': True, 'cuda': True, 'rocm': False}

🔥 Performance Results (T4 GPU vs Tiktoken)

CRAYON (CUDA Backend - Tesla T4):

Active Device: CUDA
Backend: cuda_extension

Batch Throughput (XERV CRAYON):
     1,000 docs:      748,048 docs/sec |      9,724,621 tokens/sec
    10,000 docs:      639,239 docs/sec |      8,310,109 tokens/sec
    50,000 docs:      781,129 docs/sec |     10,154,678 tokens/sec

Tiktoken (cl100k_base - CPU):

Tiktoken Batch Throughput (cl100k_base encoding):
     1,000 docs:       87,307 docs/sec |        873,068 tokens/sec
    10,000 docs:       81,658 docs/sec |        816,576 tokens/sec
    50,000 docs:      107,583 docs/sec |      1,075,829 tokens/sec
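
The docs/sec and tokens/sec figures above are throughput measurements over a batch. A minimal sketch of how such numbers can be collected for any tokenizer callable is shown here. The helper `measure_throughput` and the whitespace tokenizer used as a stand-in are assumptions for illustration, not the script that produced the results above.

```python
import time


def measure_throughput(tokenize, docs):
    """Return (docs_per_sec, tokens_per_sec) for one pass over `docs`.

    `tokenize` is any callable mapping a string to a sequence of tokens.
    """
    start = time.perf_counter()
    total_tokens = sum(len(tokenize(doc)) for doc in docs)
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-12)
    return len(docs) / elapsed, total_tokens / elapsed


# Stand-in tokenizer: plain whitespace splitting (not a real subword tokenizer).
docs = ["def fast_inverse_sqrt(x): return x"] * 10_000
docs_per_sec, tokens_per_sec = measure_throughput(str.split, docs)
print(f"{docs_per_sec:,.0f} docs/sec | {tokens_per_sec:,.0f} tokens/sec")
```

For stable numbers it usually helps to run a warm-up pass first and to report the best or median of several runs.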

📈 Performance Comparison Table

| Batch Size | CRAYON Docs/Sec | CRAYON Tokens/Sec | Tiktoken Docs/Sec | Tiktoken Tokens/Sec | Speedup (Tokens/Sec) |
| --- | --- | --- | --- | --- | --- |
| 1,000 | 748,048 | 9,724,621 | 87,307 | 873,068 | 11.1x ✨ |
| 10,000 | 639,239 | 8,310,109 | 81,658 | 816,576 | 10.2x ✨ |
| 50,000 | 781,129 | 10,154,678 | 107,583 | 1,075,829 | 9.4x ✨ |

Average Speedup: 10.2x faster than tiktoken in tokens/sec (Crayon on a Tesla T4 GPU, tiktoken on CPU)
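
The speedup column and the average can be reproduced from the tokens/sec figures in the table. The short check below uses only the numbers reported above; the variable names are illustrative.

```python
# (CRAYON tokens/sec, tiktoken tokens/sec) per batch size, from the table above.
results = {
    1_000: (9_724_621, 873_068),
    10_000: (8_310_109, 816_576),
    50_000: (10_154_678, 1_075_829),
}

# Speedup is the ratio of tokens/sec, rounded to one decimal place.
speedups = {
    batch: round(crayon / tiktoken, 1)
    for batch, (crayon, tiktoken) in results.items()
}
average = round(sum(speedups.values()) / len(speedups), 1)

for batch, ratio in speedups.items():
    print(f"{batch:>6,} docs: {ratio}x")
print(f"average: {average}x")
```

Note that the ratio is taken on tokens/sec rather than docs/sec, and the two tokenizers emit a different number of tokens per document in this benchmark.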

🎯 Key Achievements

  • ✅ >10M tokens/sec on mid-tier GPU (Tesla T4) at the 50,000-document batch size
  • ✅ Smart compilation - Only builds for detected GPU architecture
  • ✅ Zero-copy memory mapping - Instant profile loading (<1ms)
  • ✅ Production-grade stability - Handles 50K+ document batches
  • ✅ Consistent performance - 8.3M to 10.2M tokens/sec across batch sizes

⚡ Quick Start: The "Omni-Backend"

Run on any hardware with a single line of code. Crayon automatically detects whether AVX2, CUDA, or ROCm is available.

1. Hardware-Aware Initialization

from crayon.core.vocabulary import CrayonVocab

# 🔵 CPU (Intel/AMD) - AVX2/AVX-512 Native
vocab = CrayonVocab(device="cpu")

# 🟢 NVIDIA GPUs (All Tensor Core Architectures)
vocab = CrayonVocab(device="cuda")

# 🔴 AMD GPUs (Instinct/Radeon HIP/ROCm)
vocab = CrayonVocab(device="rocm")
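
If you prefer to choose the device explicitly rather than rely on auto-detection, one option is to pick from a backend availability map shaped like the `Backends: {'cpu': True, 'cuda': True, 'rocm': False}` line in the install log. The helper `pick_device` and its preference order are assumptions for illustration. How Crayon itself resolves the device is not documented here.

```python
def pick_device(backends, preference=("cuda", "rocm", "cpu")):
    """Return the first available backend name in `preference` order.

    `backends` maps backend names to booleans, e.g.
    {"cpu": True, "cuda": True, "rocm": False}.
    """
    for name in preference:
        if backends.get(name):
            return name
    raise RuntimeError("no usable backend found")


# Availability map copied from the install log above.
device = pick_device({"cpu": True, "cuda": True, "rocm": False})
print(device)
# The result could then be passed on as CrayonVocab(device=device).
```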

2. The "Context Manager" Hot-Swap

Instantly switch between specialized vocabularies within the same script without reloading the model.

vocab = CrayonVocab(device="cpu")
vocab.load_profile("lite")

# ... standard tokenization ...

# ⚡ TEMPORARY SWITCH to 'code' profile for a function block
with vocab.using_profile("code"):
    tokens = vocab.tokenize("def fast_inverse_sqrt(x):")
    # Uses the compact Code vocabulary here

# 🔥 AUTOMATICALLY REVERT to 'lite' here
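
The revert-on-exit behaviour shown above is the standard context-manager pattern. A toy sketch of how such a switch can be built with `contextlib` follows. The class `ProfileSwitcher` is hypothetical and only tracks a profile name, so it illustrates the pattern rather than Crayon's actual implementation.

```python
from contextlib import contextmanager


class ProfileSwitcher:
    """Toy stand-in that only remembers which profile is active."""

    def __init__(self, profile="lite"):
        self.profile = profile

    def load_profile(self, name):
        self.profile = name

    @contextmanager
    def using_profile(self, name):
        previous = self.profile
        self.load_profile(name)
        try:
            yield self
        finally:
            # Runs on normal exit and on exceptions, so the previous
            # profile is always restored.
            self.load_profile(previous)


switcher = ProfileSwitcher("lite")
with switcher.using_profile("code"):
    print("inside:", switcher.profile)
print("after:", switcher.profile)
```

Putting the restore step in `finally` is what guarantees the revert even when the block raises.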

3. Basic Example

import json
import mmap
from crayon.c_ext.dat_builder import DATBuilder
from crayon.c_ext import crayon_cpu # Auto-renamed from crayon_fast

# Load any trained vocabulary
with open("trained_vocab_code.json", "r") as f:
    vocab_list = json.load(f)

# Compile to DAT (one-time, few seconds)
builder = DATBuilder()
builder.build(vocab_list)
builder.save("vocab_code.dat")

# Load into C++ engine via memory mapping (instant, <1ms)
with open("vocab_code.dat", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    crayon_cpu.load_dat(mm)

# Ultra-fast tokenization 🚀
code = 'fn main() { println!("Hello, World!"); }'
tokens = crayon_cpu.tokenize(code)
print(f"Tokens: {tokens}")
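
Because compiling to DAT is a one-time step, a script that runs repeatedly may want to skip it when the `.dat` file is already up to date. The sketch below compares file modification times using only the standard library. The helper name `needs_rebuild` is an assumption for illustration and is not part of Crayon.

```python
import os


def needs_rebuild(source_path, dat_path):
    """Return True if the compiled file is missing or older than its source."""
    if not os.path.exists(dat_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(dat_path)


# Intended use with the file names from the example above (shown as
# comments because it depends on those files being present):
#
# if needs_rebuild("trained_vocab_code.json", "vocab_code.dat"):
#     ...run the DATBuilder build/save steps...
```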

📦 Installation

pip install xerv-crayon

Google Colab / Linux Installation

Since Crayon includes high-performance C++ extensions, it compiles natively in your environment:

# Run this in a Colab cell
!pip install xerv-crayon

Build the Extensions

PowerShell (Windows):

python setup.py build_ext --inplace

Bash (Linux/Mac):

python setup.py build_ext --inplace

Note: The setup script auto-detects nvcc and hipcc. If found, GPU backends are built automatically.
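
To check in advance which GPU backends a build is likely to pick up, you can look for the same compilers on your `PATH`. This is a sketch using `shutil.which` from the standard library. The function name `detect_gpu_toolchains` is an assumption, and the real setup script may use additional checks.

```python
import shutil


def detect_gpu_toolchains():
    """Report whether the CUDA and ROCm compilers are found on PATH."""
    return {
        "cuda": shutil.which("nvcc") is not None,
        "rocm": shutil.which("hipcc") is not None,
    }


print(detect_gpu_toolchains())
```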


🏎️ Omni-Backend Architecture (v4.0)

Crayon now uses a "God Tier" multi-backend implementation combining:

┌─────────────┐      ┌──────────────┐      ┌─────────────┐      ┌──────────────┐
│ vocab.json  │ ──▶  │ DATCompiler  │ ──▶  │  vocab.dat  │ ──▶  │ Omni-Engine  │
│   (List)    │      │ (C++ Fast)   │      │  (Binary)   │      │ CPU/CUDA/HIP │
└─────────────┘      └──────────────┘      └─────────────┘      └──────────────┘

| Component | File | Accelerators |
| --- | --- | --- |
| CPU Backend | c_ext/cpu_engine.cpp | AVX-512 / AVX2 (Intel/AMD) |
| CUDA Backend | c_ext/gpu_engine_cuda.cu | Tensor Cores (NVIDIA Tesla/Ampere) |
| ROCm Backend | c_ext/rocm_engine.cpp | CDNA2 / RDNA3 (AMD Instinct/Radeon) |
| Zero-Copy Loader | mmap + buffer protocol | Instant startup (0.5ms) |
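
The zero-copy idea behind the loader can be illustrated with the standard library alone: map a binary file and read typed values through a `memoryview` without first copying the file into a Python `bytes` object. The file layout used here (a little-endian uint32 count followed by int32 values) is a made-up example and is not the Crayon DAT format.

```python
import mmap
import os
import struct
import tempfile


def write_table(path, values):
    """Write a hypothetical table: uint32 count, then `count` int32 values."""
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(values)))
        f.write(struct.pack(f"<{len(values)}i", *values))


def open_table(path):
    """Map the file read-only and return (mmap, int32 view over the payload)."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    (count,) = struct.unpack_from("<I", mm, 0)
    # cast("i") uses native byte order, which matches "<i" on
    # little-endian machines (an assumption of this sketch).
    view = memoryview(mm)[4:4 + 4 * count].cast("i")
    return mm, view


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "table.bin")
    write_table(path, [10, 20, 30])
    mm, view = open_table(path)
    print(len(view), view[0], view[-1])
    # Release the view before closing the mapping.
    view.release()
    mm.close()
```

Pages are faulted in by the operating system only when touched, which is why startup cost stays low regardless of file size.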

🧩 Available Cartridges

Six production-ready profiles defined in src/crayon/core/profiles.py:

| Profile | Size | Optimized For | Sources |
| --- | --- | --- | --- |
| standard | 57k | General English (V5 Default) | Lite + Top 10k subwords |
| lite | 50k | Speed & Mobile | WikiText, RainDrop |
| science | 250k | Reasoning (LaTeX, Quantum, Grad Math) | GRAD, Physics-700 |
| code | 250k | Syntax (Python, Rust, C++, JS) | CodeParrot, The Stack |
| multilingual | 250k | Global (EU langs, Chinese, Hindi) | OSCAR, Wikipedia |
| arts_commerce | 250k | Business (Legal, Finance, Lit) | PG19, Fin Phrasebank |

# Create the tokenizer once, then load whichever profile you need
vocab = CrayonVocab(device="auto")
vocab.load_profile("science")
vocab.load_profile("multilingual")

☁️ Verify on Google Colab

✅ Quick Verify Snippet

from crayon import CrayonVocab

# Initialize with Auto-Backend (AVX2/CUDA/ROCm)
tokenizer = CrayonVocab(device="auto")

# 1. Test Standard subword-heavy profile
tokenizer.load_profile("standard")
print(tokenizer.tokenize("that is a test for the standard profile"))

# 2. Test Code specialized profile
tokenizer.load_profile("code")
print(tokenizer.tokenize("def fast_inverse_sqrt(x):"))

🧪 Testing & Verification

# Full verification (Benchmarks + Tests)
python verify_dat_engine.py

# Benchmark all backends
python benchmark_competitive.py

============================================================
XERV CRAYON V4.1.9 - HYPER-PRODUCTION DAT ENGINE VERIFICATION
============================================================
Vocabulary Size: 250,000 tokens
DAT Nodes: 370,000+
Throughput: 40,808,299 tokens/sec
STATUS: ✅ HYPER-PRODUCTION READY

📜 Citation

@techreport{xerv2026crayon,
  title={XERV Crayon: A First-Principles Analysis of Production-Grade Tokenization},
  author={Pal, Soham and Xerv Research},
  year={2026},
  institution={Xerv Research Engineering Division}
}

📄 License

Copyright (c) 2025-2026 Xerv Research. Released under the MIT License.


Built with 💙 by Xerv Research Engineering Division

⭐ Star this repo if Crayon helps your project!
