splintr-rs

Fast Rust BPE tokenizer with Python bindings

Maintainers

farhansyah

Splintr

A high-performance BPE tokenizer written in Rust, with Python bindings, focused on speed, safety, and resource optimization.

The Problem

Tokenization is everywhere in modern AI. Whether you're building LLM applications, training models, or processing data pipelines, you're tokenizing text constantly. But existing tokenizers have a problem: they're slow.

When you need to tokenize batches of prompts, documents, or training data, you're stuck waiting. Python-based tokenizers can't fully leverage modern multi-core CPUs. You need something faster.

The Solution

Splintr brings Rust performance to Python. Built from the ground up for speed and efficiency:

Batch Encoding Throughput

Configuration	Splintr	Tiktoken	HuggingFace	TokenDagger
1,000 texts	111 MB/s	9 MB/s	28 MB/s	9 MB/s
500 texts	107 MB/s	10 MB/s	27 MB/s	8 MB/s
100 texts	69 MB/s	7 MB/s	20 MB/s	6 MB/s

Roughly 10-12x faster than tiktoken and about 3.5-4x faster than Hugging Face tokenizers on these batch benchmarks. Built in Rust, accessible from Python.
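The headline ratios follow directly from the throughput table. The short calculation below only restates that arithmetic; the dictionary layout and the `speedups` helper are illustrative names, not part of Splintr.

```python
# Throughput figures (MB/s) copied from the batch encoding table.
throughput = {
    1000: {"splintr": 111, "tiktoken": 9, "huggingface": 28, "tokendagger": 9},
    500: {"splintr": 107, "tiktoken": 10, "huggingface": 27, "tokendagger": 8},
    100: {"splintr": 69, "tiktoken": 7, "huggingface": 20, "tokendagger": 6},
}


def speedups(row):
    """Return how many times faster splintr is than each other library."""
    base = row["splintr"]
    return {
        name: round(base / mbps, 1)
        for name, mbps in row.items()
        if name != "splintr"
    }


for batch_size, row in throughput.items():
    print(batch_size, speedups(row))
```

The ratio against tiktoken grows with batch size, which matches the later observation that parallelization overhead is amortized on larger batches.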

Quick Start

Python

pip install splintr-rs

from splintr import Tokenizer

# Load a pretrained vocabulary
tokenizer = Tokenizer.from_pretrained("cl100k_base")  # OpenAI GPT-4/3.5
# tokenizer = Tokenizer.from_pretrained("llama3")      # Meta Llama 3 family
# tokenizer = Tokenizer.from_pretrained("deepseek_v3") # DeepSeek V3/R1
# tokenizer = Tokenizer.from_pretrained("mistral_v1")  # Mistral 7B v0.1/v0.2
# tokenizer = Tokenizer.from_pretrained("mistral_v2")  # Mistral 7B v0.3, Codestral
# tokenizer = Tokenizer.from_pretrained("mistral_v3")  # Mistral NeMo, Large 2

# Encode and decode
tokens = tokenizer.encode("Hello, world!")
text = tokenizer.decode(tokens)

# Batch encode (10-12x faster)
texts = ["Hello, world!", "How are you?", "Machine learning is fun!"]
batch_tokens = tokenizer.encode_batch(texts)

See the API Guide for complete documentation and examples.

Rust

[dependencies]
splintr = "*"  # or pin to a specific version

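use splintr::{Tokenizer, CL100K_BASE_PATTERN};

// `encoder`, `special_tokens`, and `texts` are assumed to be defined elsewhere
// (see the API Guide for loading a vocabulary). The `?` operator requires an
// enclosing function that returns a Result.
let tokenizer = Tokenizer::new(encoder, special_tokens, CL100K_BASE_PATTERN)?;
let tokens = tokenizer.encode("Hello, world!");
let batch_tokens = tokenizer.encode_batch(&texts);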

See the API Guide and docs.rs for complete Rust documentation.

Key Features

Performance where it matters:

10-12x faster batch encoding - Parallel processing across multiple texts using Rayon
3-4x faster single text encoding - Optimized sequential algorithm for typical use cases
Smart parallelization - Sequential for small texts (<1MB), parallel for large datasets
LRU caching - Avoid redundant encoding of frequently seen text chunks

Built for production:

Compatible vocabularies - Supports cl100k_base, o200k_base (OpenAI), Llama 3 family (Meta), DeepSeek V3 (DeepSeek), and Mistral V1/V2/V3 (Mistral AI)
Streaming decoders - Real-time LLM output display with proper UTF-8 handling (guide)
54 agent tokens - Built-in support for chat, CoT reasoning, ReAct agents, tool calling, RAG citations (docs)
Battle-tested algorithms - Regexr with JIT (pure Rust), Aho-Corasick for special tokens, linked-list BPE

Cross-platform:

Python bindings via PyO3 (Linux, macOS, Windows)
Native Rust library for maximum performance

Performance Deep Dive

All benchmarks were run on Linux (6.16.8-arch3-1) with 24 CPU cores, comparing against tiktoken (OpenAI's reference implementation), Hugging Face tokenizers, and TokenDagger.

Single Text Encoding

For single texts, splintr achieves 3-4x faster encoding across various text sizes:

[Figure: Single Text Encoding Comparison]

Latency by content type:

[Figure: Latency Comparison]

Consistent low latency across Python code, JSON, English prose, and Chinese text makes splintr ideal for interactive applications and real-time processing.

Batch Encoding

The real magic happens with batches. Splintr parallelizes across texts to achieve 10-12x speedup:

[Figure: Batch Speedup vs Tiktoken]

Higher speedups on larger batches where parallelization overhead is amortized. Perfect for:

Training data preprocessing
Bulk document tokenization
API batch processing
Data pipeline throughput

Design Decision: Sequential by Default

Splintr uses sequential encoding for single texts and parallel encoding across batches based on empirical benchmarking:

[Figure: Sequential vs Rayon Internal Parallelization]

Key findings:

Sequential is faster for texts up to ~1MB (typical LLM prompts and documents)
Rayon's parallelization overhead only pays off at ~1MB+ text sizes
Most real-world inputs are well under 1MB
encode() uses sequential processing for optimal single-text performance
encode_batch() parallelizes across multiple texts for maximum throughput
encode_rayon() is available for the rare cases where a single text exceeds 1MB

This architecture ensures splintr is optimized for the most common tokenization patterns in LLM applications.
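As a rough illustration of the size-based decision described in the key findings, the sketch below picks a path from the UTF-8 size of one text. `choose_strategy` and `PARALLEL_THRESHOLD_BYTES` are hypothetical names introduced here, and the exact 1,000,000-byte cutoff is an assumption standing in for the "~1MB" crossover; per the list above, Splintr leaves this choice to the caller via encode() and encode_rayon().

```python
# Assumed cutoff: the text reports a crossover at roughly 1 MB.
PARALLEL_THRESHOLD_BYTES = 1_000_000


def choose_strategy(text: str, threshold: int = PARALLEL_THRESHOLD_BYTES) -> str:
    """Pick an encoding path from the UTF-8 size of a single text."""
    size = len(text.encode("utf-8"))
    return "parallel" if size >= threshold else "sequential"


print(choose_strategy("A typical LLM prompt."))
print(choose_strategy("x" * 2_000_000))
```

In application code, "sequential" would map to calling encode() and "parallel" to encode_rayon().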

Running Benchmarks Yourself

# Clone and install
git clone https://github.com/ml-rust/splintr.git
cd splintr
pip install -e .
pip install tiktoken

# Run the benchmark suite
cd benchmarks
python benchmark.py --model cl100k_base --output results/my_benchmark.json

# View results
cat results/my_benchmark.md

The benchmark suite tests single text encoding, batch encoding, streaming decoder performance, and special token handling across various content types.
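For readers who want a quick sanity check before running the full suite, here is a minimal sketch of how a MB/s figure can be computed. It is not the code in benchmark.py; `measure_throughput_mb_s` and `fake_encode_batch` are hypothetical names, and 1 MB is assumed to mean 1,000,000 bytes. Any callable that takes a list of strings can be passed in place of the stand-in encoder.

```python
import time


def measure_throughput_mb_s(encode_batch, texts, repeats=3):
    """Return the best observed throughput in MB/s over several runs."""
    total_bytes = sum(len(t.encode("utf-8")) for t in texts)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode_batch(texts)
        best = min(best, time.perf_counter() - start)
    return total_bytes / best / 1_000_000


def fake_encode_batch(texts):
    """Stand-in encoder so the sketch runs without a tokenizer installed."""
    return [list(t.encode("utf-8")) for t in texts]


sample_texts = ["Hello, world! " * 100] * 1000
result = measure_throughput_mb_s(fake_encode_batch, sample_texts)
print(f"{result:.1f} MB/s")
```

Taking the best of several runs reduces noise from warm-up and scheduling; reporting the median instead is an equally reasonable choice.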

Regex Backends

Splintr uses a pure-Rust regex engine (regexr) by default, with optional PCRE2 support for compatibility.

Default Backend (regexr):

Pure Rust implementation (no C dependencies)
JIT compilation and SIMD acceleration
Native UTF-8 and Unicode property support

Optional PCRE2 Backend:

from splintr import Tokenizer

# Default: regexr backend (pure Rust)
tokenizer = Tokenizer.from_pretrained("cl100k_base")

# Optional: switch to PCRE2 (requires --features pcre2)
tokenizer = Tokenizer.from_pretrained("cl100k_base").pcre2(True)

To enable PCRE2, build with the feature flag:

maturin develop --release --features pcre2

Benchmarking:

# Compare backends (requires PCRE2 feature)
python benchmarks/benchmark_regexr_comparison.py --model cl100k_base

# Visual comparison with charts
python benchmarks/benchmark_regexr_viz.py --model cl100k_base

Streaming Decoders

For real-time LLM applications where tokens arrive one at a time, Splintr provides streaming decoders that handle UTF-8 boundary alignment:

# Regular streaming decoder (cl100k_base, o200k_base, llama3)
decoder = tokenizer.streaming_decoder()

# For ByteLevel vocabularies (deepseek_v3, GPT-2), create this decoder instead:
# decoder = tokenizer.byte_level_streaming_decoder()

# Process tokens as they arrive; token_stream is any iterable of token IDs
for token_id in token_stream:
    if text := decoder.add_token(token_id):
        print(text, end="", flush=True)
print(decoder.flush())

Why streaming decoders? BPE tokens don't align with UTF-8 character boundaries. A multi-byte character like "世" might split across tokens. The streaming decoder buffers incomplete sequences and only outputs complete characters.
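The buffering idea can be sketched with Python's standard incremental UTF-8 decoder. This is an illustration of the technique only, not Splintr's implementation: `Utf8StreamBuffer` and `add_bytes` are hypothetical names, and the sketch works on raw bytes rather than token IDs.

```python
import codecs


class Utf8StreamBuffer:
    """Buffer raw token bytes and emit text only at complete UTF-8 boundaries."""

    def __init__(self):
        # The incremental decoder keeps any trailing partial sequence internally.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

    def add_bytes(self, chunk: bytes):
        """Return newly completed text, or None if nothing is ready yet."""
        text = self._decoder.decode(chunk)
        return text or None

    def flush(self) -> str:
        """Emit whatever remains once the stream has ended."""
        return self._decoder.decode(b"", final=True)


encoded = "世".encode("utf-8")  # a three-byte character
buf = Utf8StreamBuffer()
first = buf.add_bytes(encoded[:2])   # incomplete: nothing to show yet
second = buf.add_bytes(encoded[2:])  # final byte arrives: character is complete
print(repr(first), repr(second), repr(buf.flush()))
```

The first call yields nothing because only two of the three bytes have arrived; the character appears once the last byte is added.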

See the API Guide for detailed usage, examples, and best practices.

Supported Vocabularies

Vocabulary	Used By	Vocabulary Size	Special Tokens	Import Constant
`cl100k_base`	GPT-4, GPT-3.5-turbo	~100,000	5 + 54 agent	`CL100K_BASE_PATTERN`
`o200k_base`	GPT-4o	~200,000	2 + 54 agent	`O200K_BASE_PATTERN`
`llama3`	Llama 3, 3.1, 3.2, 3.3 (Meta)	~128,000	11 + 54 agent	`LLAMA3_PATTERN`
`deepseek_v3`	DeepSeek V3, DeepSeek R1	~128,000	17 + 54 agent	`LLAMA3_PATTERN`
`mistral_v1`	Mistral 7B v0.1/v0.2, Mixtral 8x7B	~32,000	3 + 54 agent	`SENTENCEPIECE_PATTERN`
`mistral_v2`	Mistral 7B v0.3, Codestral, 8x22B	~32,768	10 + 54 agent	`SENTENCEPIECE_PATTERN`
`mistral_v3`	Mistral NeMo, Large 2, Pixtral	~131,000	10 + 54 agent	`MISTRAL_V3_PATTERN`

OpenAI standard tokens:

cl100k_base: <|endoftext|>, <|fim_prefix|>, <|fim_middle|>, <|fim_suffix|>, <|endofprompt|>
o200k_base: <|endoftext|>, <|endofprompt|>

Meta Llama 3 standard tokens:

llama3: <|begin_of_text|>, <|end_of_text|>, <|start_header_id|>, <|end_header_id|>, <|eot_id|>, <|eom_id|> (3.1+), <|python_tag|> (3.1+), <|step_id|> (3.2-Vision), <|image|> (3.2-Vision)

DeepSeek V3 standard tokens:

deepseek_v3: <｜begin▁of▁sentence｜>, <｜end▁of▁sentence｜>, <think>, </think>, <｜User｜>, <｜Assistant｜>, <|EOT|>, FIM tokens (<｜fim▁hole｜>, <｜fim▁begin｜>, <｜fim▁end｜>), tool calling tokens (<｜tool▁calls▁begin｜>, <｜tool▁call▁begin｜>, etc.)

Mistral standard tokens:

mistral_v1: <unk>, <s>, </s> (SentencePiece native)
mistral_v2: Same as V1 + control tokens: [INST], [/INST], [TOOL_CALLS], [AVAILABLE_TOOLS], [/AVAILABLE_TOOLS], [TOOL_RESULTS], [/TOOL_RESULTS]
mistral_v3: <unk>, <s>, </s> + control tokens (Tekken/Tiktoken-based, NOT SentencePiece)

Agent Tokens (54 per model)

Splintr extends all vocabularies with 54 specialized tokens for building agent systems:

from splintr import Tokenizer, CL100K_AGENT_TOKENS

tokenizer = Tokenizer.from_pretrained("cl100k_base")
text = "<|think|>Let me reason...<|/think|>The answer is 42."
tokens = tokenizer.encode_with_special(text)
print(CL100K_AGENT_TOKENS.THINK)      # 100282
print(CL100K_AGENT_TOKENS.FUNCTION)   # 100292

Category	Example Tokens	Purpose
Conversation	`system`, `user`, `assistant`, `im_start`, `im_end`	ChatML format
Thinking	`think`	Chain-of-Thought reasoning
ReAct	`plan`, `step`, `act`, `observe`	Agent action loops
Tools	`function`, `result`, `error`	Function calling
RAG	`context`, `quote`, `cite`, `source`	Citations

See docs/special_tokens.md for the complete list and API Guide for usage examples.
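Conceptually, encoding with special tokens means splitting the input around those tokens so that they map to single IDs while the text between them goes through ordinary BPE. The sketch below shows only that splitting step. It uses a regex alternation for brevity, whereas Splintr is described as using Aho-Corasick; `split_on_special` is a hypothetical helper, and only the two tokens from the example above are listed.

```python
import re

# Two of the agent tokens shown above; a real vocabulary has many more.
SPECIAL_TOKENS = ["<|think|>", "<|/think|>"]

# Longest-first alternation so a longer token wins over a shorter prefix.
_pattern = re.compile(
    "("
    + "|".join(re.escape(t) for t in sorted(SPECIAL_TOKENS, key=len, reverse=True))
    + ")"
)


def split_on_special(text):
    """Return (segment, is_special) pairs in their original order."""
    parts = _pattern.split(text)
    return [(p, p in SPECIAL_TOKENS) for p in parts if p]


segments = split_on_special("<|think|>Let me reason...<|/think|>The answer is 42.")
for segment, is_special in segments:
    print(is_special, repr(segment))
```

Segments flagged as special would be looked up directly in the special-token table, and the rest would be passed to the regular encoder.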

How It Works

Splintr implements several optimizations that make tokenization faster:

Regexr with JIT compilation: Pure Rust regex engine with SIMD acceleration
Rayon parallelism: Leverages multiple CPU cores for batch encoding
Linked-list BPE algorithm: Avoids O(N²) complexity on pathological inputs
FxHashMap: Faster lookups than default SipHash for non-adversarial contexts
Aho-Corasick for special tokens: Fast multi-pattern matching without regex alternation
LRU cache: Avoids redundant BPE encoding of frequently seen chunks
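To make the caching point concrete, here is a toy sketch of BPE merging with an LRU cache in front of it. The merge table is invented for illustration, and the loop is a simple rescan-and-merge, not the linked-list algorithm Splintr uses; `bpe_chunk` and `MERGE_RANKS` are hypothetical names.

```python
from functools import lru_cache

# Toy merge table: lower rank merges first. Not a real vocabulary.
MERGE_RANKS = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}


@lru_cache(maxsize=4096)
def bpe_chunk(chunk: str) -> tuple:
    """Repeatedly apply the lowest-ranked adjacent merge until none applies."""
    parts = list(chunk)
    while len(parts) > 1:
        ranked = [
            (MERGE_RANKS[pair], i)
            for i, pair in enumerate(zip(parts, parts[1:]))
            if pair in MERGE_RANKS
        ]
        if not ranked:
            break
        _, i = min(ranked)
        # Replace the two pieces at positions i and i+1 with their concatenation.
        parts[i:i + 2] = [parts[i] + parts[i + 1]]
    return tuple(parts)


for word in ["lower", "low", "lower"]:
    print(word, bpe_chunk(word))
print(bpe_chunk.cache_info())
```

The second lookup of "lower" is served from the cache, so the merge loop runs only once per distinct chunk. Because natural text repeats the same words often, this kind of cache can skip a large share of the merge work.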

Use Cases

LLM Applications:

Tokenizing prompts with 3-4x lower latency
Streaming decoder for real-time output display
Token counting for API cost estimation

Agent Systems:

Building ReAct agents with structured reasoning tokens
Tool-calling systems with function tokens
Chain-of-Thought reasoning with thinking tokens

Training Pipelines:

Fast batch encoding of large datasets (10-12x speedup)
Preprocessing millions of documents efficiently
Parallel tokenization across distributed systems

RAG Applications:

Structured context injection with citation tokens
Document chunking with section markers
Source tracking through tokenization

Data Processing:

Bulk document tokenization
Multi-language text processing
Real-time text preprocessing

Contributing

Contributions are welcome! Here's how you can help:

Report bugs: Open an issue with a minimal reproduction case
Suggest features: Describe your use case and why the feature would be helpful
Submit pull requests:
- Add tests for new functionality
- Run cargo test and cargo clippy before submitting
- Update documentation as needed

Development Setup

# Clone the repository
git clone https://github.com/ml-rust/splintr.git
cd splintr

# Install pre-commit hook (recommended)
cp hooks/pre-commit .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit

# Build the Rust library
cargo build --release

# Build Python bindings
pip install maturin
maturin develop --release

# Run tests
cargo test                    # Rust tests
cargo clippy --all-targets    # Linting
cargo fmt --all --check       # Format check

The pre-commit hook automatically runs formatting, clippy, and tests before each commit.

Acknowledgments

Splintr builds upon concepts from:

tiktoken - OpenAI's reference BPE tokenizer
tokenizers - Hugging Face's tokenization library

The performance optimizations are informed by profiling real-world usage patterns in LLM applications.

Citation

If you use Splintr in your research, please cite:

@software{splintr,
  author = {Farhan Syah},
  title = {Splintr: High-Performance BPE Tokenizer},
  year = {2025},
  url = {https://github.com/ml-rust/splintr}
}

Release history

0.9.1

Mar 15, 2026

0.9.1b1 pre-release

Mar 15, 2026

0.9.0

Mar 14, 2026

0.9.0b1 pre-release

Mar 11, 2026

0.8.0

Dec 24, 2025

0.8.0b2 pre-release (this version)

Dec 24, 2025

0.7.0b4 pre-release

Dec 2, 2025

0.7.0b3 pre-release

Dec 2, 2025

0.6.0

Nov 26, 2025

0.5.0

Nov 26, 2025

0.4.0

Nov 26, 2025

0.3.0

Nov 26, 2025

0.2.0b3 pre-release

Nov 26, 2025

0.2.0b2 pre-release

Nov 26, 2025

0.1.0b1 pre-release

Nov 26, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

splintr_rs-0.8.0b2.tar.gz (7.3 MB)

Uploaded Dec 24, 2025, Source

Built Distributions

splintr_rs-0.8.0b2-cp312-cp312-win_amd64.whl (13.7 MB)

Uploaded Dec 24, 2025, CPython 3.12, Windows x86-64

splintr_rs-0.8.0b2-cp312-cp312-macosx_11_0_arm64.whl (13.5 MB)

Uploaded Dec 24, 2025, CPython 3.12, macOS 11.0+ ARM64

splintr_rs-0.8.0b2-cp312-cp312-macosx_10_12_x86_64.whl (13.5 MB)

Uploaded Dec 24, 2025, CPython 3.12, macOS 10.12+ x86-64

splintr_rs-0.8.0b2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.8 MB)

Uploaded Dec 24, 2025, CPython 3.8, manylinux: glibc 2.17+ x86-64

File details

Details for the file splintr_rs-0.8.0b2.tar.gz.

File metadata

Download URL: splintr_rs-0.8.0b2.tar.gz
Upload date: Dec 24, 2025
Size: 7.3 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for splintr_rs-0.8.0b2.tar.gz
Algorithm	Hash digest
SHA256	`71e87def8abeb9e3b1a93a256653a9b02048c8ef8736aec60d6cf99687cde411`
MD5	`409131a7355b92e78e6c1c2087b013d7`
BLAKE2b-256	`4ff8c3a28ee00b4b24e1fdeaff97bfa1b9845db7b28f50717e1b7ca38fc31dcd`
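A downloaded file can be checked against the SHA256 digest listed above with Python's standard hashlib module. This is a minimal sketch: `sha256_of_file` and `verify` are hypothetical helper names, and the path passed to `verify` is wherever the file was saved locally.

```python
import hashlib

# SHA256 digest listed above for splintr_rs-0.8.0b2.tar.gz.
EXPECTED_SHA256 = "71e87def8abeb9e3b1a93a256653a9b02048c8ef8736aec60d6cf99687cde411"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size blocks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest matches the expected value."""
    return sha256_of_file(path) == expected
```

For example, `verify("splintr_rs-0.8.0b2.tar.gz")` returns True only if the local copy matches the published digest.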


Provenance

The following attestation bundles were made for splintr_rs-0.8.0b2.tar.gz:

Publisher: release.yml on ml-rust/splintr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: splintr_rs-0.8.0b2.tar.gz
- Subject digest: 71e87def8abeb9e3b1a93a256653a9b02048c8ef8736aec60d6cf99687cde411
- Sigstore transparency entry: 778838940
- Sigstore integration time: Dec 24, 2025
Source repository:
- Permalink: ml-rust/splintr@5decda1c57eb43cf31d30380320432405271fed3
- Branch / Tag: refs/tags/v0.8.0-beta.2
- Owner: https://github.com/ml-rust
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@5decda1c57eb43cf31d30380320432405271fed3
- Trigger Event: push

File details

Details for the file splintr_rs-0.8.0b2-cp312-cp312-win_amd64.whl.

File metadata

Download URL: splintr_rs-0.8.0b2-cp312-cp312-win_amd64.whl
Upload date: Dec 24, 2025
Size: 13.7 MB
Tags: CPython 3.12, Windows x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for splintr_rs-0.8.0b2-cp312-cp312-win_amd64.whl
Algorithm	Hash digest
SHA256	`4aa8ebd157d0bd96f0b9408f4d58f007fde9e32de85be498dc8a6c6ae7831440`
MD5	`8cab88ac9c6a398cc2bc151fec7e277e`
BLAKE2b-256	`6b2bb929248f0a5b2d459c4acc7dc08f7603654ec8a1dde6c28b8bac3c546b3d`


Provenance

The following attestation bundles were made for splintr_rs-0.8.0b2-cp312-cp312-win_amd64.whl:

Publisher: release.yml on ml-rust/splintr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: splintr_rs-0.8.0b2-cp312-cp312-win_amd64.whl
- Subject digest: 4aa8ebd157d0bd96f0b9408f4d58f007fde9e32de85be498dc8a6c6ae7831440
- Sigstore transparency entry: 778838973
- Sigstore integration time: Dec 24, 2025
Source repository:
- Permalink: ml-rust/splintr@5decda1c57eb43cf31d30380320432405271fed3
- Branch / Tag: refs/tags/v0.8.0-beta.2
- Owner: https://github.com/ml-rust
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@5decda1c57eb43cf31d30380320432405271fed3
- Trigger Event: push

File details

Details for the file splintr_rs-0.8.0b2-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

Download URL: splintr_rs-0.8.0b2-cp312-cp312-macosx_11_0_arm64.whl
Upload date: Dec 24, 2025
Size: 13.5 MB
Tags: CPython 3.12, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for splintr_rs-0.8.0b2-cp312-cp312-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`7273bc4feac924cfc32608bab457f6ec998345199524be42f1517a7922c9dd47`
MD5	`4e4f9f855b681cf18ef32ce5ac4e28c0`
BLAKE2b-256	`020dd10e4902372f4fe9991a946b13412adb37923f32f33cfea378880e5743c9`


Provenance

The following attestation bundles were made for splintr_rs-0.8.0b2-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release.yml on ml-rust/splintr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: splintr_rs-0.8.0b2-cp312-cp312-macosx_11_0_arm64.whl
- Subject digest: 7273bc4feac924cfc32608bab457f6ec998345199524be42f1517a7922c9dd47
- Sigstore transparency entry: 778838949
- Sigstore integration time: Dec 24, 2025
Source repository:
- Permalink: ml-rust/splintr@5decda1c57eb43cf31d30380320432405271fed3
- Branch / Tag: refs/tags/v0.8.0-beta.2
- Owner: https://github.com/ml-rust
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@5decda1c57eb43cf31d30380320432405271fed3
- Trigger Event: push

File details

Details for the file splintr_rs-0.8.0b2-cp312-cp312-macosx_10_12_x86_64.whl.

File metadata

Download URL: splintr_rs-0.8.0b2-cp312-cp312-macosx_10_12_x86_64.whl
Upload date: Dec 24, 2025
Size: 13.5 MB
Tags: CPython 3.12, macOS 10.12+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for splintr_rs-0.8.0b2-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm	Hash digest
SHA256	`544910a03214511e0510e6c90f1a05e64f2c58c7aabd3ea148cc5bf9d7f07497`
MD5	`e78b374717f14b1f62efc68869413e99`
BLAKE2b-256	`cc8af3ab036c1e3ebeba002da52bf5858f19fea688073ec930157cdab31196f5`


Provenance

The following attestation bundles were made for splintr_rs-0.8.0b2-cp312-cp312-macosx_10_12_x86_64.whl:

Publisher: release.yml on ml-rust/splintr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: splintr_rs-0.8.0b2-cp312-cp312-macosx_10_12_x86_64.whl
- Subject digest: 544910a03214511e0510e6c90f1a05e64f2c58c7aabd3ea148cc5bf9d7f07497
- Sigstore transparency entry: 778838985
- Sigstore integration time: Dec 24, 2025
Source repository:
- Permalink: ml-rust/splintr@5decda1c57eb43cf31d30380320432405271fed3
- Branch / Tag: refs/tags/v0.8.0-beta.2
- Owner: https://github.com/ml-rust
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@5decda1c57eb43cf31d30380320432405271fed3
- Trigger Event: push

File details

Details for the file splintr_rs-0.8.0b2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: splintr_rs-0.8.0b2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: Dec 24, 2025
Size: 13.8 MB
Tags: CPython 3.8, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for splintr_rs-0.8.0b2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`e56f5e64b69875f965c57fc50a41951a13ac400593945617d9095a9cf4503d0f`
MD5	`2608677abdbe2c703df34b1470a4bc7a`
BLAKE2b-256	`5d41a748887d90bc027f61abf04e0a8fe5cd76bc75610aec4b852bbebdb85974`


Provenance

The following attestation bundles were made for splintr_rs-0.8.0b2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on ml-rust/splintr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: splintr_rs-0.8.0b2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: e56f5e64b69875f965c57fc50a41951a13ac400593945617d9095a9cf4503d0f
- Sigstore transparency entry: 778838960
- Sigstore integration time: Dec 24, 2025
Source repository:
- Permalink: ml-rust/splintr@5decda1c57eb43cf31d30380320432405271fed3
- Branch / Tag: refs/tags/v0.8.0-beta.2
- Owner: https://github.com/ml-rust
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@5decda1c57eb43cf31d30380320432405271fed3
- Trigger Event: push
