Modern syntax highlighting for Python 3.14t — O(n) guaranteed, zero ReDoS

⌾⌾⌾ Rosettes

Modern syntax highlighting for Python 3.14t

from rosettes import highlight

html = highlight("def hello(): print('world')", "python")

Why Rosettes?

  • O(n) guaranteed — Hand-written state machines, no regex backtracking
  • Zero ReDoS — No exploitable patterns, safe for untrusted input
  • Thread-safe — Immutable state, optimized for Python 3.14t free-threading
  • Pygments compatible — Drop-in CSS class compatibility
  • 55 languages — Python, JavaScript, Rust, Go, and 51 more

Installation

pip install rosettes

Requires Python 3.14+


Quick Start

| Function | Description |
|----------|-------------|
| highlight(code, lang) | Generate HTML with syntax highlighting |
| tokenize(code, lang) | Get raw tokens for custom processing |
| highlight_many(items) | Parallel highlighting for multiple blocks |
| list_languages() | List all 55 supported languages |

Features

| Feature | Description | Docs |
|---------|-------------|------|
| Basic Highlighting | highlight() and tokenize() functions | Highlighting → |
| Parallel Processing | highlight_many() for multi-core systems | Parallel → |
| Line Highlighting | Highlight specific lines, add line numbers | Lines → |
| CSS Styling | Semantic or Pygments-compatible classes | Styling → |
| Custom Formatters | Build terminal, LaTeX, or custom output | Extending → |

📚 Full documentation: lbliii.github.io/rosettes


Usage

Basic Highlighting — Generate HTML from code
from rosettes import highlight

# Sample source used by the line-number and line-highlight examples
code = "def foo():\n    pass\n\nfoo()\n"

# Basic highlighting
html = highlight("def foo(): pass", "python")
# <div class="rosettes" data-language="python">...</div>

# With line numbers
html = highlight(code, "python", show_linenos=True)

# Highlight specific lines
html = highlight(code, "python", hl_lines={2, 3, 4})
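
To make the two options concrete, here is a minimal sketch of what line numbering and line highlighting could look like at the HTML level. It is not Rosettes' actual output: the function `render_lines`, the class names `line`, `hll` and `lineno`, and the assumption that `hl_lines` is 1-based are all hypothetical and chosen only for illustration.

```python
from html import escape


def render_lines(
    code: str,
    hl_lines: frozenset[int] = frozenset(),
    show_linenos: bool = False,
) -> str:
    """Wrap each source line in a <span>, marking the highlighted ones.

    Assumption: line numbers are 1-based. The class names are made up.
    """
    out = []
    for number, line in enumerate(code.splitlines(), start=1):
        css = "line hll" if number in hl_lines else "line"
        prefix = f'<span class="lineno">{number}</span>' if show_linenos else ""
        out.append(f'<span class="{css}">{prefix}{escape(line)}</span>')
    return "\n".join(out)


rendered = render_lines("a = 1\nb = 2\nc = 3", hl_lines=frozenset({2}), show_linenos=True)
```

Only the second line of `rendered` carries the `hll` class, so a single CSS rule is enough to give highlighted lines a background color.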
Parallel Processing — Speed up multiple blocks

For 8+ code blocks, use highlight_many() for parallel processing:

from rosettes import highlight_many

blocks = [
    ("def foo(): pass", "python"),
    ("const x = 1;", "javascript"),
    ("fn main() {}", "rust"),
]

# Highlight in parallel
results = highlight_many(blocks)

On Python 3.14t with free-threading, this provides a 1.5-2x speedup for 50+ blocks.
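
For readers curious how this kind of fan-out can work, the sketch below maps a highlighter over many blocks with a thread pool. It does not show how highlight_many() is implemented: `highlight_blocks` and `fake_highlight` are hypothetical names, and the stand-in highlighter exists only so the example runs without rosettes installed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

Block = tuple[str, str]  # (code, language)


def highlight_blocks(
    blocks: Iterable[Block],
    highlight_fn: Callable[[str, str], str],
    max_workers: int = 4,
) -> list[str]:
    """Apply highlight_fn to every (code, lang) pair using a thread pool.

    pool.map yields results in input order, so output[i] matches blocks[i].
    """
    items = list(blocks)
    if not items:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda block: highlight_fn(*block), items))


def fake_highlight(code: str, lang: str) -> str:
    # Stand-in for a real highlighter (hypothetical, for illustration only).
    return f'<pre data-language="{lang}">{code}</pre>'


results = highlight_blocks(
    [("def foo(): pass", "python"), ("const x = 1;", "javascript")],
    fake_highlight,
)
```

On a standard (GIL) build, threads like these do not speed up CPU-bound work; the benefit depends on a free-threaded interpreter such as 3.14t.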

Tokenization — Raw tokens for custom processing
from rosettes import tokenize

tokens = tokenize("x = 42", "python")
for token in tokens:
    print(f"{token.type.name}: {token.value!r}")
# NAME: 'x'
# WHITESPACE: ' '
# OPERATOR: '='
# WHITESPACE: ' '
# NUMBER_INTEGER: '42'
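
As one example of custom processing, the sketch below counts token types while skipping whitespace. It relies only on the two attributes used above, `token.type.name` and `token.value`. The `TokenType` and `Token` classes here are local stand-ins so the example runs without rosettes; they are not the library's real classes.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum, auto
from typing import Iterable


class TokenType(Enum):
    # Stand-in enum covering only the types printed in the example above.
    NAME = auto()
    WHITESPACE = auto()
    OPERATOR = auto()
    NUMBER_INTEGER = auto()


@dataclass(frozen=True)
class Token:
    # Stand-in token with the same two attributes the example reads.
    type: TokenType
    value: str


def count_token_types(
    tokens: Iterable[Token],
    skip: frozenset[str] = frozenset({"WHITESPACE"}),
) -> Counter:
    """Count tokens by type name, ignoring the type names listed in skip."""
    return Counter(t.type.name for t in tokens if t.type.name not in skip)


# The token stream shown above for "x = 42", rebuilt by hand.
tokens = [
    Token(TokenType.NAME, "x"),
    Token(TokenType.WHITESPACE, " "),
    Token(TokenType.OPERATOR, "="),
    Token(TokenType.WHITESPACE, " "),
    Token(TokenType.NUMBER_INTEGER, "42"),
]
counts = count_token_types(tokens)
```

With real Rosettes tokens, the same function should work unchanged on the result of tokenize(), assuming the token objects expose those two attributes as shown.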
CSS Class Styles — Semantic or Pygments

Semantic (default) — Readable, self-documenting:

html = highlight(code, "python")
# <span class="syntax-keyword">def</span>
# <span class="syntax-function">hello</span>

.syntax-keyword { color: #ff79c6; }
.syntax-function { color: #50fa7b; }
.syntax-string { color: #f1fa8c; }

Pygments-compatible — Use existing themes:

html = highlight(code, "python", css_class_style="pygments")
# <span class="k">def</span>
# <span class="nf">hello</span>
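
The two naming schemes can share one theme. The sketch below generates CSS rules for either style from a single color table. The keyword and function pairs come from the examples above; `s` is the standard Pygments short class for strings. The helper `build_css` and the mapping table are hypothetical and not part of Rosettes.

```python
# Semantic class -> Pygments short class (illustrative subset).
SEMANTIC_TO_PYGMENTS = {
    "syntax-keyword": "k",
    "syntax-function": "nf",
    "syntax-string": "s",
}

# Colors copied from the CSS example above.
THEME = {
    "syntax-keyword": "#ff79c6",
    "syntax-function": "#50fa7b",
    "syntax-string": "#f1fa8c",
}


def build_css(theme: dict[str, str], style: str = "semantic") -> str:
    """Emit one CSS rule per theme entry, using the requested class naming."""
    if style not in ("semantic", "pygments"):
        raise ValueError(f"unknown style: {style!r}")
    rules = []
    for semantic_class, color in theme.items():
        name = semantic_class if style == "semantic" else SEMANTIC_TO_PYGMENTS[semantic_class]
        rules.append(f".{name} {{ color: {color}; }}")
    return "\n".join(rules)


semantic_css = build_css(THEME)
pygments_css = build_css(THEME, style="pygments")
```

Keeping colors keyed by the semantic names means a theme is written once and rendered under whichever class style the page uses.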

Supported Languages

55 languages with full syntax support
| Category | Languages |
|----------|-----------|
| Core | Python, JavaScript, TypeScript, JSON, YAML, TOML, Bash, HTML, CSS, Diff |
| Systems | C, C++, Rust, Go, Zig |
| JVM | Java, Kotlin, Scala, Groovy, Clojure |
| Apple | Swift |
| Scripting | Ruby, Perl, PHP, Lua, R, PowerShell |
| Functional | Haskell, Elixir |
| Data/Query | SQL, CSV, GraphQL |
| Markup | Markdown, XML |
| Config | INI, Nginx, Dockerfile, Makefile, HCL |
| Schema | Protobuf |
| Modern | Dart, Julia, Nim, Gleam, V |
| AI/ML | Mojo, Triton, CUDA, Stan |
| Other | PKL, CUE, Tree, Kida, Jinja, Plaintext |

Architecture

State Machine Lexers — O(n) guaranteed

Every lexer is a hand-written finite state machine:

┌─────────────────────────────────────────────────────────────┐
│                    State Machine Lexer                       │
│                                                              │
│  ┌─────────┐   char    ┌─────────┐   char    ┌─────────┐   │
│  │ INITIAL │ ────────► │ STRING  │ ────────► │ ESCAPE  │   │
│  │ STATE   │           │ STATE   │           │ STATE   │   │
│  └─────────┘           └─────────┘           └─────────┘   │
│      │                      │                     │         │
│      │ emit                 │ emit                │ emit    │
│      ▼                      ▼                     ▼         │
│  [Token]               [Token]               [Token]        │
└─────────────────────────────────────────────────────────────┘

Key properties:

  • Single character lookahead (O(n) guaranteed)
  • No backtracking (no ReDoS possible)
  • Immutable state (thread-safe)
  • Local variables only (no shared mutable state)
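
The diagram and properties above can be illustrated with a toy lexer. This is a minimal sketch and not one of Rosettes' lexers: it recognizes only double-quoted strings with backslash escapes, and the token kinds `TEXT`, `STRING` and `ERROR` are made up. It visits each character exactly once, keeps all state in local variables, and never backtracks.

```python
from enum import Enum, auto


class State(Enum):
    INITIAL = auto()
    STRING = auto()
    ESCAPE = auto()


def lex(text: str) -> list[tuple[str, str]]:
    """Split text into ("TEXT", ...) and ("STRING", ...) tokens in one pass."""
    tokens: list[tuple[str, str]] = []
    state = State.INITIAL
    start = 0  # index where the current token began
    for i, ch in enumerate(text):
        if state is State.INITIAL:
            if ch == '"':
                if i > start:
                    tokens.append(("TEXT", text[start:i]))
                start = i
                state = State.STRING
        elif state is State.STRING:
            if ch == "\\":
                state = State.ESCAPE
            elif ch == '"':
                tokens.append(("STRING", text[start:i + 1]))
                start = i + 1
                state = State.INITIAL
        else:
            # State.ESCAPE: consume the escaped character, whatever it is.
            state = State.STRING
    if start < len(text):
        # Flush the tail; an unterminated string becomes an ERROR token.
        kind = "TEXT" if state is State.INITIAL else "ERROR"
        tokens.append((kind, text[start:]))
    return tokens


tokens = lex('x = "a\\"b" + y')
```

Because the loop advances one character per iteration and each branch does constant work, the running time grows linearly with the input length regardless of its content.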
Thread Safety — Free-threading ready

All public APIs are thread-safe:

  • Lexers use only local variables during tokenization
  • Formatter state is immutable
  • Registry uses functools.cache for memoization
  • Module declares itself safe for free-threading (PEP 703)
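
A registry built along these lines might look like the sketch below. It is an illustration under stated assumptions, not Rosettes' code: `LexerInfo`, `get_lexer` and the alias table are hypothetical. The pattern combines an immutable descriptor with functools.cache, so concurrent lookups only read shared data.

```python
import functools
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class LexerInfo:
    """Hypothetical immutable lexer descriptor."""
    name: str
    aliases: tuple[str, ...]


# Made-up registry data, exposed through a read-only view.
_ALIASES = MappingProxyType({"python": ("py",), "javascript": ("js",)})


@functools.cache
def get_lexer(name: str) -> LexerInfo:
    """Look up a lexer descriptor; repeated calls are served from the cache."""
    try:
        return LexerInfo(name=name, aliases=_ALIASES[name])
    except KeyError:
        raise LookupError(f"unknown language: {name!r}") from None


# Many threads asking for the same lexer all receive an equal, frozen value.
with ThreadPoolExecutor(max_workers=8) as pool:
    lexers = list(pool.map(get_lexer, ["python"] * 100))
```

Note that functools.cache may call the wrapped function more than once if several threads miss at the same moment, which is harmless here because the result is immutable and every call returns an equal value.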

Performance

Benchmarked against Pygments on a 10,000-line Python file:

| Operation | Rosettes | Pygments | Speedup |
|-----------|----------|----------|---------|
| Tokenize | 12ms | 45ms | 3.75x |
| Highlight | 18ms | 52ms | 2.89x |
| Parallel (8 blocks) | 22ms | 48ms | 2.18x |
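
The speedup column is the Pygments time divided by the Rosettes time. The short check below reproduces it from the timings in the table; the helper name `speedup` is only for illustration.

```python
# Timings in milliseconds, copied from the table above: (Rosettes, Pygments).
TIMINGS_MS = {
    "Tokenize": (12, 45),
    "Highlight": (18, 52),
    "Parallel (8 blocks)": (22, 48),
}


def speedup(rosettes_ms: float, pygments_ms: float) -> float:
    """Return how many times faster Rosettes is, rounded to two decimals."""
    return round(pygments_ms / rosettes_ms, 2)


speedups = {op: speedup(*pair) for op, pair in TIMINGS_MS.items()}

# Tokenizing 10,000 lines in 12 ms works out to about 833 lines per millisecond.
tokenize_lines_per_ms = round(10_000 / TIMINGS_MS["Tokenize"][0])
```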

Documentation

📚 lbliii.github.io/rosettes

| Section | Description |
|---------|-------------|
| Get Started | Installation and quickstart |
| Highlighting | Core highlighting APIs |
| Styling | CSS classes and themes |
| Reference | Complete API documentation |
| About | Architecture and design |

Development

git clone https://github.com/lbliii/rosettes.git
cd rosettes
uv sync --group dev
pytest

The Bengal Ecosystem

A structured reactive stack — every layer written in pure Python for 3.14t free-threading.

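| | Project | Description | Docs |
|---|---------|-------------|------|
| ᓚᘏᗢ | Bengal | Static site generator | Docs |
| ∿∿ | Purr | Content runtime | |
| ⌁⌁ | Chirp | Web framework | Docs |
| =^..^= | Pounce | ASGI server | Docs |
| )彡 | Kida | Template engine | Docs |
| ฅᨐฅ | Patitas | Markdown parser | Docs |
| ⌾⌾⌾ | Rosettes | Syntax highlighter ← You are here | Docs |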

Python-native. Free-threading ready. No npm required.


License

MIT License — see LICENSE for details.
