
High-performance regex with JIT/SIMD optimizations

FastRegex


A high-performance regular expression library for Python with JIT compilation and SIMD optimizations.

🚀 Features

  • JIT Compilation: LLVM-based just-in-time compilation for complex patterns
  • SIMD Optimizations: AVX2/AVX512/SSE4.2/NEON support for vectorized operations
  • Smart Caching: Automatic caching of compiled patterns to avoid recompilation
  • Python Integration: Seamless integration via pybind11
  • High Performance: Over 1000x faster than the standard re module on the complex-pattern benchmark, with smaller gains on simpler patterns

📊 Performance Benchmarks

Test Case          Python re (ms)    FastRegex (ms)    Speedup
Email validation   2.250 ±0.157      0.021 ±0.002      107x ✅
Word boundaries    0.166 ±0.011      0.025 ±0.002      6.6x ✅
Complex pattern    19.665 ±4.100     0.017 ±0.003      1156x ✅
Multiline text     0.855 ±0.163      0.219 ±0.002      3.9x ✅

Key insights:

  • Up to 1156x faster for complex patterns
  • 3.9x to 107x acceleration in the other benchmarked scenarios
  • Best performance on repetitive operations
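
The table reports mean time ± spread per call and a speedup ratio. A minimal sketch of how such figures can be produced with the standard library follows; it is not the project's tests/benchmark.py, and the pattern, text, and helper names (time_ms, speedup) are illustrative assumptions. Only re is timed here; a FastRegex callable could be passed to time_ms the same way.

```python
import re
import statistics
import timeit


def time_ms(func, repeat=5, number=200):
    """Return (mean, stdev) of the per-call time in milliseconds."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    per_call_ms = [1000.0 * total / number for total in runs]
    return statistics.mean(per_call_ms), statistics.stdev(per_call_ms)


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Hypothetical workload: this pattern and text are illustrative only.
pattern = r"[\w.+-]+@[\w-]+\.[\w.]+"
text = "contact: alice@example.com, bob@example.org " * 50

compiled = re.compile(pattern)
mean_ms, std_ms = time_ms(lambda: compiled.search(text))
print(f"re: {mean_ms:.3f} ±{std_ms:.3f} ms")

# Sanity check against the first table row: 2.250 ms vs 0.021 ms.
print(f"{speedup(2.250, 0.021):.0f}x")  # 107x
```

Timings vary by machine and input, so results from a sketch like this should be compared only against runs on the same hardware.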

🛠 Installation

From PyPI (Recommended)

pip install fastregex

From Source

git clone https://github.com/baksvell/fastregex.git
cd fastregex
pip install -e .

Prerequisites

  • CMake 3.20+
  • Python 3.10+
  • C++17 compiler (GCC/MSVC/Clang)

📖 Usage

Basic Usage

import fastregex

# Simple search
result = fastregex.search(r'\d+', 'abc123def')
print(result)  # True

# Find all matches
matches = fastregex.find_all(r'\w+', 'hello world test')
print(matches)  # ['hello', 'world', 'test']

# Replace
new_text = fastregex.replace(r'\d+', 'abc123def456', 'XXX')
print(new_text)  # 'abcXXXdefXXX'

# Compile for reuse
compiled = fastregex.compile(r'\d+')
result = compiled.search('abc123def')
print(result)  # True
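
When porting code from the standard library, two differences in the calls above stand out: search is shown returning a boolean rather than a match object, and replace takes (pattern, text, replacement) whereas re.sub takes (pattern, repl, string). The sketch below expresses the same three calls with re only; the re_-prefixed helper names are hypothetical and are not part of FastRegex.

```python
import re


def re_search(pattern, text):
    # re.search returns a Match object or None; reduce it to a bool.
    return re.search(pattern, text) is not None


def re_find_all(pattern, text):
    # For patterns without capture groups, findall returns the matched strings.
    return re.findall(pattern, text)


def re_replace(pattern, text, replacement):
    # Note the argument order: re.sub takes (pattern, repl, string).
    return re.sub(pattern, replacement, text)


print(re_search(r"\d+", "abc123def"))             # True
print(re_find_all(r"\w+", "hello world test"))    # ['hello', 'world', 'test']
print(re_replace(r"\d+", "abc123def456", "XXX"))  # abcXXXdefXXX
```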

Advanced Features

# Check cache statistics
print(f"Cache size: {fastregex.cache_size()}")
print(f"Hit rate: {fastregex.hit_rate():.2%}")

# SIMD capabilities
caps = fastregex.simd_capabilities()
print(f"AVX2 support: {caps['avx2']}")
print(f"AVX512 support: {caps['avx512']}")

# SIMD statistics
stats = fastregex.get_simd_stats()
print(f"Total calls: {stats['total_calls']}")
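
The snippet above indexes the capabilities result by key, which suggests a dict-like mapping. Assuming that is the case, a defensive way to summarize it is sketched below. Only the 'avx2' and 'avx512' keys appear in this README; the 'sse4.2' and 'neon' key names are guesses, and describe_simd is a hypothetical helper.

```python
def describe_simd(caps):
    """Summarize a capabilities mapping without assuming every key exists."""
    # 'sse4.2' and 'neon' are assumed key names, not documented ones.
    wanted = ("avx512", "avx2", "sse4.2", "neon")
    available = [name for name in wanted if caps.get(name, False)]
    return ", ".join(available) if available else "none detected"


# Sample mapping for illustration; real values would come from the library.
sample = {"avx2": True, "avx512": False}
print(describe_simd(sample))  # avx2
```

Using .get with a default avoids a KeyError on builds that do not report a given instruction set.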

🎯 When to Use FastRegex

Use FastRegex for:

  • Complex patterns (JIT compilation shines)
  • Repetitive matching (cache pays off)
  • SIMD-friendly patterns (literals, digit checks)
  • Large texts (over 1 MB, processed in optimized chunks)

⚠️ Use standard re for:

  • Simple one-time matches (avoids JIT compilation overhead)
  • Code that needs 100% compatibility with Python's re module
  • Dynamic patterns (generated on-the-fly)
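
One way to gauge whether the compatibility caveat affects a given codebase is to run both engines over the patterns actually in use and list the disagreements. The sketch below assumes the candidate engine exposes a search(pattern, text) callable whose result is truthy on a match; find_disagreements and naive_search are hypothetical names, and naive_search is a deliberately flawed stand-in so the example runs without FastRegex installed.

```python
import re


def find_disagreements(candidate_search, cases):
    """Return the (pattern, text) pairs where the candidate and re disagree."""
    mismatches = []
    for pattern, text in cases:
        expected = re.search(pattern, text) is not None
        if bool(candidate_search(pattern, text)) != expected:
            mismatches.append((pattern, text))
    return mismatches


cases = [(r"\d+", "abc123"), (r"^\d+$", "abc"), (r"(?i)hello", "HELLO")]


# Stand-in candidate that drops inline flags, to show a detected mismatch.
def naive_search(pattern, text):
    return re.search(pattern.replace("(?i)", ""), text) is not None


print(find_disagreements(naive_search, cases))  # [('(?i)hello', 'HELLO')]
```

In practice, fastregex.search would be passed in place of naive_search.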

🔄 Hybrid approach:

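import re
import fastregex as fr

def smart_match(pattern, text):
    """Return True if pattern matches anywhere in text."""
    # Route long patterns on large inputs to FastRegex; the thresholds
    # are rough heuristics and worth tuning against real workloads.
    if len(pattern) > 15 and len(text) > 1000:
        return bool(fr.search(pattern, text))
    # re.search returns a Match object or None, so normalize to bool
    # to keep the return type the same on both paths.
    return re.search(pattern, text) is not None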
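
A variation on the hybrid approach treats FastRegex as an optional dependency, so the same code runs where the package is not installed. This is a sketch under the assumption that fr.search returns a truthy value on a match, as the usage examples suggest; search_bool is a hypothetical name.

```python
import re

try:
    import fastregex as fr  # optional dependency
except ImportError:
    fr = None


def search_bool(pattern, text):
    """Boolean search that works with or without FastRegex installed."""
    if fr is not None and len(pattern) > 15 and len(text) > 1000:
        return bool(fr.search(pattern, text))
    return re.search(pattern, text) is not None


print(search_bool(r"\d+", "abc123def"))  # True
```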

🧪 Testing

Run the test suite:

python -m pytest tests/

Run performance benchmarks:

python tests/benchmark.py

📚 API Reference

Core Functions

  • fastregex.match(pattern, text) - Match from start of string
  • fastregex.search(pattern, text) - Search anywhere in string
  • fastregex.find_all(pattern, text) - Find all matches
  • fastregex.replace(pattern, text, replacement) - Replace matches
  • fastregex.compile(pattern) - Compile pattern for reuse
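
The difference between matching from the start of the string and searching anywhere in it is illustrated below with the standard re module, whose behavior is well defined. FastRegex's match and search are described in the same terms above, but their exact return types are not documented here, so this is an analogy rather than a FastRegex example.

```python
import re

text = "abc123def"

# match succeeds only if the pattern matches at position 0.
print(re.match(r"\d+", text) is not None)     # False
print(re.match(r"[a-z]+", text) is not None)  # True

# search scans forward and succeeds at the first position that matches.
print(re.search(r"\d+", text) is not None)    # True
```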

Cache Management

  • fastregex.cache_size() - Get current cache size
  • fastregex.hit_rate() - Get cache hit rate
  • fastregex.clear_cache() - Clear the cache
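
To make the three cache functions concrete, here is a pure-Python analogue built on functools.lru_cache. It is not FastRegex's implementation; cached_compile and this local hit_rate are hypothetical stand-ins that show what cache size, hit rate, and clearing mean for compiled patterns.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=128)
def cached_compile(pattern):
    # Compiled once per distinct pattern, then served from the cache.
    return re.compile(pattern)


def hit_rate():
    info = cached_compile.cache_info()
    total = info.hits + info.misses
    return info.hits / total if total else 0.0


for _ in range(4):
    cached_compile(r"\d+")  # 1 miss, then 3 hits

print(cached_compile.cache_info().currsize)  # 1
print(f"{hit_rate():.2%}")                   # 75.00%

cached_compile.cache_clear()  # also resets the hit and miss counters
print(cached_compile.cache_info().currsize)  # 0
```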

SIMD Features

  • fastregex.simd_capabilities() - Get SIMD support info
  • fastregex.get_simd_stats() - Get SIMD usage statistics
  • fastregex.set_simd_mode(mode) - Set SIMD mode
  • fastregex.get_simd_mode() - Get current SIMD mode
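
Because the SIMD mode is a global setting with a getter and a setter, a temporary change can be wrapped in a context manager that restores the previous value. The sketch below relies only on the two function names listed above. The accepted mode values are not documented in this README, so the strings used here are placeholders, and FakeLib is a stand-in that lets the example run without FastRegex.

```python
from contextlib import contextmanager


@contextmanager
def simd_mode(lib, mode):
    """Temporarily switch the SIMD mode, restoring the old one on exit."""
    previous = lib.get_simd_mode()
    lib.set_simd_mode(mode)
    try:
        yield
    finally:
        lib.set_simd_mode(previous)


class FakeLib:
    """Stand-in exposing the same two functions, for illustration only."""

    def __init__(self):
        self._mode = "auto"  # placeholder value, not a documented mode

    def get_simd_mode(self):
        return self._mode

    def set_simd_mode(self, mode):
        self._mode = mode


lib = FakeLib()
with simd_mode(lib, "placeholder-mode"):
    print(lib.get_simd_mode())  # placeholder-mode
print(lib.get_simd_mode())      # auto
```

With the real package, the fastregex module itself would be passed as lib.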

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for details.

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • pybind11 for Python bindings
  • LLVM for JIT compilation
  • SIMD for vectorized operations
