High-performance regex with JIT/SIMD optimizations

FastRegex

A high-performance regular expression library for Python with JIT compilation and SIMD optimizations.

🚀 Features

  • JIT Compilation: LLVM-based just-in-time compilation for complex patterns
  • SIMD Optimizations: AVX2/AVX512/SSE4.2/NEON support for vectorized operations
  • Smart Caching: Automatic caching of compiled patterns to avoid recompilation
  • Python Integration: Seamless integration via pybind11
  • High Performance: 1.3-1.7x faster than the standard re module in the benchmarks below (short literals and simple patterns)

📊 Performance Benchmarks

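| Test Case        | Python re (ms) | FastRegex (ms) | Speedup |
|------------------|----------------|----------------|---------|
| Short literals   | 0.0040         | 0.0023         | 1.7x ✅ |
| Simple patterns  | 0.0041         | 0.0025         | 1.6x ✅ |
| Find all matches | 0.0127         | 0.0095         | 1.3x ✅ |
| Match operations | 0.0040         | 0.0023         | 1.7x ✅ |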

Key insights:

  • 1.3-1.7x faster than re across the four test cases above
  • Best performance on short literals and simple patterns
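  • Matching results are intended to agree with the standard re module (the API differs: for example, search is shown returning True rather than a match object)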
  • Optimized for patterns < 50 characters
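
Numbers like those in the table depend heavily on hardware, Python version, and input, so it can help to re-measure locally. The following is a minimal timing sketch built on the standard timeit module. It is not the project's tests/benchmark.py; the helper names time_call and compare are hypothetical, and FastRegex is only timed if it can be imported.

```python
import re
import timeit


def time_call(func, *args, repeat=5, number=1000):
    """Return the best observed per-call time in milliseconds."""
    timer = timeit.Timer(lambda: func(*args))
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number * 1000.0


def compare(pattern, text, candidate=None):
    """Time re.search and, if given, a candidate with the same (pattern, text) signature."""
    baseline_ms = time_call(re.search, pattern, text)
    result = {"re_ms": baseline_ms}
    if candidate is not None:
        candidate_ms = time_call(candidate, pattern, text)
        result["candidate_ms"] = candidate_ms
        result["speedup"] = baseline_ms / candidate_ms
    return result


if __name__ == "__main__":
    try:
        import fastregex  # optional: only timed when installed
        candidate = fastregex.search
    except ImportError:
        candidate = None
    print(compare(r"\d+", "abc123def", candidate))
```

Taking the minimum over several repeats reduces noise from other processes; the first call to each function also pays any one-time compilation cost, which the repeats amortize.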

🛠 Installation

Using Docker (Recommended)

# Clone the repository
git clone https://github.com/baksvell/fastregex.git
cd fastregex

# Run with Docker
docker-compose up -d fastregex

# Enter the container
docker exec -it fastregex-dev bash

# Use FastRegex
python -c "import fastregex; print('FastRegex ready!')"

From PyPI

pip install fastregex

From Source

git clone https://github.com/baksvell/fastregex.git
cd fastregex
pip install -e .

Prerequisites

  • CMake 3.20+
  • Python 3.10+
  • C++17 compiler (GCC/MSVC/Clang)

📖 Usage

Basic Usage

import fastregex

# Simple search
result = fastregex.search(r'\d+', 'abc123def')
print(result)  # True

# Find all matches
matches = fastregex.find_all(r'\w+', 'hello world test')
print(matches)  # ['hello', 'world', 'test']

# Replace
new_text = fastregex.replace(r'\d+', 'abc123def456', 'XXX')
print(new_text)  # 'abcXXXdefXXX'

# Compile for reuse
compiled = fastregex.compile(r'\d+')
result = compiled.search('abc123def')
print(result)  # True
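
For readers coming from the standard library, the calls above correspond roughly to the following re expressions. This sketch uses only re, so it runs without FastRegex installed; the mapping is inferred from the outputs shown above, not from FastRegex's source.

```python
import re

text = "abc123def"

# fastregex.search is shown returning True; re.search returns a match
# object (or None), so convert it to a bool for a like-for-like result.
found = re.search(r"\d+", text) is not None
print(found)  # True

# find_all corresponds to re.findall.
words = re.findall(r"\w+", "hello world test")
print(words)  # ['hello', 'world', 'test']

# fastregex.replace takes (pattern, text, replacement);
# re.sub takes (pattern, replacement, text).
replaced = re.sub(r"\d+", "XXX", "abc123def456")
print(replaced)  # abcXXXdefXXX
```

The argument order of replace is the difference most likely to cause silent bugs when porting code between the two modules.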

Advanced Features

# Check cache statistics
print(f"Cache size: {fastregex.cache_size()}")
print(f"Hit rate: {fastregex.hit_rate():.2%}")

# Pattern information
compiled = fastregex.compile(r'\d+')
print(f"Pattern: {compiled.pattern()}")
print(f"JIT compiled: {compiled.jit_compiled}")
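
To make the cache statistics concrete, here is a pure-Python sketch of a compiled-pattern cache that exposes the same three quantities. It is an illustration built on functools.lru_cache and re, not FastRegex's actual cache; the function cached_compile and the cache capacity of 256 are assumptions.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=256)  # capacity chosen arbitrarily for this sketch
def cached_compile(pattern):
    """Compile a pattern once; repeated calls reuse the cached object."""
    return re.compile(pattern)


def cache_size():
    """Number of patterns currently held in the cache."""
    return cached_compile.cache_info().currsize


def hit_rate():
    """Fraction of lookups served from the cache (0.0 before any lookup)."""
    info = cached_compile.cache_info()
    lookups = info.hits + info.misses
    return info.hits / lookups if lookups else 0.0


def clear_cache():
    """Drop all cached patterns and reset the hit/miss counters."""
    cached_compile.cache_clear()


cached_compile(r"\d+")  # miss: compiled and stored
cached_compile(r"\d+")  # hit: served from the cache
cached_compile(r"\w+")  # miss: a second distinct pattern

print(cache_size())         # 2
print(f"{hit_rate():.2%}")  # 33.33%
```

One hit out of three lookups gives the 33.33% figure; a long-running program that reuses a small set of patterns would approach 100%.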

🎯 When to Use FastRegex

Use FastRegex for:

  • Short literal patterns (1.7x faster)
  • Simple regex patterns (1.6x faster)
  • Match operations (1.7x faster)
  • Find-all operations (1.3x faster)
  • Patterns shorter than 50 characters

⚠️ Use standard re for:

  • Very large texts (>10MB)
  • Complex regex patterns with many groups
  • Advanced regex features
  • Long patterns (>50 characters)

🔄 Hybrid approach:

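import re
import fastregex as fr

# Roughly 10 MB, counted in characters
LARGE_TEXT = 10 * 1024 * 1024

def smart_match(pattern, text):
    # FastRegex is tuned for short patterns (< 50 characters) on texts that
    # are not very large; otherwise fall back to the standard re module.
    if len(pattern) < 50 and len(text) <= LARGE_TEXT:
        return fr.search(pattern, text)
    # re.search returns a match object or None; normalize to a bool
    return re.search(pattern, text) is not None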
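
The dispatch idea can also be written so that code keeps working where FastRegex is not installed. The sketch below treats fastregex as an optional import and always returns a bool. The function name has_match and the constant MAX_FAST_PATTERN_LEN are hypothetical; the 50-character threshold is taken from the guidance above.

```python
import re

try:
    import fastregex as fr  # optional accelerator
except ImportError:
    fr = None  # fall back to the standard library only

# Assumed cut-off, based on the "patterns < 50 characters" guidance.
MAX_FAST_PATTERN_LEN = 50


def has_match(pattern, text):
    """Return True if pattern occurs anywhere in text."""
    if fr is not None and len(pattern) < MAX_FAST_PATTERN_LEN:
        return bool(fr.search(pattern, text))
    return re.search(pattern, text) is not None


print(has_match(r"\d+", "abc123def"))  # True
print(has_match(r"\d+", "abcdef"))     # False
```

Normalizing both branches to a bool keeps callers independent of which engine handled the request.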

🧪 Testing

Run the test suite:

python -m pytest tests/

Run performance benchmarks:

python tests/benchmark.py
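
The contents of tests/ are not shown here. As an illustration of the kind of check that supports the compatibility claim, a parity test might compare a find-all implementation against re.findall over a few cases. Everything in this sketch (CASES, check_parity, test_parity_with_re) is hypothetical, and re.findall stands in for the engine under test so the example runs without FastRegex.

```python
import re

# A few (pattern, text) pairs; extend with cases relevant to your workload.
CASES = [
    (r"\d+", "abc123def456"),
    (r"\w+", "hello world test"),
    (r"[a-z]+", "ABC abc Abc"),
]


def check_parity(find_all, cases=CASES):
    """Return the cases where find_all disagrees with re.findall."""
    failures = []
    for pattern, text in cases:
        expected = re.findall(pattern, text)
        actual = find_all(pattern, text)
        if actual != expected:
            failures.append((pattern, text, expected, actual))
    return failures


def test_parity_with_re():
    # Swap in fastregex.find_all here when the package is installed.
    assert check_parity(re.findall) == []
```

Returning the list of failures rather than asserting inside the loop makes every mismatch visible in a single run.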

📚 API Reference

Core Functions

  • fastregex.match(pattern, text) - Match from start of string
  • fastregex.search(pattern, text) - Search anywhere in string
  • fastregex.find_all(pattern, text) - Find all matches
  • fastregex.replace(pattern, text, replacement) - Replace matches
  • fastregex.compile(pattern) - Compile pattern for reuse
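
The difference between matching from the start and searching anywhere is easiest to see on one input. The sketch below uses the standard re module, whose match and search follow the same two descriptions; that FastRegex behaves identically on these inputs is an assumption based on the descriptions above.

```python
import re

text = "abc123"

# match-style: the pattern must match starting at position 0.
digits_at_start = re.match(r"\d+", text) is not None
letters_at_start = re.match(r"[a-z]+", text) is not None
print(digits_at_start)   # False
print(letters_at_start)  # True

# search-style: the pattern may match at any position.
digits_anywhere = re.search(r"\d+", text) is not None
print(digits_anywhere)   # True
```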

Cache Management

  • fastregex.cache_size() - Get current cache size
  • fastregex.hit_rate() - Get cache hit rate
  • fastregex.clear_cache() - Clear the cache

Pattern Information

  • compiled.pattern() - Get the compiled pattern
  • compiled.jit_compiled - Check if pattern is JIT compiled
  • compiled.compile_time() - Get compilation time

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for details.

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • pybind11 for Python bindings
  • LLVM for JIT compilation
  • SIMD for vectorized operations
