High-performance regex with JIT/SIMD optimizations
FastRegex
A high-performance regular expression library for Python with JIT compilation and SIMD optimizations.
🚀 Features
- JIT Compilation: LLVM-based just-in-time compilation for complex patterns
- SIMD Optimizations: AVX2/AVX512/SSE4.2/NEON support for vectorized operations
- Smart Caching: Automatic caching of compiled patterns to avoid recompilation
- Python Integration: Seamless integration via pybind11
- High Performance: 1.5-5x faster than the standard `re` module for specific use cases
📊 Performance Benchmarks
| Test Case | Python re (ms) | FastRegex (ms) | Speedup |
|---|---|---|---|
| Short literals | 0.0040 | 0.0023 | 1.7x ✅ |
| Simple patterns | 0.0041 | 0.0025 | 1.6x ✅ |
| Find all matches | 0.0127 | 0.0095 | 1.3x ✅ |
| Match operations | 0.0040 | 0.0023 | 1.7x ✅ |
Key insights:
- 1.3-1.7x faster across the benchmarked cases above
- Best performance on short literals and simple patterns
- Fully compatible with standard `re` module behavior
- Optimized for patterns < 50 characters
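The Speedup column is the ratio of the `re` timing to the FastRegex timing. The short sketch below recomputes it from the table values; the dictionary and the `speedup` helper are illustrative names, not part of FastRegex.

```python
# Timings in milliseconds, copied from the benchmark table above:
# (Python re, FastRegex)
TIMINGS_MS = {
    "Short literals": (0.0040, 0.0023),
    "Simple patterns": (0.0041, 0.0025),
    "Find all matches": (0.0127, 0.0095),
    "Match operations": (0.0040, 0.0023),
}


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


for name, (re_ms, fast_ms) in TIMINGS_MS.items():
    print(f"{name}: {speedup(re_ms, fast_ms):.1f}x")
```

Rounded to one decimal place, these ratios reproduce the 1.7x, 1.6x, 1.3x, and 1.7x figures in the table.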
🛠 Installation
Using Docker (Recommended)
# Clone the repository
git clone https://github.com/baksvell/fastregex.git
cd fastregex
# Run with Docker
docker-compose up -d fastregex
# Enter the container
docker exec -it fastregex-dev bash
# Use FastRegex
python -c "import fastregex; print('FastRegex ready!')"
From PyPI
pip install fastregex
From Source
git clone https://github.com/baksvell/fastregex.git
cd fastregex
pip install -e .
Prerequisites
- CMake 3.20+
- Python 3.10+
- C++17 compiler (GCC/MSVC/Clang)
📖 Usage
Basic Usage
import fastregex
# Simple search
result = fastregex.search(r'\d+', 'abc123def')
print(result) # True
# Find all matches
matches = fastregex.find_all(r'\w+', 'hello world test')
print(matches) # ['hello', 'world', 'test']
# Replace
new_text = fastregex.replace(r'\d+', 'abc123def456', 'XXX')
print(new_text) # 'abcXXXdefXXX'
# Compile for reuse
compiled = fastregex.compile(r'\d+')
result = compiled.search('abc123def')
print(result) # True
Advanced Features
# Check cache statistics
print(f"Cache size: {fastregex.cache_size()}")
print(f"Hit rate: {fastregex.hit_rate():.2%}")
# Pattern information
compiled = fastregex.compile(r'\d+')
print(f"Pattern: {compiled.pattern()}")
print(f"JIT compiled: {compiled.jit_compiled}")
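To make "cache size" and "hit rate" concrete, here is a minimal pure-Python sketch of a compiled-pattern cache built on the standard `re` module. It illustrates the idea only and makes no claim about how FastRegex implements its cache; `PatternCache` and its methods are hypothetical names.

```python
import re


class PatternCache:
    """Toy compiled-pattern cache that tracks lookups and hits."""

    def __init__(self) -> None:
        self._cache: dict[str, re.Pattern] = {}
        self._hits = 0
        self._lookups = 0

    def compile(self, pattern: str) -> re.Pattern:
        """Return a cached compiled pattern, compiling it on first use."""
        self._lookups += 1
        compiled = self._cache.get(pattern)
        if compiled is not None:
            self._hits += 1
            return compiled
        compiled = re.compile(pattern)
        self._cache[pattern] = compiled
        return compiled

    def cache_size(self) -> int:
        """Number of distinct patterns currently stored."""
        return len(self._cache)

    def hit_rate(self) -> float:
        """Fraction of lookups served from the cache."""
        return self._hits / self._lookups if self._lookups else 0.0


cache = PatternCache()
for _ in range(4):
    cache.compile(r"\d+")  # first call misses, the next three hit
cache.compile(r"\w+")      # a second pattern: one more miss

print(f"Cache size: {cache.cache_size()}")  # Cache size: 2
print(f"Hit rate: {cache.hit_rate():.2%}")  # Hit rate: 60.00%
```

Three of the five lookups are served from the cache, which gives the 60% hit rate.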
🎯 When to Use FastRegex
✅ Use FastRegex for:
- Short literal patterns (1.7x faster)
- Simple regex patterns (1.6x faster)
- Match operations (1.7x faster)
- Find all operations (1.3x faster)
- Patterns < 50 characters
⚠️ Use standard `re` for:
- Very large texts (>10MB)
- Complex regex patterns with many groups
- Advanced regex features
- Long patterns (>50 characters)
🔄 Hybrid approach:
import re
import fastregex as fr

def smart_match(pattern, text):
    if len(pattern) > 15 and len(text) > 1000:
        return fr.search(pattern, text)
    return re.search(pattern, text)
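The usage examples show `fastregex.search` printing `True`, whereas `re.search` returns a match object or `None`, so a hybrid helper can return different types depending on the branch taken. The sketch below is one possible variant: it normalizes both branches to `bool` and falls back to `re` when FastRegex is not installed. The name `safe_search` and the 50-character cutoff (borrowed from the pattern-length guidance above rather than from `smart_match`) are assumptions.

```python
import re

try:
    import fastregex as fr  # optional accelerated backend
except ImportError:
    fr = None  # fall back to the standard library only

# Assumption: mirrors the "Patterns < 50 characters" guidance above.
MAX_FAST_PATTERN_LEN = 50


def safe_search(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    if fr is not None and len(pattern) < MAX_FAST_PATTERN_LEN:
        # bool() guards against the backend returning a non-bool truthy value.
        return bool(fr.search(pattern, text))
    return re.search(pattern, text) is not None


print(safe_search(r"\d+", "abc123def"))
print(safe_search(r"\d+", "no digits here"))
```

Callers that need match positions or groups should call `re.search` directly, since this helper only reports whether a match exists.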
🧪 Testing
Run the test suite:
python -m pytest tests/
Run performance benchmarks:
python tests/benchmark.py
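For a quick ad hoc measurement outside the bundled script, the standard `timeit` module is enough. The sketch below is not the contents of `tests/benchmark.py`; `time_call_ms` is a hypothetical helper, and it times only `re` unless `fastregex` is importable.

```python
import re
import timeit


def time_call_ms(func, *args, number: int = 10_000, repeat: int = 5) -> float:
    """Return the best per-call time in milliseconds over several runs."""
    timer = timeit.Timer(lambda: func(*args))
    best_total_s = min(timer.repeat(repeat=repeat, number=number))
    return best_total_s / number * 1000.0


pattern, text = r"\d+", "abc123def"
results = {"re": time_call_ms(re.search, pattern, text)}

try:
    import fastregex
    results["fastregex"] = time_call_ms(fastregex.search, pattern, text)
except ImportError:
    pass  # FastRegex not installed; report the baseline only

for name, ms in results.items():
    print(f"{name}: {ms:.4f} ms per call")
```

Taking the minimum over several repeats reduces noise from other processes; absolute numbers will still vary by machine and Python version.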
📚 API Reference
Core Functions
- `fastregex.match(pattern, text)` - Match from start of string
- `fastregex.search(pattern, text)` - Search anywhere in string
- `fastregex.find_all(pattern, text)` - Find all matches
- `fastregex.replace(pattern, text, replacement)` - Replace matches
- `fastregex.compile(pattern)` - Compile pattern for reuse
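When migrating from `re`, note the argument order: `fastregex.replace` takes `(pattern, text, replacement)`, while `re.sub` takes `(pattern, repl, string)`. The sketch below wraps `re` in functions with the FastRegex-style signatures listed above, which can serve as a reference for expected results. The boolean return values for `match` and `search` are inferred from the usage examples and are an assumption; this is not a description of how FastRegex is implemented.

```python
import re


def match(pattern: str, text: str) -> bool:
    """Match only at the start of the string."""
    return re.match(pattern, text) is not None


def search(pattern: str, text: str) -> bool:
    """Search for the pattern anywhere in the string."""
    return re.search(pattern, text) is not None


def find_all(pattern: str, text: str) -> list:
    """Return all non-overlapping matches."""
    return re.findall(pattern, text)


def replace(pattern: str, text: str, replacement: str) -> str:
    """Replace every match; note that re.sub takes the replacement first."""
    return re.sub(pattern, replacement, text)


print(search(r"\d+", "abc123def"))             # True
print(match(r"\d+", "abc123def"))              # False
print(find_all(r"\w+", "hello world test"))    # ['hello', 'world', 'test']
print(replace(r"\d+", "abc123def456", "XXX"))  # abcXXXdefXXX
```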
Cache Management
- `fastregex.cache_size()` - Get current cache size
- `fastregex.hit_rate()` - Get cache hit rate
- `fastregex.clear_cache()` - Clear the cache
Pattern Information
- `compiled.pattern()` - Get the compiled pattern
- `compiled.jit_compiled` - Check if pattern is JIT compiled
- `compiled.compile_time()` - Get compilation time
🤝 Contributing
Contributions are welcome! Please see CONTRIBUTING.md for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.