Flash-Fuzzy
High-performance fuzzy search engine using Bitap algorithm with bloom filter pre-filtering. Powered by Rust for blazing fast performance.
Features
- Blazing fast - Rust-powered performance with Python convenience
- Typo tolerant - Configurable edit distance (0-3 errors)
- Smart filtering - Bloom filter pre-screening for O(1) rejection
- Easy to use - Pythonic API with type hints
- Zero dependencies - Pure Rust core, no external dependencies
- Thread-safe - Safe for concurrent use
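To make the typo-tolerance idea concrete, here is a minimal pure-Python sketch of Bitap with the Wu-Manber extension for bounded edit distance. It is illustrative only and is not Flash-Fuzzy's Rust implementation; the function name `bitap_search` and its return shape are assumptions made for this example.

```python
from typing import Optional, Tuple


def bitap_search(text: str, pattern: str, max_errors: int = 2) -> Optional[Tuple[int, int]]:
    """Find where `pattern` first matches inside `text` with at most
    `max_errors` edits (insertions, deletions, substitutions).

    Returns (end_index, errors) for the earliest match end, or None.
    Hypothetical helper for illustration; not part of the flash_fuzzy API.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")

    # Bit i of masks[ch] is set when pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    full = (1 << m) - 1    # keep only the m pattern bits
    accept = 1 << (m - 1)  # set once the whole pattern has matched

    # state[d], bit i: pattern[:i + 1] matches a suffix of the text read so
    # far with at most d errors. Before any text is read, the first d pattern
    # characters can be accounted for as deletions.
    state = [((1 << d) - 1) & full for d in range(max_errors + 1)]

    for end, ch in enumerate(text):
        char_mask = masks.get(ch, 0)
        old_prev = state[0]
        state[0] = ((state[0] << 1) | 1) & char_mask
        for d in range(1, max_errors + 1):
            old_cur = state[d]
            match = ((old_cur << 1) | 1) & char_mask
            insertion = old_prev                 # extra character in the text
            substitution = (old_prev << 1) | 1   # text char replaces a pattern char
            deletion = (state[d - 1] << 1) | 1   # pattern char missing from the text
            state[d] = (match | insertion | substitution | deletion) & full
            old_prev = old_cur
        if state[max_errors] & accept:
            errors = next(d for d in range(max_errors + 1) if state[d] & accept)
            return end, errors
    return None


print(bitap_search("mechanical keyboard", "keybord", max_errors=1))
```

The sketch is case-sensitive, so lower-case both strings first if that is the behaviour you want. Each text character costs one pass over `max_errors + 1` integers, which is why a small error budget keeps the scan cheap.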
Installation
pip install flash-fuzzy
Quick Start
from flash_fuzzy import FlashFuzzy

# Create instance
ff = FlashFuzzy(threshold=0.25, max_errors=2, max_results=50)

# Add records
ff.add([
    {"id": 1, "name": "Wireless Headphones", "category": "Electronics"},
    {"id": 2, "name": "Mechanical Keyboard", "category": "Computers"},
    {"id": 3, "name": "USB-C Cable", "category": "Accessories"},
])

# Search with typos
results = ff.search("keybord")  # Note the typo
for r in results:
    print(f"ID: {r.id}, Score: {r.score:.2f}")
API
FlashFuzzy
FlashFuzzy(
    threshold: float = 0.25,  # Minimum score (0.0-1.0)
    max_errors: int = 2,      # Max edit distance (0-3)
    max_results: int = 50     # Max results to return
)
Methods
- add(records) - Add a dict or a list of dicts
- search(query) - Search and return a list of SearchResult
- remove(id) - Remove a record by ID
- reset() - Clear all records
Properties
- count - Number of records
- threshold - Get/set threshold
- max_errors - Get/set max errors
- max_results - Get/set max results
SearchResult
- id: int - Record ID
- score: float - Match score (0.0-1.0)
- start: int - Match start position
- end: int - Match end position
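The start and end positions can be used to highlight the matched span in a result. The sketch below assumes that the offsets index into the text you indexed and that `end` is exclusive; neither detail is stated above, so check both against real results. The helper name `highlight` is hypothetical.

```python
def highlight(text: str, start: int, end: int, marker: str = "**") -> str:
    """Wrap text[start:end] in `marker`.

    Assumption: `end` is exclusive and both offsets refer to `text`.
    Offsets are clamped so out-of-range values cannot raise.
    """
    start = max(0, min(start, len(text)))
    end = max(start, min(end, len(text)))
    return f"{text[:start]}{marker}{text[start:end]}{marker}{text[end:]}"


# With a SearchResult `r`, this would be called as highlight(text, r.start, r.end).
print(highlight("Mechanical Keyboard", 11, 19))
```

If the offsets turn out to be inclusive, pass `r.end + 1` instead.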
Advanced Examples
E-commerce Product Search
from flash_fuzzy import FlashFuzzy
from dataclasses import dataclass


@dataclass
class Product:
    id: int
    name: str
    brand: str
    category: str


class ProductSearch:
    def __init__(self):
        self.ff = FlashFuzzy(threshold=0.3, max_errors=2, max_results=20)
        self.products = {}

    def index_product(self, product: Product):
        self.products[product.id] = product
        search_text = f"{product.name} {product.brand} {product.category}"
        self.ff.add({"id": product.id, "text": search_text})

    def search(self, query: str) -> list[Product]:
        results = self.ff.search(query)
        return [self.products[r.id] for r in results if r.id in self.products]


# Usage
search = ProductSearch()
search.index_product(Product(1, "MacBook Pro 16", "Apple", "Laptops"))
search.index_product(Product(2, "ThinkPad X1", "Lenovo", "Laptops"))

matches = search.search("macbok")  # typo
for product in matches:
    print(f"{product.name} by {product.brand}")
Django Integration
from flash_fuzzy import FlashFuzzy
from django.core.cache import cache


class SearchService:
    def __init__(self):
        self.ff = FlashFuzzy()
        self._load_from_cache()

    def index_model(self, queryset, text_field='name'):
        for obj in queryset:
            self.ff.add({
                "id": obj.pk,
                "text": getattr(obj, text_field)
            })
        self._save_to_cache()

    def search(self, query: str):
        results = self.ff.search(query)
        return [r.id for r in results]

    def _save_to_cache(self):
        # Save the search index to the cache for one hour.
        # Note: most Django cache backends pickle stored values, so this
        # assumes FlashFuzzy instances can be pickled.
        cache.set('search_index', self.ff, timeout=3600)

    def _load_from_cache(self):
        cached = cache.get('search_index')
        if cached is not None:
            self.ff = cached
FastAPI Endpoint
from fastapi import FastAPI, Query
from flash_fuzzy import FlashFuzzy
from pydantic import BaseModel

app = FastAPI()
search_engine = FlashFuzzy(threshold=0.25, max_errors=2)


class SearchResult(BaseModel):
    id: int
    score: float


@app.on_event("startup")
async def load_data():
    # Load your data
    products = [
        {"id": 1, "text": "Wireless Keyboard"},
        {"id": 2, "text": "USB Mouse"},
        {"id": 3, "text": "HDMI Cable"},
    ]
    search_engine.add(products)


@app.get("/search", response_model=list[SearchResult])
async def search(q: str = Query(..., min_length=2)):
    results = search_engine.search(q)
    return [
        SearchResult(id=r.id, score=r.score)
        for r in results
    ]
Async/Await with asyncio
import asyncio
from flash_fuzzy import FlashFuzzy
from concurrent.futures import ThreadPoolExecutor


class AsyncSearchEngine:
    def __init__(self):
        self.ff = FlashFuzzy()
        self.executor = ThreadPoolExecutor(max_workers=4)

    async def search_async(self, query: str):
        # Run the blocking search in a worker thread so the event loop stays free.
        loop = asyncio.get_running_loop()
        results = await loop.run_in_executor(
            self.executor,
            self.ff.search,
            query
        )
        return results


# Usage
async def main():
    engine = AsyncSearchEngine()
    engine.ff.add({"id": 1, "text": "Python Programming"})
    engine.ff.add({"id": 2, "text": "Rust Programming"})

    results = await engine.search_async("pythn")  # typo
    for r in results:
        print(f"ID: {r.id}, Score: {r.score}")


asyncio.run(main())
Performance
- Search: < 1ms for 10,000 records
- Indexing: O(n) where n = text length
- Memory: ~1KB per record
- Throughput: ~100,000 searches/second
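Figures like these depend on hardware, record length, and query mix, so it is worth measuring on your own data. The following is a small timing sketch using only the standard library; `time_searches` is a hypothetical helper that accepts any callable, for example `ff.search`.

```python
import time
from statistics import median


def time_searches(search_fn, queries, repeats=5):
    """Time `search_fn(query)` for each query and report summary figures.

    `search_fn` is any callable taking a query string, e.g. `ff.search`.
    """
    timings = []
    for _ in range(repeats):
        for query in queries:
            started = time.perf_counter()
            search_fn(query)
            timings.append(time.perf_counter() - started)
    med = median(timings)
    return {
        "median_ms": med * 1000,
        "searches_per_second": (1 / med) if med > 0 else float("inf"),
    }


# Stand-in callable so the sketch runs without an index;
# replace it with `ff.search` on a populated FlashFuzzy instance.
print(time_searches(lambda q: q.lower(), ["keybord", "hedphones"]))
```

The median is reported rather than the mean so that a single slow call, such as a cold cache on the first query, does not dominate the result.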
Bloom filter pre-filtering provides O(1) rejection of non-matches before running expensive fuzzy matching.
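One way such a pre-filter can work is sketched below. This is an assumption about the general technique, not a description of Flash-Fuzzy's internals: each record gets a 64-bit signature of the characters it contains, and a query is rejected when more than `max_errors` of its signature bits are absent from the record. The names `signature` and `may_match` are hypothetical.

```python
def signature(text: str) -> int:
    """64-bit character signature: one bit per character code modulo 64."""
    bits = 0
    for ch in text:
        bits |= 1 << (ord(ch) % 64)
    return bits


def may_match(query_sig: int, record_sig: int, max_errors: int) -> bool:
    """Cheap necessary condition for a fuzzy match.

    Each edit can account for at most one query character that is absent
    from the record, so more than `max_errors` missing bits rules out a match.
    False positives are possible (hash collisions, character order is
    ignored); false negatives are not.
    """
    missing = query_sig & ~record_sig
    return bin(missing).count("1") <= max_errors


records = ["wireless headphones", "mechanical keyboard", "usb-c cable"]
signatures = [signature(r) for r in records]  # computed once, at indexing time

query_sig = signature("keybord")
candidates = [r for r, s in zip(records, signatures) if may_match(query_sig, s, 2)]
print(candidates)
```

The check is a single AND plus a bit count per record, independent of text length, which is where the O(1) rejection comes from. Only the surviving candidates need the full fuzzy match.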
Platform Support
| Platform | Status |
|---|---|
| Linux (x86_64, ARM64) | ✅ Supported |
| macOS (x86_64, Apple Silicon) | ✅ Supported |
| Windows (x86_64) | ✅ Supported |
Pre-built wheels are available for all major platforms.
Links
- PyPI: https://pypi.org/project/flash-fuzzy/
- GitHub: https://github.com/RafaCalRob/FlashFuzzy
- Crates.io (Rust): https://crates.io/crates/flash-fuzzy-core
- NPM (JavaScript): https://www.npmjs.com/package/@bdovenbird/flashfuzzy
- Maven (Java): https://search.maven.org/artifact/com.bdovenbird/flash-fuzzy
License
MIT - see LICENSE
Download files
File details
Details for the file flash_fuzzy-0.1.0.tar.gz.
File metadata
- Download URL: flash_fuzzy-0.1.0.tar.gz
- Upload date:
- Size: 19.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b5f493b7f87c897cc1b37ab3912c1a906e07e414a24f3716dc6f0749faaae8e5 |
| MD5 | 0b013052e22d249a3724b35207d05b24 |
| BLAKE2b-256 | 22b54c94bda7b80eb97772b99411634632364961a5c352217acbeda817ca9a60 |
File details
Details for the file flash_fuzzy-0.1.0-cp313-cp313-win_amd64.whl.
File metadata
- Download URL: flash_fuzzy-0.1.0-cp313-cp313-win_amd64.whl
- Upload date:
- Size: 136.4 kB
- Tags: CPython 3.13, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1e960055063efd965b320e0e2862ba993e96de452384212c0f0a0c64ca9e803d |
| MD5 | a412094cb14922d6e4a8d2b445e89df8 |
| BLAKE2b-256 | 827b256b84bf9440a5901a2460b0c2dc0a67020d10130f9d6567550a6612dc4c |
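A downloaded file can be checked against its published SHA256 digest with the standard library. This is a minimal sketch; `sha256_matches` is a hypothetical helper, and the demonstration uses a temporary file rather than a real download.

```python
import hashlib
import os
import tempfile


def sha256_matches(path: str, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Return True when the file's SHA256 digest equals `expected_hex`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


# Demonstration on a temporary file; for a real check, pass the path of the
# downloaded file and the digest published for it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name

demo_ok = sha256_matches(demo_path, hashlib.sha256(b"abc").hexdigest())
os.remove(demo_path)
print(demo_ok)
```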