

Flash-Fuzzy

High-performance fuzzy search engine using Bitap algorithm with bloom filter pre-filtering. Powered by Rust for blazing fast performance.


Features

  • Blazing fast - Rust-powered performance with Python convenience
  • Typo tolerant - Configurable edit distance (0-3 errors)
  • Smart filtering - Bloom filter pre-screening rejects non-matching records with an O(1) check per record
  • Easy to use - Pythonic API with type hints
  • Zero dependencies - Pure Rust core, no external dependencies
  • Thread-safe - Safe for concurrent use
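For readers new to Bitap, here is a pure-Python sketch of the textbook shift-and variant with an error budget (Wu-Manber style). It illustrates the technique the features list refers to and is not Flash-Fuzzy's Rust implementation. The function name `bitap_min_errors` is hypothetical; the sketch is case-sensitive and counts each insertion, deletion, or substitution as one edit.

```python
from typing import Optional


def bitap_min_errors(text: str, pattern: str, max_errors: int) -> Optional[int]:
    """Hypothetical helper: fewest edits (<= max_errors) with which `pattern`
    occurs as a substring of `text`, or None if none is within budget."""
    m = len(pattern)
    if m == 0:
        return 0
    full = (1 << m) - 1   # keep every state vector to m bits
    last = 1 << (m - 1)   # this bit set => the whole pattern has matched

    # masks[c]: bit i is set wherever pattern[i] == c
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    # state[d] bit i set => pattern[:i + 1] matches a substring ending at the
    # current text position with at most d edits. Before reading any text,
    # only "delete the first d pattern characters" is possible.
    state = [((1 << d) - 1) & full for d in range(max_errors + 1)]

    def first_hit() -> Optional[int]:
        return next((d for d, r in enumerate(state) if r & last), None)

    best = first_hit()
    for ch in text:
        b = masks.get(ch, 0)
        prev_old = state[0]                    # state[d - 1] before this char
        state[0] = ((state[0] << 1) | 1) & b   # exact shift-and step
        for d in range(1, max_errors + 1):
            cur_old = state[d]
            state[d] = (
                (((cur_old << 1) | 1) & b)           # pattern char equals ch
                | prev_old                           # ch is an extra text char
                | ((prev_old | state[d - 1]) << 1)   # substitution / skipped pattern char
                | 1                                  # first pattern char within 1 edit
            ) & full
            prev_old = cur_old
        hit = first_hit()
        if hit is not None and (best is None or hit < best):
            best = hit
            if best == 0:
                break
    return best


print(bitap_min_errors("mechanical keyboard", "keybord", 2))  # 1
print(bitap_min_errors("mechanical keyboard", "keybord", 0))  # None
```

Each text character costs O(max_errors) word operations regardless of pattern content, which is why small error budgets such as 0-3 keep Bitap fast.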

Installation

pip install flash-fuzzy

Quick Start

from flash_fuzzy import FlashFuzzy

# Create instance
ff = FlashFuzzy(threshold=0.25, max_errors=2, max_results=50)

# Add records
ff.add([
    {"id": 1, "name": "Wireless Headphones", "category": "Electronics"},
    {"id": 2, "name": "Mechanical Keyboard", "category": "Computers"},
    {"id": 3, "name": "USB-C Cable", "category": "Accessories"},
])

# Search with typos
results = ff.search("keybord")  # Note the typo
for r in results:
    print(f"ID: {r.id}, Score: {r.score:.2f}")

API

FlashFuzzy

FlashFuzzy(
    threshold: float = 0.25,   # Minimum score (0.0-1.0)
    max_errors: int = 2,       # Max edit distance (0-3)
    max_results: int = 50      # Max results to return
)
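The README describes `threshold` as a minimum score and `max_results` as a cap on the number of results. The sketch below shows one plausible way the two combine over scored candidates. It assumes candidates are kept when `score >= threshold` and returned best-first, which the README does not state; `Hit` and `select_results` are hypothetical stand-ins, not part of Flash-Fuzzy.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for SearchResult (id and score only)."""
    id: int
    score: float


def select_results(hits: Iterable[Hit], threshold: float = 0.25,
                   max_results: int = 50) -> List[Hit]:
    # Assumption: candidates scoring below `threshold` are discarded...
    kept = (h for h in hits if h.score >= threshold)
    # ...and the `max_results` best are returned, highest score first.
    # heapq.nlargest avoids sorting every candidate.
    return heapq.nlargest(max_results, kept, key=lambda h: h.score)


candidates = [Hit(1, 0.9), Hit(2, 0.2), Hit(3, 0.5), Hit(4, 0.3)]
print([h.id for h in select_results(candidates, threshold=0.25, max_results=2)])  # [1, 3]
```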

Methods

  • add(records) - Add a single record (dict) or a list of records
  • search(query) - Search the index and return a list of SearchResult
  • remove(id) - Remove the record with the given ID
  • reset() - Clear all records

Properties

  • count - Number of records
  • threshold - Get/set threshold
  • max_errors - Get/set max errors
  • max_results - Get/set max results

SearchResult

  • id: int - Record ID
  • score: float - Match score (0.0-1.0)
  • start: int - Match start position
  • end: int - Match end position

Advanced Examples

E-commerce Product Search

from flash_fuzzy import FlashFuzzy
from dataclasses import dataclass

@dataclass
class Product:
    id: int
    name: str
    brand: str
    category: str

class ProductSearch:
    def __init__(self):
        self.ff = FlashFuzzy(threshold=0.3, max_errors=2, max_results=20)
        self.products = {}

    def index_product(self, product: Product):
        self.products[product.id] = product
        search_text = f"{product.name} {product.brand} {product.category}"
        self.ff.add({"id": product.id, "text": search_text})

    def search(self, query: str) -> list[Product]:
        results = self.ff.search(query)
        return [self.products[r.id] for r in results if r.id in self.products]

# Usage
search = ProductSearch()
search.index_product(Product(1, "MacBook Pro 16", "Apple", "Laptops"))
search.index_product(Product(2, "ThinkPad X1", "Lenovo", "Laptops"))

matches = search.search("macbok")  # typo
for product in matches:
    print(f"{product.name} by {product.brand}")

Django Integration

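from flash_fuzzy import FlashFuzzy
from django.core.cache import cache

CACHE_KEY = "search_index_records"

class SearchService:
    def __init__(self):
        self.ff = FlashFuzzy()
        self.records = []  # plain dicts, which pickle cleanly into the cache
        self._load_from_cache()

    def index_model(self, queryset, text_field="name"):
        # SearchResult.id is an int, so this assumes integer primary keys
        records = [
            {"id": obj.pk, "text": str(getattr(obj, text_field))}
            for obj in queryset
        ]
        self.ff.add(records)
        self.records.extend(records)
        self._save_to_cache()

    def search(self, query: str) -> list[int]:
        results = self.ff.search(query)
        return [r.id for r in results]

    def _save_to_cache(self):
        # Django's cache backends generally pickle values, and the README does
        # not say the Rust-backed FlashFuzzy object is picklable. Cache the
        # source records instead and rebuild the index from them on load.
        cache.set(CACHE_KEY, self.records, timeout=3600)

    def _load_from_cache(self):
        cached = cache.get(CACHE_KEY)
        if cached:
            self.records = list(cached)
            self.ff.add(self.records)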

FastAPI Endpoint

from contextlib import asynccontextmanager

from fastapi import FastAPI, Query
from flash_fuzzy import FlashFuzzy
from pydantic import BaseModel

search_engine = FlashFuzzy(threshold=0.25, max_errors=2)

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load your data once at startup (replaces the deprecated
    # @app.on_event("startup") hook)
    products = [
        {"id": 1, "text": "Wireless Keyboard"},
        {"id": 2, "text": "USB Mouse"},
        {"id": 3, "text": "HDMI Cable"},
    ]
    search_engine.add(products)
    yield

app = FastAPI(lifespan=lifespan)

class SearchResult(BaseModel):
    id: int
    score: float

@app.get("/search", response_model=list[SearchResult])
async def search(q: str = Query(..., min_length=2)):
    results = search_engine.search(q)
    return [
        SearchResult(id=r.id, score=r.score)
        for r in results
    ]

Async/Await with asyncio

import asyncio
from flash_fuzzy import FlashFuzzy
from concurrent.futures import ThreadPoolExecutor

class AsyncSearchEngine:
    def __init__(self):
        self.ff = FlashFuzzy()
        self.executor = ThreadPoolExecutor(max_workers=4)

    async def search_async(self, query: str):
        loop = asyncio.get_running_loop()
        results = await loop.run_in_executor(
            self.executor,
            self.ff.search,
            query
        )
        return results

# Usage
async def main():
    engine = AsyncSearchEngine()
    engine.ff.add({"id": 1, "text": "Python Programming"})
    engine.ff.add({"id": 2, "text": "Rust Programming"})

    results = await engine.search_async("pythn")  # typo
    for r in results:
        print(f"ID: {r.id}, Score: {r.score}")

asyncio.run(main())

Performance

  • Search: < 1ms for 10,000 records
  • Indexing: O(n), where n is the length of the indexed text
  • Memory: ~1KB per record
  • Throughput: ~100,000 searches/second
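These figures depend on hardware, record length, and query. A small timing harness lets you check latency and throughput on your own data. The sketch below accepts any search callable, such as a `FlashFuzzy` instance's `search` method; `time_queries` is a hypothetical helper, not part of the package.

```python
import time
from typing import Callable, Dict, Sequence


def time_queries(search: Callable[[str], object], queries: Sequence[str],
                 repeat: int = 5) -> Dict[str, float]:
    """Time each query `repeat` times; report latency in milliseconds."""
    if not queries or repeat < 1:
        raise ValueError("need at least one query and repeat >= 1")
    samples = []
    for _ in range(repeat):
        for q in queries:
            start = time.perf_counter()
            search(q)
            samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    total_s = sum(samples) / 1000.0
    return {
        "runs": len(samples),
        "median_ms": samples[len(samples) // 2],  # upper median for even counts
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
        "searches_per_sec": len(samples) / total_s if total_s > 0 else float("inf"),
    }


# Usage with the package (not run here):
#   ff = FlashFuzzy()
#   ff.add([{"id": i, "text": f"product {i}"} for i in range(10_000)])
#   print(time_queries(ff.search, ["prodcut 42", "prduct 9999"]))
```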

Bloom filter pre-filtering rejects records that cannot match with an O(1) check per record, before the more expensive fuzzy matching runs.
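A minimal sketch of the idea, assuming a 64-bit character-presence signature per record (a Bloom filter with one hash, `ord(ch) % 64`) and case-insensitive matching. Flash-Fuzzy's actual filter layout is not documented here, and `signature`, `may_match`, `build_signatures`, and `prefilter` are hypothetical names. Every query character absent from a record must be substituted or deleted, costing at least one edit, so a record missing more than `max_errors` of the query's bits can be skipped. Hash collisions only make a record look like a better candidate, so the check yields false positives but never false negatives.

```python
from typing import Iterable, List, Tuple

SIG_BITS = 64


def signature(text: str) -> int:
    """Set one bit per distinct (lower-cased) character."""
    sig = 0
    for ch in text.lower():
        sig |= 1 << (ord(ch) % SIG_BITS)
    return sig


def may_match(query_sig: int, record_sig: int, max_errors: int) -> bool:
    """O(1) check: could the record still be within `max_errors` edits?"""
    missing = query_sig & ~record_sig  # query bits the record lacks
    return bin(missing).count("1") <= max_errors


def build_signatures(records: Iterable[Tuple[int, str]]) -> List[Tuple[int, int]]:
    # Computed once at indexing time, O(length of each record's text).
    return [(rid, signature(text)) for rid, text in records]


def prefilter(index: List[Tuple[int, int]], query: str, max_errors: int) -> List[int]:
    q = signature(query)
    return [rid for rid, sig in index if may_match(q, sig, max_errors)]


index = build_signatures([(1, "Wireless Headphones"),
                          (2, "Mechanical Keyboard"),
                          (3, "USB-C Cable")])
print(prefilter(index, "keybord", max_errors=2))  # [2]
```

Only the records that survive the prefilter would then be scored with the full Bitap matcher.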

Platform Support

Platform Status
Linux (x86_64, ARM64) ✅ Supported
macOS (x86_64, Apple Silicon) ✅ Supported
Windows (x86_64) ✅ Supported

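Pre-built wheels are available for all major platforms. If no wheel matches your platform and Python version, pip falls back to building the source distribution, which requires a Rust toolchain.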


License

MIT - see LICENSE

Download files


Source Distribution

flash_fuzzy-0.1.0.tar.gz (19.3 kB)

Built Distribution


flash_fuzzy-0.1.0-cp313-cp313-win_amd64.whl (136.4 kB, CPython 3.13, Windows x86-64)

File details

Details for the file flash_fuzzy-0.1.0.tar.gz.

File metadata

  • Download URL: flash_fuzzy-0.1.0.tar.gz
  • Size: 19.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.10.2

File hashes

Hashes for flash_fuzzy-0.1.0.tar.gz
Algorithm Hash digest
SHA256 b5f493b7f87c897cc1b37ab3912c1a906e07e414a24f3716dc6f0749faaae8e5
MD5 0b013052e22d249a3724b35207d05b24
BLAKE2b-256 22b54c94bda7b80eb97772b99411634632364961a5c352217acbeda817ca9a60


File details

Details for the file flash_fuzzy-0.1.0-cp313-cp313-win_amd64.whl.

File hashes

Hashes for flash_fuzzy-0.1.0-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 1e960055063efd965b320e0e2862ba993e96de452384212c0f0a0c64ca9e803d
MD5 a412094cb14922d6e4a8d2b445e89df8
BLAKE2b-256 827b256b84bf9440a5901a2460b0c2dc0a67020d10130f9d6567550a6612dc4c
