An embeddable, in-process search engine written in Rust

These details have not been verified by PyPI

lucisearch

The SQLite of Search — an embeddable, in-process search engine.

No cluster to manage. No HTTP layer. No JVM. pip install and search.

pip install lucisearch

Quick Start

import luci

# Create an index with field mappings
index = luci.Index.create("products.luci", {
    "properties": {
        "title": {"type": "text"},
        "description": {"type": "text"},
        "category": {"type": "keyword"},
        "price": {"type": "float"},
        "in_stock": {"type": "boolean"},
    }
})

# Index documents
index.bulk([
    {"title": "Wireless Headphones", "description": "Noise-cancelling bluetooth headphones", "category": "electronics", "price": 79.99, "in_stock": True},
    {"title": "Running Shoes", "description": "Lightweight trail running shoes", "category": "sports", "price": 129.99, "in_stock": True},
    {"title": "Coffee Maker", "description": "Programmable drip coffee maker", "category": "kitchen", "price": 49.99, "in_stock": False},
])

# Search — dict (ES-style)
results = index.search({"match": {"title": "headphones"}}, 10)
for hit in results.hits:
    print(f'{hit.score:.2f}  {hit.source["title"]}')

# Search — typed (Pydantic-validated, IDE-completed)
from luci import MatchQuery
results = index.search(MatchQuery(field="title", query="headphones"), 10)

Typed query API

Every query, aggregation, sort, and search-level option has a typed Pydantic model. Construct queries with & / | / ~ operators and build search requests fluently. The typed forms validate at construction time (typos and wrong field types fail before the engine sees them) and serialize 1:1 to the dict form, so the two APIs are interchangeable.

from luci import (
    MatchQuery, TermQuery, RangeQuery, BoolQuery, KnnQuery,
    FusionQuery, AvgAgg, TermsAgg, SearchExpression, Sort,
)

# Boolean composition with operators
q = (
    MatchQuery(field="title", query="running shoes")
    & TermQuery(field="category", value="sports")
    & RangeQuery(field="price", lte=100)
)

# Equivalent explicit BoolQuery — operators desugar to all-must
q = BoolQuery(
    must=[
        MatchQuery(field="title", query="running shoes"),
        TermQuery(field="category", value="sports"),
        RangeQuery(field="price", lte=100),
    ],
)

# When you want filter (non-scoring) clauses, use BoolQuery directly:
q = BoolQuery(
    must=[MatchQuery(field="title", query="running shoes")],
    filter=[
        TermQuery(field="category", value="sports"),
        RangeQuery(field="price", lte=100),
    ],
)

# Negation
q = MatchQuery(field="title", query="shoes") & ~TermQuery(field="category", value="kids")

# Fluent SearchExpression builder
search = (
    SearchExpression()
    .with_query(q)
    .with_size(20)
    .with_sort("price", order="asc")
    .with_agg("avg_price", AvgAgg(field="price"))
    .with_agg("by_category", TermsAgg(field="category", size=10))
)
results = index.search(search)

# Or direct construction
search = SearchExpression(
    query=q,
    size=20,
    sort=[Sort(field="price")],
    aggs={"avg_price": AvgAgg(field="price")},
)
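To picture how operator composition can map onto the dict form, here is a minimal, self-contained toy. The classes `Query`, `Match`, `Term`, and `Bool` are hypothetical stand-ins, not luci's Pydantic models. The text above only states that `&` desugars to all-must; mapping `|` to `should` and `~` to `must_not` is an assumption made for illustration.

```python
from __future__ import annotations

import dataclasses
from dataclasses import dataclass


class Query:
    """Hypothetical base class (not luci's API): every query serializes to a dict."""

    def to_dict(self) -> dict:
        raise NotImplementedError

    def __and__(self, other: Query) -> Query:
        # a & b -> bool.must, flattening nested all-must bools into one list
        return Bool(must=_clauses(self, "must") + _clauses(other, "must"))

    def __or__(self, other: Query) -> Query:
        # a | b -> bool.should (assumed mapping)
        return Bool(should=_clauses(self, "should") + _clauses(other, "should"))

    def __invert__(self) -> Query:
        # ~a -> bool.must_not (assumed mapping)
        return Bool(must_not=[self])


def _clauses(q: Query, slot: str) -> list[Query]:
    """Unwrap a Bool that only populates `slot`, so a & b & c stays flat."""
    others = [s for s in ("must", "should", "must_not") if s != slot]
    if isinstance(q, Bool) and not any(getattr(q, s) for s in others):
        return list(getattr(q, slot))
    return [q]


@dataclass
class Match(Query):
    field: str
    query: str

    def to_dict(self) -> dict:
        return {"match": {self.field: self.query}}


@dataclass
class Term(Query):
    field: str
    value: object

    def to_dict(self) -> dict:
        return {"term": {self.field: self.value}}


@dataclass
class Bool(Query):
    must: list[Query] = dataclasses.field(default_factory=list)
    should: list[Query] = dataclasses.field(default_factory=list)
    must_not: list[Query] = dataclasses.field(default_factory=list)

    def to_dict(self) -> dict:
        body = {}
        for name in ("must", "should", "must_not"):
            clauses = getattr(self, name)
            if clauses:  # omit empty clause lists, as the dict DSL does
                body[name] = [c.to_dict() for c in clauses]
        return {"bool": body}


q = Match("title", "running shoes") & Term("category", "sports") & Term("in_stock", True)
print(q.to_dict())
```

Flattening in `_clauses` is what makes a chain of `&` produce one `must` list rather than nested bools, matching the "equivalent explicit BoolQuery" shown above.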

Hybrid (text + vector) search uses FusionQuery explicitly — kNN is just another ScoringExpression:

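# Assumes an index with a dense_vector "embedding" field; qv is the query vector (list of floats, length = dims)
hybrid = FusionQuery(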
    sources=[
        MatchQuery(field="title", query="wireless headphones"),
        KnnQuery(field="embedding", query_vector=qv, k=50),
    ],
    method="rrf",
)
results = index.search(hybrid, 10)

Queries

Luci supports the Elasticsearch query DSL. Pass any query as a Python dict, or use the typed equivalents shown above.

Full-text search

# Single field
index.search({"match": {"title": "running shoes"}}, 10)

# Multiple fields
index.search({"multi_match": {"query": "wireless", "fields": ["title", "description"]}}, 10)

# Exact phrase
index.search({"match_phrase": {"description": "trail running"}}, 10)
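`text` fields are scored with BM25. For intuition about what a `match` query computes, here is classic Okapi BM25 over a lowercase, whitespace-tokenized toy corpus. `bm25_scores` is a hypothetical helper; `k1=1.2` and `b=0.75` are common defaults assumed here, and luci's analyzers and exact BM25 variant may differ.

```python
import math
from collections import Counter


def bm25_scores(docs: list[str], query: str, k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each doc against `query` with Okapi BM25 (toy tokenizer: lowercase + split)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of docs containing each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Lucene-style IDF, which is always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores


docs = [
    "Noise-cancelling bluetooth headphones",
    "Lightweight trail running shoes",
    "Programmable drip coffee maker",
]
print(bm25_scores(docs, "running shoes"))
```

Rare terms get a larger IDF, and repeated occurrences of a term add progressively less, which is why BM25 is preferred over raw term counts.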

Filtering and boolean logic

# Term query (exact match on keyword fields)
index.search({"term": {"category": "electronics"}}, 10)

# Bool query — combine must, should, must_not, filter
index.search({
    "bool": {
        "must": [{"match": {"title": "shoes"}}],
        "should": [{"term": {"brand": "nike"}}, {"term": {"brand": "adidas"}}],
        "filter": [
            {"term": {"in_stock": True}},
            {"range": {"price": {"lte": 100}}},
        ],
        "minimum_should_match": 1,
    }
}, 10)

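# Prefix, wildcard, regexp, fuzzy
index.search({"prefix": {"category": "elec"}}, 10)
index.search({"wildcard": {"category": "elec*"}}, 10)
index.search({"regexp": {"category": "elec.*"}}, 10)
index.search({"fuzzy": {"title": {"value": "headphoens", "fuzziness": 1}}}, 10)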
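With `"fuzziness": 1`, the misspelling `headphoens` can only reach `headphones` if swapping two adjacent letters counts as a single edit; plain Levenshtein distance counts it as two substitutions. Elasticsearch counts transpositions by default, and the sketch below assumes luci does the same. `osa_distance` is a hypothetical helper computing the optimal string alignment (restricted Damerau-Levenshtein) distance.

```python
def osa_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Edit distance with insert/delete/substitute, plus adjacent swaps if enabled."""
    # d[i][j] = distance between the prefixes a[:i] and b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


print(osa_distance("headphoens", "headphones"))
print(osa_distance("headphoens", "headphones", transpositions=False))
```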

Sorting and pagination

# Sort by field
results = index.search({
    "query": {"match_all": {}},
    "sort": [{"price": "asc"}],
    "size": 10
})

# Pagination with from/size
results = index.search({
    "query": {"match_all": {}},
    "sort": ["price"],
    "from": 20,
    "size": 10
})

# Cursor-based pagination with search_after
results = index.search({
    "query": {"match_all": {}},
    "sort": ["price"],
    "size": 10,
    "search_after": [49.99]
})
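`search_after` resumes from the sort values of the last hit on the previous page instead of skipping a count of hits. The in-memory sketch below shows the idea. It sorts on `(price, _id)` because sorting on `price` alone, as in the example above, can skip documents that share a price across a page boundary (the cursor is compared strictly). Whether luci adds an implicit tiebreaker is not stated here; `page_after` is a hypothetical helper.

```python
def page_after(docs, size, after=None):
    """Return (page, next_cursor) for docs sorted by (price, _id), strictly after `after`."""
    ordered = sorted(docs, key=lambda d: (d["price"], d["_id"]))
    if after is not None:
        # Keep only docs whose sort key is strictly greater than the cursor.
        ordered = [d for d in ordered if (d["price"], d["_id"]) > tuple(after)]
    page = ordered[:size]
    # The cursor for the next request is the sort key of the last hit on this page.
    next_after = [page[-1]["price"], page[-1]["_id"]] if page else None
    return page, next_after


docs = [
    {"_id": "a", "price": 49.99},
    {"_id": "b", "price": 49.99},
    {"_id": "c", "price": 79.99},
    {"_id": "d", "price": 129.99},
]
page, cursor = page_after(docs, size=1)
print([d["_id"] for d in page], cursor)
```

Unlike `from`/`size`, the cost of fetching a page does not grow with how deep into the results you are.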

Aggregations

# Terms aggregation
results = index.search({
    "query": {"match_all": {}},
    "aggs": {"categories": {"terms": {"field": "category"}}},
    "size": 0
})
for bucket in results.aggregations["categories"]["buckets"]:
    print(f'{bucket["key"]}: {bucket["doc_count"]}')

# Metric aggregations
results = index.search({
    "query": {"match_all": {}},
    "aggs": {
        "avg_price": {"avg": {"field": "price"}},
        "price_stats": {"stats": {"field": "price"}},
        "price_ranges": {"range": {
            "field": "price",
            "ranges": [{"to": 50}, {"from": 50, "to": 100}, {"from": 100}]
        }},
    },
    "size": 0
})

# Nested aggregations
results = index.search({
    "query": {"match_all": {}},
    "aggs": {"by_category": {
        "terms": {"field": "category"},
        "aggs": {"avg_price": {"avg": {"field": "price"}}},
    }},
    "size": 0
})
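To make the bucket semantics concrete, the self-contained sketch below computes the three request shapes above over in-memory docs: a `terms` aggregation (buckets ordered by `doc_count`, descending), a `range` aggregation, and `terms` with a nested `avg`. It assumes Elasticsearch semantics, where a range's `from` is inclusive and `to` is exclusive; the helper names are hypothetical.

```python
from collections import Counter, defaultdict


def terms_agg(docs, field, size=10):
    counts = Counter(d[field] for d in docs if field in d)
    # Highest doc_count first; ties broken by key for a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered[:size]]


def range_agg(docs, field, ranges):
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        n = sum(
            1 for d in docs
            if field in d
            and (lo is None or d[field] >= lo)  # from: inclusive
            and (hi is None or d[field] < hi)   # to: exclusive
        )
        buckets.append({**r, "doc_count": n})
    return buckets


def terms_with_avg(docs, field, metric):
    """Per-bucket average of `metric`, like terms + a nested avg sub-aggregation."""
    sums, counts = defaultdict(float), Counter()
    for d in docs:
        if field in d and metric in d:
            sums[d[field]] += d[metric]
            counts[d[field]] += 1
    return {k: sums[k] / counts[k] for k in counts}


products = [
    {"category": "electronics", "price": 79.99},
    {"category": "sports", "price": 129.99},
    {"category": "kitchen", "price": 49.99},
    {"category": "sports", "price": 89.99},
]
print(terms_agg(products, "category"))
print(range_agg(products, "price", [{"to": 50}, {"from": 50, "to": 100}, {"from": 100}]))
print(terms_with_avg(products, "category", "price"))
```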

Highlighting

Highlighting is a lazy per-hit method, not a request-body parameter. Call hit.highlight(field) to get a list of structured Highlight spans (text, start, end) — the consumer chooses how to render them.

results = index.search({"query": {"match": {"description": "coffee"}}})
for hit in results.hits:
    for span in hit.highlight("description"):
        print(f"matched {span.text!r} at {span.start}..{span.end}")
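Because `hit.highlight()` returns spans rather than pre-tagged text, rendering is up to the caller. The sketch below wraps each span in tags and HTML-escapes the rest. `Span` is a hypothetical stand-in for luci's `Highlight` objects (only `text`, `start`, `end` are used), and it assumes `start`/`end` are end-exclusive character offsets into the field's original text; if they turn out to be byte offsets, slice the UTF-8 encoded bytes instead.

```python
import html
from typing import NamedTuple


class Span(NamedTuple):  # hypothetical stand-in for luci's Highlight
    text: str
    start: int
    end: int


def render(source: str, spans, pre: str = "<mark>", post: str = "</mark>") -> str:
    """Wrap each [start, end) span of `source` in pre/post tags, escaping everything else."""
    out, pos = [], 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < pos:
            continue  # skip spans overlapping one already rendered
        out.append(html.escape(source[pos:span.start]))
        out.append(pre + html.escape(source[span.start:span.end]) + post)
        pos = span.end
    out.append(html.escape(source[pos:]))
    return "".join(out)


print(render("Programmable drip coffee maker", [Span("coffee", 18, 24)]))
```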

Vector search (kNN)

# Create index with vector field
index = luci.Index.create("vectors.luci", {
    "properties": {
        "title": {"type": "text"},
        "embedding": {"type": "dense_vector", "dims": 384},
    }
})

# kNN search
results = index.search({
    "query": {"knn": {
        "field": "embedding",
        "query_vector": [0.1, 0.2, ...],  # 384-dim vector
        "k": 10,
    }}
}, 10)

# kNN with similarity threshold
results = index.search({
    "query": {"knn": {
        "field": "embedding",
        "query_vector": query_vector,
        "k": 50,
        "threshold": 0.7,  # exclude low-similarity results
    }}
}, 10)

# kNN inside bool (vector as filter)
results = index.search({
    "query": {"bool": {
        "must": [{"match": {"title": "headphones"}}],
        "filter": [{"knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": 100,
        }}],
    }}
}, 10)
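For intuition about `k` and `threshold`, here is an exact, brute-force kNN over cosine similarity. luci uses an HNSW index, which is approximate, so its results can differ slightly. Cosine as the metric and reading `threshold` as a minimum similarity are assumptions based on the comments above; `cosine` and `knn` are hypothetical helpers.

```python
import math
from typing import Optional


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def knn(vectors: dict, query, k: int, threshold: Optional[float] = None):
    """Return up to k (id, similarity) pairs, best first, dropping any below threshold."""
    scored = [(doc_id, cosine(vec, query)) for doc_id, vec in vectors.items()]
    if threshold is not None:
        scored = [(d, s) for d, s in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


vecs = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0]}
print(knn(vecs, [1.0, 0.0], k=2))
```

When kNN is used as a `bool` filter, the resulting top-`k` ID set restricts which documents the scoring clauses may return, so a larger `k` makes the filter less strict.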

Hybrid search (RRF fusion)

# Reciprocal Rank Fusion — combine text + vector results
results = index.search({
    "query": {"fusion": {
        "sources": [
            {"match": {"title": "wireless headphones"}},
            {"knn": {
                "field": "embedding",
                "query_vector": query_vector,
                "k": 50,
            }},
        ],
        "method": "rrf",  # or "sum", "arithmetic_mean"
    }}
}, 10)

# Weighted fusion with 3 sources
results = index.search({
    "query": {"fusion": {
        "sources": [
            {"match": {"title": "headphones"}},
            {"term": {"brand": "sony"}},
            {"knn": {"field": "embedding", "query_vector": qv, "k": 50}},
        ],
        "method": "rrf",
        "weights": [1.0, 0.5, 2.0],
        "rank_window_size": 100,
    }}
}, 10)
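Reciprocal Rank Fusion ignores raw scores and combines rank positions: each source adds `weight / (k + rank)` for every document it returns within `rank_window_size`. The sketch below applies that formula to ranked ID lists. `k = 60` is a common default (Elasticsearch's `rank_constant` also defaults to 60); luci's constant, and exactly how it applies `weights`, are assumptions here, and `rrf` is a hypothetical helper.

```python
from collections import defaultdict


def rrf(ranked_lists, weights=None, k=60, rank_window_size=100):
    """Fuse ranked lists of doc IDs with (weighted) Reciprocal Rank Fusion."""
    weights = weights or [1.0] * len(ranked_lists)
    scores = defaultdict(float)
    for ids, w in zip(ranked_lists, weights):
        # Ranks are 1-based; only each source's top rank_window_size hits count.
        for rank, doc_id in enumerate(ids[:rank_window_size], start=1):
            scores[doc_id] += w / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


text_hits = ["d1", "d2", "d3"]
vector_hits = ["d3", "d1", "d4"]
print(rrf([text_hits, vector_hits]))
print(rrf([text_hits, vector_hits], weights=[1.0, 2.0]))
```

Documents found by several sources accumulate contributions, so `d1` and `d3` outrank hits that only one source returned; raising a source's weight shifts ties toward its ranking.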

Geospatial queries

# Create index with geo fields
index = luci.Index.create("places.luci", {
    "properties": {
        "name": {"type": "text"},
        "location": {"type": "geo_point"},
    }
})

# Geo distance
index.search({
    "geo_distance": {
        "distance": "10km",
        "location": {"lat": 40.7128, "lon": -74.0060}
    }
}, 10)

# Geo bounding box
index.search({
    "geo_bounding_box": {
        "location": {
            "top_left": {"lat": 41.0, "lon": -74.5},
            "bottom_right": {"lat": 40.5, "lon": -73.5}
        }
    }
}, 10)
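Both geo filters reduce to simple geometry on latitude/longitude. The sketch below checks a point against a `geo_distance` radius with the haversine great-circle formula on a spherical Earth (mean radius 6,371 km) and against a `geo_bounding_box`. luci's distance model and its handling of boxes crossing the antimeridian are not described here, so treat this as an approximation; the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical-Earth approximation


def haversine_km(lat1, lon1, lat2, lon2):
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(point, center, km):
    return haversine_km(point["lat"], point["lon"], center["lat"], center["lon"]) <= km


def in_bounding_box(point, top_left, bottom_right):
    # Assumes the box does not cross the antimeridian (top_left lon <= bottom_right lon).
    return (bottom_right["lat"] <= point["lat"] <= top_left["lat"]
            and top_left["lon"] <= point["lon"] <= bottom_right["lon"])


nyc = {"lat": 40.7128, "lon": -74.0060}
times_square = {"lat": 40.7580, "lon": -73.9855}
print(round(haversine_km(nyc["lat"], nyc["lon"], times_square["lat"], times_square["lon"]), 2))
```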

Document CRUD

# Add with explicit ID
index.add({"_id": "prod-1", "title": "Widget", "price": 9.99})

# Get by ID
doc = index.get("prod-1")

# Update (partial merge)
index.update("prod-1", {"price": 7.99})

# Delete by ID
index.delete("prod-1")

# Delete by query
index.delete_by_query({"term": {"category": "discontinued"}})

# Count
count = index.count({"term": {"in_stock": True}})

Transactions

By default, add() and bulk() auto-commit after every call. For batch operations with atomic commit/rollback semantics, use a transaction:

# Sync transaction
with index.transaction() as txn:
    txn.add({"title": "doc 1", "category": "tech"})
    txn.add({"title": "doc 2", "category": "science"})
    # commits on clean exit, rolls back on exception

# Async transaction (for asyncio; async with must run inside a coroutine)
import asyncio

async def main():
    async with index.async_transaction() as txn:
        txn.add({"title": "doc 3"})
        txn.add({"title": "doc 4"})

asyncio.run(main())

While a transaction is open, add() and bulk() from other threads block until the transaction completes.
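Commit on clean exit and rollback on exception is the standard context-manager pattern. The toy below stages writes in a buffer and publishes them to an in-memory list only if the `with` body finishes without raising. `ToyIndex` is hypothetical and says nothing about how luci persists data or blocks other threads.

```python
from contextlib import contextmanager
from types import SimpleNamespace


class ToyIndex:
    """Hypothetical in-memory index illustrating transaction semantics only."""

    def __init__(self):
        self.docs = []

    @contextmanager
    def transaction(self):
        staged = []
        try:
            yield SimpleNamespace(add=staged.append)  # writes are buffered, not visible
        except BaseException:
            staged.clear()  # rollback: discard everything staged so far
            raise
        # commit: reached only if the with-body raised nothing
        self.docs.extend(staged)


idx = ToyIndex()
with idx.transaction() as txn:
    txn.add({"title": "doc 1"})
    txn.add({"title": "doc 2"})
print(len(idx.docs))
```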

Multi-process Safety

Multiple processes can read the same .luci file concurrently. Writes are serialized via a cross-process lock — the second writer blocks until the first finishes (with a configurable timeout).

# Set write lock timeout (default: 5 seconds)
index = luci.Index.create("shared.luci", write_timeout=10.0)

# Change mid-session
index.set_write_timeout(2.0)

# Per-operation override
index.add(doc, write_timeout=1.0)
index.bulk(docs, write_timeout=30.0)
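A cross-process write lock with a timeout can be built from advisory file locks by retrying a non-blocking lock attempt until a deadline passes. The sketch below does that with POSIX `fcntl.flock` (Unix only) on a hypothetical lock-file path. It illustrates the "second writer blocks, then times out" behaviour described above; luci's actual locking mechanism is not specified in this document.

```python
import fcntl
import os
import time
from contextlib import contextmanager


@contextmanager
def write_lock(lock_path: str, timeout: float = 5.0, poll: float = 0.05):
    """Hold an exclusive advisory lock on lock_path, waiting at most `timeout` seconds."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        deadline = time.monotonic() + timeout
        while True:
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break  # acquired
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"could not lock {lock_path} within {timeout}s")
                time.sleep(poll)  # another writer holds the lock; retry shortly
        yield
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```

Because `flock` locks belong to the open file description, a second `os.open` of the same path conflicts with the first even inside one process, which makes the timeout easy to observe.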

Field Types

Type	Description
`text`	Full-text search with BM25 scoring and analysis
`keyword`	Exact match, sorting, aggregations
`integer`, `long`	Signed integers
`float`, `double`	Floating point numbers
`boolean`	`true` / `false`
`date`	Date/time values
`dense_vector`	Fixed-dimension float vectors (cosine, L2, dot product; int8 quantization)
`geo_point`	Latitude/longitude pairs
`geo_shape`	Polygons, multipolygons with spatial relations
`nested`	Arrays of objects with independent field scoping
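The `dense_vector` row mentions int8 quantization. One common scheme is symmetric scalar quantization: divide each vector by a scale derived from its largest absolute component so values land in [-127, 127], store the int8 codes plus one float scale (4x smaller than float32 components), and multiply back when scoring. Whether luci uses per-vector scales, per-dimension ranges, or another variant is not documented here; `quantize_int8` and `dequantize_int8` are hypothetical helpers.

```python
def quantize_int8(vec):
    """Symmetric per-vector int8 quantization: returns (codes, scale)."""
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # round() then clamp, so every code fits in a signed byte
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def dequantize_int8(codes, scale):
    return [c * scale for c in codes]


v = [0.12, -0.5, 0.33, 0.0]
codes, scale = quantize_int8(v)
print(codes, dequantize_int8(codes, scale))
```

Each reconstructed component is off by at most half a quantization step (`scale / 2`), which is usually small enough to preserve nearest-neighbour rankings.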

Features

Full-text search with BM25 scoring, analyzers, phrase queries, fuzzy matching
Vector search with HNSW, int8 quantization, pre-filtering
Hybrid search with Reciprocal Rank Fusion (RRF)
20+ aggregation types — terms, avg, sum, min, max, stats, range, histogram, cardinality, percentiles, date_histogram, geo_bounds, filters, nested, and more
Geospatial — geo_distance, geo_bounding_box, geo_shape with all spatial relations
Nested documents with block-join queries and inner_hits
Highlighting with custom tags, per-field configuration
Sort by field — keyword, numeric, score, with multi-level sort
Pagination — from/size and cursor-based search_after
Collapse — deduplicate results by a keyword field
Explain — BM25 score breakdowns
Rescore — two-phase scoring with custom query weights
Single-file storage — one .luci file, no directory sprawl
Auto-commit — documents are searchable immediately after add() or bulk()
Transactions — batch writes with atomic commit/rollback (sync and async)
Multi-process safe — cross-process file locking with configurable timeout
ES-compatible JSON query DSL — same queries, same field types

License

MIT

Release history

0.7.1 (this version): Apr 25, 2026
0.7.0: Apr 25, 2026
0.6.2: Apr 12, 2026
0.6.1: Apr 11, 2026
0.6.0: Apr 8, 2026
0.5.0: Apr 4, 2026
0.4.1: Apr 3, 2026
0.4.0: Apr 3, 2026
0.3.0: Apr 2, 2026
0.2.1: Mar 31, 2026
0.2.0: Mar 31, 2026
0.1.1: Mar 25, 2026

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

lucisearch-0.7.1-cp311-abi3-macosx_11_0_arm64.whl (2.2 MB)

Uploaded Apr 25, 2026; CPython 3.11+, macOS 11.0+ ARM64

File details

Details for the file lucisearch-0.7.1-cp311-abi3-macosx_11_0_arm64.whl.

File metadata

Download URL: lucisearch-0.7.1-cp311-abi3-macosx_11_0_arm64.whl
Upload date: Apr 25, 2026
Size: 2.2 MB
Tags: CPython 3.11+, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.26 (uv publish)

File hashes

Hashes for lucisearch-0.7.1-cp311-abi3-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`618a1c4beff63c26d2d5c894de1299234de56412e2206a4a19e1c9ca154a3f38`
MD5	`7ebb7a4f23eb8431b75a036ca5fdb74a`
BLAKE2b-256	`e980aa56d9c73bc7a73834972a59d9dc96823b4dc21793961429aa06f1acf656`
