brinicle is a C++ vector index engine (ANN library) optimized for disk-first, low-RAM similarity search.

Version 0.0.5 Python 3.12.x Apache-2.0 License

brinicle

brinicle is a C++ retrieval engine built around disk-first, low-RAM HNSW search.

It supports:

  • raw vector similarity search
  • structured item search
  • autocomplete/query suggestion search

Benchmark

brinicle is designed for constrained environments where loading the full index into RAM is not practical.

In a 256MB RAM / 1 CPU container on MNIST 60K vectors, the benchmark result was:

System     Outcome
brinicle   PASS
chroma     PASS
qdrant     OOMKilled
weaviate   OOMKilled
milvus     OOMKilled
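
A rough back-of-the-envelope calculation helps explain why a 256MB container is tight for this dataset. The sketch below assumes the MNIST images are stored as flattened 784-dimensional float32 vectors (the benchmark text does not state the dimensionality or dtype), and it counts only the raw vectors, not graph links or process overhead.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold the raw vectors alone (float32 = 4 bytes per value)."""
    return num_vectors * dim * bytes_per_value


MIB = 1024 * 1024

# Assumption: MNIST images flattened to 784-dimensional float32 vectors.
mnist_bytes = raw_vector_bytes(60_000, 784)
mnist_mib = mnist_bytes / MIB

print(f"MNIST 60K raw vectors: {mnist_mib:.1f} MiB")
```

Under these assumptions the vectors alone take roughly 179 MiB, which leaves little headroom in a 256MB container for a system that also keeps its graph and runtime state in memory.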

On SIFT 1M vectors, using the same in-process deployment model as FAISS and hnswlib:

System     Build (s)   Recall@10   Avg latency (ms)   QPS
faiss      237.282     0.96999     0.092              10857.43
hnswlib    241.301     0.96364     0.093              10711.86
brinicle   243.75      0.96989     0.103              9730.65

In this benchmark, brinicle's recall and latency are close to those of FAISS and hnswlib, while the index stays disk-backed and memory usage stays low.
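
To put "competitive" into numbers, the figures from the SIFT 1M table can be compared directly. This is a small arithmetic sketch, not part of the benchmark suite; it assumes queries were issued one at a time, so that QPS is approximately 1000 divided by the average latency in milliseconds.

```python
# Figures copied from the SIFT 1M table: (avg latency in ms, reported QPS).
results = {
    "faiss": (0.092, 10857.43),
    "hnswlib": (0.093, 10711.86),
    "brinicle": (0.103, 9730.65),
}

baseline_latency = results["faiss"][0]

for name, (latency_ms, qps) in results.items():
    # Assumption: sequential queries, so QPS is roughly 1000 / latency_ms.
    implied_qps = 1000.0 / latency_ms
    relative = latency_ms / baseline_latency
    print(f"{name:9s} implied QPS {implied_qps:8.1f}  reported {qps:8.2f}  latency x{relative:.2f}")

# Extra latency of brinicle relative to faiss, as a fraction.
brinicle_overhead = results["brinicle"][0] / baseline_latency - 1.0
print(f"brinicle latency overhead vs faiss: {brinicle_overhead:.1%}")
```

By this measure brinicle's average latency is about 12% higher than FAISS in this run, and the reported QPS values agree with the latencies to within 1%.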

[Figure: Memory usage comparison]

See the benchmark: brinicle benchmark


brinicle keeps the same simple lifecycle across all engines:

client.init(...)
client.ingest(...)
client.finalize()
client.search(...)

Features

  • Disk-first HNSW vector search
  • Low-RAM indexing and querying
  • Streaming-first ingest: one vector/item/suggestion at a time
  • Insert, upsert, delete, and compact rebuild
  • Raw vector search through VectorEngine
  • Structured item search through ItemSearchEngine
  • Autocomplete/query suggestion search through AutocompleteEngine
  • Custom scoring for lexical item search and autocomplete
  • Python bindings over a C++ core

Install

Install from PyPI:

pip install brinicle

Or build from source:

git clone https://github.com/bicardinal/brinicle.git
cd brinicle
bash build.sh

Engines

brinicle exposes three engines with the same lifecycle.

Engine               Use case                         Input
VectorEngine         Raw ANN vector search            float32 vectors
ItemSearchEngine     Structured catalog/item search   title, category, subcategory, attributes
AutocompleteEngine   Query/title suggestions          suggestion text

All engines follow the same pattern:

client.init(mode="build")

for record in records:
    client.ingest(...)

client.finalize()

results = client.search(...)
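
Because the lifecycle is the same for every engine, the build step can be wrapped in one small helper. The sketch below is illustrative only: build_index and RecordingClient are hypothetical names, and RecordingClient is a stand-in that records calls so the example runs without brinicle installed. The only engine methods it relies on are init(mode="build"), ingest(...), and finalize(), as shown above.

```python
from typing import Any, Iterable, Tuple


def build_index(client: Any, records: Iterable[Tuple[tuple, dict]]) -> int:
    """Run init -> ingest (one record at a time) -> finalize; return the count."""
    client.init(mode="build")
    count = 0
    for args, kwargs in records:
        client.ingest(*args, **kwargs)
        count += 1
    client.finalize()
    return count


class RecordingClient:
    """Stand-in used only to show the call order. Not part of brinicle."""

    def __init__(self) -> None:
        self.calls = []

    def init(self, mode: str) -> None:
        self.calls.append(("init", mode))

    def ingest(self, *args, **kwargs) -> None:
        self.calls.append(("ingest", args, kwargs))

    def finalize(self) -> None:
        self.calls.append(("finalize",))


fake = RecordingClient()
ingested = build_index(fake, [(("a", [0.1, 0.2]), {}), (("b", [0.3, 0.4]), {})])
print(ingested, [call[0] for call in fake.calls])
```

With a real engine, the same helper would be passed a VectorEngine, ItemSearchEngine, or AutocompleteEngine instance in place of the stand-in.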

Vector search

Use VectorEngine when you already have embeddings or numeric vectors.

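import numpy as np
import brinicle

D = 2  # vector dimensionality
n = 5  # number of vectors to index

# VectorEngine takes float32 vectors.
X = np.random.randn(n, D).astype(np.float32)
Q = np.random.randn(D).astype(np.float32)  # query vector

engine = brinicle.VectorEngine(
    "vector_index",  # base path for the index files
    dim=D,
    delta_ratio=0.1,
)

engine.init(mode="build")

# Ingest one vector at a time, keyed by a string external id.
for eid in range(n):
    engine.ingest(str(eid), X[eid])

engine.finalize()

print(engine.search(Q, k=10))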

search(...) returns a list of external ids, for example:

["3", "1", "0"]

To return distances too:

print(engine.search_with_distance(Q, k=10))
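
On a small dataset, approximate results can be sanity-checked against an exact brute-force search. The sketch below is plain Python and does not call brinicle; it assumes squared Euclidean distance, which may not match the metric brinicle uses, and the names exact_top_k and recall_at_k are hypothetical helpers.

```python
from typing import Dict, List, Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(vectors: Dict[str, Sequence[float]], query: Sequence[float], k: int) -> List[str]:
    """Ids of the k nearest vectors, found by scanning every vector."""
    ranked = sorted(vectors, key=lambda eid: squared_l2(vectors[eid], query))
    return ranked[:k]


def recall_at_k(approx_ids: Sequence[str], exact_ids: Sequence[str]) -> float:
    """Fraction of the exact neighbours that the approximate result contains."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


vectors = {"0": [0.0, 0.0], "1": [1.0, 0.0], "2": [0.0, 2.0], "3": [3.0, 3.0]}
query = [0.9, 0.1]

truth = exact_top_k(vectors, query, k=2)
approx = ["1", "2"]  # placeholder for ids returned by an approximate index
print(truth, recall_at_k(approx, truth))
```

In practice, approx would be the list returned by engine.search(Q, k=...), and averaging recall_at_k over many queries gives a Recall@k figure comparable to the one in the benchmark table.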

Insert

Y = np.random.randn(5, D).astype(np.float32)

engine.init(mode="insert")

for eid in range(5):
    engine.ingest(str(eid) + "x", Y[eid])

engine.finalize()

print(engine.search(Q, k=10))

Upsert

Y = np.random.randn(5, D).astype(np.float32)

engine.init(mode="upsert")

for eid in range(5):
    engine.ingest(str(eid), Y[eid])

engine.finalize()

print(engine.search(Q, k=10))

Delete

engine.delete_items(["1", "4"])

print(engine.search(Q, k=10))

Rebuild / optimize

engine.optimize_graph()

print(engine.search(Q, k=10))

Item search

ItemSearchEngine searches structured catalog-like records without requiring a traditional inverted index.

Each item can contain:

  • title
  • category
  • subcategory
  • attributes

Only title is required. The other fields are optional.

Items are encoded internally into fixed-size numeric representations and searched through brinicle's HNSW graph using a structured lexical scorer.

import brinicle

engine = brinicle.ItemSearchEngine(
    "item_index",
    dim=96,
)

engine.init(mode="build")

engine.ingest(
    external_id="p1",
    title="Apple iPhone 15 Pro Max 256GB Natural Titanium",
    category="Electronics",
    subcategory="Smartphones",
    attributes={
        "brand": "Apple",
        "storage": "256GB",
        "color": "Natural Titanium",
    },
)

engine.ingest(
    external_id="p2",
    title="Samsung Galaxy S24 Ultra 512GB Black",
    category="Electronics",
    subcategory="Smartphones",
    attributes={
        "brand": "Samsung",
        "storage": "512GB",
        "color": "Black",
    },
)

engine.finalize()

print(engine.search("iphone 15 pro max", k=10))

To return distances:

print(engine.search_with_distance("iphone 15", k=10))

Example with structured query fields:

results = engine.search(
    "iphone 15",
    category="Electronics",
    subcategory="Smartphones",
    attributes={
        "brand": "Apple",
    },
    k=10,
)

What can Item Search be used for?

ItemSearchEngine is useful for structured catalog-like data such as:

  • products
  • movies
  • books
  • jobs
  • real estate listings
  • restaurants
  • games
  • records with titles and attributes

Item Search is not a neural embedding model. It uses structured symbolic encoding and a configurable scorer.
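
Records from different domains rarely share one schema, so a small adapter that maps a raw record to ingest keyword arguments can be handy. This is a minimal sketch: to_ingest_kwargs is a hypothetical helper, the input keys "id" and "title" are an assumed schema, and it relies on the statement above that only title is required.

```python
from typing import Any, Dict

OPTIONAL_FIELDS = ("category", "subcategory", "attributes")


def to_ingest_kwargs(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map a raw record to keyword arguments for ItemSearchEngine.ingest.

    Optional fields are passed through only when present and non-empty.
    """
    if not record.get("title"):
        raise ValueError("title is required")
    kwargs = {"external_id": str(record["id"]), "title": record["title"]}
    for field in OPTIONAL_FIELDS:
        value = record.get(field)
        if value:
            kwargs[field] = value
    return kwargs


movie = {"id": 42, "title": "Blade Runner", "category": "Movies", "attributes": {}}
print(to_ingest_kwargs(movie))
```

The result can then be passed as engine.ingest(**to_ingest_kwargs(record)) inside the ingest loop.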


Autocomplete

AutocompleteEngine provides low-RAM autocomplete and query suggestion search using brinicle's HNSW infrastructure.

It can be used to index:

  • popular queries
  • item titles
  • category names
  • curated suggestions

import brinicle

ac = brinicle.AutocompleteEngine(
    "autocomplete_index",
    dim=48,
)

ac.init(mode="build")

ac.ingest("iphone 15 pro max", "iphone 15 pro max")
ac.ingest("iphone 15 case", "iphone 15 case")
ac.ingest("samsung s24 ultra", "samsung s24 ultra")

ac.finalize()

print(ac.search("iph", k=5))

AutocompleteEngine follows the same lifecycle as the other engines:

ac.init(mode="build")
ac.ingest(...)
ac.finalize()
ac.search(...)

The current autocomplete implementation is experimental and works best when query prefixes align well with encoded token prefixes.
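
Since the implementation is experimental, it may help to compare its suggestions against an exact prefix lookup on a sample of queries. The sketch below is such a reference baseline built on a sorted list and binary search. It is not how brinicle encodes or searches suggestions, and prefix_baseline is a hypothetical helper name.

```python
from bisect import bisect_left
from typing import List


def prefix_baseline(sorted_suggestions: List[str], prefix: str, k: int = 5) -> List[str]:
    """Exact prefix lookup over a sorted list, returning at most k matches."""
    # In sorted order, all strings sharing a prefix form one contiguous run
    # that starts at the insertion point of the prefix itself.
    start = bisect_left(sorted_suggestions, prefix)
    matches = []
    for text in sorted_suggestions[start:]:
        if not text.startswith(prefix):
            break
        matches.append(text)
        if len(matches) == k:
            break
    return matches


suggestions = sorted(["iphone 15 pro max", "iphone 15 case", "samsung s24 ultra"])
print(prefix_baseline(suggestions, "iph"))
```

Queries where ac.search(...) and the baseline disagree are good candidates for checking how well the prefix aligns with the encoded tokens.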


Streaming-first ingest

brinicle does not require loading the full dataset into memory.

Ingest is intentionally one record at a time:

client.init(mode="build")

for item in stream_items():
    client.ingest(...)

client.finalize()

Users can stream data from:

  • JSONL files
  • databases
  • APIs
  • object storage
  • custom pipelines
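
As one concrete case, a JSONL file can be streamed with a generator that yields one record per line. This is a minimal sketch using only the standard library; stream_jsonl is a hypothetical helper, and the "id" and "vector" keys are an assumed schema for the demo file.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterator


def stream_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line without loading the whole file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Small demo file; the "id" and "vector" keys are a hypothetical schema.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False, encoding="utf-8") as tmp:
    tmp.write('{"id": "a", "vector": [0.1, 0.2]}\n')
    tmp.write("\n")
    tmp.write('{"id": "b", "vector": [0.3, 0.4]}\n')
    demo_path = tmp.name

records = list(stream_jsonl(demo_path))  # list() is for the demo only
os.remove(demo_path)
print([record["id"] for record in records])
```

In a real pipeline the generator would be consumed directly by the ingest loop, so only one record is held in memory at a time.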

brinicle does not assume that your dataset fits in RAM. Rare in modern software.


Configuration

brinicle exposes common HNSW parameters:

  • M
  • ef_construction
  • ef_search
  • delta_ratio

Example:

engine = brinicle.VectorEngine(
    "vector_index",
    dim=384,
    M=48,
    ef_construction=1024,
    ef_search=512,
    delta_ratio=0.1,
)

Item search also supports lexical scoring configuration.

cfg = brinicle.LexicalConfig()

cfg.search_title_weight = 0.60
cfg.search_category_weight = 0.15
cfg.search_subcategory_weight = 0.15

engine = brinicle.ItemSearchEngine(
    "item_index",
    dim=96,
    lexical_config=cfg,
)

Autocomplete also supports its own scoring configuration.

cfg = brinicle.AutocompleteConfig()

cfg.search_position_decay = 0.5
cfg.search_length_penalty = 0.2

ac = brinicle.AutocompleteEngine(
    "autocomplete_index",
    dim=48,
    autocomplete_config=cfg,
)

Index files

For an index path such as:

engine = brinicle.VectorEngine("my_index", dim=128)

brinicle stores index files beside that base path:

my_index.main
my_index.delta
my_index.lock

High-level engines such as ItemSearchEngine and AutocompleteEngine may also store metadata such as tokenizer and encoding information beside the index.
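
To see how much disk an index occupies, the files beside the base path can be inspected with the standard library. This sketch assumes the three suffixes listed above and ignores any extra metadata files; index_file_sizes is a hypothetical helper, not part of brinicle.

```python
from pathlib import Path
from typing import Dict

INDEX_SUFFIXES = (".main", ".delta", ".lock")


def index_file_sizes(base_path: str) -> Dict[str, int]:
    """Return sizes in bytes for the index files that exist beside base_path."""
    sizes = {}
    for suffix in INDEX_SUFFIXES:
        # Append the suffix instead of using Path.with_suffix, which would
        # replace an existing extension in a base path such as "catalog.v2".
        candidate = Path(base_path + suffix)
        if candidate.is_file():
            sizes[candidate.name] = candidate.stat().st_size
    return sizes


print(index_file_sizes("my_index"))
```

Files that do not exist are skipped, so the result is an empty dict before the index has been built.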


Which engine should I use?

Use VectorEngine if you already have embeddings or numeric vectors.

Use ItemSearchEngine if you have structured catalog-like data such as products, movies, books, jobs, listings, or records with titles and attributes.

Use AutocompleteEngine if you want low-RAM query or title suggestions.


Limitations

  • brinicle is not a full-text search engine.
  • Item Search is designed for structured catalog-like records, not long documents.
  • Item Search is symbolic/lexical, not neural semantic search.
  • Autocomplete is experimental.
  • Search quality depends on normalization, tokenizer behavior, and field structure.
  • Large updates may require graph optimization or compact rebuild.

Roadmap

  • High-level item search API
  • High-level autocomplete API
  • Metadata persistence for tokenizer and encoding config
  • More benchmarks for item search and autocomplete
  • Better prefix-aware autocomplete encoding
  • Improved documentation and examples

License

brinicle is licensed under the Apache License, Version 2.0.

See the LICENSE file.
