
Search through files with FTS5 and vector search, and get reranked results. Fast.


litesearch

NB Reading this on GitHub? The formatted documentation is nicer.

litesearch puts full-text search + SIMD vector search in a single SQLite database with automatic Reciprocal Rank Fusion (RRF) reranking — no server, no new infra, no heavy dependencies.

| Module | What you get |
| --- | --- |
| litesearch (core) | database() · get_store() · db.search() · rrf_merge() · vec_search() |
| litesearch.data | PDF extraction & chunking (pdf_chunks) · multi-format file parsing (file_parse) · code indexing (pkg2chunks · dir2chunks) · images_to_pdf · FTS query preprocessing |
| litesearch.utils | ONNX encoders: FastEncode (text) · FastEncodeImage (vision) · FastEncodeMultimodal (joint text+image) |

Install

# usearch SQLite extensions are configured automatically on first import
# (macOS needs one extra step — see litesearch.postfix)
uv add litesearch

Quick Start

Search your documents in a few lines of code:

from litesearch import *
from model2vec import StaticModel
import numpy as np
enc   = StaticModel.from_pretrained("minishlab/potion-retrieval-32M")  # fast static embeddings
db    = database()          # SQLite + usearch SIMD extensions loaded
store = db.get_store()      # table with FTS5 index + embedding column

texts = ["attention is all you need",
         "transformers replaced recurrent networks",
         "gradient descent minimises the loss"]
embs  = enc.encode(texts)   # float32, shape (3, 512)
store.insert_all([dict(content=t, embedding=e.ravel().tobytes()) for t, e in zip(texts, embs)])

q = "self-attention mechanism"
db.search(q, enc.encode([q]).ravel().tobytes(), columns=['id','content'], dtype=np.float32, quote=True)

[{'rowid': 1,
  'id': 1,
  'content': 'attention is all you need',
  '_dist': 0.7910182476043701,
  '_rrf_score': 0.016666666666666666},
 {'rowid': 3,
  'id': 3,
  'content': 'gradient descent minimises the loss',
  '_dist': 0.9670860767364502,
  '_rrf_score': 0.01639344262295082},
 {'rowid': 2,
  'id': 2,
  'content': 'transformers replaced recurrent networks',
  '_dist': 1.0227680206298828,
  '_rrf_score': 0.016129032258064516}]

_rrf_score is the fused rank score (higher = better). _dist is the cosine distance from the vector search leg.

Core API

database() — SQLite + SIMD

database() returns a fastlite Database patched with usearch’s SIMD distance functions. Pass a file path for persistence; omit it for an in-memory store.

db = database()   # ':memory:' by default; use database('my.db') for persistence
db.q('select sqlite_version() as sqlite_version')
[{'sqlite_version': '3.52.0'}]

The usearch extension adds SIMD-accelerated distance functions directly into SQL. Four metrics are available: cosine, sqeuclidean, inner, and divergence. All variants support f32, f16, f64, and i8 suffixes.

vecs = dict(
    v1=np.ones((100,),  dtype=np.float32).tobytes(),   # ones
    v2=np.zeros((100,), dtype=np.float32).tobytes(),   # zeros
    v3=np.full((100,), 0.25, dtype=np.float32).tobytes()  # 0.25s (same direction as v1)
)
def dist_q(metric):
    return db.q(f'''
        select
            distance_{metric}_f32(:v1,:v2) as {metric}_v1_v2,
            distance_{metric}_f32(:v1,:v3) as {metric}_v1_v3,
            distance_{metric}_f32(:v2,:v3) as {metric}_v2_v3
    ''', vecs)

for fn in ['sqeuclidean', 'divergence', 'inner', 'cosine']: print(dist_q(fn))
[{'sqeuclidean_v1_v2': 100.0, 'sqeuclidean_v1_v3': 56.25, 'sqeuclidean_v2_v3': 6.25}]
[{'divergence_v1_v2': 34.657352447509766, 'divergence_v1_v3': 12.046551704406738, 'divergence_v2_v3': 8.66433334350586}]
[{'inner_v1_v2': 1.0, 'inner_v1_v3': -24.0, 'inner_v2_v3': 1.0}]
[{'cosine_v1_v2': 1.0, 'cosine_v1_v3': 0.0, 'cosine_v2_v3': 1.0}]

Cosine distance between v1 (ones) and v3 (0.25s) is 0.0 — they point in the same direction. Both inner and divergence are also available for different retrieval trade-offs.
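The cosine result is easy to verify by hand with NumPy (a sanity check of the distance definition, not litesearch code):

```python
import numpy as np

v1 = np.ones(100, dtype=np.float32)        # all ones
v3 = np.full(100, 0.25, dtype=np.float32)  # same direction, smaller magnitude

# cosine distance = 1 - cos(angle); parallel vectors give 0.0
cos_dist = 1 - (v1 @ v3) / (np.linalg.norm(v1) * np.linalg.norm(v3))
print(cos_dist)  # 0.0
```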

get_store() — FTS5 + Embedding Table

db.get_store() creates (or opens) a table with a content TEXT column, an embedding BLOB column, a JSON metadata column, and an FTS5 full-text index that stays in sync automatically via triggers.

store = db.get_store()   # idempotent — safe to call multiple times
store.schema
'CREATE TABLE [store] (\n   [content] TEXT NOT NULL,\n   [embedding] BLOB,\n   [metadata] TEXT,\n   [uploaded_at] FLOAT DEFAULT CURRENT_TIMESTAMP,\n   [id] INTEGER PRIMARY KEY\n)'
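The "stays in sync via triggers" part follows SQLite's standard external-content FTS5 pattern. A minimal sqlite3 sketch of the insert trigger (the actual triggers litesearch creates may differ in detail, and delete/update triggers work the same way):

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.executescript('''
CREATE TABLE store (content TEXT NOT NULL, id INTEGER PRIMARY KEY);
-- external-content FTS5 index over store.content
CREATE VIRTUAL TABLE store_fts USING fts5(content, content='store', content_rowid='id');
-- keep the index in sync on every insert into the base table
CREATE TRIGGER store_ai AFTER INSERT ON store BEGIN
  INSERT INTO store_fts(rowid, content) VALUES (new.id, new.content);
END;
''')
con.execute("INSERT INTO store (content) VALUES ('attention is all you need')")
hits = con.execute("SELECT rowid FROM store_fts WHERE store_fts MATCH 'attention'").fetchall()
print(hits)  # [(1,)]
```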

Pass hash=True to use a content-addressed id (SHA-1 of the content). Useful for code search and deduplication — re-inserting the same content is a no-op:

code_store = db.get_store(name='code', hash=True)
code_store.insert_all([
    dict(content='hello world', embedding=np.ones( (100,), dtype=np.float16).tobytes()),
    dict(content='hi there', embedding=np.full( (100,), 0.5, dtype=np.float16).tobytes()),
    dict(content='goodbye now', embedding=np.zeros((100,), dtype=np.float16).tobytes()),
], upsert=True, hash_id='id')
code_store(select='id,content')
[{'id': '250ce2bffa97ab21fa9ab2922d19993454a0cf28', 'content': 'hello world'},
 {'id': 'c89f43361891bfab9290bcebf182fa5978f89700', 'content': 'hi there'},
 {'id': '882293d5e5c3d3e04e8e0c4f7c01efba904d0932', 'content': 'goodbye now'}]
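Content addressing in general works like this: hashing the content yields a stable id, so inserting the same text twice targets the same row. Note this is only an illustration of the idea; the exact bytes litesearch hashes may include more than the raw content, so the ids above need not equal a plain SHA-1 of the text:

```python
import hashlib

def content_id(text):
    # stable id derived from the content itself
    return hashlib.sha1(text.encode()).hexdigest()

ids = {content_id(t) for t in ['hello world', 'hi there', 'hello world']}
print(len(ids))  # 2: the duplicate maps to the same id, so an upsert is a no-op
```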

db.search() — Hybrid FTS + Vector with RRF

db.search() runs both an FTS5 keyword query and a vector similarity search, then merges the ranked lists with Reciprocal Rank Fusion. Documents that appear in both lists get a score boost — the best of both worlds.
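The fusion step itself is simple. A minimal sketch of Reciprocal Rank Fusion with the conventional constant k = 60, which reproduces the _rrf_score values shown in this README (litesearch's actual rrf_merge may handle ties and result metadata differently):

```python
def rrf(*ranked_lists, k=60, key='id'):
    # each list is ordered best-first; score(d) = sum over lists of 1/(k + rank)
    scores = {}
    for lst in ranked_lists:
        for rank, doc in enumerate(lst):
            scores[doc[key]] = scores.get(doc[key], 0.0) + 1 / (k + rank)
    return sorted(scores.items(), key=lambda kv: -kv[1])

fts = [{'id': 1}]                                              # keyword leg: one hit
vec = [{'id': 3}, {'id': 2}, {'id': 5}, {'id': 1}, {'id': 4}]  # vector leg
merged = rrf(fts, vec)
print(merged[0])  # id 1 wins: 1/60 (fts rank 0) + 1/63 (vec rank 3)
```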

# Re-create a clean store for the search demo
db2  = database()
st2  = db2.get_store()

phrases = [
    "attention mechanisms in neural networks",
    "transformer architecture for sequence modelling",
    "stochastic gradient descent and learning rate schedules",
    "positional encoding and token embeddings",
    "dropout regularisation reduces overfitting",
]
# use float32 vectors (matching dtype= below)
vecs2 = [np.random.default_rng(i).random(64, dtype=np.float32) for i in range(len(phrases))]
st2.insert_all([dict(content=p, embedding=v.tobytes()) for p, v in zip(phrases, vecs2)])
<Table store (content, embedding, metadata, uploaded_at, id)>
q2 = "attention"
q_vec = np.random.default_rng(42).random(64, dtype=np.float32).tobytes()
db2.search(q2, q_vec, columns=['id','content'], dtype=np.float32)
[{'rowid': 1,
  'id': 1,
  'content': 'attention mechanisms in neural networks',
  'rank': -1.116174474454989,
  '_rrf_score': 0.032539682539682535},
 {'rowid': 3,
  'id': 3,
  'content': 'stochastic gradient descent and learning rate schedules',
  '_dist': 0.20330411195755005,
  '_rrf_score': 0.016666666666666666},
 {'rowid': 2,
  'id': 2,
  'content': 'transformer architecture for sequence modelling',
  '_dist': 0.23124444484710693,
  '_rrf_score': 0.01639344262295082},
 {'rowid': 5,
  'id': 5,
  'content': 'dropout regularisation reduces overfitting',
  '_dist': 0.23238885402679443,
  '_rrf_score': 0.016129032258064516},
 {'rowid': 4,
  'id': 4,
  'content': 'positional encoding and token embeddings',
  '_dist': 0.32342469692230225,
  '_rrf_score': 0.015625}]

Pass rrf=False to see the raw FTS and vector legs separately — handy for debugging relevance:

db2.search(q2, q_vec, columns=['id','content'], dtype=np.float32, rrf=False)
{'fts': [{'id': 1,
   'content': 'attention mechanisms in neural networks',
   'rank': -1.116174474454989}],
 'vec': [{'id': 3,
   'content': 'stochastic gradient descent and learning rate schedules',
   '_dist': 0.20330411195755005},
  {'id': 2,
   'content': 'transformer architecture for sequence modelling',
   '_dist': 0.23124444484710693},
  {'id': 5,
   'content': 'dropout regularisation reduces overfitting',
   '_dist': 0.23238885402679443},
  {'id': 1,
   'content': 'attention mechanisms in neural networks',
   '_dist': 0.24136507511138916},
  {'id': 4,
   'content': 'positional encoding and token embeddings',
   '_dist': 0.32342469692230225}]}

Tip — dtype matters. Always pass the same dtype used when encoding. model2vec and most ONNX models return float32; pass dtype=np.float32. The default is float16 (matches FastEncode).
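Why the dtype matters: embeddings are stored as raw bytes, so reading them back with the wrong dtype silently reinterprets the buffer instead of failing. A quick NumPy illustration:

```python
import numpy as np

v = np.random.default_rng(0).random(64).astype(np.float32)
blob = v.tobytes()                             # 64 floats * 4 bytes = 256 bytes

right = np.frombuffer(blob, dtype=np.float32)  # 64 values, identical to v
wrong = np.frombuffer(blob, dtype=np.float16)  # 128 garbage half-floats, no error raised
print(right.shape, wrong.shape)  # (64,) (128,)
```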

Tip — custom schemas. get_store() is a convenience. For custom schemas, call db.t['my_table'].vec_search(emb, ...) and rrf_merge(fts_results, vec_results) directly.

litesearch.data

Query Preprocessing

FTS5 is powerful, but raw natural-language queries often miss results. litesearch.data ships helpers to transform queries before sending them to FTS:

q = 'This is a sample query'
print('preprocessed q with defaults: `%s`' % pre(q))
print('keywords extracted: `%s`'          % pre(q, wc=False, wide=False))
print('q with wild card: `%s`'            % pre(q, extract_kw=False, wide=False, wc=True))
preprocessed q with defaults: `sample* OR query*`
keywords extracted: `sample query`
q with wild card: `This* is* a* sample* query*`
| Function | What it does |
| --- | --- |
| clean(q) | strips * and returns None for empty queries |
| add_wc(q) | appends * to each word for prefix matching |
| mk_wider(q) | joins words with OR for broader matching |
| kw(q) | extracts keywords via YAKE (removes stop-words) |
| pre(q) | applies all of the above in one call |
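The helpers compose in the obvious way. An illustrative plain-Python approximation of the non-YAKE pieces (the shipped versions may differ in edge cases):

```python
def clean(q):
    q = q.replace('*', '').strip()
    return q or None                                # empty queries become None

def add_wc(q):
    return ' '.join(w + '*' for w in q.split())     # prefix matching

def mk_wider(q):
    return ' OR '.join(q.split())                   # broader OR matching

# on keywords already extracted from 'This is a sample query':
print(mk_wider(add_wc('sample query')))  # sample* OR query*
```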

PDF Extraction

litesearch.data patches pdf_oxide.PdfDocument with bulk page-extraction methods. All methods take optional st / end page indices and return a fastcore L list:

| Method | Returns |
| --- | --- |
| doc.pdf_texts(st, end) | plain text per page |
| doc.pdf_markdown(st, end) | markdown with headings + tables detected |
| doc.pdf_links(st, end) | URI strings extracted from annotations |
| doc.pdf_tables(st, end) | structured rows / cells / bbox dicts |
| doc.pdf_spans(st, end) | text spans with font size, weight, bbox |
| doc.pdf_images(st, end, output_dir) | image metadata, or save to disk |
| doc.pdf_chunks(st, end) | (page, chunk_idx, text) triples, markdown-chunked via chonkie |

images_to_pdf(imgs, output) goes the other direction — wraps a list of images (PIL Images, bytes, or paths) into a conformant multi-page image-only PDF with no external dependencies.

doc = PdfDocument('pdfs/attention_is_all_you_need.pdf')
print(f'{doc.page_count()} pages, {len(doc.pdf_links())} links')

# plain text of page 1
doc.pdf_texts(0, 1)[0][:300]
15 pages, 18 links

'Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\n\n\nAttention Is All You Need\n\n\n∗\n∗\n∗\n∗\nAshish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit\nGoogle Brain Google Brain Google'

# markdown export — headings and tables are detected automatically
md = doc.pdf_markdown()
print(f'Page 1 (markdown):\n{md[0][:400]}')
Page 1 (markdown):
# arXiv:1706.03762v7  [cs.CL]  2 Aug 2023

Provided proper attribution is provided, Google hereby grants permission to reproduce the tables and figures in this paper solely for use in journalistic or scholarly works.

## Attention Is

## All

## You Need

∗∗**Ashish Vaswani****Noam Shazeer****Niki Parmar** Google BrainGoogle BrainGoogle Research [avaswani@google.com](mailto:avaswani@google.com)[no


doc.pdf_chunks() wraps pdf_markdown() + chonkie’s RecursiveChunker into (page, chunk_idx, text) triples — the direct input for encode_pdf_texts:

doc = PdfDocument('pdfs/attention_is_all_you_need.pdf')
chunks = doc.pdf_chunks()
print(f'{len(chunks)} chunks from {doc.page_count()} pages')
# 31 chunks from 15 pages

# (page, chunk_idx, text) triples — direct input for encode_pdf_texts
pg, ci, text = chunks[0]
print(f'page {pg}, chunk {ci}: {text[:80]}...')

Code & File Ingestion

pyparse splits a Python file or string into top-level code chunks (functions, classes, assignments) with source location metadata — ready to insert into a store:

txt = """
from fastcore.all import *
a=1
class SomeClass:
    def __init__(self,x): store_attr()
    def method(self): return self.x + a
"""
pyparse(code=txt)
[{'content': 'a=1', 'metadata': {'path': None, 'uploaded_at': None, 'name': None, 'type': 'Assign', 'lineno': 3, 'end_lineno': 3}}, {'content': 'class SomeClass:\n    def __init__(self,x): store_attr()\n    def method(self): return self.x + a', 'metadata': {'path': None, 'uploaded_at': None, 'name': 'SomeClass', 'type': 'ClassDef', 'lineno': 4, 'end_lineno': 6}}]

pkg2chunks indexes an entire installed package in one call — great for building a semantic code-search store over your dependencies:

chunks = pkg2chunks('fastlite')
print(f'{len(chunks)} chunks from fastlite')
chunks.filter(lambda d: d['metadata']['type'] == 'FunctionDef')[0]
51 chunks from fastlite

{'content': 'def t(self:Database): return _TablesGetter(self)',
 'metadata': {'path': '/Users/71293/code/litesearch/.venv/lib/python3.13/site-packages/fastlite/core.py',
  'uploaded_at': 1771806134.9519145,
  'name': 't',
  'type': 'FunctionDef',
  'lineno': 44,
  'end_lineno': 44,
  'package': 'fastlite',
  'version': '0.2.4'}}


file_parse is the single entry point for any file type — Python, Jupyter notebooks, PDF, Markdown, plain text, and compiled-language source files (JS/TS, Go, Java, Rust…). All return the same {content, metadata} dicts:

# Python → AST-parsed functions and classes
file_parse(Path('litesearch/core.py'))[:2]

# Jupyter notebook → one dict per cell
file_parse(Path('nbs/01_core.ipynb'))[:2]

# PDF → markdown-chunked text (via pdf_chunks)
file_parse(Path('pdfs/attention_is_all_you_need.pdf'))[:2]

dir2chunks indexes every file in a directory tree — analogous to pkg2chunks but for arbitrary directories rather than installed packages:

# Index all Python source files in a directory
chunks = dir2chunks('litesearch', types='py')
print(f'{len(chunks)} chunks from litesearch/')

# Mix formats: notebooks, markdown, PDFs
chunks = dir2chunks('nbs', types='ipynb,md,pdf')
print(f'{len(chunks)} chunks from nbs/')

litesearch.utils

FastEncode — ONNX Text Encoder

FastEncode wraps any ONNX model from HuggingFace Hub. It handles tokenisation, batching, optional parallel thread-pool execution, and runtime int8 quantization — all without PyTorch or Transformers.
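Under the hood, encoders like these run the ONNX session over a tokenised batch and then pool the per-token states into one vector per text. The pooling step is typically masked mean pooling, which looks roughly like this (an assumption about the internals, not FastEncode's actual code):

```python
import numpy as np

def mean_pool(token_states, attention_mask):
    # token_states: (batch, seq, dim); attention_mask: (batch, seq) of 0/1
    # average only over real tokens, ignoring padding positions
    mask = attention_mask[..., None].astype(token_states.dtype)
    return (token_states * mask).sum(axis=1) / mask.sum(axis=1)

states = np.ones((4, 10, 768), dtype=np.float32)  # dummy model output
mask = np.ones((4, 10), dtype=np.int64)           # no padding in this toy batch
print(mean_pool(states, mask).shape)  # (4, 768)
```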

| Config | Model | Dim | Notes |
| --- | --- | --- | --- |
| embedding_gemma (default) | onnx-community/embeddinggemma-300m-ONNX | 768 | Strong retrieval, ~300M params |
| modernbert | nomic-ai/modernbert-embed-base | 768 | BERT-style, fast |
| nomic_text_v15 | nomic-ai/nomic-embed-text-v1.5 | 768 | Shares embedding space with nomic_vision_v15 |

encode_document and encode_query apply the model’s prompt templates automatically.

texts = [
    'Attention is all you need',
    'The transformer architecture uses self-attention',
    'BERT pretrains on masked language modeling',
    'GPT uses autoregressive generation',
]

# Default model — downloads once, cached
enc      = FastEncode()
doc_embs = enc.encode_document(texts)
q_emb    = enc.encode_query(['what paper introduced transformers?'])
print('doc shape:', doc_embs.shape, 'dtype:', doc_embs.dtype)  # (4, 768) float16

# Batching + parallel thread-pool
enc_fast = FastEncode(batch_size=2, parallel=2)
embs     = enc_fast.encode_document(texts)

# Runtime int8 quantization — creates model_int8.onnx on first run, reused after
enc_q = FastEncode(quantize='int8')
embs  = enc_q.encode_document(texts)
doc shape: (4, 768) dtype: float16

FastEncodeImage — ONNX Image Encoder

FastEncodeImage encodes images with CLIP-style ONNX vision models. No Transformers dependency — preprocessing (resize → normalise → CHW) is done with PIL + NumPy using config stored in the model dict.

| Config | Model | Dim | Notes |
| --- | --- | --- | --- |
| nomic_vision_v15 (default) | nomic-ai/nomic-embed-vision-v1.5 | 768 | Same space as nomic_text_v15 |
| clip_vit_b32 | Qdrant/clip-ViT-B-32-vision | 512 | Classic CLIP |

Accepts PIL Images, file paths, or raw bytes — any mix.

FastEncodeMultimodal — Cross-Modal Image + Text Search

FastEncodeMultimodal wraps a model repo that ships both text and vision ONNX encoders in a single shared embedding space — a text query can retrieve images directly. Below: index Attention Is All You Need (text chunks + figures) then search for 'attention mechanism diagram'.

Unified model (siglip2_so400m, ~800 MB, one download):

import json, base64, io
from PIL import Image
from IPython.display import display
enc = FastEncodeMultimodal(siglip2_so400m)   # single unified model, ~800 MB, cached on first run
doc = PdfDocument('pdfs/attention_is_all_you_need.pdf')
db  = database()
ts, ims = db.get_store('texts'), db.get_store('images')

for pg, ci, chunk, emb in encode_pdf_texts(doc, enc.text):
    ts.insert(dict(content=chunk, embedding=emb.tobytes(), metadata=json.dumps({'page': pg})))
for pg, img_bytes, emb in encode_pdf_images(doc, enc.vision):
    ims.insert(dict(content=f'page_{pg}', embedding=emb.tobytes(),
                    metadata=json.dumps({'page': pg, 'data': base64.b64encode(img_bytes).decode()})))

q = 'attention mechanism diagram'
q_emb = enc.text.encode([q])[0].tobytes()
txt_r = ts.db.search(pre(q), q_emb, table_name='texts', columns=['content']) or []
img_r = ims.vec_search(q_emb)
for r in rrf_merge(txt_r, img_r)[:6]:
    print(f"rrf={r['_rrf_score']:.4f}  {r['content'][:70]}")
    meta = json.loads(r.get('metadata', '{}'))
    if 'data' in meta:
        display(Image.open(io.BytesIO(base64.b64decode(meta['data']))).resize((200, 150)))

Paired models (nomic_text_v15 + nomic_vision_v15) share the same 768-dim space; use FastEncode and FastEncodeImage separately:

enc_text = FastEncode(nomic_text_v15)
enc_img  = FastEncodeImage(nomic_vision_v15)
db2  = database()
ts2, ims2 = db2.get_store('texts'), db2.get_store('images')

for pg, ci, chunk, emb in encode_pdf_texts(doc, enc_text):
    ts2.insert(dict(content=chunk, embedding=emb.tobytes(), metadata=json.dumps({'page': pg})))
for pg, img_bytes, emb in encode_pdf_images(doc, enc_img):
    ims2.insert(dict(content=f'page_{pg}', embedding=emb.tobytes(),
                     metadata=json.dumps({'page': pg, 'data': base64.b64encode(img_bytes).decode()})))

q_emb2 = enc_text.encode([q])[0].tobytes()
txt_r2 = ts2.db.search(pre(q), q_emb2, table_name='texts', columns=['content']) or []
img_r2 = ims2.vec_search(q_emb2)
for r in rrf_merge(txt_r2, img_r2)[:6]:
    print(f"rrf={r['_rrf_score']:.4f}  {r['content'][:70]}")
    meta = json.loads(r.get('metadata', '{}'))
    if 'data' in meta:
        display(Image.open(io.BytesIO(base64.b64decode(meta['data']))).resize((200, 150)))
rrf=0.0167  Self-attention, sometimes called intra-attention is an attention mecha
rrf=0.0167  page_3

rrf=0.0164  Attention mechanisms have become an integral part of compelling sequen
rrf=0.0164  page_2

rrf=0.0161  2,[19]. Inall but a few cases27],[ however, such attention mechanisms
rrf=0.0161  page_3


Ideas for More Delight (Planned)

Things that would make litesearch even smoother to use:

| Idea | Why it helps |
| --- | --- |
| Retriever class that bundles encoder + db into r.search(q) | removes the manual encode → bytes → search boilerplate |
| ingest(texts, encoder, store) helper | one-liner for embed-and-insert loops |
| Auto dtype detection | search() could infer dtype from the stored embedding size, removing the dtype=np.float32 footgun |
| from_pdf(path, encoder) / from_dir(dir, encoder) | index a PDF or folder in one call |
| Rich / tabulate display for results | pretty-print search results in notebooks |
| Metadata filter sugar: filters={'source': 'doc.pdf'} | cleaner than writing raw SQL where strings |
| CLI: litesearch index <dir> / litesearch search <q> | quick ad-hoc search without writing Python |

Acknowledgements

A big thank you to @yfedoseev for pdf-oxide, which powers the PDF extraction functionality in litesearch.data.
