
Search through files with FTS5 and vectors, and get reranked results. Fast.


litesearch

NB: If you’re reading this in the GitHub README, I recommend the more nicely formatted documentation version of this tutorial.

Litesearch is a lightweight library to set up a fastlite database with FTS5 and vector search capabilities using usearch.

Litesearch combines usearch’s SQLite extensions for fast vector search with SQLite’s FTS5 full-text search to provide hybrid search.

  • Litesearch uses fastlite, a lightweight wrapper around SQLite that makes database management delightful. It uses apsw rather than sqlite3 and provides best practices out of the box.
  • Usearch is a cross-language package that provides vector search capabilities. We use its SQLite extensions here for fast vector search.

Litesearch provides a simple way to set up this database using the database() method. You get a store with FTS5 and vector search capabilities using the get_store() method, and you can search through the contents using the search() method.

Litesearch also provides document and code manipulation tools in the data module and ONNX-based text encoders in the utils module.

  • litesearch extends the pymupdf Document and Page classes to extract text, images and links easily.
  • litesearch provides ONNX-based text encoders which can be used to generate embeddings for documents and queries.
  • litesearch provides a quick code parsing utility to parse Python files into code chunks for ingestion.

Get Started

fastlite and usearch will be installed automatically with litesearch if you do not have them already.

!pip install litesearch -qq

Litesearch only adds the dependencies it needs, so you can use import * from litesearch without worrying about heavy dependencies.

> The first import will try to set up the usearch extensions and install libsqlite3 if you do not have it already. On mac, an extra step adds libsqlite3 to its LC_PATH. Check postfix.py for details.

from litesearch import *

database

db = database()
db.q('select sqlite_version() as sqlite_version')
[{'sqlite_version': '3.52.0'}]

Let’s try some of usearch’s distance functions

import numpy as np
embs = dict(
    v1=np.ones((100,),dtype=np.float32).tobytes(),      # vector of ones
    v2=np.zeros((100,),dtype=np.float32).tobytes(),     # vector of zeros
    v3=np.full((100,),0.25,dtype=np.float32).tobytes()  # vector of 0.25s
)
def dist_q(metric):
    return db.q(f'''
        select
            distance_{metric}_f32(:v1,:v2) as {metric}_v1_v2,
            distance_{metric}_f32(:v1,:v3) as {metric}_v1_v3,
            distance_{metric}_f32(:v2,:v3) as {metric}_v2_v3
    ''', embs)

for fn in ['sqeuclidean', 'divergence', 'inner', 'cosine']: print(dist_q(fn))
[{'sqeuclidean_v1_v2': 100.0, 'sqeuclidean_v1_v3': 56.25, 'sqeuclidean_v2_v3': 6.25}]
[{'divergence_v1_v2': 34.657352447509766, 'divergence_v1_v3': 12.046551704406738, 'divergence_v2_v3': 8.66433334350586}]
[{'inner_v1_v2': 1.0, 'inner_v1_v3': -24.0, 'inner_v2_v3': 1.0}]
[{'cosine_v1_v2': 1.0, 'cosine_v1_v3': 0.0, 'cosine_v2_v3': 1.0}]
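The v1/v3 values above match the textbook definitions (assuming, as the output suggests, that usearch reports inner-product distance as 1 − dot product). We can check them in plain numpy; v2 is all zeros, so its cosine behaviour is degenerate and is skipped here:

```python
import numpy as np

v1 = np.ones(100, dtype=np.float32)       # vector of ones
v3 = np.full(100, 0.25, dtype=np.float32) # vector of 0.25s

sq = float(((v1 - v3) ** 2).sum())   # squared Euclidean: 100 * 0.75^2 = 56.25
inner = 1.0 - float(v1 @ v3)         # inner-product distance: 1 - 25 = -24.0
cos = 1.0 - float(v1 @ v3) / float(np.linalg.norm(v1) * np.linalg.norm(v3))  # 0.0
```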

store

A store is a table with FTS5 and vector search capabilities.

store = db.get_store()
store.schema
'CREATE TABLE [store] (\n   [content] TEXT NOT NULL,\n   [embedding] BLOB,\n   [metadata] TEXT,\n   [uploaded_at] FLOAT DEFAULT CURRENT_TIMESTAMP,\n   [id] INTEGER PRIMARY KEY\n)'
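The schema leaves metadata as a plain TEXT column, so one option (a sketch; litesearch does not mandate a format, and this payload is made up) is to store JSON strings:

```python
import json

# metadata is TEXT in the store schema, so serialize dicts to JSON strings
row = dict(
    content="this is a text",
    metadata=json.dumps({"source": "example.pdf", "page": 1}),
)
restored = json.loads(row["metadata"])  # decode on the way back out
```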

Let’s use model2vec for fast semantic embeddings.

> Check out FastEncode in the utils module for ONNX-based text encoders.
> Check the examples folder for usage.
> If you have a GPU available, you can use dtype=np.float16 for faster performance and pip install onnxruntime-gpu.

from model2vec import StaticModel
enc = StaticModel.from_pretrained("minishlab/potion-retrieval-32M")
txts, q = ['this is a text', "I'm hungry", "Let's play! shall we?"], 'playing hungry'
embs = enc.encode(txts + [q])  # shape (4, 512), float32
embs
array([[-0.00419094, -0.01899163, -0.07627466, ...,  0.03772586,
        -0.01589733,  0.00763395],
       [-0.07912003, -0.03846852,  0.00235392, ..., -0.00180434,
        -0.03562816,  0.0555767 ],
       [-0.09269759,  0.02909932,  0.03291405, ...,  0.00456874,
        -0.02399384,  0.02755835],
       [-0.10443158, -0.09687435,  0.02284234, ..., -0.03197551,
         0.02392859,  0.03887533]], shape=(4, 512), dtype=float32)

usearch also works with JSON embeddings, but using bytes leverages SIMD well.

rows = [dict(content=t, embedding=e.tobytes()) for t,e in zip(txts,embs[:-1])]
store.insert_all(rows)
<Table store (content, embedding, metadata, uploaded_at, id)>
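Reading embeddings back out of the BLOB column is the mirror of tobytes(): np.frombuffer reinterprets the raw bytes as float32, losslessly. A minimal round-trip sketch (the 512-dim random vector is just an example):

```python
import numpy as np

emb = np.random.default_rng(0).standard_normal(512).astype(np.float32)
blob = emb.tobytes()                              # 512 * 4 bytes, as stored in the BLOB column
restored = np.frombuffer(blob, dtype=np.float32)  # zero-copy view over the bytes
```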

search

You can search through results using the search method of the database. The results are automatically reranked; turn this off by passing rrf=False.

db.search(q, embs[-1].tobytes(), columns=['id', 'content'])
[{'rowid': 1,
  'id': 1,
  'content': 'this is a text',
  '_dist': None,
  '_rrf_score': 0.016666666666666666},
 {'rowid': 2,
  'id': 2,
  'content': "I'm hungry",
  '_dist': None,
  '_rrf_score': 0.01639344262295082},
 {'rowid': 3,
  'id': 3,
  'content': "Let's play! shall we?",
  '_dist': None,
  '_rrf_score': 0.016129032258064516}]

Turning off reranking can help you understand where the results are coming from.

db.search(q, embs[-1].tobytes(), columns=['id', 'content'], rrf=False)
{'fts': [],
 'vec': [{'id': 1, 'content': 'this is a text', '_dist': None},
  {'id': 2, 'content': "I'm hungry", '_dist': None},
  {'id': 3, 'content': "Let's play! shall we?", '_dist': None}]}
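The _rrf_score values above are consistent with reciprocal rank fusion using the common constant k=60 and zero-based ranks (1/60, 1/61, 1/62). Here is a sketch of that fusion; this is my reading of the output, not necessarily litesearch’s exact implementation:

```python
def rrf(rankings, k=60):
    # Reciprocal rank fusion: each result list contributes 1/(k + rank) per id.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: -kv[1])

# FTS matched nothing for this query, so only the vector ranking contributes:
fused = rrf([[], [1, 2, 3]])  # [(1, 1/60), (2, 1/61), (3, 1/62)]
```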

Next steps

  • Check out the data module for document and code parsing utilities.
  • Check out the utils module for onnx based text encoders.
  • Check out the examples folder for complete examples.
