Hybrid search with SQLite AI and SQLite Vector
SQLite RAG
A hybrid search engine built on SQLite with SQLite AI and SQLite Vector extensions. SQLite RAG combines vector similarity search with full-text search (FTS5 extension) using Reciprocal Rank Fusion (RRF) for enhanced document retrieval.
Features
- Hybrid Search: Combines vector embeddings with full-text search for optimal results
- SQLite-based: Built on SQLite with AI and Vector extensions for reliability and performance
- Multi-format Text Support: Processes many file formats, including PDF, DOCX, Markdown, and source code files
- Recursive Character Text Splitter: Token-aware text chunking with configurable overlap
- Interactive CLI: Command-line interface with interactive REPL mode
- Flexible Configuration: Customizable embedding models, search weights, and chunking parameters
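To illustrate what chunking with configurable overlap means, here is a minimal sketch. It is a simplified fixed-window splitter over whitespace-separated words, not the recursive, token-aware splitter that SQLite RAG ships. The function name `chunk_words` and its parameters are hypothetical and chosen only for this example.

```python
def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into chunks of `chunk_size` words that share `overlap` words.

    Illustration only: words stand in for model tokens, and the names here
    are assumptions, not part of the sqlite-rag API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1):
        print(chunk)
```

Each chunk repeats the last `overlap` words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.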
Installation
Prerequisites
SQLite RAG requires SQLite with extension loading support.
If you encounter extension loading issues (e.g., 'sqlite3.Connection' object has no attribute 'enable_load_extension'), follow the setup guides for macOS or Windows.
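Before installing, you can check whether your Python build supports extension loading. The sketch below uses only the standard `sqlite3` module. The helper name `supports_extension_loading` is made up for this example.

```python
import sqlite3


def supports_extension_loading() -> bool:
    """Return True if this Python build's sqlite3 module can load extensions."""
    # The method is missing when Python was compiled without extension support.
    if not hasattr(sqlite3.Connection, "enable_load_extension"):
        return False
    conn = sqlite3.connect(":memory:")
    try:
        conn.enable_load_extension(True)
        conn.enable_load_extension(False)
        return True
    except (AttributeError, sqlite3.NotSupportedError, sqlite3.OperationalError):
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("SQLite library version:", sqlite3.sqlite_version)
    print("Extension loading supported:", supports_extension_loading())
```

If the check reports no support, the interpreter itself needs to be replaced or rebuilt. Reinstalling SQLite RAG alone will not help.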
Install SQLite RAG
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install sqlite-rag
Quick Start
Download Embedding Gemma, the default model, from Hugging Face:
sqlite-rag download-model unsloth/embeddinggemma-300m-GGUF embeddinggemma-300M-Q8_0.gguf
SQLite RAG comes preconfigured to work with the Embedding Gemma model. When you add a document or text, it automatically creates a new database (if one does not already exist) and uses default settings, so you can get started immediately without manual setup.
# Initialize sqliterag.sqlite database and add documents
sqlite-rag add-text "Artificial intelligence (AI) enables machines to learn from data"
sqlite-rag add /path/to/documents --recursive
# Search your documents
sqlite-rag search "explain AI"
# Interactive mode
sqlite-rag
> help
> search "interactive search"
> exit
For help run:
sqlite-rag --help
CLI Commands
Configuration
Settings are stored in the database and should be set before adding any documents.
# View available configuration options
sqlite-rag configure --help
# Set the model path
sqlite-rag configure --model-path ./mymodels/path
# View current settings
sqlite-rag settings
To use a different database filename, use the global --database option:
# Single command with custom database
sqlite-rag --database path/to/mydb.db add-text "Let's talk about AI."
# Interactive mode with custom database
sqlite-rag --database path/to/mydb.db
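Because the database is an ordinary SQLite file, you can inspect it with Python's standard `sqlite3` module. The sketch below only lists table names and assumes nothing about the schema SQLite RAG creates. The helper `list_tables` is hypothetical.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path) -> list[str]:
    """Return the names of all tables in an SQLite database file.

    The file is opened read-only, so a mistyped path raises an error
    instead of silently creating an empty database.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()

# Example call, assuming the database already exists at this path:
# print(list_tables("path/to/mydb.db"))
```

Virtual tables, such as those created by full-text search, also appear in `sqlite_master`, along with the helper tables they create.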
Model Management
You can experiment with other models from Hugging Face by downloading them with:
# Download GGUF models from Hugging Face
sqlite-rag download-model <model-repo> <filename>
Supported File Formats
SQLite RAG supports the following file formats:
- Text: .txt, .md, .mdx, .csv, .json, .xml, .yaml, .yml
- Documents: .pdf, .docx, .pptx, .xlsx
- Code: .c, .cpp, .css, .go, .h, .hpp, .html, .java, .js, .mjs, .kt, .php, .py, .rb, .rs, .swift, .ts, .tsx
- Web Frameworks: .svelte, .vue
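To preview which files in a directory match the extensions listed above before running `sqlite-rag add`, a small script like the following may help. It is a sketch, not the package's own file discovery logic. The set `SUPPORTED_EXTENSIONS` is copied from the list above, and `find_supported_files` is a hypothetical helper.

```python
from pathlib import Path

# Extensions copied from the supported formats list in this README.
SUPPORTED_EXTENSIONS = {
    ".txt", ".md", ".mdx", ".csv", ".json", ".xml", ".yaml", ".yml",
    ".pdf", ".docx", ".pptx", ".xlsx",
    ".c", ".cpp", ".css", ".go", ".h", ".hpp", ".html", ".java", ".js",
    ".mjs", ".kt", ".php", ".py", ".rb", ".rs", ".swift", ".ts", ".tsx",
    ".svelte", ".vue",
}


def find_supported_files(root, recursive: bool = True) -> list[Path]:
    """Return files under `root` whose extension is in SUPPORTED_EXTENSIONS.

    Matching is case-insensitive here; whether sqlite-rag treats case the
    same way is an assumption.
    """
    pattern = "**/*" if recursive else "*"
    return sorted(
        path
        for path in Path(root).glob(pattern)
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


if __name__ == "__main__":
    for path in find_supported_files("."):
        print(path)
```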
Development
Installation
For development, clone the repository and install with development dependencies:
# Clone the repository
git clone https://github.com/sqliteai/sqlite-rag.git
cd sqlite-rag
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install in development mode
pip install -e '.[dev]'
How It Works
- Document Processing: Files are processed and split into overlapping chunks
- Embedding Generation: Text chunks are converted to vector embeddings using AI models
- Dual Indexing: Content is indexed for both vector similarity and full-text search
- Hybrid Search: Queries are processed through both search methods
- Result Fusion: Results are combined using Reciprocal Rank Fusion for optimal relevance
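The fusion step can be sketched in a few lines. Reciprocal Rank Fusion gives each document a score of 1 / (k + rank) in every result list where it appears and sums those scores across lists. The code below is an illustration under stated assumptions, not SQLite RAG's internal implementation: the constant `k = 60` is a commonly used default, and the `weights` parameter is a hypothetical stand-in for the configurable search weights.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k: int = 60, weights=None):
    """Fuse several ranked lists of document IDs into one scored ranking.

    rankings: list of lists, each ordered from best to worst match.
    k:        smoothing constant (60 is an assumed, commonly used value).
    weights:  optional per-list multipliers (hypothetical parameter).
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("weights must have one entry per ranking")

    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)

    # Highest score first; ties are broken by document ID for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))


if __name__ == "__main__":
    vector_hits = ["doc_b", "doc_a", "doc_c"]    # from vector similarity search
    fulltext_hits = ["doc_a", "doc_d", "doc_b"]  # from full-text search
    for doc_id, score in reciprocal_rank_fusion([vector_hits, fulltext_hits]):
        print(f"{doc_id}: {score:.5f}")
```

In this example `doc_a` ranks first because it places near the top of both lists, while documents found by only one search method fall lower. RRF uses only rank positions, so the raw vector distances and full-text scores never have to be put on a common scale.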