A lightweight Python client around sqlite-vec for CRUD and similarity search.
sqlite-vec-client
A lightweight Python client around sqlite-vec that lets you store texts, JSON metadata, and float32 embeddings in SQLite and run fast similarity search.
Features
- Simple API: One class, SQLiteVecClient, for CRUD and search.
- Vector index via sqlite-vec: Uses a vec0 virtual table under the hood.
- Automatic sync: Triggers keep the base table and vector index aligned.
- Typed results: Clear return types for results and searches.
- Filtering helpers: Fetch by rowid, text, or metadata.
- Pagination & sorting: List records with limit, offset, and order.
- Bulk operations: Efficient update_many(), get_all() generator, and transaction support.
- Backup tooling: High-level backup() and restore() helpers for disaster recovery workflows.
Requirements
Installation
Install from PyPI:
pip install sqlite-vec-client
Or install from source:
git clone https://github.com/atasoglu/sqlite-vec-client
cd sqlite-vec-client
pip install .
Quick start
from sqlite_vec_client import SQLiteVecClient
# Initialize a client bound to a specific table in a database file
client = SQLiteVecClient(table="documents", db_path="./example.db")
# Create schema (base table + vec index); choose embedding dimension and distance
client.create_table(dim=384, distance="cosine")
# Add some texts with embeddings (one embedding per text)
texts = ["hello world", "lorem ipsum", "vector databases"]
embs = [
    [0.1, 0.2, 0.3, *([0.0] * 381)],
    [0.05, 0.04, 0.03, *([0.0] * 381)],
    [0.2, 0.1, 0.05, *([0.0] * 381)],
]
rowids = client.add(texts=texts, embeddings=embs)
# Similarity search returns (rowid, text, distance)
query_emb = [0.1, 0.2, 0.3, *([0.0] * 381)]
hits = client.similarity_search(embedding=query_emb, top_k=3)
# Fetch full rows (rowid, text, metadata, embedding)
rows = client.get_many(rowids)
client.close()
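To make the distance column in the search results easier to interpret, here is a minimal pure-Python sketch that ranks the quick-start embeddings against the query by hand. It assumes cosine distance is defined as 1 minus cosine similarity, which is the usual convention; the helper names cosine_distance and pad are illustrative and not part of the library.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def pad(values, dim=384):
    """Zero-pad a short vector to the table's embedding dimension."""
    return list(values) + [0.0] * (dim - len(values))


texts = ["hello world", "lorem ipsum", "vector databases"]
embs = [pad([0.1, 0.2, 0.3]), pad([0.05, 0.04, 0.03]), pad([0.2, 0.1, 0.05])]
query_emb = pad([0.1, 0.2, 0.3])

# Sort ascending by distance: the closest text comes first.
ranked = sorted(
    (cosine_distance(query_emb, emb), text) for text, emb in zip(texts, embs)
)
for distance, text in ranked:
    print(f"{distance:.4f}  {text}")
```

Because the query is identical to the first embedding, "hello world" ranks first with a distance of (numerically) zero, and smaller distances mean closer matches.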
Export/Import
Export and import data in JSON or CSV formats for backups, migrations, and data sharing:
# Export to JSON (includes embeddings)
count = client.export_to_json("backup.jsonl")
# Export to CSV (human-readable, optional embeddings)
count = client.export_to_csv("data.csv", include_embeddings=False)
# Export filtered data
count = client.export_to_json(
    "important.jsonl",
    filters={"priority": "high"},
)
# Import from JSON
count = client.import_from_json("backup.jsonl")
# Import from CSV
count = client.import_from_csv("data.csv")
# Backup and restore workflow
client.export_to_json("backup.jsonl")
# ... data loss ...
client.import_from_json("backup.jsonl")
See examples/export_import_example.py for more examples.
Quick backup & restore helpers
# Create a JSONL backup
client.backup("backup.jsonl")
# Restore later (optionally skip duplicates)
client.restore("backup.jsonl", skip_duplicates=True)
# Work with CSV
client.backup("backup.csv", format="csv", include_embeddings=True)
client.restore("backup.csv", format="csv", skip_duplicates=True)
Metadata Filtering
Efficiently filter records by metadata fields using SQLite's JSON functions:
# Filter by single field
results = client.filter_by_metadata({"category": "python"})
# Filter by multiple fields
results = client.filter_by_metadata({"category": "python", "year": 2024})
# Nested JSON paths
results = client.filter_by_metadata({"author.name": "Alice"})
# Count matching records
count = client.count_by_metadata({"category": "python"})
# Combined similarity search + metadata filtering
hits = client.similarity_search_with_filter(
    embedding=query_vector,
    filters={"category": "python"},
    top_k=5,
)
# Pagination
results = client.filter_by_metadata(
    {"category": "python"},
    limit=10,
    offset=0,
)
See examples/metadata_filtering.py and examples/advanced_metadata_queries.py for more examples.
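For readers curious what filtering with SQLite's JSON functions can look like, the following is a rough sketch using only the standard sqlite3 module. It is not the library's actual query: the table layout, the standalone filter_by_metadata function, and the mapping of a key such as author.name to the JSON path $.author.name are all assumptions made for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (text TEXT, metadata TEXT)")
rows = [
    ("intro to python", {"category": "python", "year": 2024, "author": {"name": "Alice"}}),
    ("python tips", {"category": "python", "year": 2023, "author": {"name": "Bob"}}),
    ("rust basics", {"category": "rust", "year": 2024, "author": {"name": "Alice"}}),
]
conn.executemany(
    "INSERT INTO documents (text, metadata) VALUES (?, ?)",
    [(text, json.dumps(meta)) for text, meta in rows],
)


def filter_by_metadata(conn, filters, limit=10, offset=0):
    """Hypothetical helper: AND together one json_extract test per filter key."""
    clauses = " AND ".join("json_extract(metadata, ?) = ?" for _ in filters) or "1 = 1"
    params = []
    for key, value in filters.items():
        # Assumed convention: "author.name" addresses the path "$.author.name".
        params.extend([f"$.{key}", value])
    sql = (
        f"SELECT rowid, text FROM documents WHERE {clauses} "
        "ORDER BY rowid LIMIT ? OFFSET ?"
    )
    return conn.execute(sql, (*params, limit, offset)).fetchall()


print(filter_by_metadata(conn, {"category": "python", "year": 2024}))
print(filter_by_metadata(conn, {"author.name": "Alice"}))
```

Both the JSON paths and the values are bound as parameters, so user-supplied filter values never get concatenated into the SQL string.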
Bulk Operations
The client provides optimized methods for bulk operations:
# Bulk update multiple records
updates = [
    (rowid1, "new text", {"key": "value"}, None),
    (rowid2, None, {"updated": True}, new_embedding),
]
count = client.update_many(updates)
# Memory-efficient iteration over all records
for rowid, text, metadata, embedding in client.get_all(batch_size=100):
    process(text)
# Atomic transactions
with client.transaction():
    client.add(texts, embeddings)
    client.update_many(updates)
    client.delete_many(old_ids)
See examples/batch_operations.py for more examples.
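The two ideas above, batched iteration and atomic transactions, can be illustrated with plain sqlite3. This is only a sketch of the general techniques, not the client's implementation: iter_rows is a hypothetical generator that pages by rowid, and the connection's own context manager stands in for client.transaction().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (text TEXT)")
with conn:  # commits on success
    conn.executemany(
        "INSERT INTO documents (text) VALUES (?)",
        [(f"doc {i}",) for i in range(250)],
    )


def iter_rows(conn, batch_size=100):
    """Hypothetical generator: yield rows batch by batch, paging on rowid."""
    last_rowid = 0
    while True:
        batch = conn.execute(
            "SELECT rowid, text FROM documents WHERE rowid > ? ORDER BY rowid LIMIT ?",
            (last_rowid, batch_size),
        ).fetchall()
        if not batch:
            return
        yield from batch
        last_rowid = batch[-1][0]


# Only one batch of rows is held in memory at a time.
seen = sum(1 for _ in iter_rows(conn, batch_size=100))

# A failure inside the block rolls back everything done within it.
try:
    with conn:
        conn.execute("DELETE FROM documents")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

remaining = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
print(seen, remaining)
```

Paging on rowid rather than OFFSET keeps each batch query cheap even deep into a large table.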
How it works
SQLiteVecClient stores data in {table} and mirrors embeddings in {table}_vec (a vec0 virtual table). SQLite triggers keep both in sync when rows are inserted, updated, or deleted. Embeddings are serialized as packed float32 bytes for compact storage.
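The sketch below illustrates both mechanisms with the standard library only. It makes several assumptions: the float32 values are packed little-endian, an ordinary table stands in for the vec0 virtual table (which needs the sqlite-vec extension loaded), and the trigger and column names are invented for the example rather than taken from the library.

```python
import sqlite3
import struct


def serialize_float32(vector):
    """Pack floats as consecutive 4-byte little-endian float32 values."""
    return struct.pack(f"<{len(vector)}f", *vector)


def deserialize_float32(blob):
    """Inverse of serialize_float32."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (text TEXT, embedding BLOB);
    -- Plain-table stand-in for the vec0 virtual table.
    CREATE TABLE documents_vec (id INTEGER PRIMARY KEY, embedding BLOB);

    CREATE TRIGGER documents_ai AFTER INSERT ON documents BEGIN
        INSERT INTO documents_vec (id, embedding) VALUES (new.rowid, new.embedding);
    END;
    CREATE TRIGGER documents_au AFTER UPDATE OF embedding ON documents BEGIN
        UPDATE documents_vec SET embedding = new.embedding WHERE id = new.rowid;
    END;
    CREATE TRIGGER documents_ad AFTER DELETE ON documents BEGIN
        DELETE FROM documents_vec WHERE id = old.rowid;
    END;
    """
)

# Writing to the base table is enough; the triggers update the mirror.
conn.execute(
    "INSERT INTO documents (text, embedding) VALUES (?, ?)",
    ("hello world", serialize_float32([0.5, -1.0, 2.0])),
)
mirrored = conn.execute("SELECT embedding FROM documents_vec WHERE id = 1").fetchone()[0]
print(deserialize_float32(mirrored))

conn.execute("DELETE FROM documents WHERE rowid = 1")
left = conn.execute("SELECT COUNT(*) FROM documents_vec").fetchone()[0]
print(left)
```

At 4 bytes per value, a 384-dimensional embedding occupies 1536 bytes in this encoding.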
Logging
The library includes built-in logging support using Python's standard logging module. By default, logging is set to WARNING level.
Configure log level via environment variable:
# Linux/macOS
export SQLITE_VEC_CLIENT_LOG_LEVEL=DEBUG
:: Windows (cmd.exe); a trailing comment on the set line would become part of the value
set SQLITE_VEC_CLIENT_LOG_LEVEL=DEBUG
Or programmatically:
import logging
from sqlite_vec_client import get_logger
logger = get_logger()
logger.setLevel(logging.DEBUG) # DEBUG, INFO, WARNING, ERROR, CRITICAL
Available log levels:
- DEBUG: Detailed information for diagnosing issues
- INFO: General informational messages about operations
- WARNING: Warning messages (default)
- ERROR: Error messages
- CRITICAL: Critical error messages
See examples/logging_example.py for a complete example.
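As a rough illustration of how an environment variable can be mapped to a standard logging level with a WARNING fallback, consider the sketch below. The resolve_log_level function is hypothetical and only mirrors the behavior described above; how the library itself reads SQLITE_VEC_CLIENT_LOG_LEVEL may differ.

```python
import logging
import os

LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def resolve_log_level(env_var="SQLITE_VEC_CLIENT_LOG_LEVEL"):
    """Hypothetical helper: read a level name from the environment.

    Unset or unrecognized values fall back to WARNING.
    """
    name = os.environ.get(env_var, "WARNING").strip().upper()
    return LEVELS.get(name, logging.WARNING)


os.environ["SQLITE_VEC_CLIENT_LOG_LEVEL"] = "debug"
logging.basicConfig(level=resolve_log_level())
logging.getLogger("example").debug("debug output is now visible")
```

Normalizing the value with strip() and upper() makes the setting tolerant of stray whitespace and lowercase input.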
Testing
The project has comprehensive test coverage (91%+) with 75 tests covering:
- Unit tests for utilities and validation
- Integration tests for all client operations
- Security tests for SQL injection prevention
- Edge cases and error handling
See TESTING.md for detailed testing documentation.
Development
Setup
Install development dependencies:
pip install -r requirements-dev.txt
pre-commit install
Testing
The project uses pytest with comprehensive test coverage (89%+).
Run all tests:
pytest
Run with verbose output:
pytest -v
Run specific test categories:
pytest -m unit # Unit tests only
pytest -m integration # Integration tests only
Coverage (terminal + XML for CI):
pytest --cov=sqlite_vec_client --cov-report=term-missing --cov-report=xml
The CI workflow uploads the generated coverage.xml as an artifact for downstream dashboards.
Run specific test file:
pytest tests/test_client.py
pytest tests/test_validation.py
pytest tests/test_security.py
pytest tests/test_utils.py
Code Quality
Format code:
ruff format .
Lint code:
ruff check .
Type checking:
mypy sqlite_vec_client/
Run all quality checks:
ruff check . && ruff format . && mypy sqlite_vec_client/ && pytest
Benchmarks
Run benchmarks:
python -m benchmarks
Configure benchmarks: Edit benchmarks/config.yaml to customize:
- Dataset sizes (default: 100, 1000, 10000, 50000)
- Embedding dimension (default: 384)
- Distance metric (default: cosine)
- Database modes (file, memory)
- Similarity search iterations and top-k values
Documentation
- CONTRIBUTING.md - Contribution guidelines
- CHANGELOG.md - Version history
- TESTING.md - Testing documentation
- Docs site (MkDocs) - Serve locally with mkdocs serve
- Examples - Usage examples
  - basic_usage.py - Basic CRUD operations
  - metadata_filtering.py - Metadata filtering and queries
  - advanced_metadata_queries.py - Advanced metadata filtering with nested paths
  - export_import_example.py - Export/import data in JSON and CSV formats
  - transaction_example.py - Transaction management with all CRUD operations
  - batch_operations.py - Bulk operations
  - logging_example.py - Logging configuration
- Benchmarks - Performance benchmarks
Contributing
Contributions are very welcome! See CONTRIBUTING.md for guidelines.
License
MIT - See LICENSE for details.