A lightweight local vector-aware database for Python

menteedb

menteedb is a lightweight, local Python library that combines table-like records with optional vector similarity search, a fluent query API, and optional encryption.

Features

  • Define tables with a schema.
  • Insert structured records.
  • Fluent Query Builder - no SQL, pure Python with field selection, filtering, and conditions.
  • Optional AES-256-GCM encryption with automatic key derivation.
  • Binary MessagePack format - files roughly 50% smaller than JSON, with automatic fallback to JSON.
  • Enable vector search on one text field per table.
  • Fast substring ("contains") text search per table.
  • Query by field filters and/or semantic similarity.
  • Persist data locally with append-only files for speed.

Quick Start

Basic Usage

from menteedb import MenteeDB

db = MenteeDB(base_path="./data")

db.create_table(
    table_name="notes",
    fields={"title": "str", "body": "str", "tag": "str"},
    vector_field="body",
)

db.insert("notes", {"title": "First", "body": "Vector databases are useful.", "tag": "ml"})
db.insert("notes", {"title": "Second", "body": "I enjoy local-first tools.", "tag": "dev"})

# Fluent query API
results = db.find("notes").where("tag", "==", "ml").select("title", "body").execute()
print(results)

Encrypted Storage (Optional)

from menteedb import MenteeDB

# Enable encryption
db = MenteeDB(
    base_path="./secure_data",
    use_encryption=True,
    encryption_key="my_secure_password"
)

db.create_table("secrets", fields={"key": "str", "value": "str"})
db.insert("secrets", {"key": "api_token", "value": "sk_live_..."})

# Query encrypted data transparently
results = db.find("secrets").where("key", "==", "api_token").execute()

Query API (No SQL!)

Instead of SQL syntax, use Python method chaining:

# SELECT name, email FROM users WHERE age > 25 AND city = 'NYC'
results = (
    db.find('users')
    .where('age', '>', 25)
    .where('city', '==', 'NYC')
    .select('name', 'email')
    .execute()
)

Supported Operators

  • Comparison: ==, !=, >, <, >=, <=
  • Collection: in
  • String: contains

See QUERY_GUIDE.md for complete examples.
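
To make the chaining model concrete, here is a minimal pure-Python sketch of how a fluent builder with these operators could work over a list of dicts. SketchQuery and OPS are hypothetical names invented for this illustration; this is not menteedb's implementation, only one way the listed operators and AND-combined where() calls might behave.

```python
import operator

# Hypothetical operator table mirroring the operators listed above.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "in": lambda value, options: value in options,
    "contains": lambda value, needle: needle in value,
}


class SketchQuery:
    """Illustrative stand-in for a fluent query; not menteedb's class."""

    def __init__(self, rows):
        self._rows = rows
        self._conditions = []
        self._fields = None

    def where(self, field, op, value):
        if op not in OPS:
            raise ValueError(f"unsupported operator: {op!r}")
        self._conditions.append((field, OPS[op], value))
        return self  # returning self is what enables chaining

    def select(self, *fields):
        self._fields = fields
        return self

    def execute(self):
        out = []
        for row in self._rows:
            # Multiple where() calls combine with AND.
            if all(f in row and test(row[f], v) for f, test, v in self._conditions):
                out.append({k: row[k] for k in self._fields} if self._fields else dict(row))
        return out


users = [
    {"name": "Ada", "email": "ada@example.com", "age": 31, "city": "NYC"},
    {"name": "Bob", "email": "bob@example.com", "age": 22, "city": "NYC"},
    {"name": "Cy", "email": "cy@example.com", "age": 40, "city": "LA"},
]

results = (
    SketchQuery(users)
    .where("age", ">", 25)
    .where("city", "==", "NYC")
    .select("name", "email")
    .execute()
)
```

Each where() and select() call returns the builder itself, so nothing is evaluated until execute() runs.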

Legacy Query Modes

The original db.query() method still works:

  • Filter-only:
    • db.query("notes", conditions={"tag": "ml"})
  • Text contains search:
    • db.query("notes", text_query="vector", text_fields=["title", "body"])
  • Vector-only:
    • db.query("notes", vector_query="your text")
  • Hybrid (filter + vector):
    • db.query("notes", conditions={"tag": "dev"}, vector_query="local tools")
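
The hybrid mode can be pictured as "filter first, then rank by similarity". The sketch below shows that idea on toy records with hand-written 2-D vectors; hybrid_query, the "vec" field, and top_k are assumptions made for illustration and say nothing about how db.query() is implemented internally.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_query(records, conditions, query_vec, top_k=5):
    """Keep records matching every equality condition, then rank by similarity."""
    candidates = [
        r for r in records
        if all(r.get(field) == value for field, value in conditions.items())
    ]
    candidates.sort(key=lambda r: cosine(r["vec"], query_vec), reverse=True)
    return candidates[:top_k]


# Toy records with hand-written 2-D vectors standing in for real embeddings.
records = [
    {"title": "First", "tag": "ml", "vec": [1.0, 0.0]},
    {"title": "Second", "tag": "dev", "vec": [0.0, 1.0]},
    {"title": "Third", "tag": "dev", "vec": [0.6, 0.8]},
]

hits = hybrid_query(records, {"tag": "dev"}, query_vec=[1.0, 0.0], top_k=2)
titles = [r["title"] for r in hits]
```

Filtering before ranking keeps the similarity computation limited to records that already satisfy the field conditions.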

Storage Layout

For base_path="./data" and table notes, menteedb stores:

  • ./data/notes/schema.json
  • ./data/notes/records.jsonl - record data, written as binary MessagePack despite the .jsonl extension (compact, ~50% smaller than JSON)
  • ./data/notes/vector_ids.jsonl
  • ./data/notes/vectors.f32
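
The .f32 extension suggests vectors are kept as raw 32-bit floats. Assuming a flat stream of fixed-dimension float32 values (an assumption; the exact on-disk layout is not documented here), appending and reading such a file could look like the following sketch. DIM, append_vector, and read_vectors are hypothetical names.

```python
import os
import tempfile
from array import array

DIM = 4  # assumed fixed embedding dimension; not taken from menteedb


def append_vector(path, vec):
    """Append one vector as raw float32 values (native byte order)."""
    if len(vec) != DIM:
        raise ValueError("unexpected vector length")
    with open(path, "ab") as fh:
        array("f", vec).tofile(fh)


def read_vectors(path):
    """Read every vector back by slicing the flat float32 stream."""
    flat = array("f")
    with open(path, "rb") as fh:
        flat.frombytes(fh.read())
    return [list(flat[i:i + DIM]) for i in range(0, len(flat), DIM)]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.f32")
    append_vector(path, [1.0, 0.0, 0.5, 0.25])
    append_vector(path, [0.0, 2.0, 0.0, -1.0])
    vectors = read_vectors(path)
    size_bytes = os.path.getsize(path)
```

With this layout, each vector occupies DIM × 4 bytes, so the n-th vector can be located by offset without parsing anything before it.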

Storage Features

  • MessagePack Binary Format: Compact and fast serialization (~50% size reduction vs JSON)
  • Optional Encryption: Enable AES-256-GCM encryption to protect sensitive data on disk
  • Automatic Format Detection: Seamlessly reads legacy JSON data and writes new data as MessagePack
  • Append-Only Design: Fast sequential writes with minimal overhead

This is local file-based storage. It is not publicly exposed over the network, but anyone with local filesystem access to this folder can read it. Enable encryption for sensitive data.

Encryption

Protect sensitive data with AES-256-GCM encryption:

from menteedb import MenteeDB

db = MenteeDB(
    base_path="./secure",
    use_encryption=True,
    encryption_key="your_secure_password"
)

Benefits:

  • ✅ AES-256-GCM authenticated encryption
  • ✅ Automatic key derivation (PBKDF2-HMAC-SHA256)
  • ✅ Transparent to your code
  • ✅ ~50% disk savings with MessagePack

See ENCRYPTION_GUIDE.md for security best practices and examples.
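
For readers unfamiliar with password-based key derivation, the sketch below shows how PBKDF2-HMAC-SHA256 can turn a password into a 256-bit key using only the Python standard library. The iteration count, salt length, and the derive_key helper are assumptions for illustration; menteedb's actual parameters are not stated in this document.

```python
import hashlib
import os

ITERATIONS = 100_000  # assumed work factor; menteedb's actual value is not stated here


def derive_key(password, salt):
    """Derive a 32-byte (256-bit) key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32
    )


salt = os.urandom(16)  # the salt must be stored with the data to re-derive the key
key = derive_key("your_secure_password", salt)
same_key = derive_key("your_secure_password", salt)
other_key = derive_key("your_secure_password", os.urandom(16))
```

The same password and salt always yield the same key, while a different salt yields an unrelated key, which is why the salt has to be kept alongside the encrypted data.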

Privacy and Permissions

  • By default, MenteeDB(..., secure_permissions=True) applies best-effort private permissions (700 for table folders, 600 for files).
  • On Windows, real privacy is controlled by NTFS ACLs; chmod behavior is limited.
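
As a rough illustration of what "700 for table folders, 600 for files" means on a POSIX system, the sketch below applies those modes with os.chmod. The make_private helper is hypothetical and is not menteedb's code; on Windows, os.chmod only toggles the read-only flag, so the modes read back there will differ.

```python
import os
import stat
import tempfile


def make_private(table_dir):
    """Best-effort: 700 on the folder, 600 on each file directly inside it."""
    os.chmod(table_dir, 0o700)
    for name in os.listdir(table_dir):
        path = os.path.join(table_dir, name)
        if os.path.isfile(path):
            os.chmod(path, 0o600)


with tempfile.TemporaryDirectory() as base:
    table_dir = os.path.join(base, "notes")
    os.mkdir(table_dir)
    schema_path = os.path.join(table_dir, "schema.json")
    with open(schema_path, "w") as fh:
        fh.write("{}")
    make_private(table_dir)
    # Read the permission bits back to confirm what was applied.
    dir_mode = stat.S_IMODE(os.stat(table_dir).st_mode)
    file_mode = stat.S_IMODE(os.stat(schema_path).st_mode)
```

Mode 700 lets only the owning user list and enter the folder; mode 600 lets only the owner read and write each file.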

Testing

Run locally:

pip install ".[dev]"
pytest -q

CI/CD to PyPI

Workflow file: .github/workflows/pypi-publish.yml

  • Runs tests on pushes to main, tags (v*), and releases.
  • Publishes to PyPI on tag push (v*) or GitHub Release publish.
  • Uses trusted publishing via GitHub OIDC.

Notes

  • This initial version supports one vector field per table.
  • Default embeddings use a deterministic local hashing embedder with no external model download.
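
To show what a deterministic hashing embedder can look like, here is a minimal sketch that needs no model download: each token is hashed into a fixed number of buckets and the counts are normalized. The dimension, tokenization, and hash function are assumptions chosen for this example and may differ from menteedb's default embedder.

```python
import hashlib
import math

DIM = 64  # assumed vector size, chosen only for this sketch


def embed(text, dim=DIM):
    """Hash each lowercase token into a bucket, then L2-normalize the counts."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    """Dot product of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


doc = embed("Vector databases are useful.")
query_same = embed("vector databases are useful.")
query_word = embed("vector")
```

Because the output depends only on the input text, the same string always maps to the same vector. The trade-off is that similarity reflects shared tokens rather than meaning.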
