SmartKDB – Cognitive & AI-Training-Aware Embedded Database

SmartKDB 🧠

The Cognitive, AI-Native Embedded Database for Python.

SmartKDB is not just a database; it's a data engine for the AI era. It combines a local-first NoSQL store with a "Brain" that learns from your queries, an "Agent" you can chat with, and a "Training Hub" to manage ML datasets.


📚 Table of Contents

  1. Quick Start (The 5-Minute Crash Course)
  2. Core Database Operations (CRUD)
  3. The Query Engine
  4. Cognitive Layer (Chat & Agent)
  5. AI Training Hub (Datasets & Logs)
  6. Vector Search & Embeddings
  7. Advanced Usage & Best Practices
  8. Architecture & Internals

1. Quick Start (The 5-Minute Crash Course)

Let's build a simple Hospital System to demonstrate SmartKDB.

pip install smartkdb

from kdb import SmartKDB

# 1. Initialize the DB (Creates a folder 'hospital.kdb')
db = SmartKDB("hospital.kdb")

# 2. Setup Security (First run only)
# SmartKDB enforces role-based access control (RBAC), so create an admin first.
# "admin123" is a demo password; pick a strong one for real use.
db.auth.create_user("admin", "admin123", "admin")
db.login("admin", "admin123")

# 3. Create a Table
# Note: SmartKDB is schema-less (NoSQL), but you define the Primary Key (pk)
# and any initial indexes for speed.
patients = db.create_table("patients", pk="id", indexes=["age", "diagnosis"])

# 4. Insert Data
patients.insert({
    "id": "P001",
    "name": "Ahmad",
    "age": 54,
    "diagnosis": "pneumonia",
    "status": "admitted"
})
patients.insert({
    "id": "P002",
    "name": "Sarah",
    "age": 29,
    "diagnosis": "flu",
    "status": "discharged"
})

# 5. Run a Query
# Find all patients older than 40
results = patients.query().where("age", ">", 40).execute()
print(f"Found {len(results)} patients: {results}")

2. Core Database Operations (CRUD)

SmartKDB provides a simple, Pythonic API for all standard operations.

Create (Insert)

# Returns the inserted document (including the generated PK if missing)
doc = patients.insert({
    "id": "P003",
    "name": "Ali",
    "age": 12,
    "diagnosis": "fracture"
})

Read (Get by ID)

# Fast lookup by Primary Key via the in-memory index
# (a B-Tree according to the Architecture section, so O(log n) rather than O(1))
patient = patients.get("P001")
if patient:
    print(patient["name"])

Update

# Updates specific fields. Merges with existing data.
# Returns the updated document.
updated_doc = patients.update("P001", {
    "status": "discharged",
    "notes": "Recovered fully"
})
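
The merge behavior described above can be pictured with plain dictionaries. This is a minimal sketch that assumes a shallow merge, where top-level keys in the update overwrite existing ones. `merge_update` is a hypothetical helper, not part of SmartKDB, and how the library treats nested fields is not stated here.

```python
def merge_update(existing: dict, changes: dict) -> dict:
    """Return a new document with `changes` laid over `existing` (shallow merge)."""
    return {**existing, **changes}


stored = {"id": "P001", "name": "Ahmad", "age": 54, "status": "admitted"}
updated = merge_update(stored, {"status": "discharged", "notes": "Recovered fully"})

# Fields not mentioned in the update ("name", "age") are carried over unchanged.
print(updated)
```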

Delete

# Removes the record permanently
patients.delete("P003")

3. The Query Engine

SmartKDB's query engine is designed for readability and speed.

Basic Filtering

# Syntax: .where(field, operator, value)
# Operators: "==", "!=", ">", "<", ">=", "<=", "in", "contains"

# Find patients with flu OR pneumonia
results = patients.query() \
    .where("diagnosis", "in", ["flu", "pneumonia"]) \
    .execute()

Chaining & Sorting

results = patients.query() \
    .where("age", ">", 20) \
    .where("status", "==", "admitted") \
    .sort_by("age", ascending=False) \
    .limit(10) \
    .execute()
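
To make the chaining pattern concrete, here is a small in-memory sketch of how a builder like this could work. `ToyQuery` and `OPS` are hypothetical names invented for illustration; this is not SmartKDB's implementation, and it ignores indexes entirely. The key idea is that each method returns `self`, and nothing is evaluated until `execute()`.

```python
import operator

# Operator names from the list above, mapped to plain Python predicates.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "in": lambda value, options: value in options,
    "contains": lambda value, item: item in value,
}


class ToyQuery:
    """Hypothetical in-memory query builder; not SmartKDB's implementation."""

    def __init__(self, docs):
        self._docs = list(docs)
        self._filters = []
        self._sort = None
        self._limit = None

    def where(self, field, op, value):
        self._filters.append((field, OPS[op], value))
        return self  # returning self is what makes chaining possible

    def sort_by(self, field, ascending=True):
        self._sort = (field, ascending)
        return self

    def limit(self, n):
        self._limit = n
        return self

    def execute(self):
        # Documents missing a filtered field are skipped rather than raising.
        rows = [
            d for d in self._docs
            if all(f in d and test(d[f], v) for f, test, v in self._filters)
        ]
        if self._sort:
            field, ascending = self._sort
            rows.sort(key=lambda d: d[field], reverse=not ascending)
        return rows if self._limit is None else rows[: self._limit]


docs = [
    {"id": "P001", "age": 54, "status": "admitted"},
    {"id": "P002", "age": 29, "status": "discharged"},
    {"id": "P004", "age": 61, "status": "admitted"},
]
results = (
    ToyQuery(docs)
    .where("age", ">", 20)
    .where("status", "==", "admitted")
    .sort_by("age", ascending=False)
    .limit(10)
    .execute()
)
print([d["id"] for d in results])  # ['P004', 'P001']
```

Wrapping the chain in parentheses, as in the usage above, avoids the trailing backslashes.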

Semantic Query (AI-Powered)

Don't want to write code? Use natural language.

# The DB understands "older than", "active", "limit", etc.
results = db.semantic_query("patients", "patients older than 50 limit 5")
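
How `semantic_query` interprets text is not described here. As a rough mental model only, the sketch below uses a few regular expressions to turn phrases such as "older than" and "limit" into structured filters. `parse_semantic`, the mapping of "older than" to an `age` field, and the rule-based approach itself are all assumptions made for illustration.

```python
import re


def parse_semantic(text: str):
    """Turn a tiny subset of English into (filters, limit). Purely illustrative."""
    filters = []
    limit = None

    match = re.search(r"older than (\d+)", text)
    if match:
        filters.append(("age", ">", int(match.group(1))))

    match = re.search(r"younger than (\d+)", text)
    if match:
        filters.append(("age", "<", int(match.group(1))))

    match = re.search(r"limit (\d+)", text)
    if match:
        limit = int(match.group(1))

    return filters, limit


filters, limit = parse_semantic("patients older than 50 limit 5")
print(filters, limit)  # [('age', '>', 50)] 5
```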

4. Cognitive Layer (Chat & Agent)

SmartKDB v4 introduces an internal agent that monitors your database.

Chat with your DB

You can ask the database about its own state.

response = db.chat("How many tables do we have?")
print(response['message'])
# Output: "You have 1 table: patients."

response = db.chat("Which tables are hot?")
print(response['message'])
# Output: "The 'patients' table is experiencing high read volume."

Predictive Advice

The agent analyzes query patterns and suggests optimizations.

# Ask for advice
advice = db.chat("Do you recommend any indexes?")
# If you query 'status' often but it's not indexed, the Brain will suggest it.
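
One simple way such advice could be derived is to count how often each field appears in filters and flag frequently used fields that have no index. The sketch below illustrates that idea; `QueryStats`, its methods, and the `min_hits` threshold are hypothetical and do not describe the Brain's actual logic.

```python
from collections import Counter


class QueryStats:
    """Hypothetical tracker: counts how often each field is filtered on."""

    def __init__(self, indexed_fields):
        self.indexed = set(indexed_fields)
        self.filter_counts = Counter()

    def record(self, fields):
        """Call once per executed query with the fields used in its filters."""
        self.filter_counts.update(fields)

    def suggest_indexes(self, min_hits=3):
        """Fields filtered at least `min_hits` times that are not yet indexed."""
        return sorted(
            field
            for field, hits in self.filter_counts.items()
            if hits >= min_hits and field not in self.indexed
        )


stats = QueryStats(indexed_fields=["age", "diagnosis"])
for _ in range(4):
    stats.record(["status"])      # 'status' is queried often but not indexed
stats.record(["age", "name"])     # 'age' is already indexed, 'name' is rare

print(stats.suggest_indexes())  # ['status']
```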

5. AI Training Hub (Datasets & Logs)

SmartKDB is built to be the backend for your AI models. It manages the chaos of training data.

Step 1: Create a Dataset

Instead of dumping CSVs, define datasets dynamically from your tables.

# Create a dataset of 'pneumonia' cases for an X-Ray model
db.datasets.create_dataset(
    name="pneumonia_cases",
    table="patients",
    filter_query={"diagnosis": "pneumonia"}
)
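
Defining a dataset "dynamically" can be read as storing the filter rather than a copy of the rows. The sketch below shows that reading with plain lists and dicts. It assumes `filter_query` means equality on every listed field, and `dataset_rows` is a hypothetical helper; whether SmartKDB datasets are live views or snapshots is not stated here.

```python
def dataset_rows(rows, filter_query):
    """Return rows whose fields equal every key/value pair in filter_query."""
    return [
        row for row in rows
        if all(row.get(field) == value for field, value in filter_query.items())
    ]


table = [
    {"id": "P001", "diagnosis": "pneumonia"},
    {"id": "P002", "diagnosis": "flu"},
]
pneumonia_filter = {"diagnosis": "pneumonia"}

print([r["id"] for r in dataset_rows(table, pneumonia_filter)])  # ['P001']

# A row added later is picked up the next time the dataset is read,
# because only the filter was stored, not a copy of the rows.
table.append({"id": "P005", "diagnosis": "pneumonia"})
print([r["id"] for r in dataset_rows(table, pneumonia_filter)])  # ['P001', 'P005']
```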

Step 2: Define Splits

Reproducibility is key. Define your train/validation/test splits once.

# 70% Train, 15% Validation, 15% Test
db.datasets.define_split("pneumonia_cases", 0.7, 0.15, 0.15)
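
A common way to make splits reproducible is to derive each record's split from a hash of its primary key, so the assignment never changes between runs or when new rows arrive. The sketch below shows that technique; `assign_split` is a hypothetical helper, and nothing here claims that `define_split` works this way internally.

```python
import hashlib


def assign_split(pk: str, train: float, val: float, test: float) -> str:
    """Deterministically map a primary key to 'train', 'val', or 'test'."""
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    digest = hashlib.sha256(pk.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    if bucket < train:
        return "train"
    if bucket < train + val:
        return "val"
    return "test"


# Same fractions as above: 70% train, 15% validation, 15% test.
ids = [f"P{i:04d}" for i in range(1000)]
counts = {"train": 0, "val": 0, "test": 0}
for pk in ids:
    counts[assign_split(pk, 0.7, 0.15, 0.15)] += 1
print(counts)
```

Because the assignment depends only on the key, a record keeps its split even after the table grows.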

Step 3: Log Training Experiments

Keep your training metrics right next to your data.

# Start a session
session_id = db.training_logger.start_session(
    model_name="xray_v1",
    dataset_name="pneumonia_cases",
    config={"epochs": 10, "lr": 0.001}
)

# Log metrics (e.g., inside your PyTorch/TensorFlow loop)
db.training_logger.log_metric(session_id, 1, {"loss": 0.8, "acc": 0.6})
db.training_logger.log_metric(session_id, 2, {"loss": 0.5, "acc": 0.8})

# Finish
db.training_logger.end_session(session_id, "success")
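
In a real loop it helps to close the session even when training crashes. The sketch below wraps the three calls shown above in a helper that does so. `run_training` and `InMemoryLogger` are hypothetical: the stand-in logger only mimics the method names used above so the example runs on its own, and the `"failed"` status string is an assumption (only `"success"` appears in this guide).

```python
class InMemoryLogger:
    """Stand-in with the same three method names used above; not SmartKDB code."""

    def __init__(self):
        self.sessions = {}

    def start_session(self, model_name, dataset_name, config):
        session_id = f"s{len(self.sessions) + 1}"
        self.sessions[session_id] = {
            "model": model_name,
            "dataset": dataset_name,
            "config": config,
            "metrics": [],
            "status": "running",
        }
        return session_id

    def log_metric(self, session_id, epoch, metrics):
        self.sessions[session_id]["metrics"].append((epoch, metrics))

    def end_session(self, session_id, status):
        self.sessions[session_id]["status"] = status


def run_training(logger, train_one_epoch, epochs):
    """Log one metrics dict per epoch and always close the session."""
    session_id = logger.start_session(
        model_name="xray_v1",
        dataset_name="pneumonia_cases",
        config={"epochs": epochs},
    )
    try:
        for epoch in range(1, epochs + 1):
            logger.log_metric(session_id, epoch, train_one_epoch(epoch))
    except Exception:
        logger.end_session(session_id, "failed")  # assumed status string
        raise
    logger.end_session(session_id, "success")
    return session_id


logger = InMemoryLogger()
sid = run_training(logger, lambda epoch: {"loss": 1.0 / epoch}, epochs=3)
print(logger.sessions[sid]["status"], len(logger.sessions[sid]["metrics"]))
```

With SmartKDB itself, `db.training_logger` would presumably be passed in place of the stand-in.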

6. Vector Search & Embeddings

SmartKDB has a built-in vector store. You don't need a separate vector DB.

# 1. Enable Vector Index on a text field
db.enable_vector_index("patients", "notes")

# 2. Add data. Vectors are updated automatically if you hook up an embedder
# (automatic embedding is coming in v4.1). Until then, v4 supports storing and
# searching pre-computed vectors that you push manually, or plain text similarity
# through the default TF-IDF fallback.

# 3. Search
similar_patients = db.vector_search("patients", "patient has breathing issues", "notes")
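
For readers unfamiliar with the TF-IDF fallback mentioned above, the following pure-Python sketch shows the general technique: weight each term by how rare it is across documents, then rank documents by cosine similarity to the query. All function names are hypothetical, the smoothed idf formula is one common variant, and SmartKDB's own scoring may differ.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf so terms that appear in every document keep a small weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs, top_k=2):
    """Rank docs by cosine similarity to the query; drop zero-score matches."""
    vectors, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(docs[i], round(score, 3)) for score, i in scored[:top_k] if score > 0]


notes = [
    "patient has breathing issues and cough",
    "broken arm after fall",
    "mild fever, recovered fully",
]
matches = search("patient has breathing issues", notes)
print(matches[0][0])  # patient has breathing issues and cough
```

Note that this approach only matches shared words; a query such as "shortness of breath" would score zero against these notes, which is where real embeddings help.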

7. Advanced Usage & Best Practices

🛡️ Error Handling

Robust applications need to handle failures gracefully.

from kdb import SmartKDB, AuthError, RecordNotFoundError

try:
    db = SmartKDB("prod.kdb")
    db.login("admin", "wrong_password")
except AuthError as e:
    print(f"🚨 Security Alert: {e}")
except Exception as e:
    print(f"❌ Unexpected Error: {e}")

# `users` is a table handle, e.g. users = db.create_table("users", pk="id")
try:
    users.get("non_existent_id")
except RecordNotFoundError:
    print("User not found.")

🔍 Querying Guide: Which tool to use?

| Feature | Best For | Example |
|---|---|---|
| query() | Exact filtering, Analytics, Reporting | age > 30 AND status == 'active' |
| semantic_query() | Natural Language, Chatbots, Non-technical users | "users older than 30" |
| vector_search() | Similarity, Recommendations, Fuzzy Search | "find products like this one" |

🔄 AI Project Lifecycle with SmartKDB

  1. Ingest: db.create_table() -> table.insert() (Raw Data)
  2. Refine: db.chat("suggest indexes") -> table.create_index() (Optimization)
  3. Curate: db.datasets.create_dataset() (Prepare for AI)
  4. Train: db.training_logger.start_session() (Experimentation)
  5. Deploy: Use db.predictor to forecast usage in production.

❓ FAQ

Q: Is SmartKDB a replacement for PostgreSQL/MySQL?
A: No. SmartKDB is an embedded engine (like SQLite) optimized for AI workflows and local apps. Use Postgres for massive, concurrent enterprise backends. Use SmartKDB for AI agents, local tools, and edge computing.

Q: Does it require Internet?
A: No. It is 100% local and offline.

Q: How much data can it handle?
A: It handles millions of records comfortably on a standard SSD. For billions, consider sharding or a distributed DB.

Q: Is it production ready?
A: Yes, for embedded use cases. It uses append-only storage, which is highly resistant to corruption.


8. Architecture & Internals

For the curious engineer:

  • Storage: Append-only log structure (BlockStorage). Extremely robust against corruption.
  • Indexing: In-memory B-Tree for Primary Keys, Hash Maps for Secondary Indexes.
  • The Brain: A JSON-based state machine that tracks query statistics (kdb_brain.json).
  • Concurrency: Single-writer, Multi-reader (SWMR) model using file locks.
  • Format: All data is stored as human-readable(ish) JSON lines or binary blocks depending on config.
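
The storage and indexing bullets above can be illustrated with a toy append-only JSON-lines store. This is a sketch under stated assumptions, not the BlockStorage code: `AppendOnlyStore` and the `_deleted` tombstone field are hypothetical, a plain dict stands in for the B-Tree primary-key index, and file locking is left out.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy JSON-lines store: writes are only appended; the newest line per key wins."""

    def __init__(self, path, pk="id"):
        self.path = path
        self.pk = pk
        self.offsets = {}  # primary key -> byte offset of the newest record
        if os.path.exists(path):
            self._rebuild_index()

    def _rebuild_index(self):
        # Replay the log from the start; later lines override earlier ones.
        with open(self.path, "rb") as f:
            while True:
                offset = f.tell()
                line = f.readline()
                if not line:
                    break
                record = json.loads(line)
                if record.get("_deleted"):
                    self.offsets.pop(record[self.pk], None)
                else:
                    self.offsets[record[self.pk]] = offset

    def _append(self, record):
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(json.dumps(record).encode("utf-8") + b"\n")
        return offset

    def put(self, record):
        self.offsets[record[self.pk]] = self._append(record)

    def delete(self, key):
        # A delete is also an append: a tombstone line marks the key as removed.
        self._append({self.pk: key, "_deleted": True})
        self.offsets.pop(key, None)

    def get(self, key):
        offset = self.offsets.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            return json.loads(f.readline())


path = os.path.join(tempfile.mkdtemp(), "patients.jsonl")
store = AppendOnlyStore(path)
store.put({"id": "P001", "status": "admitted"})
store.put({"id": "P001", "status": "discharged"})  # appended, not overwritten
store.put({"id": "P003", "status": "admitted"})
store.delete("P003")

reopened = AppendOnlyStore(path)  # the index is rebuilt by replaying the log
print(reopened.get("P001"), reopened.get("P003"))
```

Because existing bytes are never rewritten, a crash mid-write can at worst leave a damaged final line, which is the usual argument for this layout's robustness.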


License: MIT
Author: Alhdrawi
