
Database utilities for scientific computing with SQLite3 and PostgreSQL


scitex-db


Database utilities for scientific computing — SQLite3 + PostgreSQL with NumPy-aware storage.

Full Documentation · uv pip install scitex-db[all]



Problem and Solution

  1. Problem: Storing ndarrays in SQLite means pickle.dumps → BLOB: no compression, no dtype/shape, no deterministic hashing.
     Solution: db.save_array(table, arr) / load_array(...): typed, compressed BLOBs round-trip with dtype, shape, is_compressed, and _hash columns.
  2. Problem: The sqlite3 API is low-level: every project rewrites its own connect / transaction / execute boilerplate.
     Solution: with db.transaction(): ... gives context-managed transactions; health checks, dedup, and schema inspection are built in.
  3. Problem: Switching SQLite ↔ Postgres rewrites every call site.
     Solution: Mixin composition: SQLite3 and PostgreSQL share _BaseMixins/, so the same call site works against either backend.
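
To make the first row concrete, here is a minimal sketch of what a typed, compressed BLOB round-trip can look like with only sqlite3, zlib, hashlib, and NumPy. It is not scitex-db's implementation: the function names (save_array_sketch, load_array_sketch), the choice of zlib and SHA-256, and the exact column layout are assumptions made for illustration. Only the idea of storing dtype, shape, a compression flag, and a hash next to the BLOB comes from the table above.

```python
import hashlib
import sqlite3
import zlib

import numpy as np


def save_array_sketch(conn, table, arr, compress=True):
    """Store arr as a BLOB together with the metadata needed to rebuild it."""
    arr = np.ascontiguousarray(arr)
    raw = arr.tobytes()
    # Hash the uncompressed bytes so the digest does not depend on zlib settings.
    digest = hashlib.sha256(raw).hexdigest()
    payload = zlib.compress(raw) if compress else raw
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id INTEGER PRIMARY KEY, data BLOB, dtype TEXT, shape TEXT, "
        "is_compressed INTEGER, hash TEXT)"
    )
    cur = conn.execute(
        f"INSERT INTO {table} (data, dtype, shape, is_compressed, hash) "
        "VALUES (?, ?, ?, ?, ?)",
        (payload, arr.dtype.str, ",".join(map(str, arr.shape)), int(compress), digest),
    )
    return cur.lastrowid


def load_array_sketch(conn, table, row_id):
    """Rebuild the array from the BLOB and its dtype/shape columns."""
    data, dtype, shape, is_compressed = conn.execute(
        f"SELECT data, dtype, shape, is_compressed FROM {table} WHERE id = ?",
        (row_id,),
    ).fetchone()
    raw = zlib.decompress(data) if is_compressed else data
    dims = tuple(int(n) for n in shape.split(",") if n)
    # frombuffer returns a read-only view; copy() hands back a writable array.
    return np.frombuffer(raw, dtype=np.dtype(dtype)).reshape(dims).copy()
```

Unlike pickle.dumps, this layout keeps dtype and shape queryable in SQL, and identical arrays always produce the same hash.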

Installation

pip install scitex-db                   # SQLite3 only
pip install "scitex-db[postgresql]"     # add psycopg2 driver (quoted so shells like zsh do not expand the brackets)
pip install "scitex-db[all]"            # everything

Configuration

Defaults work out of the box. To override, drop a config.yaml next to your script, or point SCITEX_DB_CONFIG at one — see .env.example for the full env-var list and resolution order.
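
As a rough illustration of such a lookup, the sketch below resolves a config path from the SCITEX_DB_CONFIG variable or a config.yaml next to the script. The precedence shown (environment variable first, then the local file, then built-in defaults) is an assumption, and resolve_config_path is a hypothetical helper, not part of scitex-db. The authoritative order is the one documented in .env.example.

```python
import os
from pathlib import Path


def resolve_config_path(script_dir, env=None):
    """Return the config file to load, or None to fall back to defaults.

    Assumed precedence (not confirmed by the package docs):
    1. the path in SCITEX_DB_CONFIG, 2. config.yaml in script_dir, 3. None.
    """
    env = os.environ if env is None else env
    explicit = env.get("SCITEX_DB_CONFIG")
    if explicit:
        return Path(explicit)
    local = Path(script_dir) / "config.yaml"
    if local.is_file():
        return local
    return None
```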

Quick Start

from scitex_db import SQLite3
import numpy as np

db = SQLite3("experiments.db")

db.create_table("results", {
    "id": "INTEGER PRIMARY KEY",
    "experiment": "TEXT",
    "accuracy": "REAL",
})
db.insert_many("results", [
    {"experiment": "exp1", "accuracy": 0.95},
    {"experiment": "exp2", "accuracy": 0.92},
])

# NumPy arrays round-trip with dtype/shape preserved
db.save_array("features", np.random.rand(1000, 50), column="embeddings",
              additional_columns={"model": "bert"})
features = db.load_array("features", "embeddings", where="model = 'bert'")

2 Interfaces

Python API ⭐⭐⭐  primary surface

from scitex_db import SQLite3, PostgreSQL, check_health, inspect

# Backends
db = SQLite3("experiments.db")
db = PostgreSQL(host=..., user=..., dbname=...)

# CRUD
db.insert("results", {"experiment": "exp1", "accuracy": 0.95})
db.insert_many("results", rows, batch_size=1000)
rows = db.get_rows("results", where="accuracy > 0.9")
db.update("results", {"accuracy": 0.97}, where="id = 1")
db.delete("results", where="id = 1")

# Arrays / Blobs
db.save_array(table, arr, column="data")
db.load_array(table, "data", where=...)
db.save_blob(table, obj, column="checkpoint")
db.load_blob(table, "checkpoint", where=...)

# Transactions / maintenance
with db.transaction():
    db.insert("a", {...})
    db.insert("b", {...})
db.summary                # schema + row counts
inspect("experiments.db") # standalone helper
check_health("experiments.db", fix_issues=True)
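
For readers wondering what a context-managed transaction replaces, here is a minimal sketch of the pattern written against the standard-library sqlite3 module. It is an illustration of commit-on-success and rollback-on-error, not the code behind db.transaction(); the transaction helper and the tables a and b are hypothetical.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Commit if the block succeeds, roll back if it raises."""
    conn.execute("BEGIN")
    try:
        yield conn
    except Exception:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")


# isolation_level=None disables implicit transactions, so BEGIN is ours to issue.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE a (x INTEGER)")
conn.execute("CREATE TABLE b (x INTEGER NOT NULL)")

try:
    with transaction(conn):
        conn.execute("INSERT INTO a VALUES (1)")
        conn.execute("INSERT INTO b VALUES (NULL)")  # violates NOT NULL
except sqlite3.IntegrityError:
    pass  # the insert into a was rolled back together with the failed one

rows_in_a = conn.execute("SELECT COUNT(*) FROM a").fetchone()[0]
```

Because both inserts share one transaction, the failure on b leaves a empty rather than half-written.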

CLI ⭐⭐  scitex-db <subcommand>

scitex-db --help-recursive            # all subcommands at once
scitex-db inspect-db experiments.db   # schema + row counts
scitex-db inspect-db experiments.db --tables results --json
scitex-db check-health experiments.db --fix --yes
scitex-db check-health experiments.db --dry-run
scitex-db list-python-apis            # introspect public Python surface

Every subcommand supports -h/--help, --json, and the safety pair --dry-run / --yes where it mutates state.
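
The --dry-run / --yes pairing is a common pattern for commands that mutate state. The argparse sketch below shows one way it can be wired up. The program name demo-db, the fix subcommand, and the rule that --dry-run wins over --yes are all assumptions for illustration and do not describe how scitex-db itself is implemented.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="demo-db")  # hypothetical program name
    sub = parser.add_subparsers(dest="command", required=True)
    fix = sub.add_parser("fix", help="hypothetical mutating subcommand")
    fix.add_argument("path")
    fix.add_argument("--dry-run", action="store_true",
                     help="report what would change without applying it")
    fix.add_argument("--yes", action="store_true",
                     help="skip the confirmation prompt")
    return parser


def plan(args, confirm=input):
    """Return 'preview', 'apply', or 'abort' for a mutating command."""
    if args.dry_run:
        return "preview"  # assumed: a dry run never mutates, even with --yes
    if args.yes or confirm(f"Modify {args.path}? [y/N] ").strip().lower() == "y":
        return "apply"
    return "abort"
```

Passing confirm as a parameter keeps the prompt out of the decision logic, which makes the three outcomes easy to test without a terminal.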

Architecture

scitex_db/
├── __init__.py            ← public API (SQLite3, PostgreSQL, check_health, inspect)
├── __main__.py            ← `scitex-db` CLI entry
├── _BaseMixins/           ← backend-agnostic mixins (CRUD, schema, batch, ...)
├── _sqlite3/              ← SQLite3 driver
│   └── _SQLite3Mixins/    ← SQLite3-specific mixin overrides
├── _postgresql/           ← PostgreSQL driver
│   └── _PostgreSQLMixins/ ← PostgreSQL-specific mixin overrides
├── _check_health.py       ← `scitex-db check-health`
├── _inspect.py            ← `scitex-db inspect-db`
├── _inspect_optimized.py  ← faster path for large DBs
├── _delete_duplicates.py  ← duplicate-row cleanup
├── _utils.py              ← shared helpers
└── _skills/               ← agent-facing skill files

Each backend composes its _*Mixins/ folder onto _BaseMixins/, so swapping SQLite3 ↔ PostgreSQL does not change call sites.
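
The composition idea can be sketched in a few lines. Every class name below is hypothetical and far simpler than the real _BaseMixins/ tree. The second backend only records statements, so the example runs without a PostgreSQL server; its %s placeholder is the parameter style psycopg2 uses.

```python
import sqlite3


class CrudMixin:
    """Backend-agnostic: builds SQL and leaves execution to the backend."""

    placeholder = "?"

    def insert(self, table, row):
        cols = ", ".join(row)
        marks = ", ".join(self.placeholder for _ in row)
        self.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                     tuple(row.values()))

    def get_rows(self, table):
        return self.execute(f"SELECT * FROM {table}").fetchall()


class SQLiteBackendMixin:
    """Backend-specific: owns the connection and the placeholder style."""

    placeholder = "?"

    def __init__(self, path):
        self.conn = sqlite3.connect(path)

    def execute(self, sql, params=()):
        return self.conn.execute(sql, params)


class RecordingBackendMixin:
    """Stand-in for a second backend: records statements instead of running them."""

    placeholder = "%s"

    def __init__(self):
        self.log = []

    def execute(self, sql, params=()):
        self.log.append((sql, params))


class DemoSQLite(SQLiteBackendMixin, CrudMixin):
    pass


class DemoRecording(RecordingBackendMixin, CrudMixin):
    pass


def store_result(db):
    # The call site is identical for both backends.
    db.insert("results", {"experiment": "exp1", "accuracy": 0.95})
```

Listing the backend mixin first in the bases lets its placeholder and execute take precedence, while insert and get_rows are inherited unchanged from the shared mixin.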

Demo

flowchart LR
    U["user code"] --> A["SQLite3('exp.db')"]
    U --> B["PostgreSQL(host=..., user=...)"]
    A --> M["_BaseMixins (CRUD · schema · batch · maintenance)"]
    B --> M
    A -.-> SM["_SQLite3Mixins<br/>(backend overrides)"]
    B -.-> PM["_PostgreSQLMixins<br/>(backend overrides)"]
    M --> H["scitex-db check-health<br/>(fix orphans, vacuum)"]
    M --> I["scitex-db inspect-db<br/>(schema + row counts)"]

Part of SciTeX

scitex-db is part of SciTeX. Install via the umbrella with pip install scitex[db], then import as scitex.db or invoke scitex db <subcommand> — the standalone scitex-db package remains the source of truth.

Four Freedoms for Research

  1. The freedom to run your research anywhere — your machine, your terms.
  2. The freedom to study how every step works — from raw data to final manuscript.
  3. The freedom to redistribute your workflows, not just your papers.
  4. The freedom to modify any module and share improvements with the community.

AGPL-3.0 — because we believe research infrastructure deserves the same freedoms as the software it runs on.

