
Database utilities for scientific computing with SQLite3 and PostgreSQL


scitex-db

SciTeX

Database utilities for scientific computing — SQLite3 + PostgreSQL with NumPy-aware storage.

Full Documentation · uv pip install scitex-db[all]

License: AGPL v3


Problem and Solution

| # | Problem | Solution |
|---|---------|----------|
| 1 | Storing ndarrays in SQLite means pickle.dumps → BLOB: no compression, no dtype/shape, no deterministic hashing | db.save_array(table, arr) / load_array(...): typed, compressed BLOBs that round-trip with dtype, shape, is_compressed and _hash columns |
| 2 | The sqlite3 API is low-level; every project rewrites the same connect / transaction / execute boilerplate | with db.transaction(): ... gives context-managed transactions, plus built-in health checks, deduplication and schema inspection |
| 3 | Switching SQLite ↔ Postgres means rewriting every call site | Mixin composition: SQLite3 and PostgreSQL share _BaseMixins/, so the same call site works against either backend |
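To make the first row concrete, here is a minimal sketch of a typed, compressed BLOB round-trip written against the standard-library sqlite3 module and NumPy. It is not scitex-db's implementation: the column names (data, data_dtype, data_shape, data_is_compressed, data_hash), the use of zlib and SHA-256, and the helper names save_array_sketch / load_array_sketch are assumptions chosen only to illustrate the idea. It handles numeric dtypes only.

```python
import hashlib
import json
import sqlite3
import zlib

import numpy as np

# Hypothetical schema: one BLOB plus dtype/shape/compression/hash metadata.
# Column names only mirror the README's description; they are not scitex-db's.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS {table} ("
    "id INTEGER PRIMARY KEY, data BLOB, data_dtype TEXT, "
    "data_shape TEXT, data_is_compressed INTEGER, data_hash TEXT)"
)


def save_array_sketch(conn: sqlite3.Connection, table: str, arr: np.ndarray) -> int:
    """Store a numeric ndarray as a zlib-compressed BLOB; return the new row id.

    `table` is interpolated into SQL (identifiers cannot be bound), so it must be trusted.
    """
    conn.execute(SCHEMA.format(table=table))
    raw = arr.tobytes()  # C-order bytes, independent of the array's memory layout
    digest = hashlib.sha256(raw).hexdigest()  # same bytes -> same hash on every run
    cur = conn.execute(
        f"INSERT INTO {table} (data, data_dtype, data_shape, data_is_compressed, data_hash) "
        "VALUES (?, ?, ?, ?, ?)",
        (zlib.compress(raw), arr.dtype.str, json.dumps(list(arr.shape)), 1, digest),
    )
    conn.commit()
    return cur.lastrowid


def load_array_sketch(conn: sqlite3.Connection, table: str, row_id: int) -> np.ndarray:
    """Rebuild the array with its original dtype and shape, verifying the stored hash."""
    blob, dtype, shape, is_compressed, digest = conn.execute(
        "SELECT data, data_dtype, data_shape, data_is_compressed, data_hash "
        f"FROM {table} WHERE id = ?",
        (row_id,),
    ).fetchone()
    raw = zlib.decompress(blob) if is_compressed else blob
    if hashlib.sha256(raw).hexdigest() != digest:
        raise ValueError(f"hash mismatch for {table}.id={row_id}")
    # frombuffer returns a read-only view, so copy to get an ordinary writable array
    return np.frombuffer(raw, dtype=np.dtype(dtype)).reshape(json.loads(shape)).copy()


conn = sqlite3.connect(":memory:")
original = np.random.rand(1000, 50)
restored = load_array_sketch(conn, "features", save_array_sketch(conn, "features", original))
print(restored.dtype, restored.shape, np.array_equal(original, restored))
```

Hashing the uncompressed bytes, rather than the compressed BLOB, keeps the digest stable even if the compression level changes later.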

Installation

pip install scitex-db                   # SQLite3 only
pip install "scitex-db[postgresql]"     # add psycopg2 driver
pip install "scitex-db[all]"            # everything

Configuration

Defaults work out of the box. To override, drop a config.yaml next to your script, or point SCITEX_DB_CONFIG at one — see .env.example for the full env-var list and resolution order.
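The lookup described above might be implemented along these lines. The precedence shown (SCITEX_DB_CONFIG first, then a config.yaml beside the script, then built-in defaults) is an assumption for illustration; .env.example documents the package's actual resolution order. The function name resolve_config_path is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(
    script_dir: Path, env: Mapping[str, str] = os.environ
) -> Optional[Path]:
    """Return the config file to load, or None to fall back to built-in defaults.

    Assumed (hypothetical) precedence: SCITEX_DB_CONFIG, then ./config.yaml.
    """
    override = env.get("SCITEX_DB_CONFIG")
    if override:
        path = Path(override).expanduser()
        # An explicit override that points nowhere is almost certainly a mistake
        if not path.is_file():
            raise FileNotFoundError(f"SCITEX_DB_CONFIG points at a missing file: {path}")
        return path
    local = Path(script_dir) / "config.yaml"
    return local if local.is_file() else None
```

Failing loudly on a broken override, instead of silently using defaults, makes misconfigured environments easier to spot.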

Quick Start

from scitex_db import SQLite3
import numpy as np

db = SQLite3("experiments.db")

db.create_table("results", {
    "id": "INTEGER PRIMARY KEY",
    "experiment": "TEXT",
    "accuracy": "REAL",
})
db.insert_many("results", [
    {"experiment": "exp1", "accuracy": 0.95},
    {"experiment": "exp2", "accuracy": 0.92},
])

# NumPy arrays round-trip with dtype/shape preserved
db.save_array("features", np.random.rand(1000, 50), column="embeddings",
              additional_columns={"model": "bert"})
features = db.load_array("features", "embeddings", where="model = 'bert'")
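For comparison, the relational half of the Quick Start written with only the standard-library sqlite3 module looks roughly like this; it is the connect / execute / commit boilerplate that the second row of the Problem and Solution table refers to. This is plain sqlite3, not scitex-db, and it uses an in-memory database in place of experiments.db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the Quick Start uses "experiments.db"
conn.execute(
    "CREATE TABLE IF NOT EXISTS results ("
    "id INTEGER PRIMARY KEY, experiment TEXT, accuracy REAL)"
)
rows = [
    {"experiment": "exp1", "accuracy": 0.95},
    {"experiment": "exp2", "accuracy": 0.92},
]
# Named placeholders bind each dict directly; `with conn` commits on success
# and rolls back if an exception escapes (it does not close the connection).
with conn:
    conn.executemany(
        "INSERT INTO results (experiment, accuracy) VALUES (:experiment, :accuracy)",
        rows,
    )
best = conn.execute(
    "SELECT experiment, accuracy FROM results WHERE accuracy > ? ORDER BY accuracy DESC",
    (0.9,),
).fetchall()
print(best)
```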

2 Interfaces

Python API ⭐⭐⭐  primary surface
from scitex_db import SQLite3, PostgreSQL, check_health, inspect

# Backends
db = SQLite3("experiments.db")
db = PostgreSQL(host=..., user=..., dbname=...)

# CRUD
db.insert("results", {"experiment": "exp1", "accuracy": 0.95})
db.insert_many("results", rows, batch_size=1000)
rows = db.get_rows("results", where="accuracy > 0.9")
db.update("results", {"accuracy": 0.97}, where="id = 1")
db.delete("results", where="id = 1")

# Arrays / Blobs
db.save_array(table, arr, column="data")
db.load_array(table, "data", where=...)
db.save_blob(table, obj, column="checkpoint")
db.load_blob(table, "checkpoint", where=...)

# Transactions / maintenance
with db.transaction():
    db.insert("a", {...}); db.insert("b", {...})
db.summary                # schema + row counts
inspect("experiments.db") # standalone helper
check_health("experiments.db", fix_issues=True)
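`with db.transaction():` groups several writes so that they commit or roll back together. The sketch below shows that all-or-nothing pattern on a raw sqlite3 connection; the function name transaction and its exact behavior are assumptions, not scitex-db's code.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def transaction(conn: sqlite3.Connection) -> Iterator[sqlite3.Connection]:
    """Hypothetical helper: run the block in one explicit transaction.

    COMMIT if the block finishes, ROLLBACK (and re-raise) if anything fails.
    """
    conn.execute("BEGIN")
    try:
        yield conn
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


# isolation_level=None turns off sqlite3's implicit BEGIN, so the context
# manager above is the only thing opening and closing transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE a (x INTEGER)")
conn.execute("CREATE TABLE b (y INTEGER)")

try:
    with transaction(conn):
        conn.execute("INSERT INTO a VALUES (1)")
        conn.execute("INSERT INTO b VALUES (1, 2)")  # fails: b has only one column
except sqlite3.Error:
    pass

# The insert into `a` was rolled back together with the failed insert into `b`.
print(conn.execute("SELECT COUNT(*) FROM a").fetchone()[0])
```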
CLI ⭐⭐  scitex-db <subcommand>
scitex-db --help-recursive            # all subcommands at once
scitex-db inspect-db experiments.db   # schema + row counts
scitex-db inspect-db experiments.db --tables results --json
scitex-db check-health experiments.db --fix --yes
scitex-db check-health experiments.db --dry-run
scitex-db list-python-apis            # introspect public Python surface

Every subcommand supports -h/--help, --json, and the safety pair --dry-run / --yes where it mutates state.
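One plausible reading of that safety convention, sketched with argparse: a mutating command does nothing destructive unless asked to, --dry-run always wins, and --yes only skips the confirmation prompt. The parser layout and the plan_action helper are hypothetical; run `scitex-db check-health --help` for the real flags and semantics.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed above for check-health."""
    parser = argparse.ArgumentParser(prog="scitex-db")
    sub = parser.add_subparsers(dest="command", required=True)
    health = sub.add_parser("check-health", help="check (and optionally fix) a database")
    health.add_argument("path")
    health.add_argument("--fix", action="store_true", help="repair detected issues")
    health.add_argument("--json", action="store_true", help="machine-readable output")
    health.add_argument("--dry-run", action="store_true", help="report changes without writing")
    health.add_argument("--yes", action="store_true", help="skip the confirmation prompt")
    return parser


def plan_action(args: argparse.Namespace) -> str:
    """Decide what a mutating command should do (assumed policy, not scitex-db's)."""
    if not args.fix:
        return "check-only"
    if args.dry_run:
        return "preview"  # --dry-run wins over --yes: never write
    return "apply" if args.yes else "confirm"  # prompt interactively unless --yes


args = build_parser().parse_args(["check-health", "experiments.db", "--fix", "--yes"])
print(plan_action(args))
```

Letting --dry-run override --yes means a scripted call can never write by accident when someone adds --dry-run for a preview.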

Architecture

scitex_db/
├── __init__.py            ← public API (SQLite3, PostgreSQL, check_health, inspect)
├── __main__.py            ← `scitex-db` CLI entry
├── _BaseMixins/           ← backend-agnostic mixins (CRUD, schema, batch, ...)
├── _sqlite3/              ← SQLite3 driver
│   └── _SQLite3Mixins/    ← SQLite3-specific mixin overrides
├── _postgresql/           ← PostgreSQL driver
│   └── _PostgreSQLMixins/ ← PostgreSQL-specific mixin overrides
├── _check_health.py       ← `scitex-db check-health`
├── _inspect.py            ← `scitex-db inspect-db`
├── _inspect_optimized.py  ← faster path for large DBs
├── _delete_duplicates.py  ← duplicate-row cleanup
├── _utils.py              ← shared helpers
└── _skills/               ← agent-facing skill files

Each backend composes its _*Mixins/ folder onto _BaseMixins/, so swapping SQLite3 ↔ PostgreSQL does not change call sites.
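A toy version of that composition: a backend-agnostic mixin builds the SQL once and asks the backend only for the detail that differs. Here that detail is the query-parameter placeholder, which really does differ between drivers (sqlite3 uses `?`, psycopg2 uses `%s`). The class and method names are hypothetical and far simpler than scitex-db's actual mixins.

```python
from typing import Any, Dict, List, Tuple


class _InsertMixin:
    """Backend-agnostic: builds an INSERT from a dict, delegating the placeholder."""

    placeholder: str  # supplied by a backend-specific override class

    def build_insert(self, table: str, row: Dict[str, Any]) -> Tuple[str, List[Any]]:
        cols = list(row)
        marks = ", ".join([self.placeholder] * len(cols))
        sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({marks})"
        return sql, [row[c] for c in cols]


class _SQLite3Overrides:
    placeholder = "?"  # sqlite3.paramstyle == "qmark"


class _PostgreSQLOverrides:
    placeholder = "%s"  # psycopg2 accepts %s (its paramstyle is "pyformat")


# Overrides come first in the bases so they win in the method resolution order.
class ToySQLite3(_SQLite3Overrides, _InsertMixin):
    pass


class ToyPostgreSQL(_PostgreSQLOverrides, _InsertMixin):
    pass


row = {"experiment": "exp1", "accuracy": 0.95}
for backend in (ToySQLite3(), ToyPostgreSQL()):
    # Same call site, backend-specific SQL
    print(backend.build_insert("results", row)[0])
```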

Demo

flowchart LR
    U["user code"] --> A["SQLite3('exp.db')"]
    U --> B["PostgreSQL(host=..., user=...)"]
    A --> M["_BaseMixins (CRUD · schema · batch · maintenance)"]
    B --> M
    A -.-> SM["_SQLite3Mixins<br/>(backend overrides)"]
    B -.-> PM["_PostgreSQLMixins<br/>(backend overrides)"]
    M --> H["scitex-db check-health<br/>(fix orphans, vacuum)"]
    M --> I["scitex-db inspect-db<br/>(schema + row counts)"]
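Both maintenance commands in the diagram rest on checks that SQLite exposes directly. The sketch below reproduces their spirit with the standard library: `PRAGMA integrity_check` and `PRAGMA foreign_key_check` to find problems, deleting orphaned child rows and running `VACUUM` as the repair step, and sqlite_master plus `COUNT(*)` for an inspect-style summary. The function names are hypothetical, and scitex-db's own checks (for example, what it treats as an orphan) may differ.

```python
import sqlite3
from typing import Dict, List


def _quote(name: str) -> str:
    """Quote an SQL identifier; table names cannot be bound as parameters."""
    return '"' + name.replace('"', '""') + '"'


def inspect_sketch(conn: sqlite3.Connection) -> Dict[str, int]:
    """Map each user table to its row count, like a stripped-down inspect-db."""
    names = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    return {n: conn.execute(f"SELECT COUNT(*) FROM {_quote(n)}").fetchone()[0] for n in names}


def check_health_sketch(conn: sqlite3.Connection, fix_issues: bool = False) -> List[str]:
    """Report integrity problems and orphaned rows; optionally delete orphans and VACUUM."""
    problems = [msg for (msg,) in conn.execute("PRAGMA integrity_check") if msg != "ok"]
    # Each row is (child_table, rowid, parent_table, fk_id): a child row whose
    # foreign key points at a parent row that does not exist.
    orphans = conn.execute("PRAGMA foreign_key_check").fetchall()
    problems += [
        f"orphan rowid {rowid} in {child} (no match in {parent})"
        for child, rowid, parent, _fk_id in orphans
    ]
    if fix_issues:
        with conn:  # commit all deletions together
            for child, rowid, _parent, _fk_id in orphans:
                if rowid is not None:  # WITHOUT ROWID tables report NULL here
                    conn.execute(f"DELETE FROM {_quote(child)} WHERE rowid = ?", (rowid,))
        conn.execute("VACUUM")  # rebuilds the file; must run outside a transaction
    return problems
```

`PRAGMA foreign_key_check` reports violations even when foreign-key enforcement is off, which is exactly the situation in which orphans tend to accumulate.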

Part of SciTeX

scitex-db is part of SciTeX. Install via the umbrella with pip install scitex[db], then import as scitex.db or invoke scitex db <subcommand> — the standalone scitex-db package remains the source of truth.

Four Freedoms for Research

  1. The freedom to run your research anywhere — your machine, your terms.
  2. The freedom to study how every step works — from raw data to final manuscript.
  3. The freedom to redistribute your workflows, not just your papers.
  4. The freedom to modify any module and share improvements with the community.

AGPL-3.0 — because we believe research infrastructure deserves the same freedoms as the software it runs on.




Download files


Source Distribution

scitex_db-0.1.9.tar.gz (8.9 MB)

Uploaded Source

Built Distribution


scitex_db-0.1.9-py3-none-any.whl (8.6 MB)

Uploaded Python 3

File details

Details for the file scitex_db-0.1.9.tar.gz.

File metadata

  • Download URL: scitex_db-0.1.9.tar.gz
  • Upload date:
  • Size: 8.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for scitex_db-0.1.9.tar.gz
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | cc9b491133c3874d4fafc8217ac0ea7755e5de9c2d21b4725ee1e989ca811fd4 |
| MD5 | 2b8c96fc44b93e57e566e23bf8fc8984 |
| BLAKE2b-256 | 3b80683d127851bf1b2b0ed99ac308931228478a6b74ace2a1f5c94c9bc9bbec |


Provenance

The following attestation bundles were made for scitex_db-0.1.9.tar.gz:

Publisher: pypi-publish-and-github-release-on-tag.yml on ywatanabe1989/scitex-db

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file scitex_db-0.1.9-py3-none-any.whl.

File metadata

  • Download URL: scitex_db-0.1.9-py3-none-any.whl
  • Upload date:
  • Size: 8.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for scitex_db-0.1.9-py3-none-any.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 6dd9caf5c79d0e66b360d9682ad470ecf0ae442c5e3aa56bb144494dc11acef9 |
| MD5 | b6c5508df74116045710ac05050b7a50 |
| BLAKE2b-256 | 77e70f76cf2e6d30aa25eb98617d284e0d5d7314e85f056d2ae70c4b50e6d5e7 |


Provenance

The following attestation bundles were made for scitex_db-0.1.9-py3-none-any.whl:

Publisher: pypi-publish-and-github-release-on-tag.yml on ywatanabe1989/scitex-db

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
