Generic base classes to handle ORM functionality for multiple downstream data models

orm-loader

A lightweight foundation for building and validating SQLAlchemy-based data models.

orm-loader sits below any particular schema or CDM. It gives you a small set of reusable pieces for defining tables, loading files through staging tables, and checking models against external specifications. It stays out of domain logic on purpose.

The library focuses on:

  • ORM table mixins and introspection
  • staged file loading
  • loader and validation infrastructure
  • operational helpers that work across supported backends

At the moment, the built-in backends are SQLite and PostgreSQL.

What this library provides

The package is deliberately small. Most downstream projects only need a couple of these pieces.

  1. A minimal ORM table base

ORMTableBase provides structural utilities for mapped tables without pulling domain rules into the base layer.

It supports:

  • mapper access and inspection
  • primary key discovery
  • required (non-nullable) column detection
  • consistent primary key handling across models
  • simple ID allocation helpers for sequence-less databases

from orm_loader.metadata import Base
from orm_loader.tables import ORMTableBase

class MyTable(ORMTableBase, Base):
    __tablename__ = "my_table"

You can inherit from it directly or pick it up through one of the higher-level mixins.
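The structural checks listed above (primary key discovery, required-column detection, ID allocation without sequences) can be illustrated outside the ORM. What follows is a minimal sketch using only the standard-library sqlite3 module. It is not the ORMTableBase API: the helper names primary_key_columns, required_columns, and next_id are hypothetical, and the table layout is made up for the example.

```python
import sqlite3


def primary_key_columns(conn, table):
    """Return primary key column names in key order (hypothetical helper)."""
    # `table` is interpolated directly, so it must be a trusted identifier.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); pk > 0 marks key columns.
    keyed = sorted((row[5], row[1]) for row in rows if row[5] > 0)
    return [name for _, name in keyed]


def required_columns(conn, table):
    """Return the columns declared NOT NULL (hypothetical helper)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [row[1] for row in rows if row[3] == 1]


def next_id(conn, table, pk):
    """Allocate the next integer ID as MAX(pk) + 1 for a database without sequences."""
    (value,) = conn.execute(
        f'SELECT COALESCE(MAX("{pk}"), 0) + 1 FROM "{table}"'
    ).fetchone()
    return value


# Example table, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL, note TEXT)"
)
conn.execute("INSERT INTO my_table (id, name) VALUES (1, 'a'), (2, 'b')")

print(primary_key_columns(conn, "my_table"))
print(required_columns(conn, "my_table"))
print(next_id(conn, "my_table", "id"))
```

MAX + 1 allocation is only safe when a single writer owns the table; concurrent writers can be handed the same ID.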

  2. CSV-based ingestion mixins

CSVLoadableTableInterface adds staged file loading to ORM tables. It can use pandas or PyArrow loaders, and on PostgreSQL it can use a fast COPY path when the input is clean enough.

Features include:

  • staging table creation and cleanup
  • chunked loading for large files
  • optional casting and deduplication before insert
  • backend-specific merge behaviour
  • PostgreSQL fast-path loading with ORM fallback
  • backend-aware index handling during merge

class MyTable(CSVLoadableTableInterface, ORMTableBase, Base):
    __tablename__ = "my_table"

The main extension points here are loader choice, column mapping, and the normal SQLAlchemy model definitions themselves. Most downstream projects do not need to override much beyond csv_columns() and the model schema.
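To make the staged-loading pattern concrete, here is a minimal sketch of the general idea (stage, deduplicate, merge, clean up) using only the standard-library csv and sqlite3 modules. It does not show how CSVLoadableTableInterface is implemented: the function load_csv_via_staging, the table names target and staging, and the "last row wins" deduplication rule are all assumptions made for illustration.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_csv_via_staging(conn, csv_file, chunk_size=1000):
    """Load (id, name) rows into `target` through a temporary staging table."""
    reader = csv.DictReader(csv_file)
    conn.execute("CREATE TEMP TABLE staging (id INTEGER, name TEXT)")
    try:
        # Chunked load: never hold more than chunk_size rows in memory.
        while chunk := list(islice(reader, chunk_size)):
            conn.executemany(
                "INSERT INTO staging (id, name) VALUES (:id, :name)", chunk
            )
        # Deduplicate on id (the last row in the file wins), then merge.
        conn.execute(
            """
            INSERT OR REPLACE INTO target (id, name)
            SELECT id, name FROM staging
            WHERE rowid IN (SELECT MAX(rowid) FROM staging GROUP BY id)
            """
        )
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        # Staging cleanup runs whether or not the merge succeeded.
        conn.execute("DROP TABLE IF EXISTS staging")


# Example data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO target VALUES (1, 'old')")
conn.commit()

data = io.StringIO("id,name\n1,alice\n2,bob\n2,bobby\n3,carol\n")
load_csv_via_staging(conn, data, chunk_size=2)
rows = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
print(rows)
```

Keeping the raw rows in a staging table means a bad file fails before it touches the target, and the merge step is the only place that needs backend-specific SQL.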

  3. Structured serialisation and hashing

SerialisableTableInterface adds lightweight serialisation helpers for ORM rows.

It supports:

  • conversion to dictionaries
  • JSON serialisation
  • stable row-level fingerprints
  • iterator-style access to field/value pairs

row = session.get(MyTable, 1)
row.to_dict()
row.to_json()
row.fingerprint()

This is useful for:

  • debugging
  • auditing
  • reproducibility checks
  • downstream APIs or exports
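One common way to get a stable row-level fingerprint is to hash a canonical JSON form of the row. The sketch below shows that technique with a plain dataclass and the standard library. It is an assumption about the general approach, not a description of how SerialisableTableInterface computes its fingerprints, and the Row class is hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class Row:
    """Hypothetical stand-in for a mapped ORM row."""

    id: int
    name: str
    created: date

    def to_dict(self):
        return asdict(self)

    def to_json(self):
        # Sorted keys and fixed separators give a canonical form;
        # default=str covers values such as dates that json cannot encode.
        return json.dumps(
            self.to_dict(), sort_keys=True, separators=(",", ":"), default=str
        )

    def fingerprint(self):
        # Hash the canonical JSON so equal rows always produce equal digests.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()

    def __iter__(self):
        # Iterator-style access to (field, value) pairs.
        return iter(self.to_dict().items())


row = Row(id=1, name="alice", created=date(2024, 1, 1))
print(row.to_json())
print(row.fingerprint())
```

Because the JSON is canonicalised before hashing, two rows with the same field values produce the same digest regardless of attribute order.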

  4. Model registry and validation scaffolding

The library includes validation infrastructure for comparing ORM models against external specifications.

This includes:

  • a model registry
  • table and field descriptors
  • validator contracts
  • a validation runner
  • structured validation reports

Specifications can be loaded from CSV today, with support for other formats (e.g. LinkML) planned.

registry = ModelRegistry(model_version="1.0")
registry.load_table_specs(table_csv, field_csv)
registry.register_models([MyTable])

runner = ValidationRunner(validators=always_on_validators())
report = runner.run(registry)

Validation output is available as:

  • human-readable text
  • structured dictionaries
  • JSON (for CI/CD integration)
  • exit codes suitable for pipelines
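The idea of checking models against a CSV specification and emitting a pipeline-friendly report can be sketched in a few lines. The example below is a standalone illustration using only the standard library. It does not use ModelRegistry or ValidationRunner, and the names Report and validate_columns, the two-column spec layout, and the issue wording are all hypothetical.

```python
import csv
import io
import json
from dataclasses import dataclass, field


@dataclass
class Report:
    """Hypothetical structured report: dict, JSON, and exit-code views."""

    issues: list = field(default_factory=list)

    def to_dict(self):
        return {"ok": not self.issues, "issues": self.issues}

    def to_json(self):
        return json.dumps(self.to_dict(), indent=2)

    def exit_code(self):
        # 0 when the models match the spec, 1 otherwise.
        return 1 if self.issues else 0


def validate_columns(spec_csv, model_columns):
    """Compare a CSV spec with `table,field` columns against model column names."""
    expected = {}
    for row in csv.DictReader(spec_csv):
        expected.setdefault(row["table"], set()).add(row["field"])

    report = Report()
    for table, fields in sorted(expected.items()):
        if table not in model_columns:
            report.issues.append({"table": table, "field": None, "problem": "missing table"})
            continue
        actual = set(model_columns[table])
        for name in sorted(fields - actual):
            report.issues.append({"table": table, "field": name, "problem": "missing field"})
        for name in sorted(actual - fields):
            report.issues.append({"table": table, "field": name, "problem": "unexpected field"})
    return report


# Example spec and model columns, invented for this sketch.
spec = io.StringIO("table,field\nmy_table,id\nmy_table,name\nmy_table,created_at\n")
report = validate_columns(spec, {"my_table": ["id", "name", "note"]})
print(report.to_json())
```

In a CI job, the script would typically finish with sys.exit(report.exit_code()) so that a mismatch fails the pipeline.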

  5. Database bootstrap helpers

The library provides lightweight helpers for schema creation and bootstrapping. It does not try to replace migrations.

from orm_loader.metadata import Base
from orm_loader.bootstrap import bootstrap

bootstrap(engine, create=True)

  6. Bulk-loading helpers

There are a few lower-level helpers for trusted bulk workflows, including backend-aware foreign key management and SQLite connection setup for heavy local loads.
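As an illustration of what foreign key management during a trusted bulk load can look like on SQLite, here is a minimal sketch using the standard-library sqlite3 module. It is not the library's helper: the context manager name foreign_keys_disabled and the parent/child tables are hypothetical. The PRAGMA statements themselves are standard SQLite.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def foreign_keys_disabled(conn):
    """Suspend foreign key enforcement for a trusted bulk load (hypothetical helper)."""
    # PRAGMA foreign_keys is a no-op inside an open transaction, so commit first.
    conn.commit()
    conn.execute("PRAGMA foreign_keys = OFF")
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.execute("PRAGMA foreign_keys = ON")


# Example schema, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES parent(id))"
)

with foreign_keys_disabled(conn):
    # Load order no longer matters: the child arrives before its parent.
    conn.execute("INSERT INTO child VALUES (1, 1)")
    conn.execute("INSERT INTO parent VALUES (1)")

# Verify integrity after the load; an empty list means no violations.
violations = conn.execute("PRAGMA foreign_key_check").fetchall()
print(violations)
```

Running PRAGMA foreign_key_check after the load matters because re-enabling enforcement does not retroactively validate rows inserted while it was off.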

Summary

This library is meant to be the boring layer underneath downstream models:

  • reusable ORM mixins
  • staged ingestion patterns
  • validation scaffolding
  • operational helpers

Domain rules, business logic, and schema semantics stay in the downstream project.

This makes it suitable as a shared foundation for:

  • clinical data models
  • research data marts
  • registry schemas
  • synthetic data pipelines
