Generic base classes that handle ORM functionality for multiple downstream data models

orm-loader

A lightweight foundation for building and validating SQLAlchemy-based data models.

orm-loader sits below any particular schema or CDM. It gives you a small set of reusable pieces for defining tables, loading files through staging tables, and checking models against external specifications. It stays out of domain logic on purpose.

The library focuses on:

  • ORM table mixins and introspection
  • staged file loading
  • loader and validation infrastructure
  • operational helpers that work across supported backends

At the moment, the built-in backends are SQLite and PostgreSQL.

What this library provides

The package is deliberately small. Most downstream projects only need a couple of these pieces.

  1. A minimal ORM table base

ORMTableBase provides structural utilities for mapped tables without pulling domain rules into the base layer.

It supports:

  • mapper access and inspection
  • primary key discovery
  • required (non-nullable) column detection
  • consistent primary key handling across models
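  • simple ID allocation helpers for sequence-less databases

from orm_loader.metadata import Base
from orm_loader.tables import ORMTableBase

class MyTable(ORMTableBase, Base):
    __tablename__ = "my_table"
    # Columns omitted for brevity; SQLAlchemy needs at least a primary key column.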

You can inherit from it directly or pick it up through one of the higher-level mixins.
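
To make the list above concrete, the sketch below shows how primary key discovery, required-column detection, and a naive ID allocator can be written with plain SQLAlchemy introspection. It is not ORMTableBase's implementation: the Person model, the three helper functions, and the local Base are hypothetical stand-ins, and SQLAlchemy 1.4 or later is assumed.

```python
from sqlalchemy import Column, Integer, String, create_engine, func, inspect, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()  # local stand-in, not orm_loader's Base


class Person(Base):  # hypothetical model used only for illustration
    __tablename__ = "person"
    person_id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    nickname = Column(String)  # nullable by default


def primary_key_names(model):
    """Names of the mapped primary key columns."""
    return [col.name for col in inspect(model).primary_key]


def required_column_names(model):
    """Non-nullable columns that are not part of the primary key."""
    return [
        col.name
        for col in model.__table__.columns
        if not col.nullable and not col.primary_key
    ]


def next_id(session, model):
    """Naive MAX(pk) + 1 allocation for a single-column integer key."""
    (pk,) = inspect(model).primary_key
    current = session.execute(select(func.max(pk))).scalar()
    return (current or 0) + 1


engine = create_engine("sqlite://")  # in-memory SQLite database
Base.metadata.create_all(engine)
with Session(engine) as session:
    first_id = next_id(session, Person)
    session.add(Person(person_id=first_id, name="Ada"))
    session.commit()
    second_id = next_id(session, Person)
```

MAX + 1 allocation is only safe with a single writer; concurrent loaders need a lock or a real sequence.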

  2. CSV-based ingestion mixins

CSVLoadableTableInterface adds staged file loading to ORM tables. It can use pandas or PyArrow loaders, and on PostgreSQL it can use a fast COPY path when the input is clean enough.

Features include:

  • staging table creation and cleanup
  • chunked loading for large files
  • optional casting and deduplication before insert
  • backend-specific merge behaviour
  • PostgreSQL fast-path loading with ORM fallback
  • backend-aware index handling during merge

class MyTable(CSVLoadableTableInterface, ORMTableBase, Base):
    __tablename__ = "my_table"

The main extension points are the choice of loader, the column mapping, and the SQLAlchemy model definitions themselves. Most downstream projects do not need to override much beyond csv_columns() and the model schema.
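
The staging pattern described in this section can be sketched with nothing but the standard library. The example below is an illustration of the idea on SQLite, not the CSVLoadableTableInterface code path: the person table, the staged_load and chunks functions, and the "last row wins" deduplication rule are all assumptions made for the example.

```python
import csv
import io
import sqlite3
from itertools import islice


def chunks(rows, size):
    """Yield lists of at most `size` rows from an iterator."""
    rows = iter(rows)
    while batch := list(islice(rows, size)):
        yield batch


def staged_load(conn, csv_text, chunk_size=2):
    """Load CSV rows into `person` through a temporary staging table."""
    conn.execute("CREATE TEMP TABLE person_staging (person_id TEXT, name TEXT)")
    try:
        reader = csv.DictReader(io.StringIO(csv_text))
        for batch in chunks(reader, chunk_size):
            conn.executemany(
                "INSERT INTO person_staging VALUES (:person_id, :name)", batch
            )
        # Cast the key, keep the last staged row per key, then merge.
        conn.execute(
            """
            INSERT OR REPLACE INTO person (person_id, name)
            SELECT CAST(person_id AS INTEGER), name
            FROM person_staging
            WHERE rowid IN (
                SELECT MAX(rowid) FROM person_staging GROUP BY person_id
            )
            """
        )
    finally:
        conn.execute("DROP TABLE person_staging")  # cleanup even on failure
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO person VALUES (1, 'old')")
staged_load(conn, "person_id,name\n1,Ada\n2,Grace\n2,Grace H.\n3,Edsger\n")
rows = conn.execute("SELECT person_id, name FROM person ORDER BY person_id").fetchall()
```

Raw text lands in the staging table first, so casting and deduplication happen in SQL before anything touches the target table. INSERT OR REPLACE is SQLite-specific; on PostgreSQL the merge step would typically use INSERT ... ON CONFLICT instead.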

  3. Structured serialisation and hashing

SerialisableTableInterface adds lightweight serialisation helpers for ORM rows.

It supports:

  • conversion to dictionaries
  • JSON serialisation
  • stable row-level fingerprints
  • iterator-style access to field/value pairs

row = session.get(MyTable, 1)
row.to_dict()
row.to_json()
row.fingerprint()

This is useful for:

  • debugging
  • auditing
  • reproducibility checks
  • downstream APIs or exports
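
One common way to get a stable row-level fingerprint is to hash a canonical JSON rendering of the row. The sketch below shows that approach under stated assumptions: the fingerprint function is hypothetical, and the library's own fingerprint() may use a different canonical form or hash.

```python
import hashlib
import json
from datetime import date


def fingerprint(data):
    """SHA-256 over a canonical JSON form of a field/value mapping."""
    canonical = json.dumps(
        data,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no incidental whitespace
        default=str,             # render dates and similar values as text
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = fingerprint({"person_id": 1, "birth_date": date(1990, 5, 17)})
b = fingerprint({"birth_date": date(1990, 5, 17), "person_id": 1})
c = fingerprint({"person_id": 2, "birth_date": date(1990, 5, 17)})
```

Here a and b are equal because keys are sorted before hashing, while c differs because a value changed.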

  4. Model registry and validation scaffolding

The library includes validation infrastructure for comparing ORM models against external specifications.

This includes:

  • a model registry
  • table and field descriptors
  • validator contracts
  • a validation runner
  • structured validation reports

Specifications can be loaded from CSV today; support for other formats (e.g. LinkML) is planned.

registry = ModelRegistry(model_version="1.0")
registry.load_table_specs(table_csv, field_csv)
registry.register_models([MyTable])

runner = ValidationRunner(validators=always_on_validators())
report = runner.run(registry)

Validation output is available as:

  • human-readable text
  • structured dictionaries
  • JSON (for CI/CD integration)
  • exit codes suitable for pipelines
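
As a rough picture of what spec-driven validation with several output forms looks like, the sketch below compares a plain dictionary description of a model against a field specification in CSV form. It does not use ModelRegistry or ValidationRunner: the Issue and Report classes, the validate function, and the CSV column names (table, column, required) are all hypothetical.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Issue:
    table: str
    column: str
    message: str


@dataclass
class Report:
    issues: list = field(default_factory=list)

    def to_dict(self):
        return {"ok": not self.issues, "issues": [asdict(i) for i in self.issues]}

    def to_json(self):
        return json.dumps(self.to_dict(), indent=2)

    def to_text(self):
        if not self.issues:
            return "OK"
        return "\n".join(f"{i.table}.{i.column}: {i.message}" for i in self.issues)

    def exit_code(self):
        return 1 if self.issues else 0


def validate(model_columns, spec_csv):
    """Compare {table: {column: is_required}} against a CSV field spec."""
    report = Report()
    for row in csv.DictReader(io.StringIO(spec_csv)):
        table, column = row["table"], row["column"]
        required = row["required"].strip().lower() == "yes"
        columns = model_columns.get(table, {})
        if column not in columns:
            report.issues.append(Issue(table, column, "missing from model"))
        elif columns[column] != required:
            report.issues.append(Issue(table, column, "required flag differs from spec"))
    return report


spec = (
    "table,column,required\n"
    "person,person_id,yes\n"
    "person,name,yes\n"
    "person,birth_date,no\n"
)
model = {"person": {"person_id": True, "name": False}}
report = validate(model, spec)
```

In a CI job, the text or JSON form would be printed and the exit code passed to sys.exit() so that a mismatch fails the pipeline.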

  5. Database bootstrap helpers

The library provides lightweight helpers for schema creation and bootstrapping. It does not try to replace migrations.

from orm_loader.metadata import Base
from orm_loader.bootstrap import bootstrap

bootstrap(engine, create=True)

  6. Bulk-loading helpers

There are a few lower-level helpers for trusted bulk workflows, including backend-aware foreign key management and SQLite connection setup for heavy local loads.
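
For SQLite, this kind of setup usually comes down to a handful of PRAGMA statements. The sketch below is one possible shape for it using the standard sqlite3 module: bulk_load_mode is a hypothetical helper, not a function from this library, and the choice of pragmas is an assumption about what a trusted local load would want.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def bulk_load_mode(conn):
    """Relax durability and foreign key enforcement for a trusted load."""
    conn.commit()  # PRAGMA foreign_keys is ignored inside an open transaction
    previous_sync = conn.execute("PRAGMA synchronous").fetchone()[0]
    conn.execute("PRAGMA foreign_keys = OFF")
    conn.execute("PRAGMA synchronous = OFF")
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        # Restore the previous settings whether or not the load succeeded.
        conn.execute(f"PRAGMA synchronous = {int(previous_sync)}")
        conn.execute("PRAGMA foreign_keys = ON")
    # Enforcement was off during the load, so check integrity afterwards.
    violations = conn.execute("PRAGMA foreign_key_check").fetchall()
    if violations:
        raise ValueError(f"{len(violations)} foreign key violation(s) after load")


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES parent(id))"
)
with bulk_load_mode(conn):
    # Children arrive before their parents; enforcement is off during the load.
    conn.execute("INSERT INTO child VALUES (1, 10)")
    conn.execute("INSERT INTO parent VALUES (10)")
```

Turning synchronous off trades crash safety for speed, so it only makes sense when the load can be rerun from the source files.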

Summary

This library is meant to be the boring layer underneath downstream models:

  • reusable ORM mixins
  • staged ingestion patterns
  • validation scaffolding
  • operational helpers

Domain rules, business logic, and schema semantics stay in the downstream project.

This makes it suitable as a shared foundation for:

  • clinical data models
  • research data marts
  • registry schemas
  • synthetic data pipelines
