
Generic base classes to handle ORM functionality for multiple downstream data models


orm-loader


A lightweight foundation for building and validating SQLAlchemy-based data models.

orm-loader sits below any particular schema or CDM. It gives you a small set of reusable pieces for defining tables, loading files through staging tables, and checking models against external specifications. It stays out of domain logic on purpose.

The library focuses on:

  • ORM table mixins and introspection
  • staged file loading
  • loader and validation infrastructure
  • operational helpers that work across supported backends

At the moment, the built-in backends are SQLite and PostgreSQL.

What this library provides

The package is deliberately small. Most downstream projects only need a couple of these pieces.

  1. A minimal ORM table base

ORMTableBase provides structural utilities for mapped tables without pulling domain rules into the base layer.

It supports:

  • mapper access and inspection
  • primary key discovery
  • required (non-nullable) column detection
  • consistent primary key handling across models
  • simple ID allocation helpers for sequence-less databases
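
from sqlalchemy import Column, Integer
from orm_loader.metadata import Base
from orm_loader.tables import ORMTableBase

class MyTable(ORMTableBase, Base):
    __tablename__ = "my_table"
    id = Column(Integer, primary_key=True)  # a mapped table needs a primary key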

You can inherit from it directly or pick it up through one of the higher-level mixins.
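ORMTableBase's helpers work at the SQLAlchemy mapper level and their internals are not shown here. As a rough, stdlib-only illustration of the same kinds of introspection, the sketch below reads primary-key and NOT NULL information from SQLite's `PRAGMA table_info` and allocates IDs as `MAX(id) + 1`, one common approach for databases without sequences. `primary_key_columns`, `required_columns`, and `next_id` are hypothetical names, not the library's API.

```python
import sqlite3


def _table_info(conn: sqlite3.Connection, table: str) -> list:
    # Each row: (cid, name, type, notnull, dflt_value, pk);
    # pk is the 1-based position in the primary key, or 0 for non-key columns.
    return conn.execute(f'PRAGMA table_info("{table}")').fetchall()


def primary_key_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Primary-key column names in key order (handles composite keys)."""
    keyed = [row for row in _table_info(conn, table) if row[5] > 0]
    return [row[1] for row in sorted(keyed, key=lambda row: row[5])]


def required_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Columns explicitly declared NOT NULL."""
    return [row[1] for row in _table_info(conn, table) if row[3]]


def next_id(conn: sqlite3.Connection, table: str, pk: str) -> int:
    """Allocate MAX(pk) + 1. Only safe while a single writer loads the table."""
    (current,) = conn.execute(f'SELECT MAX("{pk}") FROM "{table}"').fetchone()
    return 1 if current is None else current + 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL, note TEXT)")
print(primary_key_columns(conn, "my_table"))  # ['id']
print(required_columns(conn, "my_table"))     # ['name']
print(next_id(conn, "my_table", "id"))        # 1
conn.execute("INSERT INTO my_table (id, name) VALUES (7, 'a')")
print(next_id(conn, "my_table", "id"))        # 8
```

`MAX(id) + 1` avoids any dependency on sequences, at the cost of concurrency safety: two writers allocating at the same time would need a lock or a dedicated counter table.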

  2. CSV-based ingestion mixins

CSVLoadableTableInterface adds staged file loading to ORM tables. It can use pandas or PyArrow loaders, and on PostgreSQL it can use a fast COPY path when the input is clean enough.

Features include:

  • staging table creation and cleanup
  • chunked loading for large files
  • optional casting and deduplication before insert
  • backend-specific merge behaviour
  • PostgreSQL fast-path loading with ORM fallback
  • backend-aware index handling during merge

class MyTable(CSVLoadableTableInterface, ORMTableBase, Base):
    __tablename__ = "my_table"

The main extension points here are loader choice, column mapping, and the normal SQLAlchemy model definitions themselves. Most downstream projects do not need to override much beyond csv_columns() and the model schema.
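The staged pattern described above can be sketched with the standard library alone. This is not the library's loader: `staged_load` is a hypothetical function, it uses SQLite's `INSERT OR IGNORE` as its merge rule (existing rows win on key conflicts), and it relies on SQLite column affinity instead of an explicit casting step.

```python
import csv
import io
import sqlite3
from itertools import islice
from typing import TextIO


def staged_load(conn: sqlite3.Connection, target: str, csv_file: TextIO,
                columns: list[str], chunk_size: int = 10_000) -> int:
    """Load CSV rows into `target` through a temporary staging table.

    Returns the number of CSV rows read (before deduplication and merge).
    """
    staging = f"_stage_{target}"
    cols = ", ".join(f'"{c}"' for c in columns)
    marks = ", ".join("?" for _ in columns)
    # Staging table: same columns as the target, without keys or constraints.
    conn.execute(f'DROP TABLE IF EXISTS temp."{staging}"')
    conn.execute(f'CREATE TEMP TABLE "{staging}" AS SELECT {cols} FROM "{target}" WHERE 0')
    try:
        reader = csv.DictReader(csv_file)
        read = 0
        # Insert in fixed-size chunks so memory use stays bounded on large files.
        while chunk := list(islice(reader, chunk_size)):
            conn.executemany(
                f'INSERT INTO "{staging}" ({cols}) VALUES ({marks})',
                [tuple(row[c] for c in columns) for row in chunk],
            )
            read += len(chunk)
        # Merge: DISTINCT drops duplicate rows within the file; OR IGNORE skips
        # rows that would violate a constraint, e.g. a key already in the target.
        conn.execute(
            f'INSERT OR IGNORE INTO "{target}" ({cols}) '
            f'SELECT DISTINCT {cols} FROM "{staging}"'
        )
        conn.commit()
    finally:
        # Cleanup runs even if loading fails part-way.
        conn.execute(f'DROP TABLE IF EXISTS temp."{staging}"')
    return read


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO my_table VALUES (1, 'existing')")
data = io.StringIO("id,name\n1,from-file\n2,b\n2,b\n3,c\n")
print(staged_load(conn, "my_table", data, ["id", "name"], chunk_size=2))  # 5
print(conn.execute("SELECT id, name FROM my_table ORDER BY id").fetchall())
# [(1, 'existing'), (2, 'b'), (3, 'c')]
```

Staging first means the target only ever receives rows that have already been deduplicated. A PostgreSQL variant would typically swap `INSERT OR IGNORE` for `INSERT ... ON CONFLICT DO NOTHING` and the chunked `executemany` for `COPY`.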

  3. Structured serialisation and hashing

SerialisableTableInterface adds lightweight serialisation helpers for ORM rows.

It supports:

  • conversion to dictionaries
  • JSON serialisation
  • stable row-level fingerprints
  • iterator-style access to field/value pairs
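
# session: an open SQLAlchemy Session; MyTable mixes in SerialisableTableInterface
row = session.get(MyTable, 1)
row.to_dict()      # field/value pairs as a dict
row.to_json()      # the same data serialised to JSON
row.fingerprint()  # stable fingerprint of the row's contents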

This is useful for:

  • debugging
  • auditing
  • reproducibility checks
  • downstream APIs or exports
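The mixin's exact encoding and hash function are not documented here. A minimal sketch of the general idea, using a plain dataclass as a stand-in for an ORM row: `fingerprint()` hashes a canonical JSON form (sorted keys, fixed separators) with SHA-256, so two rows with equal field values always produce the same digest. `ExampleRow` and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional


@dataclass
class ExampleRow:
    """Hypothetical stand-in for a mapped ORM row."""
    id: int
    name: str
    start: Optional[date] = None

    def to_dict(self) -> dict:
        return asdict(self)

    def to_json(self) -> str:
        # sort_keys fixes key order; default=str renders dates as ISO strings
        return json.dumps(self.to_dict(), sort_keys=True, default=str)

    def fingerprint(self) -> str:
        # Compact canonical JSON, so equal field values always hash the same way
        canonical = json.dumps(self.to_dict(), sort_keys=True,
                               separators=(",", ":"), default=str)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def __iter__(self):
        # Iterator-style access to (field, value) pairs
        return iter(self.to_dict().items())


row = ExampleRow(1, "x", date(2024, 1, 2))
print(row.to_json())  # {"id": 1, "name": "x", "start": "2024-01-02"}
print(dict(row))      # {'id': 1, 'name': 'x', 'start': datetime.date(2024, 1, 2)}
print(row.fingerprint() == ExampleRow(1, "x", date(2024, 1, 2)).fingerprint())  # True
```

Canonicalising before hashing is what makes the fingerprint stable: without sorted keys and fixed separators, the same values could serialise to different strings.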

  4. Model registry and validation scaffolding

The library includes validation infrastructure for comparing ORM models against external specifications.

This includes:

  • a model registry
  • table and field descriptors
  • validator contracts
  • a validation runner
  • structured validation reports

Specifications can be loaded from CSV today, with support for other formats (e.g. LinkML) planned.

registry = ModelRegistry(model_version="1.0")
registry.load_table_specs(table_csv, field_csv)
registry.register_models([MyTable])

runner = ValidationRunner(validators=always_on_validators())
report = runner.run(registry)

Validation output is available as:

  • human-readable text
  • structured dictionaries
  • JSON (for CI/CD integration)
  • exit codes suitable for pipelines
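`ModelRegistry`, `ValidationRunner`, and `always_on_validators()` are the library's own names; how they compare models to specs is not shown here. The sketch below illustrates the underlying comparison with hypothetical pieces only: a field-level spec CSV with assumed columns `table_name`, `field_name`, and `is_required`, a check against SQLite's `PRAGMA table_info`, and a small report object that renders as text, a dictionary, JSON, or a pipeline exit code.

```python
import csv
import io
import json
import sqlite3
from dataclasses import dataclass, field
from typing import TextIO


@dataclass
class Report:
    """Hypothetical report: one message per spec/model mismatch."""
    issues: list[str] = field(default_factory=list)

    def to_text(self) -> str:
        return "\n".join(self.issues) if self.issues else "OK"

    def to_dict(self) -> dict:
        return {"passed": not self.issues, "issues": list(self.issues)}

    def to_json(self) -> str:
        return json.dumps(self.to_dict())

    def exit_code(self) -> int:
        # 0 = pass, 1 = fail: the usual convention for CI steps
        return 0 if not self.issues else 1


def validate_fields(conn: sqlite3.Connection, field_spec: TextIO) -> Report:
    report = Report()
    for spec in csv.DictReader(field_spec):
        table, name = spec["table_name"], spec["field_name"]
        # PRAGMA rows: (cid, name, type, notnull, dflt_value, pk)
        columns = {row[1]: row for row in conn.execute(f'PRAGMA table_info("{table}")')}
        if name not in columns:
            report.issues.append(f"{table}.{name}: in spec but not in model")
        elif spec["is_required"].strip().lower() == "yes":
            notnull, pk = columns[name][3], columns[name][5]
            # Primary-key columns count as required in this sketch
            if not (notnull or pk):
                report.issues.append(f"{table}.{name}: required in spec but nullable in model")
    return report


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
spec = io.StringIO(
    "table_name,field_name,is_required\n"
    "person,id,Yes\n"
    "person,name,Yes\n"
    "person,birth_date,No\n"
)
report = validate_fields(conn, spec)
print(report.to_text())
print(report.exit_code())  # 1
```

Keeping the report separate from the checks is what lets the same result feed a console, a JSON artefact, and a pipeline exit status.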

  5. Database bootstrap helpers

The library provides lightweight helpers for schema creation and bootstrapping. It does not try to replace migrations.

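from sqlalchemy import create_engine

from orm_loader.metadata import Base
from orm_loader.bootstrap import bootstrap

# example URL; point this at a SQLite or PostgreSQL database
engine = create_engine("sqlite:///example.db")
bootstrap(engine, create=True)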
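What `bootstrap()` does internally is not documented here. A common migration-free approach, and the one SQLAlchemy's `MetaData.create_all()` follows by default (`checkfirst=True`), is to create only the tables that are missing and never alter existing ones. A minimal stdlib sketch of that idea, with a hypothetical `SCHEMA` mapping and `bootstrap_sqlite` function:

```python
import sqlite3

# Hypothetical schema: table name -> DDL. A real project would derive this from ORM metadata.
SCHEMA = {
    "person": "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "visit": "CREATE TABLE visit (id INTEGER PRIMARY KEY, "
             "person_id INTEGER REFERENCES person(id))",
}


def bootstrap_sqlite(conn: sqlite3.Connection, create: bool = True) -> list[str]:
    """Return the tables missing from the database; create them when create=True.

    Existing tables are never altered: schema changes belong to migrations.
    """
    existing = {
        name for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    missing = [table for table in SCHEMA if table not in existing]
    if create:
        for table in missing:
            conn.execute(SCHEMA[table])
        conn.commit()
    return missing


conn = sqlite3.connect(":memory:")
print(bootstrap_sqlite(conn, create=False))  # ['person', 'visit'] (reported, not created)
print(bootstrap_sqlite(conn))                # ['person', 'visit'] (now created)
print(bootstrap_sqlite(conn))                # [] (second run is a no-op)
```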

  6. Bulk-loading helpers

There are a few lower-level helpers for trusted bulk workflows, including backend-aware foreign key management and SQLite connection setup for heavy local loads.
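The library's exact settings are not listed here. As an illustration of the SQLite side only, the hypothetical context manager below switches off foreign-key enforcement and `synchronous` writes for a trusted load, runs `PRAGMA foreign_key_check` once before committing, and restores the previous pragma values afterwards. The specific pragmas are assumptions, not necessarily what orm-loader applies.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def sqlite_bulk_load(conn: sqlite3.Connection) -> Iterator[sqlite3.Connection]:
    """Relax FK checks and fsync for a trusted bulk load, then verify and restore."""
    conn.commit()  # PRAGMA foreign_keys is a no-op inside an open transaction
    prev_fk = conn.execute("PRAGMA foreign_keys").fetchone()[0]
    prev_sync = conn.execute("PRAGMA synchronous").fetchone()[0]
    conn.execute("PRAGMA foreign_keys = OFF")
    conn.execute("PRAGMA synchronous = OFF")
    try:
        yield conn
        # Constraints were not enforced during the load, so check them once here.
        violations = conn.execute("PRAGMA foreign_key_check").fetchall()
        if violations:
            raise sqlite3.IntegrityError(f"{len(violations)} foreign-key violation(s) in bulk load")
        conn.commit()
    except BaseException:
        conn.rollback()  # nothing from a failed load is kept
        raise
    finally:
        conn.execute(f"PRAGMA synchronous = {int(prev_sync)}")
        conn.execute(f"PRAGMA foreign_keys = {int(prev_fk)}")


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES parent(id))")
with sqlite_bulk_load(conn):
    conn.execute("INSERT INTO child VALUES (1, 10)")  # child before parent: allowed while FKs are off
    conn.execute("INSERT INTO parent VALUES (10)")
try:
    with sqlite_bulk_load(conn):
        conn.execute("INSERT INTO child VALUES (2, 99)")  # no parent 99
except sqlite3.IntegrityError as exc:
    print(exc)  # 1 foreign-key violation(s) in bulk load
```

Checking once at the end keeps the load order flexible (children may arrive before parents) while still refusing to commit orphaned rows.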

Summary

This library is meant to be the boring layer underneath downstream models:

  • reusable ORM mixins
  • staged ingestion patterns
  • validation scaffolding
  • operational helpers

Domain rules, business logic, and schema semantics stay in the downstream project.

This makes it suitable as a shared foundation for:

  • clinical data models
  • research data marts
  • registry schemas
  • synthetic data pipelines


Download files


Source Distribution

orm_loader-0.4.0.tar.gz (38.3 kB)

Uploaded Source

Built Distribution


orm_loader-0.4.0-py3-none-any.whl (53.4 kB)

Uploaded Python 3

File details

Details for the file orm_loader-0.4.0.tar.gz.

File metadata

  • Download URL: orm_loader-0.4.0.tar.gz
  • Size: 38.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for orm_loader-0.4.0.tar.gz
Algorithm Hash digest
SHA256 08e0e260e02d42859d3e91e064c6118e845e178909cf5e38ccb185a37ac205a5
MD5 a63bda4f1336e4fb446875990f65ea2d
BLAKE2b-256 9f6fcd7787ccacb6742d6c204c9b6322e2b2447616ca5f97ed98878d6d4d8920


Provenance

The following attestation bundles were made for orm_loader-0.4.0.tar.gz:

Publisher: python-publish.yml on AustralianCancerDataNetwork/orm-loader


File details

Details for the file orm_loader-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: orm_loader-0.4.0-py3-none-any.whl
  • Size: 53.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for orm_loader-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5e4680d415f264304e7fdc597303c0e71320ed18444fd7bd04bdd22939f9e780
MD5 9b5bd1e68fe23ab318b869c8198cb334
BLAKE2b-256 b00ae014ee74e829378c54acb29ebf84fdd797d43c517072aad228a6d1f0ea2e


Provenance

The following attestation bundles were made for orm_loader-0.4.0-py3-none-any.whl:

Publisher: python-publish.yml on AustralianCancerDataNetwork/orm-loader

