Generic base classes to handle ORM functionality for multiple downstream data models
orm-loader
A lightweight, reusable foundation for building and validating SQLAlchemy-based clinical (and non-clinical) data models.
This library provides general-purpose ORM infrastructure that sits below any specific data model (OMOP, PCORnet, custom CDMs, etc.), focusing on:
- declarative base configuration
- bulk ingestion patterns
- file-based validation & loading
- table introspection
- model-agnostic validation scaffolding
- safe, database-portable operational helpers
It intentionally contains no domain logic and no assumptions about a specific schema.
What this library provides:
This library provides a small set of composable building blocks for defining, loading, inspecting, and validating SQLAlchemy-based data models. All components are model-agnostic and can be selectively combined in downstream libraries.
- A minimal, opinionated ORM table base
ORMTableBase provides structural introspection utilities for SQLAlchemy-mapped tables, without imposing any domain semantics.
It supports:
- mapper access and inspection
- primary key discovery
- required (non-nullable) column detection
- consistent primary key handling across models
- simple ID allocation helpers for sequence-less databases
    from orm_loader.metadata import Base
    from orm_loader.tables import ORMTableBase

    class MyTable(ORMTableBase, Base):
        __tablename__ = "my_table"
        # column definitions, including a primary key, go here
This base is intended to be inherited by all ORM tables, either directly or via higher-level mixins.
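
To make the introspection and ID-allocation ideas concrete, here is a minimal sketch of primary key discovery, required-column detection, and "next ID" allocation for a database without sequences. It uses only the standard-library sqlite3 module rather than SQLAlchemy, and the helper names (`primary_key_columns`, `required_columns`, `next_id`) are hypothetical; they are not orm-loader's API.

```python
import sqlite3

def primary_key_columns(conn, table):
    """Return primary key column names in key order (hypothetical helper)."""
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # Each row is (cid, name, type, notnull, default, pk); pk > 0 marks key columns.
    key_rows = sorted((r for r in rows if r[5] > 0), key=lambda r: r[5])
    return [r[1] for r in key_rows]

def required_columns(conn, table):
    """Return the names of columns declared NOT NULL (hypothetical helper)."""
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [r[1] for r in rows if r[3] == 1]

def next_id(conn, table, pk_column):
    """Allocate the next integer ID as MAX(pk) + 1, or 1 for an empty table."""
    query = f'SELECT COALESCE(MAX("{pk_column}"), 0) + 1 FROM "{table}"'
    return conn.execute(query).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL, note TEXT)"
)
conn.execute("INSERT INTO my_table (id, name) VALUES (1, 'a'), (7, 'b')")

print(primary_key_columns(conn, "my_table"))
print(required_columns(conn, "my_table"))
print(next_id(conn, "my_table", "id"))
```

Allocating IDs as MAX + 1 is only safe when a single writer is loading the table; concurrent writers would need locking or a database-side sequence.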
- CSV-based ingestion mixins
CSVLoadableTableInterface adds opt-in CSV loading support for ORM tables using pandas, with a focus on correctness and scalability.
Features include:
- chunked loading for large files
- optional per-table normalisation logic
- optional deduplication against existing database rows
- safe bulk inserts using SQLAlchemy sessions
    class MyTable(CSVLoadableTableInterface, ORMTableBase, Base):
        __tablename__ = "my_table"
Downstream models may override the following hooks to implement table-specific ingestion policies:
- normalise_dataframe(...)
- dedupe_dataframe(...)
- csv_columns()
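
The chunk, normalise, dedupe, insert pattern described above can be sketched as follows. This is an illustration only: it swaps pandas for the standard-library csv module and SQLAlchemy for sqlite3 so that it runs anywhere, and `iter_chunks`, `normalise_row`, `load_csv`, and the `person` table are all hypothetical names, not part of orm-loader.

```python
import csv
import io
import sqlite3
from itertools import islice

def iter_chunks(reader, size):
    """Yield lists of at most `size` rows so the whole file is never in memory."""
    while True:
        chunk = list(islice(reader, size))
        if not chunk:
            return
        yield chunk

def normalise_row(row):
    """Per-table clean-up hook: trim whitespace and turn empty strings into None."""
    return {key: ((value or "").strip() or None) for key, value in row.items()}

def load_csv(conn, handle, chunk_size=2):
    """Insert CSV rows into `person` chunk by chunk, skipping IDs already stored."""
    inserted = 0
    for chunk in iter_chunks(csv.DictReader(handle), chunk_size):
        rows = [normalise_row(r) for r in chunk]
        ids = [int(r["person_id"]) for r in rows]
        placeholders = ",".join("?" * len(ids))
        existing = {
            found[0]
            for found in conn.execute(
                f"SELECT person_id FROM person WHERE person_id IN ({placeholders})",
                ids,
            )
        }
        fresh = {}
        for r in rows:
            pid = int(r["person_id"])
            if pid not in existing:
                fresh.setdefault(pid, r)  # keep the first occurrence within a chunk
        conn.executemany(
            "INSERT INTO person (person_id, name) VALUES (?, ?)",
            [(pid, r["name"]) for pid, r in fresh.items()],
        )
        inserted += len(fresh)
    conn.commit()
    return inserted

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'already here')")
data = io.StringIO("person_id,name\n1,Ada\n2, Grace \n3,\n2,Duplicate\n")
print(load_csv(conn, data))
```

Checking for existing keys once per chunk keeps the number of queries proportional to the number of chunks rather than the number of rows.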
- Structured serialisation and hashing
SerialisableTableInterface adds lightweight, explicit serialisation helpers for ORM rows.
It supports:
- conversion to dictionaries
- JSON serialisation
- stable row-level fingerprints
- iterator-style access to field/value pairs
    row = session.get(MyTable, 1)
    row.to_dict()      # column values as a dictionary
    row.to_json()      # the same data serialised as JSON
    row.fingerprint()  # stable row-level fingerprint
This is useful for:
- debugging
- auditing
- reproducibility checks
- downstream APIs or exports
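
One common way to obtain a stable row-level fingerprint is to hash a canonical JSON rendering of the row. The sketch below shows that idea with a plain dataclass standing in for an ORM row; the choice of SHA-256 and of sorted-key JSON is an assumption made for illustration, and the source does not state how SerialisableTableInterface computes its fingerprints.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import date

@dataclass
class Row:
    """Stand-in for a mapped ORM row (hypothetical, not an orm-loader class)."""
    id: int
    name: str
    born: date

    def to_dict(self):
        return asdict(self)

    def __iter__(self):
        # Iterator-style access to (field, value) pairs.
        return iter(self.to_dict().items())

    def to_json(self):
        # Sorted keys and fixed separators keep the output byte-for-byte stable;
        # default=str renders values json cannot encode natively, such as dates.
        return json.dumps(
            self.to_dict(), sort_keys=True, separators=(",", ":"), default=str
        )

    def fingerprint(self):
        # Hash the canonical JSON so equal rows always give equal digests.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()

row = Row(id=1, name="Ada", born=date(1815, 12, 10))
print(row.to_json())
print(row.fingerprint())
```

Because the digest depends only on field names and values, two loads of the same source data can be compared row by row without comparing every column.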
- Model registry and validation scaffolding
The library includes model-agnostic validation infrastructure, designed to compare ORM models against external specifications.
This includes:
- a model registry
- table and field descriptors
- validator contracts
- a validation runner
- structured validation reports
Specifications can be loaded from CSV today, with support for other formats (e.g. LinkML) planned.
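    # table_csv and field_csv are the table-level and field-level specification CSVs
    registry = ModelRegistry(model_version="1.0")
    registry.load_table_specs(table_csv, field_csv)
    registry.register_models([MyTable])
    # run the validators against the registered models and collect a report
    runner = ValidationRunner(validators=always_on_validators())
    report = runner.run(registry)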
Validation output is available as:
- human-readable text
- structured dictionaries
- JSON (for CI/CD integration)
- exit codes suitable for pipelines
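
To illustrate what comparing models against an external specification can look like, here is a small self-contained sketch that checks field specs against a description of model columns and produces a report with a dictionary form, a JSON form, and an exit code. `FieldSpec`, `Report`, and `validate` are hypothetical names invented for this example; they do not mirror the real ModelRegistry or ValidationRunner classes.

```python
import json
from dataclasses import dataclass, field

@dataclass
class FieldSpec:
    """One row of a field-level specification (hypothetical shape)."""
    table: str
    name: str
    required: bool

@dataclass
class Report:
    errors: list = field(default_factory=list)

    def exit_code(self):
        # Non-zero when anything failed, so a CI step can stop the pipeline.
        return 1 if self.errors else 0

    def to_dict(self):
        return {"ok": not self.errors, "errors": list(self.errors)}

    def to_json(self):
        return json.dumps(self.to_dict(), indent=2)

def validate(specs, model_columns):
    """Compare specs with models given as {table: {column: nullable flag}}."""
    report = Report()
    missing_tables = set()
    for spec in specs:
        columns = model_columns.get(spec.table)
        if columns is None:
            if spec.table not in missing_tables:  # report each table only once
                missing_tables.add(spec.table)
                report.errors.append(f"{spec.table}: table not found in models")
        elif spec.name not in columns:
            report.errors.append(f"{spec.table}.{spec.name}: column not found")
        elif spec.required and columns[spec.name]:
            report.errors.append(f"{spec.table}.{spec.name}: required but nullable")
    return report

specs = [
    FieldSpec("person", "person_id", required=True),
    FieldSpec("person", "birth_date", required=True),
    FieldSpec("visit", "visit_id", required=True),
]
models = {"person": {"person_id": False, "birth_date": True}}
report = validate(specs, models)
print(report.to_json())
```

In a pipeline, the script would typically finish with `sys.exit(report.exit_code())` so that a failed validation fails the build.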
- Database bootstrap helpers
The library provides lightweight helpers for schema creation and bootstrapping, without imposing a migration strategy.
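    from sqlalchemy import create_engine
    from orm_loader.metadata import Base
    from orm_loader.bootstrap import bootstrap

    engine = create_engine("sqlite:///example.db")  # any SQLAlchemy URL
    bootstrap(engine, create=True)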
- Safe bulk-loading utilities
A reusable context manager simplifies trusted bulk ingestion workflows:
- temporarily disables foreign key checks where supported
- suppresses autoflush for performance
- ensures reliable rollback on failure
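
The shape of such a context manager can be sketched with the standard-library sqlite3 module. This is not orm-loader's implementation: `bulk_load` is a hypothetical name, the foreign key switch shown is SQLite's `PRAGMA foreign_keys`, and autoflush suppression is omitted because it is a SQLAlchemy session concern (there, the session's `no_autoflush` context manager plays that role).

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def bulk_load(conn):
    """Disable FK checks for a trusted load, commit on success, roll back on error."""
    # SQLite ignores PRAGMA foreign_keys inside an open transaction,
    # so finish any pending work before switching enforcement off.
    conn.commit()
    conn.execute("PRAGMA foreign_keys = OFF")
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()  # discard everything written inside the block
        raise
    finally:
        # No transaction is open here, so enforcement is reliably restored.
        conn.execute("PRAGMA foreign_keys = ON")

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE child ("
    "id INTEGER PRIMARY KEY, "
    "parent_id INTEGER NOT NULL REFERENCES parent(id))"
)

# Inside the block, rows may arrive in any order, children before parents.
with bulk_load(conn) as bulk:
    bulk.execute("INSERT INTO child VALUES (1, 10)")
    bulk.execute("INSERT INTO parent VALUES (10)")
```

Disabling foreign key checks trades safety for speed, which is why the pattern is reserved for trusted input whose referential integrity is already known or verified afterwards.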
Summary
This library intentionally focuses on infrastructure, not semantics.
It provides:
- reusable ORM mixins
- safe ingestion patterns
- validation scaffolding
- database-portable utilities
while leaving domain rules, business logic, and schema semantics to downstream libraries.
This makes it suitable as a shared foundation for:
- clinical data models
- research data marts
- registry schemas
- synthetic data pipelines