Portfolio-to-drug object schema for the Refua drug discovery ecosystem.

refua-schema

refua-schema is a portfolio-centric object model for the Refua ecosystem. It starts with Portfolio, nests Disease, then Rationale, then Drug, and reuses canonical scientific and workflow objects from sibling Refua packages instead of redefining parallel versions.

What it provides

  • A top-level object hierarchy for discovery portfolios: Portfolio -> Disease -> Rationale -> Drug.
  • Reuse of core refua entities such as Protein, SmallMolecule, Binder, Complex, DNA, and RNA.
  • Reuse of downstream workflow objects from:
    • refua-clinical for SimulationConfig and TrialSimulationResult
    • refua-preclinical for PreclinicalStudySpec
    • refua-regulatory for bundle and provenance records
  • Domain metadata objects that fit drug discovery naturally: Evidence, Biomarker, Assay, Modality, AdmetProfile, and ClinicalTrial.
  • JSON/YAML round-tripping for portfolios that preserves nested Refua object types.
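
To make the nesting concrete, here is a minimal sketch of the Portfolio -> Disease -> Rationale -> Drug shape and a JSON round trip. It uses standard-library dataclasses as stand-ins; the classes below are illustrative and are not the refua_schema models, which carry many more fields, and the `to_json`/`from_json` helpers are hypothetical names.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


# Stand-in dataclasses for illustration only (NOT the refua_schema models).
@dataclass
class Drug:
    drug_id: str
    name: str


@dataclass
class Rationale:
    rationale_id: str
    title: str
    drugs: list[Drug] = field(default_factory=list)


@dataclass
class Disease:
    disease_id: str
    name: str
    rationales: list[Rationale] = field(default_factory=list)


@dataclass
class Portfolio:
    portfolio_id: str
    name: str
    diseases: list[Disease] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses through nested dataclasses and lists.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> Portfolio:
        # Rebuild typed objects level by level so nesting survives the trip.
        raw = json.loads(text)
        diseases = [
            Disease(
                disease_id=d["disease_id"],
                name=d["name"],
                rationales=[
                    Rationale(
                        rationale_id=r["rationale_id"],
                        title=r["title"],
                        drugs=[Drug(**drug) for drug in r["drugs"]],
                    )
                    for r in d["rationales"]
                ],
            )
            for d in raw["diseases"]
        ]
        return cls(portfolio_id=raw["portfolio_id"], name=raw["name"], diseases=diseases)


portfolio = Portfolio(
    portfolio_id="solid-tumors",
    name="Solid Tumor Portfolio",
    diseases=[
        Disease(
            disease_id="nsclc",
            name="Non-small cell lung cancer",
            rationales=[
                Rationale(
                    rationale_id="egfr-driver",
                    title="EGFR oncogenic signaling",
                    drugs=[Drug(drug_id="lead-1", name="Lead 1")],
                )
            ],
        )
    ],
)

restored = Portfolio.from_json(portfolio.to_json())
```

The point of the sketch is that the reloaded tree contains typed objects again rather than plain dictionaries, which is the property the round-tripping bullet describes.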

Install

cd refua-schema
pip install -e .

With development tooling:

poetry install -E dev

With optional SQLModel persistence support:

poetry install -E sqlmodel

With both:

poetry install -E dev -E sqlmodel

Quickstart

from pathlib import Path

from refua import Complex, Protein, SmallMolecule
from refua_clinical.models import default_simulation_config
from refua_preclinical.models import default_study_spec
from refua_schema import (
    AdmetProfile,
    Assay,
    Disease,
    Drug,
    Modality,
    Portfolio,
    Rationale,
)

# Structural entities come from the core refua package.
egfr = Protein(sequence="LEEKKGNYVVTDHAFV...", ids="A")
lead = SmallMolecule.from_smiles("CCOc1ccc(NC(=O)N2CCN(C)CC2)cc1", name="lead-1")
binding_model = Complex([egfr, lead], name="egfr-lead-1")

# Clinical simulation config is reused from refua-clinical.
trial = default_simulation_config()
trial.trial_id = "egfr-phase2"
trial.indication = "Non-small cell lung cancer"

# Preclinical study spec is reused from refua-preclinical.
tox = default_study_spec()
tox.study_id = "egfr-28d-tox"
tox.indication = "Oncology"

portfolio = Portfolio(portfolio_id="solid-tumors", name="Solid Tumor Portfolio")
portfolio.add_disease(
    Disease(
        disease_id="nsclc",
        name="Non-small cell lung cancer",
        rationales=[
            Rationale(
                rationale_id="egfr-driver",
                title="EGFR oncogenic signaling",
                hypothesis="EGFR-driven tumors remain vulnerable to selective kinase blockade.",
                proteins=[egfr],
                refua_objects=[binding_model],
                drugs=[
                    Drug(
                        drug_id="lead-1",
                        name="Lead 1",
                        modality=Modality(name="oral small molecule", kind="small_molecule", route="oral"),
                        mechanism_of_action="Selective EGFR inhibition",
                        structures=[lead],
                        admet_profiles=[
                            AdmetProfile(
                                profile_id="lead-1-admet",
                                source="txgemma",
                                smiles="CCOc1ccc(NC(=O)N2CCN(C)CC2)cc1",
                                summary_scores={"admet_score": 0.72, "safety_score": 0.68},
                                endpoint_scores={"score_hERG": 0.61, "score_DILI": 0.70},
                            )
                        ],
                        assays=[
                            Assay(
                                assay_id="egfr-biochem",
                                name="EGFR biochemical potency",
                                assay_type="biochemical",
                                endpoint="IC50",
                                result_value=14.2,
                                unit="nM",
                            )
                        ],
                        preclinical_studies=[tox],
                        clinical_trials=[],
                    )
                ],
            )
        ],
    )
)

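# Create the output directory first in case save() does not create parents.
Path("artifacts").mkdir(parents=True, exist_ok=True)
portfolio.save(Path("artifacts/portfolio.yaml"))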
round_tripped = Portfolio.load(Path("artifacts/portfolio.yaml"))
assert round_tripped.diseases[0].rationales[0].drugs[0].name == "Lead 1"
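
Once a portfolio is loaded, a common next step is to walk the hierarchy. The sketch below relies only on the `diseases`, `rationales`, and `drugs` attributes used in the assertion above; `iter_drugs` is a hypothetical helper, not part of refua-schema, and the `SimpleNamespace` objects are stand-ins so the example runs without the Refua packages installed.

```python
from types import SimpleNamespace
from typing import Any, Iterator, Tuple


def iter_drugs(portfolio: Any) -> Iterator[Tuple[Any, Any, Any]]:
    """Yield (disease, rationale, drug) for every drug in the hierarchy."""
    for disease in portfolio.diseases:
        for rationale in disease.rationales:
            for drug in rationale.drugs:
                yield disease, rationale, drug


# Stand-in objects mirroring the Quickstart's nesting (not real Refua models).
demo = SimpleNamespace(
    diseases=[
        SimpleNamespace(
            name="Non-small cell lung cancer",
            rationales=[
                SimpleNamespace(
                    title="EGFR oncogenic signaling",
                    drugs=[SimpleNamespace(name="Lead 1")],
                )
            ],
        )
    ]
)

# One readable path string per drug in the tree.
drug_paths = [
    f"{disease.name} / {rationale.title} / {drug.name}"
    for disease, rationale, drug in iter_drugs(demo)
]
```

Because the helper is duck-typed, it should work unchanged on a real Portfolio instance, assuming those three attributes are plain iterables as the Quickstart suggests.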

SQLModel persistence

refua-schema includes an optional SQLModel adapter in refua_schema.sqlmodel_support. It is intentionally thin:

  • The canonical source of truth remains the Portfolio Pydantic model.
  • SQL storage keeps one full serialized payload per portfolio.
  • Lightweight index tables for diseases, rationales, and drugs make the hierarchy queryable without reproducing the entire schema as ORM columns.
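
The "payload plus thin index" idea can be sketched with the standard-library sqlite3 module. This is an illustration of the pattern only: the table names, column names, and helper functions below are assumptions, not the schema that refua_schema.sqlmodel_support actually creates, and only the drug index is shown (disease and rationale indexes would follow the same shape).

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per portfolio holding the full serialized document.
    CREATE TABLE portfolio_payload (
        portfolio_id TEXT PRIMARY KEY,
        payload      TEXT NOT NULL
    );
    -- Thin index: just enough columns to find drugs without parsing payloads.
    CREATE TABLE drug_index (
        portfolio_id TEXT NOT NULL,
        disease_id   TEXT NOT NULL,
        rationale_id TEXT NOT NULL,
        drug_id      TEXT NOT NULL,
        name         TEXT NOT NULL
    );
    """
)


def save_portfolio(db: sqlite3.Connection, portfolio: dict) -> None:
    """Store the payload and rebuild its index rows in one transaction."""
    pid = portfolio["portfolio_id"]
    rows = [
        (pid, d["disease_id"], r["rationale_id"], drug["drug_id"], drug["name"])
        for d in portfolio["diseases"]
        for r in d["rationales"]
        for drug in r["drugs"]
    ]
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO portfolio_payload VALUES (?, ?)",
            (pid, json.dumps(portfolio)),
        )
        db.execute("DELETE FROM drug_index WHERE portfolio_id = ?", (pid,))
        db.executemany("INSERT INTO drug_index VALUES (?, ?, ?, ?, ?)", rows)


def load_portfolio(db: sqlite3.Connection, portfolio_id: str) -> dict:
    """Return the canonical payload; the index is never used to rebuild it."""
    row = db.execute(
        "SELECT payload FROM portfolio_payload WHERE portfolio_id = ?",
        (portfolio_id,),
    ).fetchone()
    if row is None:
        raise KeyError(portfolio_id)
    return json.loads(row[0])


portfolio = {
    "portfolio_id": "pf-sql",
    "name": "SQL Portfolio",
    "diseases": [
        {
            "disease_id": "dis-sql",
            "name": "SQL Disease",
            "rationales": [
                {
                    "rationale_id": "rat-sql",
                    "title": "SQL Rationale",
                    "drugs": [{"drug_id": "drug-sql", "name": "SQL Drug"}],
                }
            ],
        }
    ],
}

save_portfolio(conn, portfolio)
indexed_drugs = conn.execute(
    "SELECT drug_id FROM drug_index WHERE portfolio_id = ?", ("pf-sql",)
).fetchall()
```

Rebuilding the index rows on every save keeps them derived from the payload, so the serialized portfolio stays the single source of truth.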

Example:

from pathlib import Path

from refua_schema import Disease, Drug, Portfolio, Rationale
from refua_schema.sqlmodel_support import PortfolioStore

portfolio = Portfolio(
    portfolio_id="pf-sql",
    name="SQL Portfolio",
    diseases=[
        Disease(
            disease_id="dis-sql",
            name="SQL Disease",
            rationales=[
                Rationale(
                    rationale_id="rat-sql",
                    title="SQL Rationale",
                    hypothesis="Persist canonical portfolio payloads with a thin relational index.",
                    drugs=[
                        Drug.from_smiles(
                            drug_id="drug-sql",
                            name="SQL Drug",
                            smiles="CCO",
                        )
                    ],
                )
            ],
        )
    ],
)

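# Create the output directory first in case the store does not create parents.
Path("artifacts").mkdir(parents=True, exist_ok=True)
store = PortfolioStore.sqlite(Path("artifacts/portfolio.sqlite"))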
store.create_schema()
store.save(portfolio)

reloaded = store.load("pf-sql")
assert reloaded.to_dict() == portfolio.to_dict()
assert len(store.list_drugs(portfolio_id="pf-sql")) == 1

Validation

  • All core schema models use Pydantic validation with field descriptions.
  • Assignment validation stays enabled after object creation.
  • The package round-trips JSON/YAML payloads while preserving canonical nested Refua object types where supported by the serializer.
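
Assignment validation means a field is checked again whenever it is mutated, not only at construction. Pydantic exposes this through its `validate_assignment` setting; how refua-schema configures it is not shown here, so the sketch below demonstrates the behavior with a standard-library stand-in. `AssaySketch` and its rules are hypothetical and are not the package's Assay model.

```python
from dataclasses import dataclass


@dataclass
class AssaySketch:
    """Stand-in model that validates on every assignment (illustrative only)."""

    assay_id: str
    result_value: float

    def __setattr__(self, name: str, value: object) -> None:
        # The dataclass-generated __init__ also assigns through __setattr__,
        # so the same checks run at construction and on later mutation.
        if name == "assay_id" and (not isinstance(value, str) or not value):
            raise ValueError("assay_id must be a non-empty string")
        if name == "result_value":
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError("result_value must be a number")
            value = float(value)  # coerce ints so the stored type is stable
        super().__setattr__(name, value)


assay = AssaySketch(assay_id="egfr-biochem", result_value=14.2)
assay.result_value = 12  # accepted and coerced to a float

try:
    assay.result_value = "potent"  # rejected after creation
    rejected = False
except ValueError:
    rejected = True
```

The failed assignment leaves the previous value in place, which is the practical benefit: an object that was valid when created cannot drift into an invalid state.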

Release checks

Typical first-release verification flow:

poetry check
poetry install -E dev -E sqlmodel
poetry run ruff check src tests
poetry run pytest
poetry build

Design notes

  • Protein, SmallMolecule, and other structural entities stay in refua.
  • Clinical simulation config/results stay in refua-clinical.
  • Preclinical study specs stay in refua-preclinical.
  • Audit and provenance records stay in refua-regulatory.
  • refua-schema owns the portfolio-level composition layer that links them together.
  • Optional SQL persistence stays in refua_schema.sqlmodel_support so the core schema does not take a hard dependency on SQLModel.
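
Keeping an optional backend out of the core import path is usually done by importing it only when the feature is requested. The sketch below shows one way to do that with importlib; `require_optional` is a hypothetical helper and this is not necessarily how refua_schema.sqlmodel_support guards its import.

```python
import importlib
import importlib.util
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import a top-level optional module or raise an actionable ImportError."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name!r} is not installed; "
            f"install the {extra!r} extra to enable this feature."
        )
    return importlib.import_module(module_name)


# "json" stands in for an optional dependency that happens to be installed.
json_module = require_optional("json", extra="sqlmodel")

# A made-up module name demonstrates the failure path.
try:
    require_optional("refua_hypothetical_missing_backend", extra="sqlmodel")
    missing_message = ""
except ImportError as exc:
    missing_message = str(exc)
```

With this pattern, importing the core package never fails because of a missing extra; the error appears only when the optional feature is actually used, and it names the extra to install.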
