Convert Abusua Pedigree Studio session files to GA4GH Pedigree and Phenopackets.

abusua2ga4gh

Convert Abusua Pedigree Studio session files (.json) into GA4GH Pedigree Standard messages and GA4GH Phenopackets (schema v2), to make Abusua pedigrees interoperable with the wider genomics ecosystem.

Pure Python with no runtime dependencies; requires Python ≥ 3.8.


Why these output formats? (the overlap, explained)

The standards are complementary, and there are three valid serialisation forms. This package can emit all three; by default it produces the two recommended ones.

Form | What it is | When to use
Phenopackets Family (single file) | One document holding the proband Phenopacket, relatives with findings, a native PED-style pedigree, and a consanguinousParents flag. | The recommended single-file deliverable for family-based genomic diagnostics. Default.
GA4GH Pedigree Standard | A relationship-centric graph (individuals + KIN-ontology relationships such as isBiologicalMotherOf). | Interop with tools built on the GA4GH Pedigree Standard. Default.
Standalone Phenopacket per individual | One file per clinically-relevant person. | When a downstream tool ingests individual phenopackets. Optional.

A Phenopacket describes exactly one individual; the schema has no "list of phenopackets" container. To put a whole family in one file, the standard provides the Family message — that is the single-file form, and it embeds the pedigree plus the member phenopackets together.
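As a rough illustration of that single-file shape, the sketch below assembles a Family-like dictionary using the camelCase JSON field names of Phenopackets v2 (id, proband, relatives, consanguinousParents, pedigree). The helper name family_skeleton and the sample ids are hypothetical, and the required metaData block is left out for brevity, so this is not a valid message on its own and not this package's implementation.

```python
import json


def family_skeleton(family_id, proband_packet, relative_packets, persons,
                    consanguinous=False):
    """Assemble a Family-shaped dict (illustrative; metaData omitted)."""
    return {
        "id": family_id,
        "proband": proband_packet,               # exactly one Phenopacket
        "relatives": list(relative_packets),     # zero or more Phenopackets
        "consanguinousParents": consanguinous,
        "pedigree": {"persons": list(persons)},  # PED-style rows go here
    }


# Hypothetical member phenopackets, reduced to the bare minimum.
proband = {"id": "pp-i6", "subject": {"id": "i6"}}
mother = {"id": "pp-i2", "subject": {"id": "i2"}}

family = family_skeleton("fam-1", proband, [mother], persons=[])
print(json.dumps(family, indent=2))
```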

Two different "pedigrees". The GA4GH Pedigree Standard (KIN relationship triples) and the Phenopackets-native Pedigree (PED-style Person rows, used inside Family) are different artifacts. This package builds the right one for each output: the KIN graph for the standalone GA4GH Pedigree, and the PED-style rows for Family.pedigree.
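To make the difference concrete, here is a minimal sketch that renders the same parentage in both shapes. The PED-style row uses the Phenopackets v2 Person field names; the triples are shown as plain (subject, relation, object) tuples, which is an illustrative simplification and not the GA4GH Pedigree Standard's actual field layout. The helper names and the use of "0" for an unknown parent (the PED convention) are assumptions.

```python
def ped_row(family_id, child_id, mother_id=None, father_id=None,
            sex="UNKNOWN_SEX", affected="MISSING"):
    """One PED-style Person row: parents are columns on the child's row."""
    return {
        "familyId": family_id,
        "individualId": child_id,
        "paternalId": father_id or "0",  # assumption: "0" marks an unknown parent
        "maternalId": mother_id or "0",
        "sex": sex,
        "affectedStatus": affected,
    }


def kin_triples(child_id, mother_id=None, father_id=None):
    """KIN-style edges: one directed relationship per known parent."""
    triples = []
    if mother_id:
        triples.append((mother_id, "KIN:027", child_id))  # isBiologicalMotherOf
    if father_id:
        triples.append((father_id, "KIN:028", child_id))  # isBiologicalFatherOf
    return triples


row = ped_row("fam-1", "i6", mother_id="i2")
edges = kin_triples("i6", mother_id="i2")
```

With only the mother known, the row still carries a paternalId column (set to "0"), whereas the graph form simply has no father edge.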

Default output

abusua2ga4gh session.json --out-dir out

writes two files:

  • session.family.json — the single-file Phenopackets Family
  • session.ga4gh-pedigree.json — the GA4GH Pedigree Standard message
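The naming pattern above can be sketched as follows, for scripts that need to locate the outputs afterwards. The function default_output_paths is a hypothetical helper written for this example; it only mirrors the two file names listed above and is not part of the package.

```python
from pathlib import Path


def default_output_paths(session_path, out_dir):
    """Return the two default output paths for a given session file."""
    stem = Path(session_path).stem  # "session.json" -> "session"
    out = Path(out_dir)
    return [
        out / f"{stem}.family.json",
        out / f"{stem}.ga4gh-pedigree.json",
    ]


paths = default_output_paths("session.json", "out")
```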

How the Abusua dual-layer model maps across

Abusua deliberately stores biological parentage separately from social parentage, with a paternity-certainty flag between them. The converters honour that split — this is the most important behaviour to understand:

  • bioMotherId → KIN:027 isBiologicalMotherOf (the mogya line; always emitted).
  • bioFatherId → KIN:028 isBiologicalFatherOf, but only when paternity is confirmed or reported. A social-only or unknown biological father produces no biological edge, because the genetics must never see a guessed link. Reported paternity is emitted, but it is flagged in the warnings and annotated on the edge.
  • fosteredIn with socialMotherId / socialFatherId → KIN:022 isAdoptiveParentOf (the closest standard term for a social/foster parent), emitted as a separate edge so that social and biological structure are never conflated. Use --no-social-edges for a strictly genetics-facing graph.

Every suppression or assumption is reported in the conversion warnings, never done silently.
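The mapping rules above can be sketched in a few lines of Python. This is an illustration of the described behaviour, not the package's code: the field name paternityCertainty, its values, the tuple layout, and the function edges_for are all assumptions made for the example. Only bioMotherId, bioFatherId, fosteredIn, socialMotherId, socialFatherId and the KIN terms come from the description above.

```python
KIN_BIO_MOTHER = "KIN:027"       # isBiologicalMotherOf
KIN_BIO_FATHER = "KIN:028"       # isBiologicalFatherOf
KIN_ADOPTIVE_PARENT = "KIN:022"  # isAdoptiveParentOf


def edges_for(person, include_social_edges=True):
    """Return (edges, warnings); each edge is (parent, term, child, note)."""
    edges, warnings = [], []
    pid = person["id"]

    if person.get("bioMotherId"):
        edges.append((person["bioMotherId"], KIN_BIO_MOTHER, pid, None))

    father = person.get("bioFatherId")
    certainty = person.get("paternityCertainty")  # hypothetical field name
    if father and certainty == "confirmed":
        edges.append((father, KIN_BIO_FATHER, pid, None))
    elif father and certainty == "reported":
        # Emitted, but annotated on the edge and surfaced as a warning.
        edges.append((father, KIN_BIO_FATHER, pid, "reported paternity"))
        warnings.append(f"{pid}: biological father {father} is reported, not confirmed")
    elif father:
        warnings.append(f"{pid}: biological father edge suppressed ({certainty})")

    if person.get("fosteredIn"):
        social = [person[k] for k in ("socialMotherId", "socialFatherId") if person.get(k)]
        if include_social_edges:
            edges.extend((s, KIN_ADOPTIVE_PARENT, pid, None) for s in social)
        elif social:
            warnings.append(f"{pid}: social edges omitted on request")

    return edges, warnings


child = {"id": "i6", "bioMotherId": "i2", "bioFatherId": "i1",
         "paternityCertainty": "reported",
         "fosteredIn": True, "socialMotherId": "i9"}
edges, warnings = edges_for(child)
```

Keeping the social edges in a separate branch means a strictly biological graph is obtained by passing include_social_edges=False, without touching the biological rules.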


Install

pip install -e .          # from this directory
# or, once published:
pip install abusua2ga4gh

Command line

# Default: single-file Family + GA4GH Pedigree, into ./out
abusua2ga4gh session.json --out-dir out

# Just the single-file Phenopackets Family
abusua2ga4gh session.json --format family

# Standalone per-individual Phenopackets (one file each)
abusua2ga4gh session.json --format phenopackets

# Every form
abusua2ga4gh session.json --format all

# GA4GH Pedigree only, biological edges only (strict genetics graph)
abusua2ga4gh session.json --format pedigree --no-social-edges

# Pick the proband explicitly for the Family
abusua2ga4gh session.json --format family --proband i6

By default, personal names are treated as PII and omitted from output; pass --include-names to include them (stored as alternate_ids).
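A minimal sketch of that opt-in behaviour is shown below. The input key "name" and the helper individual_record are hypothetical; alternateIds is the camelCase JSON form of the Phenopackets Individual field alternate_ids. The package's own handling may differ in detail.

```python
def individual_record(person, include_names=False):
    """Build an Individual-like dict, including the name only on request."""
    record = {"id": person["id"]}
    # Names are PII: they are copied across only when explicitly requested.
    if include_names and person.get("name"):
        record["alternateIds"] = [person["name"]]
    return record


person = {"id": "i6", "name": "Example Name"}  # hypothetical session entry
scrubbed = individual_record(person)
named = individual_record(person, include_names=True)
```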

Python API

from abusua2ga4gh import (
    Pedigree, to_family, to_ga4gh_pedigree, to_phenopackets,
)

ped = Pedigree.load("example-sickle-cell.json")

# Recommended: single-file Family (proband + relatives + native pedigree)
family, warns = to_family(ped, proband_id=None)   # auto-picks the marked proband

# GA4GH Pedigree Standard (KIN-relationship graph)
pedigree_msg, warns2 = to_ga4gh_pedigree(ped, include_social_edges=True)

# Optional: standalone per-individual Phenopackets
packets, warns3 = to_phenopackets(ped, affected_only=True)
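Each converter returns a message together with its warnings. One way to persist a result and surface the warnings is sketched below; it assumes the message is a JSON-serialisable dict and the warnings are an iterable of strings, which the description above implies but does not state. The helper save_message is hypothetical.

```python
import json
from pathlib import Path


def save_message(message, warnings, path):
    """Write a converter result as JSON and print its warnings."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(message, indent=2, ensure_ascii=False),
                    encoding="utf-8")
    for warning in warnings:
        print(f"warning: {warning}")
    return path


# Usage with the results from the snippet above (assumed shapes):
# save_message(family, warns, "out/example-sickle-cell.family.json")
```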

Important limitation: condition terms

Abusua stores conditions as free text (e.g. "Sickle cell anaemia"). Both Phenopackets and the GA4GH Pedigree Standard expect ontology identifiers for these terms (MONDO/OMIM for diseases, HPO for phenotypes).

  • A small built-in lookup resolves the conditions used in the bundled examples to MONDO terms.
  • Any other condition is emitted with its free-text label and an empty term id, plus a warning. A curator (or a downstream term-mapping step) must supply the correct ontology id before the output is analysis-grade. The converter never guesses an ontology id from free text.
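The fallback behaviour described above might look roughly like this. The function resolve_condition and the lookup contents are hypothetical, and the identifier in the example is a deliberate placeholder rather than a real MONDO id; the point is only that an unknown label yields an empty id plus a warning instead of a guess.

```python
def resolve_condition(label, lookup):
    """Map a free-text condition to an ontology term without guessing."""
    term_id = lookup.get(label.strip().lower(), "")
    warnings = []
    if not term_id:
        warnings.append(
            f"no ontology id for condition {label!r}; curate before analysis"
        )
    return {"id": term_id, "label": label}, warnings


# Placeholder id for illustration only; a real table would hold curated MONDO ids.
lookup = {"sickle cell anaemia": "MONDO:PLACEHOLDER"}

known, known_warnings = resolve_condition("Sickle cell anaemia", lookup)
unknown, unknown_warnings = resolve_condition("Some other condition", lookup)
```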

Carrier status is exported as the phenotypic feature HP:0032500 (Heterozygous carrier); verify this is the intended term for your use.


Validating the output

These converters produce JSON that follows the documented structure of each standard, and the test suite checks structural integrity (every relationship resolves, required Phenopacket fields present, correct KIN terms, etc.). For formal schema validation against the official definitions, run the output through:

  • the GA4GH Pedigree validator / pedigree-tools (see the standard's Tooling page), and
  • phenopacket-tools for Phenopacket v2 validation.

We recommend wiring those into CI once you adopt the package.
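As an example of the kind of structural check mentioned above, the sketch below verifies that every relationship resolves to a known individual. It works on plain (subject, relation, object) tuples, an illustrative layout rather than the standard's own, and the function name is hypothetical. It complements formal schema validation and does not replace it.

```python
def unresolved_relationships(individual_ids, edges):
    """Return the edges whose subject or object is not a known individual."""
    known = set(individual_ids)
    return [edge for edge in edges
            if edge[0] not in known or edge[2] not in known]


individuals = ["i1", "i2", "i6"]
edges = [
    ("i2", "KIN:027", "i6"),
    ("i1", "KIN:028", "i6"),
    ("i9", "KIN:022", "i6"),  # i9 is missing from the individuals list
]
dangling = unresolved_relationships(individuals, edges)
```

In a CI job, a non-empty result would be treated as a failure before the output is handed to the formal validators.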

Tests

pytest          # 36 tests over the five bundled example sessions

Layout

src/abusua2ga4gh/
  model.py            # load & validate Abusua sessions (typed view, dual-layer fields)
  kin.py              # Kinship Ontology term constants
  ga4gh_pedigree.py   # -> GA4GH Pedigree Standard message (KIN relationships)
  phenopackets.py     # -> standalone Phenopackets v2 (per clinically-relevant individual)
  family.py           # -> single-file Phenopackets Family (+ native PED-style pedigree)
  cli.py              # command-line interface
examples/             # the five disease example sessions
tests/                # pytest suite

License

MIT.
