Multi-omics data harmonisation for Python

omicsync

License: MIT | Python 3.9+

A Python library for multi-omics data harmonisation.

omicsync handles the tedious work of aligning sample IDs, normalising each modality consistently, and exporting to downstream tools so you can focus on biology, not data wrangling.


Installation

pip install omicsync

With optional extras:

pip install "omicsync[mofa]"       # MOFA2 factor analysis
pip install "omicsync[geo]"        # GEO data loading
pip install "omicsync[anndata]"    # AnnData export
pip install "omicsync[torch]"      # PyTorch tensor export
pip install "omicsync[all]"        # Everything
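
Because the extras are optional, code that relies on one may want to check for it before calling the matching loader or exporter. The following is a minimal sketch using only the standard library. The mapping from extra name to importable module is an assumption (only GEOparse and mofapy2 are named elsewhere in this README), and `available_extras` is a hypothetical helper, not part of omicsync.

```python
from importlib.util import find_spec

# Assumed mapping from omicsync extra name to the top-level module it installs.
ASSUMED_EXTRA_MODULES = {
    "mofa": "mofapy2",
    "geo": "GEOparse",
    "anndata": "anndata",
    "torch": "torch",
}


def available_extras(extra_modules=ASSUMED_EXTRA_MODULES):
    """Return the sorted names of extras whose module can be imported."""
    return sorted(
        extra
        for extra, module in extra_modules.items()
        if find_spec(module) is not None
    )


print(available_extras())
```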

Quick Start

import omicsync as oms
from omicsync.loaders.csv import load_multimodal_csv

# Load multiple modalities from delimited text files (TSV in this example)
dataset = load_multimodal_csv({
    "rna":     "brca_rna.tsv",
    "protein": "brca_rppa.tsv",
    "cnv":     "brca_cnv.tsv",
}, study_id="TCGA-BRCA")

# Align, normalise, filter — all chainable
dataset.align_samples().normalize().filter_features(min_variance=0.01)

# Export to DataFrame or MOFA2
df = dataset.to_dataframe()          # samples × features, prefixed columns
mofa_input = dataset.to_mofa2()      # dict ready for mofapy2 entry_point
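
To make the chained steps above concrete, here is a rough pandas-only sketch of what aligning samples, filtering by variance, and exporting a prefixed samples × features table could look like. It does not use omicsync and is not its implementation. The helper `align_and_filter`, the `:` column separator, and the "variance at or above the threshold" rule are all assumptions chosen for illustration.

```python
import pandas as pd


def align_and_filter(modalities, min_variance=0.01):
    """Keep shared samples, drop low-variance features, prefix columns.

    `modalities` maps a modality name to a samples x features DataFrame.
    """
    # Samples present in every modality.
    frames = list(modalities.values())
    shared = frames[0].index
    for frame in frames[1:]:
        shared = shared.intersection(frame.index)

    parts = []
    for name, frame in modalities.items():
        aligned = frame.loc[shared]
        # Variance is computed per feature (column) across the shared samples.
        keep = aligned.var(axis=0) >= min_variance
        parts.append(aligned.loc[:, keep].add_prefix(f"{name}:"))
    return pd.concat(parts, axis=1)


# Toy data: "FLAT" has zero variance, and sample "s1" is missing from protein.
rna = pd.DataFrame(
    {"TP53": [1.0, 2.0, 3.0], "FLAT": [5.0, 5.0, 5.0]},
    index=["s1", "s2", "s3"],
)
protein = pd.DataFrame({"AKT1": [0.1, 0.9]}, index=["s2", "s3"])

combined = align_and_filter({"rna": rna, "protein": protein})
print(combined)
```

In this toy run only samples s2 and s3 survive alignment, and the constant FLAT column is dropped by the variance filter.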

Features

  • Sample harmonisation — TCGA barcode parsing, fuzzy ID matching, coverage reporting
  • Per-modality normalisation — auto-detection of count/TPM/M-value formats
  • Chainable API — dataset.align_samples().normalize().filter_features()
  • sklearn compatibility — use OmicsSyncTransformer in a Pipeline
  • Multiple export formats — DataFrame, dict, MOFA2, PyTorch tensor, AnnData
  • Open Targets integration — query target-disease associations via GraphQL
  • Type hints throughout — fully typed public API
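
As an illustration of the sample harmonisation item in the list above, the sketch below truncates TCGA barcodes to the sample level (project, tissue source site, participant, two-digit sample type) and reports per-modality coverage. It uses only the standard library. `tcga_sample_id` and `coverage` are hypothetical helpers, not omicsync functions, and the barcodes are illustrative values rather than data taken from this project.

```python
from typing import Dict, Iterable, Set


def tcga_sample_id(barcode: str) -> str:
    """Truncate a TCGA barcode to project-TSS-participant-sample type."""
    parts = barcode.strip().upper().split("-")
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    # parts[3] is the two-digit sample type plus an optional vial letter.
    return "-".join(parts[:3] + [parts[3][:2]])


def coverage(modalities: Dict[str, Iterable[str]]) -> Dict[str, float]:
    """Fraction of all observed samples that each modality covers."""
    ids: Dict[str, Set[str]] = {
        name: {tcga_sample_id(b) for b in barcodes}
        for name, barcodes in modalities.items()
    }
    union = set().union(*ids.values())
    return {name: len(found) / len(union) for name, found in ids.items()}


# Illustrative barcodes: the aliquot-level suffixes differ between assays,
# but the first two barcodes refer to the same sample.
report = coverage({
    "rna": ["TCGA-A1-A0SB-01A-11R-A144-07", "TCGA-A1-A0SD-01A-11R-A115-07"],
    "protein": ["TCGA-A1-A0SB-01A-21-A13A-20"],
})
print(report)
```

Matching on the truncated ID rather than the full barcode is what lets aliquots from different assays line up as one sample.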

Supported Data Sources

Source | Loader | Notes
--- | --- | ---
TCGA | load_tcga_files() | Local files; barcode auto-harmonisation
GEO | load_geo() | Via GEOparse; requires omicsync[geo]
CSV/TSV | load_csv() | Any tabular file
Open Targets | load_open_targets_targets() | GraphQL API v4

Supported Modalities

Modality | Class | Default Normalisation
--- | --- | ---
RNA expression | RNAModality | detect_and_normalise() (log1p)
DNA methylation | MethylationModality | M→beta conversion + clip
Copy number | CNVModality | log2 ratio, clipped [-2, 2]
Somatic mutations | MutationModality | Binarise at threshold
Protein abundance | ProteinModality | Z-score per protein
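
The default transforms named in the table are standard operations, and the NumPy sketch below shows one plausible reading of each. These functions are illustrative and are not the omicsync classes. The clip bounds for beta values, the binarisation threshold and its direction, and the assumption that copy-number input is already a log2 ratio are choices made here, since the table does not specify them.

```python
import numpy as np


def log1p_counts(x):
    """log(1 + x), the transform named for RNA expression."""
    return np.log1p(np.asarray(x, dtype=float))


def m_to_beta(m, eps=1e-6):
    """Convert methylation M-values to beta values via beta = 2^M / (2^M + 1)."""
    m = np.asarray(m, dtype=float)
    beta = 1.0 / (1.0 + np.exp2(-m))  # algebraically equal to 2^M / (2^M + 1)
    return np.clip(beta, eps, 1.0 - eps)  # eps bounds are an assumption


def clip_log2_ratio(x, low=-2.0, high=2.0):
    """Clip copy-number log2 ratios to [low, high]; input is assumed to be log2."""
    return np.clip(np.asarray(x, dtype=float), low, high)


def binarise(x, threshold=0.0):
    """Return 1 where the value exceeds the threshold, else 0 (assumed rule)."""
    return (np.asarray(x, dtype=float) > threshold).astype(int)


def zscore_per_feature(x):
    """Z-score each column (feature) of a samples x features matrix."""
    x = np.asarray(x, dtype=float)
    std = x.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # constant features map to zero
    return (x - x.mean(axis=0)) / std


print(m_to_beta([-1.0, 0.0, 1.0]))
print(clip_log2_ratio([-3.0, 0.5, 4.0]))
```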

Documentation


Citation

If you use omicsync in your research, please cite:

Paterson V. (2026). omicsync: A Python library for multi-omics data harmonisation. GitHub: github.com/vi-c-ky/omicsync


Contributing

Contributions are welcome. Please open an issue or pull request on GitHub.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-feature)
  3. Write tests for new functionality
  4. Run the test suite (pytest tests/)
  5. Open a pull request

License

MIT — see LICENSE for details.
