Multi-omics data harmonisation for Python
omicsync
A Python library for multi-omics data harmonisation.
omicsync handles the tedious work of aligning sample IDs, normalising each modality consistently, and exporting to downstream tools so you can focus on biology, not data wrangling.
Installation
pip install omicsync
With optional extras:
pip install "omicsync[mofa]" # MOFA2 factor analysis
pip install "omicsync[geo]" # GEO data loading
pip install "omicsync[anndata]" # AnnData export
pip install "omicsync[torch]" # PyTorch tensor export
pip install "omicsync[all]" # Everything
Quick Start
import omicsync as oms
from omicsync.loaders.csv import load_multimodal_csv

# Load multiple modalities from CSV files
dataset = load_multimodal_csv({
    "rna": "brca_rna.tsv",
    "protein": "brca_rppa.tsv",
    "cnv": "brca_cnv.tsv",
}, study_id="TCGA-BRCA")

# Align, normalise, filter (all chainable)
dataset.align_samples().normalize().filter_features(min_variance=0.01)

# Export to DataFrame or MOFA2
df = dataset.to_dataframe()      # samples × features, prefixed columns
mofa_input = dataset.to_mofa2()  # dict ready for mofapy2 entry_point
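To make the align and filter steps concrete, here is a minimal pure-Python sketch of the general idea. It is not omicsync's implementation: the toy data, the helper names (`shared_samples`, `align`, `filter_features`), the use of a plain intersection of sample IDs, and the choice of population variance with a `>=` comparison are all assumptions made for illustration.

```python
from statistics import pvariance

# Hypothetical toy data: modality -> {sample_id: {feature: value}}
modalities = {
    "rna": {
        "S1": {"TP53": 2.0, "ACTB": 5.0},
        "S2": {"TP53": 4.0, "ACTB": 5.0},
        "S3": {"TP53": 6.0, "ACTB": 5.0},
    },
    "protein": {
        "S2": {"AKT1": 0.1},
        "S3": {"AKT1": 0.9},
        "S4": {"AKT1": 0.5},
    },
}


def shared_samples(mods):
    """Return the sorted sample IDs present in every modality."""
    id_sets = [set(table) for table in mods.values()]
    return sorted(set.intersection(*id_sets))


def align(mods):
    """Keep only the samples shared by all modalities (assumed strategy)."""
    keep = shared_samples(mods)
    return {name: {s: table[s] for s in keep} for name, table in mods.items()}


def filter_features(table, min_variance):
    """Drop features whose population variance across samples is below min_variance."""
    if not table:
        return {}
    features = list(next(iter(table.values())))
    kept = [
        f for f in features
        if pvariance([row[f] for row in table.values()]) >= min_variance
    ]
    return {s: {f: row[f] for f in kept} for s, row in table.items()}


aligned = align(modalities)
filtered_rna = filter_features(aligned["rna"], min_variance=0.01)
print(shared_samples(modalities))
print(filtered_rna)
```

In this toy example only S2 and S3 occur in both modalities, and the constant ACTB column is dropped because its variance is zero.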
Features
- Sample harmonisation: TCGA barcode parsing, fuzzy ID matching, coverage reporting
- Per-modality normalisation: auto-detection of count/TPM/M-value formats
- Chainable API: dataset.align_samples().normalize().filter_features()
- sklearn compatibility: use OmicsSyncTransformer in a Pipeline
- Multiple export formats: DataFrame, dict, MOFA2, PyTorch tensor, AnnData
- Open Targets integration: query target-disease associations via GraphQL
- Type hints throughout: fully typed public API
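As an illustration of what TCGA barcode parsing involves, the sketch below reduces a full aliquot barcode to a patient-level ID and a sample-type code using only the standard library. It relies on the public TCGA barcode layout (`TCGA-TSS-Participant-SampleVial-...`); the names `ParsedBarcode` and `parse_barcode` are hypothetical and are not part of omicsync's API.

```python
import re
from typing import NamedTuple, Optional

# TCGA barcode prefix: TCGA-<TSS>-<Participant>[-<SampleType><Vial>]...
_BARCODE = re.compile(
    r"^TCGA-(?P<tss>[A-Z0-9]{2})-(?P<participant>[A-Z0-9]{4})"
    r"(?:-(?P<sample_type>\d{2})(?P<vial>[A-Z])?)?"
)


class ParsedBarcode(NamedTuple):
    patient_id: str             # first three fields, e.g. TCGA-A2-A0T2
    sample_type: Optional[str]  # two-digit code, e.g. "01"; None if absent


def parse_barcode(barcode: str) -> ParsedBarcode:
    """Extract the patient ID and sample-type code from a TCGA barcode."""
    match = _BARCODE.match(barcode.strip().upper())
    if match is None:
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    patient_id = f"TCGA-{match['tss']}-{match['participant']}"
    return ParsedBarcode(patient_id, match["sample_type"])


rna_id = parse_barcode("TCGA-A2-A0T2-01A-11R-A084-07")
protein_id = parse_barcode("tcga-a2-a0t2-01a-21-a13f-20")
print(rna_id, protein_id)
```

Two aliquots from the same tumour sample map to the same patient ID and sample type here, which is the key that lets rows from different modalities be matched.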
Supported Data Sources
| Source | Loader | Notes |
|---|---|---|
| TCGA | load_tcga_files() | Local files; barcode auto-harmonisation |
| GEO | load_geo() | Via GEOparse; requires omicsync[geo] |
| CSV/TSV | load_csv() | Any tabular file |
| Open Targets | load_open_targets_targets() | GraphQL API v4 |
Supported Modalities
| Modality | Class | Default Normalisation |
|---|---|---|
| RNA expression | RNAModality | detect_and_normalise() (log1p) |
| DNA methylation | MethylationModality | M→beta conversion + clip |
| Copy number | CNVModality | log2 ratio, clipped [-2, 2] |
| Somatic mutations | MutationModality | Binarise at threshold |
| Protein abundance | ProteinModality | Z-score per protein |
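The default normalisations in the table correspond to standard transforms, sketched below with the standard library only. This is a rough illustration under stated assumptions, not the code the modality classes run: the function names, the `eps` clipping bounds for beta values, the `>=` comparison when binarising, and the use of the population standard deviation are all assumed. The copy-number helper also assumes its input is already a log2 ratio.

```python
import math
from statistics import mean, pstdev


def log1p_transform(values):
    """RNA: log(1 + x), which keeps zero counts at zero."""
    return [math.log1p(v) for v in values]


def m_to_beta(m_values, eps=1e-6):
    """Methylation: beta = 2^M / (2^M + 1), clipped to [eps, 1 - eps] (assumed bounds)."""
    betas = [2.0 ** m / (2.0 ** m + 1.0) for m in m_values]
    return [min(max(b, eps), 1.0 - eps) for b in betas]


def clip_log2_ratio(values, lo=-2.0, hi=2.0):
    """Copy number: clip log2 ratios to [lo, hi]."""
    return [min(max(v, lo), hi) for v in values]


def binarise(values, threshold):
    """Mutations: 1 if the value reaches the threshold, else 0 (assumed >=)."""
    return [1 if v >= threshold else 0 for v in values]


def zscore(values):
    """Protein: centre and scale one protein's values across samples."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


print(m_to_beta([0.0]))
print(clip_log2_ratio([-3.0, 0.5, 4.0]))
print(binarise([0.0, 0.2, 0.7], threshold=0.5))
```

An M-value of 0 maps to a beta of 0.5, and copy-number values outside [-2, 2] are pulled back to the nearest bound.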
Documentation
Citation
If you use omicsync in your research, please cite:
Paterson V. (2026). omicsync: A Python library for multi-omics data harmonisation. GitHub: github.com/vi-c-ky/omicsync
Contributing
Contributions are welcome. Please open an issue or pull request on GitHub.
- Fork the repository
- Create a feature branch (git checkout -b feature/my-feature)
- Write tests for new functionality
- Run the test suite (pytest tests/)
- Open a pull request
License
MIT — see LICENSE for details.