SMOC Multi-Omics Digital Object System API

modos-api

Access and manage Multi-Omics Digital Objects (MODOs).

Context

Goals

Provide a digital object and system to process, store and serve multi-omics data with their metadata such that:

  • Traceability and reproducibility are ensured by rich metadata
  • The different omics layers are processed and distributed together
  • Common operations such as liftover can be automated easily while keeping the omics layers in sync
  • Data can be accessed, sliced and streamed over the network without downloading the dataset

Architecture

The client library can be used on its own to work with local MODOs, or it can connect to a server to access objects over S3.

The server configuration and setup instructions can be found in deploy. The server consists of a REST API, an S3 server and an htsget server to stream CRAM/BCF over the network. The aim is to provide transparent remote access to MODOs without storing the data locally.
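
To make the streaming part more concrete, here is a minimal sketch of how a request URL for an htsget server can be assembled. It follows the public htsget protocol (`/reads/<id>` and `/variants/<id>` endpoints with `format`, `referenceName`, `start` and `end` query parameters, where `start` is 0-based inclusive and `end` is exclusive). The helper name `htsget_url`, the host `htsget.example.org` and the object identifier are hypothetical; how modos builds its own requests internally is not shown here.

```python
from urllib.parse import quote, urlencode


def htsget_url(base_url, kind, object_id,
               reference_name=None, start=None, end=None, fmt=None):
    """Build an htsget request URL (hypothetical helper, not part of modos).

    kind is "reads" (BAM/CRAM) or "variants" (VCF/BCF), as in the htsget protocol.
    start is 0-based inclusive and end is exclusive.
    """
    if kind not in ("reads", "variants"):
        raise ValueError(f"unknown htsget endpoint: {kind!r}")
    params = {}
    if fmt is not None:
        params["format"] = fmt
    if reference_name is not None:
        params["referenceName"] = reference_name
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    url = f"{base_url.rstrip('/')}/{kind}/{quote(object_id)}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


# Hypothetical server and object identifier, for illustration only.
print(htsget_url("https://htsget.example.org", "reads",
                 "ex-bucket/ex-modo/demo1.cram",
                 reference_name="chr1", start=102, end=1321, fmt="CRAM"))
```

The server answers such a request with a JSON ticket listing the URLs of the data blocks to fetch, which is what lets a client read a slice without downloading the whole file.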

Format

The digital object is composed of a folder with:

  • Genomic data files (CRAM, BCF, ...)
  • A zarr archive for metadata and array-based data

The metadata links to the different files and provides context using the modos-schema.

Installation

The library can be installed with pip:

pip install modos

The development version can be installed directly from GitHub:

pip install git+https://github.com/sdsc-ordes/modos-api.git@main
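
To confirm which version ended up in the current environment, a small check with the standard library is enough. This is a generic sketch using `importlib.metadata`; the helper name `installed_version` is made up for the example, and nothing here is specific to modos beyond the distribution name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="modos"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"modos {found}" if found else "modos is not installed")
```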

Usage

The CLI is convenient for quickly managing MODOs (creation, editing, deletion) and for quick inspections:

$ modos show -s3 https://s3.example.org --zarr ex-bucket/ex-modo
/
 ├── assay
    └── assay1
 ├── data
    ├── calls1
    └── demo1
 ├── reference
    └── reference1
 └── sample
     └── sample1

$ modos show --files data/ex
data/ex/reference1.fa.fai
data/ex/demo1.cram
data/ex/reference1.fa
data/ex/calls1.bcf
data/ex/demo1.cram.crai
data/ex/calls1.bcf.csi
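
The file listing above shows each genomic file next to its index (`.crai`, `.csi`, `.fai`). As a rough sketch, such a listing can be checked for missing indices with plain Python. The helper `pair_with_indices` is hypothetical and only knows the three index suffixes that appear in this listing; it is not part of the modos CLI or API.

```python
# Index suffixes seen in the listing above; other formats are not covered here.
INDEX_SUFFIXES = (".crai", ".csi", ".fai")


def pair_with_indices(paths):
    """Map each data file to its index file, or to None if no index is listed."""
    names = set(paths)
    pairs = {}
    for path in sorted(names):
        if path.endswith(INDEX_SUFFIXES):
            continue  # index files are values, never keys
        pairs[path] = next(
            (path + suffix for suffix in INDEX_SUFFIXES if path + suffix in names),
            None,
        )
    return pairs


listing = """\
data/ex/reference1.fa.fai
data/ex/demo1.cram
data/ex/reference1.fa
data/ex/calls1.bcf
data/ex/demo1.cram.crai
data/ex/calls1.bcf.csi
"""

pairs = pair_with_indices(listing.split())
for data_file, index_file in pairs.items():
    print(f"{data_file} -> {index_file or 'MISSING INDEX'}")
```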

The user-facing API is in modos.api. It provides full programmatic access to the object's [meta]data:

>>> from modos.api import MODO

>>> ex = MODO('./example-digital-object')
>>> ex.list_samples()
['sample/sample1']
>>> ex.metadata["data/calls1"]
{'@type': 'DataEntity',
 'data_format': 'BCF',
 'data_path': 'calls1.bcf',
 'description': 'variant calls for tests',
 'has_reference': ['reference/reference1'],
 'has_sample': ['sample/sample1'],
 'name': 'Calls 1'}
>>> rec = next(ex.stream_genomics("calls1.bcf", "chr1:103-1321"))
>>> rec.alleles
('A', 'C')
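
The second argument of `stream_genomics` above is a region string of the form `chrom:start-end`. When such strings come from user input, it can help to validate them before streaming. The parser below is a minimal sketch and not the one modos uses; `Region` and `parse_region` are hypothetical names, and whether modos interprets the coordinates as 0-based or 1-based is not stated here, so the sketch only splits the string.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


_REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region):
    """Split 'chrom:start-end' into its parts and reject malformed input."""
    match = _REGION.match(region.replace(",", ""))  # tolerate 1,321-style numbers
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {region!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start is greater than end in {region!r}")
    return Region(match["chrom"], start, end)


print(parse_region("chr1:103-1321"))
```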

For advanced use cases, the object's metadata can be queried with SPARQL!

>>> # Build a table with all files from male samples
>>> print(ex.query("""
...   SELECT ?assay ?sample ?file
...   WHERE {
...     [] schema:name ?assay ;
...       modos:has_data [
...         modos:data_path ?file ;
...         modos:has_sample [
...           schema:name ?sample ;
...           modos:sex ?sex
...         ]
...       ] .
...     FILTER(?sex = "Male")
...   }
... """).serialize(format="csv").decode())
assay,sample,file
Assay 1,Sample 1,file://ex/calls1.bcf
Assay 1,Sample 1,file://ex/demo1.cram
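
The CSV text produced by the query can be turned into Python rows with the standard `csv` module, for example to group files by sample. This is a small sketch that starts from the output shown above, pasted in as a string; `rows_from_csv` is a hypothetical helper, not part of modos.

```python
import csv
import io


def rows_from_csv(text):
    """Parse CSV text with a header line into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


# Output of the query above, pasted as a string for illustration.
csv_text = (
    "assay,sample,file\n"
    "Assay 1,Sample 1,file://ex/calls1.bcf\n"
    "Assay 1,Sample 1,file://ex/demo1.cram\n"
)

files_by_sample = {}
for row in rows_from_csv(csv_text):
    files_by_sample.setdefault(row["sample"], []).append(row["file"])

print(files_by_sample)
```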

Contributing

First, read the Contribution Guidelines.

For technical documentation on setup and development, see the Development Guide.

Acknowledgements and Funding

The development of the Multi-Omics Digital Object System (MODOS) is being funded by the Personalized Health Data Analysis Hub, a joint initiative of the Personalized Health and Related Technologies (PHRT) and the Swiss Data Science Center (SDSC), for a period of three years from 2023 to 2025. The SDSC leads the development of MODOS, bringing expertise in complex data structures associated with multi-omics and imaging data to advance privacy-centric clinical-grade integration. The PHRT contributes its domain expertise of the Swiss Multi-Omics Center (SMOC) in the generation, analysis, and interpretation of multi-omics data for personalized health and precision medicine applications. We gratefully acknowledge the Health 2030 Genome Center for their substantial contributions to the development of MODOS by providing test data sets, deployment infrastructure, and expertise.

Copyright

Copyright © 2023-2024 Swiss Data Science Center (SDSC), www.datascience.ch. All rights reserved. The SDSC is jointly established and legally represented by the École Polytechnique Fédérale de Lausanne (EPFL) and the Eidgenössische Technische Hochschule Zürich (ETH Zürich). This copyright encompasses all materials, software, documentation, and other content created and developed by the SDSC in the context of the Personalized Health Data Analysis Hub.
