SMOC Multi-Omics Digital Object System API

modos-api

API system for using and serving Multi-Omics Digital Objects (MODOs).

Context

Motivation

Provide a digital object and system to process, store and serve multi-omics data with their metadata such that:

  • Traceability and reproducibility are ensured by rich metadata
  • The different omics layers are processed and distributed together
  • Common operations such as liftover can be automated easily while keeping the omics layers in sync

Architecture

The digital object is composed of a folder with:

  • Genomic data files (CRAM, FASTA)
  • A zarr archive for metadata and array-based data

The metadata links the different files using the modos-schema.
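The folder layout described above can be illustrated with a short sketch. It uses only the Python standard library and is not part of modos-api. The function name describe_layout, the ".zarr" directory suffix and the lowercase file extensions are assumptions made for illustration; the actual layout is defined by the modos-schema.

```python
from pathlib import Path

# Extensions follow the list above; the ".zarr" suffix is an assumption.
GENOMIC_SUFFIXES = {".cram", ".fasta"}
ARCHIVE_SUFFIX = ".zarr"


def describe_layout(root):
    """Split the top-level entries of `root` into genomic files and zarr archives.

    Hypothetical helper, not part of modos-api.
    """
    genomic, archives = [], []
    for entry in sorted(Path(root).iterdir()):
        suffix = entry.suffix.lower()
        if entry.is_dir() and suffix == ARCHIVE_SUFFIX:
            # A zarr archive is assumed to be stored as a directory.
            archives.append(entry.name)
        elif entry.is_file() and suffix in GENOMIC_SUFFIXES:
            genomic.append(entry.name)
    return {"genomic": genomic, "archives": archives}
```

Calling describe_layout('./example-digital-object') would return a dictionary with the names of the genomic files and of the zarr archives found at the top level of that folder. Other entries are ignored.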

Installation

The development version of the library can be installed from GitHub using pip:

pip install git+https://github.com/sdsc-ordes/modos-api.git@main

Usage

The user-facing API is in modos.api. It allows interacting with existing digital objects:

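from modos.api import MODO

# Open an existing digital object stored in a local folder
ex = MODO('./example-digital-object')
# List the files and the samples of the object
ex.list_files()
ex.list_samples()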

Development

The development environment can be set up as follows:

git clone https://github.com/sdsc-ordes/modos-api && cd modos-api
make install

This installs the dependencies, creates the Python virtual environment using poetry, and sets up pre-commit hooks with pre-commit.

The tests can be run with make test, which executes pytest with the doctest module.

Implementation details

  • To allow faster horizontal traversal of digital objects in the catalogue (e.g. for listing), the metadata should be exported to a central database/knowledge-graph on the server side.
  • Metadata can be either embedded in the array file, or stored in a separate file
  • Each digital object needs a unique identifier
  • The paths of individual files in the digital object must be referenced in a consistent way.
    • Absolute paths are a no-go (machine/system dependent)
    • Relative paths in the digital object could work, but need to be OS-independent
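One way to satisfy the path requirements above is to store every path relative to the root of the digital object, written with forward slashes, and to convert it to a native path only when reading. The sketch below shows this with the standard pathlib module. The helper names to_portable and from_portable are hypothetical and are not part of modos-api.

```python
from pathlib import Path, PurePosixPath


def to_portable(file_path, object_root):
    """Return `file_path` relative to `object_root` as a forward-slash string.

    Hypothetical helper. Raises ValueError if the file lies outside the object.
    """
    root = Path(object_root).resolve()
    relative = Path(file_path).resolve().relative_to(root)
    return relative.as_posix()


def from_portable(stored, object_root):
    """Turn a stored forward-slash path into a native path under `object_root`."""
    portable = PurePosixPath(stored)
    # Reject absolute paths and parent references: both would escape the object.
    if portable.is_absolute() or ".." in portable.parts:
        raise ValueError(f"not a path inside the digital object: {stored!r}")
    return Path(object_root).joinpath(*portable.parts)
```

With this convention the stored string is the same on every operating system, and only from_portable depends on the local platform.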

Status and limitations

  • The current focus is on data retrieval; remote object creation is not yet implemented
  • The htsget protocol supports streaming CRAM files, but it is currently only implemented for BAM in major genome browsers (igv.js, jbrowse)
