Interface code to interact with data from the Ovara.net biobank.

Project description

marburg_biobank

Introduction

The marburg_biobank Python module offers a high-level interface to the data sets stored in the [Ovarian Cancer Effusion Biobank and Database](https://www.ovara.net/biobank).

The basic usage is as follows:

import marburg_biobank
# Download the revision file from your biobank first.
db = marburg_biobank.OvcaBiobank("marburg_ovca_revision_15.zip")
print(db.list_datasets())
# Wide format: one column per sample, one row per measured variable.
df_wide = db.get_wide('transcriptomics/rnaseq')
# Tall format: one row per data point.
df_tall = db.get_dataset('transcriptomics/rnaseq')

Data formats available

wide

Using db.get_wide(dataset):

A pandas DataFrame that looks like this:

| Index | Patient12, TAM | Patient12, TU | PatientX, Compartment |
|---|---|---|---|
| VariableA, unitA | 23.23 | 112.2 | nan |
| VariableB, unitB | 3.23 | 12.2 | 12.7 |

Caveats: If a dataset has only one compartment, get_wide() omits the compartment information unless you call .get_wide(standardized=True). The same applies to the unit in the index. If the dataset has a 'name' column, it gets added to the index regardless of the value of standardized.

tall

Using db.get_dataset(dataset):

A pandas DataFrame that looks like this:

| variable | unit | patient | compartment | value | optional columns... |
|---|---|---|---|---|---|
| variableA | unitA | Patient12 | TAM | 23.23 | |
| variableA | unitA | Patient12 | TU | 112.2 | |
| variableB | unitB | Patient13 | TAM | 3.23 | |
| variableB | unitB | Patient13 | TU | 12.2 | |

This is the internal storage format.
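To make the relationship between the two layouts concrete, here is a minimal sketch that reshapes a toy tall table into a wide one with plain pandas. It is not how marburg_biobank implements get_wide(); the toy values are copied from the tall example table, and the use of `pivot_table` is an assumption made for illustration.

```python
import pandas as pd

# Toy data in the tall layout (values copied from the tall example table).
df_tall = pd.DataFrame(
    {
        "variable": ["variableA", "variableA", "variableB", "variableB"],
        "unit": ["unitA", "unitA", "unitB", "unitB"],
        "patient": ["Patient12", "Patient12", "Patient13", "Patient13"],
        "compartment": ["TAM", "TU", "TAM", "TU"],
        "value": [23.23, 112.2, 3.23, 12.2],
    }
)

# One row per (variable, unit), one column per (patient, compartment).
# Combinations without a measurement become NaN; duplicate measurements
# would be averaged, which is pivot_table's default behaviour.
df_wide = df_tall.pivot_table(
    index=["variable", "unit"],
    columns=["patient", "compartment"],
    values="value",
)
print(df_wide)
```

The NaN cells in the result correspond to the `nan` entries in the wide example: a variable that was not measured for a given patient and compartment.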

compartments

Compartments are an abstraction on top of 'cells' and 'bio-liquid'. Examples are tumor-associated macrophages (TAMs), tumor cells (TU), ascites, and blood. db.get_compartments() provides a list.

Datasets

Datasets are organized three levels deep. The first level defines whether you're looking at ex-vivo (=primary) data, in-vitro experiments (=secondary), or literature data (=tertiary). The second level defines the *omics being measured (transcriptomics, proteomics, ... or 'clinical'), while the third level defines the actual method (RNaseq, FACS, ...).

Survival data is in primary/clinical/survival.

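Please remember: if you use lifelines (https://pypi.python.org/pypi/lifelines), censored and event are negations of each other. lifelines expects an indicator of whether the event was observed, which is True exactly where censored is False.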
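The negation can be sketched with a few lines of pandas. The table below is made up, and the column names `months`, `censored`, and `event` are assumptions for illustration; check the actual columns of primary/clinical/survival before relying on them.

```python
import pandas as pd

# Hypothetical survival table; column names and values are illustrative only.
survival = pd.DataFrame(
    {
        "patient": ["Patient12", "Patient13", "Patient14"],
        "months": [14.0, 32.5, 7.2],
        "censored": [False, True, False],
    }
)

# An "event observed" indicator is the element-wise negation of "censored".
survival["event"] = ~survival["censored"]
print(survival)
```

With lifelines, such columns would typically be passed as `KaplanMeierFitter().fit(survival["months"], event_observed=survival["event"])`.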

Excluded patients:

Exclusion applies either at the patient level or at the patient+compartment level. In addition, there is per-dataset exclusion and global exclusion.

Exclusion is applied by default to db.get_wide(), but not to db.get_dataset(). You can change the default by passing apply_exclusion=True|False.

Exclusion information can be retrieved with db.get_excluded_patients(dataset), which returns a set of patients (or patient+compartment tuples), or with db.get_exclusion_reasons(), which lists why the exclusion happened.
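As a rough illustration of what applying such an exclusion set to a tall DataFrame could look like, here is a sketch with a hypothetical helper, `drop_excluded`. It is not part of marburg_biobank, and it assumes the set may mix plain patient names with (patient, compartment) tuples; in practice, passing apply_exclusion=True is the supported route.

```python
import pandas as pd


def drop_excluded(df_tall, excluded):
    """Drop rows of a tall DataFrame that match an exclusion set.

    Hypothetical helper, not part of marburg_biobank. `excluded` may hold
    patient names (str) and/or (patient, compartment) tuples.
    """
    patients = {e for e in excluded if isinstance(e, str)}
    pairs = {e for e in excluded if isinstance(e, tuple)}
    by_patient = df_tall["patient"].isin(patients)
    by_pair = pd.Series(
        [pair in pairs for pair in zip(df_tall["patient"], df_tall["compartment"])],
        index=df_tall.index,
        dtype=bool,
    )
    return df_tall[~(by_patient | by_pair)]


# Toy tall table and a made-up exclusion set.
df_tall = pd.DataFrame(
    {
        "variable": ["variableA", "variableA", "variableB", "variableB"],
        "unit": ["unitA", "unitA", "unitB", "unitB"],
        "patient": ["Patient12", "Patient12", "Patient13", "Patient13"],
        "compartment": ["TAM", "TU", "TAM", "TU"],
        "value": [23.23, 112.2, 3.23, 12.2],
    }
)
excluded = {"Patient13", ("Patient12", "TU")}
kept = drop_excluded(df_tall, excluded)
print(kept)
```

Here Patient13 is dropped entirely, while Patient12 only loses the TU compartment, mirroring the two exclusion levels described above.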


Built Distribution

marburg_biobank-0.156-py2.py3-none-any.whl (52.8 kB, Python 2 and Python 3)

File hashes

Hashes for marburg_biobank-0.156-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 b664df7170aff0955e7584b6fec4c2d18a9308044918fb87e60faf210d9584d2
MD5 f3cc9865e35c1b59184fb1d1827ae451
BLAKE2b-256 d1a4031415d44b48973922402e454d5f46e099e7f3116cd1379e2ca891b452e7
