Interface code to interact with data from the Ovara.net biobank.
Project description
marburg_biobank
Introduction
The marburg_biobank python module offers a high level interface to the data sets stored in the [Ovarian Cancer Effusion Biobank and Database])(https://www.ovara.net/biobank).
The basic usage is as follows:
import marburg_biobank
db = marburg_biobank.OvcaBiobank("marburg_ovca_revision_15.zip") # you need to download that file from your biobank.
print(db.list_datasets())
df_wide = db.get_wide('transcriptomics/rnaseq') # to retrieve the data in a one sample per column / one row per measured variable format
df_tall = db.get_dataset('transcriptomics/rnaseq') # to retrieve the data in one row per data point format
Data formats available
wide
Using db.get_wide(dataset)
:
A pandas DataFrame that looks like this
Index | Patient12, TAM | Patient12, TU | PatientX, Compartment |
---|---|---|---|
VariableA, unitA | 23.23 | 112.2 | nan |
VariableB, unitB | 3.23 | 12.2 | 12.7 |
Caveats: If a dataset has only one compartment, the compartment information is ommited by get_wide(), unless .get_wide(standardized=True) is used. The same applies for the unit in the index. If there is a 'name' column in dataset, it get's added to the index, regardless of the value of standardized.
tall
Using: db.get_dataset(dataset)
):
A pandas DataFrame that looks like this
variable | unit | patient | compartment | value | optional columns... |
---|---|---|---|---|---|
variableA | unitA | Patient12 | TAM | 23.23 | |
variableA | unitA | Patient12 | TU | 112.2 | |
variableB | unitB | Patient13 | TAM | 3.23 | |
variableB | unitB | Patient13 | TU | 12.2 |
This is the internal storage format.
compartments
Compartments are an abstraction on top of 'cells' and 'bio-liquid'. Examples are Tumor associated macrophages (TAMs), Tumor cells (TU), ascites, blood...
db.get_compartments()
provides a list
Datasets
Datasets are organized three levels deep. The first one defines the whether you're looking t ex-vivo (=primary) data or in-vitro experiments (=secondary) or literature data (=tertiary). The second level defines *omics being measured (transcriptomics, proteomics, ... or 'clinical'), while the third levels defines the actual method (RNaseq, FACS,...)
Survival data is in primary/clinical/survival.
Please remember: if using https://pypi.python.org/pypi/lifelines, censored and event are negations of each other.
Excluded patients:
Exclusion can either be on a patient, or a patient+compartment level. In addition, there is per dataset exclusion and global exclusion.
Exclusion is by default applied to db.get_wide(), but not to db.get_dataset(), you can change the default by passing apply_exclusion=True|False.
Exclusion information can be retrieved by db.get_excluded_patients(dataset), which return a set of patients (or patient+compartment tuples), or db.get_exclusion_reasons(), which lists why the exclusion happend.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
File details
Details for the file marburg_biobank-0.156-py2.py3-none-any.whl
.
File metadata
- Download URL: marburg_biobank-0.156-py2.py3-none-any.whl
- Upload date:
- Size: 52.8 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.50.1 CPython/3.8.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | b664df7170aff0955e7584b6fec4c2d18a9308044918fb87e60faf210d9584d2 |
|
MD5 | f3cc9865e35c1b59184fb1d1827ae451 |
|
BLAKE2b-256 | d1a4031415d44b48973922402e454d5f46e099e7f3116cd1379e2ca891b452e7 |