Interface code to interact with data from the Ovara.net biobank.

marburg_biobank

Introduction

The marburg_biobank Python module offers a high-level interface to the data sets stored in the [Ovarian Cancer Effusion Biobank and Database](https://www.ovara.net/biobank).

The basic usage is as follows:

import marburg_biobank
# You need to download this file from the biobank first.
db = marburg_biobank.OvcaBiobank("marburg_ovca_revision_15.zip")
print(db.list_datasets())
# Wide format: one sample per column, one row per measured variable.
df_wide = db.get_wide('transcriptomics/rnaseq')
# Tall format: one row per data point.
df_tall = db.get_dataset('transcriptomics/rnaseq')

Data formats available

wide

Using db.get_wide(dataset):

A pandas DataFrame that looks like this

Index	Patient12, TAM	Patient12, TU	PatientX, Compartment
VariableA, unitA	23.23	112.2	nan
VariableB, unitB	3.23	12.2	12.7

Caveats: If a dataset has only one compartment, get_wide() omits the compartment information unless get_wide(standardized=True) is used. The same applies to the unit in the index. If the dataset has a 'name' column, it gets added to the index regardless of the value of standardized.

tall

Using db.get_dataset(dataset):

A pandas DataFrame that looks like this:

variable	unit	patient	compartment	value
variableA	unitA	Patient12	TAM	23.23
variableA	unitA	Patient12	TU	112.2
variableB	unitB	Patient13	TAM	3.23
variableB	unitB	Patient13	TU	12.2

This is the internal storage format.
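
The relationship between the two formats can be sketched with plain pandas. This is only an illustration on toy data that mirrors the tall table above; it is not how get_wide() is implemented, and it ignores the caveats about omitted compartments, units and 'name' columns.

```python
import pandas as pd

# Toy data mirroring the tall example above; values are illustrative only.
df_tall = pd.DataFrame(
    {
        "variable": ["variableA", "variableA", "variableB", "variableB"],
        "unit": ["unitA", "unitA", "unitB", "unitB"],
        "patient": ["Patient12", "Patient12", "Patient13", "Patient13"],
        "compartment": ["TAM", "TU", "TAM", "TU"],
        "value": [23.23, 112.2, 3.23, 12.2],
    }
)

# One row per (variable, unit), one column per (patient, compartment).
# Combinations that were never measured show up as NaN.
df_wide = df_tall.pivot_table(
    index=["variable", "unit"],
    columns=["patient", "compartment"],
    values="value",
)
print(df_wide)
```

In this toy example variableA was only measured for Patient12, so its Patient13 columns are NaN, just like the nan entry in the wide table above.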

compartments

Compartments are an abstraction on top of 'cells' and 'bio-liquid'. Examples are tumor-associated macrophages (TAM), tumor cells (TU), ascites, blood... db.get_compartments() provides a list.

Datasets

Datasets are organized three levels deep. The first level defines whether you're looking at ex-vivo (=primary) data, in-vitro experiments (=secondary) or literature data (=tertiary). The second level defines the *omics being measured (transcriptomics, proteomics, ... or 'clinical'), while the third level defines the actual method (RNaseq, FACS, ...).
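
As a rough sketch of the naming scheme, the snippet below splits slash-separated names into their three levels. The helper split_dataset_name and most of the example names are hypothetical; only primary/clinical/survival is taken from this README, and the names actually returned by db.list_datasets() may look different (the usage example above uses a two-part name).

```python
from collections import defaultdict

# Hypothetical names; only 'primary/clinical/survival' appears in this README.
dataset_names = [
    "primary/clinical/survival",
    "primary/transcriptomics/rnaseq",
    "secondary/proteomics/example_method",
]


def split_dataset_name(name):
    """Split 'level1/level2/level3' into (source, omics, method)."""
    parts = name.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected three levels, got {name!r}")
    return tuple(parts)


# Group the (omics, method) pairs by their first level.
by_source = defaultdict(list)
for name in dataset_names:
    source, omics, method = split_dataset_name(name)
    by_source[source].append((omics, method))

print(dict(by_source))
```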

Survival data is in primary/clinical/survival.

Please remember: if you use lifelines (https://pypi.python.org/pypi/lifelines), 'censored' and 'event' are negations of each other.
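
A minimal sketch of that conversion, on a toy table: the column names 'months' and 'censored' are assumptions for illustration and may not match the actual layout of primary/clinical/survival.

```python
import pandas as pd

# Toy survival table; column names and values are illustrative assumptions.
survival = pd.DataFrame(
    {
        "patient": ["Patient12", "Patient13", "Patient14"],
        "months": [14.0, 30.5, 8.2],
        "censored": [False, True, False],
    }
)

# lifelines wants an "event observed" flag, which is the negation of "censored".
survival["event"] = ~survival["censored"]
print(survival)

# With lifelines installed, a Kaplan-Meier fit would then look like:
#   from lifelines import KaplanMeierFitter
#   KaplanMeierFitter().fit(survival["months"], event_observed=survival["event"])
```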

Excluded patients:

Exclusion can be at either the patient or the patient+compartment level. In addition, there is per-dataset exclusion and global exclusion.

By default, exclusion is applied to db.get_wide() but not to db.get_dataset(). You can override the default by passing apply_exclusion=True or apply_exclusion=False.

Exclusion information can be retrieved with db.get_excluded_patients(dataset), which returns a set of patients (or patient+compartment tuples), or with db.get_exclusion_reasons(), which lists why the exclusion happened.
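
To make the two exclusion levels concrete, here is a small sketch that filters a tall DataFrame by hand. The helper drop_excluded is hypothetical and not part of marburg_biobank, and it assumes the exclusion set mixes plain patient names with (patient, compartment) tuples; in practice, passing apply_exclusion=True is the intended route.

```python
import pandas as pd


def drop_excluded(df_tall, excluded):
    """Drop rows whose patient, or (patient, compartment) pair, is excluded."""
    patients = {e for e in excluded if isinstance(e, str)}
    pairs = {e for e in excluded if isinstance(e, tuple)}
    by_patient = df_tall["patient"].isin(patients)
    by_pair = pd.Series(
        [pair in pairs for pair in zip(df_tall["patient"], df_tall["compartment"])],
        index=df_tall.index,
        dtype=bool,
    )
    return df_tall[~(by_patient | by_pair)]


# Toy tall-format data; values are illustrative only.
df_tall = pd.DataFrame(
    {
        "variable": ["variableA"] * 4,
        "patient": ["Patient12", "Patient12", "Patient13", "Patient14"],
        "compartment": ["TAM", "TU", "TAM", "TU"],
        "value": [23.23, 112.2, 3.23, 12.2],
    }
)

# Hypothetical exclusion set: one whole patient, one patient+compartment pair.
excluded = {"Patient14", ("Patient12", "TU")}
print(drop_excluded(df_tall, excluded))
```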
