A package to download, load, and process multiple benchmark multi-omic drug response datasets

Project description

Cancer Omics Drug Experiment Response Dataset

There has been a recent explosion of deep learning algorithms that tackle the computational problem of predicting drug treatment outcome from baseline molecular measurements. To support this work, we have built a benchmark dataset that harmonizes diverse datasets to better assess algorithm performance.

This package collects diverse sets of paired molecular datasets with corresponding drug sensitivity data. All data here is reprocessed and standardized so it can be easily used as a benchmark for these models. The repository leverages existing datasets to collect the data required for deep learning model development; since each deep learning model requires distinct data capabilities, the goal is to collect and format all data into a schema that can be leveraged by existing models.

The goal of this repository is two-fold. First, it aims to collate and standardize the data for the broader community; this requires running a series of scripts to build and append to a standardized data model. Second, it provides a series of scripts that pull from the data model to create model-specific data files that each model's data infrastructure can consume.

coderdata Data Model

The goal of the data model is to collate drug response data together with molecular data in a way that can be easily ingested by machine learning models. The overall schema is shown below.

We will store the data in tables that are represented by the files below. Each data-specific model can be generated from a smaller set of these tables. The schema for these tables is represented below.

For each dataset added, the files are comma-delimited (unless noted otherwise) and named as follows; a short loading sketch follows the list:

  1. genes.csv
  2. drugs.tsv.gz --> drug names can contain commas and quotes, so this file is tab-delimited
  3. samples.csv
  4. experiments.csv.gz --> compressed to fit on GitHub
  5. transcriptomics.csv.gz
  6. mutations.csv.gz
  7. copy_number.csv.gz
  8. methylation.csv.gz
  9. mirnas.csv.gz
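
Below is a minimal sketch of how these tables could be loaded with pandas. The data/ccle directory is a hypothetical local location, not part of the schema; only the file names and delimiters above come from the data model.

    import pandas as pd

    DATA_DIR = "data/ccle"  # hypothetical location of one dataset's files

    # Comma-delimited tables
    genes = pd.read_csv(f"{DATA_DIR}/genes.csv")
    samples = pd.read_csv(f"{DATA_DIR}/samples.csv")

    # Drug names can contain commas and quotes, so this table is tab-delimited
    drugs = pd.read_csv(f"{DATA_DIR}/drugs.tsv.gz", sep="\t")

    # Gzip-compressed, comma-delimited tables; pandas infers compression from the .gz extension
    experiments = pd.read_csv(f"{DATA_DIR}/experiments.csv.gz")
    transcriptomics = pd.read_csv(f"{DATA_DIR}/transcriptomics.csv.gz")
    mutations = pd.read_csv(f"{DATA_DIR}/mutations.csv.gz")
    copy_number = pd.read_csv(f"{DATA_DIR}/copy_number.csv.gz")
    methylation = pd.read_csv(f"{DATA_DIR}/methylation.csv.gz")
    mirnas = pd.read_csv(f"{DATA_DIR}/mirnas.csv.gz")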

Building the data model

Below is a description of how the data model is built.

Data model step      | Description/Dependencies                                                                  | Script                           | Destination
Build cell line data | Runs through PGX and existing CCLE data to compile all values                             | cell_line/buildInitialDataset.py | ./cell_line
Build cptac data     | Uses the genes file created in the ./cell_line directory but generates additional samples | cptac/getCptacData.py            | ./cptac
Get HCMI data        | Uses a fixed manifest to download the data into the proper schema                         | hcmi/getHCMIData.py              | ./hcmi
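
A minimal sketch of chaining these steps with Python's subprocess module is shown below. It assumes each script is executed from the repository root and takes no arguments; the actual invocations may differ.

    import subprocess

    # Build steps in dependency order: the cell line step runs first because the
    # cptac step reuses the genes file it produces. Script paths come from the
    # table above; the no-argument invocation is an assumption.
    build_steps = [
        "cell_line/buildInitialDataset.py",
        "cptac/getCptacData.py",
        "hcmi/getHCMIData.py",
    ]

    for script in build_steps:
        print(f"Running {script} ...")
        subprocess.run(["python", script], check=True)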

Current data

What data is stored here?

Using the data model

Files are stored on FigShare. A script that pulls this data as needed still needs to be built; a rough sketch is shown below.
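
The sketch below streams a single file from FigShare to disk using only the standard library. The URL is a placeholder, not the real location of the coderdata files, and the destination path is arbitrary.

    import urllib.request
    from pathlib import Path

    # Placeholder URL -- substitute the actual FigShare download link for the file you need
    FIGSHARE_URL = "https://figshare.com/ndownloader/files/<FILE_ID>"
    DEST = Path("data/experiments.csv.gz")

    DEST.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(FIGSHARE_URL, DEST)
    print(f"Downloaded {DEST} ({DEST.stat().st_size} bytes)")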

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

coderdata-0.1.19.tar.gz (11.1 kB)

Uploaded Source

Built Distribution

coderdata-0.1.19-py3-none-any.whl (14.7 kB)

Uploaded Python 3

File details

Details for the file coderdata-0.1.19.tar.gz.

File metadata

  • Download URL: coderdata-0.1.19.tar.gz
  • Upload date:
  • Size: 11.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.13

File hashes

Hashes for coderdata-0.1.19.tar.gz

Algorithm   | Hash digest
SHA256      | 934269b7c1fbf032ed6391ffbf890183e0b8b052772077398d42c90c6b6b4775
MD5         | 702b5b25528ba4fd153b2606351a80f2
BLAKE2b-256 | d38608b65711bfcef4d1057971abf929252beef88c3a2a4ed6951f9f979dea0f

See more details on using hashes here.
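
As an illustration, the sketch below verifies a downloaded coderdata-0.1.19.tar.gz against the SHA256 digest listed above using Python's hashlib.

    import hashlib

    EXPECTED_SHA256 = "934269b7c1fbf032ed6391ffbf890183e0b8b052772077398d42c90c6b6b4775"

    def sha256_of(path, chunk_size=1 << 20):
        """Compute the SHA256 digest of a file, reading it in chunks."""
        digest = hashlib.sha256()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    assert sha256_of("coderdata-0.1.19.tar.gz") == EXPECTED_SHA256, "hash mismatch"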

File details

Details for the file coderdata-0.1.19-py3-none-any.whl.

File metadata

  • Download URL: coderdata-0.1.19-py3-none-any.whl
  • Upload date:
  • Size: 14.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.13

File hashes

Hashes for coderdata-0.1.19-py3-none-any.whl

Algorithm   | Hash digest
SHA256      | ecccedc5c08a6b4f6e195df2f93d3a98dec786aac3203f7ca1ccf3e236a3d124
MD5         | a21b6e22ccd06e66ebb28b0c0d11902a
BLAKE2b-256 | d6264c29b1bed7f1762176254187fe1ae756d0b0067894d6768f9fbf805a2e2b

See more details on using hashes here.
