A package to download, load, and process multiple benchmark multi-omic drug response datasets

Project description

Cancer Omics Drug Experiment Response Dataset

There has been a recent explosion of deep learning algorithms that tackle the computational problem of predicting drug treatment outcome from baseline molecular measurements. To support this effort, we have built a benchmark dataset that harmonizes diverse datasets to better assess algorithm performance.

This package collects diverse sets of paired molecular datasets with corresponding drug sensitivity data. All data here is reprocessed and standardized so it can easily be used as a benchmark for deep learning model development. Since each deep learning model requires distinct data capabilities, the goal of this repository is to collect and format all data into a schema that can be leveraged by existing models.

CoderData Motivation

The goal of this repository is two-fold. First, it aims to collate and standardize the data for the broader community; this requires running a series of scripts to build and append to a standardized data model. Second, it provides a series of scripts that pull from the data model to create model-specific data files that can be consumed by the modeling infrastructure.

Data access

For access to the latest version of CoderData, please visit our documentation site, which provides access to Figshare and instructions for using the Python package to download the data.
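
As a quick orientation, the sketch below shows what programmatic access might look like in Python. The function names (download, load) and the dataset name beataml are illustrative assumptions rather than the authoritative API; the documentation site describes the actual interface.

    # Minimal sketch of fetching and loading a CoderData dataset.
    # NOTE: function names, arguments, and the dataset name are assumptions
    # for illustration; consult the CoderData documentation for the real API.
    import coderdata as cd

    cd.download(name='beataml')        # download one dataset locally
    beataml = cd.load(name='beataml')  # load the downloaded tables

    # Assumed here: loaded tables are exposed as pandas DataFrames.
    print(beataml.samples.head())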

Data format

All CoderData files are in text format, either comma-delimited or tab-delimited depending on the data type. Each dataset can be evaluated individually according to the CoderData schema, which is maintained in LinkML and can be updated via a commit to the repository. For more details, please see the schema description.
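
Because every file is plain delimited text, standard tooling is sufficient to inspect it. A minimal sketch with pandas follows; the file names are placeholders for whichever dataset you downloaded, not fixed names.

    import pandas as pd

    # CoderData files are comma- or tab-delimited text, depending on data
    # type. The file names below are illustrative placeholders.
    samples = pd.read_csv('beataml_samples.csv')                   # comma-delimited
    omics = pd.read_csv('beataml_transcriptomics.tsv', sep='\t')   # tab-delimited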

Building a local version

The build process can be found in our coderbuild directory. Here you can follow the instructions to build your own local copy of the data on your machine.

Adding a new dataset

We have standardized the build (coderbuild) process so that an additional dataset can be built locally or as part of the next version of CoderData. Here are the steps to follow:

  1. First, visit the coderbuild directory and ensure you can build a local copy of CoderData.

  2. Check out this repository and create a subdirectory of the coderbuild directory with your own build files.

  3. Develop your scripts to build the data files according to our LinkML Schema. This will require collecting the following metadata:

  • Entrez gene identifiers (or you can use the genes.csv file from the original build)
  • sample information such as species and model system type
  • a drug name that can be searched on PubChem

You can validate each file by using the LinkML validator together with our schema file.
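
For example, here is a minimal validation sketch using the linkml Python package's validator API. The schema path, target class name, and record fields below are placeholders, since the real class and slot names come from the CoderData LinkML schema.

    from linkml.validator import validate

    # Placeholder record; substitute fields defined by the CoderData schema.
    record = {
        'improve_sample_id': 1,
        'model_type': 'cell line',
        'species': 'Homo sapiens',
    }

    # 'coderdata_schema.yaml' and 'Sample' are illustrative names.
    report = validate(record, 'coderdata_schema.yaml', target_class='Sample')
    for result in report.results:
        print(result.severity, result.message)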

You can use the following scripts as part of your build process:

  1. Wrap your scripts in standard shell scripts with the following names and arguments:

  • build_samples.sh [latest_samples]: takes the latest version of the samples file generated by coderbuild.
  • build_omics.sh [gene file] [sample file]: takes the genes.csv file that was generated in the original build as well as the sample file generated above.
  • build_drugs.sh [drugfile1,drugfile2,...]: takes a comma-delimited list of all drug files generated from previous builds.
  • build_exp.sh [sample file] [drug file]: takes the sample file and drug file generated by the previous scripts.
  2. Put the Docker container file inside the Docker directory with the name Dockerfile.[datasetname].

  3. Run build_all.py from the root directory, which should now add your Dockerfile into the mix and call the scripts in your Docker container to build the files.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

coderdata-2.2.1.tar.gz (25.2 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

coderdata-2.2.1-py3-none-any.whl (28.5 kB)

Uploaded Python 3

File details

Details for the file coderdata-2.2.1.tar.gz.

File metadata

  • Download URL: coderdata-2.2.1.tar.gz
  • Upload date:
  • Size: 25.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-httpx/0.27.2

File hashes

Hashes for coderdata-2.2.1.tar.gz

  • SHA256: 1f69cf16b36b7a7b42bc623f403959ad44ae5d40c56934fa9202e1be1cfa4c05
  • MD5: f0585c41f86457cacbebe21e3d2cd2de
  • BLAKE2b-256: 2ecff8517065bd124e41c9814738c28c08399dadf090827677db1d08317338a1

See more details on using hashes here.

File details

Details for the file coderdata-2.2.1-py3-none-any.whl.

File metadata

  • Download URL: coderdata-2.2.1-py3-none-any.whl
  • Upload date:
  • Size: 28.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-httpx/0.27.2

File hashes

Hashes for coderdata-2.2.1-py3-none-any.whl

  • SHA256: b94da797d10c97ecf5029d1bb7a6c55bf2cb105e94c4291eb91625de30e444d7
  • MD5: 809ef9827a1202698e86883f60e66602
  • BLAKE2b-256: 81618da70b5e72caa2f4f95c77296bacefa60d9c2619d89ff23e56b1e789f817

See more details on using hashes here.
