
Python package wrapping ENCODE epigenomic data for a number of reference cell lines.


How do I install this package?

As usual, just install it using pip:

pip install epigenomic_dataset

Tests Coverage

Since coverage tools sometimes report slightly different results, here are three of them:

Coveralls Coverage, SonarCloud Coverage, Code Climate Coverage

Pipeline

The raw data considered come from this query on the ENCODE project.

You can find the complete table of the available epigenomes here. These datasets were selected to have (at the time of writing, 07/02/2020) the fewest known problems, such as low read resolution.

You can run the pipeline as follows. Suppose you want to extract the epigenomic features for the cell lines HepG2 and H1:

from epigenomic_dataset import build

build(
    bed_path="path/to/my/bed/file.bed",
    cell_lines=["HepG2", "H1"]
)
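
The bed_path argument points at a BED file listing the genomic regions of interest. As a minimal sketch, assuming a plain three-column BED layout (chromosome, start, end, tab-separated), such a file could be written with the standard library alone. The helper name write_bed, the coordinates and the output filename are hypothetical and not part of epigenomic_dataset; whether build expects additional columns is not stated here.

```python
import csv
from pathlib import Path


def write_bed(regions, path):
    """Write (chrom, start, end) tuples to a tab-separated BED file.

    Hypothetical helper for illustration; not part of epigenomic_dataset.
    """
    path = Path(path)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for chrom, start, end in regions:
            # BED intervals need a non-negative start strictly below the end.
            if not 0 <= start < end:
                raise ValueError(f"invalid interval: {chrom}:{start}-{end}")
            writer.writerow([chrom, start, end])
    return path


# Made-up example windows; replace them with your own regions.
example_regions = [
    ("chr1", 10_000, 10_200),
    ("chr1", 20_000, 20_200),
    ("chr2", 5_000, 5_200),
]
```

Calling write_bed(example_regions, "regions.bed") would produce a file whose path could then be passed as bed_path.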

If you want to specify where to store the files, use the path argument:

from epigenomic_dataset import build

build(
    bed_path="path/to/my/bed/file.bed",
    cell_lines=["HepG2", "H1"],
    path="path/to/my/target"
)

By default the downloaded bigWig files are not deleted. You can choose to delete the files as follows:

from epigenomic_dataset import build

build(
    bed_path="path/to/my/bed/file.bed",
    cell_lines=["HepG2", "H1"],
    path="path/to/my/target",
    clear_download=True
)

Finally, you can use a custom NaN threshold to drop windows that contain too many NaN values. For instance, if you want to drop the rows in which more than 60% of the values are NaN, you can use:

from epigenomic_dataset import build

build(
    bed_path="path/to/my/bed/file.bed",
    cell_lines=["HepG2", "H1"],
    nan_threshold=0.6
)
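
To make the threshold concrete, here is a minimal sketch of one way such a filter could work: compute the fraction of NaN values in each row and keep only the rows whose fraction does not exceed the threshold. This illustrates the idea only and is not the package's implementation; the function names nan_fraction and drop_sparse_rows and the sample values are hypothetical.

```python
import math


def nan_fraction(row):
    """Return the fraction of values in `row` that are NaN (0.0 if empty)."""
    if not row:
        return 0.0
    return sum(math.isnan(value) for value in row) / len(row)


def drop_sparse_rows(rows, nan_threshold=0.6):
    """Keep rows whose NaN fraction is at most `nan_threshold`.

    Hypothetical illustration; not the epigenomic_dataset implementation.
    """
    return [row for row in rows if nan_fraction(row) <= nan_threshold]


nan = float("nan")
windows = [
    [1.0, 2.0, 3.0, 4.0, 5.0],  # no NaN values: kept
    [1.0, nan, nan, nan, 5.0],  # exactly 60% NaN: kept, since it is not more than 60%
    [nan, nan, nan, nan, 5.0],  # 80% NaN: dropped
]
kept = drop_sparse_rows(windows, nan_threshold=0.6)
```

In this sketch a row sitting exactly at the threshold is kept, matching the "more than 60%" wording above.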


This version: 1.0.0
