sjcab_peak2anno_db

Gene-backed peak-to-annotation BED resources for SJ/CAB workflows.

To keep the distribution small, the package bundles gene BED files and generates the following derived annotations from each bundled gene version:

  • tss: 1 bp TSS intervals, strand-aware
  • tes: 1 bp TES intervals, strand-aware
  • deduplong: longest isoform per gene name, using column 4 as gene name and column 5 as isoform length
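As a rough illustration of what "1 bp, strand-aware" means, the sketch below collapses a gene interval to a single base at its start or end. It is not the package's code: the function name `one_bp_site` is hypothetical, and it assumes standard BED coordinates (0-based, half-open) with the strand passed in explicitly.

```python
def one_bp_site(chrom, start, end, strand, site="tss"):
    """Return a 1 bp BED interval for the TSS or TES of a gene.

    Assumes 0-based, half-open BED coordinates. Hypothetical helper,
    not part of sjcab_peak2anno_db.
    """
    if site not in ("tss", "tes"):
        raise ValueError("site must be 'tss' or 'tes'")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # On '+' the TSS is the leftmost base and the TES the rightmost;
    # on '-' the two are swapped.
    use_left = (strand == "+") == (site == "tss")
    if use_left:
        return chrom, start, start + 1
    return chrom, end - 1, end


print(one_bp_site("chr1", 100, 200, "+", "tss"))
print(one_bp_site("chr1", 100, 200, "-", "tss"))
```

For a gene on the minus strand, the TSS therefore sits at the right-hand end of the interval and the TES at the left-hand end.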

The package also bundles blacklist BED files. During install, each blacklist is written under ~/.sjcab_peak2anno_db/blacklists as a dated *.bed.20230411 file, with the current *.bed name refreshed as a symlink. CpG island (CGI) BED files for hg38, hg19, mm10, mm9, and mm39 are also bundled and installed into ~/.sjcab_peak2anno_db/cgi.
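The dated-file-plus-symlink pattern described above can be sketched as follows. This is an assumption about the general approach, not the package's implementation: `install_dated_with_symlink` is a hypothetical name, and the code assumes a POSIX filesystem where symlinks can be created and `os.replace` swaps them atomically.

```python
import os
import shutil
from pathlib import Path


def install_dated_with_symlink(src, dest_dir, name, date_tag):
    """Copy src to {name}.bed.{date_tag} and point {name}.bed at it.

    Hypothetical helper; assumes a POSIX filesystem with symlink support.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Write the dated copy, e.g. mylist.bed.20230411.
    dated = dest_dir / f"{name}.bed.{date_tag}"
    shutil.copyfile(src, dated)

    # Build the new link under a temporary name, then rename it over the
    # current one so readers never see a missing {name}.bed.
    link = dest_dir / f"{name}.bed"
    tmp = dest_dir / f".{name}.bed.tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    os.symlink(dated.name, tmp)  # relative target keeps the directory movable
    os.replace(tmp, link)
    return link
```

Using a relative link target means the whole directory can be moved or copied without breaking the symlink.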

Supported species:

  • hg19
  • hg38
  • mm10
  • mm9
  • sacCer3

Install

From TestPyPI:

python -m pip install -i https://test.pypi.org/simple/ sjcab_peak2anno_db

From a local checkout:

python -m pip install .

Generate User Data

After package installation, generate all available versions under ~/.sjcab_peak2anno_db:

sjcab-peak2anno-db install

That command installs the packaged gene BEDs, blacklists, and CGI BEDs into the cache directory.

To use a different data directory, set SJCAB_PEAK2ANNO_DB_PATH:

export SJCAB_PEAK2ANNO_DB_PATH=/path/to/sjcab_peak2anno_db
sjcab-peak2anno-db install

To override the data directory for a single command, pass --data-dir:

sjcab-peak2anno-db install --data-dir /path/to/sjcab_peak2anno_db
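The three ways of choosing a data directory can be pictured as a simple lookup chain. The sketch below is illustrative only: `resolve_data_dir` is a hypothetical function, and the precedence shown (command-line value first, then the environment variable, then the default under the home directory) is an assumption rather than documented behavior.

```python
import os
from pathlib import Path

ENV_VAR = "SJCAB_PEAK2ANNO_DB_PATH"
DEFAULT_DIR = Path.home() / ".sjcab_peak2anno_db"


def resolve_data_dir(cli_data_dir=None, environ=None):
    """Pick the data directory.

    Hypothetical helper. Assumed precedence: --data-dir value, then the
    environment variable, then the default directory.
    """
    environ = os.environ if environ is None else environ
    if cli_data_dir:
        return Path(cli_data_dir).expanduser()
    env_value = environ.get(ENV_VAR)
    if env_value:
        return Path(env_value).expanduser()
    return DEFAULT_DIR


print(resolve_data_dir("/tmp/one_off"))
```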

Generated files are written as:

{data_dir}/{species}/{annotation}/{version}.bed
{data_dir}/{species}/{annotation}/default.bed
{data_dir}/blacklists/{name}.bed.20230411
{data_dir}/blacklists/{name}.bed

default.bed points to the latest parsed version for that species and annotation.
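To make the layout concrete, the sketch below builds a path from the template above and picks a "latest" version from a list of labels. Both helpers are hypothetical, and the version labels (`v2`, `v9`, `v10`) are invented for illustration. How the package actually parses and orders versions is not stated here, so the natural-sort rule is only an assumption.

```python
import re
from pathlib import Path


def annotation_path(data_dir, species, annotation, version="default"):
    """Build {data_dir}/{species}/{annotation}/{version}.bed."""
    return Path(data_dir) / species / annotation / f"{version}.bed"


def version_key(version):
    """Natural-sort key: numeric runs compare as integers.

    Assumed ordering rule, not the package's documented parser.
    """
    parts = re.findall(r"\d+|\D+", version)
    return [(0, int(p)) if p.isdigit() else (1, p) for p in parts]


def latest_version(versions):
    """Return the highest version label under version_key."""
    return max(versions, key=version_key)


# Hypothetical version labels, for illustration only.
print(latest_version(["v2", "v10", "v9"]))
print(annotation_path("/data", "hg38", "tss"))
```

Comparing numeric runs as integers avoids the plain string ordering in which "v10" would sort before "v9".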

Download UCSC CpG island BED files for hg38, hg19, mm10, mm9, and mm39:

sjcab-peak2anno-db download-cgi

CGI files are written as:

{data_dir}/cgi/{species}_cgi.bed

Python Usage

import sjcab_peak2anno_db as db

print(db.supported_species())
print(db.versions("hg38", "gene"))
print(db.default_version("hg19", "tss"))

db.install_data()
print(db.path("hg38", "tss"))

Generate one derived file manually:

import sjcab_peak2anno_db as db

db.write_tss("genes.bed", "genes.tss.bed")
db.write_tes("genes.bed", "genes.tes.bed")
db.write_deduplong("genes.bed", "genes.deduplong.bed")
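For intuition about what a deduplong file contains, here is a minimal sketch of longest-isoform selection. It uses column 4 as the gene name and column 5 as the isoform length, as described earlier. It is not the body of `db.write_deduplong`: the function `dedup_longest` is hypothetical, and the tie-breaking (first row seen wins) and output order (first appearance of each gene name) are assumptions.

```python
def dedup_longest(rows):
    """Keep one row per gene name: the one with the largest column 5.

    rows is an iterable of BED records, each a list of string fields.
    Hypothetical helper; ties keep the first row encountered.
    """
    best = {}
    for fields in rows:
        name = fields[3]          # column 4: gene name
        length = int(fields[4])   # column 5: isoform length
        if name not in best or length > int(best[name][4]):
            best[name] = fields
    # Dict order follows the first appearance of each gene name.
    return list(best.values())


rows = [
    ["chr1", "100", "500", "GENE_A", "400", "+"],
    ["chr1", "100", "900", "GENE_A", "800", "+"],
    ["chr2", "50", "150", "GENE_B", "100", "-"],
]
for record in dedup_longest(rows):
    print("\t".join(record))
```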

Command Line

sjcab-peak2anno-db list
sjcab-peak2anno-db install
sjcab-peak2anno-db install-blacklists
sjcab-peak2anno-db download-cgi
sjcab-peak2anno-db update
sjcab-peak2anno-db path hg38 gene
sjcab-peak2anno-db path hg38 tss --install
