tfclass_predict estimates transcription factor binding sites in the TFClass hierarchy.

Project description

TFClassPredict

Description

TFClassPredict predicts transcription factor binding sites (TFBSs) in the human genome according to their DNA-binding domain (DBD), grouped into the 23 DBD classes defined at the class level of TFClass. It builds on the DNABERT model and achieves high predictive performance.

Package Workflow Structure

[Figure: workflow schema (./workflow_schema.drawio.png)]

Installation

The package can be installed via pip:

pip install tfclass_predict

Using the precompiled data set is recommended: it shortens runtime drastically and can be run on basically any system.

TFCP_precompiled

Alternatively, the human reference genome (hg38) and the TFClassPredict (TFCP) model can be used directly.

HG38 from UCSC

TFCP_model

All downloads need to be unzipped. Then pass either the path to the TFCP_precompiled directory, or the paths to the hg38.fa file and the TFCP_model directory, to the command-line tool or to PredictionManager.
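
The unzip step could be scripted with the Python standard library. This is a minimal sketch, not part of the package: the helper names and the archive names in the comments are hypothetical, and the gzip helper only applies if the genome was downloaded as a gzip file.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def unzip_download(archive, target_dir):
    """Extract a downloaded zip archive into target_dir and return that directory."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target


def gunzip_file(gz_path, out_path):
    """Decompress a single gzip file (e.g. a gzipped FASTA) to out_path."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return Path(out_path)


# Hypothetical usage; the archive names below are assumptions.
# unzip_download("TFCP_precompiled.zip", "model")
# gunzip_file("hg38.fa.gz", "hg38.fa")
```

The resulting directory or file paths are what the command-line tool and PredictionManager expect.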

Usage

The tool can be used from the command line with the following parameters:

usage: tfclass_predict [-h] [--genome GENOME] [--precompiled PRECOMPILED] [--model MODEL] [--gpus GPUS] [--cpus CPUS] bed_file output_dir

tfclass_predict allows to estimate transcription factor bindingsites in the TFClass hierarchy. Please specifiy either the path to the precompiled (--precompiled) files or to the published
model (--model + --genome). If both are defined, the precompiled dataset is preferred!

positional arguments:
  bed_file              Path to bed file of ATAC-seq or other NGS experiment.
  output_dir            Path to output directory.

options:
  -h, --help            show this help message and exit
  --genome GENOME       Path to human genome reference (rec.: hg38) (.fa).
  --precompiled PRECOMPILED
                        Path to precompiled hg38 predictions (unzipped folder).
  --model MODEL         Path to TFClass model archive (unzipped folder).
  --gpus GPUS           Number of GPUs that should be used in parallel.
  --cpus CPUS           Number of CPUs that should be used at maximum in parallel. 
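
To call the tool from a script, the argument list can be assembled from the options listed above. This is a sketch under assumptions: `build_command` is a hypothetical helper, not part of tfclass_predict, and the file names are placeholders. It only uses the flags shown in the help text and rejects calls that give neither `--precompiled` nor both `--model` and `--genome`.

```python
import subprocess


def build_command(bed_file, output_dir, precompiled=None, genome=None,
                  model=None, gpus=None, cpus=None):
    """Assemble an argv list for tfclass_predict from the documented options."""
    if precompiled is None and (genome is None or model is None):
        raise ValueError("pass --precompiled, or both --model and --genome")
    cmd = ["tfclass_predict"]
    if precompiled is not None:
        cmd += ["--precompiled", str(precompiled)]
    if genome is not None:
        cmd += ["--genome", str(genome)]
    if model is not None:
        cmd += ["--model", str(model)]
    if gpus is not None:
        cmd += ["--gpus", str(gpus)]
    if cpus is not None:
        cmd += ["--cpus", str(cpus)]
    # Positional arguments come last, in the order shown in the usage line.
    return cmd + [str(bed_file), str(output_dir)]


# Placeholder paths; replace them with your own files.
cmd = build_command("peaks.bed", "res", precompiled="model/TFCP_precompiled")
print(" ".join(cmd))
# To actually run it (requires tfclass_predict to be installed):
# subprocess.run(cmd, check=True)
```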

Or directly in Python scripts:

from tfclass_predict import PredictionManager

bed_file = "path_to_example/bed.bed"  # smaller BED file for testing
genome_file = "hg38.fa"
tfclass_model = "model/TFCP_model"  # see Installation
tfclass_precompiled = "model/TFCP_precompiled"  # see Installation
res_dir = "res"

# using the precompiled data set (recommended)
pred_manager = PredictionManager(bed_file, res_dir, precompiled=tfclass_precompiled)
pred_manager.predict()
pred_manager.save_results()

# using genome and model
pred_manager = PredictionManager(bed_file, res_dir, genome=genome_file, tfcp_model=tfclass_model)
pred_manager.predict()
pred_manager.save_results()
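
A small BED file for a first test run could be written as follows. This is only a sketch: `write_bed` is a hypothetical helper, the coordinates are arbitrary placeholders rather than real peaks, and it assumes that the three standard BED columns (chromosome, 0-based start, end) are sufficient, which the package description does not state.

```python
from pathlib import Path


def write_bed(path, regions):
    """Write (chrom, start, end) tuples as a tab-separated BED file."""
    lines = []
    for chrom, start, end in regions:
        # BED intervals are 0-based and half-open, so start must be below end.
        if not 0 <= start < end:
            raise ValueError(f"invalid interval: {chrom}:{start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}")
    Path(path).write_text("\n".join(lines) + "\n")
    return Path(path)


# Arbitrary placeholder regions, not real peaks.
example_regions = [("chr1", 10000, 10500), ("chr2", 20000, 20500)]
# write_bed("example.bed", example_regions)
```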

When the TFCP_model is used, predictions run with a mirrored strategy by default, so they are distributed across several GPUs if --gpus > 1. Set --cpus to use more than one core for parallel tokenization.

Further Documentation

Find more information about the API at ReadTheDocs.
(Currently outdated! - 26/05/21)

Download files

Source Distribution

tfclass_predict-1.1.3.tar.gz (22.4 kB)

Uploaded Source

Built Distribution

tfclass_predict-1.1.3-py3-none-any.whl (22.4 kB)

Uploaded Python 3

File details

Details for the file tfclass_predict-1.1.3.tar.gz.

File metadata

  • Download URL: tfclass_predict-1.1.3.tar.gz
  • Size: 22.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for tfclass_predict-1.1.3.tar.gz
Algorithm Hash digest
SHA256 a756f8bbcf58a02f8a88c84ba8d2cef3dc5b505ce136141ce5f3447e8a3d0196
MD5 095e4cd5944747376ff53da3139ed926
BLAKE2b-256 62cfcce79058ead5edd2dfa0ba867b031898974b742f5289cadedb03ed8378bd
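
A downloaded file can be checked against the SHA256 digest listed above. This is a minimal sketch using only `hashlib`; the helper names are hypothetical and the local file path is whatever location you saved the download to.

```python
import hashlib

# SHA256 digest of tfclass_predict-1.1.3.tar.gz as listed above.
EXPECTED_SDIST_SHA256 = (
    "a756f8bbcf58a02f8a88c84ba8d2cef3dc5b505ce136141ce5f3447e8a3d0196"
)


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SDIST_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()


# Hypothetical usage with a locally saved download:
# print(matches_expected("tfclass_predict-1.1.3.tar.gz"))
```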

File details

Details for the file tfclass_predict-1.1.3-py3-none-any.whl.

File hashes

Hashes for tfclass_predict-1.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 f3c7f4cc4ecd2e887d0166cf2c561e3814038e140ff91f480d7d9ff1bdc4bb0a
MD5 f15d520dfb792ed4c1abddd756e6dcb5
BLAKE2b-256 2fca5c62ff664389f358ef796fe298d2f9c3add3695c876a5b643c0e67ef094b
