tfclass_predict estimates transcription factor binding sites within the TFClass hierarchy.

Project description

TFClassPredict

Description

TFClassPredict predicts transcription factor binding sites (TFBSs) on the human genome according to their DNA-binding domain (DBD), using the 23 DBD classes defined at the Class level of the TFClass hierarchy. By building on the DNABERT model, TFClassPredict achieves high prediction performance.

Package Workflow Structure

Figure: package workflow schema (workflow_schema.drawio.png)

Installation

The package can be installed via pip:

```bash
pip install tfclass_predict
```

To use the package, you need the human genome reference (hg38) and the TFClassPredict (TFCP) model (v1-06):

  • HG38 from UCSC
  • TFCP_model

Unzip both downloads so that the path to hg38.fa and the path to the TFCP_model directory can be passed to the command-line tool or to PredictionManager.
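Before running the tool, it can help to confirm that both downloads were actually unzipped. The helper below is a hypothetical sketch using only the Python standard library: `check_inputs` is not part of tfclass_predict, and it only checks that the paths exist and that the genome file does not still carry an archive suffix, not that the files are valid.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical: archive endings that suggest a download was not unzipped yet.
COMPRESSED_SUFFIXES = {".gz", ".zip", ".tar"}


def check_inputs(genome_fa: str, model_dir: str) -> list[str]:
    """Return a list of problems with the genome/model paths (empty if none found)."""
    problems: list[str] = []
    genome = Path(genome_fa)
    model = Path(model_dir)
    if not genome.is_file():
        problems.append(f"genome FASTA not found: {genome}")
    elif genome.suffix.lower() in COMPRESSED_SUFFIXES:
        # The tool expects the unzipped hg38.fa, not the compressed download.
        problems.append(f"genome looks compressed, unzip it first: {genome}")
    if not model.is_dir():
        # TFCP_model must be an (unzipped) directory.
        problems.append(f"model directory not found: {model}")
    return problems
```

For example, `check_inputs("hg38.fa", "model/TFCP_model")` returns an empty list when both paths are in place.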

Usage

The tool can be used from the command line with the following parameters:

```text
usage: tfclass_predict [-h] [--genome GENOME] [--precompiled PRECOMPILED] [--model MODEL] [--gpus GPUS] [--cpus CPUS] bed_file output_dir

tfclass_predict allows to estimate transcription factor bindingsites in the TFClass hierarchy. Please specifiy either the path to the precompiled (--precompiled) files or to the published
model (--model + --genome). If both are defined, the precompiled dataset is preferred!

positional arguments:
  bed_file              Path to bed file of ATAC-seq or other NGS experiment.
  output_dir            Path to output directory.

options:
  -h, --help            show this help message and exit
  --genome GENOME       Path to human genome reference (rec.: hg38) (.fa).
  --precompiled PRECOMPILED
                        Path to precompiled hg38 predictions (unzipped folder).
  --model MODEL         Path to TFClass model archive (unzipped folder).
  --gpus GPUS           Number of GPUs that should be used in parallel.
  --cpus CPUS           Number of CPUs that should be used at maximum in parallel.
```
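The help text states a precedence rule between the two input modes. The function below restates that rule in Python for clarity; it is an illustrative re-implementation with an invented name (`select_input_mode`), not the tool's actual source.

```python
from __future__ import annotations

from typing import Optional


def select_input_mode(precompiled: Optional[str], model: Optional[str], genome: Optional[str]) -> str:
    """Return "precompiled" or "model" following the rule in the help text.

    Hypothetical re-statement of the documented behaviour, not tfclass_predict code.
    """
    if precompiled:
        # If both options are defined, the precompiled dataset is preferred.
        return "precompiled"
    if model and genome:
        # The published model needs both --model and --genome.
        return "model"
    raise ValueError("specify --precompiled, or both --model and --genome")
```

Since the precompiled predictions are currently unavailable, a typical run uses the model mode, for example `tfclass_predict --genome hg38.fa --model TFCP_model --gpus 2 --cpus 8 peaks.bed results`, where `peaks.bed` and `results` are placeholder names.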

Alternatively, use it directly in Python scripts:

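```python
from tfclass_predict import PredictionManager

bed_file = "path_to_example/bed.bed"  # smaller BED file for testing
genome_file = "hg38.fa"               # unzipped hg38 reference (see Installation)
tfclass_model = "model/TFCP_model"    # unzipped TFCP model directory (see Installation)
res_dir = "res"                       # output directory

pred_manager = PredictionManager(bed_file, res_dir, genome=genome_file, tfcp_model=tfclass_model)
pred_manager.predict()       # run the TFClass predictions
pred_manager.save_results()  # save the predictions to the output directory
```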
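The example above points at a smaller BED file for testing. One way to produce such a file from a full ATAC-seq peak file is to keep only its first regions. `write_test_bed` below is a hypothetical standard-library helper, not part of tfclass_predict; it assumes a tab-separated BED file and skips `track`, `browser` and `#` comment lines.

```python
def write_test_bed(src: str, dst: str, n_regions: int = 1000) -> int:
    """Copy the first n_regions BED records from src to dst; return how many were written."""
    written = 0
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if written >= n_regions:
                break
            # Skip blank lines and BED header/comment lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\r\n").split("\t")
            # A BED record needs at least chrom, start and end (integer coordinates).
            if len(fields) < 3 or not fields[1].isdigit() or not fields[2].isdigit():
                raise ValueError(f"not a BED record: {line!r}")
            fout.write("\t".join(fields) + "\n")
            written += 1
    return written
```

For example, `write_test_bed("peaks.bed", "path_to_example/bed.bed", n_regions=500)` writes a 500-region file whose path can then be passed as `bed_file`.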

The precompiled predictions are currently unavailable (26/05/11). However, the model uses a mirrored strategy by default, so predictions run in parallel on several GPUs when --gpus > 1. Use --cpus to allow more than one CPU core for parallel tokenization.
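DNABERT reads DNA as overlapping k-mers joined by spaces, which is the kind of work that --cpus spreads across cores. The sketch below shows parallel k-mer tokenization with `multiprocessing.Pool` under stated assumptions: k = 6 is one of the k-mer sizes published for DNABERT, but the k used by TFCP_model is not stated here, and `to_kmers`/`tokenize_all` are hypothetical names rather than the package API. The multi-GPU mirrored strategy is not shown.

```python
from __future__ import annotations

from functools import partial
from multiprocessing import Pool


def to_kmers(seq: str, k: int = 6) -> str:
    """Split a DNA sequence into overlapping k-mers joined by spaces (DNABERT-style input)."""
    seq = seq.upper()
    # Sequences shorter than k yield an empty string.
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))


def tokenize_all(seqs: list[str], cpus: int = 1, k: int = 6) -> list[str]:
    """Tokenize many sequences, using up to `cpus` worker processes."""
    worker = partial(to_kmers, k=k)  # module-level function + partial stays picklable
    if cpus <= 1:
        return [worker(s) for s in seqs]
    with Pool(processes=cpus) as pool:
        # Pool.map preserves input order, so outputs stay aligned with the input regions.
        return pool.map(worker, seqs, chunksize=max(1, len(seqs) // (cpus * 4)))
```

Keeping results in input order means each token string still matches its BED region; for small inputs, `cpus=1` avoids the cost of starting worker processes.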

Further Documentation

More information about the API is available on ReadTheDocs (currently outdated, as of 26/05/11).

Download files

Download the file for your platform.

Source Distribution

tfclass_predict-1.1.0.tar.gz (22.0 kB)

Built Distribution

tfclass_predict-1.1.0-py3-none-any.whl (22.1 kB)

File details

Details for the file tfclass_predict-1.1.0.tar.gz.

File metadata

  • Download URL: tfclass_predict-1.1.0.tar.gz
  • Upload date:
  • Size: 22.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for tfclass_predict-1.1.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | c8efb250793691c54855ef5a675b0ed52b8325827e69a8b1eb8ec964a12a62f3 |
| MD5 | c99756846451d43c942e6023a9e274a5 |
| BLAKE2b-256 | d753ac60d5b87c800511924c28ae886afc909c34b0d41c77883e2cad53aa87bc |
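The published digests let you check a downloaded file before installing it. A minimal sketch with Python's standard `hashlib` module; the helper names `sha256_of` and `verify` are illustrative, and the default expected value is the SHA256 digest listed for the source distribution.

```python
import hashlib

# SHA256 digest published for tfclass_predict-1.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = "c8efb250793691c54855ef5a675b0ed52b8325827e69a8b1eb8ec964a12a62f3"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("tfclass_predict-1.1.0.tar.gz")` should return True for an intact download. pip can also enforce digests itself through its hash-checking mode, using a requirements line such as `tfclass_predict==1.1.0 --hash=sha256:<digest>` that lists the digests of both the source distribution and the wheel.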

File details

Details for the file tfclass_predict-1.1.0-py3-none-any.whl.

File hashes

Hashes for tfclass_predict-1.1.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | d120e2979662d83bc9191f148a27919ce0a3fc8c45408e22d4f4d20e6030e87c |
| MD5 | 4a02d58b55774bc7c204871bcfe65fcc |
| BLAKE2b-256 | a782f05de4df3d59d1ca871c1b8b27b9774d8625ae46e7fb45943d58f9f5ea30 |
