tfclass_predict estimates transcription factor binding sites in the TFClass hierarchy.
TFClassPredict
Description
TFClassPredict predicts transcription factor binding sites (TFBSs) on the human genome according to their DNA-binding domain (DBD), grouped into the 23 DBD classes defined at the Class level of TFClass. Built on the DNABERT model, TFClassPredict achieves high predictive performance.
Package Workflow Structure
[Figure: package workflow structure]
Installation
The package can be installed via pip:
```bash
pip install tfclass_predict
```
The precompiled data set is the recommended way to use the package: it shortens runtime drastically and runs on basically any system.
Alternatively, the human reference genome (hg38) and the TFClassPredict (TFCP) model can be used directly.
HG38 from UCSC
All downloads must be unzipped so that either the path to the TFCP_precompiled directory, or the paths to the hg38.fa file and the TFCP_model directory, can be passed to the command-line tool or to the PredictionManager.
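A minimal sketch of the unpacking step, using only the Python standard library. It assumes that the precompiled data and the model arrive as .zip archives containing the TFCP_precompiled / TFCP_model folders, and that the genome is UCSC's gzip-compressed hg38.fa.gz; the helper `unpack_download` is hypothetical and not part of tfclass_predict.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def unpack_download(archive: str, target_dir: str) -> Path:
    """Unpack a downloaded archive into target_dir (hypothetical helper).

    Assumption: .zip archives hold folders such as TFCP_precompiled or
    TFCP_model; .gz files (e.g. hg38.fa.gz from UCSC) hold a single file.
    Returns the extraction directory (.zip) or the decompressed file (.gz).
    """
    src = Path(archive)
    dest = Path(target_dir)
    dest.mkdir(parents=True, exist_ok=True)
    if src.suffix == ".zip":
        with zipfile.ZipFile(src) as zf:
            zf.extractall(dest)  # keeps the folder layout stored in the archive
        return dest
    if src.suffix == ".gz":
        out_path = dest / src.stem  # "hg38.fa.gz" -> "hg38.fa"
        # Stream-decompress so the multi-GB genome is never held in memory.
        with gzip.open(src, "rb") as fin, open(out_path, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        return out_path
    raise ValueError(f"unsupported archive type: {src.name}")
```

For example, `unpack_download("downloads/hg38.fa.gz", "model")` would write `model/hg38.fa`, whose path can then be passed as the genome file.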
Usage
The tool can be used from the command line with the following parameters:
```text
usage: tfclass_predict [-h] [--genome GENOME] [--precompiled PRECOMPILED] [--model MODEL] [--gpus GPUS] [--cpus CPUS] bed_file output_dir

tfclass_predict allows to estimate transcription factor bindingsites in the TFClass hierarchy. Please specifiy either the path to the precompiled (--precompiled) files or to the published
model (--model + --genome). If both are defined, the precompiled dataset is preferred!

positional arguments:
  bed_file              Path to bed file of ATAC-seq or other NGS experiment.
  output_dir            Path to output directory.

options:
  -h, --help            show this help message and exit
  --genome GENOME       Path to human genome reference (rec.: hg38) (.fa).
  --precompiled PRECOMPILED
                        Path to precompiled hg38 predictions (unzipped folder).
  --model MODEL         Path to TFClass model archive (unzipped folder).
  --gpus GPUS           Number of GPUs that should be used in parallel.
  --cpus CPUS           Number of CPUs that should be used at maximum in parallel.
```
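The help text states that either --precompiled or the pair --model + --genome must be given, and that the precompiled data set wins when both are set. The sketch below restates that documented rule in Python for clarity; `RunConfig` and `select_mode` are hypothetical names, not the package's internal code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunConfig:
    """Hypothetical container mirroring the --precompiled/--genome/--model options."""
    precompiled: Optional[str] = None
    genome: Optional[str] = None
    model: Optional[str] = None


def select_mode(cfg: RunConfig) -> str:
    """Return which input set would be used under the documented rule.

    The precompiled data set is preferred whenever it is given; otherwise
    both the model and the genome are required.
    """
    if cfg.precompiled:
        return "precompiled"
    if cfg.model and cfg.genome:
        return "model"
    raise ValueError("specify --precompiled, or both --model and --genome")
```

Following the usage line, a typical run on the precompiled data would look like `tfclass_predict --precompiled model/TFCP_precompiled peaks.bed res`, where `peaks.bed` and `res` are placeholder paths.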
The package can also be used directly in Python scripts:
```python
from tfclass_predict import PredictionManager

bed_file = "path_to_example/bed.bed"  # smaller bed file for testing
genome_file = "hg38.fa"
tfclass_model = "model/TFCP_model"  # see Installation
tfclass_precompiled = "model/TFCP_precompiled"  # see Installation
res_dir = "res"

# Option 1: using the precompiled data set (recommended)
pred_manager = PredictionManager(bed_file, res_dir, precompiled=tfclass_precompiled)
pred_manager.predict()
pred_manager.save_results()

# Option 2: using the hg38 genome and the TFCP model
pred_manager = PredictionManager(bed_file, res_dir, genome=genome_file, tfcp_model=tfclass_model)
pred_manager.predict()
pred_manager.save_results()
```
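The example points bed_file at a smaller BED file for testing. One way to produce such a file from a full ATAC-seq peak file is to keep only its first regions. The helper `subset_bed` and its 1,000-region default are hypothetical choices, not part of tfclass_predict; the sketch relies only on the standard BED layout (tab-separated chrom, 0-based start, exclusive end, optional extra columns).

```python
def subset_bed(src: str, dest: str, n_regions: int = 1000) -> int:
    """Copy the first n_regions intervals of a BED file into a smaller test file.

    'track', 'browser' and '#' header lines and blank lines are skipped.
    Returns the number of regions written.
    """
    if n_regions < 1:
        raise ValueError("n_regions must be positive")
    written = 0
    with open(src) as fin, open(dest, "w") as fout:
        for line in fin:
            if not line.strip() or line.startswith(("track", "browser", "#")):
                continue
            fields = line.rstrip("\n").split("\t")
            # A BED record needs at least chrom, start (0-based) and end (exclusive).
            if len(fields) < 3 or int(fields[1]) >= int(fields[2]):
                raise ValueError(f"malformed BED line: {line!r}")
            fout.write(line if line.endswith("\n") else line + "\n")
            written += 1
            if written == n_regions:
                break
    return written
```

Taking the first regions keeps the sketch simple; a random sample would be more representative of a whole peak set, at the cost of reading the full file.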
When the TFCP_model is used, predictions run under a mirrored strategy by default, so they are distributed across several GPUs in parallel when --gpus is greater than 1. Set --cpus to use more than one CPU core for parallel tokenization.
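DNABERT reads DNA as overlapping k-mers, so tokenization means sliding a window of length k along each sequence. As a rough illustration of what spreading that work over several CPU cores can look like, the sketch below uses a `multiprocessing` pool. The value k = 6 and the names `to_kmers` and `tokenize_parallel` are assumptions; the package's own tokenization code may differ.

```python
from functools import partial
from multiprocessing import Pool
from typing import List


def to_kmers(sequence: str, k: int = 6) -> str:
    """Split a DNA sequence into overlapping k-mers joined by spaces,
    the text format fed to DNABERT-style tokenizers."""
    sequence = sequence.upper()
    return " ".join(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def tokenize_parallel(sequences: List[str], k: int = 6, cpus: int = 1) -> List[str]:
    """Convert many sequences to k-mer strings using up to `cpus` worker processes."""
    if cpus <= 1:
        # Single core: avoid the overhead of starting worker processes.
        return [to_kmers(s, k) for s in sequences]
    with Pool(processes=cpus) as pool:
        # pool.map preserves input order, so results line up with `sequences`.
        return pool.map(partial(to_kmers, k=k), sequences)
```

For example, `tokenize_parallel(["ACGTACGT"], k=6)` returns `["ACGTAC CGTACG GTACGT"]`. With `cpus > 1`, call it from under an `if __name__ == "__main__":` guard, since platforms that spawn worker processes re-import the main module.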
Further Documentation
More information about the API is available on ReadTheDocs (currently outdated, as of 26/05/21).
Download files
File details
Details for the file tfclass_predict-1.1.3.tar.gz.
File metadata
- Download URL: tfclass_predict-1.1.3.tar.gz
- Upload date:
- Size: 22.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a756f8bbcf58a02f8a88c84ba8d2cef3dc5b505ce136141ce5f3447e8a3d0196 |
| MD5 | 095e4cd5944747376ff53da3139ed926 |
| BLAKE2b-256 | 62cfcce79058ead5edd2dfa0ba867b031898974b742f5289cadedb03ed8378bd |
File details
Details for the file tfclass_predict-1.1.3-py3-none-any.whl.
File metadata
- Download URL: tfclass_predict-1.1.3-py3-none-any.whl
- Upload date:
- Size: 22.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f3c7f4cc4ecd2e887d0166cf2c561e3814038e140ff91f480d7d9ff1bdc4bb0a |
| MD5 | f15d520dfb792ed4c1abddd756e6dcb5 |
| BLAKE2b-256 | 2fca5c62ff664389f358ef796fe298d2f9c3add3695c876a5b643c0e67ef094b |
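To check a downloaded file against the digests listed above, it can be hashed locally with Python's standard `hashlib` module. The helpers `compute_digest` and `verify_download` are illustrative and not part of tfclass_predict; the sketch covers the SHA256 and MD5 rows.

```python
import hashlib


def compute_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Compare a downloaded file with a published SHA256 digest (case-insensitive)."""
    return compute_digest(path, "sha256") == expected_sha256.strip().lower()
```

For instance, `verify_download("tfclass_predict-1.1.3.tar.gz", "a756f8bbcf58a02f8a88c84ba8d2cef3dc5b505ce136141ce5f3447e8a3d0196")` should return `True` for an intact copy of the source distribution.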