dbtk-dnabert
A general implementation of DNABERT using PyTorch and the deepbio-toolkit library.
Getting Started
- Install the `dbtk-dnabert` package:
```shell
pip install dbtk-dnabert
```
- Load a pre-trained DNABERT model:
```python
from dnabert import DnaBert

# Load the pre-trained model; the revision selects one of the models listed under Pre-trained Models
model = DnaBert.from_pretrained("SirDavidLudwig/dnabert", revision="64d-silva16s-250bp")
```
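The examples call `model.tokenizer` to turn a nucleotide string into token IDs. The tokenization scheme used by this package is not documented here; the original DNABERT represents a sequence as overlapping k-mers. The following is a minimal, stdlib-only sketch of that idea, assuming a plain A/C/G/T alphabet with no special tokens and no handling of ambiguous bases such as `N`. The functions `build_kmer_vocab` and `kmer_tokenize` are hypothetical and are not part of `dbtk-dnabert`.

```python
from itertools import product


def build_kmer_vocab(k: int) -> dict[str, int]:
    """Map every k-mer over A/C/G/T to an integer ID (hypothetical scheme, no special tokens)."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: idx for idx, kmer in enumerate(kmers)}


def kmer_tokenize(sequence: str, k: int, vocab: dict[str, int], stride: int = 1) -> list[int]:
    """Split a sequence into k-mers (overlapping when stride=1) and look up their IDs.

    Raises KeyError for k-mers containing characters outside A/C/G/T.
    """
    sequence = sequence.upper()
    return [vocab[sequence[i:i + k]] for i in range(0, len(sequence) - k + 1, stride)]


vocab = build_kmer_vocab(3)  # 4**3 = 64 possible 3-mers
tokens = kmer_tokenize("ACTGAATGAGAC", k=3, vocab=vocab)
print(len(tokens))  # 10 overlapping 3-mers from a 12-nt sequence
```

With overlapping k-mers, every sequence of the same length yields the same number of tokens, which is what allows equal-length token lists to be stacked into a single tensor.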
Examples
Embed DNA sequences
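```python
import torch

# Assumes `model` was loaded as shown in Getting Started

# Sequences to embed
sequences = [
    "ACTGAATGAGAC",
    "TTGAGTAGCCAA",
]

# Tokenize sequences and stack them into a batch (stacking requires equal-length token lists)
sequence_tokens = torch.tensor([model.tokenizer(sequence) for sequence in sequences])

# Embed sequences (no gradients are needed for inference)
with torch.no_grad():
    output = model(sequence_tokens)

# Sequence-level embeddings from the class token
class_embeddings = output["class"]

# Sequence-level embeddings from the averaged token embeddings
mean_embeddings = output["tokens"].mean(dim=1)
```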
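Both example sequences have the same length, so the plain `.mean(dim=1)` averages only real tokens. If sequences of different lengths are padded to a common length, a plain mean would also average the padding positions. The sketch below shows the masked-mean arithmetic in plain Python; the `attention_mask` layout (1 for a real token, 0 for padding) is an assumption, since how this package pads batches is not described here.

```python
def masked_mean(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    token_embeddings: list of sequences, each a list of per-token vectors.
    attention_mask:   list of sequences, each a list of 1 (real token) or 0 (padding).
    """
    pooled = []
    for vectors, mask in zip(token_embeddings, attention_mask):
        kept = [vec for vec, keep in zip(vectors, mask) if keep]
        if not kept:
            raise ValueError("sequence has no unmasked tokens")
        dim = len(kept[0])
        pooled.append([sum(vec[d] for vec in kept) / len(kept) for d in range(dim)])
    return pooled


# Two sequences padded to length 3 with 2-D embeddings; the second has one padded slot.
tokens = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 0.0], [4.0, 2.0], [100.0, 100.0]],  # last vector is padding
]
mask = [[1, 1, 1], [1, 1, 0]]
print(masked_mean(tokens, mask))  # [[3.0, 4.0], [3.0, 1.0]]
```

In PyTorch the same computation on a `(batch, length, dim)` tensor `x` and a float `(batch, length)` mask is `(x * mask.unsqueeze(-1)).sum(dim=1) / mask.sum(dim=1, keepdim=True)`.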
Pre-trained Models
| Model Name | Embedding Dim. | Maximum Length | Pre-training Dataset |
|---|---|---|---|
| 64d-silva16s-250bp | 64 | 250 bp | SILVA 16S |
| 768d-silva16s-250bp | 768 | 250 bp | SILVA 16S |
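The model name is the value passed as `revision` to `DnaBert.from_pretrained`, as in the loading example. Whether the tokenizer truncates or rejects sequences longer than the maximum length is not stated here, so a caller may want to check inputs up front. This is a minimal sketch; the `PRETRAINED_MODELS` dictionary and the `check_sequences` helper are hypothetical conveniences whose values are copied from the table.

```python
# Values copied from the Pre-trained Models table; this dict is not part of the package.
PRETRAINED_MODELS = {
    "64d-silva16s-250bp": {"embedding_dim": 64, "max_length_bp": 250},
    "768d-silva16s-250bp": {"embedding_dim": 768, "max_length_bp": 250},
}


def check_sequences(revision: str, sequences: list[str]) -> None:
    """Raise ValueError if any sequence exceeds the model's maximum length in bp."""
    max_len = PRETRAINED_MODELS[revision]["max_length_bp"]
    too_long = [i for i, seq in enumerate(sequences) if len(seq) > max_len]
    if too_long:
        raise ValueError(f"sequences at indices {too_long} exceed {max_len} bp for {revision}")


check_sequences("64d-silva16s-250bp", ["ACTGAATGAGAC", "TTGAGTAGCCAA"])  # passes silently
```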
Development
1. Model Configuration
Template model configurations can be generated with the `dbtk model config` command.
2. Pre-training
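The model can be pre-trained with the supplied configurations using the following command. The final argument is the directory where logs and checkpoints are written; the export step reads `last.ckpt` from it.
```shell
dbtk model fit \
    -c ./configs/datamodules/pretrain_silva_16s_250bp.yaml \
    -c ./configs/models/pretrain_dnabert_768d_250bp.yaml \
    -c ./configs/trainers/pretrainer.yaml \
    ./logs/dnabert_768d_250bp
```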
3. Exporting
The trained checkpoint can be exported as a Hugging Face model with the following command:
```shell
dbtk model export ./logs/dnabert_768d_250bp/last.ckpt ./exports/dnabert_768d_250bp
```