
A general DNABERT implementation using the deepbio-toolkit.


dbtk-dnabert

An implementation of DNABERT using PyTorch and the deepbio-toolkit library.

Getting Started

  1. Install the dbtk-dnabert package
pip install dbtk-dnabert
  2. Load a pre-trained DNABERT model
from dnabert import DnaBert

# Load the pre-trained model
model = DnaBert.from_pretrained("SirDavidLudwig/dnabert", revision="64d-silva16s-250bp")

Examples

Embed DNA sequences

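import torch

# Assumes `model` has been loaded as shown in Getting Started

# Sequences to embed
sequences = [
    "ACTGAATGAGAC",
    "TTGAGTAGCCAA"
]

# Tokenize sequences; torch.tensor can only stack them into one tensor
# if every tokenized sequence has the same length
sequence_tokens = torch.tensor([model.tokenizer(sequence) for sequence in sequences])

# Embed sequences (gradients are not needed for inference)
with torch.no_grad():
    output = model(sequence_tokens)

# Sequence-level embeddings from the class token
class_embeddings = output["class"]

# Sequence-level embeddings from the averaged token embeddings
mean_embeddings = output["tokens"].mean(dim=1)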
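The averaged-token embedding above takes a plain mean over every token position. If sequences of different lengths were padded to a common length before being stacked into one tensor, the padding positions would be averaged in as well. The sketch below shows masked mean pooling in plain Python, assuming a 0/1 mask in which 1 marks a real token. The function name `masked_mean` and the mask convention are illustrative assumptions, not part of dbtk-dnabert.

```python
from typing import List, Sequence


def masked_mean(token_embeddings: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the embedding vectors whose mask entry is 1 (real tokens), ignoring padding."""
    if len(token_embeddings) != len(mask):
        raise ValueError("token_embeddings and mask must have the same length")
    kept = [vec for vec, m in zip(token_embeddings, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    # Sum each embedding dimension over the kept tokens, then divide by their count.
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Three token positions with 2-dimensional embeddings; the last position is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(masked_mean(tokens, mask))  # [2.0, 3.0]
```

With PyTorch tensors the same idea is commonly written as `(x * m.unsqueeze(-1)).sum(dim=1) / m.sum(dim=1, keepdim=True)`, where `x` has shape (batch, tokens, dim) and `m` is a float mask of shape (batch, tokens).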

Pre-trained Models

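Each model name below is a revision of the SirDavidLudwig/dnabert repository and is passed as the revision argument to DnaBert.from_pretrained.

| Model Name | Embedding Dim. | Maximum Length | Pre-training Dataset |
| --- | --- | --- | --- |
| 64d-silva16s-250bp | 64 | 250 bp | SILVA 16S |
| 768d-silva16s-250bp | 768 | 250 bp | SILVA 16S |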
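Because each pre-trained model has a maximum input length, it can help to check sequences before tokenizing them. A minimal sketch, assuming the maximum length in the table is measured in nucleotides and that over-length sequences should be flagged rather than silently truncated. The `MAX_LENGTH_BP` dictionary and the `check_lengths` helper are hypothetical names, not part of the package.

```python
# Maximum input length (in bp) for each pre-trained revision, copied from the table above.
MAX_LENGTH_BP = {
    "64d-silva16s-250bp": 250,
    "768d-silva16s-250bp": 250,
}


def check_lengths(revision: str, sequences: list[str]) -> list[int]:
    """Return the indices of sequences longer than the revision's maximum length."""
    if revision not in MAX_LENGTH_BP:
        raise KeyError(f"unknown revision: {revision}")
    limit = MAX_LENGTH_BP[revision]
    return [i for i, seq in enumerate(sequences) if len(seq) > limit]


too_long = check_lengths("64d-silva16s-250bp", ["ACGT" * 10, "A" * 300])
print(too_long)  # [1]
```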

Development

1. Model Configuration

Template model configurations can be generated using the dbtk model config command.

2. Pre-training

The model can be pre-trained by passing the supplied data module, model, and trainer configurations (one -c flag each), followed by the output log directory:

dbtk model fit \
    -c ./configs/datamodules/pretrain_silva_16s_250bp.yaml \
    -c ./configs/models/pretrain_dnabert_768d_250bp.yaml \
    -c ./configs/trainers/pretrainer.yaml \
    ./logs/dnabert_768d_250bp
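When scripting several pre-training runs, it may be convenient to assemble this command in Python. The sketch below uses only what the command above shows: `-c` repeated once per configuration file, followed by the log directory. The helper name `build_fit_command` is hypothetical, and the code only builds and prints the command; it does not run dbtk.

```python
import shlex


def build_fit_command(config_paths: list[str], log_dir: str) -> list[str]:
    """Assemble a `dbtk model fit` argument list, passing each config with its own -c flag."""
    cmd = ["dbtk", "model", "fit"]
    for path in config_paths:
        cmd += ["-c", path]
    cmd.append(log_dir)
    return cmd


cmd = build_fit_command(
    [
        "./configs/datamodules/pretrain_silva_16s_250bp.yaml",
        "./configs/models/pretrain_dnabert_768d_250bp.yaml",
        "./configs/trainers/pretrainer.yaml",
    ],
    "./logs/dnabert_768d_250bp",
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))
```

To launch the run, the list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell-quoting issues with unusual paths.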

3. Exporting

The last checkpoint from the pre-training run can be exported as a Hugging Face model with the following command:

dbtk model export ./logs/dnabert_768d_250bp/last.ckpt ./exports/dnabert_768d_250bp



Download files


Source Distribution

dbtk_dnabert-1.2.2.tar.gz (6.6 kB)

Uploaded Source

Built Distribution


dbtk_dnabert-1.2.2-py3-none-any.whl (8.2 kB)

Uploaded Python 3

File details

Details for the file dbtk_dnabert-1.2.2.tar.gz.

File metadata

  • Download URL: dbtk_dnabert-1.2.2.tar.gz
  • Upload date:
  • Size: 6.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for dbtk_dnabert-1.2.2.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 2f7178a72670997f49953d521f4a0dfadecdb325dfeeb136bc57660ff4fe5e9c |
| MD5 | 15f7dc57738afb87e6203d3031401e43 |
| BLAKE2b-256 | e4371d54bce56575cfb2aa568119c1f3ab5fd956a50793d1c174ad9c33753fc5 |


File details

Details for the file dbtk_dnabert-1.2.2-py3-none-any.whl.

File metadata

  • Download URL: dbtk_dnabert-1.2.2-py3-none-any.whl
  • Upload date:
  • Size: 8.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for dbtk_dnabert-1.2.2-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 7851c0ef785f77c9dab3e67a13f641feb92b3c977431841b795da54708277bb7 |
| MD5 | 3860d356c15d544dba739cfd302960b3 |
| BLAKE2b-256 | cb217010e569223b9ca677f42113e3d9d57505fe06e2467fba01892808cd5237 |
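A downloaded distribution file can be checked against the published SHA256 digests before installation. The following is a minimal sketch using Python's standard hashlib module. The helper names `sha256_hex` and `verify` are illustrative. It assumes the file keeps its original name, since the expected digest is looked up by file name.

```python
import hashlib
from pathlib import Path
from typing import Iterable

# SHA256 digests published for dbtk-dnabert 1.2.2 (from the file details above).
EXPECTED_SHA256 = {
    "dbtk_dnabert-1.2.2.tar.gz": "2f7178a72670997f49953d521f4a0dfadecdb325dfeeb136bc57660ff4fe5e9c",
    "dbtk_dnabert-1.2.2-py3-none-any.whl": "7851c0ef785f77c9dab3e67a13f641feb92b3c977431841b795da54708277bb7",
}


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Feed byte chunks into a SHA256 hash and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, chunk_size: int = 65536) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    with path.open("rb") as fh:
        # Read in fixed-size chunks so large files need not fit in memory.
        actual = sha256_hex(iter(lambda: fh.read(chunk_size), b""))
    return actual == expected
```

For example, `verify(Path("dbtk_dnabert-1.2.2.tar.gz"))` returns True only for an unmodified download. pip can enforce the same check at install time through its hash-checking mode, using a requirements line of the form `dbtk-dnabert==1.2.2 --hash=sha256:<digest>`.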

