
BertNado: A framework for training and evaluating transformer-based models for chromatin binding prediction


BertNado


BertNado logo

BertNado is a modular framework for fine-tuning Hugging Face DNA language models, such as GROVER, NT2, and DNABERT variants, on genomic prediction tasks. It supports both full fine-tuning and parameter-efficient fine-tuning (PEFT) strategies such as LoRA.
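For intuition on why LoRA is parameter-efficient: instead of updating a full weight matrix W, it learns a low-rank product B·A and uses W + B·A at inference. The toy sketch below (plain Python, not BertNado or PEFT library code; all names are illustrative) shows the idea and the parameter savings for a rank-1 adapter:

```python
# Toy illustration of LoRA's low-rank update: W' = W + B @ A, where A is
# (r x d_in) and B is (d_out x r) with rank r << min(d_in, d_out).
# Only A and B are trained; the base weight W stays frozen.

def matmul(X, Y):
    """Plain-Python matrix multiply for small nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

d_in, d_out, r = 4, 4, 1  # rank-1 adapter
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]  # frozen base weight
A = [[0.1, 0.2, 0.3, 0.4]]        # (r x d_in), trainable
B = [[1.0], [0.0], [0.0], [0.0]]  # (d_out x r), trainable

delta = matmul(B, A)  # the low-rank update, shape (d_out x d_in)
W_adapted = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

# Trainable parameters: r * (d_in + d_out) = 8, versus d_in * d_out = 16
# for full fine-tuning; the gap grows quickly at transformer scale.
```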


Features

  • Model Support: GROVER, NT2 (Nucleotide Transformer), DNABERT, and other Hugging Face-compatible DNA language models
  • Task Flexibility: Supports regression, binary, and multi-label classification, as well as masked DNA modeling
  • Chromosome-aware Splits: Train/val/test split by chromosome to prevent data leakage
  • Efficient Fine-tuning: Drop-in support for parameter-efficient tuning methods like LoRA
  • Hyperparameter Optimization: Integrated with Weights & Biases for Bayesian sweep-based tuning
  • Robust Evaluation: Automatically generates R², ROC, PR, and confusion matrix plots
  • Model Interpretation: SHAP and Captum Layer Integrated Gradients (LIG) for biological insight
  • Trainer Integration: Built on Hugging Face Trainer with custom heads and metrics
  • W&B Logging: Full experiment tracking with Weights & Biases out of the box

Installation

git clone https://github.com/CChahrour/BertNado.git
cd BertNado
pip install -e .

Project Structure

bertnado/
├── cli.py                      # Command-line interface
├── data/
│   └── prepare_dataset.py      # Dataset creation and tokenization
├── evaluation/
│   ├── predict.py              # Predict from trained models
│   └── feature_extraction.py   # SHAP / LIG-based interpretation
└── training/
    ├── finetune.py             # Fine-tuning using best config
    ├── full_train.py           # Full training loop
    ├── model.py                # PEFT/LoRA model architecture
    ├── sweep.py                # W&B sweep setup
    ├── trainers.py             # Trainer wrappers
    └── metrics.py              # Metric computation

Quickstart

Step 1: Prepare Dataset

bertnado-data \
  --file-path test/data/mock_data.parquet \
  --target-column test_A \
  --fasta-file test/data/mock_genome.fasta \
  --tokenizer-name PoetschLab/GROVER \
  --output-dir output/dataset \
  --task-type regression
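Conceptually, this step pairs each labeled genomic window from the parquet file with its DNA sequence from the FASTA file, then tokenizes the sequences. The sketch below shows only the sequence-extraction part in plain Python (a hypothetical simplification, not the bertnado-data implementation):

```python
# Sketch of the core idea behind dataset preparation: parse a FASTA genome,
# then slice out the sequence for each labeled window before tokenization.

def read_fasta(lines):
    """Parse FASTA lines into a {name: sequence} dict."""
    seqs, name, parts = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(parts)
            name, parts = line[1:].split()[0], []
        elif line:
            parts.append(line.upper())
    if name is not None:
        seqs[name] = "".join(parts)
    return seqs

genome = read_fasta([">chr1", "ACGTACGTAC", ">chr2", "GGGGCCCC"])
window = genome["chr1"][2:8]  # 0-based half-open interval, BED-style coordinates
```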

Step 2: Run Hyperparameter Sweep

bertnado-sweep \
  --config-path test/data/mock_sweep_config.json \
  --output-dir output/sweep \
  --model-name PoetschLab/GROVER \
  --dataset output/dataset \
  --sweep-count 2 \
  --project-name project \
  --task-type regression
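The sweep config JSON follows the standard Weights & Biases sweep schema (`method`, `metric`, `parameters`). A minimal example is shown below; the specific parameter names and ranges are illustrative assumptions, not bertnado's required fields:

```json
{
  "method": "bayes",
  "metric": {"name": "eval/loss", "goal": "minimize"},
  "parameters": {
    "learning_rate": {"min": 1e-5, "max": 1e-4},
    "per_device_train_batch_size": {"values": [8, 16]}
  }
}
```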

Step 3: Train Best Model

bertnado-train \
  --output-dir output/train \
  --model-name PoetschLab/GROVER \
  --dataset output/dataset \
  --best-config-path output/sweep/best_sweep_config.json \
  --task-type regression \
  --project-name project

Step 4: Predict on Test Set

bertnado-predict \
  --tokenizer-name PoetschLab/GROVER \
  --model-dir output/train/model \
  --dataset-dir output/dataset \
  --output-dir output/predictions \
  --task-type regression

Step 5: Interpret Model with SHAP or LIG

bertnado-feature \
  --tokenizer-name PoetschLab/GROVER \
  --model-dir output/train/model \
  --dataset-dir output/dataset \
  --output-dir output/feature_analysis \
  --task-type regression \
  --method shap

Run both SHAP and LIG:

--method both

Outputs

  • Figures saved to output/figures/

    • Regression: R² scatter plot
    • Classification: ROC & PR curves
    • Binary: Confusion matrix
  • SHAP scores saved to output/shap/

  • Trained models saved to output/models/


Interpretation Tools

  • SHAP: Global and local token importance
  • Captum LIG: Gradient-based token attribution at the embedding level
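The core idea behind integrated gradients: accumulate the model's gradient along a straight path from a baseline input to the actual input, so each feature's attribution reflects its contribution to the prediction change. The toy sketch below (plain Python on a linear model, not Captum code) approximates this with a Riemann sum; for a linear model the gradient is constant, so the attribution for feature i is exactly (x_i − baseline_i) · w_i:

```python
# Riemann-sum approximation of integrated gradients for a linear model
# f(x) = sum(w_i * x_i). Toy illustration of the method Captum implements.

def integrated_gradients_linear(w, x, baseline, steps=50):
    attributions = []
    for i in range(len(x)):
        grad_sum = 0.0
        for _ in range(steps):
            # Gradient of a linear model w.r.t. x_i is w[i] at every
            # interpolated point along the baseline-to-input path.
            grad_sum += w[i]
        attributions.append((x[i] - baseline[i]) * grad_sum / steps)
    return attributions

attr = integrated_gradients_linear([2.0, -1.0], [1.0, 1.0], [0.0, 0.0])
# Completeness property: sum(attr) equals f(x) - f(baseline).
```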

Acknowledgements

  • Hugging Face Transformers
  • PoetschLab/GROVER
  • PEFT/LoRA
  • SHAP & Captum for interpretability
  • crested for efficient sequence extraction
