BertNado: A framework for training and evaluating transformer-based models for Chromatin binding

BertNado

BertNado is a modular framework for fine-tuning Hugging Face DNA language models, such as GROVER, NT2, and DNABERT variants, on genomic prediction tasks. It supports both full fine-tuning and parameter-efficient fine-tuning (PEFT) strategies such as LoRA.


Features

  • Model Support: GROVER, NT2 (Nucleotide Transformer), DNABERT, and other Hugging Face-compatible DNA language models
  • Task Flexibility: Supports regression, binary classification, and multi-label classification, as well as masked DNA modeling
  • Chromosome-aware Splits: Train/val/test split by chromosome to prevent data leakage
  • Efficient Fine-tuning: Drop-in support for parameter-efficient tuning methods like LoRA
  • Hyperparameter Optimization: Integrated with Weights & Biases for Bayesian sweep-based tuning
  • Robust Evaluation: Automatically generates ROC, PR, and confusion matrix plots for binary classification
  • Model Interpretation: SHAP and Captum Layer Integrated Gradients (LIG) for biological insight
  • Trainer Integration: Built on Hugging Face Trainer with custom heads and metrics
  • W&B Logging: Full experiment tracking with Weights & Biases out of the box
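
The chromosome-aware split listed above can be illustrated with a minimal sketch. It is not BertNado's implementation: the held-out chromosomes, the function name, and the record fields (`chromosome`, `start`, `end`) are assumptions chosen for illustration. The point it shows is that overlapping regions on the same chromosome always land in the same split, so near-duplicate sequences cannot leak from training into evaluation.

```python
from collections import defaultdict

# Hypothetical hold-out choice; the chromosomes BertNado actually uses may differ.
VAL_CHROMS = {"chr8"}
TEST_CHROMS = {"chr9"}


def split_by_chromosome(regions, val_chroms=VAL_CHROMS, test_chroms=TEST_CHROMS):
    """Assign each region to a split using only its chromosome name."""
    splits = defaultdict(list)
    for region in regions:
        chrom = region["chromosome"]
        if chrom in test_chroms:
            splits["test"].append(region)
        elif chrom in val_chroms:
            splits["validation"].append(region)
        else:
            splits["train"].append(region)
    return dict(splits)


# Illustrative records only; field names other than "bound" are assumptions.
regions = [
    {"chromosome": "chr1", "start": 100, "end": 612, "bound": 1},
    {"chromosome": "chr1", "start": 356, "end": 868, "bound": 1},  # overlaps the row above
    {"chromosome": "chr8", "start": 5000, "end": 5512, "bound": 0},
    {"chromosome": "chr9", "start": 7000, "end": 7512, "bound": 1},
]
splits = split_by_chromosome(regions)
```

A random row-level split could place the two overlapping chr1 regions in different splits; grouping by chromosome rules that out by construction.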

Installation

git clone https://github.com/CChahrour/BertNado.git
cd BertNado
pip install -e .

Project Structure

bertnado/
├── cli.py                      # Command-line interface
├── data/
│   └── prepare_dataset.py      # Dataset creation and tokenization
├── evaluation/
│   ├── predict.py              # Predict from trained models
│   └── feature_extraction.py   # SHAP / LIG-based interpretation
└── training/
    ├── finetune.py             # Fine-tuning using best config
    ├── full_train.py           # Full training loop
    ├── model.py                # PEFT/LoRA model architecture
    ├── sweep.py                # W&B sweep setup
    ├── trainers.py             # Trainer wrappers
    └── metrics.py              # Metric computation

Quickstart

Step 1: Prepare Dataset

bertnado-data \
  --file-path test/data/mock_data.parquet \
  --target-column bound \
  --fasta-file test/data/mock_genome.fasta \
  --tokenizer-name PoetschLab/GROVER \
  --output-dir output/dataset \
  --task-type binary_classification \
  --threshold 0.5

Step 2: Run Hyperparameter Sweep

bertnado-sweep \
  --config-path test/data/mock_sweep_config.json \
  --output-dir output/sweep \
  --model-name PoetschLab/GROVER \
  --dataset output/dataset \
  --sweep-count 2 \
  --project-name project \
  --metric-name eval/roc_auc \
  --metric-goal maximize \
  --task-type binary_classification

--config-path points to a Weights & Biases sweep config. The sweep metric is also used to choose the best checkpoint inside each run.
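
As a rough illustration of what such a file might contain, the sketch below builds a Weights & Biases sweep configuration in Python and writes it as JSON. The top-level keys (`method`, `metric`, `parameters`) follow the W&B sweep format. The hyperparameter names and ranges are assumptions, not the names BertNado necessarily reads, and `write_sweep_config` is a hypothetical helper.

```python
import json
from pathlib import Path

sweep_config = {
    "method": "bayes",  # Bayesian search, as mentioned in the feature list
    "metric": {"name": "eval/roc_auc", "goal": "maximize"},
    "parameters": {
        # Hypothetical hyperparameter names and ranges, for illustration only.
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-5,
            "max": 1e-3,
        },
        "per_device_train_batch_size": {"values": [8, 16, 32]},
        "lora_r": {"values": [4, 8, 16]},
    },
}

# JSON text that could be saved and passed via --config-path.
config_json = json.dumps(sweep_config, indent=2)


def write_sweep_config(path, config):
    """Write a sweep configuration to disk as JSON and return the path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    return path
```

Calling `write_sweep_config("my_sweep_config.json", sweep_config)` would produce a file that could stand in for test/data/mock_sweep_config.json, provided the parameter names match what the training code expects.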


Step 3: Train Best Model

bertnado-train \
  --output-dir output/train \
  --model-name PoetschLab/GROVER \
  --dataset output/dataset \
  --best-config-path output/sweep/best_sweep_config.json \
  --task-type binary_classification \
  --project-name project \
  --metric-name eval/roc_auc \
  --metric-goal maximize

The metric flags are optional when best_sweep_config.json was produced by bertnado-sweep, because the resolved metric is saved in that file.
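
The fallback described above could look roughly like the sketch below. The key names `metric_name` and `metric_goal` are hypothetical, since the actual layout of best_sweep_config.json is not documented here, and both functions are illustrative helpers rather than part of BertNado. The second function shows one plausible way a resolved metric might map onto Hugging Face `TrainingArguments` options for best-checkpoint selection.

```python
def resolve_metric(best_config, metric_name=None, metric_goal=None):
    """Prefer explicit command-line values, then fall back to the saved config."""
    # "metric_name" and "metric_goal" are assumed keys, not a documented schema.
    name = metric_name or best_config.get("metric_name")
    goal = metric_goal or best_config.get("metric_goal")
    if name is None or goal is None:
        raise ValueError("metric must be given on the command line or in the config")
    if goal not in ("maximize", "minimize"):
        raise ValueError(f"unknown metric goal: {goal!r}")
    return name, goal


def to_trainer_kwargs(name, goal):
    """Translate a W&B-style metric into TrainingArguments keyword arguments."""
    return {
        # W&B logs "eval/roc_auc"; the Trainer reports the same metric as "eval_roc_auc".
        "metric_for_best_model": name.replace("/", "_"),
        "greater_is_better": goal == "maximize",
        "load_best_model_at_end": True,
    }


# Example with a config shaped as this sketch assumes.
saved = {"metric_name": "eval/roc_auc", "metric_goal": "maximize"}
name, goal = resolve_metric(saved)
trainer_kwargs = to_trainer_kwargs(name, goal)
```

Explicit flags take precedence in this sketch, so passing `--metric-name` and `--metric-goal` would override whatever the sweep saved.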


Step 4: Predict on Test Set

bertnado-predict \
  --tokenizer-name PoetschLab/GROVER \
  --model-dir output/train/model \
  --dataset-dir output/dataset \
  --output-dir output/predictions \
  --task-type binary_classification

Step 5: Interpret Model with SHAP or LIG

bertnado-feature \
  --tokenizer-name PoetschLab/GROVER \
  --model-dir output/train/model \
  --dataset-dir output/dataset \
  --output-dir output/feature_analysis \
  --task-type binary_classification \
  --method shap \
  --target-class 1

To run both SHAP and LIG, change the --method flag in the command above:

--method both --target-class 1

Outputs

  • Figures saved to output/figures/
    • Binary classification: ROC and precision-recall curves
    • Binary classification: confusion matrix
  • SHAP scores saved to output/shap/
  • Trained models saved to output/models/


Interpretation Tools

  • SHAP: Global and local token importance
  • Captum LIG: Gradient-based token attribution at the embedding level
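
To make the gradient-based attribution idea concrete, here is a toy integrated-gradients sketch in plain Python. It is not BertNado's or Captum's code: the logistic model, weights, and inputs are invented for illustration, and attributions are computed on raw input features, whereas Captum's Layer Integrated Gradients applies the same path integral to the output of a chosen layer such as the embeddings.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, weights, bias):
    """Toy logistic model standing in for a classifier's positive-class score."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def gradient(x, weights, bias):
    """Analytic gradient of the toy model with respect to each input feature."""
    p = model(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Approximate the path integral of gradients from baseline to x (midpoint rule)."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = gradient(point, weights, bias)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the averaged gradient by each feature's distance from the baseline.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


# Invented toy values, for illustration only.
weights = [1.5, -2.0, 0.5]
bias = 0.1
x = [1.0, 0.5, 2.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, weights, bias)
score_change = model(x, weights, bias) - model(baseline, weights, bias)
```

The attributions sum to approximately `score_change`, the completeness property that makes integrated gradients useful for ranking which input positions drive a prediction.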

Acknowledgements

  • Hugging Face Transformers
  • PoetschLab/GROVER
  • PEFT/LoRA
  • SHAP & Captum for interpretability
  • crested for efficient sequence extraction
