
BertNado: A framework for training and evaluating transformer-based models for chromatin binding prediction

Project description

BertNado


BertNado is a modular framework for fine-tuning Hugging Face DNA language models such as GROVER, NT2, and DNABERT variants on genomic prediction tasks. It supports both full fine-tuning and parameter-efficient fine-tuning (PEFT) strategies such as LoRA.


Features

  • Model Support: GROVER, NT2 (Nucleotide Transformer), DNABERT, and other Hugging Face-compatible DNA language models
  • Task Flexibility: Supports regression, binary, and multi-label classification, as well as masked DNA modeling
  • Chromosome-aware Splits: Train/val/test split by chromosome to prevent data leakage
  • Efficient Fine-tuning: Drop-in support for parameter-efficient tuning methods like LoRA
  • Hyperparameter Optimization: Integrated with Weights & Biases for Bayesian sweep-based tuning
  • Robust Evaluation: Automatically generates R², ROC, PR, and confusion matrix plots
  • Model Interpretation: SHAP and Captum Layer Integrated Gradients (LIG) for biological insight
  • Trainer Integration: Built on Hugging Face Trainer with custom heads and metrics
  • W&B Logging: Full experiment tracking with Weights & Biases out of the box
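
The chromosome-aware split can be sketched in plain Python. The function name and record layout below are illustrative only, not BertNado's actual API; the point is that whole chromosomes are held out, so overlapping windows cannot leak between splits:

```python
def split_by_chromosome(records, val_chroms, test_chroms):
    """Partition (chrom, start, end, target) records into train/val/test.

    Holding out entire chromosomes prevents nearby or overlapping
    genomic windows from appearing in more than one split.
    """
    splits = {"train": [], "val": [], "test": []}
    for rec in records:
        chrom = rec[0]
        if chrom in test_chroms:
            splits["test"].append(rec)
        elif chrom in val_chroms:
            splits["val"].append(rec)
        else:
            splits["train"].append(rec)
    return splits

records = [
    ("chr1", 0, 512, 0.8),
    ("chr2", 0, 512, 0.1),
    ("chr8", 0, 512, 0.5),
    ("chr9", 0, 512, 0.9),
]
splits = split_by_chromosome(records, val_chroms={"chr8"}, test_chroms={"chr9"})
```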

Installation

git clone https://github.com/CChahrour/BertNado.git
cd BertNado
pip install -e .

Project Structure

bertnado/
├── cli.py                      # Command-line interface
├── data/
│   └── prepare_dataset.py      # Dataset creation and tokenization
├── evaluation/
│   ├── predict.py              # Predict from trained models
│   └── feature_extraction.py   # SHAP / LIG-based interpretation
└── training/
    ├── finetune.py             # Fine-tuning using best config
    ├── full_train.py           # Full training loop
    ├── model.py                # PEFT/LoRA model architecture
    ├── sweep.py                # W&B sweep setup
    ├── trainers.py             # Trainer wrappers
    └── metrics.py              # Metric computation

Quickstart

Step 1: Prepare Dataset

bertnado-data \
  --file-path test/data/mock_data.parquet \
  --target-column test_A \
  --fasta-file test/data/mock_genome.fasta \
  --tokenizer-name PoetschLab/GROVER \
  --output-dir output/dataset \
  --task-type regression
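
Conceptually, this step pairs each target value with the DNA sequence extracted from the FASTA file before tokenization (BertNado uses crested for efficient extraction; the minimal parser below is only an illustration of the idea):

```python
def read_fasta(text):
    """Parse FASTA text into a {name: sequence} dict (minimal, no validation)."""
    genome, name, parts = {}, None, []
    for line in text.strip().splitlines():
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(parts)
            name, parts = line[1:].split()[0], []
        else:
            parts.append(line.strip())
    if name is not None:
        genome[name] = "".join(parts)
    return genome

fasta = ">chr1\nACGTACGTAC\n>chr2\nTTTTGGGGCC\n"
genome = read_fasta(fasta)
# Extract the sequence for a region, e.g. chr1:2-8, ready for tokenization.
window = genome["chr1"][2:8]
```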

Step 2: Run Hyperparameter Sweep

bertnado-sweep \
  --config-path test/data/mock_sweep_config.json \
  --output-dir output/sweep \
  --model-name PoetschLab/GROVER \
  --dataset output/dataset \
  --sweep-count 2 \
  --project-name project \
  --task-type regression
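
The sweep config follows the Weights & Biases sweep format. An illustrative example is shown below; the parameter names and ranges are assumptions, not the actual contents of mock_sweep_config.json:

```json
{
  "method": "bayes",
  "metric": {"name": "eval/loss", "goal": "minimize"},
  "parameters": {
    "learning_rate": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-3},
    "per_device_train_batch_size": {"values": [8, 16, 32]},
    "num_train_epochs": {"values": [2, 3, 5]}
  }
}
```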

Step 3: Train Best Model

bertnado-train \
  --output-dir output/train \
  --model-name PoetschLab/GROVER \
  --dataset output/dataset \
  --best-config-path output/sweep/best_sweep_config.json \
  --task-type regression \
  --project-name project
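
When LoRA is enabled, model.py wraps the base model with low-rank adapters. The underlying arithmetic, a frozen weight plus a scaled low-rank update, can be sketched numerically as follows (an illustration of the LoRA idea, not BertNado's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 8, 8, 2, 16

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                # zero-initialised, so training starts at the base model

x = rng.normal(size=d_in)
h = W @ x + (alpha / r) * (B @ (A @ x))  # LoRA forward pass

# Only A and B train: r * (d_in + d_out) parameters instead of d_in * d_out.
assert A.size + B.size < W.size
```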

Step 4: Predict on Test Set

bertnado-predict \
  --tokenizer-name PoetschLab/GROVER \
  --model-dir output/train/model \
  --dataset-dir output/dataset \
  --output-dir output/predictions \
  --task-type regression
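
For regression, the R² reported alongside the predictions is the standard coefficient of determination; a dependency-free sketch of the computation:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot

# Perfect predictions give R^2 = 1; always predicting the mean gives 0.
assert r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 1.0
assert r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]) == 0.0
```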

Step 5: Interpret Model with SHAP or LIG

bertnado-feature \
  --tokenizer-name PoetschLab/GROVER \
  --model-dir output/train/model \
  --dataset-dir output/dataset \
  --output-dir output/feature_analysis \
  --task-type regression \
  --method shap

Run both SHAP and LIG:

--method both
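
Both methods estimate how much each token position contributes to the model's prediction. A toy occlusion-based example conveys the intuition (real SHAP and LIG attributions are computed differently and more rigorously):

```python
def occlusion_importance(tokens, score_fn, mask="N"):
    """Toy token importance: the score drop when each position is masked.

    SHAP and Layer Integrated Gradients answer the same question more
    rigorously: how much does each token contribute to the output?
    """
    base = score_fn(tokens)
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [mask] + tokens[i + 1:]
        scores.append(base - score_fn(masked))
    return scores

# Toy "model": counts G/C tokens in the sequence.
score = lambda toks: sum(t in ("G", "C") for t in toks)
imp = occlusion_importance(list("ACGT"), score)
# The C and G positions matter to this model; A and T do not.
assert imp == [0, 1, 1, 0]
```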

Outputs

  • Figures saved to output/figures/

    • Regression: R² scatter plot
    • Classification: ROC and PR curves
    • Binary classification: confusion matrix
  • SHAP scores saved to output/shap/

  • Trained models saved to output/models/


Interpretation Tools

  • SHAP: Global and local token importance
  • Captum LIG: Gradient-based token attribution at the embedding level

Acknowledgements

  • Hugging Face Transformers
  • PoetschLab/GROVER
  • PEFT/LoRA
  • SHAP & Captum for interpretability
  • crested for efficient sequence extraction

Project details


Download files

Download the file for your platform.

Source Distribution

bertnado-0.1.0.tar.gz (571.7 kB)


Built Distribution


bertnado-0.1.0-py3-none-any.whl (22.2 kB)


File details

Details for the file bertnado-0.1.0.tar.gz.

File metadata

  • Download URL: bertnado-0.1.0.tar.gz
  • Upload date:
  • Size: 571.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for bertnado-0.1.0.tar.gz:

  • SHA256: 2e3420494e9d828d03e4a19139048cacc9b51a04e35723e91e44aeb9fa3c306a
  • MD5: 9e5375a102a91e068de44d81c1c2fe0d
  • BLAKE2b-256: d31130902af4642ac6cac31ed019bce03574eb77077d4ae4aea35c49b6a08bf0


Provenance

The following attestation bundles were made for bertnado-0.1.0.tar.gz:

Publisher: pypi.yml on CChahrour/BertNado

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file bertnado-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: bertnado-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 22.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for bertnado-0.1.0-py3-none-any.whl:

  • SHA256: 3d5a2a8d2108dfe8c0a5d9b3068972922cd3bbdb83c4a7561f02dabf71587aee
  • MD5: 39e6b081a6348df3768bea10d9a31da0
  • BLAKE2b-256: 518c05d838ad8ae503111ccd0d56d48eba93d6598d0f0f213459e0a61972448f


Provenance

The following attestation bundles were made for bertnado-0.1.0-py3-none-any.whl:

Publisher: pypi.yml on CChahrour/BertNado

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
