BertNado: A framework for training and evaluating transformer-based models for chromatin binding

Maintainers

Catherine_Chahrour

Project description

BertNado

BertNado is a modular framework for fine-tuning Hugging Face DNA language models such as GROVER, NT2, and DNABERT variants on genomic prediction tasks. It supports both full fine-tuning and parameter-efficient transfer learning (PEFT) strategies like LoRA.
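
To give a rough sense of why LoRA counts as parameter-efficient, the sketch below compares trainable parameter counts for a single weight matrix under full fine-tuning and under a rank-r LoRA update. The matrix shape and rank are illustrative assumptions, not values taken from BertNado or from any of the models named above.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable parameter counts for one weight matrix.

    LoRA keeps the original d_out x d_in matrix W frozen and learns a
    low-rank update B @ A, where B is d_out x rank and A is rank x d_in.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Assumed, illustrative shape: a 768 x 768 projection adapted with rank 8.
full, lora = lora_param_counts(768, 768, 8)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank 8):    {lora:,} trainable parameters ({lora / full:.1%} of full)")
```

The ratio shrinks further as the matrix grows, since the LoRA count scales with the sum of the two dimensions rather than their product.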

Features

Model Support: GROVER, NT2 (Nucleotide Transformer), DNABERT, and other Hugging Face-compatible DNA language models
Task Flexibility: Supports regression, binary classification, and multi-label classification, as well as masked DNA modeling
Chromosome-aware Splits: Train/val/test split by chromosome to prevent data leakage
Efficient Fine-tuning: Drop-in support for parameter-efficient tuning methods like LoRA
Hyperparameter Optimization: Integrated with Weights & Biases for Bayesian sweep-based tuning
Robust Evaluation: Automatically generates ROC, PR, and confusion matrix plots for binary classification
Model Interpretation: SHAP and Captum Layer Integrated Gradients (LIG) for biological insight
Trainer Integration: Built on Hugging Face Trainer with custom heads and metrics
W&B Logging: Full experiment tracking with Weights & Biases out of the box
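
The chromosome-aware split listed above can be illustrated with a minimal sketch. It is not BertNado's implementation: the record fields (`chrom`, `start`, `end`, `bound`) and the choice of held-out chromosomes are assumptions made for the example. The point is that overlapping regions on the same chromosome always land in the same split, so near-duplicate sequences cannot leak from training into evaluation.

```python
def split_by_chromosome(records, val_chroms, test_chroms):
    """Assign each record to train/val/test using only its chromosome."""
    splits = {"train": [], "val": [], "test": []}
    for rec in records:
        if rec["chrom"] in test_chroms:
            splits["test"].append(rec)
        elif rec["chrom"] in val_chroms:
            splits["val"].append(rec)
        else:
            splits["train"].append(rec)
    return splits


# Hypothetical regions; field names and held-out chromosomes are assumptions.
regions = [
    {"chrom": "chr1", "start": 100, "end": 600, "bound": 1},
    {"chrom": "chr1", "start": 400, "end": 900, "bound": 1},  # overlaps the first region
    {"chrom": "chr8", "start": 50, "end": 550, "bound": 0},
    {"chrom": "chr9", "start": 10, "end": 510, "bound": 1},
]
splits = split_by_chromosome(regions, val_chroms={"chr8"}, test_chroms={"chr9"})
for name, recs in splits.items():
    print(name, [(r["chrom"], r["start"]) for r in recs])
```

A random row-level split would be free to put the two overlapping chr1 regions on opposite sides of the train/test boundary, which is the leakage this strategy avoids.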

Installation

git clone https://github.com/CChahrour/BertNado.git
cd BertNado
pip install -e .

Project Structure

bertnado/
├── cli.py                      # Command-line interface
├── data/
│   └── prepare_dataset.py      # Dataset creation and tokenization
├── evaluation/
│   ├── predict.py              # Predict from trained models
│   └── feature_extraction.py   # SHAP / LIG-based interpretation
└── training/
    ├── finetune.py             # Fine-tuning using best config
    ├── full_train.py           # Full training loop
    ├── model.py                # PEFT/LoRA model architecture
    ├── sweep.py                # W&B sweep setup
    ├── trainers.py             # Trainer wrappers
    └── metrics.py              # Metric computation

Quickstart

Step 1: Prepare Dataset

bertnado-data \
  --file-path test/data/mock_data.parquet \
  --target-column bound \
  --fasta-file test/data/mock_genome.fasta \
  --tokenizer-name PoetschLab/GROVER \
  --output-dir output/dataset \
  --task-type binary_classification \
  --threshold 0.5

Step 2: Run Hyperparameter Sweep

bertnado-sweep \
  --config-path test/data/mock_sweep_config.json \
  --output-dir output/sweep \
  --model-name PoetschLab/GROVER \
  --dataset output/dataset \
  --sweep-count 2 \
  --project-name project \
  --metric-name eval/roc_auc \
  --metric-goal maximize \
  --task-type binary_classification

--config-path points to a Weights & Biases sweep config. The sweep metric is also used to choose the best checkpoint inside each run.
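
For orientation, a Weights & Biases sweep config is a JSON (or YAML) document with a search `method`, a `metric` to optimize, and a `parameters` search space. The sketch below builds one in Python and prints it as JSON. The top-level keys follow the general W&B sweep format, but the hyperparameter names under `parameters` are hypothetical: the names BertNado actually reads are defined by the project (see test/data/mock_sweep_config.json), not by this example.

```python
import json

sweep_config = {
    "method": "bayes",  # Bayesian search, as mentioned under Features
    "metric": {"name": "eval/roc_auc", "goal": "maximize"},
    "parameters": {
        # Hypothetical hyperparameter names, for illustration only.
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-5,
            "max": 1e-3,
        },
        "per_device_train_batch_size": {"values": [8, 16, 32]},
        "lora_r": {"values": [4, 8, 16]},
    },
}

# Serialize to the JSON form that a --config-path style flag would point at.
print(json.dumps(sweep_config, indent=2))
```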

Step 3: Train Best Model

bertnado-train \
  --output-dir output/train \
  --model-name PoetschLab/GROVER \
  --dataset output/dataset \
  --best-config-path output/sweep/best_sweep_config.json \
  --task-type binary_classification \
  --project-name project \
  --metric-name eval/roc_auc \
  --metric-goal maximize

The metric flags are optional when best_sweep_config.json was produced by bertnado-sweep, because the resolved metric is saved in that file.

Step 4: Predict on Test Set

bertnado-predict \
  --tokenizer-name PoetschLab/GROVER \
  --model-dir output/train/model \
  --dataset-dir output/dataset \
  --output-dir output/predictions \
  --task-type binary_classification

Step 5: Interpret Model with SHAP or LIG

bertnado-feature \
  --tokenizer-name PoetschLab/GROVER \
  --model-dir output/train/model \
  --dataset-dir output/dataset \
  --output-dir output/feature_analysis \
  --task-type binary_classification \
  --method shap \
  --target-class 1

To run both SHAP and LIG, change the final two flags to:

--method both --target-class 1

Outputs

Figures saved to output/figures/
- Binary classification: ROC and precision-recall curves
- Binary classification: Confusion matrix
- Multilabel classification: ROC, precision-recall, confusion matrix, and label count plots
Prediction metrics saved to output/predictions/metrics.json
- Multilabel classification: aggregate metrics plus per-label CSV metrics
SHAP scores saved to output/shap/
Trained models saved to output/models/
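
As background for the ROC outputs and the eval/roc_auc metric used in the Quickstart, the sketch below computes ROC AUC directly from its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. It is a plain-Python illustration on made-up labels and scores, not the metric code BertNado uses.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    # A correctly ordered pair counts 1, a tie counts 0.5, a wrong order counts 0.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: three bound (1) and two unbound (0) regions.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
print(f"ROC AUC = {roc_auc(labels, scores):.3f}")
```

Here five of the six positive/negative pairs are ordered correctly, so the value is 5/6. This pairwise form is quadratic in the number of examples and is meant for understanding the metric rather than for large evaluation sets.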

Interpretation Tools

SHAP: Global and local token importance
Captum LIG: Gradient-based token attribution at the embedding level
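
The idea behind integrated gradients can be shown on a toy model without Captum. The sketch below is not BertNado's or Captum's code: it uses a three-dimensional logistic model with assumed weights, an all-zero baseline, and a midpoint Riemann sum to approximate the path integral of the gradient from the baseline to the input. Layer Integrated Gradients applies the same computation to the output of a chosen layer (for example the embedding layer) rather than to the raw input.

```python
import math


def model(x, w, b):
    """Toy differentiable model: logistic regression over a small vector."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x, w, b):
    """Analytic gradient of the toy model output with respect to x."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(gradient(point, w, b)):
            total[i] += g
    # Scale the averaged gradient by the input-minus-baseline difference.
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]


w, b = [1.5, -2.0, 0.5], 0.1   # assumed toy weights and bias
x = [1.0, 0.2, -1.0]           # assumed input vector
baseline = [0.0, 0.0, 0.0]     # all-zero reference input
attr = integrated_gradients(x, baseline, w, b)
print("attributions:        ", [round(a, 4) for a in attr])
print("sum of attributions: ", round(sum(attr), 6))
print("F(x) - F(baseline):  ", round(model(x, w, b) - model(baseline, w, b), 6))
```

The last two printed values should agree closely: integrated gradients distribute the change in model output between baseline and input across the input dimensions, which is what makes the per-token scores interpretable as contributions.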

Acknowledgements

Hugging Face Transformers
PoetschLab/GROVER
PEFT/LoRA
SHAP & Captum for interpretability
crested for efficient sequence extraction

Project details

Release history

- 0.1.7 (this version): May 11, 2026
- 0.1.6: Apr 27, 2026
- 0.1.5: Apr 26, 2026
- 0.1.2: Apr 26, 2026
- 0.1.1: Apr 26, 2026
- 0.1.0: Apr 26, 2026

Download files

Source Distribution

bertnado-0.1.7.tar.gz (625.0 kB), uploaded May 11, 2026 (Source)

Built Distribution

bertnado-0.1.7-py3-none-any.whl (44.6 kB), uploaded May 11, 2026 (Python 3)

File details

Details for the file bertnado-0.1.7.tar.gz.

File metadata

Download URL: bertnado-0.1.7.tar.gz
Upload date: May 11, 2026
Size: 625.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for bertnado-0.1.7.tar.gz
Algorithm	Hash digest
SHA256	`cdbe323287b4f99ebed9166cb93dcf9b9eebaa4d7a5add4ab5ddc99cf673183c`
MD5	`d1fe17c34c4a62ec26d194c677da51cd`
BLAKE2b-256	`69d018371f1cdc29d0ac8cf2128c24d1b114803633d6bcedd809e032bf4ba784`
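
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The sketch below is a generic helper, not part of BertNado; the local file path is an assumption about where the archive was saved, and the check is skipped if no file is found there.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Assumed download location; the expected digest is the SHA256 value listed above.
path = "bertnado-0.1.7.tar.gz"
expected = "cdbe323287b4f99ebed9166cb93dcf9b9eebaa4d7a5add4ab5ddc99cf673183c"
if os.path.exists(path):
    print("digest matches" if matches_expected(path, expected) else "DIGEST MISMATCH")
else:
    print(f"{path} not found; download it first to run the check")
```

pip can perform the same check at install time when hashes are pinned in a requirements file and `--require-hashes` is used.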

Provenance

The following attestation bundles were made for bertnado-0.1.7.tar.gz:

Publisher: pypi.yml on CChahrour/BertNado

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bertnado-0.1.7.tar.gz
- Subject digest: cdbe323287b4f99ebed9166cb93dcf9b9eebaa4d7a5add4ab5ddc99cf673183c
- Sigstore transparency entry: 1509111770
- Sigstore integration time: May 11, 2026
Source repository:
- Permalink: CChahrour/BertNado@fe1b1d2fe349d91e8484f85a17aa55d43ffd2b37
- Branch / Tag: refs/tags/v0.1.7
- Owner: https://github.com/CChahrour
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@fe1b1d2fe349d91e8484f85a17aa55d43ffd2b37
- Trigger Event: release

File details

Details for the file bertnado-0.1.7-py3-none-any.whl.

File metadata

Download URL: bertnado-0.1.7-py3-none-any.whl
Upload date: May 11, 2026
Size: 44.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for bertnado-0.1.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f3974a334c40d832045120f6e2ae906191b7c1e9955ca5fc5b97075cae96e65f`
MD5	`eef594357e027e17416309a52e973244`
BLAKE2b-256	`1b4d93e8b0ae9b2e91610ea16985737b02fe17b23658069d358a1597b1476652`

Provenance

The following attestation bundles were made for bertnado-0.1.7-py3-none-any.whl:

Publisher: pypi.yml on CChahrour/BertNado

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bertnado-0.1.7-py3-none-any.whl
- Subject digest: f3974a334c40d832045120f6e2ae906191b7c1e9955ca5fc5b97075cae96e65f
- Sigstore transparency entry: 1509111929
- Sigstore integration time: May 11, 2026
Source repository:
- Permalink: CChahrour/BertNado@fe1b1d2fe349d91e8484f85a17aa55d43ffd2b37
- Branch / Tag: refs/tags/v0.1.7
- Owner: https://github.com/CChahrour
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@fe1b1d2fe349d91e8484f85a17aa55d43ffd2b37
- Trigger Event: release
