BertNado: A framework for training and evaluating transformer-based models for chromatin binding
BertNado
BertNado is a modular framework for fine-tuning Hugging Face DNA language models such as GROVER, NT2, and DNABERT variants on genomic prediction tasks. It supports both full fine-tuning and parameter-efficient fine-tuning (PEFT) strategies such as LoRA.
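To give a sense of why LoRA counts as parameter-efficient, the sketch below compares trainable parameters for a single weight matrix. It assumes the standard LoRA formulation (a frozen d x k weight plus a trainable low-rank product of a d x r and an r x k matrix); the sizes 768 and rank 8 are illustrative values, not BertNado defaults.

```python
def full_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for a rank-r update: one d x r and one r x k matrix."""
    return r * (d + k)


if __name__ == "__main__":
    d, k, r = 768, 768, 8  # illustrative sizes, not BertNado defaults
    full = full_params(d, k)
    lora = lora_params(d, k, r)
    print(f"full fine-tuning: {full} parameters")
    print(f"LoRA (r={r}): {lora} parameters ({lora / full:.2%} of full)")
```

With these illustrative sizes the low-rank update trains 12288 parameters against 589824 for the full matrix, roughly 2% of the total.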
Features
- Model Support: GROVER, NT2 (Nucleotide Transformer), DNABERT, and other Hugging Face-compatible DNA language models
- Task Flexibility: Supports regression, binary classification, and multi-label classification, as well as masked DNA modeling
- Chromosome-aware Splits: Train/val/test split by chromosome to prevent data leakage
- Efficient Fine-tuning: Drop-in support for parameter-efficient tuning methods like LoRA
- Hyperparameter Optimization: Integrated with Weights & Biases for Bayesian sweep-based tuning
- Robust Evaluation: Automatically generates ROC, PR, and confusion matrix plots for binary classification
- Model Interpretation: SHAP and Captum Layer Integrated Gradients (LIG) for biological insight
- Trainer Integration: Built on Hugging Face Trainer with custom heads and metrics
- W&B Logging: Full experiment tracking with Weights & Biases out of the box
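The chromosome-aware split listed above can be illustrated with a minimal sketch. This is not BertNado's implementation: the record schema (a `chrom` key) and the function name `split_by_chromosome` are hypothetical. The idea is that every region from a given chromosome lands in exactly one split, so sequences from the same chromosome never appear in both training and evaluation data.

```python
def split_by_chromosome(records, val_chroms, test_chroms):
    """Assign each record to train/val/test according to its chromosome.

    `records` is an iterable of dicts with a 'chrom' key (hypothetical schema).
    Chromosomes not named in `val_chroms` or `test_chroms` go to train.
    """
    val_chroms, test_chroms = set(val_chroms), set(test_chroms)
    overlap = val_chroms & test_chroms
    if overlap:
        raise ValueError(f"chromosomes in both val and test: {sorted(overlap)}")

    splits = {"train": [], "val": [], "test": []}
    for rec in records:
        if rec["chrom"] in test_chroms:
            splits["test"].append(rec)
        elif rec["chrom"] in val_chroms:
            splits["val"].append(rec)
        else:
            splits["train"].append(rec)
    return splits


if __name__ == "__main__":
    regions = [
        {"chrom": "chr1", "start": 100, "end": 600},
        {"chrom": "chr1", "start": 900, "end": 1400},
        {"chrom": "chr8", "start": 50, "end": 550},
        {"chrom": "chr9", "start": 10, "end": 510},
    ]
    result = split_by_chromosome(regions, val_chroms=["chr8"], test_chroms=["chr9"])
    for name, recs in result.items():
        print(name, [r["chrom"] for r in recs])
```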
Installation
git clone https://github.com/CChahrour/BertNado.git
cd BertNado
pip install -e .
Project Structure
bertnado/
├── cli.py # Command-line interface
├── data/
│ └── prepare_dataset.py # Dataset creation and tokenization
├── evaluation/
│ ├── predict.py # Predict from trained models
│ └── feature_extraction.py # SHAP / LIG-based interpretation
└── training/
  ├── finetune.py # Fine-tuning using best config
  ├── full_train.py # Full training loop
  ├── model.py # PEFT/LoRA model architecture
  ├── sweep.py # W&B sweep setup
  ├── trainers.py # Trainer wrappers
  └── metrics.py # Metric computation
Quickstart
Step 1: Prepare Dataset
bertnado-data \
--file-path test/data/mock_data.parquet \
--target-column bound \
--fasta-file test/data/mock_genome.fasta \
--tokenizer-name PoetschLab/GROVER \
--output-dir output/dataset \
--task-type binary_classification \
--threshold 0.5
Step 2: Run Hyperparameter Sweep
bertnado-sweep \
--config-path test/data/mock_sweep_config.json \
--output-dir output/sweep \
--model-name PoetschLab/GROVER \
--dataset output/dataset \
--sweep-count 2 \
--project-name project \
--metric-name eval/roc_auc \
--metric-goal maximize \
--task-type binary_classification
--config-path points to a Weights & Biases sweep config. The sweep metric is
also used to choose the best checkpoint inside each run.
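For orientation, a Weights & Biases sweep config generally has a `method`, a `metric`, and a `parameters` section. The sketch below builds such a config and serialises it to JSON. The metric name and goal match the command above; the hyperparameter names and ranges are assumptions for illustration, and the parameters BertNado actually reads may differ from these.

```python
import json

# Sketch of a W&B sweep configuration. The hyperparameter names and ranges
# below are illustrative assumptions, not taken from BertNado's mock config.
sweep_config = {
    "method": "bayes",  # Bayesian search, as mentioned in the feature list
    "metric": {"name": "eval/roc_auc", "goal": "maximize"},
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-5,
            "max": 1e-3,
        },
        "weight_decay": {"distribution": "uniform", "min": 0.0, "max": 0.1},
        "per_device_train_batch_size": {"values": [8, 16, 32]},
    },
}

sweep_config_json = json.dumps(sweep_config, indent=2)

if __name__ == "__main__":
    print(sweep_config_json)
```

The printed JSON could be saved to a file and passed via --config-path, provided the parameter names match what the training code expects.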
Step 3: Train Best Model
bertnado-train \
--output-dir output/train \
--model-name PoetschLab/GROVER \
--dataset output/dataset \
--best-config-path output/sweep/best_sweep_config.json \
--task-type binary_classification \
--project-name project \
--metric-name eval/roc_auc \
--metric-goal maximize
The metric flags are optional when best_sweep_config.json was produced by
bertnado-sweep, because the resolved metric is saved in that file.
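One way to picture this fallback is the sketch below. It is an assumption about the behaviour, not BertNado's code: the keys `metric_name` and `metric_goal`, the function name `resolve_metric`, and the rule that explicit flags take precedence over the saved config are all hypothetical.

```python
from typing import Optional, Tuple


def resolve_metric(
    cli_name: Optional[str],
    cli_goal: Optional[str],
    best_config: dict,
) -> Tuple[str, str]:
    """Pick the metric from CLI flags if given, else from the saved config.

    The 'metric_name' / 'metric_goal' keys are hypothetical placeholders.
    """
    name = cli_name or best_config.get("metric_name")
    goal = cli_goal or best_config.get("metric_goal")
    if name is None or goal is None:
        raise ValueError("metric not given on the command line or in the config")
    if goal not in ("maximize", "minimize"):
        raise ValueError(f"unsupported metric goal: {goal!r}")
    return name, goal


if __name__ == "__main__":
    saved = {"metric_name": "eval/roc_auc", "metric_goal": "maximize"}
    print(resolve_metric(None, None, saved))          # falls back to the config
    print(resolve_metric("eval/f1", "maximize", {}))  # flags supplied explicitly
```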
Step 4: Predict on Test Set
bertnado-predict \
--tokenizer-name PoetschLab/GROVER \
--model-dir output/train/model \
--dataset-dir output/dataset \
--output-dir output/predictions \
--task-type binary_classification
Step 5: Interpret Model with SHAP or LIG
bertnado-feature \
--tokenizer-name PoetschLab/GROVER \
--model-dir output/train/model \
--dataset-dir output/dataset \
--output-dir output/feature_analysis \
--task-type binary_classification \
--method shap \
--target-class 1
To run both SHAP and LIG, set the method flag to both:
--method both --target-class 1
Outputs
- Figures saved to output/figures/
  - Binary classification: ROC and precision-recall curves
  - Binary classification: Confusion matrix
- SHAP scores saved to output/shap/
- Trained models saved to output/models/
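As background for the ROC figure and the eval/roc_auc metric used in the Quickstart, the area under the ROC curve equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition in plain Python. It is a reference illustration with made-up labels and scores, not the routine BertNado uses.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up labels and scores for illustration only.
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y_true, y_score))  # 3 of 4 pairs ranked correctly -> 0.75
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; library implementations typically use a sort-based approach instead.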
Interpretation Tools
- SHAP: Global and local token importance
- Captum LIG: Gradient-based token attribution at the embedding level
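To make the gradient-based attribution idea concrete, here is a small sketch of plain integrated gradients on a toy logistic model. It does not use Captum and is not the layer-level variant applied to embeddings; the model, weights, and baseline are made up. Each feature's attribution is its difference from the baseline multiplied by the average gradient along the straight path from baseline to input.

```python
import math


def model(x, weights, bias):
    """Toy logistic model: sigmoid of a weighted sum."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x, weights, bias):
    """Analytic gradient of the toy model with respect to each input."""
    p = model(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    grad_sum = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grads = gradient(point, weights, bias)
        grad_sum = [s + g for s, g in zip(grad_sum, grads)]
    # Scale the average gradient by the input-baseline difference.
    return [(xi - b) * s / steps for xi, b, s in zip(x, baseline, grad_sum)]


if __name__ == "__main__":
    weights, bias = [1.0, -2.0, 0.0], 0.0  # made-up toy parameters
    x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
    attributions = integrated_gradients(x, baseline, weights, bias)
    print("attributions:", attributions)
    # Completeness: attributions sum to the change in model output.
    print("sum:", sum(attributions))
    print("F(x) - F(baseline):", model(x, weights, bias) - model(baseline, weights, bias))
```

A useful sanity check is the completeness property shown at the end: the attributions should sum, up to numerical error, to the difference between the model output at the input and at the baseline.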
Acknowledgements
- Hugging Face Transformers
- PoetschLab/GROVER
- PEFT/LoRA
- SHAP & Captum for interpretability
- crested for efficient sequence extraction