ModelGenerator is an opinionated plug-and-play research framework for cross-disciplinary teams in ML & Bio

Maintainers

calebellington

Project description

ModelGenerator

ModelGenerator is an opinionated plug-and-play research framework for cross-disciplinary teams in ML & Bio.

ModelGenerator is designed to enable rapid and reproducible prototyping with four kinds of experiments in mind:

Applying pre-trained foundation models to new data
Developing new finetuning and inference tasks for foundation models
Benchmarking foundation models and creating leaderboards
Testing new architectures for finetuning performance

while also scaling with hardware and integrating with larger data pipelines or research workflows.

ModelGenerator is built on PyTorch, HuggingFace, and Lightning, and works seamlessly with these ecosystems.

Who uses ModelGenerator?

🧬 Biologists

Intuitive one-command CLIs for in silico experiments
Pre-trained model zoo
Broad data compatibility
Pipeline-oriented workflows

🤖 ML Researchers

Reproducible-by-design experiments
Architecture A/B testing
Automatic hardware scaling
Integration with PyTorch, Lightning, HuggingFace, and WandB

☕ Software Engineers

Extensible and modular models, tasks, and data
Strict typing and documentation
Fail-fast interface design
Continuous integration and testing

🤝 Everyone benefits from

A collaborative hub and focal point for multidisciplinary work on experiments, models, software, and data
Community-driven development
Permissive license for academic and non-commercial use

Quick Start

Installation

pip install modelgenerator

# To use AIDO.StructureTokenizer, also install openfold and dllogger
pip install git+https://github.com/genbio-ai/openfold.git@c4aa2fd0d920c06d3fd80b177284a22573528442
pip install git+https://github.com/NVIDIA/dllogger.git@0540a43971f4a8a16693a9de9de73c1072020769

Good for running inference, reproducing published experiments, or finetuning on new data
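To confirm which release ended up in your environment after installing, a small check like the one below can help. It is a minimal sketch using only the Python standard library; the helper name `installed_version` is made up for this example and is not part of ModelGenerator.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("modelgenerator")
    print(found if found else "modelgenerator is not installed in this environment")
```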

Developer Installation

git clone https://github.com/genbio-ai/ModelGenerator
cd ModelGenerator
pip install -e .

Necessary to add new backbones, finetuning tasks, or data transformations

Quick Start

Get embeddings from a pre-trained model

mgen predict --model Embed --model.backbone aido_dna_dummy \
  --data SequencesDataModule --data.path genbio-ai/100m-random-promoters \
  --data.x_col sequence --data.id_col sequence --data.test_split_size 0.0001 \
  --config configs/examples/save_predictions.yaml

Get token probabilities from a pre-trained model

mgen predict --model Inference --model.backbone aido_dna_dummy \
  --data SequencesDataModule --data.path genbio-ai/100m-random-promoters \
  --data.x_col sequence --data.id_col sequence --data.test_split_size 0.0001 \
  --config configs/examples/save_predictions.yaml

Finetune a pre-trained model

mgen fit --model ConditionalDiffusion --model.backbone aido_dna_dummy \
  --data ConditionalDiffusionDataModule --data.path "genbio-ai/100m-random-promoters"

Evaluate a model checkpoint

mgen test --model ConditionalDiffusion --model.backbone aido_dna_dummy \
  --data ConditionalDiffusionDataModule --data.path "genbio-ai/100m-random-promoters" \
  --ckpt_path logs/lightning_logs/version_X/checkpoints/<your_model>.ckpt
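The `version_X` and `<your_model>` placeholders have to be filled in by hand. If you want to locate the most recent checkpoint programmatically, something along these lines may work. This is a sketch that assumes the `logs/lightning_logs/version_N/checkpoints/*.ckpt` layout shown in the command above; `latest_checkpoint` is a hypothetical helper, not an mgen feature.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(root: str = "logs/lightning_logs") -> Optional[Path]:
    """Return the newest .ckpt file in the highest-numbered version_N directory."""
    versions = []
    for run_dir in Path(root).glob("version_*"):
        match = re.fullmatch(r"version_(\d+)", run_dir.name)
        if match and run_dir.is_dir():
            versions.append((int(match.group(1)), run_dir))
    # Walk runs from newest to oldest and stop at the first one with checkpoints.
    for _, run_dir in sorted(versions, key=lambda item: item[0], reverse=True):
        ckpts = sorted(
            (run_dir / "checkpoints").glob("*.ckpt"),
            key=lambda p: p.stat().st_mtime,
        )
        if ckpts:
            return ckpts[-1]
    return None


if __name__ == "__main__":
    print(latest_checkpoint() or "no checkpoints found")
```

The printed path can then be passed to `--ckpt_path`.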

Save predictions

mgen predict --model ConditionalDiffusion --model.backbone aido_dna_dummy \
  --data ConditionalDiffusionDataModule --data.path "genbio-ai/100m-random-promoters" \
  --ckpt_path logs/lightning_logs/version_X/checkpoints/<your_model>.ckpt \
  --config configs/examples/save_predictions.yaml

Configify your experiment

The command

mgen fit --model ConditionalDiffusion --model.backbone aido_dna_dummy \
  --data ConditionalDiffusionDataModule --data.path "genbio-ai/100m-random-promoters"

is equivalent to mgen fit --config my_config.yaml with

# my_config.yaml
model:
  class_path: ConditionalDiffusion
  init_args:
    backbone: aido_dna_dummy
data:
  class_path: ConditionalDiffusionDataModule
  init_args:
    path: "genbio-ai/100m-random-promoters"
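To make the mapping between flags and YAML keys concrete, here is a small illustration in Python. It is not mgen's actual argument parser; `flags_to_config` is a hypothetical function that only handles the two flag shapes used above (`--section Class` and `--section.arg value`), and it does not handle deeper nesting such as `--model.backbone.lora_r`.

```python
from typing import Any, Dict, List


def flags_to_config(argv: List[str]) -> Dict[str, Any]:
    """Turn ['--model', 'X', '--model.backbone', 'y'] into a nested config dict."""
    config: Dict[str, Any] = {}
    # Flags and values alternate, so walk the list two items at a time.
    for flag, value in zip(argv[0::2], argv[1::2]):
        section, _, arg = flag.lstrip("-").partition(".")
        entry = config.setdefault(section, {})
        if arg:
            # --model.backbone y  ->  model: {init_args: {backbone: y}}
            entry.setdefault("init_args", {})[arg] = value
        else:
            # --model X  ->  model: {class_path: X}
            entry["class_path"] = value
    return config


if __name__ == "__main__":
    argv = [
        "--model", "ConditionalDiffusion",
        "--model.backbone", "aido_dna_dummy",
        "--data", "ConditionalDiffusionDataModule",
        "--data.path", "genbio-ai/100m-random-promoters",
    ]
    print(flags_to_config(argv))
```

The resulting dictionary has the same shape as `my_config.yaml`: a `class_path` and an `init_args` mapping under each of `model` and `data`.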

Use composable configs to customize workflows

mgen fit --model SequenceRegression --data PromoterExpressionRegression \
  --config configs/defaults.yaml \
  --config configs/examples/lora_backbone.yaml \
  --config configs/examples/wandb.yaml

When several configs set the same attribute, the LAST value wins. The full resolved configuration is saved to logs/lightning_logs/your-experiment/config.yaml, or to logs/config.yaml if using wandb.
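The last-value-wins rule can be pictured as a merge over nested dictionaries applied in command-line order. The sketch below is only an analogy: `merge_configs` is a hypothetical helper, the nested merge is an assumption about how overrides combine, and the keys and values in the example are made up rather than taken from the repository's config files.

```python
import copy
from typing import Any, Dict


def merge_configs(*configs: Dict[str, Any]) -> Dict[str, Any]:
    """Merge configs left to right; later values override earlier ones."""
    merged: Dict[str, Any] = {}
    for config in configs:
        for key, value in config.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are mappings: merge them key by key.
                merged[key] = merge_configs(merged[key], value)
            else:
                # Scalars, lists, or new keys: the later config wins outright.
                merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    # Illustrative values only, not the contents of configs/defaults.yaml.
    defaults = {"trainer": {"max_epochs": 10, "precision": 32}}
    override = {"trainer": {"max_epochs": 3}}
    print(merge_configs(defaults, override))
```

Here `max_epochs` takes the later value while `precision` is carried over unchanged. Whatever the exact merge semantics are, the saved config.yaml is the authoritative record of what ran.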

Use LoRA for parameter-efficient finetuning

This also avoids saving the full model: only the LoRA weights are saved.

mgen fit --data PromoterExpressionRegression \
  --model SequenceRegression --model.backbone.use_peft true \
  --model.backbone.lora_r 16 \
  --model.backbone.lora_alpha 32 \
  --model.backbone.lora_dropout 0.1
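For a rough sense of what these settings imply, the arithmetic below assumes the flags correspond to the standard LoRA hyperparameters: rank `r` and scaling numerator `alpha`. In standard LoRA, a frozen weight of shape d_out x d_in gains two trainable factors with r * (d_in + d_out) parameters in total, and the low-rank update is scaled by alpha / r. The 768 x 768 layer size is an assumption chosen for illustration and is not the size of any particular AIDO backbone.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the two low-rank factors A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


def lora_scaling(lora_alpha: float, r: int) -> float:
    """Standard LoRA multiplies the low-rank update by alpha / r."""
    return lora_alpha / r


if __name__ == "__main__":
    d_in = d_out = 768  # assumed layer size, for illustration only
    r, alpha = 16, 32   # values from the command above
    full = d_in * d_out
    lora = lora_trainable_params(d_in, d_out, r)
    print(f"frozen weights per layer: {full}")
    print(f"trainable LoRA weights per layer: {lora} ({lora / full:.1%})")
    print(f"update scaling: {lora_scaling(alpha, r)}")
```

Under these assumptions each adapted layer trains 24,576 parameters instead of 589,824, which is why saving only the LoRA weights gives much smaller checkpoints.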

Use continued pretraining for finetuning domain adaptation

First, run the pretraining objective on the finetuning data

# https://arxiv.org/pdf/2310.02980
mgen fit --model MLM --model.backbone aido_dna_dummy \
  --data MLMDataModule --data.path leannmlindsey/GUE \
  --data.config_name prom_core_notata

Then finetune using the adapted model

mgen fit --model SequenceClassification --model.strict_loading false \
  --data SequenceClassificationDataModule --data.path leannmlindsey/GUE \
  --data.config_name prom_core_notata \
  --ckpt_path logs/lightning_logs/version_X/checkpoints/<your_adapted_model>.ckpt

Make sure to turn off strict_loading to replace the adapter!
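The reason non-strict loading matters can be shown with plain dictionaries standing in for model state. This is an analogy to PyTorch's `load_state_dict(strict=False)`, which reports missing and unexpected keys instead of raising an error. `load_non_strict` and all key names below are hypothetical and do not reflect mgen's real parameter names.

```python
from typing import Dict, List, Tuple


def load_non_strict(
    model_state: Dict[str, object],
    checkpoint: Dict[str, object],
) -> Tuple[List[str], List[str]]:
    """Copy matching keys from checkpoint into model_state in place.

    Returns (missing_keys, unexpected_keys): keys the model has but the
    checkpoint lacks, and keys the checkpoint has but the model lacks.
    """
    missing = [key for key in model_state if key not in checkpoint]
    unexpected = [key for key in checkpoint if key not in model_state]
    for key in model_state:
        if key in checkpoint:
            model_state[key] = checkpoint[key]
    return missing, unexpected


if __name__ == "__main__":
    # Checkpoint from the pretraining step: adapted backbone plus an MLM head.
    checkpoint = {"backbone.weight": "adapted", "adapter.mlm_head": "mlm"}
    # New classification model: same backbone, freshly initialized classifier.
    model_state = {"backbone.weight": "original", "adapter.classifier": "fresh"}
    missing, unexpected = load_non_strict(model_state, checkpoint)
    print(model_state, missing, unexpected)
```

The backbone picks up the adapted weights, the new classifier keeps its fresh initialization, and the old MLM head is ignored. With strict loading, the mismatched head keys would cause the load to fail.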

Use the head/adapter that comes with the backbone

mgen fit --model SequenceClassification --data GUEClassification \
  --model.use_legacy_adapter true

Release history

Version	Release date
0.1.3.post0	Dec 12, 2025
0.1.3	Nov 8, 2025
0.1.2	May 13, 2025
0.1.1.post5	Dec 21, 2024
0.1.1.post4	Dec 11, 2024
0.1.1.post3	Dec 11, 2024
0.1.1.post2	Dec 10, 2024
0.1.1.post1	Dec 10, 2024
0.1.1	Dec 10, 2024
0.1.0 (this version)	Dec 10, 2024

Download files

Source Distribution

modelgenerator-0.1.0.tar.gz (355.7 kB)

Uploaded Dec 10, 2024 Source

Built Distribution

modelgenerator-0.1.0-py3-none-any.whl (335.2 kB)

Uploaded Dec 10, 2024 Python 3

File details

Details for the file modelgenerator-0.1.0.tar.gz.

File metadata

Download URL: modelgenerator-0.1.0.tar.gz
Upload date: Dec 10, 2024
Size: 355.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for modelgenerator-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`54c66cc03b04ccf50c21bba829661a20a6f96cd4c1eb750f81c270ebf9450e83`
MD5	`d3a14a7ecbac7d490934bce2ecf761f8`
BLAKE2b-256	`fec1244b7f090fd5e0bdb1665c2ecdf5ef9216e9cff377ea1b7f93a57ded1571`
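A downloaded file can be checked against these digests with Python's standard `hashlib`. This is a minimal sketch: the helper names `file_digest` and `matches` are made up for this example, and the `"blake2b-256"` label is this sketch's own shorthand for BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path


def file_digest(path: str, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    if algorithm == "blake2b-256":
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return file_digest(path, algorithm) == expected_hex.lower()


if __name__ == "__main__":
    # Assumes the sdist was downloaded into the current directory.
    archive = "modelgenerator-0.1.0.tar.gz"
    expected = "54c66cc03b04ccf50c21bba829661a20a6f96cd4c1eb750f81c270ebf9450e83"
    if Path(archive).exists():
        print("SHA256 OK" if matches(archive, expected) else "SHA256 MISMATCH")
    else:
        print(f"{archive} not found")
```

Alternatively, pip can enforce the same check at install time if the digest is pinned in a requirements file and `--require-hashes` is used.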

Provenance

The following attestation bundles were made for modelgenerator-0.1.0.tar.gz:

Publisher: publish.yml on genbio-ai/ModelGenerator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: modelgenerator-0.1.0.tar.gz
- Subject digest: 54c66cc03b04ccf50c21bba829661a20a6f96cd4c1eb750f81c270ebf9450e83
- Sigstore transparency entry: 154370425
- Sigstore integration time: Dec 10, 2024
Source repository:
- Permalink: genbio-ai/ModelGenerator@01bee70fcacb1026a5dd4f026a50c208244401dd
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/genbio-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@01bee70fcacb1026a5dd4f026a50c208244401dd
- Trigger Event: release

File details

Details for the file modelgenerator-0.1.0-py3-none-any.whl.

File metadata

Download URL: modelgenerator-0.1.0-py3-none-any.whl
Upload date: Dec 10, 2024
Size: 335.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for modelgenerator-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fe7db73d9ee48bbdc8bc9db9905262a5e2cf4a5129afb71c650b5551376b2388`
MD5	`b875d886a5907d326671d0c32e64c8d3`
BLAKE2b-256	`308d331fc89440514efdb25f1b4a912b4ac41f867594ab0d6e22f8e47d7c43af`

Provenance

The following attestation bundles were made for modelgenerator-0.1.0-py3-none-any.whl:

Publisher: publish.yml on genbio-ai/ModelGenerator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: modelgenerator-0.1.0-py3-none-any.whl
- Subject digest: fe7db73d9ee48bbdc8bc9db9905262a5e2cf4a5129afb71c650b5551376b2388
- Sigstore transparency entry: 154370426
- Sigstore integration time: Dec 10, 2024
Source repository:
- Permalink: genbio-ai/ModelGenerator@01bee70fcacb1026a5dd4f026a50c208244401dd
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/genbio-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@01bee70fcacb1026a5dd4f026a50c208244401dd
- Trigger Event: release
