A unified Python library for structural bioinformatics, MD, protein engineering, and ML.

molforge

A forge for protein workflows. One Python script, every tool in your stack: docking, MD, folding, antibody and nanobody engineering, de novo design — without the format-conversion tax.

molforge is an open-source Python library that lets you compose protein workflows across the tools you already use. Bring your structures and sequences in, plug in your engines of choice (Vina, OpenMM, ESMFold, AlphaFold, RFdiffusion, ProteinMPNN, your own model), and walk out with a coherent pipeline instead of five incompatible Python environments and a graveyard of conversion scripts.

Documentation: doctordean.github.io/molforge

Why molforge exists

Modern protein work is multi-tool by nature. A real antibody-design loop might fold a sequence with ESMFold, identify CDR loops with ANARCI, score binding with FoldX, dock against a target with AutoDock Vina, relax with OpenMM, then evaluate with Rosetta. Each of those tools speaks its own dialect: different file formats, different atom-naming conventions, different ideas of what "the structure" is. Most of an engineer's day disappears into glue code.

molforge is the connective tissue. It provides:

A canonical, NumPy-backed data model that's cheap to convert in and out of — so every engine in your pipeline reads from and writes to the same representation.
Thin wrappers around the engines you already trust, with consistent interfaces (so swapping ESMFold for AlphaFold is one line, not a refactor).
First-class IO for the messy reality of structural-bio files: PDB, mmCIF, FASTA, and AlphaFold predictions with pLDDT today, with PDBQT, PQR, SDF, and MOL2 currently stubbed behind committed APIs (see Status).
A plugin registry so the next docking engine, folding model, or scoring function can slot into your pipeline without forking molforge.

Built as a library, not a framework: there's no orchestrator, no DAG runtime, no decorators you have to import to make things work. Use whatever workflow tool you like — Snakemake, Nextflow, Prefect, a shell script — molforge is just imports.
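To make the plugin-registry idea from the list above concrete, here is a minimal sketch of a decorator-based registry. It is not molforge's actual API: the names `register`, `create`, `available`, and the toy `LengthPenalty` scorer are all hypothetical, and entry-point discovery is left out entirely.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry mapping (kind, name) to a factory.
# This illustrates the pattern only; it is not molforge's real registry.
_REGISTRY: Dict[Tuple[str, str], Callable[..., object]] = {}


def register(kind: str, name: str):
    """Decorator that records a class or factory under (kind, name)."""
    def decorator(factory):
        key = (kind, name)
        if key in _REGISTRY:
            raise ValueError(f"{kind} plugin {name!r} is already registered")
        _REGISTRY[key] = factory
        return factory
    return decorator


def create(kind: str, name: str, **kwargs):
    """Instantiate a registered plugin by kind and name."""
    try:
        factory = _REGISTRY[(kind, name)]
    except KeyError:
        raise KeyError(f"no {kind} plugin named {name!r}") from None
    return factory(**kwargs)


def available(kind: str) -> List[str]:
    """Sorted names of all plugins registered under one kind."""
    return sorted(n for k, n in _REGISTRY if k == kind)


@register("scorer", "length_penalty")
class LengthPenalty:
    """Toy scorer: penalises sequences longer than a limit."""

    def __init__(self, limit: int = 100):
        self.limit = limit

    def score(self, sequence: str) -> float:
        return float(max(0, len(sequence) - self.limit))


scorer = create("scorer", "length_penalty", limit=5)
print(scorer.score("MKTVRQERL"))
```

The point of the pattern is that a third-party package can add a scorer, parser, or engine by importing `register` and decorating its own class, without touching the host library's source.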

Design principles

Workflows over silos. Every design decision is judged by "does this make it easier to chain N tools together?"
Wrappers, not reimplementations. We don't rebuild OpenMM or AutoDock. We give them a shared vocabulary.
One data model, two views. Hierarchical (protein.chains["A"].residues[42]) for biology, linear (protein.atom_array.coords) for ML — same data, no conversion.
Heterogeneous content is first-class. Antibodies have glycans. Drug targets have ligands and ions. Membrane proteins have lipids. The data model handles all of it without an awkward special case for "non-protein."
Typed, tested, documented. Strict mypy, ruff-clean, >90% coverage target.
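The "one data model, two views" principle can be illustrated with a small sketch in which hierarchical objects hold row indices into a single linear store instead of copying data. This is an assumption-laden toy, not molforge's implementation: `AtomStore`, `ResidueView`, and `residues` are hypothetical names, and plain Python lists stand in for the NumPy array to keep the example dependency-free.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


@dataclass
class AtomStore:
    """Linear store: one row per atom, kept in parallel columns."""
    chain_id: List[str] = field(default_factory=list)
    res_num: List[int] = field(default_factory=list)
    name: List[str] = field(default_factory=list)
    coords: List[Coord] = field(default_factory=list)

    def add(self, chain_id: str, res_num: int, name: str, xyz: Coord) -> None:
        self.chain_id.append(chain_id)
        self.res_num.append(res_num)
        self.name.append(name)
        self.coords.append(xyz)


class ResidueView:
    """Hierarchical accessor: holds row indices, not copies of the data."""

    def __init__(self, store: AtomStore, rows: List[int]):
        self._store = store
        self._rows = rows

    def atom_coords(self, atom_name: str) -> Coord:
        for i in self._rows:
            if self._store.name[i] == atom_name:
                return self._store.coords[i]
        raise KeyError(atom_name)


def residues(store: AtomStore, chain_id: str) -> Dict[int, ResidueView]:
    """Group the row indices of one chain by residue number."""
    rows: Dict[int, List[int]] = {}
    for i, (c, r) in enumerate(zip(store.chain_id, store.res_num)):
        if c == chain_id:
            rows.setdefault(r, []).append(i)
    return {r: ResidueView(store, idx) for r, idx in rows.items()}


store = AtomStore()
store.add("A", 42, "N", (0.0, 0.0, 0.0))
store.add("A", 42, "CA", (1.5, 0.0, 0.0))
store.add("A", 43, "N", (2.0, 1.3, 0.0))

view = residues(store, "A")[42]
store.coords[1] = (9.0, 9.0, 9.0)   # edit through the linear view
print(view.atom_coords("CA"))       # the hierarchical view sees the change
```

Because the view only stores indices, an edit made through the linear side is visible from the hierarchical side with no synchronisation step, which is the property the principle describes.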

Installation

# minimal core (data model + sequence + basic IO)
pip install molforge

# with structure analysis (RMSD, SASA, contacts)
pip install "molforge[structure]"

# with ML wrappers (torch, transformers, esm)
pip install "molforge[ml]"

# with MD support (openmm, mdtraj)
pip install "molforge[md]"

# with docking (rdkit for small molecules)
pip install "molforge[docking]"

# everything
pip install "molforge[all]"

# development
git clone https://github.com/DoctorDean/molforge.git
cd molforge
pip install -e ".[dev,all]"

Quickstart

The smallest end-to-end example that shows the cross-tool point:

import molforge as mf
from molforge.wrappers.folding import ESMFold
from molforge.wrappers.docking import Vina
from molforge.wrappers.md import OpenMM

# 1. Fold a sequence
folded = ESMFold().predict("MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVS...")

# 2. Save as PDB, save as mmCIF, hand to anything
mf.save(folded, "candidate.pdb")
mf.save(folded, "candidate.cif")

# 3. Dock a ligand against a receptor (both as Vina-prepared PDBQT files)
result = Vina().dock(
    receptor="receptor.pdbqt",
    ligand="ligand.pdbqt",
    center=(10.0, 5.0, -2.0),
    box_size=(20.0, 20.0, 20.0),
)
top_pose = result.best

# 4. Drop into MD for relaxation
trajectory = OpenMM().simulate(top_pose.complex, steps=10_000)

# 5. Inspect — hierarchical or linear, your call
print(folded.sequence)                            # one-letter per chain
print(folded.atom_array.coords.shape)             # (N, 3) NumPy array
ca = folded.chains["A"].residues[42].atoms["CA"]  # specific atom

Notice what isn't there: file-format conversions, atom-name remapping, hand-rolled PDB parsers, custom data classes per engine. molforge does that work so your script reads like the science you're actually doing.
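One way to picture why swapping engines can be a one-line change is structural typing: downstream code depends on a shared method signature, not on a concrete engine class. The sketch below assumes only that engines expose a `predict(sequence)` method, as in the Quickstart. The `Folder` protocol, the two stand-in engines, and the dict return type are hypothetical and do not reflect molforge's real wrapper classes.

```python
from typing import Protocol


class Folder(Protocol):
    """Anything with predict(sequence) counts as a folding engine here."""

    def predict(self, sequence: str) -> dict: ...


class HelixGuesser:
    """Stand-in engine: labels every residue as helix."""

    def predict(self, sequence: str) -> dict:
        return {"engine": "helix", "ss": "H" * len(sequence)}


class CoilGuesser:
    """Stand-in engine: labels every residue as coil."""

    def predict(self, sequence: str) -> dict:
        return {"engine": "coil", "ss": "C" * len(sequence)}


def run_pipeline(folder: Folder, sequence: str) -> str:
    """Downstream code touches only the shared interface."""
    result = folder.predict(sequence)
    return f"{result['engine']}:{result['ss']}"


# Swapping engines changes one argument, not the pipeline.
print(run_pipeline(HelixGuesser(), "MKT"))
print(run_pipeline(CoilGuesser(), "MKT"))
```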

Worked examples and walkthroughs (notebooks/):

de_novo_design.ipynb — de novo design loop: RFdiffusion → ProteinMPNN → ESMFold → scoring.

cross_engine_validation.ipynb — two-validator consensus pattern in detail (ESMFold + AlphaFold).

end_to_end_design.ipynb — full mutation loop: fold → analyze → mutate → re-fold → compare.

01_sequences.ipynb — alignment, mutations, composition.

02_structures.ipynb — RMSD, contacts, DSSP, SASA, dihedrals.

03_md_simulations.ipynb — OpenMM prepare → minimize → run flow, trajectory analysis.

04_docking.ipynb — Vina with automatic ligand prep.

05_ml_featurization.ipynb — one-hot, RBF distances, ESM-2 embeddings, graph construction.

06_plugin_authoring.ipynb — register custom engines, parsers, and scorers.

Repository structure

molforge/
├── src/molforge/             # Library source (src-layout)
│   ├── core/                 # Hierarchical + linear data model
│   ├── sequence/             # Sequence operations, alignment, mutations
│   ├── structure/            # RMSD, SASA, contacts, geometry
│   ├── md/                   # MD trajectories and analysis
│   ├── docking/              # Docking abstractions and pose handling
│   ├── ml/                   # ML utilities, featurizers, tensor views
│   ├── io/                   # PDB, mmCIF, FASTA, PDBQT, PQR, SDF, MOL2
│   ├── plugins/              # Plugin registry and entry-point discovery
│   ├── metrics/              # TM-score, lDDT, GDT-TS, docking metrics
│   └── wrappers/             # Thin interfaces to external engines
│       ├── folding/          # AlphaFold, ESMFold, Boltz, Rosetta
│       ├── docking/          # AutoDock Vina, DiffDock
│       └── md/               # OpenMM, GROMACS
├── tests/                    # pytest suite (909 passing + skips)
│   ├── fixtures/pdb/         # synthetic mini_*.pdb + realistic real_*.pdb fixtures
│   ├── unit/                 # per-subpackage unit tests
│   ├── integration/          # end-to-end tests against the realistic fixtures
│   └── benchmarks/           # performance benchmarks (pytest -m benchmark)
├── docs/                     # Architecture docs and reference
├── notebooks/                # Walkthroughs and worked examples
├── plugins/                  # Example external plugins
├── pyproject.toml            # Build config, deps, tool config
└── ACKNOWLEDGEMENTS.md       # Prior art and intellectual debts

A deeper architecture walkthrough is in docs/architecture/overview.md.

Status

molforge is pre-1.0 and under active development. What's working today:

Core data model — Protein / Chain / Residue / Atom over a canonical NumPy-backed AtomArray, with first-class heterogeneous content (ligands, water, ions, modified residues).
File I/O — full read/write for PDB (with NMR ensembles, altlocs, insertion codes) and mmCIF (the modern format for large structures); FASTA sequence I/O; AlphaFold loader that surfaces pLDDT as a first-class field. PDBQT, PQR, SDF, MOL2 are stubbed with committed APIs.
Sequence operations — pairwise alignment (Needleman-Wunsch / Smith-Waterman with BLOSUM62 / PAM250), point mutations with protein-engineering notation (A123V, A123V/T56K, H:K42N), composition and property helpers (MW, GRAVY, aromaticity).
Structural analysis — Kabsch/Umeyama superposition, RMSD (whole-structure and per-residue, multiple atom subsets), contact and distance maps, radius of gyration, centroid / center of mass, in-place translate / rotate, DSSP secondary-structure assignment (8-state and 3-state, no external binary), SASA (Shrake-Rupley, no FreeSASA dependency), and backbone dihedrals (φ, ψ, ω, Ramachandran).
Validation utilities — molforge.validation orchestrates the "score designs across multiple validators and combine results" pattern. Declarative Criterion (composable with & / | / ~), CriteriaSet for per-criterion diagnostics, cross_validate to run designs through one or more validators, consensus to merge verdict lists ("ESMFold AND AlphaFold both pass" / "majority of validators pass" / threshold rules).
Evaluation metrics — molforge.metrics ships the standard structural-prediction metrics: TM-score (Zhang & Skolnick), GDT-TS / GDT-HA (CASP), lDDT (superposition-free, the quantity AlphaFold's pLDDT estimates), and DockQ (Basu & Wallner, for protein-protein complexes). NumPy-only: no tmalign/lddt binaries required.
ML featurization — sequence featurizers (one-hot, BLOSUM/PAM, positional encoding), structure featurizers (RBF-binned distances, pair orientations, local environment), ESM-2 protein language model embeddings, and graph construction (to_graph → PyTorch Geometric / DGL).
Four engine-wrapper categories live end-to-end — folding (ESMFold + AlphaFold/ColabFold), docking (AutoDock Vina with automatic meeko/RDKit prep), MD (OpenMM with full prepare → minimize → run flow), and now generative design (RFdiffusion for backbone generation, ProteinMPNN for sequence design). The full de novo design loop is in one library.
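The composable-criterion pattern described under validation utilities above can be sketched with operator overloading. The names `Criterion` and `consensus` come from the text, but everything else here is an assumption for illustration: the constructor signature, the `at_least` / `at_most` helpers, and the rule strings are hypothetical and may differ from what molforge.validation actually exposes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping

Metrics = Mapping[str, float]


@dataclass(frozen=True)
class Criterion:
    """A named predicate over a metrics dict, composable with &, | and ~."""
    name: str
    check: Callable[[Metrics], bool]

    def __call__(self, metrics: Metrics) -> bool:
        return self.check(metrics)

    def __and__(self, other: "Criterion") -> "Criterion":
        return Criterion(f"({self.name} & {other.name})",
                         lambda m: self(m) and other(m))

    def __or__(self, other: "Criterion") -> "Criterion":
        return Criterion(f"({self.name} | {other.name})",
                         lambda m: self(m) or other(m))

    def __invert__(self) -> "Criterion":
        return Criterion(f"~{self.name}", lambda m: not self(m))


def at_least(key: str, threshold: float) -> Criterion:
    """Hypothetical helper: passes when metrics[key] >= threshold."""
    return Criterion(f"{key}>={threshold}", lambda m: m[key] >= threshold)


def at_most(key: str, threshold: float) -> Criterion:
    """Hypothetical helper: passes when metrics[key] <= threshold."""
    return Criterion(f"{key}<={threshold}", lambda m: m[key] <= threshold)


def consensus(verdicts: Iterable[bool], rule: str = "all") -> bool:
    """Merge per-validator verdicts under an 'all', 'any' or 'majority' rule."""
    votes = list(verdicts)
    if rule == "all":
        return all(votes)
    if rule == "any":
        return any(votes)
    if rule == "majority":
        return sum(votes) * 2 > len(votes)
    raise ValueError(f"unknown rule {rule!r}")


confident = at_least("plddt", 80.0)
compact = at_most("rmsd", 2.0)
passes = confident & compact

design = {"plddt": 85.0, "rmsd": 3.1}
print(passes.name, passes(design))
```

Keeping a readable `name` on every composed criterion is what makes per-criterion diagnostics possible: a failed design can report which sub-expression rejected it.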

Coming next: DiffDock wrapper, GROMACS MD wrapper, explicit-solvent prep helpers, ML featurization for downstream models. See CHANGELOG.md for the full picture.
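The protein-engineering mutation notation listed under sequence operations in this section (A123V, A123V/T56K, H:K42N) is easy to illustrate with a small parser. This is a sketch under stated assumptions, not molforge's parser: `Mutation`, `parse_mutations`, and `apply_mutations` are hypothetical names, positions are assumed to be 1-based, and a chain prefix is assumed to apply only to the single mutation it precedes.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Optional "CHAIN:" prefix, wild-type letter, position, mutant letter.
_MUTATION = re.compile(
    r"(?:(?P<chain>\w+):)?(?P<wt>[A-Z])(?P<pos>\d+)(?P<mut>[A-Z])"
)


class Mutation(NamedTuple):
    chain: Optional[str]
    wild_type: str
    position: int
    mutant: str


def parse_mutations(spec: str) -> List[Mutation]:
    """Parse strings such as 'A123V', 'A123V/T56K' or 'H:K42N'."""
    parsed = []
    for part in spec.split("/"):
        match = _MUTATION.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"cannot parse mutation {part!r}")
        parsed.append(Mutation(match["chain"], match["wt"],
                               int(match["pos"]), match["mut"]))
    return parsed


def apply_mutations(sequence: str, mutations: Iterable[Mutation]) -> str:
    """Apply point mutations to one chain's sequence (1-based positions)."""
    residues = list(sequence)
    for mut in mutations:
        index = mut.position - 1
        if not 0 <= index < len(residues):
            raise IndexError(f"position {mut.position} is outside the sequence")
        if residues[index] != mut.wild_type:
            raise ValueError(
                f"expected {mut.wild_type} at {mut.position}, "
                f"found {residues[index]}"
            )
        residues[index] = mut.mutant
    return "".join(residues)


print(parse_mutations("H:K42N"))
print(apply_mutations("MKTV", parse_mutations("K2R/V4A")))
```

Checking the wild-type residue before substituting catches off-by-one numbering errors early, which is a common failure when sequence numbering and structure numbering disagree.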

Acknowledgements

molforge is inspired by Protkit (SilicoGenesis), which pioneered the idea of a unified, hierarchical representation for protein structures in Python. molforge extends that direction toward cross-tool, cross-format workflows and a different internal architecture (NumPy-backed linear store, hierarchical views as accessors). See ACKNOWLEDGEMENTS.md for the longer list of projects we've learned from.

Contributing

We welcome contributions. See CONTRIBUTING.md and the Code of Conduct before opening an issue or PR.

License

MIT — see LICENSE.

Citation

If you use molforge in academic work, please cite us (BibTeX coming with the first tagged release).

Project details

Release history

0.3.0 (Jun 22, 2026, this version)
0.2.0 (May 26, 2026)
0.1.0 (May 20, 2026)
0.0.3 (May 20, 2026)
0.0.1 (May 14, 2026)

Download files

Source Distribution

molforge-0.2.0.tar.gz (254.5 kB)

Uploaded May 26, 2026 Source

Built Distribution

molforge-0.2.0-py3-none-any.whl (187.0 kB)

Uploaded May 26, 2026 Python 3

File details

Details for the file molforge-0.2.0.tar.gz.

File metadata

Download URL: molforge-0.2.0.tar.gz
Upload date: May 26, 2026
Size: 254.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for molforge-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`0df537ef806981929e002379b3eccfd96d186c2333517fd4c7b32b81646e9bbd`
MD5	`7124d4529ce0521210aac6850422e486`
BLAKE2b-256	`b15fc13dc7caa360f5a5bdd8ce563abf58b1fa27c373c24016f4d1cd9e5affe4`

Provenance

The following attestation bundles were made for molforge-0.2.0.tar.gz:

Publisher: release.yml on DoctorDean/molforge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: molforge-0.2.0.tar.gz
- Subject digest: 0df537ef806981929e002379b3eccfd96d186c2333517fd4c7b32b81646e9bbd
- Sigstore transparency entry: 1632765155
- Sigstore integration time: May 26, 2026
Source repository:
- Permalink: DoctorDean/molforge@7a6d2de24677bfffa4b53b6727687eb5f23cace6
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/DoctorDean
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@7a6d2de24677bfffa4b53b6727687eb5f23cace6
- Trigger Event: push

File details

Details for the file molforge-0.2.0-py3-none-any.whl.

File metadata

Download URL: molforge-0.2.0-py3-none-any.whl
Upload date: May 26, 2026
Size: 187.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for molforge-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`891a72bfd313d02757ca8d81bc8c37d4069aa7de2579506e939b779688c6c73f`
MD5	`94ae3abbda4c3be522ef5081b8c27a4e`
BLAKE2b-256	`e6f9d95e8cf74141f67e1ae57134f770b2783b30963f7b403cddb3fa11fce0fc`

Provenance

The following attestation bundles were made for molforge-0.2.0-py3-none-any.whl:

Publisher: release.yml on DoctorDean/molforge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: molforge-0.2.0-py3-none-any.whl
- Subject digest: 891a72bfd313d02757ca8d81bc8c37d4069aa7de2579506e939b779688c6c73f
- Sigstore transparency entry: 1632765169
- Sigstore integration time: May 26, 2026
Source repository:
- Permalink: DoctorDean/molforge@7a6d2de24677bfffa4b53b6727687eb5f23cace6
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/DoctorDean
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@7a6d2de24677bfffa4b53b6727687eb5f23cace6
- Trigger Event: push
