
Lightweight molecular structure analysis, visualisation, graph export, and coarse-graining in Python.


MolScope


Lightweight molecular structure analysis, visualisation, graph export, and coarse-graining in Python. Read .xyz, .pdb, .cif and .sdf files (optionally gzip-compressed), select and analyse atoms, and visualise them in 3D. The .cif reader is a basic mmCIF parser for standard _atom_site coordinate loops, not a full mmCIF syntax implementation.

[Showcase figures, all of Aquaporin-1: 3D structure coloured by element; secondary structure from DSSP (helices red, turns cyan, coil grey); residue-level contact map heatmap; coarse-grained bead model.]

What it does

  • Read and write XYZ, PDB, mmCIF and SDF (gzip-aware), fetch structures by id from RCSB, and load multi-model NMR ensembles.
  • Select and measure by chain, element or residue; compute distances, angles, dihedrals and Kabsch-aligned RMSD.
  • Analyse centroids, radius of gyration, the inertia tensor, inferred bonds and contacts.
  • Contact maps at atom or residue level, with heatmap plots.
  • Secondary structure via a self-contained, dependency-free DSSP, with plot(color_by="ss").
  • Ensembles: pairwise RMSD, RMSF, averaging, and conformer clustering.
  • Export for ML: flat structural descriptors and molecular graphs for NetworkX, PyTorch Geometric and DGL.
  • Coarse-grain onto residue, Martini-style or custom bead mappings.
  • Visualise with 3D matplotlib plots, an interactive py3Dmol viewer, spin GIFs, and a command-line interface.

Why MolScope?

MolScope is not intended to replace full molecular-simulation or cheminformatics frameworks. It is a lightweight educational and prototyping toolkit for reading common molecular structure files, performing simple structural analysis, exporting graph representations for ML workflows, and experimenting with coarse-grained mappings. Its core depends only on NumPy and Matplotlib, and the API is Python-first and scriptable.

In particular, the coarse-graining tools are for educational CG mapping and bead-graph prototyping: useful for exploring mappings before moving to a production Martini workflow. They are not a validated Martini force-field generator.

| Tool | Main focus | How MolScope differs |
|---|---|---|
| RDKit | Cheminformatics | MolScope leans toward structure visualisation, protein/PDB-style metadata, and CG prototyping |
| MDAnalysis | MD trajectories | MolScope is lighter and easier for static structures and teaching |
| MDTraj | Trajectory analysis | MolScope is simpler and graph/CG oriented |
| Biopython | Structure parsing / bioinformatics | MolScope adds 3D analysis, ML-graph export, and coarse-graining |
| PyMOL / VMD | Interactive visualisation | MolScope is Python-first, scriptable, and ML-export friendly |
| nglview | Notebook structure viewer | MolScope also does analysis, descriptors, graphs and CG, not just viewing |

Reach for those tools when you need their depth and validation. Reach for MolScope when you want something small, readable, and quick to teach or prototype with.

Install

With uv (recommended):

uv sync                     # creates .venv, installs deps + dev tools from the lockfile
uv run molscope 1fqy.pdb    # run the CLI
uv run pytest               # run the tests

uv sync pins the interpreter from .python-version and resolves against uv.lock for reproducible installs. Use uv sync --no-dev to skip the test tools.

With plain pip:

python -m venv .venv && source .venv/bin/activate
pip install -e ".[test]"    # or: pip install -r requirements.txt

Documentation

The documentation website is built with MkDocs Material:

uv sync --group docs
uv run mkdocs serve
python scripts/build_user_guide_pdf.py

Docs source lives in docs/; the site configuration is mkdocs.yml. The PDF builder requires Pandoc and a LaTeX engine such as xelatex.

Quickstart

A runnable end-to-end tour over the bundled sample structures lives in example.py:

uv run python example.py                  # opens 3D plot windows
MPLBACKEND=Agg uv run python example.py   # headless: saves PNGs instead

It reads an .xyz file and a .pdb file, prints derived properties, compares the NMR models of 1aml, writes a transformed structure back out, and renders a plot.

Library

import molscope as ms

mol = ms.read("1fqy.pdb")          # parser chosen from the extension
mol = ms.fetch("1fqy")             # ...or download straight from RCSB by id
print(mol.summary())                # atoms, formula, chains, bounding box

mol = mol.centered().rotate("z", 90).translate((1, 2, -1))
mol.plot()                          # CPK colours, inferred bonds, equal aspect

Molecule is immutable: translate, centered and rotate each return a new molecule, so transformations chain cleanly without aliasing. Equality is by value (np.array_equal on coordinates).
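To make the immutability contract concrete, here is a minimal sketch of the pattern using a frozen dataclass. `TinyMolecule` and `rotate_z` are hypothetical names for illustration only; MolScope's actual `Molecule` internals are not shown in this README and may differ. Each transform returns a new object, and the coordinate array is marked read-only so accidental in-place edits fail loudly.

```python
from dataclasses import dataclass

import numpy as np


# Hypothetical illustration of the immutable-molecule pattern; not MolScope's code.
@dataclass(frozen=True, eq=False)
class TinyMolecule:
    elements: tuple
    coords: np.ndarray

    def __post_init__(self) -> None:
        # Keep a private, read-only copy so callers cannot mutate it in place.
        arr = np.array(self.coords, dtype=float, copy=True)
        arr.setflags(write=False)
        object.__setattr__(self, "coords", arr)
        object.__setattr__(self, "elements", tuple(self.elements))

    def translate(self, shift) -> "TinyMolecule":
        return TinyMolecule(self.elements, self.coords + np.asarray(shift, dtype=float))

    def centered(self) -> "TinyMolecule":
        return TinyMolecule(self.elements, self.coords - self.coords.mean(axis=0))

    def rotate_z(self, degrees: float) -> "TinyMolecule":
        t = np.radians(degrees)
        rot = np.array([[np.cos(t), -np.sin(t), 0.0],
                        [np.sin(t), np.cos(t), 0.0],
                        [0.0, 0.0, 1.0]])
        # Coordinates are stored as rows, so apply the rotation as coords @ rot.T.
        return TinyMolecule(self.elements, self.coords @ rot.T)

    def __eq__(self, other: object) -> bool:
        # Value equality on coordinates; comparing elements too is an assumption.
        if not isinstance(other, TinyMolecule):
            return NotImplemented
        return self.elements == other.elements and np.array_equal(self.coords, other.coords)


mol = TinyMolecule(("C", "O"), [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])
moved = mol.centered().rotate_z(90).translate((1, 2, -1))
# mol still holds its original coordinates; moved is a separate object.
print(moved.coords.round(3))
```

Chaining works because every step builds a fresh object from the previous one, so no two molecules ever share a writable array.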

Selections

PDB files, and standard mmCIF atom-site loops, carry per-atom metadata (atom name, residue, chain), so you can slice a structure:

mol.select(chain="A")               # one chain
mol.select(element="C")             # all carbons
mol.select(resname="HOH")           # waters
mol.select(resid=(10, 20))          # an inclusive residue range
mol.alpha_carbons()                 # CA atoms (the usual basis for protein RMSD)
mol.backbone()                      # N, CA, C, O
mol[mask_or_indices]                # subset by numpy mask / index array
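Selections like these are usually built by AND-ing boolean masks over per-atom metadata arrays. The sketch below assumes plain NumPy arrays named `chain`, `resid` and `atom_name` and a helper `select_mask` (all hypothetical, not MolScope's internal names), and treats the residue range as inclusive, matching `resid=(10, 20)` above.

```python
import numpy as np

# Hypothetical per-atom metadata, as a PDB reader might store it (5 atoms).
chain = np.array(["A", "A", "A", "B", "B"])
resid = np.array([9, 10, 20, 21, 10])
atom_name = np.array(["CA", "N", "CA", "CA", "O"])
coords = np.arange(15, dtype=float).reshape(5, 3)


def select_mask(chain_id=None, resid_range=None, name=None):
    """AND together whichever filters are given; returns a boolean mask."""
    mask = np.ones(len(chain), dtype=bool)
    if chain_id is not None:
        mask &= chain == chain_id
    if resid_range is not None:
        lo, hi = resid_range
        mask &= (resid >= lo) & (resid <= hi)  # inclusive on both ends
    if name is not None:
        mask &= atom_name == name
    return mask


mask = select_mask(chain_id="A", resid_range=(10, 20))
subset = coords[mask]               # boolean-mask indexing
indices = np.flatnonzero(mask)      # the same selection as an index array
```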

Analysis and measurements

mol.centroid, mol.center_of_mass    # geometric / mass-weighted centre
mol.radius_of_gyration              # compactness (angstrom)
mol.dimensions, mol.formula         # bounding box, Hill-order formula
mol.bonds()                         # inferred bond index pairs (KD-tree if scipy)
mol.contacts(cutoff=5.0)            # atom pairs within a distance

mol.distance(i, j)                  # interatomic distance (angstrom)
mol.angle(i, j, k)                  # bond angle (degrees)
mol.dihedral(a, b, c, d)            # torsion angle (degrees)

a.alpha_carbons().rmsd(b.alpha_carbons(), align=True)   # CA-RMSD after Kabsch fit
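For readers who want to see what these measurements compute, here is a standalone NumPy sketch of a signed dihedral and of Kabsch-aligned RMSD. The function names are hypothetical, and MolScope's own sign conventions and edge-case handling may differ; the SVD recipe with a reflection check is the standard Kabsch algorithm.

```python
import numpy as np


def dihedral(p0, p1, p2, p3) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def kabsch_rmsd(P, Q) -> float:
    """RMSD between matched (N, 3) point sets after the optimal proper rotation."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)                      # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                 # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # -1 would mean a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T     # rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

The determinant check matters: without it, the SVD can return an improper rotation that "superposes" a structure onto its mirror image.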

Structural descriptors for ML

features = mol.descriptors()                 # flat dict of scalar/vector descriptors
features["radius_of_gyration"]
features["principal_moments"]                # 3 values
features["distance_histogram"]               # fixed-size histogram

X, names = ms.featurize_many(
    ["a.pdb", "b.pdb", "c.xyz"],
    return_names=True,
)                                            # numeric matrix + column names

Descriptors include atom/residue counts, element counts, molecular mass, centres, radius of gyration, bounding-box dimensions, inertia tensor, principal moments/axes, shape anisotropy, compactness, distance histograms, bond-length summary statistics, and atom/residue contact summaries. Full contact maps remain available through mol.contact_map(...).
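The README does not spell out the exact formulas behind `principal_moments`, shape anisotropy or `distance_histogram`. As an assumption, the sketch below uses common textbook definitions: the mass-weighted gyration tensor S, the inertia tensor I = M (tr(S) E - S), the relative shape anisotropy kappa^2 = 1 - 3 (l1 l2 + l2 l3 + l3 l1) / (l1 + l2 + l3)^2 over the eigenvalues of S, and a fixed-range histogram of unique pairwise distances. The function names and default bin settings are hypothetical.

```python
import numpy as np


def shape_descriptors(coords, masses):
    """Gyration-tensor shape descriptors (common definitions, assumed here)."""
    x = np.asarray(coords, dtype=float)
    m = np.asarray(masses, dtype=float)
    total = m.sum()
    r = x - (m[:, None] * x).sum(axis=0) / total        # centre on the centre of mass
    S = np.einsum("n,ni,nj->ij", m, r, r) / total         # gyration tensor (3x3)
    lam = np.linalg.eigvalsh(S)                           # ascending eigenvalues
    inertia = total * (np.trace(S) * np.eye(3) - S)       # = sum m (|r|^2 E - r r^T)
    kappa2 = 1.0 - 3.0 * (lam[0] * lam[1] + lam[1] * lam[2] + lam[2] * lam[0]) / lam.sum() ** 2
    return {
        "radius_of_gyration": float(np.sqrt(np.trace(S))),  # Rg^2 = trace(S)
        "principal_moments": np.linalg.eigvalsh(inertia),
        "shape_anisotropy": float(kappa2),                   # 0 isotropic, 1 collinear
    }


def distance_histogram(coords, bins=8, max_dist=16.0):
    """Fixed-length histogram of unique pairwise distances in [0, max_dist]."""
    x = np.asarray(coords, dtype=float)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    iu = np.triu_indices(len(x), k=1)                     # each pair counted once
    counts, _ = np.histogram(d[iu], bins=bins, range=(0.0, max_dist))
    return counts                                         # distances > max_dist are ignored
```

A fixed bin range keeps the histogram the same length for every structure, which is what lets descriptors from different molecules stack into one feature matrix.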

Contact maps

cmap = mol.contact_map(cutoff=8.0, level="residue")   # CA-CA contacts -> ContactMap
cmap.matrix                                           # (R, R) array
mol.plot_contact_map(cutoff=8.0)                      # heatmap

mol.contact_map(level="atom")                         # atom-level map
mol.contact_map(level="residue", method="min")        # closest inter-residue atom
mol.contact_map(level="residue", method="com")        # residue centre of mass
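A residue-level contact map is a thresholded distance matrix. This sketch, with hypothetical function names, contrasts one representative point per residue (as with CA-CA contacts or residue centres of mass) against a closest-atom rule like the one `method="min"` describes. Whether MolScope uses `<` or `<=` at the cutoff, and whether it keeps the diagonal, is not stated here; this sketch uses `<=` and keeps it.

```python
import numpy as np


def pairwise_distances(x):
    x = np.asarray(x, dtype=float)
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


def point_contact_map(points, cutoff=8.0):
    """One point per residue (e.g. CA or centre of mass): contact if d <= cutoff."""
    return pairwise_distances(points) <= cutoff


def min_atom_contact_map(coords, residue_of_atom, cutoff=8.0):
    """Closest-atom rule: residues touch if ANY inter-residue atom pair is within cutoff."""
    res = np.asarray(residue_of_atom)
    ids = np.unique(res)
    d = pairwise_distances(coords)
    out = np.zeros((len(ids), len(ids)), dtype=bool)
    for a, ra in enumerate(ids):
        for b, rb in enumerate(ids):
            out[a, b] = d[np.ix_(res == ra, res == rb)].min() <= cutoff
    return out
```

The two rules can disagree: two residues whose centres are 9.75 apart may still have atoms only 7.5 apart, so the closest-atom map is never sparser than a map built from the same residues' centroids at the same cutoff.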

Secondary structure (DSSP)

Assign protein secondary structure from backbone hydrogen-bond patterns with a self-contained, pure-NumPy DSSP (no external mkdssp binary needed):

mol = ms.read("1fqy.pdb")
ss = mol.secondary_structure()      # SecondaryStructure, one code per residue

ss.string                           # e.g. '--HHHHHHHH--SS--EEEE--'
ss.codes                            # per-residue array
ss.summary()                        # helix/strand/coil counts and fractions

mol.plot(color_by="ss")             # colour the 3D view by secondary structure

Codes follow DSSP: H/G/I helices, E/B strands, T turn, S bend, - coil. This is a simplified educational implementation: it reproduces the main classes from the Kabsch-Sander hydrogen-bond model but is not bit-identical to the reference mkdssp on every edge case. It needs backbone N/CA/C/O atoms, so use PDB/mmCIF input (not a bare .xyz). The secondary-structure render in the showcase above (helices red, turns cyan, coil grey) is produced this way.
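The Kabsch-Sander model behind DSSP treats each backbone C=O ... H-N pair as an electrostatic interaction, E = 0.084 * 332 * (1/r_ON + 1/r_CH - 1/r_OH - 1/r_CN) kcal/mol, and counts an H-bond when E < -0.5 kcal/mol; the amide H is placed 1 angstrom from N along the previous residue's O-to-C direction. The sketch below computes only that energy term. The function names are hypothetical, and the helix/strand pattern-assignment steps that turn H-bonds into H/E/T codes are omitted.

```python
import numpy as np

# Kabsch-Sander constants: partial charges 0.42e and 0.20e, factor f = 332.
Q1Q2F = 0.42 * 0.20 * 332.0     # kcal/mol * angstrom
HBOND_CUTOFF = -0.5             # kcal/mol; more negative energies count as H-bonds


def _dist(a, b) -> float:
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


def place_amide_h(n, c_prev, o_prev):
    """Put H 1.0 angstrom from N, along the previous residue's O->C direction."""
    n, c_prev, o_prev = (np.asarray(p, dtype=float) for p in (n, c_prev, o_prev))
    co = c_prev - o_prev
    return n + co / np.linalg.norm(co)


def hbond_energy(c, o, n, h) -> float:
    """Electrostatic energy (kcal/mol) of a C=O (acceptor) ... H-N (donor) pair."""
    return Q1Q2F * (1 / _dist(o, n) + 1 / _dist(c, h) - 1 / _dist(o, h) - 1 / _dist(c, n))
```

Placing H from backbone geometry is why only N/CA/C/O atoms are needed: most crystal structures omit hydrogens.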

NMR ensembles

from molscope import ensemble

models = ms.read_pdb_models("1aml.pdb")     # all 20 models
ensemble.rmsd_matrix(models)                 # pairwise RMSD matrix
ensemble.rmsf(models)                        # per-atom fluctuation
ensemble.average(models)                     # mean structure
ensemble.align_all(models)                   # superpose every model onto the first

# Per-residue-pair contact probability across the ensemble (NMR variability)
freq = ms.ensemble_contact_frequency(models, cutoff=8.0)
freq.plot()                                  # heatmap of contact frequencies in [0, 1]
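RMSF and contact frequency both reduce to averages over a stack of model coordinates. The sketch assumes an (M, N, 3) array of models that are already superposed (for example after an `align_all`-style step) and uses hypothetical function names; for residue-level frequencies, pass one representative point per residue.

```python
import numpy as np


def rmsf(models):
    """Per-point RMSF from an (M, N, 3) stack of already-superposed models."""
    x = np.asarray(models, dtype=float)
    mean = x.mean(axis=0)                               # (N, 3) average structure
    sq_dev = ((x - mean) ** 2).sum(axis=-1)             # (M, N) squared deviations
    return np.sqrt(sq_dev.mean(axis=0))                 # (N,) RMS over models


def contact_frequency(models, cutoff=8.0):
    """(N, N) fraction of models in which each pair of points is within cutoff."""
    x = np.asarray(models, dtype=float)
    d = np.linalg.norm(x[:, :, None, :] - x[:, None, :, :], axis=-1)   # (M, N, N)
    return (d <= cutoff).mean(axis=0)
```

Superposition has to come first for RMSF: without it, rigid-body drift between models would be counted as flexibility.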

Comparing and clustering conformers

Cluster an ensemble (NMR models, conformer sets, docking poses, MD snapshots) by pairwise RMSD:

matrix = ms.rmsd_matrix(models, align=True)        # (M, M) RMSD matrix
ms.plot_rmsd_heatmap(matrix)                        # heatmap

clusters = ms.cluster(models, method="hierarchical")   # data-driven cutoff
clusters = ms.cluster(models, n_clusters=3)            # ...or a fixed count
clusters.n_clusters                                  # how many clusters
clusters.groups()                                    # {cluster_id: [model indices]}
clusters.representatives()                            # {cluster_id: medoid model index}

ms.plot_rmsd_heatmap(matrix, order=clusters.order)  # reorder into diagonal blocks
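Given an RMSD matrix and cluster labels, a medoid is the member with the smallest summed RMSD to the rest of its cluster, and sorting models by label gives the block-diagonal ordering used for the heatmap. The clustering step itself is not shown; `medoids` and `block_order` are hypothetical helpers, and breaking ties by lowest index is an assumption.

```python
import numpy as np


def medoids(rmsd, labels):
    """{cluster_id: member with the smallest summed RMSD to its own cluster}."""
    rmsd = np.asarray(rmsd, dtype=float)
    labels = np.asarray(labels)
    reps = {}
    for cid in np.unique(labels):
        members = np.flatnonzero(labels == cid)
        within = rmsd[np.ix_(members, members)].sum(axis=1)   # summed intra-cluster RMSD
        reps[int(cid)] = int(members[np.argmin(within)])       # ties: lowest index wins
    return reps


def block_order(labels):
    """Model indices grouped by cluster label (stable), for a block-diagonal heatmap."""
    return np.argsort(np.asarray(labels), kind="stable")
```

A medoid is always a real model, unlike a mean structure, so it can be written out and inspected directly.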

Writing and viewing

ms.write_xyz(mol.centered(), "out.xyz")     # write transformed coordinates back
ms.write_pdb(mol, "out.pdb")

mol.plot(color_by="chain")                   # colour by element / chain / residue
mol.view(style="cartoon")                    # interactive py3Dmol viewer (notebooks)
from molscope.plotting import spin_gif
spin_gif(mol, "spin.gif")                    # rotating animation
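The XYZ format that `write_xyz` targets is simple: an atom-count line, a free-text comment line, then one `element x y z` line per atom. A minimal writer/reader pair is sketched below; the function names and the six-decimal formatting are assumptions, not MolScope's exact output.

```python
def format_xyz(elements, coords, comment=""):
    """Serialise atoms as XYZ text: count line, comment line, then 'El x y z' rows."""
    if "\n" in comment:
        raise ValueError("the XYZ comment must fit on one line")
    lines = [str(len(elements)), comment]
    for element, (x, y, z) in zip(elements, coords):
        lines.append(f"{element:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(lines) + "\n"


def parse_xyz(text):
    """Read the first frame of XYZ text back into (elements, coords)."""
    lines = text.splitlines()
    n_atoms = int(lines[0])
    elements, coords = [], []
    for line in lines[2:2 + n_atoms]:
        element, x, y, z = line.split()[:4]
        elements.append(element)
        coords.append((float(x), float(y), float(z)))
    return elements, coords
```

Because XYZ carries no residue or chain fields, a round trip through it keeps only elements and coordinates; use PDB output when the metadata matters.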

Molecular graphs (for machine learning)

Turn 3D coordinates plus inferred bonds into a graph, then export to the common ML frameworks. The base to_graph() needs no extra dependencies; each exporter imports its backend lazily.

mol = ms.read("1fqy.pdb")

g = mol.to_graph()                  # MolecularGraph: nodes + edges, no deps
g.n_atoms, g.n_bonds                # counts
g.atomic_numbers, g.masses          # per-node arrays
g.node_features()                   # (N, 2) default features [atomic_number, mass]

G = mol.to_networkx()               # networkx.Graph with node/edge attributes
data = mol.to_pyg_data()            # torch_geometric.data.Data (x, pos, edge_index, edge_attr, z)
dglg = mol.to_dgl_graph()           # dgl.DGLGraph with ndata/edata tensors

Nodes carry element, atomic number, mass, coordinates and (from PDB/mmCIF) atom name, residue and chain. Edges carry the bonded pair, interatomic distance, and bond order (1.0 for geometrically inferred bonds). Install backends as needed: pip install "molscope[graph]" installs NetworkX only. PyTorch Geometric and DGL are optional manual installs (pip install torch torch_geometric, or pip install dgl); choose the right PyTorch build for your platform first.
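The exact bond-inference rule MolScope uses is not documented in this README. A common heuristic, assumed here, bonds two atoms when their distance is at most the sum of their covalent radii plus a tolerance (0.4 angstrom below), searched densely in O(n^2) like the no-scipy fallback. The sketch then builds what a PyTorch Geometric graph needs for bonds: an `edge_index` of shape (2, 2E) listing each undirected bond in both directions, plus `[atomic_number, mass]` node features. All names and the small element tables are illustrative.

```python
import numpy as np

# Small illustrative element tables (covalent radii from Cordero et al. 2008, angstrom).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "S": 16}
MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def infer_bonds(elements, coords, tolerance=0.4):
    """Dense O(n^2) search: bond if d_ij <= r_i + r_j + tolerance (assumed rule)."""
    x = np.asarray(coords, dtype=float)
    r = np.array([COVALENT_RADIUS[e] for e in elements])
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    bonded = np.triu(d <= r[:, None] + r[None, :] + tolerance, k=1)  # i < j only
    i, j = np.nonzero(bonded)
    return np.stack([i, j], axis=1), d[i, j]          # (E, 2) pairs, (E,) lengths


def to_edge_index(pairs):
    """PyG-style (2, 2E) edge_index: every undirected bond listed in both directions."""
    pairs = np.asarray(pairs, dtype=np.int64).reshape(-1, 2)
    return np.concatenate([pairs.T, pairs[:, ::-1].T], axis=1)


def node_features(elements):
    """(N, 2) features [atomic_number, mass] per atom."""
    return np.array([[ATOMIC_NUMBER[e], MASS[e]] for e in elements], dtype=float)
```

To hand these arrays to PyTorch Geometric you would wrap them with `torch.from_numpy`; that step is left out so the sketch needs only NumPy.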

Coarse-graining

Map an atomistic structure onto a smaller set of beads. The result is an ordinary Molecule (beads as "atoms") with explicit CG bonds attached, so it plots, transforms and graphs like anything else.

mol = ms.read("1fqy.pdb")

cg = mol.coarse_grain("residue_com")        # one bead per residue (centre of mass)
cg = mol.coarse_grain("residue_centroid")   # ...or geometric centroid
cg = mol.coarse_grain("martini")            # simplified backbone + side-chain beads
cg.plot(scale=200)                          # beads + backbone topology
print(cg.mapping_report())                  # explain beads, dropped atoms, and bonds

# Custom bead definitions by residue + atom name (needs PDB/mmCIF metadata)
mapping = {"ALA": {"BB": ["N", "CA", "C", "O"], "SC": ["CB"]}}
cg = mol.coarse_grain(mapping)
cg, report = mol.coarse_grain(mapping, return_report=True)

# Custom bead definitions by atom index (works on ANY structure, even .xyz)
cg = mol.coarse_grain({"head": [0, 1, 2, 3], "tail": [4, 5, 6, 7]},
                      bonds=[("head", "tail")])   # define the bead network too

cg.to_graph()                               # CG bead network, ready for ML

Bead positions are mass-weighted centres of their atoms (or geometric centroids, as with "residue_centroid"). For residue mappings, bonds are generated automatically (within each residue, plus a backbone chain between residues); pass bonds= to define them yourself. Name-based bonds are intended for unique bead names such as head/tail; repeated names such as BB/SC are ambiguous, so use bead indices for those. Unassigned atoms are dropped with a warning. This is meant for teaching and prototyping CG mappings, not as a replacement for production Martini parameters.
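Index-based coarse-graining reduces to a weighted average per bead. This hypothetical sketch (not MolScope's implementation) computes mass-weighted bead positions, or plain centroids with `weighted=False`, warns about unassigned atoms, and resolves name-based bonds only when bead names are unique, mirroring the behaviour described above.

```python
import warnings

import numpy as np


def coarse_grain_by_index(coords, masses, beads, weighted=True):
    """Hypothetical index-based CG: returns bead names and (B, 3) bead positions."""
    x = np.asarray(coords, dtype=float)
    m = np.asarray(masses, dtype=float)
    names, positions, used = [], [], set()
    for name, idx in beads.items():
        idx = np.asarray(idx, dtype=int)
        w = m[idx] if weighted else np.ones(len(idx))    # mass weights or plain centroid
        positions.append((w[:, None] * x[idx]).sum(axis=0) / w.sum())
        names.append(name)
        used.update(idx.tolist())
    dropped = sorted(set(range(len(x))) - used)
    if dropped:
        warnings.warn(f"dropping {len(dropped)} unassigned atom(s): {dropped}", stacklevel=2)
    return names, np.array(positions)


def bonds_by_name(names, bonds):
    """Resolve (name, name) bead bonds to index pairs; refuses ambiguous names."""
    if len(set(names)) != len(names):
        raise ValueError("bead names are not unique; give bonds as bead indices")
    lookup = {name: i for i, name in enumerate(names)}
    return [(lookup[a], lookup[b]) for a, b in bonds]
```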

Command line

molscope helix_201.xyz --translate 1 2 -1
molscope 1fqy.pdb --select atom_name=CA --color-by residue --save ca.png
molscope --fetch 1aml --center --gif amyloid.gif
python -m molscope 1fqy.pdb          # equivalent if not pip-installed

Sample structures

| File | Contents |
|---|---|
| helix_201.xyz | a helix (bare coordinates) |
| 1fqy.pdb | Aquaporin-1, single model (1661 atoms) |
| 1aml.pdb | Alzheimer amyloid A4 peptide, 20-model NMR ensemble |

Notes

  • PDB files are parsed by fixed columns, not whitespace splitting, so atoms with touching coordinate fields (large or negative values) read correctly.
  • Alternate conformations (altLoc) other than the primary one are skipped.
  • read_pdb returns a single model (model=1 by default); use read_pdb_models for the whole ensemble.
  • Bond inference uses a scipy.spatial.cKDTree when available; without scipy it falls back to a dense O(n^2) search that is refused above ~8000 atoms.
  • Optional extras: pip install "molscope[fast]" (scipy, faster bonds/contacts) and "molscope[viz]" (py3Dmol, for Molecule.view).
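To see why fixed columns matter, here is a sketch of reading one ATOM record by the column ranges in the PDB format specification (x, y, z in columns 31-38, 39-46 and 47-54). The function names are hypothetical, and treating a blank or `A` altLoc as the primary conformation is an assumption; MolScope's own rule for "primary" is not stated here.

```python
def parse_atom_record(line):
    """Slice one ATOM/HETATM line by PDB columns (1-based spec -> 0-based slices)."""
    return {
        "record": line[0:6].strip(),
        "name": line[12:16].strip(),          # columns 13-16
        "altloc": line[16:17].strip(),        # column 17
        "resname": line[17:20].strip(),       # columns 18-20
        "chain": line[21:22].strip(),         # column 22
        "resid": int(line[22:26]),            # columns 23-26
        "xyz": (float(line[30:38]),           # columns 31-38
                float(line[38:46]),           # columns 39-46
                float(line[46:54])),          # columns 47-54
        "element": line[76:78].strip(),       # columns 77-78 (may be absent)
    }


def keep_primary_altloc(records):
    """Assumed rule: keep atoms with a blank or 'A' alternate-location indicator."""
    return [r for r in records if r["altloc"] in ("", "A")]
```

With large negative coordinates, x and y run together (for example `-100.123-200.456`), so `line.split()` would return one merged token where column slicing still recovers both values.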

Tests and linting

uv run pytest          # full test suite
uv run ruff check .    # lint

CI (GitHub Actions) runs both across Python 3.9 / 3.11 / 3.13 on every push and PR.

License

MIT


Download files

Source Distribution

molscope-0.7.0.tar.gz (53.3 kB)

Built Distribution

molscope-0.7.0-py3-none-any.whl (43.6 kB)

File details

Details for the file molscope-0.7.0.tar.gz.

File metadata

  • Download URL: molscope-0.7.0.tar.gz
  • Size: 53.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv 0.11.16 (uv publish, in CI on Ubuntu 24.04)

File hashes

Hashes for molscope-0.7.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | d8fff5201bea30ea0a0337f5e524ba2e22b5a05b4d1c0d5da57dd1b049986ba2 |
| MD5 | c0a672953935122f18774ecffd5a4e1a |
| BLAKE2b-256 | 9d11a7d65787a35db07c467f52a7871c2cf19ae732888036623d46699edad77d |

File details

Details for the file molscope-0.7.0-py3-none-any.whl.

File metadata

  • Download URL: molscope-0.7.0-py3-none-any.whl
  • Upload date:
  • Size: 43.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv 0.11.16 (uv publish, in CI on Ubuntu 24.04)

File hashes

Hashes for molscope-0.7.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | ec096654cdf2c6a871e47a62e05e567471456e1ea2925ad56e8042f047b9139a |
| MD5 | 8c17b06cced50f5366df09c04be5c12e |
| BLAKE2b-256 | 751c5b3462ed769bfc7b7a03ae6264f13402d68fbb3852e28169f87971597aa0 |
