A lightweight bridge from molecular structures to descriptors, contact maps, graph-ML inputs, and educational coarse-grained representations.

MolScope

Turn a molecular structure file into descriptors, contact maps, ML graphs, and coarse-grained bead models, with a small, readable Python API.

MolScope reads .xyz, .pdb, .cif, and .sdf files, or fetches structures from the RCSB by ID. The core depends only on NumPy and Matplotlib; heavier backends (RDKit, PyTorch Geometric, DGL, Gemmi) are opt-in extras. It is built for teaching, exploratory analysis, and ML-for-molecules prototyping, not as a replacement for full simulation or cheminformatics stacks.

📖 Full documentation: https://molscope.readthedocs.io

[Figure: Aquaporin-1 in four views. 3D structure (element-coloured); secondary structure (DSSP: helices red, turns cyan, coil grey); residue contact map (heatmap); coarse-grained beads.]

Install

pip install molscope            # core: NumPy + Matplotlib only

Optional extras, added only when a workflow needs them:

Extra                    Adds
fast                     SciPy KD-tree for faster bond/contact search on large structures
chem                     RDKit chemical perception and descriptors
cif                      Gemmi mmCIF parsing and validation
pyg / dgl / graph / gnn  PyTorch Geometric / DGL / NetworkX graph export
viz                      py3Dmol interactive viewer
xlsx                     read/write .xlsx molecule tables
gpu                      Torch dense distance backend
mcp                      MCP server for AI assistants (Python >= 3.10)

pip install "molscope[chem,cif,pyg]"   # combine as needed

For local development with uv: uv sync (creates .venv and installs deps + dev tools from the lockfile), then uv run pytest.

Quickstart

Given a .pdb (or .xyz / .cif / .sdf), here is what you can pull out:

import molscope as ms

mol = ms.read("protein.pdb")        # or ms.fetch("1fqy") from the RCSB
print(mol.summary())                # atoms, formula, chains, bounding box

ca   = mol.select(chain="A").alpha_carbons()   # metadata selections
cmap = mol.contact_map(cutoff=8.0)             # residue contact map (NumPy)
desc = mol.descriptors()                       # dict of structural descriptors
graph = mol.to_graph()                         # ML-ready graph, no extra deps
data  = mol.to_pyg_data()                      # PyTorch Geometric Data ([pyg])
cg    = mol.coarse_grain("residue_com")        # one bead per residue
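
To make the cutoff=8.0 argument concrete, here is a minimal pure-Python sketch of what a residue contact map represents: a symmetric matrix marking residue pairs whose representative points lie within the cutoff. It assumes one coordinate per residue (for example the alpha carbon) and distances in ångströms. MolScope's own definition (which atoms it measures between, and what the returned array holds) may differ, and contact_map_sketch is a hypothetical name, not part of the library.

```python
import math

def contact_map_sketch(points, cutoff=8.0):
    """Return an n x n boolean matrix: True where two points are within cutoff."""
    n = len(points)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            close = math.dist(points[i], points[j]) <= cutoff
            cmap[i][j] = cmap[j][i] = close   # the map is symmetric
    return cmap

# Three made-up residue positions (angstroms), not taken from a real structure.
ca_positions = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map_sketch(ca_positions)
```

Only the first two points are in contact here; the third sits well beyond the 8 Å cutoff from both.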

Molecule is immutable: translate, centered, and rotate each return a new molecule, so transformations chain cleanly.
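
The pattern behind that sentence can be sketched with a frozen dataclass. Points below is a hypothetical stand-in, not MolScope's Molecule class; only the method names translate and centered come from the text above, and the signatures used here are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Points:
    coords: tuple  # tuple of (x, y, z) tuples, so the data itself cannot be mutated

    def translate(self, dx, dy, dz):
        # Build and return a new object; self is left untouched.
        return Points(tuple((x + dx, y + dy, z + dz) for x, y, z in self.coords))

    def centered(self):
        # Shift so the centroid sits at the origin.
        n = len(self.coords)
        cx, cy, cz = (sum(c[i] for c in self.coords) / n for i in range(3))
        return self.translate(-cx, -cy, -cz)

original = Points(((0.0, 0.0, 0.0), (2.0, 0.0, 0.0)))
moved = original.translate(1.0, 1.0, 1.0).centered()   # calls chain left to right
```

Because every method returns a fresh object, original keeps its coordinates after the chain runs, which is what makes intermediate results safe to reuse.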

What you can do

Capability                                                               Guide
Read/write XYZ, PDB, mmCIF, SDF; fetch from RCSB; build from SMILES      Reading files
Stream large multi-model files frame by frame                            Reading files
Select atoms by metadata (chain, residue, name, ...)                     Selections
Geometry, RMSD, distances, angles, torsions                              Geometry and measurements
Contact maps and distance matrices                                       Contact maps
DSSP secondary structure, torsions, interfaces, binding sites            Protein analysis
Native and RDKit-backed descriptors                                      Structural descriptors
Chemical perception, protein template bonds, bond-order inference        Chemical perception
Atom/bond and residue-contact graphs for ML (with positional encodings)  Molecular graphs
Coarse-grained bead mappings (residue, Martini-style, custom)            Coarse-graining
NMR ensembles and clustering                                             Ensemble analysis
Plotting and py3Dmol viewing                                             Plotting and viewing
Diverse subset selection from a CSV/XLSX table                           Diverse selection

Task-oriented tutorials: PDB to descriptors, PDB to graph/GNN, and PDB to coarse-grained beads. A runnable tour over the bundled samples lives in examples/tour.py.
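
For readers new to the graph route, the following is a rough sketch of how a distance-cutoff graph can be laid out as a pair of index lists, the same source/target (COO) layout that PyTorch Geometric uses for edge_index. The function name distance_graph and the 5 Å cutoff are illustrative assumptions; the features and edge rules that MolScope's own graph export uses are described in its guides and may differ.

```python
import math

def distance_graph(points, cutoff):
    """Return (sources, targets): directed edges i -> j for every pair within cutoff."""
    sources, targets = [], []
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if i != j and math.dist(p, q) <= cutoff:
                sources.append(i)
                targets.append(j)
    return sources, targets

# Made-up node positions (angstroms).
nodes = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
edge_index = distance_graph(nodes, cutoff=5.0)
```

Each undirected contact appears twice, once per direction, which is the convention most message-passing libraries expect.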

Command line

Command                   Does
molscope <file> (view)    visualise a structure, save a PNG or GIF
molscope analyze          batch descriptor table to CSV
molscope binding-site     ligand binding-site contacts and pocket descriptors
molscope export           batch graph export to PyG / DGL / NetworkX
molscope select           diverse subset from a CSV/XLSX table
molscope dock-summary     rank docking poses from an SDF; summary + top-hit tables + score plot
molscope dock-diverse     diverse shortlist of top hits by Tanimoto clustering
molscope dock-rank        transparent consensus ranking across scored SDFs
molscope dock-report      self-contained HTML report + top poses for PyMOL/ChimeraX/Mol*

molscope examples/data/1fqy.pdb --select atom_name=CA --color-by residue --save ca.png
molscope analyze examples/data/*.pdb --out results.csv --preset native-3d --jobs 4
molscope export "data/*.cif" --to pyg --out-dir pyg_graphs/ --pe laplacian --jobs 8
molscope select molecules.csv --smiles-col SMILES --compute-descriptors -n 100 --out picked.csv
molscope dock-summary vina_out.sdf --score-field minimizedAffinity --top 20
molscope dock-diverse vina_out.sdf --top 500 --select 50
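
As background for the dock-diverse command, here is a small sketch of Tanimoto similarity and one simple way to build a diverse shortlist from ranked hits. It uses a greedy leader-style pick (keep the best-scored hit, then skip anything too similar to what is already kept), which is a stand-in and not necessarily the clustering MolScope performs. Fingerprints are modelled as plain sets of on-bit indices, lower scores are assumed to be better (as with Vina-style affinities), and the names, scores, and 0.6 threshold are all made up.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def diverse_shortlist(hits, n_select, threshold=0.6):
    """hits: list of (name, score, fingerprint_set); lower score = better."""
    picked = []
    for name, score, fp in sorted(hits, key=lambda h: h[1]):
        # Keep a hit only if it is not too similar to anything already kept.
        if all(tanimoto(fp, kept_fp) < threshold for _, _, kept_fp in picked):
            picked.append((name, score, fp))
        if len(picked) == n_select:
            break
    return [name for name, _, _ in picked]

hits = [
    ("pose_a", -9.1, {1, 2, 3, 4}),
    ("pose_b", -9.0, {1, 2, 3, 5}),   # near-duplicate of pose_a
    ("pose_c", -8.2, {7, 8, 9}),
    ("pose_d", -7.5, {1, 9, 10, 11}),
]
shortlist = diverse_shortlist(hits, n_select=2)
```

pose_b scores almost as well as pose_a but shares three of five bits with it, so the shortlist skips it in favour of the chemically distinct pose_c.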

Use from an AI assistant (MCP)

MolScope ships an optional Model Context Protocol server, so an MCP-capable assistant (Claude Code/Desktop, Codex CLI, Gemini CLI) can drive its analyses in natural language. It exposes the public API as 28 tools (structure analysis, graphs, plots, dataset prep, docking-hit triage) and adds no new science.

pip install "molscope[mcp]"              # needs Python >= 3.10
claude mcp add molscope -- molscope-mcp  # Claude Code
codex mcp add molscope -- molscope-mcp   # Codex CLI
gemini mcp add molscope molscope-mcp     # Gemini CLI

For example: "fetch trypsin (3ptb), find the benzamidine binding-site residues, and render a contact map." See docs/user-guide/mcp-server.md for the full tool reference.

Scientific validation

MolScope is explicit about which results are cross-checked against reference tools and which are intentionally lightweight:

Feature                                Status
Geometry, RMSD, contact maps           Cross-checked vs MDAnalysis (near machine precision)
Bond perception, chemical features     Cross-checked vs RDKit
Secondary structure (simplified DSSP)  Cross-checked vs mkdssp: ~98 to 99% 3-state agreement across helical, mixed, and all-beta folds
Protein template bonds                 Cross-checked vs known per-residue chemistry
Native descriptors, molecular graphs   Deterministic; not benchmarked against a curated library
Coarse-graining                        Mapping and visualisation only; not a validated force-field model
Standard protonation                   Idealised pH-7 textbook model ("standard")
pKa-aware protonation                  Environment-aware via PROPKA (proteins) / Dimorphite-DL (SMILES) at a chosen pH

Methods, tolerances, and failure modes are in docs/validation.md. The CI validation job runs physical invariants plus these cross-checks on every push.
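
To show what a "3-state agreement" figure measures, the sketch below reduces two per-residue DSSP strings to helix / strand / coil and reports the fraction of matching positions. The 8-to-3 mapping used here (H, G, I to helix; E, B to strand; everything else to coil) is one common convention and an assumption on our part; the exact reduction and test proteins behind the numbers in the table are documented in docs/validation.md. The input strings are invented.

```python
# One common 8-to-3 reduction of DSSP codes (an assumption; other reductions exist).
TO_3_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def three_state(dssp):
    """Map a per-residue DSSP string to helix (H), strand (E), or coil (C)."""
    return "".join(TO_3_STATE.get(code, "C") for code in dssp)

def agreement(predicted, reference):
    """Fraction of residues with the same 3-state label."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must cover the same residues")
    a, b = three_state(predicted), three_state(reference)
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Made-up strings for illustration, not real output from either tool.
score = agreement("HHHHTTEEEC", "HHHGSSEEEB")
```

The two strings differ in several raw codes, yet only the final residue disagrees after reduction, giving 9 matches out of 10.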

Scope and design philosophy

MolScope is a lightweight core (NumPy and Matplotlib only) with a broad optional surface. Those optional extras are deliberately of two kinds, and the distinction is what keeps "lightweight" honest:

  • Interop / output targets (networkx, pyg, dgl, viz, xlsx, gpu, mcp). These let MolScope emit to your framework or run faster. Speaking many formats is the bridge mission, so this surface is expected to grow.
  • Method backends (chem/RDKit, cif/gemmi, propka, dimorphite, validation/MDAnalysis). These delegate a scientific computation to an external tool, and each one inherits that tool's scope, versioning, and correctness surface. This is the surface we keep deliberately small.

A new method backend earns inclusion only when MolScope adds real integration value beyond what you could write in a line or two against the tool directly (for example, mapping RDKit perception back onto MolScope's atom model and residue templates), and when the wrapped tool is maintained. The core does something useful on its own; everything heavier degrades gracefully when its extra is absent. MolScope is not, and is not trying to become, a re-implementation of a cheminformatics or simulation stack: where a dedicated tool is the right answer, it integrates that tool rather than reinventing it, and says so.
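
The "degrades gracefully" behaviour can be illustrated with the usual optional-import pattern: try the accelerated backend, and fall back to a plain implementation when it is missing. This is a generic sketch, not MolScope's internal code. pairs_within is a hypothetical helper, and the only external call is SciPy's cKDTree.query_pairs, which returns index pairs within a given radius.

```python
import math

try:
    from scipy.spatial import cKDTree   # present only when the optional extra is installed
except ImportError:
    cKDTree = None

def pairs_within(points, radius):
    """Return the set of index pairs (i, j), i < j, no farther apart than radius."""
    if cKDTree is not None:
        return set(cKDTree(points).query_pairs(radius))
    # Fallback: O(n^2) scan, adequate for small structures.
    return {
        (i, j)
        for i in range(len(points))
        for j in range(i + 1, len(points))
        if math.dist(points[i], points[j]) <= radius
    }

pairs = pairs_within([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)], radius=1.5)
```

Both branches return the same pairs, so callers never need to know which backend ran; the extra only changes how fast the answer arrives on large inputs.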

FAQ

Which formats can it read? .xyz, .pdb, .cif, and .sdf; fetch from the RCSB with ms.fetch("1fqy"); or build from SMILES with ms.read_smiles(...) (needs [chem]).

Does it handle MD trajectories? It works on static structures and multi-model files (NMR ensembles, plus ms.stream(...) to iterate over large multi-model PDB/XYZ files frame by frame). It has no trajectory engine; for DCD/XTC and similar formats, use MDAnalysis or MDTraj.
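
The idea of frame-by-frame iteration can be sketched with a generator that yields one MODEL ... ENDMDL block at a time, so only a single frame is held in memory. iter_models is a hypothetical illustration of the technique and says nothing about how ms.stream is implemented or what it returns; the sample records are truncated stand-ins.

```python
import io

def iter_models(handle):
    """Yield the ATOM/HETATM lines of each MODEL ... ENDMDL block, one model at a time."""
    block, inside = [], False
    for line in handle:
        record = line[:6].strip()
        if record == "MODEL":
            block, inside = [], True      # start a fresh frame
        elif record == "ENDMDL":
            inside = False
            yield block                   # hand the finished frame to the caller
        elif inside and record in ("ATOM", "HETATM"):
            block.append(line.rstrip("\n"))

# Tiny stand-in for a multi-model file; the records are truncated for brevity.
text = (
    "MODEL        1\nATOM      1\nATOM      2\nENDMDL\n"
    "MODEL        2\nATOM      1\nENDMDL\n"
)
sizes = [len(model) for model in iter_models(io.StringIO(text))]
```

Passing an open file object instead of io.StringIO gives the same behaviour on a real multi-model PDB file.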

Is the coarse-graining a real force field? No. It produces CG mappings and bead graphs for inspection and ML prototyping. The OpenMM XML export describes topology only and is not a validated Martini parameter set.
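
As an illustration of what a one-bead-per-residue mapping involves, the sketch below places each bead at the mass-weighted centre of its residue's atoms. It is only a geometric mapping, in line with the answer above. residue_com_beads, the input tuple layout, and the small mass table are assumptions for this example rather than MolScope's API.

```python
from collections import defaultdict

# Approximate atomic masses (u) for the few elements used here.
MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}

def residue_com_beads(atoms):
    """atoms: iterable of (residue_id, element, (x, y, z)). Returns {residue_id: bead_xyz}."""
    groups = defaultdict(list)
    for res_id, element, xyz in atoms:
        groups[res_id].append((MASS[element], xyz))
    beads = {}
    for res_id, members in groups.items():
        total = sum(m for m, _ in members)
        # Mass-weighted average of each coordinate component.
        beads[res_id] = tuple(
            sum(m * xyz[k] for m, xyz in members) / total for k in range(3)
        )
    return beads

atoms = [
    (("A", 1), "C", (0.0, 0.0, 0.0)),
    (("A", 1), "C", (2.0, 0.0, 0.0)),
    (("A", 2), "N", (5.0, 1.0, 0.0)),
]
beads = residue_com_beads(atoms)
```

Residue ("A", 1) has two equal-mass atoms, so its bead lands midway between them; the single-atom residue keeps its atom's position.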

Do I need RDKit or PyTorch? No. The core runs on NumPy and Matplotlib; those are opt-in extras you install only for the matching workflow.

Will odd PDB files parse? ATOM/HETATM lines are read by fixed columns rather than split on whitespace, so coordinate fields that touch their neighbours, hold large values, or are negative still parse correctly. Where alternate conformations exist, the primary altLoc is used by default.
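
Why fixed columns matter is easiest to see on a record whose coordinates leave no space between fields. The sketch below slices an ATOM line at the column ranges given in the PDB format specification (for example x, y, z in columns 31-38, 39-46, 47-54) and contrasts that with whitespace splitting. parse_atom_line is a hypothetical helper covering only a subset of fields, not MolScope's parser.

```python
def parse_atom_line(line):
    """Read one ATOM/HETATM record using the PDB format's fixed column ranges."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "alt_loc": line[16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }

# Build a record whose coordinate fields touch (no spaces between them).
line = (
    f"{'ATOM':<6}{1:>5} {'CA':^4}{' '}{'ALA':>3} {'A'}{1:>4}{' '}   "
    f"{-100.123:8.3f}{-200.456:8.3f}{-300.789:8.3f}"
)
atom = parse_atom_line(line)
tokens = line.split()   # whitespace splitting fuses the three coordinates into one token
```

Column slicing recovers all three coordinates, while the last whitespace token is the unparseable string "-100.123-200.456-300.789".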

Development and citation

uv run pytest                  # full test suite
uv run pytest tests/validation # validation suite only
uv run ruff check .            # lint

CI runs the suite and linting across Python 3.9 / 3.11 / 3.13, smoke-imports the extras, and runs a separate validation job on every push and PR. Notable changes per release are in CHANGELOG.md.

Each release is archived on Zenodo with a citable DOI. The concept DOI 10.5281/zenodo.20433850 always resolves to the latest version; citation metadata is in CITATION.cff, so GitHub's "Cite this repository" button produces BibTeX and APA entries.

License

MIT
