Fast analysis of massive-scale data produced with MassiveFold

Introduction

Massif is a high-throughput analysis suite built to process the large structural ensembles generated by MassiveFold. It helps MassiveFold users review many predictions at once, evaluate interfaces and distances, and identify models that warrant follow-up. Instead of working through raw model folders manually, Massif gathers the metrics needed for filtering, ranking, and selecting structures in one place.

Getting Started

Massif can be installed with pip both as a CLI tool and as a Python library (a Rust toolchain is needed when pip builds Massif from source).

Install the latest Massif release from PyPI with:

python -m pip install massif

Alternatively, build the current development version (newer features that are not yet fully tested) from a local checkout of the repository with:

python -m pip install .

Python-installed CLI

After pip install, the massif command is available in your environment and uses the same CLI syntax as the Rust binary:

massif --help
massif fit <OUTPUT_DIR> <REFERENCE_PDB> <CHAIN_IDS> <STRUCTURE_DIR> <OUTPUT_CSV>

Python Package

Example usage:

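import massif

# List the structure files found in the input directory
files = massif.structure_files("path/to/structures")
# Compare every model with the reference (TM-score); also writes a CSV report
distances = massif.distances(
    "path/to/structures",
    "path/to/reference.pdb",
    distance_mode="TM-score",
)
# Interface contacts between receptor chain A and ligand chain B
contacts = massif.contacts(
    "path/to/structures",
    receptor="A",
    ligand="B",
    contact_cutoff=4.0,
)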

Notes:

  • massif.distances writes a CSV report in the current working directory.
  • Functions print progress output to stdout while running.
  • pip install also exposes a massif console script that runs the Rust CLI.
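Because Massif reports are plain CSV files, they can be post-processed with the standard library to shortlist models. The sketch below ranks models by TM-score. It assumes a report containing a Models column and one column whose header starts with TM-score (as listed for fit below); the helper rank_models and the sample data are hypothetical.

```python
import csv
import io


def rank_models(report, n=5):
    """Return the n (model, TM-score) pairs with the highest TM-score.

    Assumes (hypothetically) a "Models" column and one column whose header
    starts with "TM-score"; rows with missing or non-numeric scores are skipped.
    """
    reader = csv.DictReader(report)
    score_col = next(
        (c for c in reader.fieldnames or [] if c.startswith("TM-score")), None
    )
    if score_col is None:
        raise ValueError("no TM-score column found")
    ranked = []
    for row in reader:
        try:
            ranked.append((row["Models"], float(row[score_col])))
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a usable score
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Hypothetical report content; with a real file, pass open(path, newline="").
sample = io.StringIO(
    "TM-score to ref.pdb,Models\n0.61,model_1\n0.87,model_2\n0.74,model_3\n"
)
print(rank_models(sample, n=2))  # [('model_2', 0.87), ('model_3', 0.74)]
```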

Building from source

Prerequisites

  • Rust toolchain >= 1.74 (install via rustup)
  • A directory containing the structures you want to process (PDB or mmCIF files); files are sorted numerically by the first _-separated index in their names
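Numeric ordering matters because a plain lexical sort places model_10 before model_2. The sketch below shows one reading of the ordering rule: it takes the first underscore-separated field of the file stem that is an integer and sorts on it. This interpretation and the helper names first_index and sort_structure_files are assumptions, not Massif's code.

```python
from pathlib import Path


def first_index(filename):
    """Return the first underscore-separated field of the stem that is an integer."""
    for field in Path(filename).stem.split("_"):
        if field.isascii() and field.isdigit():
            return int(field)
    return None  # no numeric field (hypothetical fallback)


def sort_structure_files(names):
    """Sort numerically on first_index; names without an index go last."""
    def key(name):
        index = first_index(name)
        return (index is None, index if index is not None else 0, name)
    return sorted(names, key=key)


names = ["ranked_10_model.pdb", "ranked_2_model.pdb", "ranked_0_model.cif"]
print(sort_structure_files(names))
# ['ranked_0_model.cif', 'ranked_2_model.pdb', 'ranked_10_model.pdb']
```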

Build

cargo build --release

Command Help

cargo run -- --help

Usage

Massif expects positional arguments in the following order:

massif <COMMAND> [COMMAND OPTIONS] <STRUCTURE_DIR> <OUTPUT_CSV> [OPTIONS]
  • STRUCTURE_DIR: directory containing the input PDB/CIF files
  • OUTPUT_CSV: base report name; data is currently written to <OUTPUT_CSV>_alternative.*
  • --disable-parallel: force single-threaded execution (Rayon is enabled by default)
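When Massif is driven from a script, it helps to assemble the argument list in the documented order: command, command arguments, STRUCTURE_DIR, OUTPUT_CSV, then options. The helper massif_argv below is hypothetical; it only builds the list and does not validate arguments.

```python
import shlex


def massif_argv(command, command_args, structure_dir, output_csv, options=()):
    """Build a Massif command line in the documented positional order.

    Hypothetical helper: command-specific positionals first, then
    STRUCTURE_DIR and OUTPUT_CSV, then trailing options such as
    --disable-parallel.
    """
    return ["massif", command, *command_args, structure_dir, output_csv, *options]


argv = massif_argv("iplddt", ["AB", "C", "5.0"], "models/", "report",
                   ["--disable-parallel"])
print(shlex.join(argv))  # massif iplddt AB C 5.0 models/ report --disable-parallel
# To run it: subprocess.run(argv, check=True)
```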

The COMMAND argument selects one of the following subcommands:

fit

Align every structure on the reference using the CHAIN_IDS anchor, save the aligned coordinates, and compute distances to the reference (TM-score by default, or rmsd-cur via METRIC).

massif fit <OUTPUT_DIR> <REFERENCE_PDB> <CHAIN_IDS> [METRIC] [DISTANCE_CHAINS] <STRUCTURE_DIR> <OUTPUT_CSV>
  • OUTPUT_DIR: folder where aligned structures are written
  • REFERENCE_PDB: path to the reference structure used for alignment and distance computation
  • CHAIN_IDS: concatenated chain identifiers (for example AB or C) that define the fitting anchor in both reference and target structures
  • METRIC (optional): TM-score (default) or rmsd-cur
  • DISTANCE_CHAINS (optional): chain group used for the post-fit distance computation with either metric, rmsd-cur or TM-score (for example AB)
  • Output columns: TM-score to <reference> plus Models
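fit reports TM-score by default. To show what that number measures, the sketch below evaluates the standard TM-score formula of Zhang and Skolnick, TM = (1/L) Σ 1 / (1 + (d_i/d0)²) with d0 = 1.24·(L − 15)^(1/3) − 1.8, on CA coordinates that are already superposed and paired residue by residue. Unlike the full metric it does not search for the best superposition, so it gives a lower bound; it is not Massif's implementation, and tm_d0 and tm_score_fixed are hypothetical names.

```python
import math


def tm_d0(length):
    """TM-score distance scale d0 in Å, clamped at 0.5 Å (common convention)."""
    if length <= 15:
        return 0.5  # the closed form is undefined or negative here
    return max(1.24 * (length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score_fixed(model_ca, reference_ca):
    """TM-score of paired CA coordinates under their current superposition.

    Assumes both lists are already superposed and paired index by index;
    normalised by the reference length.
    """
    if len(model_ca) != len(reference_ca) or not reference_ca:
        raise ValueError("need equally long, non-empty coordinate lists")
    d0 = tm_d0(len(reference_ca))
    total = sum(
        1.0 / (1.0 + (math.dist(a, b) / d0) ** 2)
        for a, b in zip(model_ca, reference_ca)
    )
    return total / len(reference_ca)


reference = [(3.8 * i, 0.0, 0.0) for i in range(100)]
model = [(x, y + 1.0, z) for x, y, z in reference]  # every CA off by 1 Å
print(round(tm_score_fixed(model, reference), 3))
```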

contacts

Characterise interface contacts and clashes across the ensemble.

massif contacts <OUTPUT_DIR> <STRUCTURE_DIR> <OUTPUT_CSV>
  • Extracts direct residue-residue contacts from each model interface and writes one <model>_contact_details.csv file per structure
  • Reports the number of atomic clashes per model and prints the automatic exclusion threshold (mean + 2×SD of the per-model clash counts)
  • Adds placeholder interface scores (reserved for future pTM/ipTM-based scoring)
  • Aligned structures are not emitted; OUTPUT_DIR is reserved for future extensions
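The mean + 2×SD exclusion rule can be reproduced from the per-model clash counts. This sketch assumes a population standard deviation and a strict comparison, neither of which the documentation specifies; clash_exclusion is a hypothetical name.

```python
import statistics


def clash_exclusion(clash_counts):
    """Return (threshold, excluded models) for a {model: clash count} mapping.

    Assumptions: population SD (pstdev) and models excluded only when their
    count is strictly greater than mean + 2 * SD.
    """
    if not clash_counts:
        raise ValueError("no models")
    counts = list(clash_counts.values())
    threshold = statistics.fmean(counts) + 2 * statistics.pstdev(counts)
    excluded = sorted(m for m, c in clash_counts.items() if c > threshold)
    return threshold, excluded


counts = {f"model_{i}": 1 for i in range(9)}
counts["model_9"] = 11
print(clash_exclusion(counts))  # (8.0, ['model_9'])
```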

iplddt

Compute the mean pLDDT over residues at a user-defined interface.

massif iplddt <AGGREGATE_1> <AGGREGATE_2> <THRESHOLD> <STRUCTURE_DIR> <OUTPUT_CSV>
  • AGGREGATE_1 / AGGREGATE_2: chain groups (for example AB vs C)
  • THRESHOLD: distance cutoff (Å) between atoms to treat residues as contacting
  • Returns an i-plddt column per model; failures are reported as -1
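One way to compute an interface pLDDT is sketched below. It assumes pLDDT is stored in the B-factor column (the AlphaFold convention), that a residue is interfacial when any of its atoms lies within THRESHOLD Å of an atom in the other aggregate, and that the score is the mean of per-residue values. Massif's exact rules may differ; parse_atoms and interface_plddt are hypothetical names, and only PDB (not mmCIF) input is parsed.

```python
import math
from collections import defaultdict


def parse_atoms(pdb_lines):
    """Yield (chain, residue_id, (x, y, z), b_factor) for ATOM/HETATM records.

    Reads the fixed-column PDB layout; assumes the B-factor column holds pLDDT.
    """
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            chain = line[21]
            res_id = line[22:27].strip()  # residue number + insertion code
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield chain, res_id, xyz, float(line[60:66])


def interface_plddt(atoms, aggregate_1, aggregate_2, threshold):
    """Mean per-residue pLDDT over interface residues, or -1.0 if none."""
    atoms = list(atoms)
    side_1 = [a for a in atoms if a[0] in aggregate_1]
    side_2 = [a for a in atoms if a[0] in aggregate_2]
    interface = set()
    for chain_1, res_1, xyz_1, _ in side_1:  # brute force, O(N * M)
        for chain_2, res_2, xyz_2, _ in side_2:
            if math.dist(xyz_1, xyz_2) <= threshold:
                interface.add((chain_1, res_1))
                interface.add((chain_2, res_2))
    if not interface:
        return -1.0  # mirrors the -1 failure value in the report
    per_residue = defaultdict(list)
    for chain, res_id, _, b_factor in atoms:
        if (chain, res_id) in interface:
            per_residue[(chain, res_id)].append(b_factor)
    residue_means = [sum(v) / len(v) for v in per_residue.values()]
    return sum(residue_means) / len(residue_means)


# Usage: with open("model.pdb") as fh:
#            print(interface_plddt(parse_atoms(fh), "AB", "C", 5.0))
```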

cluster

Align every structure on a reference, reduce a selected chain group to one 3D point, and assign complete-linkage clusters in the reduced space.

massif cluster <REFERENCE_PDB> <ANCHOR_CHAINS> <REDUCTION_CHAINS> <CUTOFF> <STRUCTURE_DIR> <OUTPUT_CSV> [--aligned-output-dir <OUTPUT_DIR>]
  • REFERENCE_PDB: path to the reference structure used for alignment
  • ANCHOR_CHAINS: concatenated chain identifiers used as the alignment anchor (for example AB or C)
  • REDUCTION_CHAINS: concatenated chain identifiers whose aligned atoms are averaged into one point per model
  • CUTOFF: complete-linkage cutoff (Å) applied to the reduced 3D points
  • --aligned-output-dir: optional directory where the aligned reference and aligned models are written
  • Output columns: point_x, point_y, point_z, cluster_id, and Models
  • When --aligned-output-dir is not provided, Massif reuses cached reduced coordinates from the existing structured CSV when possible
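The reduction and clustering steps can be illustrated in a few lines. This sketch assumes the reduction is an unweighted mean of the aligned atom coordinates, that the cutoff is inclusive, and that cluster ids are numbered by first appearance; alignment is omitted, centroid and complete_linkage are hypothetical names, and the naive merge loop is only meant for small inputs.

```python
import math


def centroid(points):
    """Reduce a chain group's aligned atom coordinates to their mean point."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def complete_linkage(points, cutoff):
    """Naive agglomerative complete-linkage clustering with a distance cutoff.

    Two clusters merge only if every cross pair lies within `cutoff`; returns
    one cluster id per point, numbered by first appearance.
    """
    clusters = [[i] for i in range(len(points))]
    while True:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # complete linkage: the farthest cross pair is the distance
                d = max(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if d <= cutoff and (best is None or d < best[0]):
                    best = (d, a, b)
        if best is None:
            break
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    labels = [0] * len(points)
    for cluster_id, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = cluster_id
    return labels


# A chain of points 1.5 Å apart: single linkage would join all three,
# complete linkage keeps the ends (3 Å apart) in separate clusters.
print(complete_linkage([(0, 0, 0), (1.5, 0, 0), (3, 0, 0)], 2.0))  # [0, 0, 1]
```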

distances

Measure minimal distances between every pair of chains and optionally retain a subset.

massif distances <FILENAME> <CHAIN_PAIRS> <STRUCTURE_DIR> <OUTPUT_CSV>
  • FILENAME: reserved for future use (currently ignored)
  • CHAIN_PAIRS: comma-separated list (for example AB,AC,BC); each pair becomes a CSV column
  • Records minimal heavy-atom distances in Å
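A brute-force version of the minimal-distance measurement for a single model is shown below. It assumes atoms are given as (chain, element, coordinates) tuples, single-character chain IDs in CHAIN_PAIRS, and that "heavy atom" means any element other than hydrogen or deuterium; min_chain_distances is a hypothetical name, and Massif may use a faster spatial search.

```python
import itertools
import math


def min_chain_distances(atoms, chain_pairs):
    """Minimal heavy-atom distance (Å) for each pair in e.g. "AB,AC,BC".

    atoms: iterable of (chain_id, element, (x, y, z)) for one model.
    Hydrogens (H, D) are skipped; a pair with a missing chain gives inf.
    """
    heavy = {}
    for chain, element, xyz in atoms:
        if element.strip().upper() not in ("H", "D"):
            heavy.setdefault(chain, []).append(xyz)
    result = {}
    for pair in chain_pairs.split(","):
        pair = pair.strip()
        first, second = heavy.get(pair[0], []), heavy.get(pair[1], [])
        result[pair] = min(
            (math.dist(p, q) for p, q in itertools.product(first, second)),
            default=math.inf,
        )
    return result


atoms = [("A", "C", (0, 0, 0)), ("A", "H", (2.9, 0, 0)),
         ("B", "N", (3, 0, 0)), ("C", "O", (10, 0, 0))]
print(min_chain_distances(atoms, "AB,AC,BC"))  # {'AB': 3.0, 'AC': 10.0, 'BC': 7.0}
```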

scoring

Placeholder for future scoring pipelines.

massif scoring <STRUCTURE_DIR> <OUTPUT_CSV>
  • Currently returns a vector of 1.0 for each model and does not write extra columns

Output Layout

  • <OUTPUT_CSV>_alternative.csv: structured report with stable column ordering that merges new results with previous runs
  • Aligned structures are written to the provided OUTPUT_DIR for fit and to --aligned-output-dir for cluster
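To picture how a report can merge new results with previous runs while keeping a stable column order, here is a hypothetical sketch: existing columns keep their positions, new columns are appended, rows are matched on the Models column, and new values overwrite old ones. Massif's actual merge rules are not documented here, and merge_reports is an illustrative name.

```python
import csv
import io


def merge_reports(columns, rows, new_rows, key="Models"):
    """Merge new per-model results into an existing report (hypothetical rules)."""
    columns = list(columns)
    merged = {row[key]: dict(row) for row in rows}
    order = [row[key] for row in rows]
    for row in new_rows:
        for col in row:
            if col not in columns:
                columns.append(col)  # new columns go to the end
        if row[key] not in merged:
            merged[row[key]] = {}
            order.append(row[key])  # new models go to the end
        merged[row[key]].update(row)  # new values win on conflicts
    return columns, [{c: merged[m].get(c, "") for c in columns} for m in order]


columns, rows = merge_reports(
    ["TM-score to ref.pdb", "Models"],
    [{"TM-score to ref.pdb": "0.81", "Models": "model_1"}],
    [{"Models": "model_1", "i-plddt": "71.2"},
     {"Models": "model_2", "i-plddt": "-1"}],
)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=columns)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```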

Download files

Download the file for your platform.

Source Distribution

massif-0.8.0.tar.gz (57.4 kB)

Uploaded Source

Built Distributions

massif-0.8.0-cp312-cp312-win_amd64.whl (5.0 MB)

Uploaded CPython 3.12, Windows x86-64

massif-0.8.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.9 MB)

Uploaded CPython 3.12, manylinux: glibc 2.17+ x86-64

massif-0.8.0-cp312-cp312-macosx_11_0_arm64.whl (5.2 MB)

Uploaded CPython 3.12, macOS 11.0+ ARM64

massif-0.8.0-cp312-cp312-macosx_10_12_x86_64.whl (5.4 MB)

Uploaded CPython 3.12, macOS 10.12+ x86-64

File details

Details for the file massif-0.8.0.tar.gz.

File metadata

  • Download URL: massif-0.8.0.tar.gz
  • Size: 57.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.13.3

File hashes

Hashes for massif-0.8.0.tar.gz
Algorithm Hash digest
SHA256 17d4719c31bebaf5d01244b8c53d95ec48efd468960f159f7eedea55912f1471
MD5 263818138779e2c1e010aac35b24cf8d
BLAKE2b-256 e7c620f680dd014d320d5500053df0ea3d8de4e59aa22ea96f73c076010bcf11

File details

Details for the file massif-0.8.0-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: massif-0.8.0-cp312-cp312-win_amd64.whl
  • Size: 5.0 MB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.13.3

File hashes

Hashes for massif-0.8.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 6fa664c56c4eaed8799a309d0ea849e2de48af1b9192c1c003f4b88b8cba5c30
MD5 b4166ad77faf1ef9e634ffaa92e83c28
BLAKE2b-256 369467ac7ad8fb6fd80bcfa2b855461f77bc4af500c58e73274799e5f453397f

File details

Details for the file massif-0.8.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for massif-0.8.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3f8cad7d8b562c376573387e1f4841c9ad1de9370318e5b925f0afaf22a308c7
MD5 d771278c598e04adf92e3ca1eb9925dd
BLAKE2b-256 06700f46a3bdfa7f2fc534340db0672b8951d8f6ed20a7f56f1e963128c633c1

File details

Details for the file massif-0.8.0-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for massif-0.8.0-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f36302a099d09bd188cff9495e17ac2f4aa18af607e94220eb53b24dea4b09bd
MD5 a8980e08e9cee59f09050fd1ff1c0d2a
BLAKE2b-256 013320160637b8249eff44e3ee8d19b95f35692e23d1199f0dbe96187d00332f

File details

Details for the file massif-0.8.0-cp312-cp312-macosx_10_12_x86_64.whl.

File hashes

Hashes for massif-0.8.0-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 27ae038058e57577ccfa6d51bce5f43c7bf514192c8c94dc0a360dddd497396b
MD5 d5560edf67105ab97271ba06ebb1d74a
BLAKE2b-256 288e820ce1c35f6444a2e28cb3d07693368b71f94ab11c4995e4c5ea5e3fbf6a
