Fast analysis of massive-scale data produced with MassiveFold

Introduction

Massif is a high-throughput analysis suite built to process the large structural ensembles generated by MassiveFold. It helps MassiveFold users review many predictions at once, evaluate interfaces and distances, and identify models that warrant follow-up. Instead of working through raw model folders manually, Massif gathers the metrics needed for filtering, ranking, and selecting structures in one place.

Getting Started

Massif can be installed via pip as both a CLI tool and a Python library (requires a Rust toolchain).

Install the latest Massif release from PyPI with:

python -m pip install massif

Or, from a checkout of the source tree, build the current development version, which includes new but untested features:

python -m pip install .

Python-installed CLI

After pip install, the massif command is available in your environment and uses the same CLI syntax as the Rust binary:

massif --help
massif fit <OUTPUT_DIR> <REFERENCE_PDB> <CHAIN_IDS> <STRUCTURE_DIR> <OUTPUT_CSV>

Python Package

Example usage:

import massif

files = massif.structure_files("path/to/structures")
distances = massif.distances(
    "path/to/structures",
    "path/to/reference.pdb",
    distance_mode="TM-score",
)

Notes:

  • massif.distances writes a CSV report in the current working directory.
  • Functions print progress output to stdout while running.
  • pip install also exposes a massif console script that runs the Rust CLI.
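
Because massif.distances writes its report to the current working directory, one way to control where the CSV lands is to switch directory around the call. The sketch below uses only the standard library. working_directory is a hypothetical helper, not part of Massif, and the Massif call itself is left as a comment so the snippet runs without the package installed.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def working_directory(path):
    """Temporarily switch the process working directory (hypothetical helper)."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield Path(path)
    finally:
        # Always restore the original directory, even if the body raises.
        os.chdir(previous)


# Resolve input paths first: relative paths would otherwise be
# interpreted against the new working directory.
structures = Path("path/to/structures").resolve()
reference = Path("path/to/reference.pdb").resolve()

with tempfile.TemporaryDirectory() as report_dir:
    with working_directory(report_dir):
        # import massif
        # massif.distances(str(structures), str(reference), distance_mode="TM-score")
        inside = Path.cwd()
```

In real use the temporary directory would be replaced by the folder where the report should be kept.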

Building from source

Prerequisites

  • Rust toolchain >= 1.74 (install via rustup)
  • A directory containing the structures you want to process (PDB or mmCIF files); filenames are sorted numerically on the first _-separated index
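
The numeric ordering of filenames can be pictured with a small sort key. This is a sketch under an assumption: the "index" is taken to be the first underscore-separated token of the file stem that consists only of digits. numeric_index_key is a hypothetical helper, and Massif's own rule may differ in edge cases.

```python
def numeric_index_key(filename):
    """Sort key based on the first all-digit, underscore-separated token.

    Assumption: files without any numeric token are placed last,
    in alphabetical order.
    """
    stem = filename.rsplit(".", 1)[0]
    for token in stem.split("_"):
        if token.isascii() and token.isdigit():
            return (0, int(token), filename)
    return (1, 0, filename)


names = ["ranked_10_model.pdb", "ranked_2_model.pdb", "ranked_1_model.cif"]
ordered = sorted(names, key=numeric_index_key)
```

A plain alphabetical sort would place ranked_10 before ranked_2, which is the situation a numeric key avoids.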

Build

cargo build --release

Command Help

cargo run -- --help

Usage

Massif expects positional arguments in the following order:

massif <COMMAND> [COMMAND OPTIONS] <STRUCTURE_DIR> <OUTPUT_CSV> [OPTIONS]
  • STRUCTURE_DIR: directory containing the input PDB/CIF files
  • OUTPUT_CSV: base report name; data is currently written to <OUTPUT_CSV>_alternative.*
  • --disable-parallel: force single-threaded execution (Rayon is enabled by default)
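
When driving the CLI from a script, it can help to assemble the argument list in the documented order before handing it to subprocess. The helper below is a hypothetical convenience, not part of Massif, and the example values (chain groups, a 5.0 Å threshold, the directory and report names) are placeholders.

```python
def build_massif_command(command, command_options, structure_dir, output_csv,
                         disable_parallel=False):
    """Return an argv list following the documented positional order.

    Order: massif <COMMAND> [COMMAND OPTIONS] <STRUCTURE_DIR> <OUTPUT_CSV> [OPTIONS]
    """
    argv = ["massif", command, *command_options, structure_dir, output_csv]
    if disable_parallel:
        # Global option placed after the positional arguments.
        argv.append("--disable-parallel")
    return argv


argv = build_massif_command(
    "iplddt", ["AB", "C", "5.0"], "models", "report", disable_parallel=True
)
# To execute: subprocess.run(argv, check=True)
```

Passing a list to subprocess.run avoids shell quoting issues with paths that contain spaces.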

The COMMAND argument selects one of the following subcommands:

fit

Align every structure against a reference chain, save aligned coordinates, and compute distances to the reference (TM-score by default).

massif fit <OUTPUT_DIR> <REFERENCE_PDB> <CHAIN_IDS> [METRIC] [DISTANCE_CHAINS] <STRUCTURE_DIR> <OUTPUT_CSV>
  • OUTPUT_DIR: folder where aligned structures are written
  • REFERENCE_PDB: path to the reference structure used for alignment and distance computation
  • CHAIN_IDS: concatenated chain identifiers (for example AB or C) that define the fitting anchor in both reference and target structures
  • METRIC (optional): TM-score (default) or rmsd-cur
  • DISTANCE_CHAINS (optional): chain group used for the post-fit distance computation (for example AB); applies to both rmsd-cur and TM-score
  • Output columns: TM-score to <reference> plus Models
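
For readers unfamiliar with the metrics, the sketch below evaluates the standard TM-score formula and a plain RMSD from per-residue distances measured after a superposition. It illustrates the textbook definitions only: how Massif pairs residues and what exactly rmsd-cur covers are not specified here, and the function names are hypothetical.

```python
def tm_d0(reference_length):
    """Length-dependent distance scale d0 of the TM-score."""
    if reference_length <= 15:
        raise ValueError("this sketch does not handle very short chains")
    d0 = 1.24 * (reference_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("this sketch does not handle very short chains")
    return d0


def tm_score(distances, reference_length):
    """TM-score for one fixed superposition.

    distances: per-residue distances (in Å) between paired residues.
    The sum is normalised by the reference length, so unpaired
    reference residues lower the score.
    """
    d0 = tm_d0(reference_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / reference_length


def rmsd(distances):
    """Root-mean-square deviation over the paired residues."""
    return (sum(d * d for d in distances) / len(distances)) ** 0.5


perfect = tm_score([0.0] * 100, 100)          # identical structures
halfway = tm_score([tm_d0(100)] * 100, 100)   # every residue displaced by d0
```

Unlike RMSD, the TM-score is bounded between 0 and 1 and down-weights large local deviations, which is one reason it is a common default for comparing models.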

contacts

Characterise interface contacts and clashes across the ensemble.

massif contacts <OUTPUT_DIR> <STRUCTURE_DIR> <OUTPUT_CSV>
  • Extracts direct residue-residue contacts from each model interface and writes one <model>_contact_details.csv file per structure
  • Reports the number of atomic clashes per model and prints the automatic exclusion threshold (mean + 2×SD)
  • Adds interface score placeholders (future integration of pTM/ipTM based scoring)
  • Aligned structures are not emitted; OUTPUT_DIR is reserved for future extensions
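
The exclusion threshold mentioned above (mean + 2×SD) can be reproduced from per-model clash counts. The sketch assumes a population standard deviation and a strict "greater than" comparison; the source does not say which conventions Massif uses, and the function names and counts are hypothetical.

```python
from statistics import mean, pstdev


def clash_exclusion_threshold(clash_counts):
    """mean + 2 x SD over the ensemble (population SD is an assumption)."""
    return mean(clash_counts) + 2 * pstdev(clash_counts)


def flag_clashing_models(counts_by_model):
    """Return the threshold and the models whose count exceeds it."""
    threshold = clash_exclusion_threshold(list(counts_by_model.values()))
    flagged = sorted(
        name for name, count in counts_by_model.items() if count > threshold
    )
    return threshold, flagged


# Illustrative counts: nine models with one clash, one model with eleven.
counts = {f"model_{i}": 1 for i in range(9)}
counts["model_9"] = 11
threshold, flagged = flag_clashing_models(counts)
```

With very small ensembles a single outlier inflates the SD enough that it may not exceed the threshold, so this rule is most useful for large ensembles.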

iplddt

Compute the mean pLDDT over residues at a user-defined interface.

massif iplddt <AGGREGATE_1> <AGGREGATE_2> <THRESHOLD> <STRUCTURE_DIR> <OUTPUT_CSV>
  • AGGREGATE_1 / AGGREGATE_2: chain groups (for example AB vs C)
  • THRESHOLD: distance cutoff (Å) between atoms to treat residues as contacting
  • Returns an i-plddt column per model; failures are reported as -1
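
A minimal sketch of the interface pLDDT idea follows, assuming atoms are supplied as (chain, residue number, coordinates, pLDDT) tuples, chain identifiers are single characters, and the mean is taken per residue over both sides of the interface. The data layout and function name are hypothetical; only the threshold semantics and the -1 failure value come from the description above.

```python
from math import dist


def interface_plddt(atoms, group_1, group_2, threshold):
    """Mean pLDDT over residues with any atom within `threshold` Å
    of an atom in the other chain group; -1.0 when no contact exists."""
    side_1 = [a for a in atoms if a[0] in group_1]
    side_2 = [a for a in atoms if a[0] in group_2]
    interface = {}  # (chain, residue number) -> pLDDT
    for chain_a, res_a, xyz_a, plddt_a in side_1:
        for chain_b, res_b, xyz_b, plddt_b in side_2:
            if dist(xyz_a, xyz_b) <= threshold:
                interface[(chain_a, res_a)] = plddt_a
                interface[(chain_b, res_b)] = plddt_b
    if not interface:
        return -1.0
    return sum(interface.values()) / len(interface)


atoms = [
    ("A", 1, (0.0, 0.0, 0.0), 90.0),
    ("A", 2, (10.0, 0.0, 0.0), 50.0),  # too far from chain C to count
    ("C", 1, (3.0, 0.0, 0.0), 70.0),
]
score = interface_plddt(atoms, "A", "C", 4.0)
no_contact = interface_plddt(atoms, "A", "C", 1.0)
```

The all-pairs loop is quadratic in the number of atoms; a spatial index would be the usual choice for large complexes.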

cluster

Align every structure on a reference, reduce a selected chain group to one 3D point, and assign complete-linkage clusters in the reduced space.

massif cluster <REFERENCE_PDB> <ANCHOR_CHAINS> <REDUCTION_CHAINS> <CUTOFF> <STRUCTURE_DIR> <OUTPUT_CSV> [--aligned-output-dir <OUTPUT_DIR>]
  • REFERENCE_PDB: path to the reference structure used for alignment
  • ANCHOR_CHAINS: concatenated chain identifiers used as the alignment anchor (for example AB or C)
  • REDUCTION_CHAINS: concatenated chain identifiers whose aligned atoms are averaged into one point per model
  • CUTOFF: complete-linkage cutoff (Å) applied to the reduced 3D points
  • --aligned-output-dir: optional directory where the aligned reference and aligned models are written
  • Output columns: point_x, point_y, point_z, cluster_id, and Models
  • When --aligned-output-dir is not provided, Massif reuses cached reduced coordinates from the existing structured CSV when possible
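
The reduce-then-cluster step can be sketched in pure Python. The example assumes an unweighted centroid, an inclusive cutoff, and arbitrary cluster numbering; none of these details are confirmed by the description above, and the function names are hypothetical.

```python
from itertools import combinations
from math import dist


def centroid(coords):
    """Average a list of (x, y, z) coordinates into one 3D point."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def complete_linkage(points, cutoff):
    """Agglomerative clustering; clusters merge while the largest
    pairwise distance between their members stays within `cutoff`."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return max(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters under complete linkage.
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j

    labels = [0] * len(points)
    for cluster_id, members in enumerate(clusters):
        for member in members:
            labels[member] = cluster_id
    return labels


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
labels = complete_linkage(points, cutoff=2.0)
```

Complete linkage guarantees that no two members of a cluster are further apart than the cutoff, which makes the cutoff easy to interpret as a maximum cluster diameter.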

distances

Measure minimum distances between every pair of chains and optionally retain a subset.

massif distances <FILENAME> <CHAIN_PAIRS> <STRUCTURE_DIR> <OUTPUT_CSV>
  • FILENAME: reserved for future use (currently ignored)
  • CHAIN_PAIRS: comma-separated list (for example AB,AC,BC); each pair becomes a CSV column
  • Records minimum heavy-atom distances in Å
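
The following sketch shows one way to read a CHAIN_PAIRS string and compute a minimum heavy-atom distance. It assumes single-character chain identifiers and atoms given as (chain, element, coordinates) tuples; the function names and data layout are hypothetical and not Massif's API.

```python
from math import dist


def parse_chain_pairs(spec):
    """Split a string such as "AB,AC,BC" into (chain, chain) tuples."""
    pairs = []
    for token in spec.split(","):
        token = token.strip()
        if len(token) != 2:
            raise ValueError(f"expected two chain identifiers, got {token!r}")
        pairs.append((token[0], token[1]))
    return pairs


def min_heavy_atom_distance(atoms, chain_a, chain_b):
    """Smallest distance (Å) between non-hydrogen atoms of two chains.

    Returns None when either chain has no heavy atoms.
    """
    heavy = [a for a in atoms if a[1].upper() not in {"H", "D"}]
    coords_a = [xyz for chain, _, xyz in heavy if chain == chain_a]
    coords_b = [xyz for chain, _, xyz in heavy if chain == chain_b]
    if not coords_a or not coords_b:
        return None
    return min(dist(p, q) for p in coords_a for q in coords_b)


atoms = [
    ("A", "N", (0.0, 0.0, 0.0)),
    ("A", "H", (4.0, 0.0, 0.0)),  # hydrogen, ignored
    ("B", "C", (5.0, 0.0, 0.0)),
    ("C", "O", (0.0, 3.0, 0.0)),
]
row = {
    a + b: min_heavy_atom_distance(atoms, a, b)
    for a, b in parse_chain_pairs("AB,AC,BC")
}
```

Each key of row corresponds to one chain pair, mirroring the one-column-per-pair layout described above.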

scoring

Placeholder for future scoring pipelines.

massif scoring <STRUCTURE_DIR> <OUTPUT_CSV>
  • Currently returns a vector of 1.0 for each model and does not write extra columns

Output Layout

  • <OUTPUT_CSV>_alternative.csv: structured report with stable column ordering that merges new results with previous runs
  • Aligned structures are written to the provided OUTPUT_DIR for fit and to --aligned-output-dir for cluster
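
Once a report exists, ranking models by a column takes a few lines of standard-library Python. The sketch assumes a comma-separated file with a header row containing Models and i-plddt columns, as named in the sections above; the exact file layout is an assumption, and the demo writes its own small file instead of reading real Massif output.

```python
import csv
import tempfile
from pathlib import Path


def report_path(output_csv):
    """Path of the structured report for a given OUTPUT_CSV base name."""
    return Path(f"{output_csv}_alternative.csv")


def rank_by_column(path, column, failure_value=-1.0):
    """Return rows sorted by `column` (descending), skipping failures."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    scored = []
    for row in rows:
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric entry
        if value == failure_value:
            continue  # -1 marks a failed computation
        scored.append((value, row))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [row for _, row in scored]


# Demo on a hand-written file with made-up values.
with tempfile.TemporaryDirectory() as tmp:
    demo = report_path(Path(tmp) / "report")
    demo.write_text("Models,i-plddt\nm1.pdb,71.5\nm2.pdb,-1\nm3.pdb,88.0\n")
    best = [row["Models"] for row in rank_by_column(demo, "i-plddt")]
```

The same helper works for other numeric columns, such as the TM-score column produced by fit, provided the column name is passed exactly as it appears in the header.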
