Fast mmCIF parser for structural biology

Overview

ciffy is a fast CIF file parser for molecular structures, with a C backend and Python interface. It supports both NumPy and PyTorch backends for array operations.

Performance

In the benchmarks below, ciffy parses CIF files roughly 70-126x faster than BioPython and Biotite (speedup factors in parentheses):

Structure  Atoms    ciffy    BioPython       Biotite
3SKW       2,874    0.36 ms  39 ms (106x)    28 ms (78x)
9GCM       4,466    0.54 ms  48 ms (88x)     38 ms (70x)
9MDS       102,216  11 ms    1340 ms (126x)  946 ms (89x)

Benchmarked on Apple M1 Max. Run python tests/profile.py to reproduce.
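
For readers who want to time their own files without the project's profiling script, the following is a minimal, generic timing sketch. It does not reproduce tests/profile.py; the helper names best_time_ms and speedup are hypothetical, and the stand-in workloads should be replaced with real parser calls such as ciffy.load.

```python
import time


def best_time_ms(fn, repeats=5):
    """Return the fastest wall-clock time of fn() over `repeats` runs, in ms."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


if __name__ == "__main__":
    # Stand-in workloads (hypothetical); swap in real parser calls to compare them.
    def candidate():
        return sum(range(1_000))

    def baseline():
        return sum(range(100_000))

    t_fast = best_time_ms(candidate)
    t_slow = best_time_ms(baseline)
    print(f"candidate: {t_fast:.3f} ms, baseline: {t_slow:.3f} ms, "
          f"speedup: {speedup(t_slow, t_fast):.1f}x")
```

Taking the minimum over several runs reduces noise from other processes; reporting a mean or median instead is an equally reasonable choice.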

Installation

From PyPI

pip install ciffy

With GPU Acceleration (CUDA)

For GPU-accelerated coordinate conversions:

pip install ciffy-cuda

This requires PyTorch with CUDA support. See the ciffy-cuda package for details.

From Source

git clone https://github.com/hmblair/ciffy.git
cd ciffy
pip install -r requirements.txt
pip install -e .

# Optional: Install CUDA extension for GPU acceleration
pip install -e ./cuda
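
To confirm which of the two packages ended up installed, a small check with the standard library can help. This is a sketch only: installed_version is a hypothetical helper, and it assumes the distribution names match the pip commands above (ciffy and ciffy-cuda).

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names taken from the pip commands above.
for name in ("ciffy", "ciffy-cuda"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```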

Backends

ciffy supports two array backends:

  • NumPy: Lightweight, no additional dependencies required
  • PyTorch: For GPU support (CUDA/MPS) and integration with deep learning workflows

Specify the backend when loading structures:

import ciffy

# Load with NumPy backend (recommended for general use)
polymer = ciffy.load("structure.cif", backend="numpy")

# Load with PyTorch backend (for deep learning workflows)
polymer = ciffy.load("structure.cif", backend="torch")

Polymers can be converted between backends:

# Convert to PyTorch tensors
torch_polymer = polymer.torch()

# Convert to NumPy arrays
numpy_polymer = polymer.numpy()

For PyTorch, move tensors to GPU:

# Move to CUDA
polymer_gpu = polymer.torch().to("cuda")

# Move to Apple Silicon (MPS)
polymer_mps = polymer.torch().to("mps")

Note: The default backend is "numpy" as of v0.6.0. Specify the backend explicitly for clarity.
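
When the same script has to run on machines with and without PyTorch, the backend and device can be chosen at runtime. The sketch below is one possible approach, not part of ciffy: choose_backend and choose_device are hypothetical helpers, and only the "numpy" and "torch" backend names and the "cuda" and "mps" device strings come from the text above.

```python
import importlib.util


def choose_backend(prefer_torch=True):
    """Return "torch" if PyTorch is importable and preferred, else "numpy"."""
    has_torch = importlib.util.find_spec("torch") is not None
    return "torch" if (prefer_torch and has_torch) else "numpy"


def choose_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch releases have no torch.backends.mps, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


# Intended use, assuming ciffy is installed and structure.cif exists:
#   polymer = ciffy.load("structure.cif", backend=choose_backend())
#   if choose_backend() == "torch":
#       polymer = polymer.to(choose_device())
print(choose_backend(), choose_device())
```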

Usage

import ciffy

# Load a structure from a CIF file
polymer = ciffy.load("structure.cif", backend="numpy")

# Basic information
print(polymer)  # Summary of chains, residues, atoms

# Access coordinates and properties
coords = polymer.coordinates      # (N, 3) array/tensor
atoms = polymer.atoms             # (N,) array/tensor of atom types
sequence = polymer.sequence_str()  # Sequence string

# Geometric operations
centered, means = polymer.center(ciffy.MOLECULE)
aligned, Q = polymer.align(ciffy.CHAIN)
distances = polymer.pairwise_distances(ciffy.RESIDUE)

# Selection
rna_chains = polymer.by_type(ciffy.RNA)
backbone = polymer.backbone()

# Molecule type per chain (parsed from CIF _entity_poly block)
mol_types = polymer.molecule_type  # Array of Molecule enum values

# Load with entity descriptions (off by default for performance)
polymer = ciffy.load("structure.cif", load_descriptions=True)
descriptions = polymer.descriptions  # List of description strings per chain

# Iterate over chains
for chain in polymer.chains(ciffy.RNA):
    print(chain.pdb_id, chain.sequence_str())

# Compute RMSD between structures (defaults to MOLECULE scale)
rmsd = ciffy.rmsd(polymer1, polymer2)

Internal Coordinates

A Polymer supports a dual representation: both Cartesian (XYZ) and internal (bond lengths, angles, dihedrals) coordinates are available on the same object. Conversions between the two are evaluated lazily and happen automatically on access.

import ciffy

polymer = ciffy.load("structure.cif", backend="torch")

# Access internal coordinates (computed lazily on first access)
distances = polymer.distances   # (N,) bond lengths
angles = polymer.angles         # (N,) bond angles
dihedrals = polymer.dihedrals   # (N,) dihedral angles

# Access named backbone dihedrals using enum
phi = polymer.dihedral(ciffy.DihedralType.PHI)    # Protein phi
psi = polymer.dihedral(ciffy.DihedralType.PSI)    # Protein psi
alpha = polymer.dihedral(ciffy.DihedralType.ALPHA)  # RNA/DNA alpha

# Modify dihedrals - Cartesian coordinates auto-update
new_dihedrals = polymer.dihedrals + noise
polymer.dihedrals = new_dihedrals
coords = polymer.coordinates  # Automatically reconstructed

# Set specific named dihedrals
polymer.set_dihedral(ciffy.DihedralType.PHI, new_phi_values)

# Fully differentiable for PyTorch (gradients flow through reconstruction)
dihedrals = polymer.dihedrals.requires_grad_(True)
polymer.dihedrals = dihedrals
loss = ciffy.rmsd(polymer, target)
loss.backward()
print(dihedrals.grad)  # Gradients on dihedral angles
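
For reference, a single dihedral angle can be computed from four Cartesian points with a standard projection formula. The NumPy sketch below illustrates the geometry only; dihedral_angle is a hypothetical helper, and ciffy's own sign convention and atom ordering are not specified in this document.

```python
import numpy as np


def dihedral_angle(p0, p1, p2, p3):
    """Signed dihedral angle (radians) defined by four 3D points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    # Unit vector along the central bond.
    b1 = b1 / np.linalg.norm(b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    # The angle between the projections is the dihedral.
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.arctan2(y, x))


if __name__ == "__main__":
    # Four points with the outer atoms on opposite sides (trans arrangement).
    angle = dihedral_angle([1, 0, 0], [0, 0, 0], [0, 0, 1], [-1, 0, 1])
    print(abs(angle))  # magnitude is pi for a trans arrangement
```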

Saving Structures

# Save to CIF format (supports all molecule types)
polymer.write("output.cif")

# Save only polymer atoms (excludes water, ions, ligands)
polymer.poly().write("polymer_only.cif")

Command Line Interface

# View structure summary
ciffy structure.cif

# Show sequences per chain
ciffy structure.cif --sequence

# Show entity descriptions per chain
ciffy structure.cif --desc

# Multiple files
ciffy file1.cif file2.cif

# Run multiple training experiments in parallel
ciffy experiment configs/*.yaml

# Run inference to generate structures from sequences
# Copy example config and customize for your setup:
# cp examples/configs/inference_example.yaml configs/inference.yaml
ciffy inference configs/inference.yaml

Example output:

PDB 9GCM (numpy)
──────────────────────
   Type     Res  Atoms
A  RNA      135   1413
B  PROTEIN  132   1032
C  PROTEIN  246   1261
D  PROTEIN  485    760
──────────────────────
            998   4466

Descriptions:
  A: U11 snRNA
  B: U11/U12 small nuclear ribonucleoprotein 25 kDa protein
  C: U11/U12 small nuclear ribonucleoprotein 35 kDa protein
  D: Programmed cell death protein 7
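
The CLI can also be driven from a script. The sketch below builds the argument list from the flags shown above (--sequence and --desc) and runs it only if a ciffy executable is on the PATH. The helpers build_ciffy_command and run_ciffy are hypothetical, and combining both flags or several files in one call is an assumption not confirmed by this document.

```python
import shutil
import subprocess


def build_ciffy_command(files, sequence=False, desc=False):
    """Build the argv list for the ciffy CLI from the documented flags."""
    cmd = ["ciffy", *files]
    if sequence:
        cmd.append("--sequence")
    if desc:
        cmd.append("--desc")
    return cmd


def run_ciffy(files, **flags):
    """Run the CLI and return its stdout, or None if ciffy is not on PATH."""
    if shutil.which("ciffy") is None:
        return None
    result = subprocess.run(
        build_ciffy_command(files, **flags),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


print(build_ciffy_command(["structure.cif"], desc=True))
```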

Training Neural Networks

ciffy includes PyTorch modules for deep learning on molecular structures. See the deep learning guide for full documentation.

Running Experiments

Train multiple models in parallel across GPUs:

# Run all configs in parallel (auto-distributes across GPUs)
ciffy experiment configs/*.yaml

# Run sequentially
ciffy experiment configs/*.yaml --sequential

# Force CPU
ciffy experiment configs/*.yaml --device cpu

Results are displayed in a comparison table:

Experiment            Status    Best Loss   Device    Time
--------------------  --------  ----------  --------  ----------
vae_small             success   0.1234      cuda:0    45.2s
vae_medium            success   0.0987      cuda:1    2m0s
vae_large             failed    N/A         cuda:0    5.3s
--------------------  --------  ----------  --------  ----------
Total: 2/3 succeeded in 2m51s
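
How ciffy distributes experiments across GPUs is not described here. The device column in the table above is consistent with a simple round-robin assignment, which the following sketch illustrates. This is an assumption for illustration, not the tool's actual scheduler; assign_devices is a hypothetical helper.

```python
from itertools import cycle


def assign_devices(config_paths, n_gpus):
    """Map each config to a device, cycling through the available GPUs."""
    if n_gpus <= 0:
        # No GPUs: everything runs on the CPU.
        return {path: "cpu" for path in config_paths}
    devices = cycle([f"cuda:{i}" for i in range(n_gpus)])
    return {path: next(devices) for path in config_paths}


plan = assign_devices(
    ["vae_small.yaml", "vae_medium.yaml", "vae_large.yaml"], n_gpus=2
)
for path, device in plan.items():
    print(f"{path} -> {device}")
```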

Testing

pytest tests/

Contributing

See CONTRIBUTING.md for development setup, repository structure, and code generation details.

Download files

Source Distribution

ciffy-0.9.10.tar.gz (548.2 kB)

File details

Details for the file ciffy-0.9.10.tar.gz.

File metadata

  • Download URL: ciffy-0.9.10.tar.gz
  • Size: 548.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ciffy-0.9.10.tar.gz
Algorithm    Hash digest
SHA256       2e237e288c2c09e22a6389ad70d4976984548ca70f748abd761b821101ce38bb
MD5          c21170ca6ba55cd8b12ba58d6c56ae22
BLAKE2b-256  ea9e45a73d6839567e9df5294d896db5bb480e80aae415a53e0781dd48c1a854
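
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. The sketch assumes the file was saved locally as ciffy-0.9.10.tar.gz; sha256_of and matches_expected are hypothetical helper names.

```python
import hashlib
import os

# SHA256 digest copied from the table above.
EXPECTED_SHA256 = "2e237e288c2c09e22a6389ad70d4976984548ca70f748abd761b821101ce38bb"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected one."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = "ciffy-0.9.10.tar.gz"  # assumed local download path
    if os.path.exists(archive):
        print("digest OK" if matches_expected(archive) else "digest MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```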

Provenance

The following attestation bundles were made for ciffy-0.9.10.tar.gz:

Publisher: pypi.yml on hmblair/ciffy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
