Fast mmCIF parser for structural biology
Overview
ciffy is a fast CIF file parser for molecular structures, with a C backend and Python interface. It supports both NumPy and PyTorch backends for array operations.
Performance
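In the benchmarks below, ciffy parses CIF files roughly 50-90x faster than BioPython and Biotite. Times are per-file parse times; the speedup relative to ciffy is given in parentheses:
| Structure | Atoms | ciffy | BioPython (speedup) | Biotite (speedup) |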
|---|---|---|---|---|
| 3SKW | 2,874 | 0.47 ms | 31 ms (66x) | 28 ms (59x) |
| 9GCM | 4,466 | 0.71 ms | 40 ms (56x) | 36 ms (51x) |
| 9MDS | 102,216 | 14 ms | 1266 ms (93x) | 911 ms (67x) |
Benchmarked on Apple M1 Max. Run python tests/profile.py to reproduce.
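For a rough independent check, a timing harness along the following lines could be used. This is a minimal sketch, not the project's tests/profile.py: the helper name best_time_ms is hypothetical, and the commented usage assumes ciffy is installed and a local structure.cif exists.

```python
import time


def best_time_ms(func, *args, repeats=5, **kwargs):
    """Return the best wall-clock time in milliseconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        elapsed = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed)
    return best


# Stand-in workload so the sketch runs without any parser installed.
baseline_ms = best_time_ms(sum, range(100_000))
print(f"baseline: {baseline_ms:.3f} ms")

# With ciffy installed, the same helper could time a parse (assumed file path):
#   import ciffy
#   print(best_time_ms(ciffy.load, "structure.cif", backend="numpy"))
```

Taking the best of several repeats reduces noise from caching and background load; absolute numbers will still differ from the table on other hardware.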
Installation
From PyPI
pip install ciffy
From Source
git clone https://github.com/hmblair/ciffy.git
cd ciffy
pip install -r requirements.txt
pip install -e .
Backends
ciffy supports two array backends:
- NumPy: Lightweight, no additional dependencies required
- PyTorch: For GPU support (CUDA/MPS) and integration with deep learning workflows
Specify the backend when loading structures:
import ciffy
# Load with NumPy backend (recommended for general use)
polymer = ciffy.load("structure.cif", backend="numpy")
# Load with PyTorch backend (for deep learning workflows)
polymer = ciffy.load("structure.cif", backend="torch")
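When a script has to run both with and without PyTorch installed, the backend string can be chosen at runtime. The sketch below is one way to do that using only the standard library; pick_backend is a hypothetical helper, not part of ciffy.

```python
import importlib.util


def pick_backend(prefer_torch=True):
    """Return "torch" if PyTorch is importable and preferred, else "numpy"."""
    if prefer_torch and importlib.util.find_spec("torch") is not None:
        return "torch"
    return "numpy"


backend = pick_backend()
print(backend)

# The result can then be passed on, for example:
#   polymer = ciffy.load("structure.cif", backend=backend)
```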
Polymers can be converted between backends:
# Convert to PyTorch tensors
torch_polymer = polymer.torch()
# Convert to NumPy arrays
numpy_polymer = polymer.numpy()
For PyTorch, move tensors to GPU:
# Move to CUDA
polymer_gpu = polymer.torch().to("cuda")
# Move to Apple Silicon (MPS)
polymer_mps = polymer.torch().to("mps")
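The device string can also be selected at runtime rather than hard-coded. The following is a minimal sketch under stated assumptions: pick_device is a hypothetical helper, and passing "cpu" to .to() is assumed to work the same way as the "cuda" and "mps" calls above.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu"; return None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing on older PyTorch releases, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(device)

# With PyTorch present, the result could be used as:
#   polymer_on_device = polymer.torch().to(device)
```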
Note: The default backend is "numpy" as of v0.6.0. Specify the backend explicitly for clarity.
Usage
import ciffy
# Load a structure from a CIF file
polymer = ciffy.load("structure.cif", backend="numpy")
# Basic information
print(polymer) # Summary of chains, residues, atoms
# Access coordinates and properties
coords = polymer.coordinates # (N, 3) array/tensor
atoms = polymer.atoms # (N,) array/tensor of atom types
sequence = polymer.str() # Sequence string
# Geometric operations
centered, means = polymer.center(ciffy.MOLECULE)
aligned, Q = polymer.align(ciffy.CHAIN)
distances = polymer.pairwise_distances(ciffy.RESIDUE)
# Selection
rna_chains = polymer.subset(ciffy.RNA)
backbone = polymer.backbone()
# Molecule type per chain (parsed from CIF _entity_poly block)
mol_types = polymer.molecule_type # Array of Molecule enum values
# Load with entity descriptions (off by default for performance)
polymer = ciffy.load("structure.cif", load_descriptions=True)
descriptions = polymer.descriptions # List of description strings per chain
# Iterate over chains
for chain in polymer.chains(ciffy.RNA):
    print(chain.id(), chain.str())
# Compute RMSD between two loaded structures, polymer1 and polymer2 (defaults to MOLECULE scale)
rmsd = ciffy.rmsd(polymer1, polymer2)
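To make the geometric operations concrete, here is a dependency-free sketch of what centering and RMSD compute for a single group of atoms. It illustrates the arithmetic only and is not ciffy's implementation: ciffy applies these operations per scale (MOLECULE, CHAIN, RESIDUE) on arrays or tensors, and whether ciffy.rmsd superposes the structures first is not stated here. The function names below are illustrative.

```python
import math


def center(coords):
    """Subtract the centroid from a list of (x, y, z) points.

    Returns (centered points, centroid).
    """
    n = len(coords)
    centroid = tuple(sum(p[i] for p in coords) / n for i in range(3))
    centered = [tuple(p[i] - centroid[i] for i in range(3)) for p in coords]
    return centered, centroid


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length point lists (no superposition)."""
    if len(a) != len(b):
        raise ValueError("point lists must have the same length")
    total = sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3))
    return math.sqrt(total / len(a))


points = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in points]

centered, centroid = center(points)
print(centroid)               # centroid of the three points
print(rmsd(points, shifted))  # every atom moved by 1 along x
```

Centering both point sets before comparing them removes the pure translation, so the RMSD of the centered sets in this example drops to zero.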
Saving Structures
# Save to CIF format (supports all molecule types)
polymer.write("output.cif")
# Save only polymer atoms (excludes water, ions, ligands)
polymer.poly().write("polymer_only.cif")
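A quick way to see the effect of poly() is to write the polymer-only subset and reload it. The sketch below uses only the calls shown in this README (ciffy.load, poly(), write(), coordinates); the helper name polymer_atom_counts and the file names are assumptions.

```python
def polymer_atom_counts(path_in, path_out):
    """Write the polymer-only subset of `path_in` to `path_out`, then reload it.

    Returns (atoms in the full structure, atoms in the reloaded polymer-only file).
    """
    import ciffy  # imported lazily so the helper can be defined without ciffy installed

    full = ciffy.load(path_in, backend="numpy")
    full.poly().write(path_out)
    reloaded = ciffy.load(path_out, backend="numpy")
    # coordinates is an (N, 3) array, so its length is the atom count
    return len(full.coordinates), len(reloaded.coordinates)


# Example call (assumed file names):
#   n_full, n_poly = polymer_atom_counts("structure.cif", "polymer_only.cif")
```

If the structure contains water, ions, or ligands, the second count should be smaller than the first.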
Command Line Interface
# View structure summary
ciffy structure.cif
# Show sequences per chain
ciffy structure.cif --sequence
# Show entity descriptions per chain
ciffy structure.cif --desc
# Multiple files
ciffy file1.cif file2.cif
Example output:
PDB 9GCM (numpy)
──────────────────────
   Type     Res  Atoms
A  RNA      135   1413
B  PROTEIN  132   1032
C  PROTEIN  246   1261
D  PROTEIN  485    760
──────────────────────
            998   4466
Descriptions:
A: U11 snRNA
B: U11/U12 small nuclear ribonucleoprotein 25 kDa protein
C: U11/U12 small nuclear ribonucleoprotein 35 kDa protein
D: Programmed cell death protein 7
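For batch use, the CLI can be driven from Python. This is a sketch under stated assumptions: build_command and run_ciffy are hypothetical helpers, and combining several files with a flag (or both flags together) is assumed to work, since the examples above only show them separately.

```python
import shutil
import subprocess


def build_command(paths, sequence=False, desc=False):
    """Assemble a ciffy CLI invocation as an argument list."""
    cmd = ["ciffy", *paths]
    if sequence:
        cmd.append("--sequence")
    if desc:
        cmd.append("--desc")
    return cmd


def run_ciffy(paths, **flags):
    """Run the CLI and return its stdout, or None if ciffy is not on PATH."""
    if shutil.which("ciffy") is None:
        return None
    result = subprocess.run(
        build_command(paths, **flags), capture_output=True, text=True
    )
    return result.stdout


cmd = build_command(["file1.cif", "file2.cif"], desc=True)
print(cmd)
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.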
Module Structure
ciffy/
├── backend/ # NumPy/PyTorch abstraction layer
├── types/ # Scale, Molecule enums
├── biochemistry/ # Element, Residue, nucleotide definitions
├── operations/ # Reduction, alignment operations
├── io/ # File loading and writing
└── utils/ # Helper functions and base classes
Testing
pip install pytest
pytest tests/