Fast mmCIF parser for structural biology
Overview
ciffy is a fast CIF file parser for molecular structures, with a C backend and Python interface. It supports both NumPy and PyTorch backends for array operations.
Performance
In the benchmarks below, ciffy parses CIF files roughly 70x to 126x faster than BioPython and Biotite:
| Structure | Atoms | ciffy | BioPython | Biotite |
|---|---|---|---|---|
| 3SKW | 2,874 | 0.36 ms | 39 ms (106x) | 28 ms (78x) |
| 9GCM | 4,466 | 0.54 ms | 48 ms (88x) | 38 ms (70x) |
| 9MDS | 102,216 | 11 ms | 1340 ms (126x) | 946 ms (89x) |
Benchmarked on Apple M1 Max. Run python tests/profile.py to reproduce.
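For a quick comparison on your own machine, a minimal timing harness could look like the sketch below. It is not the project's `tests/profile.py`; the helper names `time_call` and `speedup` are hypothetical, and the workloads are stand-ins that you would replace with real parser calls.

```python
import statistics
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time of fn() in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Stand-in workloads (assumption). With ciffy installed you might time
# lambda: ciffy.load("structure.cif", backend="numpy") instead.
slow_ms = time_call(lambda: sum(i * i for i in range(100_000)))
fast_ms = time_call(lambda: sum(range(100_000)))
print(f"baseline {slow_ms:.2f} ms, candidate {fast_ms:.2f} ms, "
      f"{speedup(slow_ms, fast_ms):.1f}x")
```

Using the median of several runs reduces the effect of one-off delays such as a cold file cache.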
Installation
From PyPI
pip install ciffy
From Source
git clone https://github.com/hmblair/ciffy.git
cd ciffy
pip install -r requirements.txt
pip install -e .
Backends
ciffy supports two array backends:
- NumPy: Lightweight, no additional dependencies required
- PyTorch: For GPU support (CUDA/MPS) and integration with deep learning workflows
Specify the backend when loading structures:
import ciffy
# Load with NumPy backend (recommended for general use)
polymer = ciffy.load("structure.cif", backend="numpy")
# Load with PyTorch backend (for deep learning workflows)
polymer = ciffy.load("structure.cif", backend="torch")
Polymers can be converted between backends:
# Convert to PyTorch tensors
torch_polymer = polymer.torch()
# Convert to NumPy arrays
numpy_polymer = polymer.numpy()
For PyTorch, move tensors to GPU:
# Move to CUDA
polymer_gpu = polymer.torch().to("cuda")
# Move to Apple Silicon (MPS)
polymer_mps = polymer.torch().to("mps")
Note: The default backend is "numpy" as of v0.6.0. Specify the backend explicitly for clarity.
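If a script should run whether or not PyTorch is installed, one option is to pick the backend string at runtime. The sketch below uses only the standard library; `pick_backend` is a hypothetical helper, not part of ciffy, and the only thing it takes from ciffy is the two backend names shown above.

```python
import importlib.util


def pick_backend(prefer_torch: bool = True) -> str:
    """Return "torch" if PyTorch is importable and preferred, else "numpy"."""
    if prefer_torch and importlib.util.find_spec("torch") is not None:
        return "torch"
    return "numpy"


backend = pick_backend()
print(backend)
# With ciffy installed, the result can be passed straight through:
#   polymer = ciffy.load("structure.cif", backend=backend)
```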
Usage
import ciffy
# Load a structure from a CIF file
polymer = ciffy.load("structure.cif", backend="numpy")
# Basic information
print(polymer) # Summary of chains, residues, atoms
# Access coordinates and properties
coords = polymer.coordinates # (N, 3) array/tensor
atoms = polymer.atoms # (N,) array/tensor of atom types
sequence = polymer.str() # Sequence string
# Geometric operations
centered, means = polymer.center(ciffy.MOLECULE)
aligned, Q = polymer.align(ciffy.CHAIN)
distances = polymer.pairwise_distances(ciffy.RESIDUE)
# Selection
rna_chains = polymer.subset(ciffy.RNA)
backbone = polymer.backbone()
# Molecule type per chain (parsed from CIF _entity_poly block)
mol_types = polymer.molecule_type # Array of Molecule enum values
# Load with entity descriptions (off by default for performance)
polymer = ciffy.load("structure.cif", load_descriptions=True)
descriptions = polymer.descriptions # List of description strings per chain
# Iterate over chains
for chain in polymer.chains(ciffy.RNA):
    print(chain.id(), chain.str())
# Compute RMSD between two previously loaded structures (defaults to MOLECULE scale)
rmsd = ciffy.rmsd(polymer1, polymer2)
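To make the geometric operations above concrete, here is a pure-Python sketch of per-group centering and a plain RMSD. It illustrates the underlying arithmetic only and is not ciffy's implementation. The function names and the `sizes` argument (atoms per group, for example per chain) are assumptions. Whether `ciffy.rmsd` superimposes the structures first is not stated here, and this version does no alignment.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def center_by_group(
    coords: Sequence[Point], sizes: Sequence[int]
) -> Tuple[List[Point], List[Point]]:
    """Subtract each group's centroid from its atoms.

    sizes[i] is the number of consecutive atoms in group i (must be > 0).
    Returns (centered coordinates, per-group centroids).
    """
    if sum(sizes) != len(coords):
        raise ValueError("group sizes must add up to the number of atoms")
    centered: List[Point] = []
    means: List[Point] = []
    start = 0
    for n in sizes:
        block = coords[start:start + n]
        mean = (
            sum(p[0] for p in block) / n,
            sum(p[1] for p in block) / n,
            sum(p[2] for p in block) / n,
        )
        means.append(mean)
        centered.extend(
            (p[0] - mean[0], p[1] - mean[1], p[2] - mean[2]) for p in block
        )
        start += n
    return centered, means


def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Root-mean-square deviation between matched atoms, without alignment."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    total = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(total / len(a))


coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
centered, means = center_by_group(coords, sizes=[2, 1])
print(means)
print(rmsd(coords, centered))
```

The `(centered, means)` return order mirrors the `polymer.center(...)` call in the usage snippet.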
Saving Structures
# Save to CIF format (supports all molecule types)
polymer.write("output.cif")
# Save only polymer atoms (excludes water, ions, ligands)
polymer.poly().write("polymer_only.cif")
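A simple way to sanity-check a written file is to load it back and compare atom counts. The sketch below uses only the calls shown in this README (`ciffy.load`, `poly()`, `write`, `coordinates`); the helper name `polymer_only_roundtrip` is hypothetical, and it assumes `coordinates` supports `len()`, which holds for an (N, 3) NumPy array.

```python
from typing import Tuple


def polymer_only_roundtrip(path_in: str, path_out: str) -> Tuple[int, int]:
    """Write the polymer-only subset and return (atoms written, atoms reloaded)."""
    import ciffy  # imported lazily so this module loads even without ciffy

    polymer = ciffy.load(path_in, backend="numpy")
    kept = polymer.poly()  # drops water, ions, and ligands
    kept.write(path_out)
    reloaded = ciffy.load(path_out, backend="numpy")
    return len(kept.coordinates), len(reloaded.coordinates)


# Example call, assuming structure.cif exists in the working directory:
#   written, reloaded = polymer_only_roundtrip("structure.cif", "polymer_only.cif")
```

If the two counts differ, inspect the output file before relying on it downstream.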
Command Line Interface
# View structure summary
ciffy structure.cif
# Show sequences per chain
ciffy structure.cif --sequence
# Show entity descriptions per chain
ciffy structure.cif --desc
# Multiple files
ciffy file1.cif file2.cif
Example output:
PDB 9GCM (numpy)
──────────────────────
   Type     Res  Atoms
A  RNA      135   1413
B  PROTEIN  132   1032
C  PROTEIN  246   1261
D  PROTEIN  485    760
──────────────────────
            998   4466
Descriptions:
A: U11 snRNA
B: U11/U12 small nuclear ribonucleoprotein 25 kDa protein
C: U11/U12 small nuclear ribonucleoprotein 35 kDa protein
D: Programmed cell death protein 7
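The CLI can also be driven from a script. The following sketch builds the command line from the flags listed above and runs it only if a `ciffy` executable is on the PATH. The helper names are hypothetical, and combining `--sequence` or `--desc` with several input files is an assumption that the examples above do not confirm.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_command(
    paths: Sequence[str], sequence: bool = False, desc: bool = False
) -> List[str]:
    """Assemble a ciffy CLI invocation from file paths and optional flags."""
    cmd = ["ciffy", *paths]
    if sequence:
        cmd.append("--sequence")
    if desc:
        cmd.append("--desc")
    return cmd


def run_summary(paths: Sequence[str], **flags: bool) -> Optional[str]:
    """Run the CLI and return its stdout, or None if ciffy is not installed."""
    if shutil.which("ciffy") is None:
        return None
    result = subprocess.run(
        build_command(paths, **flags),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


print(build_command(["structure.cif"], desc=True))
```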
Module Structure
ciffy/
├── backend/ # NumPy/PyTorch abstraction layer
├── types/ # Scale, Molecule enums (Molecule is auto-generated)
├── biochemistry/ # Element, Residue, atom definitions (mostly auto-generated)
├── operations/ # Reduction, alignment operations
├── io/ # File loading and writing
├── src/codegen/ # Code generation from PDB Chemical Component Dictionary
└── utils/ # Helper functions and base classes
Development
Code Generation
Most biochemistry definitions are auto-generated from the PDB Chemical Component Dictionary (CCD). The generator runs automatically during pip install:
# Generated files (in .gitignore):
ciffy/biochemistry/_generated_atoms.py # Atom indices per residue
ciffy/biochemistry/_generated_elements.py # Element enum with atomic numbers
ciffy/biochemistry/_generated_residues.py # Residue enum and mappings
ciffy/types/molecule.py # Molecule type enum
ciffy/src/hash/*.c # C hash tables for fast lookup
To regenerate manually (requires gperf):
# CCD is auto-downloaded to ~/.cache/ciffy/ on first run
python ciffy/src/codegen/generate.py ~/.cache/ciffy/components.cif
To add new residues, edit RESIDUE_WHITELIST in ciffy/src/codegen/generate.py.
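Before regenerating manually, it can help to confirm that the prerequisites named above are in place. This sketch checks for `gperf`, the cached CCD file, and the generator script, then assembles the same command as shown above. The helper names are hypothetical, and the script path assumes you are in the repository root.

```python
import shutil
import sys
from pathlib import Path
from typing import List

# Paths taken from the instructions above.
CCD_PATH = Path.home() / ".cache" / "ciffy" / "components.cif"
GENERATOR = Path("ciffy") / "src" / "codegen" / "generate.py"


def missing_prerequisites(ccd: Path = CCD_PATH) -> List[str]:
    """List anything that would stop manual regeneration from running."""
    problems = []
    if shutil.which("gperf") is None:
        problems.append("gperf not found on PATH")
    if not ccd.exists():
        problems.append(f"CCD file not found at {ccd}")
    if not GENERATOR.exists():
        problems.append(f"generator script not found at {GENERATOR}")
    return problems


def regeneration_command(ccd: Path = CCD_PATH) -> List[str]:
    """Build the argv list for the generator, using the current interpreter."""
    return [sys.executable, str(GENERATOR), str(ccd)]


for problem in missing_prerequisites():
    print("missing:", problem)
print(regeneration_command())
```

The resulting list can be passed to `subprocess.run(..., check=True)` once `missing_prerequisites()` returns an empty list.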
Testing
pip install pytest
pytest tests/