mmCIF parser written in Nim with Python bindings

nim-mmcif

Fast mmCIF (Macromolecular Crystallographic Information File) parser written in Nim with Python bindings

The goal of this repository is to experiment with vibe coding while building something useful for the bioinformatics community, and to see how much of a cross-platform library can be driven to completion by transformers.

Features

  • 🚀 High-performance parsing of mmCIF files using Nim
  • 🌍 Cross-platform support (Linux, macOS, Windows)
  • 📦 Easy installation via pip

Installation

Prerequisites

From PyPI

pip install nim-mmcif

From Source

# Install Nim (platform-specific, see below)
# macOS: brew install nim
# Linux: curl https://nim-lang.org/choosenim/init.sh -sSf | sh
# Windows: scoop install nim

# Install the package
git clone https://github.com/lucidrains/nim-mmcif
cd nim-mmcif
pip install -e .

For detailed platform-specific instructions, see CROSS_PLATFORM.md.

Quick Start

Python Usage

import nim_mmcif

# Parse an mmCIF file
data = nim_mmcif.parse_mmcif("path/to/file.mmcif")
print(f"Found {len(data['atoms'])} atoms")

# Parse multiple files using glob patterns
# Returns dict[str, dict] mapping filepaths to parsed data
results = nim_mmcif.parse_mmcif("path/to/*.mmcif")
for filepath, data in results.items():
    print(f"{filepath}: {len(data['atoms'])} atoms")

# Parse with recursive glob patterns
results = nim_mmcif.parse_mmcif("path/**/*.mmcif")
for filepath, data in results.items():
    print(f"{filepath}: {len(data['atoms'])} atoms")

# Get atom count directly
count = nim_mmcif.get_atom_count("path/to/file.mmcif")
print(f"File contains {count} atoms")

# Get all atoms with their properties
atoms = nim_mmcif.get_atoms("path/to/file.mmcif")
for atom in atoms[:5]:  # Print first 5 atoms
    print(f"Atom {atom['id']}: {atom['label_atom_id']} at ({atom['x']}, {atom['y']}, {atom['z']})")

# Get just the 3D coordinates
positions = nim_mmcif.get_atom_positions("path/to/file.mmcif")
for i, (x, y, z) in enumerate(positions[:5]):
    print(f"Position {i}: ({x:.3f}, {y:.3f}, {z:.3f})")
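parse_mmcif returns a single dict for a plain path but a dict keyed by file path for a glob pattern, so code that accepts either kind of input may want a single shape. A minimal sketch, assuming a path counts as a glob when it contains * or ? (the wildcards listed in the API reference); is_glob and as_path_mapping are hypothetical helpers, not part of nim_mmcif:

```python
from typing import Any, Dict


def is_glob(path: str) -> bool:
    # Assumption: nim_mmcif treats a path as a glob pattern when it contains
    # one of the documented wildcards: *, ?, or ** (which contains *).
    return "*" in path or "?" in path


def as_path_mapping(path: str, result: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Normalize a parse_mmcif result to {filepath: parsed_data}.

    Hypothetical helper: a single-file result (a dict with an 'atoms' key)
    is wrapped under its own path; a glob result is returned unchanged.
    """
    if is_glob(path):
        return result
    return {path: result}


# Usage with nim_mmcif (not executed here):
# import nim_mmcif
# pattern = "path/to/*.mmcif"
# for filepath, data in as_path_mapping(pattern, nim_mmcif.parse_mmcif(pattern)).items():
#     print(filepath, len(data["atoms"]))
```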

Nim Usage

import nim_mmcif/mmcif

# Parse an mmCIF file
let data = mmcif_parse("path/to/file.mmcif")
echo "Found ", data.atoms.len, " atoms"

# Iterate through atoms
for atom in data.atoms[0..<min(5, data.atoms.len)]:
  echo "Atom ", atom.id, ": ", atom.label_atom_id, 
       " at (", atom.Cartn_x, ", ", atom.Cartn_y, ", ", atom.Cartn_z, ")"

# Access specific atom properties
if data.atoms.len > 0:
  let firstAtom = data.atoms[0]
  echo "Chain: ", firstAtom.label_asym_id
  echo "Residue: ", firstAtom.label_comp_id
  echo "B-factor: ", firstAtom.B_iso_or_equiv

Batch Processing

Process multiple mmCIF files efficiently in a single operation:

import nim_mmcif

# List of mmCIF files to process
files = [
    "path/to/structure1.mmcif",
    "path/to/structure2.mmcif",
    "path/to/structure3.mmcif"
]

# Parse all files in batch (returns list when no globs used)
results = nim_mmcif.parse_mmcif_batch(files)

# Process results
for i, data in enumerate(results):
    print(f"Structure {i+1}: {len(data['atoms'])} atoms")
    
    # Analyze each structure
    atoms = data['atoms']
    if atoms:
        # Get unique chain IDs
        chains = set(atom['label_asym_id'] for atom in atoms)
        print(f"  Chains: {', '.join(sorted(chains))}")
        
        # Count residues
        residues = set((atom['label_asym_id'], atom['label_seq_id']) 
                      for atom in atoms)
        print(f"  Residues: {len(residues)}")

# Batch processing with glob patterns (returns dict)
results = nim_mmcif.parse_mmcif_batch("path/to/*.mmcif")
for filepath, data in results.items():
    print(f"{filepath}: {len(data['atoms'])} atoms")

# Mix of glob patterns and regular paths (returns dict)
results = nim_mmcif.parse_mmcif_batch([
    "specific_file.mmcif",
    "structures/*.mmcif",
    "models/model_?.mmcif"
])
for filepath, data in results.items():
    print(f"{filepath}: {len(data['atoms'])} atoms")
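Building on the chain and residue counts above, the sketch below tallies ATOM and HETATM records per chain using the type and label_asym_id keys listed under Atom Properties. It assumes type holds exactly the strings "ATOM" or "HETATM"; record_counts_by_chain is a hypothetical helper:

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, Tuple


def record_counts_by_chain(atoms: Iterable[Dict[str, Any]]) -> Dict[str, Tuple[int, int]]:
    """Return {chain_id: (n_ATOM, n_HETATM)}, sorted by chain ID.

    Assumes each atom dict has 'type' ("ATOM" or "HETATM") and
    'label_asym_id' keys, as described under Atom Properties.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for atom in atoms:
        counts[atom["label_asym_id"]][atom["type"]] += 1
    # Counter returns 0 for missing keys, so chains without HETATM get 0.
    return {chain: (c["ATOM"], c["HETATM"]) for chain, c in sorted(counts.items())}


# Usage with a parse_mmcif_batch result (not executed here):
# for filepath, data in results.items():
#     print(filepath, record_counts_by_chain(data["atoms"]))
```

In wwPDB-deposited files, ligands and waters usually get their own label_asym_id, so some chains may contain only HETATM records.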

Batch processing is particularly useful when:

  • Analyzing multiple protein structures for comparative studies
  • Processing entire datasets of crystallographic structures
  • Building machine learning datasets from PDB files
  • Performing high-throughput structural analysis

Batch parsing can outperform calling parse_mmcif once per file when processing many structures, because it reduces the overhead of repeated function calls.

API Reference

Functions

parse_mmcif(filepath: str) -> dict | dict[str, dict]

Parse an mmCIF file or files matching a glob pattern.

  • Single file: Returns a dictionary with parsed data containing 'atoms' key
  • Glob pattern: Returns a dictionary mapping file paths to parsed data
  • Supports wildcards: * (any characters), ? (single character), ** (recursive)
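To preview which files a pattern will select before parsing, Python's standard glob module can expand the same wildcards. This sketch assumes nim_mmcif matches paths like glob.glob with recursive=True, where ** spans zero or more directories; the Nim-side matching may differ in edge cases such as hidden files or result ordering. preview_matches is a hypothetical name:

```python
import glob
from typing import List


def preview_matches(pattern: str) -> List[str]:
    # recursive=True lets "**" match zero or more directory levels,
    # mirroring the "** (recursive)" wildcard described above (assumed).
    # Sorting gives a stable order; glob itself returns arbitrary order.
    return sorted(glob.glob(pattern, recursive=True))


# Usage (not executed here):
# for path in preview_matches("path/**/*.mmcif"):
#     print(path)
```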

parse_mmcif_batch(filepaths: list[str] | str) -> list[dict] | dict[str, dict]

Parse multiple mmCIF files in a single operation.

  • No glob patterns: Returns a list of dictionaries with parsed data
  • With glob patterns: Returns a dictionary mapping file paths to parsed data
  • Accepts a single path/pattern or a list of paths/patterns
  • More efficient than parsing files individually when processing multiple structures
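Because parse_mmcif_batch returns a list for plain paths but a dict for glob patterns, downstream code can normalize both to one mapping. A minimal sketch, assuming the list result is in the same order as the input paths (the batch example implies this but does not state it); batch_as_mapping is a hypothetical helper:

```python
from typing import Any, Dict, List, Sequence, Union

ParseResult = Dict[str, Any]


def batch_as_mapping(
    inputs: Union[str, Sequence[str]],
    results: Union[List[ParseResult], Dict[str, ParseResult]],
) -> Dict[str, ParseResult]:
    """Normalize a parse_mmcif_batch result to {filepath: parsed_data}.

    Hypothetical helper. A dict result (glob patterns present) is returned
    as-is; a list result is assumed to follow the order of `inputs`.
    """
    if isinstance(results, dict):
        return results
    paths = [inputs] if isinstance(inputs, str) else list(inputs)
    if len(paths) != len(results):
        raise ValueError("expected one result per input path")
    return dict(zip(paths, results))


# Usage (not executed here):
# mapping = batch_as_mapping(files, nim_mmcif.parse_mmcif_batch(files))
```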

get_atom_count(filepath: str) -> int

Get the number of atoms in an mmCIF file.

get_atoms(filepath: str) -> list[dict]

Get all atoms from an mmCIF file as a list of dictionaries.

get_atom_positions(filepath: str) -> list[tuple[float, float, float]]

Get 3D coordinates of all atoms as a list of (x, y, z) tuples.
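The (x, y, z) tuples from get_atom_positions are plain floats, so simple geometry needs no extra dependencies. A sketch computing the unweighted centroid and axis-aligned bounding box; centroid_and_bounds is a hypothetical helper, and because each atom counts equally, the result is not a mass-weighted centre of mass:

```python
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]


def centroid_and_bounds(positions: Iterable[Vec3]) -> Tuple[Vec3, Vec3, Vec3]:
    """Return (centroid, min_corner, max_corner) for (x, y, z) tuples."""
    pts = list(positions)
    if not pts:
        raise ValueError("no positions given")
    n = len(pts)
    # Transpose the list of points into per-axis sequences.
    xs, ys, zs = zip(*pts)
    centroid = (sum(xs) / n, sum(ys) / n, sum(zs) / n)
    lo = (min(xs), min(ys), min(zs))
    hi = (max(xs), max(ys), max(zs))
    return centroid, lo, hi


# Usage (not executed here):
# positions = nim_mmcif.get_atom_positions("path/to/file.mmcif")
# centroid, lo, hi = centroid_and_bounds(positions)
```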

Atom Properties

Each atom dictionary contains:

  • type: Record type (ATOM or HETATM)
  • id: Atom serial number
  • label_atom_id: Atom name
  • label_comp_id: Residue name
  • label_asym_id: Chain identifier
  • label_seq_id: Residue sequence number
  • x, y, z: 3D coordinates (aliases for Cartn_x, Cartn_y, Cartn_z)
  • occupancy: Occupancy factor
  • B_iso_or_equiv: B-factor
  • And more...

Platform Support

Platform   Architecture   Python
Linux      x64, ARM64     3.8-3.12
macOS      x64, ARM64     3.8-3.12
Windows    x64            3.8-3.12

Building from Source

Automatic Build

python build_nim.py

Manual Build

# Build using nimble tasks
nimble build         # Build debug version
nimble buildRelease  # Build optimized release version

Development

Running Tests

pip install pytest
pytest tests/ -v

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests
  5. Submit a pull request

Documentation

Performance

The Nim implementation provides significant performance improvements over pure Python parsers, especially for large mmCIF files commonly used in structural biology.
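To measure the speed difference on your own files, a small standard-library timing helper is enough. best_time is a hypothetical name, and the Biopython MMCIFParser baseline in the comments is just one possible point of comparison:

```python
import timeit
from typing import Any, Callable


def best_time(fn: Callable[..., Any], *args: Any, repeat: int = 3) -> float:
    """Return the fastest of `repeat` single-call timings, in seconds."""
    # number=1: each trial calls fn exactly once. Later trials benefit from a
    # warm OS file cache, so the minimum reflects parsing cost more than disk I/O.
    return min(timeit.repeat(lambda: fn(*args), number=1, repeat=repeat))


# Usage (not executed here):
# import nim_mmcif
# from Bio.PDB import MMCIFParser
# path = "path/to/file.mmcif"
# print("nim_mmcif:", best_time(nim_mmcif.parse_mmcif, path))
# print("Biopython:", best_time(MMCIFParser(QUIET=True).get_structure, "s", path))
```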

License

MIT License - see LICENSE file for details.

Acknowledgments

  • Built with Nim for high performance
  • Python integration via nimporter and nimpy
  • mmCIF format specification from wwPDB

Download files

Download the file for your platform.

Source Distribution

nim_mmcif-0.0.15.tar.gz (19.6 kB)

Uploaded Source

Built Distributions

  • nim_mmcif-0.0.15-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl (150.1 kB), CPython 3.12, manylinux glibc 2.17+ and glibc 2.28+, x86-64
  • nim_mmcif-0.0.15-cp312-cp312-macosx_11_0_arm64.whl (86.4 kB), CPython 3.12, macOS 11.0+, ARM64
  • nim_mmcif-0.0.15-cp312-cp312-macosx_10_9_x86_64.whl (91.8 kB), CPython 3.12, macOS 10.9+, x86-64
  • nim_mmcif-0.0.15-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl (150.0 kB), CPython 3.11, manylinux glibc 2.17+ and glibc 2.28+, x86-64
  • nim_mmcif-0.0.15-cp311-cp311-macosx_11_0_arm64.whl (86.5 kB), CPython 3.11, macOS 11.0+, ARM64
  • nim_mmcif-0.0.15-cp311-cp311-macosx_10_9_x86_64.whl (91.8 kB), CPython 3.11, macOS 10.9+, x86-64
  • nim_mmcif-0.0.15-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl (150.0 kB), CPython 3.10, manylinux glibc 2.17+ and glibc 2.28+, x86-64
  • nim_mmcif-0.0.15-cp310-cp310-macosx_11_0_arm64.whl (86.5 kB), CPython 3.10, macOS 11.0+, ARM64
  • nim_mmcif-0.0.15-cp310-cp310-macosx_10_9_x86_64.whl (91.8 kB), CPython 3.10, macOS 10.9+, x86-64
  • nim_mmcif-0.0.15-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl (149.8 kB), CPython 3.9, manylinux glibc 2.17+ and glibc 2.28+, x86-64
  • nim_mmcif-0.0.15-cp39-cp39-macosx_11_0_arm64.whl (86.4 kB), CPython 3.9, macOS 11.0+, ARM64
  • nim_mmcif-0.0.15-cp39-cp39-macosx_10_9_x86_64.whl (91.8 kB), CPython 3.9, macOS 10.9+, x86-64
  • nim_mmcif-0.0.15-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl (18.2 kB), CPython 3.8, manylinux glibc 2.5+ and glibc 2.28+, x86-64
  • nim_mmcif-0.0.15-cp38-cp38-macosx_11_0_arm64.whl (13.0 kB), CPython 3.8, macOS 11.0+, ARM64
  • nim_mmcif-0.0.15-cp38-cp38-macosx_10_9_x86_64.whl (12.6 kB), CPython 3.8, macOS 10.9+, x86-64

File details

Details for the file nim_mmcif-0.0.15.tar.gz.

File metadata

  • Download URL: nim_mmcif-0.0.15.tar.gz
  • Upload date:
  • Size: 19.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for nim_mmcif-0.0.15.tar.gz
Algorithm Hash digest
SHA256 e8758e13a17122f4d89a52f9fc55adc4622e72f111c0a17295504b04be021919
MD5 13cc9ff16f622b124f3955d90c3c8f07
BLAKE2b-256 107305419458238fb2ed329c6289132a1addaba4a5b8bc19a717784aa6dd98da

File details

Details for the file nim_mmcif-0.0.15-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for nim_mmcif-0.0.15-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 55ad64ac21f2c842f05b9b1efa2c8c633d59f9aa318a6c935ba115039c53b400
MD5 c305012cda900849f385f1bb43f05034
BLAKE2b-256 aa200ddfbae74218b365a7b28d521b1739c226ce883dd2fa3521fcff27d5e066

File details

Details for the file nim_mmcif-0.0.15-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for nim_mmcif-0.0.15-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 afccaa909224d592cf925dd934b6e95d841583196f39535923504b91204245d7
MD5 7435762166177b1139613c1d7092c189
BLAKE2b-256 aa0f3140ab5b552a4e82eed014396127cebe896362da01cab449e4e170121e57

File details

Details for the file nim_mmcif-0.0.15-cp312-cp312-macosx_10_9_x86_64.whl.

File hashes

Hashes for nim_mmcif-0.0.15-cp312-cp312-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 9e7420b72da0f9b5151d83cef967e8f73d8bd8f9a82fe3b59e8181e465b0bd60
MD5 0020c9f0d2b1391ee944de22440af605
BLAKE2b-256 8e28cfa927efddd0e20699907ef2a92ce5e43bed7202d5f1c79fbbed835b0d48

File details

Details for the file nim_mmcif-0.0.15-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for nim_mmcif-0.0.15-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 e6b56d603f75b17543c58b3561b7ceac12c8e62b8bfbb948967e82893b5ceebc
MD5 fe2d46452b3365049ce0ebb044064a4f
BLAKE2b-256 e998a4f86eca8fb8fbd72190e7effd83d44944acdc09563fa7eecaad47ed1ec3

File details

Details for the file nim_mmcif-0.0.15-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for nim_mmcif-0.0.15-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 53c55b998d2202c66817667b72a4ccb97af97d23bd5410dd51ad56d048931d39
MD5 edf323a2b8b38dd1ba18b089e984b412
BLAKE2b-256 c27a2100a9fe07d7fc579582bc894ed8de84127632d32a16fd12915e0c2baccf

File details

Details for the file nim_mmcif-0.0.15-cp311-cp311-macosx_10_9_x86_64.whl.

File hashes

Hashes for nim_mmcif-0.0.15-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 aa64015082d42ae405d372d52d25f1c62ec5e083cace1de78e9ac1f5b590972d
MD5 41c35dee130d4bb5c2566c06f32eed33
BLAKE2b-256 6bc416feb80f9749507d1797499477c721ac73aa754f14d9d002be6282577af8

File details

Details for the file nim_mmcif-0.0.15-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for nim_mmcif-0.0.15-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 9347a0badcf404ad62ecc22d9d686dfb80a3120b8cebf7726e04d3545e09492a
MD5 643b713f7ecd9671ff48413237c1aa38
BLAKE2b-256 8644c6e832a89ccba4ae790e68a3ac110d8514b6b1c8f5e10ad393ca0a610590

File details

Details for the file nim_mmcif-0.0.15-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for nim_mmcif-0.0.15-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 875df9b72064d2bd89f2c3dc262db2368f9d1d50dd90b921d44e957194847e0d
MD5 0767a3b2140dd6988a1e27453d11e0ff
BLAKE2b-256 c2bb7e832c4ffa39559302cb089a00dee58347b9901fe8bc2f55483b80591d51

File details

Details for the file nim_mmcif-0.0.15-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for nim_mmcif-0.0.15-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 67861f29b9eb50296cdf620324e8ed0e8555813410926b04fa1e38a7349b43fb
MD5 7e0d55161de21fe8a631503d474a40b1
BLAKE2b-256 e2a876c77ce94dcbb0a5356ab41a9bd0839769aafad7e8d541ff09c8431a5f23

File details

Details for the file nim_mmcif-0.0.15-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for nim_mmcif-0.0.15-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 7e5273d2bace289414facc768f79b2b86f8876bbad3a60969e089bda7d6b65af
MD5 0c4077ba9347abba9f63070828f339d1
BLAKE2b-256 8c9f37723fe67fd1a74a6cd0d040f5a5673801d6dd47bfe189d13109b467d790

File details

Details for the file nim_mmcif-0.0.15-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for nim_mmcif-0.0.15-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a3179fb783becf355e901cb6f056d95f3a6d6c44316214333c9a038f10939106
MD5 9d3735c8c95da2eefceded9edb9b9995
BLAKE2b-256 ab19cf4ee436ea3a3236796dc3e1d063994acdb6f32fc2ff4be560c855fb9805

File details

Details for the file nim_mmcif-0.0.15-cp39-cp39-macosx_10_9_x86_64.whl.

File hashes

Hashes for nim_mmcif-0.0.15-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 ada729ff5828ba49a080bafa26f51965842f96b933760c697b5027bde5e96a21
MD5 0c3adda78545f5283f4e79a48286c474
BLAKE2b-256 a31a81f862ef06d5ec8240ccff2cca03ff1436c0686d95dc9cbbaed7217ea036

File details

Details for the file nim_mmcif-0.0.15-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for nim_mmcif-0.0.15-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 d394837b44a70bc548e2795bc3f5e525b4ca6da155512d90440b02b415afe5b2
MD5 471ba32ad160bba0d752244e80077113
BLAKE2b-256 dabd2ac02206ffc26654d083d671bac87d0010d87f79f31aedf70179022b3787

File details

Details for the file nim_mmcif-0.0.15-cp38-cp38-macosx_11_0_arm64.whl.

File hashes

Hashes for nim_mmcif-0.0.15-cp38-cp38-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 989d0ad47e130e5e4b126d9146943641add4361ede1ff15fb338a17fa3ae479b
MD5 10e698148148c4543d1abd147f928a60
BLAKE2b-256 b14c417dc267dba7f549a613f4ca3db0883aaf09b5eafc4bda36619841c0110f

File details

Details for the file nim_mmcif-0.0.15-cp38-cp38-macosx_10_9_x86_64.whl.

File hashes

Hashes for nim_mmcif-0.0.15-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 5f16eccb974c56e1abf360a8841a854ddb6afd888851dd6d6a83a16f4be92c81
MD5 cbde20722a22b012b905ef7a0981ea44
BLAKE2b-256 61c8e57e3727c7d19ba33ea462553782a3441a5815bd072de8e5825f4851a73f
