mmCIF parser written in Nim with Python bindings

nim-mmcif

Fast mmCIF (Macromolecular Crystallographic Information File) parser written in Nim with Python bindings via nimporter.

The goal of this repository is to experiment with vibe coding while building something useful for the bioinformatics community, and to see how much of a cross-platform library can be driven to completion by transformers.

Features

  • 🚀 High-performance parsing of mmCIF files using Nim
  • 🐍 Seamless Python integration via nimporter
  • 🌍 Cross-platform support (Linux, macOS, Windows)
  • 🏗️ Automatic compilation on import
  • 📦 Easy installation via pip

Installation

Prerequisites

From PyPI

pip install nim-mmcif

From Source

# Install Nim (platform-specific, see below)
# macOS: brew install nim
# Linux: curl https://nim-lang.org/choosenim/init.sh -sSf | sh
# Windows: scoop install nim

# Install the package
git clone https://github.com/lucidrains/nim-mmcif
cd nim-mmcif
pip install -e .

For detailed platform-specific instructions, see CROSS_PLATFORM.md.

Quick Start

import nim_mmcif

# Parse an mmCIF file
data = nim_mmcif.parse_mmcif("path/to/file.mmcif")
print(f"Found {len(data['atoms'])} atoms")

# Get atom count directly
count = nim_mmcif.get_atom_count("path/to/file.mmcif")
print(f"File contains {count} atoms")

# Get all atoms with their properties
atoms = nim_mmcif.get_atoms("path/to/file.mmcif")
for atom in atoms[:5]:  # Print first 5 atoms
    print(f"Atom {atom['id']}: {atom['label_atom_id']} at ({atom['x']}, {atom['y']}, {atom['z']})")

# Get just the 3D coordinates
positions = nim_mmcif.get_atom_positions("path/to/file.mmcif")
for i, (x, y, z) in enumerate(positions[:5]):
    print(f"Position {i}: ({x:.3f}, {y:.3f}, {z:.3f})")
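
Once the coordinates are available as a list of (x, y, z) tuples, simple geometric summaries need only the standard library. The following is a minimal sketch, not part of nim_mmcif: the helper name centroid is hypothetical, and the sample list is hand-written data standing in for the output of get_atom_positions.

```python
from typing import Iterable, Tuple

Position = Tuple[float, float, float]


def centroid(positions: Iterable[Position]) -> Position:
    """Return the arithmetic mean of a collection of (x, y, z) tuples."""
    pts = list(positions)
    if not pts:
        raise ValueError("no positions given")
    n = len(pts)
    return (
        sum(p[0] for p in pts) / n,
        sum(p[1] for p in pts) / n,
        sum(p[2] for p in pts) / n,
    )


# Hand-written sample data; in practice this would come from
# nim_mmcif.get_atom_positions("path/to/file.mmcif").
sample_positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 6.0)]
cx, cy, cz = centroid(sample_positions)
print(f"Centroid: ({cx:.3f}, {cy:.3f}, {cz:.3f})")
```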

Batch Processing

Process multiple mmCIF files efficiently in a single operation:

import nim_mmcif

# List of mmCIF files to process
files = [
    "path/to/structure1.mmcif",
    "path/to/structure2.mmcif",
    "path/to/structure3.mmcif"
]

# Parse all files in batch
results = nim_mmcif.parse_mmcif_batch(files)

# Process results
for i, data in enumerate(results):
    print(f"Structure {i+1}: {len(data['atoms'])} atoms")
    
    # Analyze each structure
    atoms = data['atoms']
    if atoms:
        # Get unique chain IDs
        chains = set(atom['label_asym_id'] for atom in atoms)
        print(f"  Chains: {', '.join(sorted(chains))}")
        
        # Count residues
        residues = set((atom['label_asym_id'], atom['label_seq_id']) 
                      for atom in atoms)
        print(f"  Residues: {len(residues)}")

Batch processing is particularly useful when:

  • Analyzing multiple protein structures for comparative studies
  • Processing entire datasets of crystallographic structures
  • Building machine learning datasets from PDB entries
  • Performing high-throughput structural analysis

The batch function is intended to be faster than parsing files one at a time when processing multiple files, as it replaces one function call per file with a single call.
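
For whole-directory datasets, the file list passed to parse_mmcif_batch can be built with pathlib rather than typed by hand. This is a sketch under stated assumptions: collect_mmcif_paths is a hypothetical helper, not part of nim_mmcif, and the accepted suffixes (.mmcif and .cif) are an assumption about how your files are named.

```python
from pathlib import Path
from typing import List, Sequence, Union


def collect_mmcif_paths(
    directory: Union[str, Path],
    suffixes: Sequence[str] = (".mmcif", ".cif"),  # assumed file extensions
) -> List[str]:
    """Return the sorted paths of mmCIF-like files directly inside a directory."""
    root = Path(directory)
    return sorted(
        str(p)
        for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )


# Typical use (requires nim_mmcif and a real directory, so left commented):
# files = collect_mmcif_paths("path/to/structures")
# results = nim_mmcif.parse_mmcif_batch(files)
```

Sorting the paths keeps the input order deterministic between runs, which helps when results are matched back to file names by position.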

API Reference

Functions

parse_mmcif(filepath: str) -> dict

Parse an mmCIF file and return a dictionary with parsed data.

parse_mmcif_batch(filepaths: list[str]) -> list[dict]

Parse multiple mmCIF files in a single operation. More efficient than parsing files individually when processing multiple structures.

get_atom_count(filepath: str) -> int

Get the number of atoms in an mmCIF file.

get_atoms(filepath: str) -> list[dict]

Get all atoms from an mmCIF file as a list of dictionaries.

get_atom_positions(filepath: str) -> list[tuple[float, float, float]]

Get 3D coordinates of all atoms as a list of (x, y, z) tuples.

Atom Properties

Each atom dictionary contains:

  • type: Record type (ATOM or HETATM)
  • id: Atom serial number
  • label_atom_id: Atom name
  • label_comp_id: Residue name
  • label_asym_id: Chain identifier
  • label_seq_id: Residue sequence number
  • x, y, z: 3D coordinates (aliases for Cartn_x, Cartn_y, Cartn_z)
  • occupancy: Occupancy factor
  • B_iso_or_equiv: B-factor
  • And more...
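
The keys listed above are enough for common selections such as picking alpha carbons or counting residues per chain. The sketch below uses hand-written atom dictionaries in place of real parser output. The helper names are hypothetical, and the value types (integers for id and label_seq_id) are an assumption, since the types returned by the library are not stated here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def alpha_carbons(atoms: Iterable[Mapping]) -> List[Mapping]:
    """Keep only atoms whose atom name is CA."""
    return [atom for atom in atoms if atom["label_atom_id"] == "CA"]


def residues_per_chain(atoms: Iterable[Mapping]) -> Dict[str, int]:
    """Count distinct (sequence number, residue name) pairs in each chain."""
    seen = defaultdict(set)
    for atom in atoms:
        seen[atom["label_asym_id"]].add(
            (atom["label_seq_id"], atom["label_comp_id"])
        )
    return {chain: len(residues) for chain, residues in sorted(seen.items())}


# Hand-written sample data using the keys documented above (types assumed).
sample_atoms = [
    {"type": "ATOM", "id": 1, "label_atom_id": "N", "label_comp_id": "MET",
     "label_asym_id": "A", "label_seq_id": 1, "x": 0.0, "y": 0.0, "z": 0.0},
    {"type": "ATOM", "id": 2, "label_atom_id": "CA", "label_comp_id": "MET",
     "label_asym_id": "A", "label_seq_id": 1, "x": 1.5, "y": 0.0, "z": 0.0},
    {"type": "ATOM", "id": 3, "label_atom_id": "CA", "label_comp_id": "GLY",
     "label_asym_id": "A", "label_seq_id": 2, "x": 5.3, "y": 0.0, "z": 0.0},
    {"type": "ATOM", "id": 4, "label_atom_id": "CA", "label_comp_id": "ALA",
     "label_asym_id": "B", "label_seq_id": 1, "x": 0.0, "y": 9.0, "z": 0.0},
]

print(len(alpha_carbons(sample_atoms)), "alpha carbons")
print(residues_per_chain(sample_atoms))
```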

Platform Support

Platform   Architecture   Python     Status
Linux      x64, ARM64     3.8-3.12
macOS      x64, ARM64     3.8-3.12
Windows    x64            3.8-3.12

Building from Source

Automatic Build

python build_nim.py

Manual Build

cd nim_mmcif
nim c --app:lib --out:nim_mmcif.so nim_mmcif.nim  # Linux/macOS
nim c --app:lib --out:nim_mmcif.pyd nim_mmcif.nim  # Windows

Development

Running Tests

pip install pytest
pytest tests/ -v

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests
  5. Submit a pull request

Documentation

Performance

The Nim implementation provides significant performance improvements over pure Python parsers, especially for large mmCIF files commonly used in structural biology.
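
No benchmark figures are given here, so the simplest way to check the speedup on your own data is to time each parser over the same files. The sketch below is a generic timing helper: time_parser is a hypothetical name, and it accepts any callable that takes a file path, so it can wrap nim_mmcif.parse_mmcif or a pure Python parser of your choice.

```python
import time
from typing import Callable, Sequence


def time_parser(
    parse: Callable[[str], object],
    paths: Sequence[str],
    repeats: int = 3,
) -> float:
    """Return the best wall-clock time in seconds to parse every path once."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for path in paths:
            parse(path)
        best = min(best, time.perf_counter() - start)
    return best


# Typical use (requires nim_mmcif and real files, so left commented):
# import nim_mmcif
# seconds = time_parser(nim_mmcif.parse_mmcif, ["path/to/file.mmcif"])
# print(f"Best of 3 runs: {seconds:.3f} s")
```

Taking the best of several runs reduces the influence of disk caching and other background noise on the first pass.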

License

MIT License - see LICENSE file for details.

Acknowledgments

  • Built with Nim for high performance
  • Python integration via nimporter and nimpy
  • mmCIF format specification from wwPDB
