mmCIF parser written in Nim with Python bindings

nim-mmcif

Fast mmCIF (Macromolecular Crystallographic Information File) parser written in Nim with Python bindings

The goal of this repository is to experiment with vibe coding while building something useful for the bioinformatics community, and to see how much of a cross-platform library can be driven to completion by transformers.

Features

  • 🚀 High-performance parsing of mmCIF files using Nim
  • 🌍 Cross-platform support (Linux, macOS, Windows)
  • 📦 Easy installation via pip

Installation

Prerequisites

From PyPI (when available)

pip install nim-mmcif

From Source

# Install Nim (platform-specific, see below)
# macOS: brew install nim
# Linux: curl https://nim-lang.org/choosenim/init.sh -sSf | sh
# Windows: scoop install nim

# Install the package
git clone https://github.com/lucidrains/nim-mmcif
cd nim-mmcif
pip install -e .

For detailed platform-specific instructions, see CROSS_PLATFORM.md.

Quick Start

Python Usage

import nim_mmcif

# Parse an mmCIF file
data = nim_mmcif.parse_mmcif("path/to/file.mmcif")
print(f"Found {len(data['atoms'])} atoms")

# Parse multiple files using glob patterns
# Returns dict[str, dict] mapping filepaths to parsed data
results = nim_mmcif.parse_mmcif("path/to/*.mmcif")
for filepath, data in results.items():
    print(f"{filepath}: {len(data['atoms'])} atoms")

# Parse with recursive glob patterns
results = nim_mmcif.parse_mmcif("path/**/*.mmcif")
for filepath, data in results.items():
    print(f"{filepath}: {len(data['atoms'])} atoms")

# Get atom count directly
count = nim_mmcif.get_atom_count("path/to/file.mmcif")
print(f"File contains {count} atoms")

# Get all atoms with their properties
atoms = nim_mmcif.get_atoms("path/to/file.mmcif")
for atom in atoms[:5]:  # Print first 5 atoms
    print(f"Atom {atom['id']}: {atom['label_atom_id']} at ({atom['x']}, {atom['y']}, {atom['z']})")

# Get just the 3D coordinates
positions = nim_mmcif.get_atom_positions("path/to/file.mmcif")
for i, (x, y, z) in enumerate(positions[:5]):
    print(f"Position {i}: ({x:.3f}, {y:.3f}, {z:.3f})")
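Because `get_atoms` returns plain dictionaries, ordinary Python filtering works on the result. The following is a minimal sketch, assuming the field names listed under Atom Properties and using hand-written sample atoms in place of a parsed file. `select_atoms` is a hypothetical helper for illustration, not part of nim_mmcif.

```python
def select_atoms(atoms, atom_name=None, chain=None):
    """Return the atoms that match the given atom name and/or chain ID.

    Hypothetical helper: it only relies on the 'label_atom_id' and
    'label_asym_id' keys described in the Atom Properties section.
    """
    selected = []
    for atom in atoms:
        if atom_name is not None and atom["label_atom_id"] != atom_name:
            continue
        if chain is not None and atom["label_asym_id"] != chain:
            continue
        selected.append(atom)
    return selected


# Hand-written sample shaped like the output of nim_mmcif.get_atoms(...).
sample_atoms = [
    {"id": 1, "label_atom_id": "N", "label_asym_id": "A", "x": 1.0, "y": 2.0, "z": 3.0},
    {"id": 2, "label_atom_id": "CA", "label_asym_id": "A", "x": 2.0, "y": 2.5, "z": 3.5},
    {"id": 3, "label_atom_id": "CA", "label_asym_id": "B", "x": 9.0, "y": 8.0, "z": 7.0},
]

# Keep only alpha-carbon atoms, then only those in chain A.
ca_atoms = select_atoms(sample_atoms, atom_name="CA")
ca_chain_a = select_atoms(sample_atoms, atom_name="CA", chain="A")
print([atom["id"] for atom in ca_atoms])
print([atom["id"] for atom in ca_chain_a])
```

With a real file, `sample_atoms` would be replaced by the list returned from `nim_mmcif.get_atoms("path/to/file.mmcif")`.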

Nim Usage

import nim_mmcif/mmcif

# Parse an mmCIF file
let data = mmcif_parse("path/to/file.mmcif")
echo "Found ", data.atoms.len, " atoms"

# Iterate through atoms
for atom in data.atoms[0..<min(5, data.atoms.len)]:
  echo "Atom ", atom.id, ": ", atom.label_atom_id, 
       " at (", atom.Cartn_x, ", ", atom.Cartn_y, ", ", atom.Cartn_z, ")"

# Access specific atom properties
if data.atoms.len > 0:
  let firstAtom = data.atoms[0]
  echo "Chain: ", firstAtom.label_asym_id
  echo "Residue: ", firstAtom.label_comp_id
  echo "B-factor: ", firstAtom.B_iso_or_equiv

Batch Processing

Process multiple mmCIF files efficiently in a single operation:

import nim_mmcif

# List of mmCIF files to process
files = [
    "path/to/structure1.mmcif",
    "path/to/structure2.mmcif",
    "path/to/structure3.mmcif"
]

# Parse all files in batch (returns list when no globs used)
results = nim_mmcif.parse_mmcif_batch(files)

# Process results
for i, data in enumerate(results):
    print(f"Structure {i+1}: {len(data['atoms'])} atoms")
    
    # Analyze each structure
    atoms = data['atoms']
    if atoms:
        # Get unique chain IDs
        chains = set(atom['label_asym_id'] for atom in atoms)
        print(f"  Chains: {', '.join(sorted(chains))}")
        
        # Count residues
        residues = set((atom['label_asym_id'], atom['label_seq_id']) 
                      for atom in atoms)
        print(f"  Residues: {len(residues)}")

# Batch processing with glob patterns (returns dict)
results = nim_mmcif.parse_mmcif_batch("path/to/*.mmcif")
for filepath, data in results.items():
    print(f"{filepath}: {len(data['atoms'])} atoms")

# Mix of glob patterns and regular paths (returns dict)
results = nim_mmcif.parse_mmcif_batch([
    "specific_file.mmcif",
    "structures/*.mmcif",
    "models/model_?.mmcif"
])
for filepath, data in results.items():
    print(f"{filepath}: {len(data['atoms'])} atoms")

Batch processing is particularly useful when:

  • Analyzing multiple protein structures for comparative studies
  • Processing entire datasets of crystallographic structures
  • Building machine learning datasets from PDB structures in mmCIF format
  • Performing high-throughput structural analysis

The batch function performs better than parsing files one at a time when processing multiple files, because a single call avoids the overhead of repeated function calls.

API Reference

Functions

parse_mmcif(filepath: str) -> dict | dict[str, dict]

Parse an mmCIF file or files matching a glob pattern.

  • Single file: Returns a dictionary with parsed data containing 'atoms' key
  • Glob pattern: Returns a dictionary mapping file paths to parsed data
  • Supports wildcards: * (any characters), ? (single character), ** (recursive)
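Since the return type depends on whether the path contains wildcards, it can help to preview a pattern before parsing. The sketch below uses Python's standard `glob` module on a temporary directory. It assumes that nim_mmcif's wildcard matching behaves like `glob.glob(..., recursive=True)`, which the source does not confirm, and `looks_like_glob` and `preview_matches` are hypothetical helpers.

```python
import glob
import os
import tempfile


def looks_like_glob(path):
    """Guess whether a path uses the documented wildcards (* or ?)."""
    return any(ch in path for ch in "*?")


def preview_matches(pattern):
    """List the files a pattern would match under Python's glob rules."""
    return sorted(glob.glob(pattern, recursive=True))


# Build a small throwaway tree: root/a.mmcif and root/sub/b.mmcif.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for relative in ("a.mmcif", os.path.join("sub", "b.mmcif")):
        with open(os.path.join(root, relative), "w"):
            pass  # empty placeholder files are enough for matching

    flat = preview_matches(os.path.join(root, "*.mmcif"))
    deep = preview_matches(os.path.join(root, "**", "*.mmcif"))
    flat_names = [os.path.basename(p) for p in flat]
    deep_names = [os.path.basename(p) for p in deep]

print(flat_names)   # files directly inside root
print(deep_names)   # files in root and in its subdirectories
print(looks_like_glob("path/to/file.mmcif"), looks_like_glob("path/**/*.mmcif"))
```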

parse_mmcif_batch(filepaths: list[str] | str) -> list[dict] | dict[str, dict]

Parse multiple mmCIF files in a single operation.

  • No glob patterns: Returns a list of dictionaries with parsed data
  • With glob patterns: Returns a dictionary mapping file paths to parsed data
  • Accepts a single path/pattern or a list of paths/patterns
  • More efficient than parsing files individually when processing multiple structures
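Because `parse_mmcif_batch` returns a list in one case and a dictionary in the other, downstream code may want a single way to iterate over both. A minimal sketch follows, using hand-written results in place of real parser output. `iter_parsed` is a hypothetical helper, and it assumes the list form preserves the order of the input paths, which the source does not state.

```python
def iter_parsed(results, inputs=None):
    """Yield (label, data) pairs from either return shape.

    results: a dict mapping file paths to parsed data, or a list of parsed data.
    inputs:  optional list of the paths that were passed in, used to label
             list results (assumes output order matches input order).
    """
    if isinstance(results, dict):
        yield from results.items()
        return
    for index, data in enumerate(results):
        if inputs is not None and index < len(inputs):
            label = inputs[index]
        else:
            label = f"structure_{index + 1}"
        yield label, data


# Hand-written stand-ins for the two documented return shapes.
list_result = [{"atoms": [{"id": 1}, {"id": 2}]}, {"atoms": []}]
dict_result = {"structures/x.mmcif": {"atoms": [{"id": 1}]}}

for label, data in iter_parsed(list_result, inputs=["one.mmcif", "two.mmcif"]):
    print(f"{label}: {len(data['atoms'])} atoms")
for label, data in iter_parsed(dict_result):
    print(f"{label}: {len(data['atoms'])} atoms")
```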

get_atom_count(filepath: str) -> int

Get the number of atoms in an mmCIF file.

get_atoms(filepath: str) -> list[dict]

Get all atoms from an mmCIF file as a list of dictionaries.

get_atom_positions(filepath: str) -> list[tuple[float, float, float]]

Get 3D coordinates of all atoms as a list of (x, y, z) tuples.
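A list of (x, y, z) tuples is enough for simple geometric summaries without extra dependencies. The sketch below computes a centroid and an axis-aligned bounding box from hand-written sample coordinates. `centroid` and `bounding_box` are hypothetical helpers, not functions provided by nim_mmcif.

```python
def centroid(positions):
    """Return the mean (x, y, z) of a non-empty list of coordinate tuples."""
    if not positions:
        raise ValueError("positions must not be empty")
    n = len(positions)
    return (
        sum(p[0] for p in positions) / n,
        sum(p[1] for p in positions) / n,
        sum(p[2] for p in positions) / n,
    )


def bounding_box(positions):
    """Return ((min_x, min_y, min_z), (max_x, max_y, max_z))."""
    if not positions:
        raise ValueError("positions must not be empty")
    xs, ys, zs = zip(*positions)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Sample data shaped like the output of nim_mmcif.get_atom_positions(...).
sample_positions = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0), (4.0, 2.0, 0.0)]

print(centroid(sample_positions))
print(bounding_box(sample_positions))
```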

Atom Properties

Each atom dictionary contains:

  • type: Record type (ATOM or HETATM)
  • id: Atom serial number
  • label_atom_id: Atom name
  • label_comp_id: Residue name
  • label_asym_id: Chain identifier
  • label_seq_id: Residue sequence number
  • x, y, z: 3D coordinates (aliases for Cartn_x, Cartn_y, Cartn_z)
  • occupancy: Occupancy factor
  • B_iso_or_equiv: B-factor
  • And more...
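These keys make per-chain summaries straightforward. The following sketch groups atoms by `label_asym_id` and reports atom count, residue count, and mean B-factor, using hand-written sample atoms. `summarize_chains` is a hypothetical helper, and the `float()` conversion is a precaution because the source does not state the Python type of `B_iso_or_equiv`.

```python
from collections import defaultdict


def summarize_chains(atoms):
    """Return {chain_id: {'atoms', 'residues', 'mean_b'}} for a list of atom dicts."""
    residues = defaultdict(set)
    b_total = defaultdict(float)
    counts = defaultdict(int)
    for atom in atoms:
        chain = atom["label_asym_id"]
        residues[chain].add(atom["label_seq_id"])
        b_total[chain] += float(atom["B_iso_or_equiv"])  # type is an assumption
        counts[chain] += 1
    return {
        chain: {
            "atoms": counts[chain],
            "residues": len(residues[chain]),
            "mean_b": b_total[chain] / counts[chain],
        }
        for chain in sorted(counts)
    }


# Hand-written sample using the documented keys.
sample_atoms = [
    {"label_asym_id": "A", "label_seq_id": 1, "B_iso_or_equiv": 10.0},
    {"label_asym_id": "A", "label_seq_id": 1, "B_iso_or_equiv": 20.0},
    {"label_asym_id": "A", "label_seq_id": 2, "B_iso_or_equiv": 30.0},
    {"label_asym_id": "B", "label_seq_id": 1, "B_iso_or_equiv": 15.0},
]

for chain, stats in summarize_chains(sample_atoms).items():
    print(chain, stats)
```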

Platform Support

Platform   Architecture   Python     Status
Linux      x64, ARM64     3.8-3.12
macOS      x64, ARM64     3.8-3.12
Windows    x64            3.8-3.12

Building from Source

Automatic Build

python build_nim.py

Manual Build

# Build using nimble tasks
nimble build         # Build debug version
nimble buildRelease  # Build optimized release version

Development

Running Tests

pip install pytest
pytest tests/ -v

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests
  5. Submit a pull request

Documentation

Performance

The Nim implementation provides significant performance improvements over pure Python parsers, especially for large mmCIF files commonly used in structural biology.

License

MIT License - see LICENSE file for details.

Acknowledgments

  • Built with Nim for high performance
  • Python integration via nimporter and nimpy
  • mmCIF format specification from wwPDB

