MolCrysKit: Molecular Crystal Toolkit

Overview

MolCrysKit is a Python toolkit for molecular crystals. It provides utilities for parsing crystallographic data, identifying molecules within crystals, and analysing them using graph theory and the Atomic Simulation Environment (ASE).

Key Features

  • Robust Molecule Identification: Identify individual molecules within a crystal structure using graph-based algorithms
  • Disorder Handling: Resolve crystallographic disorder by building a conflict graph between mutually exclusive atoms and selecting a maximum-weight independent set
  • Topological Surface Generation: Create surface slabs while preserving molecular topology
  • Hydrogen Addition: Automatically add hydrogen atoms with geometric placement rules, optionally constrained by CIF _chemical_formula_moiety fragment formulas when available

Installation

To install MolCrysKit from a local clone, run pip from the repository root:

pip install .

Or for development purposes, install in editable mode:

pip install -e .

Quick Start

Here's a simple example of how to use MolCrysKit:

from ase import Atoms
from molcrys_kit.structures.crystal import MolecularCrystal

# 1. Create a toy system (e.g., 2 water molecules in a unit cell)
# In practice, you would typically load this from a file:
#   from ase.io import read
#   atoms = read('cif_file.cif')
atoms = Atoms(
    symbols=['O', 'H', 'H', 'O', 'H', 'H'],
    positions=[
        [1.0, 1.0, 1.0], [1.8, 1.0, 1.0], [0.7, 1.6, 1.0],  # Molecule 1
        [5.0, 5.0, 5.0], [5.8, 5.0, 5.0], [4.7, 5.6, 5.0]   # Molecule 2
    ],
    cell=[10.0, 10.0, 10.0],
    pbc=True
)

# 2. Initialize MolecularCrystal (Automatically identifies molecules via graph logic)
crystal = MolecularCrystal.from_ase(atoms)

# 3. Access Crystal & Molecular Properties
print(f"Lattice Parameters: {crystal.get_lattice_parameters()}")
print(f"Identified Molecules: {len(crystal.molecules)}") 

mol = crystal.molecules[0]
print(f"Molecule 1 Formula: {mol.get_chemical_formula()}")
print(f"Molecule 1 Center of Mass: {mol.get_center_of_mass()}")
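
As a rough illustration of what "graph logic" means in step 2, the sketch below groups the same six toy atoms into molecules by linking atoms closer than a cutoff and taking connected components. It is not MolCrysKit's implementation: the find_molecules helper, the single 1.2 Å cutoff, and the orthorhombic minimum-image shortcut are assumptions made for illustration only.

```python
from itertools import combinations


def min_image_distance(a, b, cell):
    """Distance under the minimum-image convention (orthorhombic cell only)."""
    total = 0.0
    for x, y, length in zip(a, b, cell):
        d = abs(x - y) % length
        d = min(d, length - d)
        total += d * d
    return total ** 0.5


def find_molecules(positions, cell, cutoff=1.2):
    """Return sorted atom-index lists, one per connected component."""
    n = len(positions)
    neighbors = {i: set() for i in range(n)}
    # Add a bond (graph edge) whenever two atoms are closer than the cutoff.
    for i, j in combinations(range(n), 2):
        if min_image_distance(positions[i], positions[j], cell) < cutoff:
            neighbors[i].add(j)
            neighbors[j].add(i)
    seen, molecules = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first traversal collects every atom reachable from `start`.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        molecules.append(sorted(component))
    return molecules


positions = [
    [1.0, 1.0, 1.0], [1.8, 1.0, 1.0], [0.7, 1.6, 1.0],  # Molecule 1
    [5.0, 5.0, 5.0], [5.8, 5.0, 5.0], [4.7, 5.6, 5.0],  # Molecule 2
]
cell = [10.0, 10.0, 10.0]
molecules = find_molecules(positions, cell)
print(molecules)
```

A real implementation would normally use element-dependent bond cutoffs and handle general (non-orthorhombic) cells; the fixed cutoff here only works because the toy system contains nothing but O-H contacts.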

Citation

If you use MolCrysKit in academic work, please cite:

Guo, M.-Y.; Zhang, W.-X. MolCrysKit: A Topology-Aware Toolkit for Bridging Experimental Molecular-Crystal Structures and Simulation-Ready Modeling. J. Chem. Inf. Model. 2026, 66 (9), 4999-5007. https://doi.org/10.1021/acs.jcim.6c00168

For exact reproduction of the published JCIM results, use the archived v0.1.0 release together with the versioned container image and the material under paper/. The main branch may continue to evolve after publication.

Running with Docker (no local installation required)

Two Dockerfiles are provided for different environments:

File               | Base image                                         | Use case
Dockerfile         | python:3.10-slim                                   | Local use, reviewers, CI
Dockerfile.bohrium | registry.dp.tech/dptech/ubuntu:ubuntu24.04-py3.12  | Bohrium cloud platform

Both install MolCrysKit directly from the GitHub archive. A local Docker build context is still used to start the build, but the package, notebook assets, and helper scripts are fetched from the selected GitHub ref inside the image build.

Prerequisites

Quick start (general use)

# 1. Clone the repository and enter the package directory
git clone https://github.com/SchrodingersCattt/MolCrysKit.git
cd MolCrysKit

# 2. Build the image (≈ 5–10 min on first run; subsequent builds use the cache)
docker build -t molcryskit:latest .

# 3. Run the smoke test to confirm everything works
docker run --rm molcryskit:latest python /opt/molcryskit/scripts/docker_smoke_test.py

# 4. Start the Jupyter notebook server
docker run -it --rm -p 8888:8888 molcryskit:latest
# Then open http://localhost:8888 in your browser.
# Example CIF files are available at /workspace/notebook/example/ inside the container.

One-step build + test helper

# From the MolCrysKit/ directory:
bash scripts/docker-test.sh

This script builds the image and runs the smoke test automatically, reporting ALL CHECKS PASSED on success.

Bohrium cloud platform

# Build with the Bohrium-specific Dockerfile
docker build -f Dockerfile.bohrium -t molcryskit-bohrium:latest .

# Pin to an immutable Git tag instead of the moving main branch
# (recommended for archival/reviewer reproducibility)
docker build -f Dockerfile.bohrium \
    --build-arg MOLCRYSKIT_REF=refs/tags/v0.2.0 \
    -t molcryskit-bohrium:v0.2.0 .

The Bohrium image installs MolCrysKit with pip from the GitHub archive zip (no git clone required). It does not include Jupyter, because Bohrium provides its own notebook environment.

Permanent image publication with GHCR

The Bohrium registry is convenient for cloud execution, but it should not be the only archival location because image retention is controlled by the platform and project namespace. For a stable public anchor, publish immutable release images to GitHub Container Registry (GHCR).

The workflow publish-ghcr.yml builds images from Dockerfile and pushes them to ghcr.io/<owner>/molcryskit:

  • pushing a Git tag such as v0.2.0 publishes ghcr.io/<owner>/molcryskit:v0.2.0
  • stable release tags also receive latest
  • manual dispatch can publish a development snapshot from a chosen Git ref

Recommended archival pattern:

# 1. Create and push an immutable release tag
git tag v0.2.0
git push origin v0.2.0

# 2. GitHub Actions publishes the image automatically to GHCR
#    ghcr.io/<owner>/molcryskit:v0.2.0

For Bohrium, keep using Dockerfile.bohrium as the platform-specific runtime image, but cite the GitHub repository and GHCR image as the permanent public anchor.

Mounting your own data

docker run -it --rm \
    -p 8888:8888 \
    -v /path/to/your/cif/files:/workspace/my_data \
    molcryskit:latest

Your files will be accessible at /workspace/my_data/ inside the container.

Documentation

For detailed architecture, tutorials, and API reference, please see the docs/ directory.

Disorder Handling: Explicit vs Implicit Path Compatibility

MolCrysKit resolves crystallographic disorder through two complementary paths that are designed to be fully compatible:

Explicit path (_add_explicit_conflicts + _add_conformer_conflicts): Processes atoms tagged with _atom_site_disorder_assembly / _atom_site_disorder_group in the CIF (e.g. SHELXL PART groups). Mutual-exclusion edges are added for atoms in different groups of the same assembly. Hydrogen atoms bonded to an explicit-disorder centre inherit the centre's group tag and are resolved together with it.
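
The mutual-exclusion rule of the explicit path can be sketched in a few lines. The atom records and the explicit_conflict_edges helper below are hypothetical and only illustrate the rule "same assembly, different group implies a conflict edge"; they are not the internals of _add_explicit_conflicts.

```python
from itertools import combinations

# Hypothetical atom records: (label, disorder_assembly, disorder_group).
# None means the atom carries no disorder tags.
atoms = [
    ("N4", None, None),
    ("H4A", "A", 1),
    ("H4B", "A", 2),
    ("H4C", "B", 1),
    ("H4D", "B", 2),
]


def explicit_conflict_edges(atoms):
    """Return index pairs of atoms that cannot coexist in one ordered structure."""
    edges = set()
    for i, j in combinations(range(len(atoms)), 2):
        _, assembly_i, group_i = atoms[i]
        _, assembly_j, group_j = atoms[j]
        if assembly_i is None or assembly_j is None:
            continue  # untagged atoms never conflict on the explicit path
        if assembly_i == assembly_j and group_i != group_j:
            edges.add((i, j))
    return sorted(edges)


edges = explicit_conflict_edges(atoms)
print(edges)
```

Atoms in different assemblies (here H4A and H4C) get no edge, since assemblies are resolved independently of each other.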

Implicit SP (special-position) path (_add_implicit_sp_conflicts + _resolve_valence_conflicts): Processes partial-occupancy atoms on crystallographic special positions that carry no disorder tags (common in SHELX riding-H refinements). The algorithm clusters the symmetry copies of each asymmetric-unit site by proximity and adds mutual-exclusion edges within each cluster. For heavy atoms (N, P, S) with sufficient copies around them, a tetrahedral/trigonal decomposition (_sp_tetrahedral_single) finds geometrically valid orientation combinations and adds cross-cluster compatibility constraints.
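
The proximity clustering step can be pictured with the sketch below, which links copies of the same site that lie within a cutoff (single linkage, via a small union-find) and then emits a conflict edge for every pair inside a cluster. The coordinates, the 0.6 Å cutoff, and the implicit_conflict_edges helper are assumptions for illustration; the tetrahedral/trigonal decomposition is not covered.

```python
from itertools import combinations
from math import dist

# Hypothetical symmetry copies: (asym_id, Cartesian position in Å).
copies = [
    ("H1", (0.00, 0.00, 0.90)),
    ("H1", (0.30, 0.00, 0.85)),
    ("H1", (0.00, 0.30, 0.85)),
    ("H1", (0.00, 0.00, -0.90)),
    ("H1", (0.30, 0.00, -0.85)),
    ("H2", (0.10, 0.10, 0.88)),
]


def implicit_conflict_edges(copies, cutoff=0.6):
    """Cluster copies of each site by proximity; conflict within each cluster."""
    parent = list(range(len(copies)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = list(combinations(range(len(copies)), 2))
    for i, j in pairs:
        same_site = copies[i][0] == copies[j][0]
        if same_site and dist(copies[i][1], copies[j][1]) < cutoff:
            parent[find(i)] = find(j)

    # Every pair sharing a cluster root becomes a mutual-exclusion edge.
    return [(i, j) for i, j in pairs if find(i) == find(j)]


sp_edges = implicit_conflict_edges(copies)
print(sp_edges)
```

In this toy input the five H1 copies split into an upper and a lower cluster, and the H2 copy stays on its own because only copies of the same asymmetric-unit site are compared.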

Motif merge post-pass (_merge_chemical_motifs): After the Maximum-Weight Independent Set is solved, isolated XH_n centres (e.g. NH4+, H2O) are reconstructed by a greedy H selection sorted by distance and angle. Soft conflict edges (valence_geometry, implicit_sp, geometric) are ignored at this stage so that strongly disordered SP-position motifs are not excluded wholesale. A key guard enforces one H per crystallographic site (asym_id) when at least max_H distinct asym_ids are present. This prevents the greedy selection from picking multiple copies of the same SHELX H position (which point in nearly identical directions) before exhausting the other sites.
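
The effect of the one-H-per-site guard is easiest to see on a small example. The sketch below sorts candidate hydrogens by distance only (the angle criterion is omitted) and applies the guard when enough distinct sites exist. The select_hydrogens helper and the candidate data are hypothetical, not the body of _merge_chemical_motifs.

```python
def select_hydrogens(candidates, max_h):
    """Greedy pick of up to max_h hydrogens.

    candidates: list of (name, asym_id, distance_to_centre_in_angstrom).
    """
    ordered = sorted(candidates, key=lambda c: c[2])
    distinct_sites = {asym_id for _, asym_id, _ in candidates}
    # Guard: only enforce one H per site if the sites alone can fill max_h.
    one_per_site = len(distinct_sites) >= max_h

    chosen, used_sites = [], set()
    for name, asym_id, _ in ordered:
        if len(chosen) == max_h:
            break
        if one_per_site and asym_id in used_sites:
            continue  # skip a near-duplicate copy of an already used site
        chosen.append(name)
        used_sites.add(asym_id)
    return chosen


candidates = [
    ("H1a", "H1", 0.86), ("H1b", "H1", 0.87),
    ("H2a", "H2", 0.88), ("H2b", "H2", 0.89),
    ("H3a", "H3", 0.90), ("H4a", "H4", 0.91),
]
selected = select_hydrogens(candidates, max_h=4)
print(selected)
```

Without the guard, the four closest candidates would be two copies each of H1 and H2, which is the failure mode described above.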

Known-compatible CIF styles for NH4+:

Style                                   | Example  | Tags                                           | Resolution path
Explicit PART groups (dg=-1)            | DAI-4 N4 | disorder_assembly=A/B/C/D, disorder_group=-1   | Explicit path
Implicit SHELX riding-H (dg=0)          | DAI-4 N1 | No tags, occ=1/sso, sso>1                      | Implicit SP + motif merge
High-multiplicity SP (24 orientations)  | PAP-4    | occ=1/24, no tags                              | Implicit SP + motif merge

All of the styles listed above produce correct NH4+ (4 H per nitrogen) after resolution.

Replica generation modes: DisorderSolver.solve() and generate_ordered_replicas_from_disordered_sites() support three resolution modes:

  • method="optimal" returns the single greedy maximum-weight independent set.
  • method="random" samples one valid alternative from each independent PART/SP decision component, weighted by occupancy; results are reproducible with random_seed.
  • method="enumerate" returns the Cartesian product of independent alternatives, sorted by joint occupancy probability; pass generate_count > 1 to cap enumeration to the top-N combinations.
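
The difference between the "enumerate" and "random" modes can be sketched with plain Python. The component data and the helper names (enumerate_replicas, random_replica, top_n, seed) are hypothetical stand-ins chosen for this illustration; they do not reproduce the DisorderSolver API.

```python
import random
from itertools import product
from math import prod

# Hypothetical independent decision components: each lists its alternatives
# as (label, occupancy).
components = [
    [("A1", 0.7), ("A2", 0.3)],
    [("B1", 0.6), ("B2", 0.4)],
]


def enumerate_replicas(components, top_n=None):
    """Cartesian product of alternatives, sorted by joint occupancy probability."""
    combos = []
    for combo in product(*components):
        labels = tuple(label for label, _ in combo)
        probability = prod(occ for _, occ in combo)
        combos.append((labels, probability))
    combos.sort(key=lambda item: item[1], reverse=True)
    return combos if top_n is None else combos[:top_n]


def random_replica(components, seed=None):
    """Sample one alternative per component, weighted by occupancy."""
    rng = random.Random(seed)
    return tuple(
        rng.choices(
            [label for label, _ in alternatives],
            weights=[occ for _, occ in alternatives],
            k=1,
        )[0]
        for alternatives in components
    )


for labels, probability in enumerate_replicas(components, top_n=2):
    print(labels, round(probability, 2))
print(random_replica(components, seed=42))
```

Because the components are independent, the joint probability of a combination is simply the product of the occupancies of its parts, which is what makes the sorted enumeration cheap to compute.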

Valence-completeness diagnostics: DisorderSolver.solve() automatically calls molcrys_kit.analysis.disorder.diagnostics.check_valence_completeness on each resolved structure. If an isolated N or O centre has an H count outside the expected range (N: 3–4; O: 0–2), a WARNING is emitted via the standard logging system. This catch-all does not modify the structure; it only makes potential resolution artefacts visible.
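
A minimal sketch of such a diagnostic is shown below, using the expected H ranges quoted above (N: 3 to 4; O: 0 to 2). The check_h_counts helper, the logger name, and the input format are assumptions for illustration and do not mirror the signature of check_valence_completeness.

```python
import logging

logger = logging.getLogger("valence_sketch")  # hypothetical logger name

# Expected H counts for isolated centres, as stated in the text above.
EXPECTED_H_RANGE = {"N": (3, 4), "O": (0, 2)}


def check_h_counts(centres):
    """Warn about centres whose H count is out of range; return their labels.

    centres: list of (label, element, h_count). The input is never modified.
    """
    flagged = []
    for label, element, h_count in centres:
        if element not in EXPECTED_H_RANGE:
            continue
        low, high = EXPECTED_H_RANGE[element]
        if not low <= h_count <= high:
            logger.warning(
                "%s (%s) has %d H; expected %d-%d",
                label, element, h_count, low, high,
            )
            flagged.append(label)
    return flagged


centres = [("N1", "N", 4), ("N2", "N", 6), ("O1", "O", 2), ("O2", "O", 3)]
flagged = check_h_counts(centres)
print(flagged)
```

Warnings emitted this way go through the standard logging system, so they can be surfaced or silenced with the usual logging configuration rather than by changing the solver call.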

Project Structure

See the molcrys_kit/ directory for source code and the scripts/ directory for utility scripts (e.g. disorder diagnostics, molecule identification, CIF processing).

Contributing

Contributions are welcome! Please fork the repository and submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
