Molecular Crystal Toolkit
Project description
MolCrysKit: Molecular Crystal Toolkit
Overview
MolCrysKit is a Python toolkit designed for handling molecular crystals, providing utilities for parsing crystallographic data, identifying molecules within crystals, and performing various analyses on molecular crystals using graph theory and the Atomic Simulation Environment (ASE).
Key Features
- Robust Molecule Identification: Identify individual molecules within a crystal structure using graph-based algorithms
- Disorder Handling: Process disordered structures with graph algorithms
- Topological Surface Generation: Create surface slabs while preserving molecular topology
- Hydrogen Addition: Automatically add hydrogen atoms with geometric placement rules, optionally constrained by CIF
_chemical_formula_moietyfragment formulas when available
Installation
To install MolCrysKit, you can use pip:
pip install .
Or for development purposes, install in editable mode:
pip install -e .
Quick Start
Here's a simple example of how to use MolCrysKit:
from ase import Atoms
from molcrys_kit.structures.crystal import MolecularCrystal
# 1. Create a toy system (e.g., 2 Water molecules in a unit cell)
# In practice, you would typically load this from a file: atoms = read('cif_file.cif')
atoms = Atoms(
symbols=['O', 'H', 'H', 'O', 'H', 'H'],
positions=[
[1.0, 1.0, 1.0], [1.8, 1.0, 1.0], [0.7, 1.6, 1.0], # Molecule 1
[5.0, 5.0, 5.0], [5.8, 5.0, 5.0], [4.7, 5.6, 5.0] # Molecule 2
],
cell=[10.0, 10.0, 10.0],
pbc=True
)
# 2. Initialize MolecularCrystal (Automatically identifies molecules via graph logic)
crystal = MolecularCrystal.from_ase(atoms)
# 3. Access Crystal & Molecular Properties
print(f"Lattice Parameters: {crystal.get_lattice_parameters()}")
print(f"Identified Molecules: {len(crystal.molecules)}")
mol = crystal.molecules[0]
print(f"Molecule 1 Formula: {mol.get_chemical_formula()}")
print(f"Molecule 1 Center of Mass: {mol.get_center_of_mass()}")
Citation
If you use MolCrysKit in academic work, please cite:
Guo, M.-Y.; Zhang, W.-X. MolCrysKit: A Topology-Aware Toolkit for Bridging Experimental Molecular-Crystal Structures and Simulation-Ready Modeling. J. Chem. Inf. Model. 2026, 66 (9), 4999-5007. https://doi.org/10.1021/acs.jcim.6c00168
For exact reproduction of the published JCIM results, use the archived
v0.1.0 release together with the versioned container image and the material
under paper/. The main branch may continue to evolve after publication.
Running with Docker (no local installation required)
Two Dockerfiles are provided for different environments:
| File | Base image | Use case |
|---|---|---|
Dockerfile |
python:3.10-slim |
Local use, reviewers, CI |
Dockerfile.bohrium |
registry.dp.tech/dptech/ubuntu:ubuntu24.04-py3.12 |
Bohrium cloud platform |
Both install MolCrysKit directly from the GitHub archive. A local Docker build context is still used to start the build, but the package, notebook assets, and helper scripts are fetched from the selected GitHub ref inside the image build.
Prerequisites
- Docker Desktop (macOS / Windows) or the Docker Engine (Linux).
Quick start (general use)
# 1. Clone the repository and enter the package directory
git clone https://github.com/SchrodingersCattt/MolCrysKit.git
cd MolCrysKit
# 2. Build the image (≈ 5–10 min on first run; subsequent builds use the cache)
docker build -t molcryskit:latest .
# 3. Run the smoke test to confirm everything works
docker run --rm molcryskit:latest python /opt/molcryskit/scripts/docker_smoke_test.py
# 4. Start the Jupyter notebook server
docker run -it --rm -p 8888:8888 molcryskit:latest
# Then open http://localhost:8888 in your browser.
# Example CIF files are available at /workspace/notebook/example/ inside the container.
One-step build + test helper
# From the MolCrysKit/ directory:
bash scripts/docker-test.sh
This script builds the image and runs the smoke test automatically, reporting
ALL CHECKS PASSED on success.
Bohrium cloud platform
# Build with the Bohrium-specific Dockerfile
docker build -f Dockerfile.bohrium -t molcryskit-bohrium:latest .
# Pin to an immutable Git tag instead of the moving main branch
# (recommended for archival/reviewer reproducibility)
docker build -f Dockerfile.bohrium \
--build-arg MOLCRYSKIT_REF=refs/tags/v0.2.0 \
-t molcryskit-bohrium:v0.2.0 .
The Bohrium image uses pip install from the GitHub archive zip (no git clone
required) and does not include Jupyter — Bohrium provides its own notebook
environment.
Permanent image publication with GHCR
The Bohrium registry is convenient for cloud execution, but it should not be the only archival location because image retention is controlled by the platform and project namespace. For a stable public anchor, publish immutable release images to GitHub Container Registry (GHCR).
The workflow publish-ghcr.yml pushes
Dockerfile images to ghcr.io/<owner>/molcryskit:
- pushing a Git tag such as
v0.2.0publishesghcr.io/<owner>/molcryskit:v0.2.0 - stable release tags also receive
latest - manual dispatch can publish a development snapshot from a chosen Git ref
Recommended archival pattern:
# 1. Create and push an immutable release tag
git tag v0.2.0
git push origin v0.2.0
# 2. GitHub Actions publishes the image automatically to GHCR
# ghcr.io/<owner>/molcryskit:v0.2.0
For Bohrium, keep using Dockerfile.bohrium as the
platform-specific runtime image, but cite the GitHub repository and GHCR image
as the permanent public anchor.
Mounting your own data
docker run -it --rm \
-p 8888:8888 \
-v /path/to/your/cif/files:/workspace/my_data \
molcryskit:latest
Your files will be accessible at /workspace/my_data/ inside the container.
Documentation
For detailed architecture, tutorials, and API reference, please see the docs/ directory.
Disorder Handling: Explicit vs Implicit Path Compatibility
MolCrysKit resolves crystallographic disorder through two complementary paths that are designed to be fully compatible:
Explicit path (_add_explicit_conflicts + _add_conformer_conflicts):
Processes atoms tagged with _atom_site_disorder_assembly / _atom_site_disorder_group
in the CIF (e.g. SHELXL PART groups). Mutual-exclusion edges are added for
atoms in different groups of the same assembly. Hydrogen atoms bonded to an
explicit-disorder centre inherit the centre's group tag and are resolved together
with it.
Implicit SP path (_add_implicit_sp_conflicts + _resolve_valence_conflicts):
Processes partial-occupancy atoms on crystallographic special positions that carry
no disorder tags (common in SHELX riding-H refinements). The algorithm
clusters copies of each asymmetric-unit site by proximity and adds mutual-exclusion
edges within each cluster. For heavy atoms (N, P, S) with sufficient copies
around them, a tetrahedral/trigonal decomposition (_sp_tetrahedral_single) finds
geometrically valid orientation combinations and adds cross-cluster compatibility
constraints.
Motif merge post-pass (_merge_chemical_motifs):
After the Maximum-Weight Independent Set is solved, isolated XH_n centres
(e.g. NH4+, H2O) are reconstructed by a greedy distance- and angle-sorted H
selection. Soft conflict edges (valence_geometry, implicit_sp, geometric)
are ignored here so that the strongly disordered SP-position motifs are not
excluded wholesale. A key guard enforces one H per crystallographic site
(asym_id) when at least max_H distinct asym_ids are present: this prevents
the greedy from picking multiple copies of the same SHELX H position (which point
in nearly identical directions) before exhausting the other sites.
Known-compatible CIF styles for NH4+:
| Style | Example | Tags | Resolution path |
|---|---|---|---|
Explicit PART groups (dg=-1) |
DAI-4 N4 | disorder_assembly=A/B/C/D, disorder_group=-1 |
Explicit path |
Implicit SHELX riding-H (dg=0) |
DAI-4 N1 | No tags, occ=1/sso, sso>1 |
Implicit SP + motif merge |
| High-multiplicity SP (24 orientations) | PAP-4 | occ=1/24, no tags |
Implicit SP + motif merge |
Both styles produce correct NH4+ (4 H per nitrogen) after resolution.
Replica generation modes:
DisorderSolver.solve() and
generate_ordered_replicas_from_disordered_sites() support three resolution
modes. method="optimal" returns the single greedy maximum-weight independent
set. method="random" samples one valid alternative from each independent
PART/SP decision component, weighted by occupancy and reproducible with
random_seed. method="enumerate" returns the Cartesian product of independent
alternatives, sorted by joint occupancy probability; pass generate_count > 1
to cap enumeration to the top-N combinations.
Valence-completeness diagnostics:
DisorderSolver.solve() automatically calls
molcrys_kit.analysis.disorder.diagnostics.check_valence_completeness on
each resolved structure. If an isolated N or O centre has an H count outside
the expected range (N: 3–4; O: 0–2), a WARNING is emitted via the standard
logging system. This catch-all does not modify the structure; it only makes
potential resolution artefacts visible.
Project Structure
See the molcrys_kit/ directory for source code and the scripts/ directory for utility scripts (e.g. disorder diagnostics, molecule identification, CIF processing).
Contributing
Contributions are welcome! Please fork the repository and submit a pull request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file molcrys_kit-0.4.0.tar.gz.
File metadata
- Download URL: molcrys_kit-0.4.0.tar.gz
- Upload date:
- Size: 4.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
65edf0980f098669c8d4eb6f28bccbbb9530ce552c68631ed440d1065d760c02
|
|
| MD5 |
33872aacf57423daa80dabf4ca2e5eaa
|
|
| BLAKE2b-256 |
086fd70698716fde004aad53be12acb283b460f5ce59894d5f9512cf8220e8dd
|
Provenance
The following attestation bundles were made for molcrys_kit-0.4.0.tar.gz:
Publisher:
publish-pypi.yml on SchrodingersCattt/MolCrysKit
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
molcrys_kit-0.4.0.tar.gz -
Subject digest:
65edf0980f098669c8d4eb6f28bccbbb9530ce552c68631ed440d1065d760c02 - Sigstore transparency entry: 1627320040
- Sigstore integration time:
-
Permalink:
SchrodingersCattt/MolCrysKit@a31daf481f1841c13f27a9542a8f0d43e105383e -
Branch / Tag:
refs/tags/v0.4.0 - Owner: https://github.com/SchrodingersCattt
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@a31daf481f1841c13f27a9542a8f0d43e105383e -
Trigger Event:
push
-
Statement type:
File details
Details for the file molcrys_kit-0.4.0-py3-none-any.whl.
File metadata
- Download URL: molcrys_kit-0.4.0-py3-none-any.whl
- Upload date:
- Size: 210.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
41166c134f6b7f06c7036728277255c2c83c0cd3fe38952f6617d0f7f9441356
|
|
| MD5 |
99f664c085d27c7fed219d147681a9ae
|
|
| BLAKE2b-256 |
a4d2f1a45a45aa7b0f595a07a470fe80a594d6b455b4f260d5a56fd102f89487
|
Provenance
The following attestation bundles were made for molcrys_kit-0.4.0-py3-none-any.whl:
Publisher:
publish-pypi.yml on SchrodingersCattt/MolCrysKit
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
molcrys_kit-0.4.0-py3-none-any.whl -
Subject digest:
41166c134f6b7f06c7036728277255c2c83c0cd3fe38952f6617d0f7f9441356 - Sigstore transparency entry: 1627320108
- Sigstore integration time:
-
Permalink:
SchrodingersCattt/MolCrysKit@a31daf481f1841c13f27a9542a8f0d43e105383e -
Branch / Tag:
refs/tags/v0.4.0 - Owner: https://github.com/SchrodingersCattt
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@a31daf481f1841c13f27a9542a8f0d43e105383e -
Trigger Event:
push
-
Statement type: