Convert PDB Chemical Component Dictionary (CCD) files to RDKit molecules
ccd2rdmol
A lightweight Python library and CLI tool for converting PDB Chemical Component Dictionary (CCD) files to RDKit molecule objects.
This project is a simplified implementation inspired by pdbeccdutils, focusing solely on CCD to RDKit conversion with 3D conformer support.
Features
- Fast CIF parsing using gemmi
- Conversion to RDKit molecule objects
- Support for both Ideal and Model 3D conformers
- Automatic metal bond to dative bond conversion
- Stereochemistry assignment from 3D coordinates
- Deuterium isotope handling
- Degenerate conformer detection and rejection
- CLI tool with rich output
Installation
# Library only
uv add ccd2rdmol
# With CLI support
uv add ccd2rdmol[cli]
Or with pip:
pip install ccd2rdmol
pip install ccd2rdmol[cli]
For development:
git clone https://github.com/N283T/ccd2rdmol.git
cd ccd2rdmol
uv sync # CLI is included in dev dependencies
Quick Start
from ccd2rdmol import read_ccd_file
result = read_ccd_file("ATP.cif")
print(f"Atoms: {result.mol.GetNumAtoms()}")
print(f"Sanitized: {result.sanitized}")
Usage
Reading from a CIF File
from ccd2rdmol import read_ccd_file
# Default: sanitize, add conformers, remove hydrogens
result = read_ccd_file("ATP.cif")
mol = result.mol
print(f"Atoms: {mol.GetNumAtoms()}")
print(f"Bonds: {mol.GetNumBonds()}")
print(f"Conformers: {mol.GetNumConformers()}") # 2 (IDEAL + MODEL)
print(f"Sanitized: {result.sanitized}")
# With options
result = read_ccd_file(
    "ATP.cif",
    sanitize_mol=True,      # Sanitize molecule (default: True)
    add_conformers=True,    # Add 3D conformers (default: True)
    remove_hydrogens=True,  # Remove hydrogens (default: True)
)
Reading from a gemmi CIF Block
import gemmi
from ccd2rdmol import read_ccd_block
doc = gemmi.cif.read("components.cif")
for block in doc:
    result = read_ccd_block(block)
    print(f"{block.name}: {result.mol.GetNumAtoms()} atoms")
Low-Level API: chemcomp_to_mol
import gemmi
from ccd2rdmol import chemcomp_to_mol
doc = gemmi.cif.read("ATP.cif")
block = doc.sole_block()
cc = gemmi.make_chemcomp_from_block(block)
result = chemcomp_to_mol(
    cc, block,
    sanitize_mol=False,      # Skip sanitization
    add_conformers=True,
    remove_hydrogens=False,  # Keep all hydrogens
)
Generating SMILES and InChI
from rdkit import Chem
from rdkit.Chem.inchi import MolToInchi
from ccd2rdmol import read_ccd_file
result = read_ccd_file("ATP.cif")
smiles = Chem.MolToSmiles(result.mol)
inchi = MolToInchi(result.mol)
print(f"SMILES: {smiles}")
print(f"InChI: {inchi}")
Accessing Conformer Coordinates
from ccd2rdmol import read_ccd_file
result = read_ccd_file("ATP.cif", add_conformers=True)
mol = result.mol
for conf in mol.GetConformers():
    name = conf.GetProp("name")  # "IDEAL" or "MODEL"
    print(f"\n{name} conformer:")
    for i in range(mol.GetNumAtoms()):
        pos = conf.GetAtomPosition(i)
        atom = mol.GetAtomWithIdx(i)
        print(f" {atom.GetSymbol()} ({pos.x:.3f}, {pos.y:.3f}, {pos.z:.3f})")
Handling Conversion Errors
from ccd2rdmol import read_ccd_file
result = read_ccd_file("complex_molecule.cif")
if result.errors:
    print("Errors:", result.errors)
if result.warnings:
    print("Warnings:", result.warnings)
if not result.sanitized:
    print("Sanitization failed: molecule may have valence issues")
API Reference
Functions
read_ccd_file(path, *, sanitize_mol=True, add_conformers=True, remove_hydrogens=True) → ConversionResult
Reads a CCD CIF file and converts it to an RDKit molecule.
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | str | (required) | Path to CIF file |
| sanitize_mol | bool | True | Sanitize the molecule (fix valence, kekulize) |
| add_conformers | bool | True | Add IDEAL and MODEL 3D conformers |
| remove_hydrogens | bool | True | Remove hydrogen atoms from the molecule |
Raises FileNotFoundError if the file does not exist.
read_ccd_block(cif_block, *, sanitize_mol=True, add_conformers=True, remove_hydrogens=True) → ConversionResult
Converts a gemmi.cif.Block to an RDKit molecule. Takes the same keyword parameters as read_ccd_file, but the first argument is a pre-parsed CIF block instead of a file path.
chemcomp_to_mol(cc, cif_block, *, sanitize_mol=True, add_conformers=True, remove_hydrogens=True) → ConversionResult
Converts a gemmi.ChemComp and its gemmi.cif.Block to an RDKit molecule. This is the lowest-level API, for maximum control.
Data Classes
ConversionResult
Frozen dataclass returned by all conversion functions.
| Field | Type | Description |
|---|---|---|
| mol | Chem.Mol | RDKit molecule object |
| sanitized | bool | Whether sanitization succeeded |
| errors | list[str] | Errors encountered during conversion |
| warnings | list[str] | Warnings (e.g., missing conformer data) |
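To illustrate what "frozen" means in practice, here is a minimal stand-in with the same four fields. It is not the library's own definition: ConversionResultSketch is a hypothetical mirror that types mol as object so it runs without RDKit, the empty-list defaults are an assumption, and summarize is an illustrative helper that does not exist in ccd2rdmol.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class ConversionResultSketch:
    """Hypothetical stand-in mirroring the documented fields."""

    mol: object  # Chem.Mol in the real library
    sanitized: bool
    errors: list[str] = field(default_factory=list)    # default is an assumption
    warnings: list[str] = field(default_factory=list)  # default is an assumption


def summarize(result: ConversionResultSketch) -> str:
    """Return a one-line status string for logging (illustrative helper)."""
    if result.errors:
        return f"failed: {len(result.errors)} error(s)"
    if not result.sanitized:
        return "converted, not sanitized"
    if result.warnings:
        return f"ok with {len(result.warnings)} warning(s)"
    return "ok"


result = ConversionResultSketch(
    mol=None, sanitized=True, warnings=["no MODEL coordinates"]
)
status = summarize(result)

# Fields of a frozen dataclass cannot be reassigned after construction.
try:
    result.sanitized = False
    reassigned = True
except FrozenInstanceError:
    reassigned = False
```

Because the fields cannot be reassigned, a result can be passed between functions without one of them replacing mol or flipping sanitized. The lists themselves remain mutable, so treat them as read-only by convention.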
SanitizationResult
Frozen dataclass returned by sanitize().
| Field | Type | Description |
|---|---|---|
| mol | Chem.Mol | Sanitized molecule (always a copy) |
| success | bool | Whether sanitization succeeded |
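The "always a copy" contract can be sketched with plain RDKit calls. This is a simplified illustration, not the library's sanitize(): SanitizationSketch and sanitize_copy are hypothetical names, and the metal-bond repair performed by the real sanitizer is deliberately left out.

```python
from dataclasses import dataclass

from rdkit import Chem


@dataclass(frozen=True)
class SanitizationSketch:
    """Hypothetical stand-in for the documented SanitizationResult."""

    mol: Chem.Mol
    success: bool


def sanitize_copy(mol: Chem.Mol) -> SanitizationSketch:
    """Sanitize a copy of `mol`; on failure return an untouched copy."""
    work = Chem.Mol(mol)  # copy constructor, so the input is never modified
    try:
        Chem.SanitizeMol(work)
    except Chem.rdchem.MolSanitizeException:
        # Valence and kekulization errors derive from MolSanitizeException.
        return SanitizationSketch(mol=Chem.Mol(mol), success=False)
    return SanitizationSketch(mol=work, success=True)


ethanol = Chem.MolFromSmiles("CCO", sanitize=False)
good = sanitize_copy(ethanol)

# A carbon with five explicit bonds cannot pass RDKit's valence check.
bad_input = Chem.MolFromSmiles("C(C)(C)(C)(C)C", sanitize=False)
bad = sanitize_copy(bad_input)
```

Checking success before using mol mirrors the result.sanitized check shown under Handling Conversion Errors.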
How It Works
The conversion pipeline:
- Parse CIF: gemmi reads the CIF file and creates a ChemComp (atoms, bonds, charges) and a cif.Block (coordinate data).
- Build molecule: Atoms are added to an RDKit RWMol with element types, charges, and isotope labels (deuterium → isotope 2). Bonds are mapped from gemmi bond types to RDKit bond types via BOND_TYPE_MAP.
- Set hydrogen flags: Atoms without explicit hydrogen neighbors are flagged NoImplicit=True to prevent RDKit from adding implicit hydrogens.
- Add conformers: IDEAL and MODEL 3D coordinates are read from the CIF coordinate columns. Conformers with all-missing coordinates or degenerate positions (>1 atom at origin) are rejected.
- Sanitize: The sanitizer fixes valence errors caused by metal-ligand bonds by converting them to dative bonds. It uses Chem.DetectChemistryProblems() to identify problematic atoms and iteratively fixes them (up to 11 attempts). The original molecule is never modified.
- Assign stereochemistry: AssignStereochemistryFrom3D is called using the IDEAL conformer (preferred) or the MODEL conformer.
- Remove hydrogens: Optionally strips hydrogen atoms from the final molecule.
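Two of the rules above are simple enough to sketch without RDKit or gemmi: the deuterium mapping and the conformer rejection test. The function names and the use of None for a missing coordinate are assumptions made for illustration; the library's internals may differ.

```python
from typing import Optional, Sequence

# None marks a coordinate that is missing in the CIF (assumed representation).
Coord = Optional[tuple[float, float, float]]


def normalize_element(symbol: str) -> tuple[str, int]:
    """Map a CCD element symbol to (element, isotope); 0 means no isotope label."""
    if symbol.upper() == "D":
        return "H", 2  # deuterium is hydrogen with isotope label 2
    return symbol.capitalize(), 0  # "FE" -> "Fe", "C" -> "C"


def is_usable_conformer(coords: Sequence[Coord]) -> bool:
    """Reject conformers with no coordinates or more than one atom at the origin."""
    present = [c for c in coords if c is not None]
    if not present:
        return False  # every coordinate is missing
    at_origin = sum(1 for c in present if c == (0.0, 0.0, 0.0))
    return at_origin <= 1


ideal = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (1.9, 1.1, 0.0)]
model = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.9, 1.1, 0.0)]

keep_ideal = is_usable_conformer(ideal)  # one atom at the origin is allowed
keep_model = is_usable_conformer(model)  # two atoms at the origin: degenerate
```

A single atom at the origin is legitimate, since many ideal coordinate sets place one atom there. Several atoms at exactly (0, 0, 0) usually means the coordinates were never filled in.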
Comparison with pdbeccdutils
| | ccd2rdmol | pdbeccdutils |
|---|---|---|
| Focus | CCD → RDKit conversion only | Full CCD processing toolkit |
| Dependencies | gemmi + rdkit | gemmi + rdkit + scipy + numpy + ... |
| Scope | Single molecules from CIF | Depictions, scaffolds, fragments, PDB integration |
| Install size | Minimal | ~50+ transitive dependencies |
| Use case | "I just need an RDKit Mol from a CCD entry" | Full cheminformatics pipeline |
If you only need to convert CCD entries to RDKit molecules, ccd2rdmol provides a simpler, lighter alternative.
CLI
Note: The CLI requires extra dependencies. Install them with:
pip install ccd2rdmol[cli]
# Output SMILES to stdout
ccd2rdmol convert ATP.cif
# Write to MOL file
ccd2rdmol convert ATP.cif -o ATP.mol
# Write to SDF format
ccd2rdmol convert ATP.cif -o ATP.sdf
# Output InChI
ccd2rdmol convert ATP.cif -f inchi
# Keep hydrogen atoms
ccd2rdmol convert ATP.cif --keep-hydrogens
# Show verbose information
ccd2rdmol convert ATP.cif -v
# Show molecule information only
ccd2rdmol info ATP.cif
CLI Options
ccd2rdmol convert [OPTIONS] INPUT_FILE
Arguments:
  INPUT_FILE              Input CCD CIF file path [required]
Options:
  -o, --output PATH       Output file path (.mol, .sdf)
  -f, --format TEXT       Output format (mol, sdf, smiles, inchi)
  --no-sanitize           Skip sanitization step
  --no-conformers         Skip adding 3D conformers
  -H, --keep-hydrogens    Keep hydrogen atoms
  -v, --verbose           Show detailed information
  --help                  Show help message
Development
This project uses poethepoet as a task runner.
# Install dev dependencies
uv sync
# Format code (ruff format)
uv run poe format
# Lint (ruff check)
uv run poe lint
# Lint and auto-fix
uv run poe fix
# Type check (ty)
uv run poe check
# Run tests
uv run poe test
# Run tests with coverage
uv run poe test-cov
# Multi-version testing with nox (3.10, 3.11, 3.12, 3.13, 3.14)
uv run poe nox
# Run all checks (format, lint, check, test)
uv run poe all
# Clean cache files
uv run poe clean
Acknowledgments
This project is inspired by and built upon concepts from pdbeccdutils by PDBe (Protein Data Bank in Europe). Test data files are derived from the pdbeccdutils test suite.
We thank the PDBe team for their excellent work on chemical component processing tools.
License
MIT License
Test data files in tests/data/ are from pdbeccdutils (Apache-2.0 License).