Convert PDB Chemical Component Dictionary (CCD) files to RDKit molecules

ccd2rdmol

Requires Python 3.10+. Licensed under the MIT License.

A lightweight Python library and CLI tool for converting PDB Chemical Component Dictionary (CCD) files to RDKit molecule objects.

This project is a simplified implementation inspired by pdbeccdutils. It focuses solely on CCD-to-RDKit conversion, with support for 3D conformers.

Features

  • Fast CIF parsing using gemmi
  • Conversion to RDKit molecule objects
  • Support for both Ideal and Model 3D conformers
  • Automatic conversion of metal bonds to dative bonds
  • Stereochemistry assignment from 3D coordinates
  • CLI tool with rich output
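
To make the metal-bond feature more concrete, the following plain-Python sketch shows the general idea only: a bond between a metal and a non-metal is rewritten as a dative bond that starts at the donor atom and ends at the metal. The `METALS` set, the `Bond` tuple, and the `to_dative` function are hypothetical names invented for this illustration. This is not ccd2rdmol's actual implementation, which operates on RDKit molecules.

```python
from typing import NamedTuple

# Illustrative subset of metal element symbols (an assumption, not the
# library's list); a real implementation would cover many more elements.
METALS = {"MG", "ZN", "FE", "CU", "MN", "NI", "CA", "NA", "K"}


class Bond(NamedTuple):
    begin: str  # atom id of the first atom
    end: str    # atom id of the second atom
    order: str  # e.g. "SING", "DOUB", or "DATIVE"


def to_dative(elements: dict[str, str], bonds: list[Bond]) -> list[Bond]:
    """Rewrite metal/non-metal bonds as dative bonds pointing at the metal."""
    converted = []
    for bond in bonds:
        begin_is_metal = elements[bond.begin].upper() in METALS
        end_is_metal = elements[bond.end].upper() in METALS
        if begin_is_metal == end_is_metal:
            # Neither atom or both atoms are metals: keep the bond unchanged.
            converted.append(bond)
        elif begin_is_metal:
            # Flip the bond so the donor comes first and the metal last.
            converted.append(Bond(bond.end, bond.begin, "DATIVE"))
        else:
            converted.append(Bond(bond.begin, bond.end, "DATIVE"))
    return converted


# Made-up three-atom fragment: an iron atom bonded to a nitrogen.
elements = {"FE1": "FE", "N1": "N", "C1": "C"}
bonds = [Bond("FE1", "N1", "SING"), Bond("N1", "C1", "DOUB")]
for bond in to_dative(elements, bonds):
    print(bond)
```

The donor-first ordering mirrors how dative bonds are directed in RDKit, where the bond points from the donor atom to the acceptor.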

Installation

uv add ccd2rdmol

Or for development:

git clone https://github.com/N283T/ccd2rdmol.git
cd ccd2rdmol
uv sync

Usage

As a Library

from ccd2rdmol import read_ccd_file, read_ccd_block
import gemmi

# Read from file
result = read_ccd_file("ATP.cif")
mol = result.mol

print(f"Atoms: {mol.GetNumAtoms()}")
print(f"Bonds: {mol.GetNumBonds()}")
print(f"Conformers: {mol.GetNumConformers()}")  # 2 when both IDEAL and MODEL coordinates are present
print(f"Sanitized: {result.sanitized}")

# With options
result = read_ccd_file(
    "ATP.cif",
    sanitize_mol=True,      # Sanitize molecule (default: True)
    add_conformers=True,    # Add 3D conformers (default: True)
    remove_hydrogens=True,  # Remove hydrogens (default: True)
)

# From gemmi CIF block
doc = gemmi.cif.read("components.cif")
for block in doc:
    result = read_ccd_block(block)
    print(f"{block.name}: {result.mol.GetNumAtoms()} atoms")
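
When processing many components it can be useful to know which ones did not sanitize. The sketch below relies only on the `sanitized` attribute shown above and assumes it is a boolean flag. The helper name `unsanitized_names` is hypothetical and not part of ccd2rdmol, and the demo data are made-up stand-ins rather than real conversion results.

```python
from collections.abc import Iterable
from types import SimpleNamespace
from typing import Any


def unsanitized_names(named_results: Iterable[tuple[str, Any]]) -> list[str]:
    """Return the names whose result object reports sanitized as false."""
    return [name for name, result in named_results if not result.sanitized]


# Stand-in objects that only mimic the `sanitized` attribute.
fake_results = [
    ("AAA", SimpleNamespace(sanitized=True)),
    ("ZZZ", SimpleNamespace(sanitized=False)),
]
print(unsanitized_names(fake_results))  # ['ZZZ']

# With the real library, the pairs could come from the loop shown above:
# pairs = ((block.name, read_ccd_block(block)) for block in doc)
# print(unsanitized_names(pairs))
```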

As a CLI

# Output SMILES to stdout
ccd2rdmol convert ATP.cif

# Write to MOL file
ccd2rdmol convert ATP.cif -o ATP.mol

# Write to SDF format
ccd2rdmol convert ATP.cif -o ATP.sdf

# Keep hydrogen atoms
ccd2rdmol convert ATP.cif --keep-hydrogens

# Show verbose information
ccd2rdmol convert ATP.cif -v

# Show molecule information only
ccd2rdmol info ATP.cif

CLI Options

ccd2rdmol convert [OPTIONS] INPUT_FILE

Arguments:
  INPUT_FILE  Input CCD CIF file path [required]

Options:
  -o, --output PATH       Output file path (.mol, .sdf)
  -f, --format TEXT       Output format (mol, sdf, smiles, inchi)
  --no-sanitize           Skip sanitization step
  --no-conformers         Skip adding 3D conformers
  -H, --keep-hydrogens    Keep hydrogen atoms
  -v, --verbose           Show detailed information
  --help                  Show help message
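
If the CLI needs to be driven from a script, the options listed above can be assembled into an argument list and passed to `subprocess`. This is a minimal sketch: `build_convert_command` and `run_convert` are hypothetical helper names, only the flags documented above are used, and `run_convert` assumes the `ccd2rdmol` executable is on `PATH`.

```python
import subprocess


def build_convert_command(
    input_file,
    output=None,
    fmt=None,
    sanitize=True,
    conformers=True,
    keep_hydrogens=False,
    verbose=False,
) -> list[str]:
    """Build the argument list for `ccd2rdmol convert` from keyword options."""
    cmd = ["ccd2rdmol", "convert", str(input_file)]
    if output is not None:
        cmd += ["--output", str(output)]
    if fmt is not None:
        cmd += ["--format", fmt]
    if not sanitize:
        cmd.append("--no-sanitize")
    if not conformers:
        cmd.append("--no-conformers")
    if keep_hydrogens:
        cmd.append("--keep-hydrogens")
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_convert(input_file, **options) -> str:
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    completed = subprocess.run(
        build_convert_command(input_file, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


print(build_convert_command("ATP.cif", output="ATP.sdf", keep_hydrogens=True))
```

Passing the arguments as a list avoids shell quoting issues with file paths that contain spaces.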

Development

This project uses poethepoet as a task runner.

# Install dev dependencies
uv sync

# Format code (ruff format)
uv run poe format

# Lint (ruff check)
uv run poe lint

# Lint and auto-fix
uv run poe fix

# Type check (ty)
uv run poe check

# Run tests
uv run poe test

# Multi-version testing with nox (3.10, 3.11, 3.12, 3.13, 3.14)
uv run poe nox

# Run all checks (format, lint, check, test)
uv run poe all

# Clean cache files
uv run poe clean

Acknowledgments

This project is inspired by and built upon concepts from pdbeccdutils by PDBe (Protein Data Bank in Europe). Test data files are derived from the pdbeccdutils test suite.

We thank the PDBe team for their excellent work on chemical component processing tools.

License

MIT License

Test data files in tests/data/ are from pdbeccdutils (Apache-2.0 License).
