
Convert PDB Chemical Component Dictionary (CCD) files to RDKit molecules


ccd2rdmol

Python 3.12+ · License: MIT

A lightweight Python library and CLI tool for converting PDB Chemical Component Dictionary (CCD) files to RDKit molecule objects.

This project is a simplified implementation inspired by pdbeccdutils. It focuses solely on CCD-to-RDKit conversion, with 3D conformer support.

Features

  • Fast CIF parsing using gemmi
  • Conversion to RDKit molecule objects
  • Support for both Ideal and Model 3D conformers
  • Automatic conversion of metal bonds to dative bonds
  • Stereochemistry assignment from 3D coordinates
  • CLI tool with rich output
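
The metal-to-dative conversion listed above can be illustrated without RDKit. The sketch below is a simplified, dependency-free illustration of the idea, not the library's implementation. It assumes a small hypothetical set of metal symbols (`METALS`) and treats every bond between a metal and a non-metal as dative, directed from the donor atom to the metal. The `Bond` tuple and `to_dative` function are names introduced here for illustration only; a real implementation may also inspect the donor atom's valence before converting.

```python
from typing import NamedTuple

# Illustrative subset only (assumption); not the list used by ccd2rdmol.
METALS = {"MG", "CA", "MN", "FE", "CO", "NI", "CU", "ZN"}


class Bond(NamedTuple):
    begin: int  # index of the first atom
    end: int    # index of the second atom
    kind: str   # e.g. "SING", "DOUB", or "DATIVE"


def to_dative(elements: list[str], bonds: list[Bond]) -> list[Bond]:
    """Rewrite metal/non-metal bonds as dative bonds pointing donor -> metal."""
    converted = []
    for bond in bonds:
        begin_is_metal = elements[bond.begin].upper() in METALS
        end_is_metal = elements[bond.end].upper() in METALS
        if begin_is_metal == end_is_metal:
            # Neither atom is a metal, or both are: leave the bond unchanged.
            converted.append(bond)
        elif begin_is_metal:
            # Flip the bond so it starts at the donor and ends at the metal.
            converted.append(Bond(bond.end, bond.begin, "DATIVE"))
        else:
            converted.append(Bond(bond.begin, bond.end, "DATIVE"))
    return converted
```

The direction matters because a dative bond is conventionally drawn from the electron-pair donor to the acceptor, so the metal always ends up as the end atom in this sketch.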

Installation

git clone https://github.com/N283T/ccd2rdmol.git
cd ccd2rdmol
uv sync

Usage

As a Library

from ccd2rdmol import read_ccd_file, read_ccd_block
import gemmi

# Read from file
result = read_ccd_file("ATP.cif")
mol = result.mol

print(f"Atoms: {mol.GetNumAtoms()}")
print(f"Bonds: {mol.GetNumBonds()}")
print(f"Conformers: {mol.GetNumConformers()}")  # 2 (IDEAL + MODEL)
print(f"Sanitized: {result.sanitized}")

# With options
result = read_ccd_file(
    "ATP.cif",
    sanitize_mol=True,      # Sanitize molecule (default: True)
    add_conformers=True,    # Add 3D conformers (default: True)
    remove_hydrogens=True,  # Remove hydrogens (default: True)
)

# From gemmi CIF block
doc = gemmi.cif.read("components.cif")
for block in doc:
    result = read_ccd_block(block)
    print(f"{block.name}: {result.mol.GetNumAtoms()} atoms")
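
When iterating over a large dictionary such as components.cif, it can help to collect per-component results and skip entries that fail. The following is a minimal sketch, not part of ccd2rdmol: `summarize_blocks` is a hypothetical helper that takes the reader function as a parameter (for example `read_ccd_block`) and relies only on the attributes shown above (`block.name`, `result.mol`, `result.sanitized`). Whether the reader raises on malformed input is an assumption; the broad `except` covers that case.

```python
from typing import Any, Callable, Iterable


def summarize_blocks(
    blocks: Iterable[Any],
    reader: Callable[[Any], Any],
) -> tuple[dict[str, dict[str, Any]], list[str]]:
    """Return (summary keyed by component name, names that failed)."""
    summary: dict[str, dict[str, Any]] = {}
    failed: list[str] = []
    for block in blocks:
        try:
            result = reader(block)
            summary[block.name] = {
                "atoms": result.mol.GetNumAtoms(),
                "sanitized": result.sanitized,
            }
        except Exception:
            # Assumption: a failed conversion surfaces as an exception
            # (or as a missing molecule, which raises AttributeError here).
            failed.append(block.name)
    return summary, failed


# Usage with the real library (requires gemmi and ccd2rdmol):
#   import gemmi
#   from ccd2rdmol import read_ccd_block
#   doc = gemmi.cif.read("components.cif")
#   summary, failed = summarize_blocks(doc, read_ccd_block)
```

Passing the reader in as an argument keeps the helper independent of ccd2rdmol, so it can be exercised with stand-in objects in tests.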

As a CLI

# Output SMILES to stdout
ccd2rdmol convert ATP.cif

# Write to MOL file
ccd2rdmol convert ATP.cif -o ATP.mol

# Write to SDF format
ccd2rdmol convert ATP.cif -o ATP.sdf

# Keep hydrogen atoms
ccd2rdmol convert ATP.cif --keep-hydrogens

# Show verbose information
ccd2rdmol convert ATP.cif -v

# Show molecule information only
ccd2rdmol info ATP.cif

CLI Options

ccd2rdmol convert [OPTIONS] INPUT_FILE

Arguments:
  INPUT_FILE  Input CCD CIF file path [required]

Options:
  -o, --output PATH       Output file path (.mol, .sdf)
  -f, --format TEXT       Output format (mol, sdf, smiles, inchi)
  --no-sanitize           Skip sanitization step
  --no-conformers         Skip adding 3D conformers
  -H, --keep-hydrogens    Keep hydrogen atoms
  -v, --verbose           Show detailed information
  --help                  Show help message
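
For scripted use, the options above can be assembled into an argument list and passed to `subprocess.run`. This is a sketch under assumptions: `build_convert_command` and `run_convert` are hypothetical helpers, not part of ccd2rdmol, and they use only the flags listed above. Running the command assumes the `ccd2rdmol` executable is on the PATH.

```python
import subprocess
from typing import Optional


def build_convert_command(
    input_file: str,
    output: Optional[str] = None,
    fmt: Optional[str] = None,
    keep_hydrogens: bool = False,
    sanitize: bool = True,
    conformers: bool = True,
    verbose: bool = False,
) -> list[str]:
    """Build the argument list for `ccd2rdmol convert`."""
    cmd = ["ccd2rdmol", "convert", input_file]
    if output is not None:
        cmd += ["--output", output]
    if fmt is not None:
        cmd += ["--format", fmt]
    if not sanitize:
        cmd.append("--no-sanitize")
    if not conformers:
        cmd.append("--no-conformers")
    if keep_hydrogens:
        cmd.append("--keep-hydrogens")
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_convert(input_file: str, **options) -> str:
    """Run the command and return its stdout; raises if the exit code is non-zero."""
    completed = subprocess.run(
        build_convert_command(input_file, **options),
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout
```

Building the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.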

Development

This project uses poethepoet as a task runner.

# Install dev dependencies
uv sync

# Format code (ruff format)
uv run poe format

# Lint (ruff check)
uv run poe lint

# Lint and auto-fix
uv run poe fix

# Type check (ty)
uv run poe check

# Run tests
uv run poe test

# Run all checks (format, lint, check, test)
uv run poe all

# Clean cache files
uv run poe clean

Acknowledgments

This project is inspired by and built upon concepts from pdbeccdutils by PDBe (Protein Data Bank in Europe). Test data files are derived from the pdbeccdutils test suite.

We thank the PDBe team for their excellent work on chemical component processing tools.

License

MIT License

Test data files in tests/data/ are from pdbeccdutils (Apache-2.0 License).
