Python bindings for COSMolKit

COSMolKit

COSMolKit is a Rust-backed Python package for cheminformatics and molecular graph workflows. The project is currently in early development: the Python package exposes a small, strict subset of the Rust core while the chemistry implementation is expanded against RDKit parity tests.

The current package is useful for basic SMILES parsing, molecule graph inspection, hydrogen expansion, Kekulization, and tetrahedral stereo experiments. Higher-level workflows such as SDF IO, conformer generation, substructure search, fingerprints, editing, and alignment are part of the planned public API but are not implemented yet.

Installation

pip install cosmolkit

Quick Start

from cosmolkit import Molecule

mol = Molecule.from_smiles("CCO")

print("atoms")
for atom in mol.atoms():
    print(atom.idx(), atom.atomic_num())

print("bonds")
for bond in mol.bonds():
    print(bond.begin_atom_idx(), bond.end_atom_idx(), bond.bond_type())
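
The begin and end indices printed above are enough to rebuild connectivity in plain Python. The following is a minimal sketch and not part of COSMolKit: `adjacency_from_bonds` is a hypothetical helper, and the index pairs are written by hand on the assumption that the atoms of "CCO" are numbered in SMILES order.

```python
from collections import defaultdict


def adjacency_from_bonds(bond_pairs):
    """Build an undirected adjacency list from (begin, end) atom-index pairs."""
    neighbors = defaultdict(list)
    for begin, end in bond_pairs:
        # Record the bond from both ends so lookups work in either direction.
        neighbors[begin].append(end)
        neighbors[end].append(begin)
    # Sort keys and neighbor lists so the result is deterministic.
    return {atom: sorted(nbrs) for atom, nbrs in sorted(neighbors.items())}


# Hand-written pairs for ethanol (assumed indexing: C=0, C=1, O=2).
ethanol_bonds = [(0, 1), (1, 2)]
adjacency = adjacency_from_bonds(ethanol_bonds)
print(adjacency)
```

With a real molecule, the pairs could be collected as `[(b.begin_atom_idx(), b.end_atom_idx()) for b in mol.bonds()]`.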

Add Explicit Hydrogens

from cosmolkit import Molecule

mol = Molecule.from_smiles("CCO")
mol_h = mol.add_hydrogens()

print(len(mol.atoms()))
print(len(mol_h.atoms()))
# remove_hydrogens() is not implemented yet, so the call is left commented out.
# mol_no_h = mol_h.remove_hydrogens()

Kekulization

Molecule.kekulize() is implemented for the currently supported aromatic systems.

from cosmolkit import Molecule

mol = Molecule.from_smiles("c1ccccc1")
kekulized = mol.kekulize()

for bond in kekulized.bonds():
    print(bond.begin_atom_idx(), bond.end_atom_idx(), bond.bond_type())
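
One way to sanity-check a Kekulé structure for a benzene-like ring is to confirm that every ring atom takes part in exactly one double bond. The sketch below is illustrative only: `has_valid_kekule_pattern` is a hypothetical helper, and the "SINGLE"/"DOUBLE" strings are assumed labels, since the value returned by `bond_type()` is not described here.

```python
def has_valid_kekule_pattern(ring_atoms, bonds):
    """Return True if every ring atom takes part in exactly one double bond.

    `bonds` is a list of (begin, end, label) tuples with string labels.
    """
    double_count = {atom: 0 for atom in ring_atoms}
    for begin, end, label in bonds:
        if label == "DOUBLE":
            for atom in (begin, end):
                if atom in double_count:
                    double_count[atom] += 1
    return all(count == 1 for count in double_count.values())


# Hand-written alternating pattern for a six-membered carbon ring.
benzene_bonds = [
    (0, 1, "DOUBLE"), (1, 2, "SINGLE"), (2, 3, "DOUBLE"),
    (3, 4, "SINGLE"), (4, 5, "DOUBLE"), (5, 0, "SINGLE"),
]
print(has_valid_kekule_pattern(range(6), benzene_bonds))
```

This rule holds for carbocycles such as benzene; rings with heteroatoms that contribute a lone pair (for example pyrrole nitrogen) need a different check.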

Tetrahedral Stereo

COSMolKit exposes an ordered tetrahedral stereo representation derived from the internal chiral tags. Each record is a (center_atom_index, ordered_ligands) pair, and an implicit hydrogen among the ligands is represented as None.

from cosmolkit import Molecule

mol = Molecule.from_smiles("[13CH3:7][C@H](F)Cl")

for center, ligands in mol.tetrahedral_stereo():
    print("center:", center)
    print("ordered ligands:", ligands)
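
Ordered ligand lists are usually compared by permutation parity: swapping two ligands around a tetrahedral center inverts its handedness, so two orderings of the same ligands describe the same arrangement only if they differ by an even permutation. The sketch below shows that comparison in plain Python. `permutation_parity` is a hypothetical helper, the ligand lists are made up, and the exact ordering convention COSMolKit uses is not stated here.

```python
def permutation_parity(reference, other):
    """Return 0 if `other` is an even permutation of `reference`, 1 if odd."""
    if len(set(reference)) != len(reference) or set(reference) != set(other) \
            or len(reference) != len(other):
        raise ValueError("both orderings must contain the same distinct ligands")
    position = {ligand: i for i, ligand in enumerate(reference)}
    perm = [position[ligand] for ligand in other]

    # Decompose into cycles; a cycle of length k equals k - 1 swaps.
    seen = [False] * len(perm)
    parity = 0
    for start in range(len(perm)):
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length:
            parity ^= (length - 1) & 1
    return parity


# None stands for an implicit hydrogen, as in the records described above.
reference = [0, None, 2, 3]
print(permutation_parity(reference, [None, 0, 2, 3]))  # one swap
print(permutation_parity(reference, [None, 2, 0, 3]))  # a three-cycle
```

A parity of 0 means the two orderings encode the same handedness; a parity of 1 means they encode mirror images.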

You can also inspect RDKit-like chiral tags:

from cosmolkit import Molecule

mol = Molecule.from_smiles("F[C@H](Cl)Br")
print(mol.find_chiral_centers(include_unassigned=False))

Graph Data for ML Workflows

The current low-level API can be used to build simple graph tensors in user code. This example intentionally ignores 3D coordinates because COSMolKit does not implement conformer generation yet.

import torch
from cosmolkit import Molecule


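def graph_from_smiles(smiles: str):
    mol = Molecule.from_smiles(smiles)

    # Node features: one atomic number per atom, shape (num_atoms, 1).
    x = torch.tensor([[atom.atomic_num()] for atom in mol.atoms()], dtype=torch.long)

    # Store each bond in both directions so the graph is undirected.
    edges = []
    edge_attr = []
    for bond in mol.bonds():
        i = bond.begin_atom_idx()
        j = bond.end_atom_idx()
        kind = bond.bond_type()
        edges.extend([(i, j), (j, i)])
        edge_attr.extend([kind, kind])

    # reshape(-1, 2) keeps the result at shape (2, 0) for molecules without bonds.
    edge_index = torch.tensor(edges, dtype=torch.long).reshape(-1, 2).t().contiguous()
    # edge_attr stays a plain Python list of bond_type() values.
    return x, edge_index, edge_attr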


x, edge_index, edge_attr = graph_from_smiles("CCO")
print(x)
print(edge_index)
print(edge_attr)
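
Because `edge_attr` is returned as a plain list, it usually has to be turned into integer codes before it can become a tensor. The following is a small sketch of one way to do that. `encode_labels` is a hypothetical helper, and the string labels are placeholders, since the exact values produced by `bond_type()` are not described here.

```python
def encode_labels(labels):
    """Map labels to dense integer codes in order of first appearance."""
    vocabulary = {}
    codes = []
    for label in labels:
        key = str(label)  # str() lets enum-like values share one code path
        if key not in vocabulary:
            vocabulary[key] = len(vocabulary)
        codes.append(vocabulary[key])
    return codes, vocabulary


# Placeholder labels for three bonds, each stored in both directions.
labels = ["SINGLE", "SINGLE", "DOUBLE", "DOUBLE", "SINGLE", "SINGLE"]
codes, vocabulary = encode_labels(labels)
print(codes)
print(vocabulary)
```

The resulting `codes` list can then be passed to `torch.tensor(codes, dtype=torch.long)`. Building the vocabulary per dataset rather than per molecule keeps the codes consistent across graphs.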

Planned API Examples

The following examples show the intended direction of the public API. They are included to document the target design, but they are not implemented yet and will raise NotImplementedError in current releases.
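
Since the planned methods raise NotImplementedError, code that wants to try them without crashing can guard the call. This is a generic sketch rather than a COSMolKit feature: `call_if_implemented` and `planned_feature` are hypothetical names, and `planned_feature` only stands in for a not-yet-implemented method.

```python
def call_if_implemented(func, *args, default=None, **kwargs):
    """Call func and return `default` if it raises NotImplementedError."""
    try:
        return func(*args, **kwargs)
    except NotImplementedError:
        return default


def planned_feature():
    # Stand-in for a planned API method that is not available yet.
    raise NotImplementedError("not implemented yet")


result = call_if_implemented(planned_feature, default="unavailable")
print(result)
```

Only NotImplementedError is caught, so genuine errors such as an invalid argument still surface.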

SDF IO

from cosmolkit import Molecule

# Not implemented yet
mol = Molecule.read_sdf("ligand.sdf", sanitize=True)
mol.write_sdf("out.sdf")

Sanitization and Valence Checks

from cosmolkit import Molecule

mol = Molecule.from_smiles("c1ccccc1", sanitize=False)

# Not implemented yet
report = mol.check_valence()
mol = mol.sanitize(strict=True)
mol = mol.perceive_rings().perceive_aromaticity()

Substructure Search and Fingerprints

from cosmolkit import Molecule

mol = Molecule.from_smiles("c1ccccc1CCO")

# Not implemented yet
query = Molecule.query_from_smarts("c1ccccc1")
matches = mol.substructure_find(query)

# Not implemented yet
fp1 = mol.fingerprint_morgan(radius=2, n_bits=2048)
fp2 = Molecule.from_smiles("CCN").fingerprint_morgan(radius=2, n_bits=2048)
print(fp1.tanimoto(fp2))
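
For reference, the Tanimoto coefficient of two bit fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below computes it on sets of on-bit positions. It is independent of COSMolKit, the bit positions are invented, and returning 0.0 for two empty fingerprints is a convention chosen for this example.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two collections of on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Both fingerprints are empty; 0.0 is an arbitrary convention here.
        return 0.0
    return len(a & b) / len(union)


# Invented on-bit positions: 2 shared bits, 5 distinct bits overall.
similarity = tanimoto({1, 5, 9, 42}, {5, 9, 100})
print(similarity)
```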

3D Coordinates and Alignment

from cosmolkit import Alignment, Molecule

mol = Molecule.from_smiles("CCO")

# Not implemented yet
mol3d = mol.add_hydrogens().embed_3d(seed=42, num_conformers=20)

# Not implemented yet
result = Alignment.find_most_similar_fragment(
    reference=mol3d,
    candidates=[mol3d],
    mutate_reference=False,
    mutate_candidates=False,
)

Explicit Editing

from cosmolkit import Molecule

mol = Molecule.from_smiles("CCO")

# Not implemented yet
editor = mol.edit()
cl = editor.add_atom("Cl")
editor.add_bond(0, cl, order="single")
mol2 = editor.commit(sanitize=True)

Development Status

COSMolKit is not yet a full RDKit replacement. Where RDKit compatibility is the goal, the implementation is developed against strict parity tests of RDKit behavior.

Download files

Source Distribution

  • cosmolkit-0.0.7.tar.gz (114.7 kB): Source

Built Distributions

  • cosmolkit-0.0.7-cp39-abi3-win_amd64.whl (490.9 kB): CPython 3.9+, Windows x86-64
  • cosmolkit-0.0.7-cp39-abi3-manylinux_2_34_x86_64.whl (660.3 kB): CPython 3.9+, manylinux (glibc 2.34+) x86-64
  • cosmolkit-0.0.7-cp39-abi3-macosx_11_0_arm64.whl (576.1 kB): CPython 3.9+, macOS 11.0+ ARM64

File details

Details for the file cosmolkit-0.0.7.tar.gz.

File metadata

  • Download URL: cosmolkit-0.0.7.tar.gz
  • Upload date:
  • Size: 114.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cosmolkit-0.0.7.tar.gz
Algorithm Hash digest
SHA256 1c7e214f859104cecf4c75dfce992b078b95940ce4746a512bde27ad8bf45eac
MD5 986d860407648dbb502d40ca0e10f73e
BLAKE2b-256 13cc3a81bd708c682c4864690c9691c8285e123476d77f3df3372dec90427628
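
A downloaded file can be checked against the SHA256 value listed above with Python's standard `hashlib` module. The following is a minimal sketch: `sha256_of_file` and `matches_published_hash` are hypothetical helper names, and the demonstration runs on a throwaway temporary file so that it works without a download.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demonstrate on a temporary file instead of a real download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example payload")
    tmp_path = tmp.name

expected = hashlib.sha256(b"example payload").hexdigest()
ok = matches_published_hash(tmp_path, expected)
os.remove(tmp_path)
print(ok)
```

For the source distribution, `path` would point at the downloaded cosmolkit-0.0.7.tar.gz and `expected_hex` would be the SHA256 value from the table above.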

File details

Details for the file cosmolkit-0.0.7-cp39-abi3-win_amd64.whl.

File metadata

  • Download URL: cosmolkit-0.0.7-cp39-abi3-win_amd64.whl
  • Upload date:
  • Size: 490.9 kB
  • Tags: CPython 3.9+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cosmolkit-0.0.7-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 91bef30c8a37416b7a55a181eb6f3367a85368c4d24d2d11c5685f8e80a78038
MD5 99230f8f4e0f47676428ade6e9ad0c29
BLAKE2b-256 04ca2ced44d75f94f412bf1640aadec8e9743291f16a259874daeb03d95e5158

File details

Details for the file cosmolkit-0.0.7-cp39-abi3-manylinux_2_34_x86_64.whl.

File hashes

Hashes for cosmolkit-0.0.7-cp39-abi3-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 52a47865cc446bc1b780aa46c68d781714a69defd5d3880e9a23ad74ea6092dd
MD5 256090145eef4a9ffa455dd2752c1c33
BLAKE2b-256 9cc3b97be1dfc7c67e695c0afa3811160bc0310d0b879f0c4b01e1a993905b8b

File details

Details for the file cosmolkit-0.0.7-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for cosmolkit-0.0.7-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 13fe7885c94f884e12d69bc6f5dfe3cc58d4a7391800f885716fda83bc348306
MD5 f604ab3e15a37a7040f7afaf5870ca52
BLAKE2b-256 3e99bbc2a6a199b591649f0fc4513ad098a410e220454e221b104deb41e81a9b
