
Python bindings for COSMolKit

Project description

COSMolKit

COSMolKit is a Rust-backed Python package for cheminformatics and molecular graph workflows. The project is in early development: the Python package exposes a small, strict subset of the Rust core, and the chemistry implementation is being expanded and validated against RDKit parity tests.

The current package is useful for basic SMILES parsing, molecule graph inspection, hydrogen expansion, Kekulization, and tetrahedral stereo experiments. Higher-level workflows such as SDF IO, conformer generation, substructure search, fingerprints, editing, and alignment are part of the planned public API but are not implemented yet.

Installation

pip install cosmolkit

Quick Start

from cosmolkit import Molecule

mol = Molecule.from_smiles("CCO")

print("atoms")
for atom in mol.atoms():
    print(atom.idx(), atom.atomic_num())

print("bonds")
for bond in mol.bonds():
    print(bond.begin_atom_idx(), bond.end_atom_idx(), bond.bond_type())

Add Explicit Hydrogens

from cosmolkit import Molecule

mol = Molecule.from_smiles("CCO")
mol_h = mol.add_hydrogens()

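print(len(mol.atoms()))    # heavy atoms only
print(len(mol_h.atoms()))  # heavy atoms plus explicit hydrogens

# remove_hydrogens() is not implemented yet and is expected to raise NotImplementedError
try:
    mol_no_h = mol_h.remove_hydrogens()
except NotImplementedError:
    print("remove_hydrogens() is not available in this release")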
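add_hydrogens() turns implicit hydrogens into explicit atoms, which is why the atom count grows. The arithmetic behind that can be sketched in plain Python under a simplified default-valence model (C=4, N=3, O=2, no charges or radicals). The helper name implicit_h_counts and the valence table are assumptions for illustration, not COSMolKit's implementation.

```python
from typing import Dict, List, Tuple

# Assumed default valences for neutral, non-radical atoms (simplified model).
DEFAULT_VALENCE: Dict[str, int] = {"C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


def implicit_h_counts(symbols: List[str], bonds: List[Tuple[int, int, int]]) -> List[int]:
    """Hypothetical helper: implicit H = default valence minus the sum of bond orders."""
    explicit = [0] * len(symbols)
    for i, j, order in bonds:
        explicit[i] += order
        explicit[j] += order
    # Clamp at zero so an over-bonded atom gets no hydrogens rather than a negative count.
    return [max(DEFAULT_VALENCE[s] - v, 0) for s, v in zip(symbols, explicit)]


# Heavy-atom graph of ethanol (CCO): C0-C1 and C1-O2, both single bonds.
symbols = ["C", "C", "O"]
bonds = [(0, 1, 1), (1, 2, 1)]
h = implicit_h_counts(symbols, bonds)
print(h)                      # [3, 2, 1]
print(len(symbols) + sum(h))  # 9
```

For CCO this gives 3 heavy atoms plus 6 hydrogens, the 9 atoms a C2H6O molecule has once every hydrogen is explicit.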

Kekulization

Molecule.kekulize() is implemented for the currently supported aromatic systems.

from cosmolkit import Molecule

mol = Molecule.from_smiles("c1ccccc1")
kekulized = mol.kekulize()

for bond in kekulized.bonds():
    print(bond.begin_atom_idx(), bond.end_atom_idx(), bond.bond_type())
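Kekulization assigns alternating single and double bonds so that every aromatic atom ends up with exactly one double bond, which is a perfect matching on the aromatic subgraph. The sketch below finds such a matching by backtracking. It assumes every aromatic atom needs a double bond (no pyrrole-type [nH] atoms) and is not COSMolKit's algorithm; the function name kekulize here is a hypothetical stand-alone helper.

```python
from typing import Dict, List, Optional, Set, Tuple

Bond = Tuple[int, int]


def kekulize(atoms: List[int], aromatic_bonds: List[Bond]) -> Optional[Set[Bond]]:
    """Pick double bonds so every atom in `atoms` has exactly one; None if impossible."""
    neighbors: Dict[int, List[int]] = {a: [] for a in atoms}
    for i, j in aromatic_bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    matched: Dict[int, int] = {}

    def solve() -> bool:
        # Take the first atom that still lacks a double bond.
        free = next((a for a in atoms if a not in matched), None)
        if free is None:
            return True  # every atom has exactly one double bond
        for nb in neighbors[free]:
            if nb not in matched:
                matched[free] = nb
                matched[nb] = free
                if solve():
                    return True
                # Backtrack: undo this double bond and try the next neighbor.
                del matched[free]
                del matched[nb]
        return False

    if not solve():
        return None
    return {(min(a, b), max(a, b)) for a, b in matched.items()}


# Benzene ring c1ccccc1: atoms 0..5 joined by aromatic bonds.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
doubles = kekulize(list(range(6)), ring)
for i, j in sorted((min(b), max(b)) for b in ring):
    print(i, j, "DOUBLE" if (i, j) in doubles else "SINGLE")
```

An odd ring such as a five-membered all-aromatic cycle has no perfect matching, so the sketch returns None, mirroring the kind of case a real toolkit reports as a kekulization failure.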

Tetrahedral Stereo

COSMolKit exposes an ordered tetrahedral stereo representation derived from the internal chiral tags. Each record is a pair (center_atom_index, ordered_ligands), where an implicit hydrogen ligand appears as None.

from cosmolkit import Molecule

mol = Molecule.from_smiles("[13CH3:7][C@H](F)Cl")

for center, ligands in mol.tetrahedral_stereo():
    print("center:", center)
    print("ordered ligands:", ligands)
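An ordered-ligand record is only meaningful up to even permutations: swapping any two ligands inverts the configuration, just as swapping two neighbors flips @ to @@ in SMILES. Assuming COSMolKit's ordered ligands follow that standard convention, the hypothetical helpers below compare two orderings (with None for an implicit hydrogen) by permutation parity.

```python
from typing import Optional, Sequence

Ligand = Optional[int]  # atom index, or None for an implicit hydrogen


def permutation_parity(order_a: Sequence[Ligand], order_b: Sequence[Ligand]) -> int:
    """Return 0 if order_b is an even permutation of order_a, 1 if it is odd."""
    if sorted(order_a, key=repr) != sorted(order_b, key=repr):
        raise ValueError("orderings must contain the same ligands")
    position = {lig: k for k, lig in enumerate(order_b)}
    perm = [position[lig] for lig in order_a]
    # Sort the permutation by swapping elements into place and count the swaps.
    swaps = 0
    for k in range(len(perm)):
        while perm[k] != k:
            target = perm[k]
            perm[k], perm[target] = perm[target], perm[k]
            swaps += 1
    return swaps % 2


def same_configuration(order_a: Sequence[Ligand], order_b: Sequence[Ligand]) -> bool:
    """Two orderings describe the same tetrahedral center iff the parity is even."""
    return permutation_parity(order_a, order_b) == 0


ref = (0, None, 2, 3)
print(same_configuration(ref, (None, 2, 0, 3)))  # rotate first three ligands: True
print(same_configuration(ref, (None, 0, 2, 3)))  # single swap: False
```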

You can also inspect RDKit-like chiral tags:

from cosmolkit import Molecule

mol = Molecule.from_smiles("F[C@H](Cl)Br")
print(mol.find_chiral_centers(include_unassigned=False))

Graph Data for ML Workflows

The current low-level API can be used to build simple graph tensors in user code. This example intentionally ignores 3D coordinates because COSMolKit does not implement conformer generation yet.

import torch
from cosmolkit import Molecule


def graph_from_smiles(smiles: str):
    mol = Molecule.from_smiles(smiles)

    x = torch.tensor([[atom.atomic_num()] for atom in mol.atoms()], dtype=torch.long)

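    # Store each bond in both directions, as most GNN libraries expect for undirected graphs.
    edges = []
    edge_attr = []
    for bond in mol.bonds():
        i = bond.begin_atom_idx()
        j = bond.end_atom_idx()
        kind = bond.bond_type()
        edges.extend([(i, j), (j, i)])
        edge_attr.extend([kind, kind])

    # reshape(-1, 2) keeps edge_index at shape (2, num_edges) even for bond-free molecules.
    edge_index = torch.tensor(edges, dtype=torch.long).reshape(-1, 2).t().contiguous()
    return x, edge_index, edge_attr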


x, edge_index, edge_attr = graph_from_smiles("CCO")
print(x)
print(edge_index)
print(edge_attr)
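edge_attr above holds raw bond_type() values, and their Python type is not documented here. Assuming those values are hashable (for example strings or enum members), one option for turning them into tensor-ready integers is a vocabulary grown on first sight. encode_categories is a hypothetical helper, not part of COSMolKit.

```python
from typing import Dict, Hashable, List, Optional, Tuple


def encode_categories(
    values: List[Hashable], vocab: Optional[Dict[Hashable, int]] = None
) -> Tuple[List[int], Dict[Hashable, int]]:
    """Map each value to a stable integer index; a passed-in vocab is extended in place."""
    vocab = {} if vocab is None else vocab
    indices = []
    for value in values:
        if value not in vocab:
            vocab[value] = len(vocab)  # next unused index
        indices.append(vocab[value])
    return indices, vocab


# Stand-in bond types; real values would come from bond.bond_type().
idx, vocab = encode_categories(["SINGLE", "SINGLE", "DOUBLE", "DOUBLE", "SINGLE"])
print(idx)    # [0, 0, 1, 1, 0]
print(vocab)  # {'SINGLE': 0, 'DOUBLE': 1}
```

Passing the same vocab dictionary for every molecule in a dataset keeps the indices consistent, after which torch.tensor(idx, dtype=torch.long) gives an integer edge feature.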

Planned API Examples

The following examples show the intended direction of the public API. They are included to document the target design, but they are not implemented yet and will raise NotImplementedError in current releases.

SDF IO

from cosmolkit import Molecule

# Not implemented yet
mol = Molecule.read_sdf("ligand.sdf", sanitize=True)
mol.write_sdf("out.sdf")
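Until read_sdf exists, it helps to know what an SDF record contains: each record is an MDL molfile terminated by a $$$$ line, and in the V2000 flavor the counts, atom, and bond lines use fixed-width columns. The hypothetical parse_v2000 below reads atoms and bonds from one V2000 record; it ignores charges, stereo flags, the properties block, SD data fields, and V3000 files, and is not COSMolKit's reader.

```python
from typing import List, Tuple

Atom = Tuple[str, Tuple[float, float, float]]
Bond = Tuple[int, int, int]


def parse_v2000(molblock: str) -> Tuple[List[Atom], List[Bond]]:
    """Parse atoms and bonds from one V2000 molfile record (minimal sketch)."""
    lines = molblock.splitlines()
    counts = lines[3]  # lines 0-2 are the header block
    if "V2000" not in counts:
        raise ValueError("only V2000 molfiles are handled in this sketch")
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    atoms: List[Atom] = []
    for line in lines[4 : 4 + n_atoms]:
        xyz = (float(line[0:10]), float(line[10:20]), float(line[20:30]))
        atoms.append((line[31:34].strip(), xyz))

    bonds: List[Bond] = []
    for line in lines[4 + n_atoms : 4 + n_atoms + n_bonds]:
        # Molfile atom numbers are 1-based; convert to 0-based indices.
        bonds.append((int(line[0:3]) - 1, int(line[3:6]) - 1, int(line[6:9])))
    return atoms, bonds


record = "\n".join([
    "methanol",
    "  sketch",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0",
    "    1.4000    0.0000    0.0000 O   0  0",
    "  1  2  1  0",
    "M  END",
])
atoms, bonds = parse_v2000(record)
print(atoms)  # [('C', (0.0, 0.0, 0.0)), ('O', (1.4, 0.0, 0.0))]
print(bonds)  # [(0, 1, 1)]
```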

Sanitization and Valence Checks

from cosmolkit import Molecule

mol = Molecule.from_smiles("c1ccccc1", sanitize=False)

# Not implemented yet
report = mol.check_valence()
mol = mol.sanitize(strict=True)
mol = mol.perceive_rings().perceive_aromaticity()
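One basic ingredient of ring perception is deciding which bonds lie on a cycle. A bond is a ring bond exactly when its two atoms remain connected after the bond is removed, that is, when it is not a bridge. The hypothetical ring_bonds helper below applies that test directly with a breadth-first search; it is an O(E·(V+E)) sketch, whereas toolkits typically compute ring sets such as the SSSR with more efficient algorithms.

```python
from collections import deque
from typing import List, Set, Tuple

Bond = Tuple[int, int]


def ring_bonds(n_atoms: int, bonds: List[Bond]) -> Set[Bond]:
    """Return the bonds that lie on at least one cycle."""
    result: Set[Bond] = set()
    for skip, (u, v) in enumerate(bonds):
        # Build the adjacency list without the bond under test.
        adj: List[List[int]] = [[] for _ in range(n_atoms)]
        for k, (a, b) in enumerate(bonds):
            if k != skip:
                adj[a].append(b)
                adj[b].append(a)
        # BFS from u; if v is still reachable, (u, v) closes a cycle.
        seen = {u}
        queue = deque([u])
        while queue:
            atom = queue.popleft()
            for nb in adj[atom]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if v in seen:
            result.add((u, v))
    return result


# c1ccccc1CCO: ring atoms 0-5, side chain 5-6-7-8.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (5, 6), (6, 7), (7, 8)]
print(sorted(ring_bonds(9, bonds)))
# [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
```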

Substructure Search and Fingerprints

from cosmolkit import Molecule

mol = Molecule.from_smiles("c1ccccc1CCO")

# Not implemented yet
query = Molecule.query_from_smarts("c1ccccc1")
matches = mol.substructure_find(query)

# Not implemented yet
fp1 = mol.fingerprint_morgan(radius=2, n_bits=2048)
fp2 = Molecule.from_smiles("CCN").fingerprint_morgan(radius=2, n_bits=2048)
print(fp1.tanimoto(fp2))
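Morgan (ECFP-style) fingerprints repeatedly hash each atom's identifier together with its neighbors' identifiers, once per radius step, and set a bit for every identifier produced; Tanimoto similarity is then |A ∩ B| / |A ∪ B| over the on bits. The sketch below is deliberately simplified (initial invariant is atomic number plus degree, bond orders and duplicate environments are ignored), so its bits will not match RDKit or the planned fingerprint_morgan. morgan_bits and tanimoto are hypothetical names.

```python
import zlib
from typing import List, Set, Tuple


def _h(value: Tuple[int, ...]) -> int:
    # Deterministic 32-bit hash; Python's built-in hash() of str is salted per process.
    return zlib.crc32(repr(value).encode())


def morgan_bits(atomic_nums: List[int], bonds: List[Tuple[int, int]],
                radius: int = 2, n_bits: int = 2048) -> Set[int]:
    """Simplified Morgan/ECFP-style fingerprint as a set of on-bit indices."""
    neighbors: List[List[int]] = [[] for _ in atomic_nums]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # Radius 0: identifier from atomic number and heavy-atom degree.
    ids = [_h((z, len(neighbors[a]))) for a, z in enumerate(atomic_nums)]
    bits = {x % n_bits for x in ids}
    for _ in range(radius):
        # New identifier combines the atom's own and its sorted neighbor identifiers.
        ids = [_h((ids[a], *sorted(ids[n] for n in neighbors[a])))
               for a in range(len(atomic_nums))]
        bits.update(x % n_bits for x in ids)
    return bits


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """|A & B| / |A | B|; two empty fingerprints score 0.0 by convention here."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# CCO (ethanol) vs CCN (ethylamine), heavy atoms only.
fp1 = morgan_bits([6, 6, 8], [(0, 1), (1, 2)])
fp2 = morgan_bits([6, 6, 7], [(0, 1), (1, 2)])
print(tanimoto(fp1, fp2))
print(tanimoto(fp1, fp1))  # 1.0
```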

3D Coordinates and Alignment

from cosmolkit import Alignment, Molecule

mol = Molecule.from_smiles("CCO")

# Not implemented yet
mol3d = mol.add_hydrogens().embed_3d(seed=42, num_conformers=20)

# Not implemented yet
result = Alignment.find_most_similar_fragment(
    reference=mol3d,
    candidates=[mol3d],
    mutate_reference=False,
    mutate_candidates=False,
)
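Comparing 3D structures usually comes down to an RMSD over atoms that are already matched one to one. The hypothetical rmsd_centered below removes translation by centering both coordinate sets on their centroids; a full superposition would also solve for the optimal rotation (for example with the Kabsch algorithm), which this sketch omits. It is not what find_most_similar_fragment does, only an illustration of the scoring idea.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def centered(coords: List[Point]) -> List[Point]:
    """Translate a coordinate set so its centroid sits at the origin."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def rmsd_centered(a: List[Point], b: List[Point]) -> float:
    """RMSD between matched atoms after centering; no rotation is applied."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate sets of equal length")
    sq = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(centered(a), centered(b))
    )
    return math.sqrt(sq / len(a))


ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.2, 0.0)]
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in ref]
print(rmsd_centered(ref, shifted))  # close to 0.0: centering removes a pure translation
```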

Explicit Editing

from cosmolkit import Molecule

mol = Molecule.from_smiles("CCO")

# Not implemented yet
editor = mol.edit()
cl = editor.add_atom("Cl")
editor.add_bond(0, cl, order="single")
mol2 = editor.commit(sanitize=True)
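The edit()/commit() shape above suggests a builder pattern: edits are staged on a mutable editor, and commit() produces a new molecule while the original stays untouched. The plain-Python sketch below shows that design with hypothetical MolGraph and GraphEditor classes; it uses integer bond orders instead of order="single" and skips the sanitization step the planned commit(sanitize=True) implies.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class MolGraph:
    """Immutable snapshot: atom symbols and (begin, end, order) bonds."""
    atoms: Tuple[str, ...]
    bonds: Tuple[Tuple[int, int, int], ...]

    def edit(self) -> "GraphEditor":
        return GraphEditor(list(self.atoms), list(self.bonds))


@dataclass
class GraphEditor:
    """Mutable staging area; the source graph is never modified."""
    atoms: List[str]
    bonds: List[Tuple[int, int, int]] = field(default_factory=list)

    def add_atom(self, symbol: str) -> int:
        self.atoms.append(symbol)
        return len(self.atoms) - 1  # index of the new atom

    def add_bond(self, begin: int, end: int, order: int = 1) -> None:
        if not (0 <= begin < len(self.atoms) and 0 <= end < len(self.atoms)):
            raise IndexError("bond refers to a missing atom")
        if begin == end:
            raise ValueError("self-bonds are not allowed")
        self.bonds.append((begin, end, order))

    def commit(self) -> MolGraph:
        # A real toolkit would sanitize (valence, aromaticity) here; this sketch does not.
        return MolGraph(tuple(self.atoms), tuple(self.bonds))


mol = MolGraph(("C", "C", "O"), ((0, 1, 1), (1, 2, 1)))
editor = mol.edit()
cl = editor.add_atom("Cl")
editor.add_bond(0, cl)
mol2 = editor.commit()
print(mol.atoms)   # ('C', 'C', 'O')
print(mol2.atoms)  # ('C', 'C', 'O', 'Cl')
print(mol2.bonds)  # ((0, 1, 1), (1, 2, 1), (0, 3, 1))
```

Keeping the committed snapshot immutable means a half-finished edit can never leave an existing molecule in an inconsistent state.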

Development Status

COSMolKit is not a full RDKit replacement today. The implementation is being built through strict parity testing against RDKit behavior where compatibility is the goal.



Download files

Download the file for your platform.

Source Distribution

cosmolkit-0.0.6.tar.gz (91.5 kB)

Uploaded Source

Built Distributions

cosmolkit-0.0.6-cp39-abi3-win_amd64.whl (279.1 kB)

Uploaded CPython 3.9+, Windows x86-64

cosmolkit-0.0.6-cp39-abi3-manylinux_2_34_x86_64.whl (446.4 kB)

Uploaded CPython 3.9+, manylinux: glibc 2.34+ x86-64

cosmolkit-0.0.6-cp39-abi3-macosx_11_0_arm64.whl (388.9 kB)

Uploaded CPython 3.9+, macOS 11.0+ ARM64

File details

Details for the file cosmolkit-0.0.6.tar.gz.

File metadata

  • Download URL: cosmolkit-0.0.6.tar.gz
  • Upload date:
  • Size: 91.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cosmolkit-0.0.6.tar.gz
Algorithm Hash digest
SHA256 3bea0a05388599134efea2dd54d6ea73849176e02da6404af2125a792dd17a2e
MD5 327356293bb66fde8ea3d3fe91bea692
BLAKE2b-256 205ddfd39c070b7941d146537b5e318ebcb75f4dad6fb79b17b0e50e815575ac
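The SHA256 digests listed here can be checked against a downloaded file. The sketch below streams the file through Python's hashlib; sha256_of_file is a hypothetical helper name, and the local file path is an assumption. pip can do the same check automatically in hash-checking mode, by listing `cosmolkit==0.0.6 --hash=sha256:<digest>` in a requirements file and running `pip install --require-hashes -r requirements.txt`.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks so large files stay cheap."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Published SHA256 for the source distribution, copied from the table above.
EXPECTED = "3bea0a05388599134efea2dd54d6ea73849176e02da6404af2125a792dd17a2e"

sdist = Path("cosmolkit-0.0.6.tar.gz")  # assumed download location
if sdist.exists():
    ok = sha256_of_file(sdist) == EXPECTED
    print("hash OK" if ok else "hash MISMATCH")
```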


File details

Details for the file cosmolkit-0.0.6-cp39-abi3-win_amd64.whl.

File metadata

  • Download URL: cosmolkit-0.0.6-cp39-abi3-win_amd64.whl
  • Upload date:
  • Size: 279.1 kB
  • Tags: CPython 3.9+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cosmolkit-0.0.6-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 f78b8ecf831869ac64d3a8ce2c21a4b448c52262245a84cb1ba2ff6eb9569ff9
MD5 684033cd0861d3e1d745a1613193c46d
BLAKE2b-256 bc80918a9555b7451f3101ad06360f9857492504d215a50d50a9c7db2e645716


File details

Details for the file cosmolkit-0.0.6-cp39-abi3-manylinux_2_34_x86_64.whl.


File hashes

Hashes for cosmolkit-0.0.6-cp39-abi3-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 e7a242ed0f878f2096e37e30befa6aeb7974225194bcaab965aa820cefec9b6d
MD5 1a415527df1b4312df47fe41ab259d02
BLAKE2b-256 56842b578a2b0a596f16333e126cfe740ce5c9d7b18e3b3660e1fab5d6a4b76d


File details

Details for the file cosmolkit-0.0.6-cp39-abi3-macosx_11_0_arm64.whl.


File hashes

Hashes for cosmolkit-0.0.6-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f0ef5e072fd3af5517f590fedb7d86a05abb8218724a00c45b506e8f7cbc7958
MD5 d04a02e984b16444986703264e8e0184
BLAKE2b-256 fe977267f47dd5a0111131405a9f61850efd2817235cc30212270642f727bc3f

