Python bindings for COSMolKit
COSMolKit
COSMolKit is a Rust-backed Python package for cheminformatics and molecular graph workflows. The project is currently in early development: the Python package exposes a small, strict subset of the Rust core while the chemistry implementation is expanded against RDKit parity tests.
The current package is useful for basic SMILES parsing, molecule graph inspection, hydrogen expansion, Kekulization, and tetrahedral stereo experiments. Higher-level workflows such as SDF IO, conformer generation, substructure search, fingerprints, editing, and alignment are part of the planned public API but are not implemented yet.
Installation
```bash
pip install cosmolkit
```
Quick Start
```python
from cosmolkit import Molecule

mol = Molecule.from_smiles("CCO")

print("atoms")
for atom in mol.atoms():
    print(atom.idx(), atom.atomic_num())

print("bonds")
for bond in mol.bonds():
    print(bond.begin_atom_idx(), bond.end_atom_idx(), bond.bond_type())
```
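Because bonds are exposed as plain pairs of atom indices, neighbor lookups are straightforward to build in user code. The sketch below relies only on the `begin_atom_idx()` / `end_atom_idx()` accessors shown above. The helper name `neighbor_lists` is hypothetical and not part of COSMolKit. It takes plain `(i, j)` tuples, so it runs without the package installed.

```python
from typing import Iterable, List, Tuple


def neighbor_lists(num_atoms: int, bond_pairs: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Build an undirected adjacency list from (begin, end) atom-index pairs."""
    neighbors: List[List[int]] = [[] for _ in range(num_atoms)]
    for i, j in bond_pairs:
        # A bond connects both atoms, so record it in both directions.
        neighbors[i].append(j)
        neighbors[j].append(i)
    return neighbors


# Hypothetical glue code for a COSMolKit molecule:
# pairs = [(b.begin_atom_idx(), b.end_atom_idx()) for b in mol.bonds()]
# adj = neighbor_lists(len(mol.atoms()), pairs)

# Hand-written pairs for a three-atom chain such as the heavy atoms of CCO.
print(neighbor_lists(3, [(0, 1), (1, 2)]))  # [[1], [0, 2], [1]]
```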
Add Explicit Hydrogens
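```python
from cosmolkit import Molecule

mol = Molecule.from_smiles("CCO")
mol_h = mol.add_hydrogens()  # returns a separate molecule with explicit H atoms

print(len(mol.atoms()))    # heavy atoms only
print(len(mol_h.atoms()))  # heavy atoms plus explicit hydrogens

# Not implemented yet
mol_no_h = mol_h.remove_hydrogens()
```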
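One quick way to see what `add_hydrogens()` changed is to compare molecular formulas built from `atom.atomic_num()` before and after the call. This is a minimal sketch in Hill order (carbon, then hydrogen, then the remaining symbols alphabetically; all symbols alphabetically when there is no carbon). The helper `hill_formula` and its small `SYMBOLS` table are illustrative assumptions, not COSMolKit APIs. Only atoms that are actually listed are counted, so implicit hydrogens are missing until they are made explicit.

```python
from collections import Counter
from typing import Iterable

# Illustrative symbol table only; extend it for other elements.
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O", 9: "F", 15: "P", 16: "S", 17: "Cl", 35: "Br", 53: "I"}


def hill_formula(atomic_nums: Iterable[int]) -> str:
    """Return a Hill-order formula for the listed atoms (KeyError for unknown elements)."""
    counts = Counter(SYMBOLS[z] for z in atomic_nums)
    if "C" in counts:
        # Carbon first, then hydrogen, then everything else alphabetically.
        rest = sorted(s for s in counts if s not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        # Without carbon, all symbols (including H) are sorted alphabetically.
        order = sorted(counts)
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in order)


# Hypothetical usage with the molecules above:
# hill_formula(a.atomic_num() for a in mol.atoms())    # implicit H not counted
# hill_formula(a.atomic_num() for a in mol_h.atoms())  # explicit H counted
print(hill_formula([6, 6, 8]))                    # C2O
print(hill_formula([6, 6, 8, 1, 1, 1, 1, 1, 1]))  # C2H6O
```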
Kekulization
Molecule.kekulize() is implemented for the currently supported aromatic
systems.
```python
from cosmolkit import Molecule

mol = Molecule.from_smiles("c1ccccc1")
kekulized = mol.kekulize()

# Aromatic ring bonds are reported as alternating single and double bonds.
for bond in kekulized.bonds():
    print(bond.begin_atom_idx(), bond.end_atom_idx(), bond.bond_type())
```
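For a ring such as benzene, where every atom must take part in exactly one double bond, a simple sanity check on the kekulized output is to count double bonds per atom. This is not a general validity test: atoms like the NH of pyrrole legitimately carry no double bond. The sketch assumes you have already mapped `bond_type()` values to integer bond orders (2 for double); that mapping and the helper name are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def each_atom_has_one_double_bond(atoms: Sequence[int], bonds: Iterable[Tuple[int, int, int]]) -> bool:
    """Check that every listed atom is incident to exactly one bond of order 2."""
    double_count = {a: 0 for a in atoms}
    for i, j, order in bonds:
        if order == 2:
            for a in (i, j):
                if a in double_count:
                    double_count[a] += 1
    return all(n == 1 for n in double_count.values())


# Hand-written Kekule benzene as (begin, end, order) triples.
benzene = [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1)]
print(each_atom_has_one_double_bond(range(6), benzene))  # True
```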
Tetrahedral Stereo
COSMolKit exposes an ordered tetrahedral stereo representation derived from the
internal chiral tags. Each record is (center_atom_index, ordered_ligands).
Implicit hydrogen is represented as None.
```python
from cosmolkit import Molecule

mol = Molecule.from_smiles("[13CH3:7][C@H](F)Cl")
for center, ligands in mol.tetrahedral_stereo():
    print("center:", center)
    print("ordered ligands:", ligands)
```
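Ordered ligand lists describe handedness only relative to each other: swapping any two ligands inverts the configuration, while an even permutation (for example a cyclic shift of three ligands) preserves it. The sketch below compares two orderings of the same ligands by permutation parity. It assumes both lists follow the same ordering convention, as two records from `tetrahedral_stereo()` would; which ordering COSMolKit maps to clockwise or counterclockwise is not covered here. The helper name is hypothetical.

```python
from typing import Hashable, Sequence


def same_handedness(order_a: Sequence[Hashable], order_b: Sequence[Hashable]) -> bool:
    """Return True if order_b is an even permutation of order_a."""
    if len(order_a) != len(order_b) or len(set(order_a)) != len(order_a) or set(order_a) != set(order_b):
        raise ValueError("orderings must contain the same distinct ligands")
    position = {ligand: k for k, ligand in enumerate(order_a)}
    perm = [position[ligand] for ligand in order_b]
    # An even number of inversions means an even permutation.
    n = len(perm)
    inversions = sum(1 for x in range(n) for y in range(x + 1, n) if perm[x] > perm[y])
    return inversions % 2 == 0


# None stands for the implicit hydrogen, matching tetrahedral_stereo().
print(same_handedness([None, 0, 2, 3], [0, 2, None, 3]))  # True: cyclic shift of three
print(same_handedness([None, 0, 2, 3], [0, None, 2, 3]))  # False: a single swap
```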
You can also inspect RDKit-like chiral tags:
```python
from cosmolkit import Molecule

mol = Molecule.from_smiles("F[C@H](Cl)Br")
print(mol.find_chiral_centers(include_unassigned=False))
```
Graph Data for ML Workflows
The current low-level API can be used to build simple graph tensors in user code. This example intentionally ignores 3D coordinates because COSMolKit does not implement conformer generation yet.
```python
import torch
from cosmolkit import Molecule


def graph_from_smiles(smiles: str):
    mol = Molecule.from_smiles(smiles)

    # Node features: a single column holding each atom's atomic number.
    x = torch.tensor([[atom.atomic_num()] for atom in mol.atoms()], dtype=torch.long)

    edges = []
    edge_attr = []  # raw bond_type() values; encode them before batching
    for bond in mol.bonds():
        i = bond.begin_atom_idx()
        j = bond.end_atom_idx()
        kind = bond.bond_type()
        # Store each undirected bond as two directed edges.
        edges.extend([(i, j), (j, i)])
        edge_attr.extend([kind, kind])

    # reshape(-1, 2) keeps edge_index at shape (2, 0) for molecules without bonds.
    edge_index = torch.tensor(edges, dtype=torch.long).reshape(-1, 2).t().contiguous()
    return x, edge_index, edge_attr


x, edge_index, edge_attr = graph_from_smiles("CCO")
print(x)
print(edge_index)
print(edge_attr)
```
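`edge_attr` above is a plain list of whatever `bond_type()` returns, so it usually needs encoding before it can become a tensor. This is a minimal one-hot sketch with an extra "unknown" column. The label vocabulary `BOND_VOCAB` is a hypothetical assumption; check the actual values `bond_type()` produces and adjust it. Values are normalized with `str()` (keeping only the part after the last dot, so an enum-style `BondType.SINGLE` also matches) and upper-cased.

```python
from typing import Iterable, List, Sequence

# Hypothetical vocabulary; replace it with the labels bond_type() really returns.
BOND_VOCAB = ("SINGLE", "DOUBLE", "TRIPLE", "AROMATIC")


def one_hot_bond_types(kinds: Iterable[object], vocab: Sequence[str] = BOND_VOCAB) -> List[List[int]]:
    """One-hot encode bond types; the last column flags labels outside vocab."""
    width = len(vocab) + 1
    rows = []
    for kind in kinds:
        label = str(kind).rsplit(".", 1)[-1].upper()
        row = [0] * width
        row[vocab.index(label) if label in vocab else len(vocab)] = 1
        rows.append(row)
    return rows


# Hypothetical usage with the function above:
# edge_features = torch.tensor(one_hot_bond_types(edge_attr), dtype=torch.float)
print(one_hot_bond_types(["SINGLE", "aromatic"]))  # [[1, 0, 0, 0, 0], [0, 0, 0, 1, 0]]
```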
Planned API Examples
The following examples show the intended direction of the public API. They are
included to document the target design, but they are not implemented yet and
will raise NotImplementedError in current releases.
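Code that wants to adopt these APIs as soon as they land can probe for them instead of hard-failing. A minimal sketch, assuming the planned methods already exist on `Molecule` and raise `NotImplementedError` as described above (a method that does not exist at all would raise `AttributeError` on lookup instead). The helper `call_if_implemented` is hypothetical. It returns a flag alongside the result so that a legitimate `None` return is not mistaken for "not implemented".

```python
from typing import Any, Callable, Tuple


def call_if_implemented(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[bool, Any]:
    """Call fn and return (True, result), or (False, None) if it is not implemented."""
    try:
        return True, fn(*args, **kwargs)
    except NotImplementedError:
        return False, None


# Hypothetical usage against a planned COSMolKit method:
# ok, fp = call_if_implemented(mol.fingerprint_morgan, radius=2, n_bits=2048)
# if not ok:
#     ...  # fall back to another toolkit


def planned_feature():
    raise NotImplementedError


print(call_if_implemented(planned_feature))       # (False, None)
print(call_if_implemented(lambda n: n * 2, n=21))  # (True, 42)
```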
SDF IO
```python
from cosmolkit import Molecule

# Not implemented yet
mol = Molecule.read_sdf("ligand.sdf", sanitize=True)
mol.write_sdf("out.sdf")
```
Sanitization and Valence Checks
```python
from cosmolkit import Molecule

mol = Molecule.from_smiles("c1ccccc1", sanitize=False)

# Not implemented yet
report = mol.check_valence()
mol = mol.sanitize(strict=True)
mol = mol.perceive_rings().perceive_aromaticity()
```
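Until `check_valence()` is available, the basic idea of a valence check can be sketched in user code: sum the bond orders around each atom, add its hydrogens, and compare with an allowed maximum. This is a simplified illustration, not COSMolKit's planned behavior. The `MAX_VALENCE` table, the integer bond orders, and the helper name are assumptions, and charges, aromatic bonds, and elements with several allowed valences (such as S or P) are ignored.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Illustrative single valences for neutral atoms only.
MAX_VALENCE: Dict[int, int] = {1: 1, 6: 4, 7: 3, 8: 2, 9: 1, 17: 1}


def valence_violations(
    atomic_nums: Sequence[int],
    bonds: Iterable[Tuple[int, int, int]],
    hydrogens: Sequence[int],
) -> List[int]:
    """Return indices of atoms whose bond-order sum plus H count exceeds MAX_VALENCE."""
    total = list(hydrogens)
    for i, j, order in bonds:
        total[i] += order
        total[j] += order
    return [k for k, z in enumerate(atomic_nums) if z in MAX_VALENCE and total[k] > MAX_VALENCE[z]]


# Ethanol heavy atoms (C, C, O) with 3, 2 and 1 hydrogens: no violations.
print(valence_violations([6, 6, 8], [(0, 1, 1), (1, 2, 1)], hydrogens=[3, 2, 1]))  # []
# A carbon with four hydrogens plus a bond to F is pentavalent.
print(valence_violations([6, 9], [(0, 1, 1)], hydrogens=[4, 0]))  # [0]
```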
Substructure Search and Fingerprints
```python
from cosmolkit import Molecule

mol = Molecule.from_smiles("c1ccccc1CCO")

# Not implemented yet
query = Molecule.query_from_smarts("c1ccccc1")
matches = mol.substructure_find(query)

# Not implemented yet
fp1 = mol.fingerprint_morgan(radius=2, n_bits=2048)
fp2 = Molecule.from_smiles("CCN").fingerprint_morgan(radius=2, n_bits=2048)
print(fp1.tanimoto(fp2))
```
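For reference, the Tanimoto coefficient of two bit-vector fingerprints is the number of bits set in both divided by the number set in either. A minimal sketch over sets of on-bit indices, independent of COSMolKit's planned `tanimoto()` method; returning 0.0 for two empty fingerprints is a convention chosen here.

```python
from typing import AbstractSet


def tanimoto(bits_a: AbstractSet[int], bits_b: AbstractSet[int]) -> float:
    """Tanimoto similarity |A & B| / |A | B| of two sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention for two empty fingerprints
    return len(bits_a & bits_b) / union


print(tanimoto({1, 5, 9}, {1, 5, 12}))  # 0.5
```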
3D Coordinates and Alignment
```python
from cosmolkit import Alignment, Molecule

mol = Molecule.from_smiles("CCO")

# Not implemented yet
mol3d = mol.add_hydrogens().embed_3d(seed=42, num_conformers=20)

# Not implemented yet
result = Alignment.find_most_similar_fragment(
    reference=mol3d,
    candidates=[mol3d],
    mutate_reference=False,
    mutate_candidates=False,
)
```
Explicit Editing
```python
from cosmolkit import Molecule

mol = Molecule.from_smiles("CCO")

# Not implemented yet
editor = mol.edit()
cl = editor.add_atom("Cl")
editor.add_bond(0, cl, order="single")
mol2 = editor.commit(sanitize=True)
```
Development Status
COSMolKit is not a full RDKit replacement today. The implementation is being built through strict parity testing against RDKit behavior where compatibility is the goal.
Download files
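Source Distribution
- cosmolkit-0.0.7.tar.gz
Built Distributions
- cosmolkit-0.0.7-cp39-abi3-win_amd64.whl
- cosmolkit-0.0.7-cp39-abi3-manylinux_2_34_x86_64.whl
- cosmolkit-0.0.7-cp39-abi3-macosx_11_0_arm64.whl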
File details
Details for the file cosmolkit-0.0.7.tar.gz.
File metadata
- Download URL: cosmolkit-0.0.7.tar.gz
- Upload date:
- Size: 114.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1c7e214f859104cecf4c75dfce992b078b95940ce4746a512bde27ad8bf45eac |
| MD5 | 986d860407648dbb502d40ca0e10f73e |
| BLAKE2b-256 | 13cc3a81bd708c682c4864690c9691c8285e123476d77f3df3372dec90427628 |
File details
Details for the file cosmolkit-0.0.7-cp39-abi3-win_amd64.whl.
File metadata
- Download URL: cosmolkit-0.0.7-cp39-abi3-win_amd64.whl
- Upload date:
- Size: 490.9 kB
- Tags: CPython 3.9+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 91bef30c8a37416b7a55a181eb6f3367a85368c4d24d2d11c5685f8e80a78038 |
| MD5 | 99230f8f4e0f47676428ade6e9ad0c29 |
| BLAKE2b-256 | 04ca2ced44d75f94f412bf1640aadec8e9743291f16a259874daeb03d95e5158 |
File details
Details for the file cosmolkit-0.0.7-cp39-abi3-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: cosmolkit-0.0.7-cp39-abi3-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 660.3 kB
- Tags: CPython 3.9+, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 52a47865cc446bc1b780aa46c68d781714a69defd5d3880e9a23ad74ea6092dd |
| MD5 | 256090145eef4a9ffa455dd2752c1c33 |
| BLAKE2b-256 | 9cc3b97be1dfc7c67e695c0afa3811160bc0310d0b879f0c4b01e1a993905b8b |
File details
Details for the file cosmolkit-0.0.7-cp39-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: cosmolkit-0.0.7-cp39-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 576.1 kB
- Tags: CPython 3.9+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 13fe7885c94f884e12d69bc6f5dfe3cc58d4a7391800f885716fda83bc348306 |
| MD5 | f604ab3e15a37a7040f7afaf5870ca52 |
| BLAKE2b-256 | 3e99bbc2a6a199b591649f0fc4513ad098a410e220454e221b104deb41e81a9b |